Wednesday, November 26, 2014

Slightly Advanced rvest with Help from htmltools + XML + pipeR

Hadley Wickham’s post “rvest: easy web scraping with R” introduces the fine new package rvest very well.  For those now yearning a slightly more advanced example with a little help from pipeR + htmltools + XML, I thought this might fill your yearn.  The code grabs css information running the fancy new site cssstats.com on my blog site.  With the background colors, it makes and labels some swatches and outputs them to the RStudio viewer--if installed--or your browser if not.

library(pipeR)
library(htmltools)
library(rvest)
library(XML)

# some slightly more advanced exercises
# using rvest, XML, and htmltools

# this one takes all the svg nodes in the section
# with id unique-background-colors from the
# site cssstats.com run on timelyportfolio.blogspot.com
# 1) removes attributes
# 2) sizes them at 85px x 64px
# 3) add new text node with fill value
# 4) combines them into a single div
# 5) with some meta information
"http://cssstats.com/stats?url=http%3A%2F%2Ftimelyportfolio.blogspot.com" %>>%
html %>>%
html_nodes( "#unique-background-colors svg" ) %>>%
xmlApply( function(x){
removeAttributes(x)
addAttributes(x,style="display:inline-block;height:85px;width:64px")
fillNode = newXMLNode(
"text"
,html_attr(html_node(x,"rect"),"fill")
,attrs=c(x=0,y=75,style="font-size:70%")
)
addChildren(x,fillNode)
saveXML(x) %>>% HTML
} ) %>>%
(tags$div(
style="display:inline-block;height:100%;width:100%"
,list(
tags$h3(
"Colors of TimelyPortfolio from "
,tags$a(href="http://cssstats.com","cssstats")
)
,.
)
)) %>>%
tagList %>>%
html_print

I just copied the div output below in Windows Live Writer (notably from JJ Allaire and Joe Cheng of RStudio).



Colors of TimelyPortfolio from cssstats

transparent #fff #ffffff #fcfcfc #eeeeee #fcf8e3 #f2dede #dff0d8 #d9edf7 #f5f5f5 #a9dba9 #f9f9f9 #d0e9c6 #ebcccc #faf2cc #c4e3f3 #e5e5e5 #0081c2 #e6e6e6 #cccccc \9 #006dcc #0044cc #003399 \9 #faa732 #f89406 #c67605 \9 #da4f49 #bd362f #942a25 \9 #5bb75b #51a351 #408140 \9 #49afcd #2f96b4 #24748c \9 #363636 #222222 #080808 \9 #0088cc #999999 #fafafa #ededed #1b1b1b #111111 #515151 #0e0e0e #040404 #000000 \9 #000000 #f7f7f7 #b94a48 #953b39 #c67605 #468847 #356635 #3a87ad #2d6987 #333333 #1a1a1a #0e90d2 #149bdf #dd514c #ee5f5b #5eb95e #62c462 #4bb1cf #5bc0de #fbb450 #ccc rgba(255, 255, 255, 0.25) #2288bb #ffff00

Monday, November 24, 2014

Secret to Making Things Appear in RStudio Viewer

I am by no means an authoritative source on this, but I think I found out the secret behind htmltools html_print that chooses the RStudio Viewer browser rather than your default browser like utils::browseURL. Here is a quick code snippet that hopefully explains what is happening. It appears you just need a temp directory with the pattern starting with viewhtml*.

library(htmltools)
library(pipeR)

# htmltools from RStudio very nicely
# makes things appear in the Viewer Window
"<h3>Hello in RStudio Viewer</h3>" %>>%
HTML %>>%
html_print


# there is a little secret to doing the same
# without htmltools
"<h3>Hello in RStudio Viewer</h3>" %>>%
HTML %>>%
# and the secret is to place in a temp directory
# with the pattern viewhtml
(
ht ~
tempfile("viewhtml") %>>%
(~ dir.create(.) ) %>>%
(~ save_html(ht, file = file.path(.,"index.html")) ) %>>%
( getOption("viewer")(file.path(.,"index.html")) )
)

# to see it in action look at the code change
# in this project
utils::browseURL("https://github.com/bryanhanson/exCon/pull/1/files")
The only reason I mention it is that I much prefer to stay in the RStudio environment instead of switching contexts to a browser.  I still firmly recommend using tags from htmltools as the best method to make all this work seamlessly and as intended by the experts at RStudio. 

Pipeline to Plot Annual % Change

image

(thanks tradeblotter for pointing out error in code in first release)

Pipes in R make my life incredibly easy, and I think my code easier to read.   Note, there are a couple different flavors of pipes (see magrittr and pipeR).  For now, I choose pipeR.

library(quantmod)
library(pipeR)
library(ggplot2)

getSymbols("^GSPC",from="1900-01-01",auto.assign=F) %>>% #get S&P 500 from Yahoo!Finance
( .[endpoints(.,"years"),4] ) %>>% #get end of year
ROC( type="discrete", n=1 ) %>>% #get one year rate of change
na.fill(0) %>>% #fill first year with 0
( #make data.frame
data.frame(
date = as.Date(format(index(.),"%Y-01-01"))
,.
)
) %>>%
structure( #hard way to do colnames()
names = c("date","change")
) %>>%
ggplot( #start our plot pipe
aes( y= change, x= date)
) %>>%
+ geom_bar( stat="identity" ) %>>% #choose bar
+ labs( title = "Annual Change of the S&P 500" ) #give plot a title

Or if we wanted to use the new pipeline syntax for the plot portion.

library(quantmod)
library(pipeR)
library(ggplot2)

getSymbols("^GSPC",from="1900-01-01",auto.assign=F) %>>% #get S&P 500 from Yahoo!Finance
( .[endpoints(.,"years"),4] ) %>>% #get end of year
ROC( type="discrete", n=1 ) %>>% #get one year rate of change
na.fill(0) %>>% #fill first year with 0
( #make data.frame
data.frame(
date = as.Date(format(index(.),"%Y-01-01"))
,.
)
) %>>%
structure( #hard way to do colnames()
names = c("date","change")
) %>>%
(
pipeline({
ggplot(., aes( y= change, x= date) )
+ geom_bar( stat="identity" )
+ labs( title = "Annual Change of the S&P 500 (source: Yahoo! Finance)" )
})
)

Wednesday, November 5, 2014

Update on JGBs versus USTs

Given the recent selloff in the Yen, I thought now would be a good time to update my favorite chart from Intended or Unintended Consequences.

image

For a true currency death spiral, rates need to move up rather than down.  It appears we are long way from that.

Long-time readers will know that I have been keenly interested in the Yen for the entire history of this blog http://timelyportfolio.blogspot.com/search?q=yen.

Code for this post: https://gist.github.com/timelyportfolio/5665790

Wednesday, October 22, 2014

Postive Feedback in R with a Little Javascript

Let’s face it, sometimes the struggle in R can become frustrating, depressing, daunting, or just monotonous.  For those moments when you need a little positive feedback, some encouragement, or a pat on the back, I thought this might help.  Maybe I should make this into a package.

I found this from Sweet Alert for Bootstrap forked from Tristan Edwards non-bootstrap SweetAlert.  This builds on the technique used in my previous post SVG + a little extra (d3.js) in RStudio Browser | No Pipes This Time.

 

positive_feedback_sweetalert_r

# give yourself some positive feedback in R
# as you toil away on some difficult, but worthwhile task
# uses javascript sweet-alert https://github.com/t4t5/sweetalert

library(htmltools)
library(pipeR)

tagList(
tags$script(
'
document.addEventListener("DOMContentLoaded", function(event) {
swal("Good job! Brilliant!", "You\'re doing worthwhile things.", "success")
});
'

)
) %>>%
attachDependencies(
htmlDependency(
name="sweet-alert"
,version="0.2.1"
,src=c("href"=
"http://timelyportfolio.github.io/sweetalert/lib"
)
,script = "sweet-alert.min.js"
,style = "sweet-alert.css"
)
) %>>%
html_print

Friday, October 10, 2014

SVG + Javascript Ekholm Decomposition in RStudio Browser

Our topics this week seem unrelated, but in an effort to bridge the two

another random project – make website in R for these SVGs of Portland Vector Bridges
result: Portland Bridges in SVG
code: R to make simple site

Ekholm decomposition

SelectionShare & TimingShare | Masterfully Written by Delightfully Responsive Author
Popular Mutual Funds Decomposed With Ekholm (2014)

Responsive SVG in the browser

Responsive SVG in Your RStudio Browser
SVG + a little extra (d3.js) in RStudio Browser | No Pipes This Time

let’s build a website in R with htmltools to calculate the Ekholm decomposition in Javascript using this nifty simple-statistics.js from the brilliant Tom Macwright.  The result will not be beautiful and I’ll leave out a fancy interactive chart, but that is intentional to reduce the amount of code and dependencies.

r_ekholm_js

I wonder what I’ll get into next week.

Github Repo

library(htmltools)
library(pipeR)
library(jsonlite)
library(Quandl)
library(xts)

# use Quandl Kenneth French Fama/French factors
# http://www.quandl.com/KFRENCH/FACTORS_D
#f <- Quandl("KFRENCH/FACTORS_D",type = "xts", start_date="2010-12-31") / 100

tagList(
#pull in the bridge to span all the week's topics
#Portland Vector Bridges http://timelyportfolio.github.io/portland_vector_bridges
tags$div( style = "height:15%;width:100%"
,readLines(
"http://timelyportfolio.github.io/portland_vector_bridges/Burnside Bridge.svg"
) %>>% HTML
)
,tags$h1( "Sparsest Test in Javascript of Ekholm")
, tags$div( style = "width:100%"
,tags$div( style = "background-color:red;"
,"Note: Date range currently limited to one year, but there is a fairly easy workaround
for the next version."
)
,tags$div(
style = "display:inline-block; width: 25%;float:left;"
,"Mutual Fund Symbol", tags$input( id = "mfsymbol" )
,tags$br()
,"Start Date "
, tags$span( style="font-size:75%;fill:lightgray", "(2013-08-29)" )
, tags$input( type = "date", id = "stdate" )
,tags$br()
,"End Date"
,tags$span( style="font-size:75%;fill:lightgray", "(2014-08-29)" )
, tags$input( type = "date", id= "enddate" )
,tags$br()
,tags$input(
type="submit", id = "calc", value = "Calculate"
)
,tags$br()
)
, tags$div(style = "display:inline-block;height:100%;width:60%;margin-left:30px"
, tags$textarea(id = "results", style="width:100%; height:150px")
)
)
,tags$script(sprintf(
'

var french = %s;
'
, toJSON(data.frame("Date"=index(f),f)) %>>% HTML
))
,tags$script(
'

function calculateEkholm( data ) { // data in form of x,y or fund-rf, mkt-rf
/* get an error with regression.js
var myReg = regression(
"linear",
data
)
*/

// so use the great simple-statistics library
var myReg = ss.linear_regression().data(data);

//get residuals
var resid = data.map(function(p){return myReg.line()(p[0]) - p[1]});

//regress residuals^2 on (mkt-rf)^2
var myReg2 = ss.linear_regression().data(
data.map(function(d,i){
return [ Math.pow(d[0],2), Math.pow(resid[i],2) ]
})
)
//coefficients ^ 1/2 will give us ActiveAlpha and ActiveBeta
var activeAlpha = Math.pow( myReg2.b(), 0.5 );
var activeBeta = Math.pow( myReg2.m(), 0.5 );

//now do the next step to get ActiveShare and SelectionShare
var selectionShare = Math.pow(activeAlpha, 2 ) / ( ss.variance(data.map(function(d){return d[1]})) * (data.length - 1) / data.length )
var timingShare = Math.pow(activeBeta, 2 ) * ss.mean( data.map(function(d){return Math.pow(d[0],2)}) ) / ( ss.variance(data.map(function(d){return d[1]})) * (data.length - 1) / data.length )

//pass correlation result also
var correlation = ss.sample_correlation(data.map(function(d){return d[0]}),data.map(function(d){return d[1]}));

return {
regression: myReg,
correlation: correlation,
activeAlpha: activeAlpha,
activeBeta: activeBeta,
selectionShare: selectionShare,
timingShare: timingShare
}
}


// thanks https://gist.github.com/fincluster/6145995
function getStock(opts, type, complete) {
var defs = {
desc: false,
baseURL: "http://query.yahooapis.com/v1/public/yql?q=",
query: {
quotes: \'select * from yahoo.finance.quotes where symbol = \"{stock}\" | sort(field=\"{sortBy}\", descending=\"{desc}\")\',
historicaldata: \'select * from yahoo.finance.historicaldata where symbol = \"{stock}\" and startDate = \"{startDate}\" and endDate = \"{endDate}\"\'
},
suffixURL: {
quotes: "&env=store://datatables.org/alltableswithkeys&format=json&callback=?",
historicaldata: "&env=store://datatables.org/alltableswithkeys&format=json"
}
};

opts = opts || {};

if (!opts.stock) {
complete("No stock defined");
return;
}

var query = defs.query[type]
.replace("{stock}", opts.stock)
.replace("{sortBy}", defs.sortBy)
.replace("{desc}", defs.desc)
.replace("{startDate}", opts.startDate)
.replace("{endDate}", opts.endDate)

var url = defs.baseURL + query + (defs.suffixURL[type] || "");

return url;
}


d3.select("#calc").on("click",function(){
calculateFund(
d3.select("#mfsymbol")[0][0].value,
d3.select("#stdate")[0][0].value,
d3.select("#enddate")[0][0].value
)
})

function calculateFund( symbol, startdate, enddate ) {

d3.json(getStock({stock:symbol.toUpperCase(),startDate:startdate,endDate:enddate},"historicaldata"), function(e1,fund){


if( e1 || !fund.query.results ) {
updateResults ( {e1:e1, e2:e2, queryresults: "query problems"} );
} else {
var fund_factor = [];

//manipulate data to join fund with factors
//would be nice to have a xts merge in javascript


// query.results.quote will have the data stripped of meta
// also we will sort date ascending
fund = fund.query.results.quote
.sort(function(a,b){
return d3.ascending(
d3.time.format("%Y-%m-%d").parse(a.Date),
d3.time.format("%Y-%m-%d").parse(b.Date)
)
} );




// now lets go period by period with fund.map
fund.map( function(per, i){
if( i > 0 ) {
var frenchThisPer = french.filter(function(d){return d.Date == per.Date})[0];
fund_factor.push([
//Date: per.Date,
//FundPrice:
per.Adj_Close / fund[ i - 1 ].Adj_Close - 1 - frenchThisPer["RF"],
//Rm_Rf:
+frenchThisPer["Mkt.RF"],
//Rf: +frenchThisPer["RF"]/100
])
}
})

updateResults( calculateEkholm( fund_factor ) );
}

})
}

function updateResults( ekholmCalc ){
var ekhArr = [];
Object.keys(ekholmCalc).map(function(k){
ekhArr.push( [ k,": ", ekholmCalc[k] ].join("") )
})
d3.select("#results").text(ekhArr.join("\\n"))
}
'
%>>% HTML )
) %>>%
attachDependencies(
list(
htmlDependency(
name="d3"
,version="3.4"
,src=c("href"="http://d3js.org/")
,script="d3.v3.min.js"
)
,htmlDependency(
name="simple_statistics"
,version="0.1"
,src=c("href"=
"http://timelyportfolio.github.io/rCharts_factor_analytics/js"
)
,script = "simple_statistics.js"
)
)
) %>>% html_print

Thursday, October 9, 2014

SVG + a little extra (d3.js) in RStudio Browser | No Pipes This Time

I’m guessing here, but yesterday’s post Responsive SVG in Your RStudio Browser might have inspired some “but,…)”s, “yes plus I need”s, “what the %>>% with the pipe”s, etc.  I’ll attempt to address a couple of these in this quick post.

First, if you don’t like pipes, here is the non-piped version of the code.  I also made one change, which assumes that you want the SVG to fill the <div> container.  This is helpful if you think you will only have one plot and nothing else in your HTML.

library(SVGAnnotation)
library(htmltools)

respXML <- function( svg_xml, height = NULL, width = "100%", print = T, ... ){
# svg_xml should be an XML document
library(htmltools)
library(XML)

svg <- structure(
ifelse(
length(getDefaultNamespace(svg_xml)) > 0
,getNodeSet(svg_xml,"//x:svg", "x")
,getNodeSet(svg_xml,"//svg")
)
,class="XMLNodeSet"
)

xmlApply(
svg
,function(s){
a = xmlAttrs(s)
removeAttributes(s)
xmlAttrs(s) <- a[-(1:2)]
xmlAttrs(s) <- c(
style = paste0(
"height:100%;width:100%;"
)
)
}
)

svg <- HTML( saveXML( svg_xml) )

svg <- tags$div(
style = paste(
sprintf('width:%s;',width)
,ifelse(!is.null(height),sprintf('height:%s;',height),"")
)
,svg
)

if(print) html_print(svg)

return( invisible( svg ) )
}



Second, I like it but I’m helpless without my Javascript helper libraries, such as d3.js, Snap.svg, RaphaĆ«l, etc.  htmltools makes it fairly easy to attach dependencies.  Let’s add d3.js in this example.


Third, I want to use my helper library to add some script to make something awesome.  I can’t help with the awesome part, but I can show you how to add a little bit of code.  This time we’ll take the simple pan/zoom code from ggplot2 meet d3, and here is the result.  Please understand that this is only 4 lines of Javascript, so the pan/zoom is not nearly as refined as I would expect.


R_svg_d3js



# make our plot here
# since we will need to manipulate to add a g container
# for smoother d3 pan/zoom
sP = respXML(
svgPlot(
dotchart(
t(VADeaths)
, xlim = c(0,100)
, main = "Death Rates in Virginia - 1940"
)
)
, height = "100%"
, print = F
)

# parse the plot html with rvest
sP = html(as.character(sP))
# add a g node to contain the plot
# for smoother d3 pan / zoom
g = newXMLNode("g")
# add the old g to our new g container
addChildren(g, html_nodes(sP,"svg > g"))
# add our new g container to our svg
addChildren(html_nodes(sP,"svg")[[1]],g)

html_print(attachDependencies(
tagList(
# get the div with our modified svg
HTML(saveXML(html_nodes(sP,"div")[[1]]))
, tags$script(
HTML(
'
var g = d3.select("svg > g");
var zoom = d3.behavior.zoom().scaleExtent([1, 8]).on("zoom", zoomed)
g.call(zoom)

function zoomed() {
g.select("g")
.attr(
"transform",
"translate(" + d3.event.translate + ")scale(" + d3.event.scale + ")"
);
}
'

)
)
)
,htmlDependency(
name="d3"
,version="3.0"
,src=c("href"="http://d3js.org/")
,script="d3.v3.js"
)
))

Wednesday, October 8, 2014

Responsive SVG in Your RStudio Browser

For those readers who are unaware, SVG is absolutely amazing, and if you need some convincing see this 2009 paper/talk from David Dailey Why is SVG Going to Be REALLY BIG?  Most R users should be very well acquainted with graphics and plots magically appearing on the screen with certain commands.  These graphics though are rasters, so when you resize, the graphics are re-rendered to scale.  Let’s have a look with a simple plot.

plot(x=1:10,y=1:10,type="b")



R_raster_resize


One of the beauties of SVG is that it will scale without re-rendering.  The old way to create SVG in R was to do something like this which produces an svg file that we can use, adjust, and share.

svg("svgplot.svg")
plot(x=1:10,y=1:10,type="b")
dev.off()

However, the integrated browser window in RStudio combined with the HTML helper tools  from RStudio lets us produce and see SVGs in real-time.  Let’s look again at our simple plot, but this time as an SVG in our RStudio browser window.  We will also use the precocious packages SVGAnnotation and XML from Duncan Temple Lang.


R_svg_1


 


But where is the magic resizability ?


 


This is where we will use some help from



Sara Soueidan - Understanding SVG Coordinate Systems & Transformations (Part 1) – The viewport, viewBox, & preserveAspectRatio



Dudley Storey - Make SVG Responsive


We can make a simple function to help us change the attributes and style to get a fancy responsive SVG real-time.

#even better make it responsive
#use this post as a guide
#http://demosthenes.info/blog/744/Make-SVG-Responsive
respXML <- function( svg_xml, height = NULL, width = "100%", print = T, ... ){
# svg_xml should be an XML document
library(htmltools)
library(pipeR)
library(XML)

tags$div(
style = paste(
sprintf('width:%s;',width)
,ifelse(!is.null(height),sprintf('height:%s;',height),"")
,"display: inline-block;"
,"position: relative;"
,"padding-bottom: 100%;"
,"vertical-align: middle;"
,"overflow: hidden;"
)
, ...
,svg_xml %>>%
(~svg ~
structure(ifelse(
length(getDefaultNamespace(svg)) > 0
,getNodeSet(svg,"//x:svg", "x")
,getNodeSet(svg,"//svg")
),class="XMLNodeSet") %>>%
xmlApply(
function(s){
a = xmlAttrs(s)
removeAttributes(s)
xmlAttrs(s) <- a[-(1:2)]
xmlAttrs(s) <- c(
style = paste0(
#"height:100%;width:100%;"
"display: inline-block;"
#post says use these but will not fit viewer
#,"position: absolute;"
#,"top: 0;"
#,"left: 0;"
)
#,preserveAspectRatio="xMidYMid meet"
)
}
)
) %>>%
saveXML %>>%
HTML
) %>>%
( ~ if(print) html_print(.) ) %>>%
( return( invisible( . ) ) )

Let’s test our fancy new function.





R_svg_2


 


That’s more like it.  Let’s abandon the animated GIFs and embed a SVG below (copied/pasted straight from R into this post).  Resize your browser and test the result.


Actually, it appears I lied. The SVG does not resize like it would outside the Blogger container.  See http://bl.ocks.org/timelyportfolio/raw/560e50e437d4bb1b9142/ for the SVG in a standalone document for resizing.

# example using dotchart documentation
# in R graphics package
# ?graphics::dotchart
svgPlot(
dotchart(
t(VADeaths)
, xlim = c(0,100)
, main = "Death Rates in Virginia - 1940"
)
) %>>%
respXML