SpiderOak Generosity (Take 2)

I’ve been beating the drum for SpiderOak and its generous GB give-aways in the past, and I’m doing so again now.

For those not in the know, SpiderOak is a decent alternative to Dropbox. It provides several useful features (some not present in Dropbox):

  • SpiderOak Hive: A Dropbox-like folder
  • Historical versions: All historical versions of a file are available indefinitely. (With a free Dropbox account, historic versions are available for 30 days only.)
  • Strong security model: SO encrypts the files on your computer and doesn’t know your password. Thus it stores on its servers only long strings of encrypted bits that it cannot decrypt (and neither can the government or the NSA). The downside is that if you lose your password, you lose your data. (This is unlike Dropbox, which sends the data via an encrypted connection but is itself able to read the files stored on its servers.)

SpiderOak is now running another promotional event in which it generously gives away GBs of free storage. It is called SpiderOak University and features a series of 10 quizzes. Each completed quiz earns you 1GB of additional free storage; if you answer all the questions in a quiz correctly, you get 2GB instead. Do the math: you can earn anywhere from 10 to 20 GB just by spending a couple of minutes filling out the quizzes. If you want to get all the quizzes right, follow the video classes available on their website. I don’t know whether SpiderOak University has a time limit.

You may also get an additional 1GB (and offer me another 1GB in the process) if you create your SO account using my referral link. This is part of the traditional SpiderOak Refer-A-Friend program.


Installing the Optima font for use with LyX

Hermann Zapf is a legendary typeface designer, and one of his best-known works, Palatino, remains a very popular serif font throughout the world of LaTeX (and LyX). In LyX 2.1 (forthcoming) one can easily use Palatino in a document, either by selecting ‘Palatino’, which will usually use a free clone called URW Palladio, or by selecting TeX Gyre Pagella, another clone based on URW Palladio but with a wider selection of characters.

The Optima typeface is a lesser-known font by Zapf. To my non-expert eyes it closely resembles the Palatino shapes, although it doesn’t have serifs. Hence I tend to think of this sans serif font as a “Palatino Sans”, and I often use it as a matching sans font for Palatino. I also find Optima useful for Beamer presentations.

In TeX Live one needs to install Optima manually because of license issues (the clone is free as in beer, but not free as in speech). But TL provides a very easy installation procedure, especially on Linux (not sure about other platforms). As root:

getnonfreefonts-sys -h          # show the help text
getnonfreefonts-sys -l          # list the fonts that can be installed
getnonfreefonts-sys classico    # install URW Classico

You will actually be installing URW Classico, an Optima clone. Now run Tools > Reconfigure in LyX and restart it. Afterwards, Document > Settings > Fonts > Sans should offer the 'URW Classico (Optima)' entry.
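To check from the command line whether the font files landed where TeX can see them, one option is to ask TeX's file lookup tool for the font's map file. This is only a sketch: it assumes that kpsewhich is on your PATH and that the map file is called uop.map (the name used in the troubleshooting step below). The function name check_classico is made up for this example.

```shell
#!/bin/sh
# check_classico: hypothetical helper, not part of TeX Live.
# Assumes the URW Classico map file is named uop.map.
check_classico() {
    if ! command -v kpsewhich >/dev/null 2>&1; then
        echo "kpsewhich not found (is TeX Live on your PATH?)"
        return 2
    fi
    # kpsewhich prints the full path of a file TeX can locate
    # and exits non-zero when it cannot find it.
    if kpsewhich uop.map >/dev/null 2>&1; then
        echo "uop.map found"
    else
        echo "uop.map not found"
        return 1
    fi
}
```

Call check_classico after the installation. Note that finding the file only tells you that the font files are installed, not that the map is enabled.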

Troubleshooting

If you encounter the "!pdfTeX error: pdflatex (file uopr8r): Font uopr8r at 720 not found" error message, chances are that something fishy happened while installing the font. As suggested in this forum thread for Ubuntu users, to fix this you can (as root):

updmap-sys --enable Map=uop.map

If the above doesn’t help, then create a new file /etc/texmf/updmap.d/10local.cfg with

Map uop.map

as contents. Then (as root):

update-updmap
updmap-sys
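The file-creation step can also be done from the shell. The helper below is a minimal sketch: write_local_map_cfg is a made-up name, and the directory argument exists only so that you can try it on a scratch directory first. By default it writes to the path named above.

```shell
#!/bin/sh
# write_local_map_cfg: hypothetical helper for the step described above.
# Usage (as root): write_local_map_cfg && update-updmap && updmap-sys
write_local_map_cfg() {
    cfg_dir=${1:-/etc/texmf/updmap.d}
    # Refuse to guess if the directory is missing on this system.
    [ -d "$cfg_dir" ] || { echo "no such directory: $cfg_dir" >&2; return 1; }
    # Overwrites an existing 10local.cfg in that directory.
    printf 'Map uop.map\n' > "$cfg_dir/10local.cfg"
}
```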

Update (05/10/2013): Include troubleshooting section.


Using Geany for programming in R

I like Geany as a no-nonsense Integrated Development Environment (IDE). It is fast, elegant, intuitive, and lets you get your programming job done. (I certainly find it superior to the more popular Gedit.) You can also use it to program in R, and this post shares some tips for doing that.

Execute commands in an R session

To send R commands from the editor to the integrated Virtual Terminal Emulator (VTE), you need Geany >= 0.19. Set send_selection_unsafe=true in geany.conf and assign “Send selection to terminal” to a ctrl+r or ctrl+enter keybinding (or similar) via Edit > Preferences > Keybindings. As long as your Geany installation has support for the embedded VTE (to my knowledge it is currently NOT supported on Windows), you’re good to go. Start R in the terminal, write some R code in Geany and send the line or selection to the terminal with the assigned keybinding.

Be careful, though. The hidden option send_selection_unsafe is named that way, and disabled by default, for good reason. If set to true, Geany does not strip trailing newline characters from the selection and even adds one if none is present, so whatever you send is executed immediately. When no R session is running, the text goes straight to the shell, and you risk executing dangerous commands (e.g. rm -rf your_preferred_folder). Again, be careful with that.
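If you are not sure whether the option is currently switched on, a read-only check like the one below may help. It is a sketch that assumes geany.conf lives in ~/.config/geany (the usual location on Linux). The function name show_unsafe_setting is made up for this example.

```shell
#!/bin/sh
# show_unsafe_setting: hypothetical helper; it only reads the file.
# The default path is an assumption; pass another path as first argument.
show_unsafe_setting() {
    conf=${1:-$HOME/.config/geany/geany.conf}
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    # Print the setting if present, otherwise say that it is absent.
    grep '^send_selection_unsafe=' "$conf" || echo "send_selection_unsafe is not set"
}
```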

Improving the R parser in Geany

The R lexer should consider the "." (dot) as part of an object name (as it does for the "_" underscore). By default it doesn’t. To change that, you can proceed as follows (assuming Linux, but it should be very similar on other platforms).

Copy /usr/share/geany/filetypes.r to ~/.config/geany/filedefs/. Then in ~/.config/geany/filedefs/filetypes.r uncomment the wordchars element and add a "." (dot) in that list. Your config file should thus contain the following line:
wordchars=_.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
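The copy-and-edit step can also be scripted. The sketch below assumes the two paths given above and that the stock file contains a commented-out wordchars line whose list starts with an underscore, as in the line shown. The function name setup_r_wordchars is made up for this example. Note that it overwrites any filetypes.r you already have in the destination directory.

```shell
#!/bin/sh
# setup_r_wordchars: hypothetical helper. Both paths default to the ones
# named above and can be overridden for a dry run.
setup_r_wordchars() {
    src=${1:-/usr/share/geany/filetypes.r}
    dest_dir=${2:-$HOME/.config/geany/filedefs}
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    # 1st expression: uncomment the wordchars line.
    # 2nd expression: add a dot after the leading underscore,
    #                 unless a dot is already there.
    sed -e 's/^#[[:space:]]*wordchars=/wordchars=/' \
        -e 's/^wordchars=_\([^.]\)/wordchars=_.\1/' \
        "$src" > "$dest_dir/filetypes.r"
}
```

Run setup_r_wordchars without arguments to use the default paths, then restart Geany.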

Now when you double-click or ctrl+left/right on an object name (say, make.names() or my_uber.cool_fun()), Geany should select or skip the entire word.

UPDATE: You need Geany >=1.23 for this to work. Otherwise, check the Comments section.

Combining Geany with RStudio

As good an IDE as Geany is, it is not so well suited for working with R. It comes with no integrated graphics device, help system, or object browser. So Geany is a good choice if you’re OK with working in a bare-bones terminal. In this sense RStudio is superior for actually performing statistical analyses in R, as it has many R-specific features that make analyzing data that much easier. Then again, RStudio is not a full-fledged IDE and lacks certain functionality for heavy code lifting.

So one thing that you can do is to use both IDEs at the same time on a given file. My workflow is as follows:
– I open a_given_file.R in both Geany and RStudio and regularly save any modifications.
– When I analyze data in RStudio, save a_given_file.R and switch to Geany, I am usually greeted with a “The file ‘untitled.R’ on the disk is more recent than the current buffer. Do you want to reload it?” pop-up dialogue. I choose Reload.
– When I do some heavy code lifting in Geany, save a_given_file.R and switch to RStudio, RStudio usually reloads the latest version of the file from disk automatically.

Be careful: this approach works only if you get used to it and avoid careless errors that could result in data loss. If in doubt, always save and close the file before switching to the other IDE.

Feedback

If readers have other useful tips on using Geany for R programming, please leave your thoughts in the Comments section.


SpiderOak generosity: Making an offer you can’t refuse

I must admit that this is an impressive PR move by SpiderOak, a file syncing service. Here’s the gist of it:
“We provide a FREE 2 GB account to all our users who sign up. Starting TODAY until the end of the week, we would like to offer all our users an additional 4 GBs of storage!”

So, if you have no SpiderOak account, sign up between 3 and 7 Jul 2012 and follow the easy steps in the announcement post, you get a whopping 6GB of free storage space. If you are an existing user, follow the steps and get an additional 4GB of storage. Take that, Dropbox!

On top of that you may, I assume, get an additional 1GB (and offer me another 1GB in the process, as if my current 7GB weren’t enough) if you create your account using my referral link. This is part of the traditional SpiderOak Refer-A-Friend program.

All in all, very generous. For those not sure what SpiderOak is: it is yet another file syncing service similar to Dropbox, slightly less simple to use but still very usable. And while all your data are easily accessible to Dropbox employees (or potentially to other 3rd parties), SpiderOak takes a hardcore approach to privacy: they cannot decrypt your data and don’t know your password. For a discussion of the differences between the two approaches, check this post by Babbage at The Economist.

Happy file syncing!


Fama-MacBeth and Cluster-Robust (by Firm and Time) Standard Errors in R

Ever wondered how to estimate Fama-MacBeth or cluster-robust standard errors in R? It can actually be very easy.

First, for some background information read Kevin Goulding’s blog post, Mitchell Petersen’s programming advice, Mahmood Arai’s paper/note and code (there is an earlier version of the code with some more comments in it). For more formal references you may want to look into Thompson (2011, JFE) and Petersen (2008, WP). Both papers focus on estimating robust SE using Stata.

After extensively discussing this with Giovanni Millo, co-author of 'plm', it turns out that released R packages ('plm', 'lmtest', 'sandwich') can readily estimate clustered SEs. The results are not exactly the same as the Stata output, since in 'plm' the options 'HC0' through 'HC4' for 'vcovHC()' do not use the exact same weighting (by a function of sample size) that Stata uses for small-sample correction. But the results are reasonably close when using 'HC1'.

It should be easy to (almost exactly) replicate M. Petersen’s benchmark results using the following code.

Import M. Petersen’s test data.

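##Load the packages; library() stops with an error if one is missing
library(foreign)  ##read.dta()
library(plm)      ##plm(), pmg(), vcovHC()
library(lmtest)   ##coeftest()
test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta")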

Estimate linear model using OLS. The second call estimates the Fama-MacBeth regression.

fpm <- plm(y ~ x, test, model='pooling', index=c('firmid', 'year'))
fpmg <- pmg(y~x, test, index=c("year","firmid")) ##Fama-MacBeth

Define a function that would estimate robust SE with double-clustering.

##Double-clustering formula (Thompson, 2011)
vcovDC <- function(x, ...){
    vcovHC(x, cluster="group", ...) + vcovHC(x, cluster="time", ...) - 
        vcovHC(x, method="white1", ...)
}

Estimate OLS standard errors, White standard errors, standard errors clustered by group, by time, and by group and time. Compare the R output with M. Petersen’s benchmark results from Stata.

> ##OLS, White and clustering
> coeftest(fpm)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.028359  1.0466   0.2954    
x           1.034833   0.028583 36.2041   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovHC(x, method="white1", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.028361  1.0465   0.2954    
x           1.034833   0.028395 36.4440   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="time", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.022189  1.3376   0.1811    
x           1.034833   0.031679 32.6666   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovDC(x, type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.064580  0.4596   0.6458    
x           1.034833   0.052465 19.7243   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

As Giovanni interestingly pointed out to me (in a privately circulated draft paper), it seems that the Fama-MacBeth estimator is nothing more than what econometricians call the Mean Groups estimator, and 'plm' can readily estimate this. You only need to swap the 'group' and 'time' indices. (See pmg() call above.)

> ##Fama-MacBeth
> coeftest(fpmg)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.031278   0.023356  1.3392   0.1806    
x           1.035586   0.033342 31.0599   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

As a last remark, it may be a good idea to introduce a type='HC5', implementing the exact Stata small-sample correction procedure, to allow users to benchmark R output against Stata results.


Let us bow to social pressure

In the world of Facebook, Twitter and LinkedIn, it is almost rude to not have a personal blog. So here I go, bowing to social pressure and writing my first blog post.

In this blog I intend to post various tips and miscellaneous tricks regarding Linux, Xfce, R and LyX. I will also add random thoughts and rants of the day concerning my day-to-day experiences.