Using Geany for programming in R

I like Geany as a no-nonsense Integrated Development Environment (IDE). It is fast, elegant, intuitive, and lets you get your programming job done. (I certainly find it superior to the more popular Gedit.) You can also use it to program in R, and this page shares some tips for doing that.

Execute commands in an R session

To send R commands from the editor to the integrated Virtual Terminal Emulator (VTE), you need Geany >= 0.19. Set send_selection_unsafe=true in geany.conf, then assign “Send selection to terminal” to a keybinding such as ctrl+r or ctrl+enter via Edit > Preferences > Keybindings. As long as your Geany installation supports the embedded VTE (to my knowledge it is currently NOT supported on Windows), you’re good to go. Start R in the terminal, write some R code in Geany, and send the current line or selection to the terminal with the assigned keybinding.

Be careful, though. The hidden option send_selection_unsafe is named that way, and is disabled by default, for good reason. When set to true, Geany does not strip trailing newline characters from the selection and even adds one if none is present, so whatever you send is executed immediately. If no R session is running, you risk sending stupid commands (e.g. rm -rf your_preferred_folder) straight to the shell for execution. Again, be careful with that.

Improving the R parser in Geany

The R lexer should consider the "." (dot) as part of an object name (as it does for the "_" underscore). By default it doesn’t. To change that, you can proceed as follows (assuming Linux, but it should be very similar on other platforms).

Copy /usr/share/geany/filetypes.r to ~/.config/geany/filedefs/. Then in ~/.config/geany/filedefs/filetypes.r uncomment the wordchars element and add a "." (dot) in that list. Your config file should thus contain the following line:
wordchars=_.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

Now when you double-click or ctrl+left/right on an object name (say, make.names() or my_uber.cool_fun()), Geany should select or skip the entire word.
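
As a quick reminder of why the dot matters, here is a small R sketch. The function is the made-up example name from above, not something that exists anywhere; the point is only that dots are ordinary characters in R object names, and that base R itself relies on them.

```r
# Dots are legal inside R object names, just like underscores
my_uber.cool_fun <- function(x) x + 1   # hypothetical function, for illustration only
my_uber.cool_fun(1)

# make.names() even inserts dots to turn arbitrary strings into valid names
make.names(c("a b", "1x"))
```

An editor that splits such names at every dot makes double-click selection and word-wise navigation needlessly painful in R code.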

UPDATE: You need Geany >=1.23 for this to work. Otherwise, check the Comments section.

Combining Geany with RStudio

As good an IDE as Geany is, it is not so well suited for working with R. It comes with no integrated graphics device, help system, or object browser, so Geany is a good choice only if you’re OK with working in a bare-bones terminal. In this sense RStudio is superior for actually performing statistical analyses in R, as it has many R-specific features that make analyzing data that much easier. Then again, RStudio is not a full-fledged IDE and lacks certain functionality for heavy code lifting.

So one thing that you can do is to use both IDEs at the same time on a given file. My workflow is as follows:
– I open a_given_file.R in both Geany and RStudio and regularly save any modifications.
– When I analyze data in RStudio, save a_given_file.R and switch to Geany, I am usually greeted with a “The file ‘a_given_file.R’ on the disk is more recent than the current buffer. Do you want to reload it?” pop-up dialogue. I choose Reload.
– When I do some heavy code lifting in Geany, save a_given_file.R and switch to RStudio, the latest version of the file is usually reloaded from disk automatically.

Be careful: this approach works if you get used to it and try to avoid careless errors that could result in data loss. Otherwise, always save and close the file before switching to the other IDE.

Feedback

If readers have other useful tips on using Geany for R programming, please leave your thoughts in the Comments section.


Fama-MacBeth and Cluster-Robust (by Firm and Time) Standard Errors in R

Ever wondered how to estimate Fama-MacBeth or cluster-robust standard errors in R? It can actually be very easy.

First, for some background information read Kevin Goulding’s blog post, Mitchell Petersen’s programming advice, Mahmood Arai’s paper/note and code (there is an earlier version of the code with some more comments in it). For more formal references you may want to look into Thompson (2011, JFE) and Petersen (2008, WP). Both papers focus on estimating robust SE using Stata.

After extensively discussing this with Giovanni Millo, co-author of 'plm', it turns out that released R packages ('plm', 'lmtest', 'sandwich') can readily estimate clustered SEs. The results are not exactly the same as the Stata output, since in 'plm' the options 'HC0' through 'HC4' for 'vcovHC()' do not use the exact same weighting (by a function of sample size) that Stata uses for its small-sample correction. But the results are reasonably close when using 'HC1'.
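
To get a feeling for the size of that difference, here is a rough sketch of the two scaling factors. It assumes that 'HC1' scales the variance by n/(n-k) and that Stata's clustered variance is scaled by G/(G-1) * (n-1)/(n-k), with n observations, k coefficients and G clusters. The helper names are made up, and the panel dimensions (500 firms over 10 years) are what I believe the test data below contain, so double-check both.

```r
# Hypothetical helpers: multiplicative corrections applied to the variance estimate
hc1_factor   <- function(n, k) n / (n - k)
stata_factor <- function(n, k, g) (g / (g - 1)) * ((n - 1) / (n - k))

n <- 5000   # assumed: 500 firms x 10 years
k <- 2      # intercept and slope

# Ratio of Stata-style to HC1-style standard errors (square root of the variance ratio)
se_ratio_firm <- sqrt(stata_factor(n, k, g = 500) / hc1_factor(n, k))   # clustering by firm
se_ratio_year <- sqrt(stata_factor(n, k, g = 10)  / hc1_factor(n, k))   # clustering by year
c(firm = se_ratio_firm, year = se_ratio_year)
```

Under these assumptions the gap is negligible with 500 firm clusters but around 5% with only 10 year clusters. If I am not mistaken, more recent 'plm' releases also document a type='sss' option for vcovHC() that is meant to mimic Stata's correction; check ?vcovHC for your installed version.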

It should be easy to (almost exactly) replicate M. Petersen’s benchmark results using the following code.

Import M. Petersen’s test data.

library(foreign)  ##read.dta() for Stata files
library(plm)      ##plm(), pmg() and vcovHC() for panel models
library(lmtest)   ##coeftest()
test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta")

Estimate the linear model by pooled OLS. The second call estimates the Fama-MacBeth regression.

fpm <- plm(y ~ x, test, model='pooling', index=c('firmid', 'year')) ##Pooled OLS
fpmg <- pmg(y ~ x, test, index=c('year', 'firmid')) ##Fama-MacBeth (note the swapped indices)

Define a function that estimates robust SEs with double-clustering.

##Double-clustering formula (Thompson, 2011)
vcovDC <- function(x, ...){
    vcovHC(x, cluster="group", ...) + vcovHC(x, cluster="time", ...) - 
        vcovHC(x, method="white1", ...)
}
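
To see what the formula above does under the hood, here is a minimal base-R sketch of the same "firm + time - White" idea. It uses a small simulated panel (not Petersen's data), applies no small-sample correction, and the helper cluster_vcov() is a made-up name. Treat it as an illustration of the arithmetic, not as a replacement for vcovHC().

```r
# Simulated panel: 50 hypothetical firms over 10 years
set.seed(42)
d <- expand.grid(firmid = 1:50, year = 1:10)
d$x <- rnorm(nrow(d))
d$y <- d$x + rnorm(nrow(d))

fit   <- lm(y ~ x, data = d)
X     <- model.matrix(fit)
u     <- residuals(fit)
bread <- solve(crossprod(X))   # (X'X)^-1

# Sandwich estimator with the scores X * u summed within each cluster
cluster_vcov <- function(cluster) {
    scores <- rowsum(X * u, cluster)   # one row of summed scores per cluster
    bread %*% crossprod(scores) %*% bread
}

v_firm  <- cluster_vcov(d$firmid)
v_time  <- cluster_vcov(d$year)
v_white <- cluster_vcov(seq_len(nrow(d)))   # every observation is its own cluster

# Double-clustered covariance matrix and the implied standard errors
v_dc <- v_firm + v_time - v_white
sqrt(diag(v_dc))
```

Subtracting the White matrix removes the observation-level terms that would otherwise be counted twice, once in each clustered matrix.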

Estimate OLS standard errors, White standard errors, standard errors clustered by group, by time, and by group and time. Compare the R output with M. Petersen’s benchmark results from Stata.

> ##OLS, White and clustering
> coeftest(fpm)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.028359  1.0466   0.2954    
x           1.034833   0.028583 36.2041   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovHC(x, method="white1", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.028361  1.0465   0.2954    
x           1.034833   0.028395 36.4440   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="time", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.022189  1.3376   0.1811    
x           1.034833   0.031679 32.6666   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovDC(x, type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.064580  0.4596   0.6458    
x           1.034833   0.052465 19.7243   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

As Giovanni interestingly pointed out to me (in a privately circulated draft paper), it seems that the Fama-MacBeth estimator is nothing more than what econometricians call the Mean Groups estimator, and 'plm' can readily estimate this. You only need to swap the 'group' and 'time' indices. (See pmg() call above.)

> ##Fama-MacBeth
> coeftest(fpmg)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.031278   0.023356  1.3392   0.1806    
x           1.035586   0.033342 31.0599   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
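
The mean-groups reading of Fama-MacBeth can also be sketched by hand in base R: run one cross-sectional regression per year, average the coefficients, and take the standard deviation of the yearly estimates divided by the square root of the number of years as the standard error. The data below are simulated (not Petersen's), and I have not checked that this reproduces pmg() to the last digit, so take it as an illustration of the idea.

```r
# Simulated panel: 50 hypothetical firms over 10 years, true slope of 1
set.seed(1)
panel <- expand.grid(firmid = 1:50, year = 1:10)
panel$x <- rnorm(nrow(panel))
panel$y <- 1 * panel$x + rnorm(nrow(panel))

# Step 1: one cross-sectional OLS regression per year
by_year <- split(panel, panel$year)
coefs <- t(sapply(by_year, function(d) coef(lm(y ~ x, data = d))))

# Step 2: average the yearly coefficients; the standard error is the
# standard deviation of the yearly estimates divided by sqrt(number of years)
fm_est <- colMeans(coefs)
fm_se  <- apply(coefs, 2, sd) / sqrt(nrow(coefs))

cbind(Estimate = fm_est, "Std. Error" = fm_se)
```

Here "year" plays the role of the group, which is why the pmg() call above swaps the two indices.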

As a last remark, it may be a good idea to introduce a type='HC5', implementing the exact Stata small-sample correction procedure, to allow users to benchmark R output against Stata results.