Tuesday, May 27, 2014

An R detour to where I started: trouble with versions and Rtools


(I did not expect the first post to be about technical detours, but I figure this experience is worth documenting for R users who need to deal with different package versions. If you would like to skip the details of the detour, just scroll to the bottom for a constructive summary.)

The detour
One annoying part of doing quantitative analysis is that you can waste a lot of time on trivial technical issues just to end up where you were. What often makes it worse is failing to spot the real culprit in the first place and going on to solve the wrong problems.

Much of my Memorial Day was spent dealing with some unexpected trouble with my lmer models in R. I have been analyzing a couple dozen different lmer models over the last few weeks. I would love to do these mixed models in Stata, but my models have two crossed random effects, each with thousands of groups. For tasks like this, lmer is much more practical than xtmixed's computationally demanding ||_all: R.factor workaround.

All the trouble started after I installed the lmerTest package the night before. I had hoped it would be useful in estimating the standard errors for the random effect parameters, which xtmixed provides but lmer does not (warning: lmerTest did not help). Then all of a sudden, my lmer code that had previously worked began returning error messages that said something about the CONTROL options. At that moment, I didn't think about why this happened (which I should have) and simply dropped the CONTROL options from my lmer models. This adjustment allowed the models to run again, but the output looked different from before. For example, the fixed effect point estimates were now returned in a single row rather than multiple rows, and no standard errors were provided.

This was the second warning sign telling me to figure out the problem. But instead of slowing down and analyzing why the output was different, I dove right into the problem and spent two hours figuring out how to extract the fixed effects and random effects by writing additional functions. The fumbling was certainly helpful in improving my R knowledge--I got to learn more about S4 objects and figured out how to calculate standard errors from the matrices. Still, these results took quite a bit of work and were poorly formatted. It was at this moment that I started to suspect my lmer function might have been altered. After some searching and trial-and-error, including updating everything to the newest version of R just to find out that was not the problem, I concluded that installing the lmerTest package had updated my lme4 to a newer, and substantially different, version.

It turns out R users do speak about an old and a new lme4. There was apparently an overhaul of the package after version 1.0 that is still under development. Someone even created a separate package called lme4.0 to compare results from the old and new versions, which can yield different estimates.
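
Since old and new lme4 behave differently, it can help to check which side of the 1.0 overhaul a version falls on before running old code. Here is a minimal sketch of that check. The helper name is_pre_overhaul and the 1.0 cutoff argument are my own illustration, not part of lme4; in a live session, packageVersion("lme4") would supply the installed version.

```r
# Hypothetical helper (not part of lme4): TRUE if a version string
# predates the 1.0 overhaul.
is_pre_overhaul <- function(version, cutoff = "1.0") {
  # package_version() parses strings such as "0.999375-42" into
  # objects that can be compared with the usual operators
  package_version(version) < package_version(cutoff)
}

is_pre_overhaul("0.999375-42")  # the old source file used in this post
is_pre_overhaul("1.0-4")        # an example post-overhaul version string
```

The version is passed in as a string so the check also works on a machine where lme4 is not installed at all.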

Four hours into this problem, I finally reached the correct diagnosis. But it turned out that this was only the first step towards setting things right. My original lmer commands apply to an older lme4 package. Older versions of lme4 can only be found as source files in .tar.gz format. I could not install them directly because Win 7 does not come with a compiler (another good reason to use Mac or Linux). Fortunately, Duncan Murdoch has created Rtools to help Windows users build packages from source files.

Building R packages from source code requires at least the functionality of Rtools' own commands, MinGW (a tool to compile code on Windows; parts of it come with Rtools), the Inno Setup installer (an installer for Windows programs; separate from Rtools), and LaTeX (to create documentation). More about the Windows toolset

So I installed Rtools and Inno Setup (I already had LaTeX). I then downloaded an old source file for lme4. Everything seemed pretty straightforward. I typed install.packages("C:/Work/lme4_0.999375-42.tar.gz", repos = NULL, type="source") in my R console, expecting Rtools to do its magic. But an error message told me R could not run "make.exe" and "sh.exe". Both commands are under \Rtools\bin.

So I went back to the Rtools instructions and focused on the system PATH settings. Basically, Rtools requires the paths for Rtools, MinGW, and other related programs to be specified in Windows' environment variable settings. This page (http://www.java.com/en/download/help/path.xml) explains the standard way to access the PATH variable, but I found it easier to type "path" in the terminal to confirm that the ordering of paths is as required by Rtools (basically the path must start with Rtools, immediately followed by MinGW).

This is the quote from Rtools.txt. Note that there are no spaces between the paths and the semicolons.
"Finally, the Rtools installer will optionally edit your PATH variable as follows:
PATH=c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;<others>...
(where you will substitute appropriate directories for the ones listed above, but please keep the path in the same order as shown. LaTeX and R itself should be installed among the "others".)"

Nothing illustrates my point about trivial details better than the last fix. I thought I had the PATH variable set up properly like this: "C:\Rtools\bin ; C:\Rtools\gcc-4.6.3\bin ;...<other paths>". But R still told me it could not run "gcc.exe", which was under "\Rtools\gcc-4.6.3\bin". Out of desperation, I decided to remove all spaces between the paths and the semicolons. It turns out those spaces (or maybe just one of them) were the last thing preventing Rtools from finding that second path, the one for MinGW.
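
Stray spaces like these are hard to see by eye, so here is a minimal sketch of how one might check for them from within R. The helper name find_padded_entries is my own, and the example string simply reproduces the faulty layout described above; by default the function reads the real PATH through Sys.getenv().

```r
# Hypothetical helper: return the PATH entries that carry leading or
# trailing whitespace.
find_padded_entries <- function(path = Sys.getenv("PATH"), sep = ";") {
  entries <- strsplit(path, sep, fixed = TRUE)[[1]]
  trimmed <- gsub("^\\s+|\\s+$", "", entries)  # strip surrounding whitespace
  entries[entries != trimmed]                  # keep only entries that changed
}

# The faulty layout from this post, with spaces around the semicolons
bad_path <- "C:\\Rtools\\bin ; C:\\Rtools\\gcc-4.6.3\\bin ;C:\\other"
find_padded_entries(bad_path)

# Run against the live PATH of the current session
find_padded_entries()
```

An empty result means no entry has padding; anything returned is an entry worth retyping in the environment variable settings.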

So that's it. After an entire day of work, I got R back to the state it was in before lmerTest messed everything up. I missed two warning signs and wasted a couple of hours taking a long detour. Even after I finally figured out what the problem was, a tiny extra space still got in the way. That's all in a good day's work.

--
Summary
For those with problems related to newer packages, I organized some of the things that I found helpful. I am using lme4 as an example, so substitute it with the name of your own package.

To install and uninstall packages, use
install.packages("lme4")
remove.packages("lme4")  # add lib = "<library path>" if the package is not in the default library

To load or unload packages, use
require(lme4)
library(lme4)
detach("package:lme4", unload = TRUE)

To check installed packages and their versions, use
installed.packages()

Alternatively, the loaded packages can be verified with
sessionInfo()

To check if a specific package is installed, use
is.element("lme4", installed.packages()[,1])
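
Because the whole detour came down to which version was installed, a combined check can be handy. This is a minimal sketch, and the helper name installed_version is my own. It returns the installed version as a string, or NA if the package is missing.

```r
# Hypothetical helper: installed version of a package as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  # system.file() returns "" when the package cannot be found in any library
  if (!nzchar(system.file(package = pkg))) return(NA_character_)
  as.character(packageVersion(pkg))
}

installed_version("lme4")
installed_version("stats")  # ships with R, so a version is always returned
```

Running it before and after installing another package would show whether a dependency such as lme4 was silently upgraded.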


Rtools (http://cran.r-project.org/bin/windows/Rtools/)

  1. Check the reference table on the page to make sure you choose the version of Rtools that corresponds to your version of R.
  2. Install Rtools
  3. Install Inno Setup (http://www.jrsoftware.org/isdl.php) and MiKTeX (http://miktex.org/) if you haven't.
  4. Check that the PATH variable in your Windows OS is set up correctly. There are different ways to do this, but I recommend first using the terminal window (hit Windows key+R -> type "cmd" and hit return -> type "path" and hit return). If the Rtools installer did not alter the PATH to something like "c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;<other paths>" (it has to be in that exact order, with no spaces around the semicolons), then edit the environment variables to make it so. Here is a visual guide (http://www.computerhope.com/issues/ch000549.htm) on how to edit the path.
  5. Download the source file for a package. 
  6. Open your R and type in the command to install the package. For example, I entered: install.packages("C:/Work/lme4_0.999375-42.tar.gz", repos = NULL, type="source")
  7. The package is properly installed if no error is returned. If you would like to check, use "installed.packages()" to see if it is there.
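
The PATH ordering in step 4 can also be checked from within R before attempting step 6. This is a minimal sketch under stated assumptions: the helper name rtools_paths_first is my own, and the two default directories are the ones quoted from Rtools.txt above, so substitute your own install locations.

```r
# Hypothetical helper: TRUE if PATH starts with the Rtools bin directory,
# immediately followed by the compiler directory.
# Both defaults are assumptions taken from the Rtools.txt quote above.
rtools_paths_first <- function(path = Sys.getenv("PATH"),
                               rtools_bin = "c:\\Rtools\\bin",
                               gcc_bin = "c:\\Rtools\\gcc-4.6.3\\bin") {
  entries <- strsplit(path, ";", fixed = TRUE)[[1]]
  # Windows paths are case-insensitive, so compare in lower case
  length(entries) >= 2 &&
    identical(tolower(entries[1:2]), tolower(c(rtools_bin, gcc_bin)))
}

rtools_paths_first("c:\\Rtools\\bin;c:\\Rtools\\gcc-4.6.3\\bin;C:\\Windows")
rtools_paths_first("C:\\Windows;c:\\Rtools\\bin;c:\\Rtools\\gcc-4.6.3\\bin")

# On a live system, Sys.which() shows whether R can locate the build tools;
# an empty string means the tool was not found on the PATH
Sys.which(c("make", "sh", "gcc"))
```

The comparison is deliberately strict: an entry padded with a space does not match, which is the failure described in the detour.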