Monday, July 21, 2014

Cool graph on global migration flows and more...

Just a few links, mostly on migration:

  • Global migration in two decades: This study has been out for a while, but if you have not seen this nice graphic presentation on global migration flows by a group of Austrian geographers/demographers, here is the link. There is so much to talk about in this graph that it deserves its own post. What first struck me was how little within-region migration East Asia and South Asia have. Then I quickly realized that the massive rural-to-urban migration in China and India is probably not counted, as those moves are domestic. Apparently there are always limitations, even to some of the most impressive data.
  • Ecuador's "free" border: A report by The Atlantic on the new immigration policy in Ecuador. A bold (politically calculated?) initiative to drop visa requirements has interesting implications. What kinds of migrants does this new policy draw? Asylum seekers, human traffickers, and drug cartels?
  • Central American migrants: I read Joe Klein's well-written op-ed in Time magazine over the weekend on the recent influx of Central American migrants. It's a call for bold political leadership, which I am not sure is coming in this case. On the same subject, The New York Times has a special report. I love how these works explore the complexity behind this issue.
  • School segregation: On the educational front, PBS recently aired the new Frontline documentary Separate and Unequal. It discusses the issue of persistent school segregation by focusing on a political campaign to carve out part of Baton Rouge and incorporate a new town. This type of movement has been going on for decades, and here is a documentary that would be useful for teaching undergraduate students.

Sunday, June 22, 2014

Immigration in the Heartland

Blogging has slowed down with other summer plans, and I probably won't post anything at length in the next few days as I put the final touches on a research proposal and update the qv package for Stata.

Meanwhile, some really nice journalism by Damien Cave and Todd Heisler from the New York Times. They have been traveling along Interstate 35 and reporting on the evolving relationship between immigrants and Heartland America. Their journey and all of the updates can be traced on Twitter at #thewaynorth. The most recent report, titled "Living with Immigration", documents the gradual, sometimes reluctant, acceptance of immigrants in many communities.

What I love about this type of work is how it gives a face to the surge of U.S. immigrants in non-traditional places. Academic research on this phenomenon, often referred to as the "new destination" literature, has been fruitful. But since I like to mix in non-academic sources when teaching introductory courses, I always appreciate high-quality journalism on social issues.

Monday, June 9, 2014

Brain drain and adaptation of Taiwanese immigrants in the U.S.

I happen to be re-reading Hsiang-Shui Chen's Chinatown No More: Taiwan Immigrants in Contemporary New York (1992, Cornell University Press) this week. In studying various types of Taiwanese immigrants in New York City during the late '80s, Chen concluded that what appeared to be a homogeneous Chinese enclave on the surface had diverse compositions. The lives of Taiwanese immigrants, in particular, had little interaction with Manhattan's Chinatown and its associated immigrant community.

Looking back at his conclusion two decades later, heterogeneity in immigrant communities is hardly news to those of us who study international migration now. Still, what intrigues me most about this book is its discussion of the scale of the brain drain and of immigrants' adaptation to U.S. society.

Brain Drain
On the first subject, I have to start with this passage on page 129.

"In 1974, some 22,366 college graduates were produced in Taiwan. Of these, 2,285 went abroad to study, and only 486 returned after fishing their studies. Through 1986 [sic], an average of 4,632 Taiwan students went abroad each year, but only 793 came back (World Journal, January 26, 1986)..."

Let me put these numbers in perspective: about one-tenth of the college graduates in Taiwan went abroad during the '70s and '80s, with probably 90% of them studying at U.S. institutions. These were not average young people in Taiwan; they came from the more competitive and motivated segment of their cohort. And over three-quarters of them stayed in the States after completing their studies. Even for someone like me who lived through part of this history, it is still striking to learn the actual scale of the exodus.

On the sending end, this has to have depleted Taiwan of its human capital and wasted the educational investment made in these individuals. Common knowledge in Taiwan was that these young people came primarily from the island's best universities. The exodus would be analogous to the U.S. losing half of its Ivy League graduates for good, or Britain sending away half of those trained at Cambridge and Oxford.

There are also implications on the receiving end. While a few thousand immigrants every year is a drop in the bucket for the entire U.S., the aggregate impact is likely substantial at the ethnic-group level. Take a conservative estimate of 2,000 per year for two decades, which yields 40,000 immigrants with post-graduate degrees. To put this number in context, the U.S. Census shows that the total Taiwan-born population in the States was 75,353 in 1980 and 244,102 in 1990 (more than tripling in a decade!). Even using the 1990 number, the Taiwanese American group would still have one-sixth of its population holding advanced degrees earned in the U.S. And this has not yet factored in the associated chain-migration effect that brings along family members who are also likely to be more educated than average, which could further bolster the overall educational level. In fact, Portes and Rumbaut (2001:83) used the 2000 U.S. Census and estimated that 66.7% of the Taiwan-born population in the States have college degrees (second to India's 69.1%). That is much higher than the 35%-45% for the U.S.-born.
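As a quick back-of-the-envelope check in R (the inputs are just the estimates quoted above, not new data):

stayers <- 2000 * 20    # conservative: 2,000 stayers a year for two decades
stayers                 # 40,000 with U.S. post-graduate degrees
stayers / 244102        # about 0.16, i.e., one-sixth of the 1990 Taiwan-born population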



Adaptation
But these students did not necessarily fare well, at least according to Chen's account. Some of his interviewees with advanced degrees struggled to find or hold on to a professional job. Two of the three professionals whose stories Chen detailed had held non-professional jobs, such as clerical work and self-employment, before finding stable work in the public sector--one at the IRS and the other in municipal government. Their advanced training evidently did not help much in transitioning them into the job market.

Language and cultural differences also appeared to play a part, as both interviewees remained distanced from English-speaking social circles. While bouncing between jobs, their job searches depended largely on relatives and social ties in the ethnic community.

The role of ethnic communities is indeed intriguing. The once heated enclave-economy debate in sociology (e.g., Model 1992; Portes and Jensen 1987; Sanders and Nee 1987; Zhou and Logan 1989) revolved around the issue of whether ethnic sectors of the market provide better returns to human capital. While the debate examined the question from numerous angles, one overlooked aspect is the potentially differentiated effects across skill levels. In Chen's account of Taiwanese immigrants, the less skilled pretty much viewed the ethnic sector as their main option for employment. Even when occasionally venturing outside the sector, they still mostly adhered to paths carved out by predecessors from the ethnic community.

Career options for the U.S.-educated, however, appeared to have a different composition. The enclave economy served as a fallback option where they could seek self-employment or low-level white-collar jobs. They were rarely satisfied, however, with the work environment, which resembled a secondary labor market. Meanwhile, their U.S.-earned credentials offered them the flexibility to search for office jobs in formal organizations outside the ethnic community. Work in the mainstream economy did provide better pay, reasonable hours, and job security, but language and cultural barriers appeared to limit how far these immigrants could go in such organizations.


Tuesday, June 3, 2014

A few R functions to summarize lmer results

As I am wrapping up the growth curve models at hand, here are a few R functions to share with whoever is still using lmer from the pre-1.0 versions of lme4 (I am not ready to upgrade yet--too much code would require updating). These functions were written to summarize results from different lmer models.

The first three functions separately extract the model summary statistics (lmer.stats), the fixed effect parameters (lmer.fixef), and the random effect parameters (lmer.ranef) into data frames. The last function, lmer.append, combines these results into aggregated data frames, which can then be saved as a spreadsheet using the xlsx package (see the usage sketch after the functions).

Note: the if condition in lmer.ranef needs revision to make the columns consistent if you have more than one covariance term in any of the models. Otherwise R won't be able to aggregate the data frames.

lmer.stats<-function(lmer.name) {
    A<-AIC(lmer.name)
    B<-BIC(lmer.name)
    ll<-logLik(lmer.name)
    dg<-attr(ll,"df")
    dv<-deviance(lmer.name)
    obs.TIME<-length(lmer.name@y)
    obs.CHILD<-sapply(ranef(lmer.name),nrow)[1]
    names(obs.CHILD)<-NULL
    obs.SCHOOL<-sapply(ranef(lmer.name),nrow)[2]
    names(obs.SCHOOL)<-NULL
    label<-deparse(substitute(lmer.name))    # identifier
    df<-data.frame(label, "AIC"=A, "BIC"=B, "LL"=ll, "DEV"=dv, "df"=dg, "N"=obs.TIME, "CHILD"=obs.CHILD, "SCHOOL"=obs.SCHOOL)
    df    # return the data frame explicitly
}

lmer.ranef<-function(lmer.name){
    re<-data.frame(summary(lmer.name)@REmat)
    re<-subset(re,select=-Name)
    label<-deparse(substitute(lmer.name))     # identifier
    nr<-nrow(summary(lmer.name)@REmat)
    md<-data.frame(rep(label,nr))
    colnames(md)<-"Model"

    dfr<-data.frame(cbind(md,re))

    if (ncol(dfr)==4)    {    # random slope models have more columns
        corr.col<-data.frame(rep(NA,nr))
        colnames(corr.col)<-"Corr"
        V6.col<-data.frame(rep(NA,nr))
        colnames(V6.col)<-"V6"
        dfr<-data.frame(cbind(dfr,corr.col,V6.col))
    }
    dfr    # return the data frame explicitly
}

lmer.fixef<-function(lmer.name){
    beta<-data.frame("Beta"=fixef(lmer.name))
    se<-data.frame("S.E."=sqrt(diag(vcov(lmer.name))))
    vars<-data.frame(row.names(beta))
    colnames(vars)<-"Variable"
    vars$Variable<-gsub("\\)", "", vars$Variable)    # deal with (Intercept)
    vars$Variable<-gsub("\\(", "", vars$Variable)
    label<-deparse(substitute(lmer.name))     # identifier
    md<-data.frame(rep(label,length(lmer.name@fixef)))
    colnames(md)<-"Model"
    row.names(beta)<-NULL
    dff<-data.frame(cbind(md,vars,beta,se))
    dff    # return the data frame explicitly
}

lmer.append<-function(...,append=TRUE)    {
    label<<-deparse(substitute(...))
    if (!append){    # first call: create the global data frames
        L.stats<<-lmer.stats(...)
        L.ranef<<-lmer.ranef(...)
        L.fixef<<-lmer.fixef(...)
    } else {    # later calls: append the new model's results
        L.stats<<-rbind(L.stats, lmer.stats(...))
        L.ranef<<-rbind(L.ranef, lmer.ranef(...))
        L.fixef<<-rbind(L.fixef, lmer.fixef(...))
    }
}
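
Here is a minimal usage sketch, assuming m1 and m2 are fitted lmer models (the names are hypothetical). The first call with append=FALSE initializes the global data frames; later calls append to them. The results can then be written to a spreadsheet with the xlsx package:

lmer.append(m1, append=FALSE)    # creates L.stats, L.ranef, L.fixef
lmer.append(m2)                  # adds m2's results

library(xlsx)
write.xlsx(L.stats, "lmer_results.xlsx", sheetName="stats", row.names=FALSE)
write.xlsx(L.fixef, "lmer_results.xlsx", sheetName="fixef", row.names=FALSE, append=TRUE)
write.xlsx(L.ranef, "lmer_results.xlsx", sheetName="ranef", row.names=FALSE, append=TRUE)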


Added 06/09/2014:
Someone just reminded me that the functionality of lmer.stats is similar to the base routine anova(). Say you have lmer model estimates A1, A2, and A3: anova(A1,A2,A3) returns a data frame that summarizes the degrees of freedom, AIC, BIC, log-likelihood, and the results of deviance tests between successive models. Still, that does not include the other statistics that lmer.stats provides.


Tuesday, May 27, 2014

An R detour to where I started: trouble with versions and Rtools


(I did not expect the first post to be about technical detours, but I figure this experience is worth documenting for R users who need to deal with different package versions. If you would like to skip the details of the detour, just scroll to the bottom for a constructive summary.)

The detour
One annoying part of doing quantitative analysis is that one can waste a lot of time on trivial technical issues just to end up where one started. What often makes it worse is failing to spot the real culprit in the first place and going on to solve the wrong problems.

Much of my Memorial Day was spent dealing with some unexpected trouble with lmer models in R. I have been analyzing a couple dozen different lmer models over the last few weeks. I would love to fit these mixed models in Stata, but my models have two crossed random effects, each with thousands of groups. For tasks like this, lmer is much more practical than xtmixed's computationally demanding ||_all: R.factor workaround.
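For concreteness, here is a minimal sketch of that kind of specification in the old lmer syntax (the data frame d and the variables y, x, child, and school are hypothetical placeholders):

library(lme4)    # the pre-1.0 version

# two crossed random intercepts: children are not nested within schools
m1 <- lmer(y ~ x + (1 | child) + (1 | school), data = d)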

So all the trouble started after I installed the lmerTest package the night before. I had hoped it would be useful in estimating the standard errors for the random effect parameters, which xtmixed provides but lmer does not (warning: lmerTest did not help). Then all of a sudden, my lmer code that had previously worked began returning error messages about the CONTROL options. At that moment, I did not think about why this happened (which I should have) and simply dropped the CONTROL options from my lmer models. This adjustment allowed the models to proceed again, but the output looked different from before. For example, the fixed effect point estimates were now returned in a single row rather than multiple rows, and no standard errors were provided.

This was the second warning sign telling me to figure out the problem. But instead of slowing down and analyzing why the output was different, I dove right into the problem and spent two hours figuring out how to extract the fixed effects and random effects by writing additional functions. The fumbling was certainly helpful in improving my R knowledge--I got to learn more about S4 objects and figured out how to calculate standard errors from the matrices. Still, these results took quite a bit of work and were still poorly formatted. It was at this moment that I started to suspect my lmer function might have been altered. After some searching and trial and error, including updating everything to the newest version of R just to find out that was not the problem, I concluded that the lmerTest package had updated my lmer to a newer, and substantially different, version.
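For anyone curious, that kind of S4 spelunking looks roughly like this on the old mer objects (m1 again stands for a hypothetical fitted model):

slotNames(m1)                      # list the S4 slots of the fitted object
m1@fixef                           # fixed effect estimates, straight from the slot
sqrt(diag(as.matrix(vcov(m1))))    # standard errors from the covariance matrix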

So R users do speak of old and new lme4. There has apparently been an overhaul of the package since version 1.0 that is still under development. Someone even created a new package called lme4.0 to compare results from the old and new versions, which can yield different estimates.

Four hours into this problem, I finally reached the correct diagnosis. But it turned out that this was only the first step toward setting things right. My original lmer commands apply to an older lme4 package, and older lme4s can only be found as source files in .tar.gz format. I could not install them straight up because Windows 7 does not come with a compiler (another good reason to use Mac or Linux). Fortunately, Duncan Murdoch has created Rtools to help Windows users build packages from source.

Building R packages from source requires at least the functionality of Rtools' own commands, MinGW (a tool to compile code on Windows; parts of it come along with Rtools), the Inno Setup installer (an installer for Windows programs; separate from Rtools), and LaTeX (to create the documentation). See CRAN's page on the Windows toolset for more.

So I installed Rtools and Inno Setup (I already had LaTeX). I then downloaded an old source file for lme4. Everything seemed pretty straightforward. I typed install.packages("C:/Work/lme4_0.999375-42.tar.gz", repos = NULL, type="source") in my R console, expecting Rtools to do its magic. But an error message told me R could not run "make.exe" and "sh.exe". Both commands are under \Rtools\bin.

So I went back to the Rtools instructions and focused on the system PATH settings. Basically, Rtools requires specifying the paths for Rtools, MinGW, and other related programs in Windows' environment variable settings. This page (http://www.java.com/en/download/help/path.xml) explains the standard way to access the PATH variables, but I found it easier to run "path" in the terminal to ensure the ordering of the paths is as required by Rtools (basically, the path must start with Rtools, immediately followed by MinGW).

This is the quote from Rtools.txt. Note there are no spaces between the paths and the semicolons.
"Finally, the Rtools installer will optionally edit your PATH variable as follows:
PATH=c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;<others>...
(where you will substitute appropriate directories for the ones listed above, but please keep the path in the same order as shown. LaTeX and R itself should be installed among the "others".)"

Nothing illustrates my point about trivial details better than the last fix. I thought I had the PATH variable set up properly like this: "C:\Rtools\bin ; C:\Rtools\gcc-4.6.3\bin ;...<other paths>". But R still told me it could not run "gcc.exe", which was under "\Rtools\gcc-4.6.3\bin". Out of desperation, I decided to remove all the spaces between the paths and the semicolons. It turns out those spaces (or maybe one of them) were the last thing preventing Rtools from finding that second path to MinGW.

So that's it. After an entire day of work, I got R back to the state it was in before lmerTest messed everything up. I missed two warning signs and wasted a couple of hours taking a long detour. Even after I finally figured out what the problem was, a tiny extra space still got in the way. That's all in a good day's work.

--
Summary
For those with problems related to newer packages, I have organized some of the things that I found helpful. I am using lme4 as an example, so substitute in the name of your own package.

To install and uninstall packages, use
install.packages("lme4")
remove.packages("lme4")    # add the library path as a second argument if needed

To load or unload packages, use
require(lme4)
library(lme4)
detach("package:lme4")

To check installed packages and their versions, use
installed.packages()

alternatively, the loaded packages can be verified with
sessionInfo()

To check whether a specific package is installed, use
is.element("lme4", installed.packages()[,1])


Rtools (http://cran.r-project.org/bin/windows/Rtools/)

  1. Check the reference table on the page to make sure you choose the version of Rtools that corresponds with your version of R
  2. Install Rtools
  3. Install Inno Setup (http://www.jrsoftware.org/isdl.php) and MiKTeX (http://miktex.org/) if you haven't already.
  4. Check whether the PATH variable in your Windows OS is set up correctly. There are different ways to do this, but I recommend first using the terminal window (hit Windows key+R -> type "cmd" and hit return -> type "path" and hit return). If your Rtools did not alter the PATH to something like "c:\Rtools\bin;c:\Rtools\gcc-4.6.3\bin;<other paths>" (it has to be in that exact order), then edit the environment variables to make it so. Here is a visual guide (http://www.computerhope.com/issues/ch000549.htm) on how to edit the path.
  5. Download the source file for a package. 
  6. Open your R and type in the command to install the package. For example, I entered: install.packages("C:/Work/lme4_0.999375-42.tar.gz", repos = NULL, type="source")
  7. The package is properly installed if no error is returned. If you would like to check, use installed.packages() to see if it is there.