Parallelising with R

I’ve been fitting some generalised linear mixed effects models on 400,000 rows or so, and it takes a while even on my quad-core 3.4GHz behemoth. The time has come for me to learn how to use multiple cores with R. The parallel package, which ships with the new-ish R 2.14, makes it pretty easy. For my purposes, all I needed to do was pop all of my calculations in a list and then call mclapply on them.

This is my first time parallelising, so this might not be the best way to do it; if you know better, please say so in the comments. But if you are interested in giving it a try, this should get you started.


library(parallel)  # for mclapply
library(lme4)      # for glmer

# Each model call is stored as a string and evaluated on a worker
mymodels <- list(
  "glmer(AllViol ~ Clinical*Dir + (1|ID), family=binomial, data=testF)",
  "glmer(AllViol ~ Risk*Dir + (1|ID), family=binomial, data=testF)",
  "glmer(AllViol ~ Historical*Dir + (1|ID), family=binomial, data=testF)",
  "glmer(SerViol ~ Clinical*Dir + (1|ID), family=binomial, data=testSer)",
  "glmer(SerViol ~ Risk*Dir + (1|ID), family=binomial, data=testSer)",
  "glmer(SerViol ~ Historical*Dir + (1|ID), family=binomial, data=testSer)"
)

# mclapply forks one process per list element (it falls back to
# serial execution on Windows, which cannot fork)
myfinal <- mclapply(mymodels, function(x) eval(parse(text = x)))
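If you want to try mclapply without the lme4 models and data above, here is a minimal self-contained sketch of the same pattern; slow_square is a made-up stand-in for a slow model fit, and mc.cores explicitly sets the number of forked workers:

```r
library(parallel)

# Toy stand-in for a slow model fit (the real glmer calls would go here)
slow_square <- function(x) {
  Sys.sleep(0.1)  # pretend this takes a while
  x^2
}

inputs <- as.list(1:6)

# detectCores() is a reasonable default for mc.cores on Unix-alikes;
# on Windows mclapply silently runs serially
results <- mclapply(inputs, slow_square, mc.cores = detectCores())

unlist(results)  # 1 4 9 16 25 36
```

The six sleeps run concurrently, so the whole thing finishes in roughly the time of two batches rather than six sequential calls.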

One thing I would add is that this by no means gets the full speed out of my CPU, which only ran at about 30%, and not even all of the time. Part of that is load balancing: with six models spread over four cores, some cores sit idle once the shorter fits finish. I think there’s probably more I could do to get a speed boost, but this has got the job done and got me thinking about parallel processing.
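One knob worth knowing about for the idle-core problem is mc.preschedule. By default mclapply splits the job list into one batch per core up front; with mc.preschedule = FALSE it instead hands out one task at a time as cores free up, which can help when task lengths are uneven. A sketch with hypothetical, made-up task durations:

```r
library(parallel)

# Hypothetical per-task runtimes in seconds: one long task and five short ones
durations <- list(0.4, 0.1, 0.1, 0.1, 0.1, 0.1)

# With prescheduling off, each task is forked individually as a core
# becomes free, so the short tasks don't queue behind the long one
out <- mclapply(durations, function(d) { Sys.sleep(d); d },
                mc.cores = 4, mc.preschedule = FALSE)
```

Whether this helps depends on how different the model fitting times are; for tasks of similar length the default prescheduling has lower overhead.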
