Parallelising with R

I’ve been fitting generalised linear mixed effects models to 400,000 rows or so of data, and it takes a while even on my quad-core 3.4GHz behemoth. The time has come for me to learn how to use multiple cores with R. The parallel library, which comes with the new-ish R 2.14, makes it pretty easy. For my purposes, all I needed to do was pop all of my model calls in a list and then call mclapply on it.

This is my first time parallelising, so this might not be the best way to do it; if you know better, please say in the comments. But if you are interested in giving it a try, this should get you started.

library(parallel)  # ships with R >= 2.14
library(lme4)      # for glmer

# Each model call stored as a string. The formulas below are illustrative
# placeholders; only the family= and data= arguments are from my script:
mymodels = c("glmer(y ~ x + (1|group), family=binomial, data=testF)",
             "glmer(y ~ z + (1|group), family=binomial, data=testF)",
             "glmer(y ~ x + z + (1|group), family=binomial, data=testF)",
             "glmer(y ~ x + (1|group), family=binomial, data=testSer)",
             "glmer(y ~ z + (1|group), family=binomial, data=testSer)",
             "glmer(y ~ x + z + (1|group), family=binomial, data=testSer)")

# Parse and evaluate each string; mclapply farms the fits out across cores
myfinal = mclapply(mymodels, function(x) eval(parse(text=x)))

One thing I would add is that this by no means gets the full speed out of my CPU, which only ran at about 30%, and not even all of the time. There is probably more I could do to get a speed boost, but it has got the job done and got me thinking about parallel processing.
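Part of the answer may be that mclapply only uses two worker processes unless you tell it otherwise (the mc.cores argument defaults to getOption("mc.cores", 2L)), so on a quad core it can't get much past half the machine. You can ask for every core with detectCores(). Here is a toy sketch with a cheap function standing in for my actual model fits; note that mclapply relies on forking, so mc.cores greater than 1 will not work on Windows:

```r
library(parallel)

# mclapply defaults to two workers; ask for every core instead
# (forking, so not available on Windows)
ncores = detectCores()

# Toy stand-in for a list of slow model fits: square some numbers in parallel
res = mclapply(1:8, function(x) x^2, mc.cores = ncores)
unlist(res)  # 1 4 9 16 25 36 49 64
```

mclapply returns its results in the same order as the input list, so swapping it in for lapply is usually painless.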

Quick R tip: making scripts work across OSes

Quick thought for newbies to R. I have spent four years loading in data like this

    mydata = read.csv("C:/Users/me/analysis/data.csv")   # illustrative path

on my Windows machine and like this

    mydata = read.csv("/home/me/analysis/data.csv")      # illustrative path

on my Linux machine. I would comment out whichever one I wasn’t using at the time.

It’s fine for a while, but once you start reading in multiple data frames (particularly if they’re not all loaded at the start) it gets very fiddly and annoying. Don’t do this. Just set your working directory. You can do it manually:

    setwd("C:/Users/me/analysis")   # illustrative path, not my real one

Or, even better, the seamless option: test which OS you’re on and set the directory accordingly, so the same script runs unchanged on both machines:

    if(.Platform$OS.type == "windows"){
      setwd("C:/Users/me/analysis")   # illustrative paths
    } else {
      setwd("/home/me/analysis")
    }

Mmm, I can almost taste the hours I’ll save in the course of a year.
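To show the payoff concretely, here is a small self-contained sketch. Once the working directory is set (tempdir() stands in for a real project folder here), every read and write can use a bare file name, and that line is identical on Windows and Linux:

```r
old = getwd()
setwd(tempdir())  # stand-in for your project directory

# Write and read with bare, OS-independent file names
write.csv(data.frame(x = 1:3), "data.csv", row.names = FALSE)
mydata = read.csv("data.csv")  # the same line on any machine

setwd(old)  # tidy up
```

The only OS-specific string left in the whole script is the single setwd() call at the top.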