Shiny app running at last

I’m currently taking part in the NHS Institute for Innovation supported Patient Feedback challenge with colleagues at Nottinghamshire Healthcare NHS Trust. We’re delivering our patient feedback in a web system, and I have been working with the web developers, the lovely people over at Numiko: they’re doing all the web type stuff and I’m writing R code running over FastRWeb, which is fantastic and will get a well-deserved blog post some time soon.
To make sure that my code keeps pace with the interface they design, and that there are no nasty surprises as the launch of the website gets near, I’ve been doing some prototyping with the wonderful Shiny package from the amazing people over at RStudio, whose IDE I sincerely love.

I’ve done a little video about it for the challenge which I thought I might as well share on here. I’m blown away by how easy it is to use and although I’m using it for prototyping in this case I can imagine building a whole tool with it and getting something really useful and attractive, maybe with a little bit of JavaScript jiggery-pokery.

Video

I won’t share all of the server-side code because it’s a bit of a mess at the moment (I’ll put a cleaner version up on GitHub once it’s a bit further along), but here’s the UI code just to demonstrate how simple it is. Really the server side just picks up one or two of the variables and then produces various graphs and tables based on code I already had; a rough sketch of its general shape appears after the UI code below.


library(shiny)

# Define UI
shinyUI(pageWithSidebar(

  # Application title
  headerPanel("SUCE results"),

  # first set up All/ Division results
  sidebarPanel(
    selectInput("Division", "Select division",
                list("Trust" = 9, "Local" = 0, "Forensic" = 1, "HP" = 2)),
    conditionalPanel(
      condition = "input.Division != 9",
      uiOutput("divControls")
    ),
    textInput("keyword", "Keyword search: (e.g. food, staff)"),
    selectInput("start", "From: ",
                list("Apr - Jun 11" = 9, "Jul - Sep 11" = 10, "Oct - Dec 11" = 11,
                     "Jan - Mar 12" = 12, "Apr - Jun 12" = 13, "Jul - Sep 12" = 14)),
    selectInput("end", "To: ",
                list("Apr - Jun 11" = 9, "Jul - Sep 11" = 10, "Oct - Dec 11" = 11,
                     "Jan - Mar 12" = 12, "Apr - Jun 12" = 13, "Jul - Sep 12" = 14),
                selected = "Jul - Sep 12"),
    checkboxInput("custom", "Advanced controls", value = FALSE),
    conditionalPanel(
      condition = "input.custom == true",
      selectInput("responder", "Responder type",
                  list("All" = 9, "Service user" = 0, "Carer" = 1)),
      selectInput("sex", "Gender",
                  list("All" = "All", "Men" = "M", "Women" = "F"))
    )
  ),

  # Show the caption and plot of the requested variable
  mainPanel(
    h3(textOutput("Title")),
    tabsetPanel(
      tabPanel("Stacked plot", plotOutput("StackPlot")),
      tabPanel("Trend", plotOutput("TrendPlot")),
      tabPanel("Responses", tableOutput("TableResponses"))
    )
  )
))
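
For completeness, here is a minimal, hypothetical sketch of what the server side looks like in shape only; the real code wraps survey data and plotting functions I already had (and haven’t shared yet) inside the render functions, and the control built for "divControls" below is made up purely for illustration.

library(shiny)

# Skeleton server: every output named in the UI is filled in by a render function
shinyServer(function(input, output) {

  # division-specific controls for uiOutput("divControls");
  # the input name and choices here are placeholders
  output$divControls <- renderUI({
    selectInput("directorate", "Select directorate", list("All" = 9))
  })

  # title reflecting the current selection
  output$Title <- renderText({
    paste("Results for division", input$Division)
  })

  # each tab wraps existing plotting/tabulating code
  output$StackPlot <- renderPlot({
    plot(1:10) # placeholder for the real stacked plot
  })

  output$TrendPlot <- renderPlot({
    plot(1:10) # placeholder for the real trend plot
  })

  output$TableResponses <- renderTable({
    data.frame(Response = character(0)) # placeholder for the responses table
  })
})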
 

But I do want to be data (cross posted)

I’ve got a guest piece on the Patient Opinion Blog which is cross-posted below.

Healthcare scientists will universally recognise the dominance of quantitative methodology over qualitative methodology, with the oft-quoted hierarchy of evidence featuring such quantitative behemoths as meta-analysis and RCT at the top and qualitative methods further down, rather sniffily described as “Case reports” or “Case series”.

As a rather hardcore quantitative scientist myself with a great deal of good feeling towards my qualitative brethren, it was with joy and horror that I read Paul’s recent piece on the Patient Opinion blog “I don’t want to be data, I want a conversation”, which flips the usual dominance relationship on its head and champions qualitative over quantitative methods. Paul argues that the NHS needs to “step outside its mindset” and stop using words like “captured” and “mined” in regard to patient feedback which should be treated, first and foremost, as a conversation.

One of the funny things about being a data monkey in the NHS is that people think of you as rather like a machine, spewing out graphs and computer code, and don’t think of you as getting sick, or tired, or seeing a doctor. But in fact I have rather a lot of chronic illnesses, and it seems I’m forever waiting in the GP’s reception or waiting for my consultant to call me. I’ve seen many doctors over the years, some brilliant, some okay, and some really dreadful, and I totally identify with the idea that when you give feedback to the NHS you want it to listen to you and respond properly, rather than giving you a “corporate” response which really just protects it legally and doesn’t commit to any change.

I do think it’s dangerous, though, to conflate the use of surveys, statistics, and data-focused methodologies with poor quality responses from health providers.

Because actually I do want to be data.

I recognise the value of response rate, sample size, reliability, and validity and although I am a big fan of the story-driven approach adopted by Patient Opinion I worry that if we only had two stories from each service user area we wouldn’t know enough about the silent majority. A good example of this would be where specific service user groups had poor engagement with feedback mechanisms. Their voice would never be heard. Using a data-driven approach we can compare the demographic composition of survey respondents with the known demographic composition of service users and ensure we are hearing everybody’s voice. Where we are not hearing everyone’s voice, we can even use complex survey techniques to adjust summary statistics to better reflect the “true” value of the statistic in the population.

I’m an atypical example, of course. Most people don’t want to be data, and I frustrate people in my work and home life by continuously harping on about probability, the ecological fallacy, correlation (it’s NOT CAUSATION!), the presence or absence of control groups, covariates, and so on ad infinitum. But I’m proud to be a data monkey and it’s my heartfelt wish to make sure that we can hear all the voices across all of our health services, analyse, mine, factor, and standardise them, and only then will we be ready to have a truly informed conversation over a nice cup of tea (this stage I’ll leave to the experts, I’ll stay here with my spreadsheets, but it’s white with none, thanks).

Super easy heatmaps of postcodes

Whatever I want to do, there are always intrepid explorers who’ve been there and blogged it, and so the satisfaction of my long held desire to get to know more about how Nottinghamshire Healthcare’s services are spread geographically has been wonderfully expedited by these amazing blog posts.

Special thanks, of course, go to David Kahle and Hadley Wickham, progenitors of the mighty ggmap package and also to the fine folk at geonames who freely distribute postcodes from around the world in .csv format.

With the thanks out of the way, there’s almost no work for me to do at all, and I’ve produced this lovely heatmap with absolutely minimal coding. I can’t tell you what it represents, I’m afraid, because I haven’t cleared the data for release, and actually it doesn’t represent anything particularly interesting at the moment. I need to do some preparation of the data but I naturally did this bit first because it’s more fun.


library(ggmap)

# mydata (the individual-level service data) and mycodes (the geonames postcode
# .csv) are assumed to have been read in already

myUni = mydata[!duplicated(mydata$ClientID), ] # dataframe with unique individuals

mywhere = merge(myUni, mycodes, by.x = "ClientHomePostcode",
                by.y = "Postcode", all = FALSE) # merge with postcode data

### Plot!

map.center = geocode("Nottingham, UK") # centre the map on Nottingham

myMap = qmap(c(lon = map.center$lon, lat = map.center$lat),
             source = "google", zoom = 10) # download the map from Google

# overlay binned counts with a bit of transparency
myMap + stat_bin2d(bins = 80, aes(x = Long, y = Lat), alpha = .6, data = mywhere) +
  scale_fill_gradient(low = "blue", high = "red")

Note finally that you can use Google’s map API to give you latitudes and longitudes from postcode data (using the geocode() function), but you are limited to 2500 queries per day. I had many more than that so I needed to download the postcode data.
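
For a small number of lookups, though, geocoding on the fly is as simple as the call below; the postcode is made up purely for illustration.

library(ggmap)

# ask Google's API for the latitude and longitude of a single location
# (limited to 2,500 queries per day)
geocode("NG1 5AA, UK")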

Easy tables for publication

I’m writing a journal article today and it’s the familiar problem of how to export all of the beautiful tables and graphs from LaTeX to Word (using R, of course).

It’s surprisingly easy to get a table from R into Word (or Open Office, or…). Just export the table to a text file and use tab-delimiting:

write.table(mytable1, "Summary.txt", sep = "\t")

Then simply copy the contents of the text file into your document, select the pasted text, choose “Convert text to table”, make sure tab-delimiting is selected (which it probably is), and voilà.

You can be even sneakier than that though without too much effort.

I’ve been pasting together the familiar “mean (sd)” format used in tables quite easily like so:

# variables to summarise
varNames = c("H.Tot", "C.tot", "R.tot", "HCRtot")

# for each variable, produce "mean (sd)" within each level of Dir
sapply(varNames, function(m)
  rbind(paste(round(tapply(mydata[, m], mydata[, "Dir"], mean, na.rm = TRUE), 2),
              " (", round(tapply(mydata[, m], mydata[, "Dir"], sd, na.rm = TRUE), 2),
              ")", sep = "")))

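Assign the result of that sapply() call to an object and it can go straight out through write.table() as described above; mySummary and the filename are just names I’ve made up for illustration.

mySummary = sapply(varNames, function(m)
  rbind(paste(round(tapply(mydata[, m], mydata[, "Dir"], mean, na.rm = TRUE), 2),
              " (", round(tapply(mydata[, m], mydata[, "Dir"], sd, na.rm = TRUE), 2),
              ")", sep = "")))

# export the "mean (sd)" matrix tab-delimited, ready to convert to a table in Word
write.table(mySummary, "Table1.txt", sep = "\t")
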
Somehow coming up with generic solutions for transferring tables is a lot less depressing than sitting typing them out or manually tidying up SPSS output.

Apologies for the poor code formatting, incidentally; I can’t seem to work out how to format it correctly on the blog. My version in RStudio is a lot more readable.

EDIT:

I’m glad I did this, because I’ve just been told that I put a variable in that I shouldn’t have done and so I have to re-do all the tables! Looks like I was right about generic solutions and typing!

A joke about R

I’m so sorry, I’ve come across an R joke I must share. If you don’t care about R, look away now because this post is definitely not for you.

A scientist and a statistician are working together in the office. The scientist says to the statistician “Which is the fastest way to identify non-zero elements in an array using R?”.

The statistician replies “Yes.”

“No, which is the fastest way?”

“Yes, it is.”

“No, I’m asking you, which is the fastest way?”

“Yes, you’re right, it is.”

Etc…

[if you don’t get it and you’re still reading, here].
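
For anyone who would rather not click through: the punchline hinges on R’s which() function, which is itself the answer to the question.

x = c(0, 3, 0, 5, 0)
which(x != 0) # returns the indices of the non-zero elements: 2 4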

Thanks to Hadley Wickham, progenitor of the first R joke I’ve come across.

EDIT: I thought I stole this joke structure from the “Hu is the President of China” bit (I forget where I heard it), but thanks to the magic of Twitter I now know that they stole it from Abbott and Costello’s “Who’s on First?” routine.

Parallelising with R

I’ve been fitting some generalised linear mixed effects models on 400,000 rows or so, and it takes a while even on my quad-core 3.4GHz behemoth. The time has come for me to learn how to use multiple cores with R. The parallel package, which comes with the new-ish R 2.14, makes it pretty easy. For my purposes, all I needed to do was put all of my model calls in a list and then call mclapply() on them.

This is my first time parallelising, so this might not be the best way to do it (if you know better, please say in the comments), but if you are interested in giving it a try then this should get you started.


library(parallel) # provides mclapply()
library(lme4)     # provides glmer()

# each model call is stored as a string...
mymodels = list("glmer(AllViol~Clinical*Dir+(1|ID),
              family=binomial, data=testF)",
              "glmer(AllViol~Risk*Dir+(1|ID),
              family=binomial, data=testF)",
              "glmer(AllViol~Historical*Dir+(1|ID),
              family=binomial, data=testF)",
              "glmer(SerViol~Clinical*Dir+(1|ID),
              family=binomial, data=testSer)",
              "glmer(SerViol~Risk*Dir+(1|ID),
              family=binomial, data=testSer)",
              "glmer(SerViol~Historical*Dir+(1|ID),
              family=binomial, data=testSer)"
              )

# ...and then each string is parsed and evaluated in parallel
myfinal = mclapply(mymodels, function(x) eval(parse(text=x)))

One thing I would add is that this by no means gets the full speed out of my CPU, which only ran at about 30% and not even all of the time. I think there’s probably more that I could do to get a speed boost, but this has got this job done and got me thinking about parallel processing.
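
One obvious thing to try, which I haven’t tested on this particular job, is telling mclapply() explicitly how many cores it may use; by default it uses only two (via the mc.cores option), which may well explain a quad-core machine sitting at around 30%. Note also that mclapply() relies on forking, so it only parallelises on Linux and Mac, not on Windows.

library(parallel)

# detectCores() reports how many cores the machine has;
# mc.cores tells mclapply() how many of them to use
myfinal = mclapply(mymodels, function(x) eval(parse(text = x)),
                   mc.cores = detectCores())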

Quick R tip: making scripts work across OSs

Quick thought for newbies to R. I have spent four years loading in data like this

mydatafirst = read.csv("D:\\Dropbox\\R-files\\Project1\\etc.")

on my Windows machine and like this

mydatafirst=read.csv("/home/chris/Dropbox/R-files/Project1/etc.")

on my Linux machine. I would comment out the one I wasn’t using at the time.

It’s fine for a while but once you start reading multiple dataframes (particularly if they’re not all loaded at the start) it gets very fiddly and annoying. Don’t do this. Just set your working directory. You can do it manually:

setwd("D:\\Dropbox\\R-files\\etc.")

Or, even better, the seamless option:

setwd(ifelse(Sys.info()['sysname'] == "Windows",
             "D:\\Dropbox\\R-files\\Project1\\",
             "~/Dropbox/R-files/Project1/"))

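With the working directory set, the read.csv() calls themselves become identical on both machines (the filename below is just an example). Incidentally, R on Windows is also happy with forward slashes in paths, which avoids the double-backslash escaping altogether.

# the same relative path now works on Windows and Linux alike
mydatafirst = read.csv("mydata.csv")
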
Mmm, I can almost taste the hours I’ll save in the course of a year.