Clearly Better For Analytics and Graphics (My LondonR talk: Developing with Shiny for the Web)

I gave a talk at the LondonR group last night about using Shiny, and particularly about integrating it with HTML, CSS, JavaScript and jQuery so that it sits nicely alongside other web content. Appropriately enough, therefore, I made the whole talk using HTML. Thanks to Romain Francois for his excellent highlight package (surely it should be called highlightR, though?), which I used to produce the HTML for the R syntax highlighting, and to this website, with which I highlighted the HTML, JavaScript and jQuery.

I did promise to bring some free copies of my book; sadly, these did not arrive from the publisher in time, but I will get them sent out this week. Apologies to the three people who won them.

Those who attended my talk will also know that there are three more e-copies up for grabs. Just answer the question I posed at the talk and send your answer to chrisDOTbeeleyATnottshcDOTnhsDOTuk; the first three people will receive a code to redeem for an electronic version, which includes PDF, Kindle, and online cloud access.

The talk is here; have a look at the applications and code, and hopefully you will find it useful. I should emphasise that I am an expert in neither JavaScript nor HTML/CSS and have written code that merely works, so please don’t assume that all the HTML/CSS is industry standard, because it really isn’t.

Speaking of which, older versions of IE do something truly awful to the slides; I’m afraid I have neither the time nor the skill to do anything about this. I have tested the talk on IE10 as well as up-to-date versions of Chrome and Firefox, so please use one of those instead.

The internet platform of the future

I recently sent the following to the New Statesman in response to a piece they carried which weighed up the likely fortunes of Google, Facebook, and Apple and asked which would be the internet platform of the future. It is quite obvious to me that we already have the internet platform of the future, and it’s called Linux. They ran it, edited, in the letters page, but they don’t publish letters online, so here, for the first time, live and unplugged, is the unexpurgated version.

“I read the recent "Is Apple dying?" piece with great interest: a sound analysis of the woes of the increasingly less mighty Apple. It’s interesting that even in the pages of the New Statesman, however, the value of something is ignored unless it is profitable. For we already have "The internet platform of the future". We have long had it, and it shows no sign of weakening in the future. It’s called Linux. It’s totally free and so turns no profit for shareholders, but it ceaselessly and tirelessly grows and develops, a product of a global community that believes in the true values of the internet. The internet as we know it today would be quite impossible without Linux; could Apple and Facebook say the same? Google perhaps could, but it’s interesting to note their heavy use of Linux, most notably within the Android operating system.”

Free and open source software. Come for the free, stay for the awesome

I just wrote this for the Institute of Mental Health blog, so I thought I may as well cross-post it here. I promise I will talk about something other than Shiny soon 🙂

Open source software refers to software whose code is available for anybody to view, revise and reuse. This should be contrasted with closed or proprietary code, which is known only to the manufacturer, who will often charge a fee for individuals to use the software (for example, Microsoft Word or Adobe Photoshop).

There are many types of open source licence, but the principle of viewing, revising, and reusing underpins them all. On the face of it, this does not seem like a very exciting proposition, and newcomers to the concept often wonder what difference such an apparently obscure part of software licensing could make.

However, because open source software (often, although not always, available at no cost, and referred to as Free and Open Source Software, or FOSS) can be modified by anybody, it offers a number of advantages over traditional proprietary software.

Firstly, FOSS is free. I’m writing this blog post on a computer which contains absolutely no software of any kind that costs any money whatsoever. I shall repost it on my self-hosted blog, which runs on a server that itself contains no software of any kind that costs any money whatsoever. In a time of austerity, the availability of free software that can perform all the tasks of paid software should be a deafening clarion call across the public sector. And indeed FOSS is starting to gain traction in education and health.

However, not only this, but FOSS is often better than its paid counterpart. This is because anybody can change and improve it, and whole communities exist around popular tools, adding new features and fixing bugs continuously, with everyone inside and outside of the community benefiting. The FOSS operating system Linux basically runs the internet. The closed-source Microsoft Excel, notoriously, features bugs that Microsoft can’t or won’t fix (e.g. here, here). Even the US military is making ever larger use of open source software.

The statistical programming language R is another success story from the FOSS world. Precisely because it is free and open source, a huge community of statisticians, programmers, data visualisers and analysts all contribute to its development. At the time of writing there are nearly 5,000 user-contributed packages for R, which help users with tasks as diverse as computational chemistry and physics, finance, clinical trials, medical imaging, psychometrics, machine learning, statistical methods, and the production of extremely powerful and flexible statistical graphics. R is rapidly becoming the lingua franca of analytics and is widely used in many leading data science departments, perhaps most notably at Google.

I have made extensive use of R in my work on Nottinghamshire Healthcare NHS Trust’s new patient feedback site, using R to serve both static quarterly reports and interactive, searchable analyses of all of our feedback past and present.

The searchable reports are made possible by the Shiny package, which makes it ridiculously easy to allow users to interact with a dataset. Shiny handles all of the hard work involved in programming a graphical user interface and lets analysts such as myself concentrate on the content of what is delivered to the user.
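To give a flavour, here is a minimal sketch of a Shiny app (a toy example of my own, not one from the feedback site): the traditional ui.R/server.R pair, with a slider controlling a histogram.

[code language="R"]
## ui.R -- a minimal sketch, not from the feedback site
library(shiny)

shinyUI(pageWithSidebar(
  headerPanel("Minimal Shiny example"),
  sidebarPanel(
    sliderInput("n", "Sample size", min = 10, max = 500, value = 100)
  ),
  mainPanel(plotOutput("distPlot"))
))

## server.R -- Shiny handles all the interface plumbing
library(shiny)

shinyServer(function(input, output) {
  output$distPlot <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n))
  })
})
[/code]

Run this from the directory above with runApp() and Shiny wires the slider to the plot with no interface code at all.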

Although Shiny is never going to replace other methods of programming for very large, fully featured analytics platforms (for example, Google Analytics), it has certainly proved its worth on the patient feedback website, and I hope that R and Shiny will find more and more use in NHS settings up and down the country, allowing Trusts to better communicate their data to the communities and service users whom they serve.

There are some demonstrations of what can be done with Shiny on my website; some are little examples I wrote just for the book, but others are more fully featured and hopefully demonstrate some of the things which can be achieved using Shiny.

Over the next 10 years I hope to see more and more use of FOSS, not only for the Free, but also for the Awesome.

Quick p-values for publication

I’m afraid this one is straight from the school of the totally obvious, but for some unfathomable reason I’ve never got round to doing it. I’m not even telling you the code so much as just saying “you should definitely do this, it’s very easy”. I got the time back after writing one paper, and since I constantly have to fiddle around with p-values I imagine I shall save many hours each year.

So if you’re writing for publication you often want to turn p-values such as 3.7e-18 into something more like “p<.0001”. Sometimes you have whole tables of them. Don’t do it by hand as I have been doing. Do this:

[code language="R"]
pValueFunction = function(pValue) {
  if(pValue < .0001) return("p<.0001")
  if(pValue < .001) return("p<.001")
  if(pValue < .01) return("p<.01")
  if(pValue < .05) return("p<.05")
  if(pValue < .1) return("p<.1")
  if(pValue >= .1) return("p>.1")
}
[/code]

This is most useful, of course, if you’re using Sweave, or knitr or something like that. If you’re not doing that you should start, but that’s a different post.
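And since the function takes one value at a time, a whole column of p-values can be formatted in one go with sapply; a quick sketch:

[code language="R"]
# format a whole vector of p-values at once using the function above
pVals <- c(3.7e-18, 0.0004, 0.023, 0.31)
sapply(pVals, pValueFunction)
# returns "p<.0001" "p<.001" "p<.05" "p>.1"
[/code]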

Or you may just be outputting tables with p-values in as plain text, as I wrote about in this post.

Boom! You’ll save enough hours in your life to read the wonderful Don Quixote.

Book launch

My book has just come out: Web Application Development with R using Shiny. It’s an introduction to the marvellous Shiny from the wonderful people at RStudio. Although the online documentation for Shiny does cover most of the things you might want to do with it, the people who approached me about writing the book thought that having a more comprehensive guide with some worked examples would be useful, and I was inclined to agree. So I wrote the book in three months, which was one of the most brilliant and awful experiences of my entire life.

As I mentioned in a previous post, a lot of the code I wrote for the book lives on GitHub, with links from my website. I also host all the applications on my server, so you can play with the actual applications themselves; again, links are on the site.

The book takes you from beginning with R, through very basic examples of Shiny, and on to using HTML/CSS/JavaScript/jQuery with Shiny, as well as various advanced functions like controlling reactivity, uploading and downloading files, all that kind of thing.

It’s a very humble contribution, really, but I would definitely have bought, read, and used it when I was starting out, so I hope it goes some way to helping people use Shiny. To me, Shiny represents a complete game-changer: you can write a halfway decent application that lets anybody explore your data in a morning.

Patient Opinion data made easy with R

As you may or may not be aware, I recently helped on the data side of Nottinghamshire Healthcare NHS Trust’s feedback website, which lives here. Part of this work was collecting the data we receive from the Trust-wide patient survey and bringing it in with other sources of feedback, including stories from our friends over at Patient Opinion.

For a long time I’ve queried their data using their API (documentation is here) and fiddled around with it a bit, but my XML skills were never quite up to doing a proper extract and getting the data into the same structure as the rest of our patient experience data. The job sort of got shelved under “A computer program could do this, nobody has time to write the computer program, maybe a human will do it”, which to be honest is quite a large shelf, groaning with the weight of a lot of stuff at the moment.

Anyway, we’re off to show off the website at a big meeting on Thursday, and this has spurred me into action. I’ve finally managed to wrangle the data into shape using, what else, the mighty R, with assistance from the RCurl and XML packages.

I confess that, coming from a background of not even really knowing what XML was, it’s been a bit of a slog, but I have got there now, so I’m writing this post to help out future explorers who might face this exact problem (hey, every NHS Trust should use Patient Opinion and have an R guy wrangling the data, no question about that) or a similar one. I’m going to show you the dumb way (which I did first) and then the clever way.

Point number 1, which I have to say is quite a common point number 1, is that this is a lot easier on Linux than it is on Windows. RCurl and XML both have various problems on Windows which I don’t understand sufficiently to explain; suffice it to say that if I’m doing this job I boot to Linux, and you should too.

Point number 2 is that you’re going to need an API key; just email Patient Opinion and they’ll give you one. Mine shows as “XXXXX” in the following code, so I’m afraid you won’t be able to run the code as it stands.

These instructions are based on an Ubuntu 13.04/Linux Mint 15 distribution (I’m cheating on Ubuntu with Mint at the moment); your mileage may vary, check your package management system, etc.

The first thing you want to do before you launch R is make sure that the OS is set up with the libraries you need to build the RCurl and XML packages. From a terminal this is just:

sudo apt-get install libxml2-dev

sudo apt-get install libcurl4-gnutls-dev

Assuming these commands run okay, launch R and type at the console:


install.packages(c("RCurl", "XML"))

Assuming both packages install okay, you’re ready to go. Here’s the dumb way first, which I have to say is remarkably easy.

My code here includes the instruction:


getURL("https://www.patientopinion.org.uk/api/v1/opinions?healthservice=rha&apikey=XXXXX&take=100").

This contains two things of interest. The first is the health service code, which in our case is “rha”; this returns just stories about my Trust. It’s pretty simple to figure out yours by looking at the search pages, or if you can’t do that just ask Patient Opinion and I’m sure they’d be glad to let you know. The second is my API key, shown here as “XXXXX”. As I mentioned, you will need to get your own API key and place it here.
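If it helps, here’s a little sketch of how you might build that URL from your own details rather than editing the string by hand (the variable names are my own invention, not part of the API):

[code language="R"]
# build the query URL from your own health service code and API key
apiKey <- "XXXXX"        # your key from Patient Opinion
healthService <- "rha"   # your own Trust's code
url <- paste("https://www.patientopinion.org.uk/api/v1/opinions?healthservice=",
             healthService, "&apikey=", apiKey, "&take=100", sep = "")
[/code]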

One final thing to mention is that the code as written here queries all of the stories that have ever been published, every time. Because it’s only searching a small part of the tree it does not bother Patient Opinion’s server too much, and there are only 900 stories at the moment, so it doesn’t take too long. You can use the “skip” parameter within the API (see the documentation) to download only the stories you don’t currently have. I haven’t done this because I will probably move this code to a server where the process won’t have write permissions, which would obviously break it, so it just seems like wasted effort. In time I will probably look at getting write permissions for the process on the server (which I don’t manage) and changing the code, but it’s not really a priority.
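If you did want to fetch only the new stories, a rough sketch might look like this (assuming previous runs saved their results to PO.csv, one row per story, and that the API returns stories in a stable order, which you should check against the documentation):

[code language="R"]
# a rough sketch of incremental downloading, under the assumptions above
existing <- read.csv("~/PO.csv")
skip <- nrow(existing) # start where the last download finished
[/code]

Anyway, here’s the dumb way in full: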


library(RCurl)
library(XML)

# a function to stick list items together (stolen from somewhere years ago)
lappend <- function(lst, obj) {
  lst[[length(lst) + 1]] <- obj
  return(lst)
}

# download the first 100 stories
mylist <- list(xmlToDataFrame(
  getURL("https://www.patientopinion.org.uk/api/v1/opinions?healthservice=rha&apikey=XXXXX&take=100")
))

skip <- 100    # initialise a variable to download the next 100
result <- NULL # used to check for errors in the download, i.e. are there any stories left

while(class(result) != "try-error") { # while no errors, i.e. while the API still has more stories

  result <- try(
    xmlToDataFrame(getURL(paste("https://www.patientopinion.org.uk/api/v1/opinions?healthservice=rha&apikey=XXXXX&skip=",
                                skip, "&take=100", sep = "")))
  )

  # if there are stories, stick them onto the end of the list
  if(class(result) != "try-error") mylist <- lappend(mylist, result)

  skip <- skip + 100 # get the next 100 stories next time
}

fine <- do.call("rbind", mylist) # stick all the stories together in a dataframe

write.csv(fine, file = "~/PO.csv") # export a spreadsheet

This is the dumb way, and it will return a dataframe with all the stories in it, as well as dates, responses, and some other stuff. For the dumb way it’s pretty good, but the problem I had was that it stuck all the information about the service (the NACS code, which is a unique identifier for each service) together with the other stuff about the service (what it’s called, where it is). Pulling out the NACS code to match it with our database seemed a bit tricky, but more seriously, whenever more than one service is mentioned in a story, the whole thing got stuck together: both NACS codes, addresses, everything. I really didn’t think I could handle these cases accurately. So this weekend I took the plunge and decided to parse the XML properly myself.

I didn’t really want much from all the data: just the story itself, the date, and the NACS code for matching purposes.

In order to make it work, I had to learn a little bit about XPath, which the XML package makes use of, and which I had never previously heard of. There are various tutorials on the internet, such as here and here. It seems like one of those very deep things, but you can learn what you need for this task in half an hour.
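As a taste, here’s a toy example of XPath with the XML package (the XML fragment is made up, just to show the idea):

[code language="R"]
# a toy example of XPath with the XML package, using made-up XML
library(XML)

txt <- '<stories>
          <story id="1">Great care on the ward</story>
          <story id="2">Long wait in clinic</story>
        </stories>'

doc <- xmlParse(txt, asText = TRUE)

xpathSApply(doc, "//story", xmlValue) # the text of every story
xpathSApply(doc, "//story/@id")       # every id attribute
[/code]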

Once you’ve got the hang of XPath, it’s just a case of downloading 100 stories, stepping through the XML with XPath to find the nodes you’re interested in, and then sticking it all together in a dataframe. The problem I had at first is that each batch contains exactly 100 stories but slightly more NACS codes, because of the multiple-services problem I mentioned above. I dealt with this by repeating each story as many times as there are unique NACS codes for that story, which makes the number of stories and NACS codes equal (a toy illustration of the trick follows below). You can then combine them into two columns of a spreadsheet, each story being identified with as many codes as were identified in that story. Using the same while loop as before gets all the stories off the API.
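The repeating trick is easier to see with toy data than to describe:

[code language="R"]
# a toy illustration of the repeating trick, with made-up data:
# story B mentions two services, so it gets repeated twice
stories <- c("story A", "story B")
codes <- list("NACS1", c("NACS2", "NACS3"))
sizes <- sapply(codes, length) # how many codes each story has

data.frame(Story = rep(stories, sizes), NACS = unlist(codes))

#     Story  NACS
# 1 story A NACS1
# 2 story B NACS2
# 3 story B NACS3
[/code]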

Finally, I have another spreadsheet where I’ve made a lookup table between the NACS codes and the internal service codes we use on the survey. Again, you can either figure out the service codes from Patient Opinion by using the search function and seeing what comes back, or just ask Patient Opinion nicely and I’m sure they’ll be glad to help. Merging the two brings back all the stories that match with our database.

Here’s the code:


library(RCurl)
library(XML)

###################################
######## DATA READ FROM API #######
###################################

### set the initial value of skip and make an empty dataframe and an empty XML object

skip <- 0

myFrame <- data.frame()

myXML <- NULL

### for as long as the API returns results, parse the XML, convert it to a dataframe,
### and stick the dataframes together

while(class(myXML) != "try-error") {

  myXML <- try(
    getURL(paste("https://www.patientopinion.org.uk/api/v1/opinions?healthservice=rha&apikey=XXXXX&skip=",
                 skip, "&take=100", sep = ""))
  )

  if(class(myXML) == "try-error") break # no more stories; don't try to parse the error

  nodes <- xmlParse(myXML)

  ### note the use of *[local-name() = 'XXX']
  ### this is because I couldn't get the namespace working

  opinion <- xpathApply(nodes, "//*[local-name() = 'Opinion']/*[local-name() = 'Body']", xmlValue)

  date <- substr(unlist(xpathApply(nodes, "//*[local-name() = 'Opinion']/*[local-name() = 'dtSubmitted']", xmlValue)), 1, 10)

  service <- xpathSApply(nodes, "//*[local-name() = 'NACS']", xmlValue)

  sizes <- xpathSApply(nodes, "//*[local-name() = 'HealthServices']", xmlSize)

  ### this function exists because some stories have more than one NACS code.
  ### it repeats each story and date (using the sizes variable) once per
  ### NACS code, so the stories line up one-to-one with the NACS codes
  ### and the columns can then be combined

  listMat <- mapply(function(x, y, z) {
    rep(c(x, z), y)
  }, opinion, sizes, date)

  finalMat <- matrix(unlist(listMat), ncol = 2, byrow = TRUE)

  colnames(finalMat) <- c("Story", "Date")

  final <- data.frame(finalMat, "NACS" = unlist(service))

  myFrame <- rbind(myFrame, final) # stick the batch onto what we have so far

  skip <- skip + 100 # get the next 100 stories next time

}

###################################
######## match up NACS codes ######
###################################

POLookup <- read.csv("poLookup.csv")

test <- merge(myFrame, POLookup, by = "NACS")

write.csv(test, file = "POFinal.csv")

That’s it! I managed to match about 700 out of 900 stories this way; some of them are matched to rather larger service areas than we would like, but honestly I think that’s the data’s fault. I hope that, having done this, we will now try to set aside a bit of time to work with Patient Opinion to improve the way our services are classified (just to be clear, this task is completely on us, and Patient Opinion have been very helpful throughout with this complicated and onerous task).

Some examples of Shiny with HTML, CSS, JavaScript, and jQuery

It’s becoming rather Shiny-centric, this blog, I know, but I’ve just put some little demos of how to use Shiny with HTML, CSS, JavaScript, and jQuery up on the website: just some toy examples with links to source.

So if you’re interested in Shiny, go have a look. Each demo has a link to the interactive version and the source. Do note that there are easier/better ways of doing some of these things (row highlighting can be done in CSS rather than jQuery, for example); these are really just for educational purposes.

Citrix on Ubuntu 12.04

One for Linux users forced to use Citrix to log in to their corporate email, such as myself. I’d long since given up trying to get Citrix to work on Ubuntu 12.04 on my laptop, but a recent success with 13.04 on my main computer led me to have another go. I followed the instructions here https://help.ubuntu.com/community/CitrixICAClientHowTo, got some weird error message, followed the instructions here http://forums.citrix.com/thread.jspa?threadID=306353&tstart=0, and BOOM, I was in.

That’s Linux’s super-awesome hot breath you can feel on your neck, Windoze.

What’s up with chrisbeeley.net?

I’ve had my own server for a little while now, with a very scabby and unattractive webpage on it, some Java I haven’t got round to turning into an applet, a couple of random Shiny apps, and that’s about it.

I’m very busy at the moment writing a book, so I’m afraid I haven’t really had time to work on any of these side projects. I have, however, done loads of cool things with the actual server, so I thought I might mention those. Having your own server is like all the best things in life: once you’ve got it, you don’t know how you ever got on without it. So, what are all the exciting things that my server does?

  1. It’s of course a place to put up random things that I do. I have already given people access to Shiny applications straight from my server, and this has been very useful. I occasionally create other types of web report in plain HTML that need sharing too, so that’s where they’ve gone
  2. I host my own WordPress server. I used to be over at wordpress.com, but now this blog lives right on the root of chrisbeeley.net. This is not useful in any way that I’m aware of, but it’s pretty cool nonetheless
  3. I have just this minute set up my own blog aggregator. Google famously just killed Google Reader, and I used The Old Reader for a bit, but they’re going private because they can’t cope with the traffic (quite reasonably, I should add; I take my hat off to them for trying to serve so many users in the first place). So now I have MY VERY OWN RSS server, which I can access through any web browser. And nobody can close it, because it’s mine. The program is called Tiny Tiny RSS, and I salute the creator for open-sourcing it
  4. As well as Shiny Server for hosting Shiny applications, I have even now set up RStudio Server, which means I can use R through the RStudio IDE in any web browser in the world. This is very useful in corporate environments, that kind of thing; I don’t think it serves any purpose for me to do it, unless I get into any arguments in net cafes about statistical distributions, but it’s cool, and it was pretty easy once I realised my firewall was configured wrongly.

So that’s me at the moment: doing a lot of work, writing a book, and learning a lot of stuff, but not having any time to share any of it (got a 2-year-old at home, too).

Watch out around September/October time; I’ll have all sorts of weird and wonderful things to put up by then.

If anybody who knows me reads this and would like an RSS account on the server, just let me know. The loss of Google Reader was very painful to all of us and I’ve got loads of bandwidth going begging.