Shiny application in a morning

This is sort of apropos of nothing, really, but I was doing some research with some colleagues the other day and, to help out, I built a Shiny application in a morning. Munging the data took ages, as it always does, but once I had everything clean, tidy and working, laying an interface on top was the easy bit. What a fantastic tool it really is. It has completely changed the way we work with data: now we can just throw stuff on the internet and let people play around with it, all in a morning. My sincere thanks to all the devs at RStudio.

Nothing earth-shattering really, just a few tick boxes and some binomial distribution numbers, but it should prove useful. It’s here, in case you’re curious, and the source code is below (I’ll just post the server.R and ui.R files on top of each other for simplicity; you get the idea).

### server.R




library(shiny)
library(xtable)  # provides xtable()

variables = names(shinyData)[4:19]  # shinyData is assumed to be loaded elsewhere, e.g. in global.R

shinyServer(function(input, output) {
  myValues = reactive({
    if(length(input$prison) > 0){
      subData = subset(shinyData, Prison %in% input$prison)
    } else {
      subData = shinyData
    }
    if(length(input$traits) > 0){
      for(z in input$traits){
        subData = subData[which(subData[,z] == 1),]
      }
    }
    subData
  })
  output$outTable <- renderTable({
    Results = sapply(variables, function(x) rbind(prop.table(table(factor(myValues()[,x], levels=0:1))) * 100))[2,]
    # the element name was lost from the original listing; "conf.int" is assumed from context
    CI = 100-sapply(variables, function(y) binom.test(table(factor(myValues()[,y], levels = 0:1)))[["conf.int"]])*100
    finalTab = data.frame("Estimate" = Results, t(CI[2:1,]))
    names(finalTab)[2:3] = c("Low CI", "High CI")
    xtable(finalTab, digits=1)
  })
  output$nResponses = renderText({
    paste("n = ", nrow(myValues()))
  })
})

### ui.R


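library(shiny)
shinyUI(pageWithSidebar(
  # Application title
  headerPanel("Summary of database results"),
  # The widget calls were lost from the original listing; input ids match server.R, labels are assumed
  sidebarPanel(
    checkboxGroupInput("prison", "Prison",
                       c("A", "B", "C"),
                       selected = c("A", "B", "C")),
    checkboxGroupInput("traits", "Traits",
                       c("Drug", "Psychosis", "Depression", 
                         "Anxiety", "BPD", "Alcohol", "PTSD", "OCD", "Bipolar", "Paranoid.PD", 
                         "Dependent.PD", "Dissocial.PD", "PD", "Referral", "Any", "NonDrug"),
                       selected = NULL)),
  mainPanel(textOutput("nResponses"), tableOutput("outTable"))
))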
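
The percentage and confidence interval calculation in server.R can be tried outside Shiny. What follows is only a minimal sketch, assuming a made-up data frame of 0/1 indicators (`toyData`, its values and the helper `summarise_binary` are hypothetical, not the real `shinyData`). It passes the count of 1s straight to `binom.test()` from base R, rather than subtracting from 100 as the app does, and reads the interval from the `conf.int` element of the result.

```r
# Toy data: hypothetical 0/1 indicators standing in for columns of shinyData
toyData <- data.frame(
  Drug    = c(1, 0, 1, 1, 0, 1, 0, 1, 1, 1),
  Alcohol = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 1)
)

# Percentage of 1s and the exact 95% binomial interval for one column
summarise_binary <- function(x) {
  successes <- sum(x == 1)
  n <- length(x)
  ci <- binom.test(successes, n)$conf.int * 100
  c(Estimate = successes / n * 100, LowCI = ci[1], HighCI = ci[2])
}

# One row per variable, one column per statistic
toySummary <- t(sapply(toyData, summarise_binary))
print(round(toySummary, 1))
```

With only ten rows the intervals are very wide, which is a useful reminder of why the app reports n alongside the table.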

Change we need to see in Government and public service IT

There’s a lot of material here, including a 22-minute talk, but if you’re even remotely interested in IT in government and public services then please have a look here (you can get all three by clicking through, but I’ve extracted each link for you).

A truly open platform

The unacceptable in government IT

Chris Chant on the future

It’s alive!

Well, it finally launched today, the patient feedback website that we created as part of the Patient Feedback Challenge. The evaluation section was part of a partnership with the Involvement Centre of Nottinghamshire Healthcare NHS Trust and the web design agency Numiko to build a website which can help us to collect feedback data, summarise and report on it and share the actions and learning which have resulted. Go have a look.

Part of the philosophy of the feedback challenge was that the good practice should be “spreadable” and the website is all built using free and open source technologies (Linux servers, all data and graphs processed using R etc.). We plan to roll it out in as many Trusts as possible and will charge a small sum for redesigns and changes to data processing which will allow us to repurpose it for different Trusts as well as supporting further development of the site.

The use of free tools, as well as the relative simplicity of the data processing, means that we can add features relatively easily. Over the next year we plan to overhaul the site in response to user feedback to make sure it meets the needs of Nottinghamshire Healthcare Trust staff, service users, and carers, as well as members of the public. Putting our feedback on the internet so anybody can view it, and making archived data searchable, is a powerful tool for ensuring the quality and accountability of services. It also allows the Trust to demonstrate its high levels of user satisfaction to other public bodies and members of the public.

Two bits of coding work made up the majority of my contribution: the quarterly report function, which summarises feedback at all levels of the organisation in a series of hyperlinked webpages, and the custom search function, which uses the very simple Shiny package to allow users to query different types of data and search by service type, demographic feature, keyword, etc. The code for both can be found at my GitHub page. I hope to comment the code a little better soon, when I have a bit of spare time.

Feedback about the site is very welcome, and contributions are actively being sought for inclusion in a work plan for the site over the next year. First job: rewrite the custom search function to reflect changes in the Shiny library since I started the project!

Data in the NHS

I wrote a little piece for an information pack related to the Patient Feedback Challenge, so I thought I may as well share it here. I’m a little rushed so I won’t hyperlink the references; apologies for this, they are all at the bottom in plain text.

NHS data systems are, quite naturally, built around very strict data management systems which emphasise the security of data and ensuring accountability of data loss or misuse. This is right and proper. However, very frequently, clinicians, managers, auditors and evaluators do not need to access highly sensitive personal information. In certain cases, information can be shared freely over the internet, as with the data collected on patient experience within the Patient Feedback Challenge (survey responses, Patient Opinion stories, and specific extracts from the PALS database).

The skills and technologies to rapidly analyse and disseminate data of this kind, where access rights are not an issue, are presently lacking in many NHS settings. Data analytic solutions, where they exist, are often large and complex, and expensive to set up and modify. Where solutions are brought in from private sector providers, these tend to be inflexible, and it can be difficult for an organisation to achieve the results it wants using off-the-shelf technologies (for example, integrating and resharing different types of data, such as the survey, Patient Opinion, and PALS data involved within the Patient Feedback Challenge).

A further limitation in the data architecture of many NHS organisations is the tendency for data to exist in silos, with each department maintaining its own separate database with no capacity to combine or reshare any of the data. Often the existence and location of datasets is unknown to others working elsewhere in a Trust.

Another problem endemic in many NHS settings is Spreadsheet addiction (e.g. Burns, 2004). This has been given as a possible culprit in JP Morgan Chase’s having lost 2 billion dollars in May 2012 (from

‘James Kwak, associate professor at the University of Connecticut School of Law… noted some interesting facts in JP Morgan Chase’s post-mortem investigation of the losses. Specifically, that the Value at Risk (VaR) model that underpinned the hedging strategy:

“operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another”, and “that it should be automated” but never was.’

The process of analysing and presenting Service User and Carer Experience survey data began in a similar fashion, with spreadsheets and macros being used to produce summary graphics, and reports manually cut and pasted together. Fortunately for us, before we lost 2 billion dollars this way there was a mishap with the process, and it became obvious that it needed to go from “should be automated” to “has been automated” in order to meet the reporting timescales. An incremental approach to automation was adopted, and it’s worth noting that three years ago the lead analyst on the survey had no programming experience whatsoever. Each quarter the survey reporting would demand another layer of complexity, and the analyst would learn from books and helpful online communities how to perform whatever new task was demanded. As with most technological movements, automation did not bring a reduction in workload but rather more sophisticated and better results. Within a few years, reporting on survey data had gone from a task that was difficult for one person to perform quarterly alongside other tasks to a level of complexity that, done by hand, would demand more than one individual working full time for the whole of each quarter just to meet each quarterly timescale.

The patient feedback challenge has allowed us (with our friends at Numiko) to clear the final hurdle and to remove the last vestiges of human effort from the regular reporting, which means that quarterly survey reports and custom extracts will be available 24 hours a day, 365 days a year to anyone in the world with an internet connection. Managers will be able to see the latest results from their area and anywhere else in the Trust at any point in the reporting cycle and service users and carers will be able to transparently query and scrutinise all of the patient feedback data that we collect.

Two principles have guided the work on data architecture and processing. Firstly, all the technologies that were used are simple, free, and open source. Using languages such as R and Python and reporting technologies such as HTML and LaTeX means not only that there are no costs associated with buying or licensing software but also that analysis and reporting technologies are simple to use straight away. In a process known as agile software development (Beck et al., 2001), all of the technologies that have been produced have been put to immediate use. Early on, only the graphics could be automated, and manual effort was required to put together the report and write textual summaries. Later, graphics and text were both automated and only formatting and final edits were performed by a human being. With the completion of the Patient Feedback Challenge, the formatting can now also be automated. At each stage, though, the results of the technology were being used live in order to meet the demands of the organisation. This has the twin benefit of giving an immediate payoff on the workforce costs of developing the software and giving the opportunity for the technology to be refined in use. The results of each stage of development were obvious when each reporting cycle was complete, and work has been undertaken throughout to better serve both the Involvement team who deliver the survey results and the clinicians, managers, service users, and carers who make use of them.

The second principle is that all of the work which has taken place within the survey analysis and reporting and the Patient Feedback Challenge has left a legacy not only of technologies that can be used by the organisation but also systems within the organisation and skills and experience within the workforce which allow this work to prosper and flourish. Everyone involved in the data side of the project has put in place new systems to better capture and share data and learned valuable skills which will live within the organisation allowing further development work as well as new projects to be undertaken. A key feature of the whole Patient Feedback Challenge has been the development of staff and volunteers and the work with data and reporting was no different in this respect.


Beck et al. (2001). Manifesto for Agile Software Development. Agile Alliance. Retrieved 20-8-2012

Burns (2004). Spreadsheet addiction. Retrieved 16-5-2012

Unstable net promoter scores

If you work in the NHS you’ll know that the “net promoter score” (aka the friends and family test: “would you recommend this service to family and friends?”) is coming to NHS services. Indeed, it’s hit the headlines, with the coalition government promoting its use in the face of some pretty stiff opposition (e.g. here).

I have never been all that convinced by it, particularly the way the score is calculated by subtracting the percentage of detractors (those who would not recommend, or would be unlikely to) from the percentage of promoters (those who would recommend). We’ve started using it in our survey and I have had a query because one area’s scores are jumping around a lot, so I thought it was high time I looked at the actual scores. We don’t use a 10 point scale as many do; we use a 5 point scale, with the points being “Extremely unlikely”, “Unlikely”, “Neither”, “Likely”, and “Very likely”. Net promoter is calculated by taking the proportion of “Very likely” responses and subtracting the summed proportion of the bottom 3 categories.
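
To make the arithmetic concrete, here is a minimal sketch of that calculation in R. The responses are invented for illustration and `net_promoter` is a hypothetical helper name, not the function used in our reporting code.

```r
# Hypothetical responses on the 5 point scale described above
scale_levels <- c("Extremely unlikely", "Unlikely", "Neither", "Likely", "Very likely")
responses <- factor(
  c("Very likely", "Very likely", "Likely", "Neither", "Very likely",
    "Unlikely", "Very likely", "Likely", "Very likely", "Extremely unlikely"),
  levels = scale_levels
)

# Net promoter: % in the top category minus % in the bottom three categories
net_promoter <- function(x) {
  props <- prop.table(table(x))
  promoters <- as.numeric(props["Very likely"])
  detractors <- sum(props[c("Extremely unlikely", "Unlikely", "Neither")])
  (promoters - detractors) * 100
}

nps <- net_promoter(responses)
print(round(nps, 1))
```

In this toy sample there are five promoters and three detractors out of ten, so the score is 50 - 30 = 20. The two “Likely” responses count towards neither group, which is part of why small changes at the top of the scale can move the score so much.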

I’ve looked at the methodology in two ways. Firstly, I compared the net promoter method with just finding the average (allocating 1 to 5 points from “Extremely unlikely” to “Very likely”). It’s arguable how appropriate the average is given the negative skew of the responses, but it will have to do as a comparison. It should be noted that I’ve used two y-axes, one blue on the left for the net promoter methodology and one pink on the right for the average methodology. One has to be very careful when using two y-axes, because it’s easy to mislead the reader by moving the scales up and down, but really we’re just watching the lines go up and down together over time, so we should be safe here.
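
As a rough illustration of why the two methods can behave differently, the sketch below scores the same made-up responses both ways for two quarters. The data and the names `scores`, `nps_from_scores` and `by_quarter` are all hypothetical; this is not the survey data or the code behind the graphs.

```r
# Hypothetical scores (1 = "Extremely unlikely" ... 5 = top category), ten per quarter
scores <- data.frame(
  quarter = rep(c("Q1", "Q2"), each = 10),
  score   = c(5, 5, 5, 5, 5, 4, 4, 4, 3, 2,   # Q1
              5, 5, 5, 4, 4, 4, 4, 4, 3, 2)   # Q2: two people move from 5 to 4
)

# Net promoter on numeric scores: % scoring 5 minus % scoring 3 or below
nps_from_scores <- function(s) (mean(s == 5) - mean(s <= 3)) * 100

# Both summaries side by side, one row per quarter
by_quarter <- data.frame(
  quarter = sort(unique(scores$quarter)),
  nps     = as.numeric(tapply(scores$score, scores$quarter, nps_from_scores)),
  average = as.numeric(tapply(scores$score, scores$quarter, mean))
)
print(by_quarter)
```

Here two respondents out of ten dropping from the top category to “Likely” takes the net promoter score from 30 to 10, while the average only moves from 4.2 to 4.0. With small numbers of responses per service, that sensitivity is one plausible reason for promoter scores to jump about more than averages do.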

Here are the results, with each panel representing a different service (anonymised).

Actually, it looks pretty okay to me. You can see the service this query originated with, right at the top left, labelled “V”. The promoter scores jump about quite a bit more, but the inverted “V” shape is also evident in the average.

Secondly, I’ve looked at the question itself by comparing it with our overall “How do you rate the service quality” question, scoring both with the net promoter methodology. Here are the results of that.


Again, the results look okay, but you can again see poor service “V”, whose family and friends test promoter scores (blue) are jumping around. Interestingly, even with the promoter methodology, their Service quality scores are pretty static.

A lot of the objections to the friends and family test (e.g. here) are based on preliminary psychometric investigations rather than “live” data, so this is an interesting addition to the overall debate.

So, if anything, it’s the question that appears to be problematic. I must admit I did think it would be the other way around. I’ll follow up this analysis when we have more data.