Adding line returns in RMarkdown in a loop

Another one that’s for me when I forget. The internet seems strangely reluctant to tell me how to do this, yet here it is buried in the answer to something else.

Sometimes you are writing an RMarkdown document and wish to produce text with line returns between each piece. I can never remember how to do it. It's very simple: just two spaces and then \n, like this: "  \n". Here's some real code with it in.

  team_numbers %>% 
    mutate(print = paste0(TeamN, TeamC, "  \n  \n")) %>% 
    pull(print) %>% 
    cat()
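As a self-contained sketch (the team_numbers data frame here is a made-up stand-in for the real one), remembering that the chunk itself needs the option results='asis' so the output is rendered as markdown rather than printed as console output:

```r
library(dplyr)

# hypothetical stand-in for the real team_numbers data frame
team_numbers <- tibble(TeamN = c("Team A", "Team B"),
                       TeamC = c(" (101)", " (102)"))

# inside an RMarkdown chunk with results='asis':
team_numbers %>%
  mutate(print = paste0(TeamN, TeamC, "  \n  \n")) %>%
  pull(print) %>%
  cat()
```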


Producing RMarkdown reports with Plumber

I wasn’t going to post this until I got it working on the server but I’ve got the wrong train ticket and am stuck in London St Pancras until 7pm so I thought I’d be productive and put it up now.

So I launched a new version of the patient experience dashboard. I forget if I mentioned it on here or not. I think not. It’s here and the code is here (I should add that I’ve launched a bit early because of an event, so it will be a little buggy- forgive me for that).

One of the things we’ve done is get rid of the static reports that we used to host on the website, which were just generated as HTML, uploaded to the CMS and left there. We have preferred instead to switch to a dynamic reporting system which can generate different reports from a fresh set of data each time (I’ve barely started to implement different reports, so it’s very bare bones at the moment, but that’s the idea, at least).

One of my users, however, liked the way that the old reports would have links to all the team reports on. So she would send out one weblink to service managers and it would have all of the different teams on it and they could just click on each one as they wished. So I need to replicate this somehow. My thinking has gone all round the houses so I’ll spare you the details but basically I thought originally that I could do this with Shiny, generating URLs in each report that would contain query strings that could then be fed into the report generator, which means that clicking each link would bring back each team report. I can’t say for definite that it’s impossible, but I couldn’t figure out a way to do it because Shiny downloads are much more built around the idea that your user clicks a button to get the report. I could have had the link generate a Shiny application with a button in that you press to get the team report, but that’s two clicks and seems a bit clunky. Really, Shiny is not the correct way to solve this problem.

So I hit upon the idea of doing it with Plumber. There are very few examples that I could find on the internet of using Plumber to generate parameterised reports with RMarkdown, and none at all generating Word documents (which everyone here prefers), so I’m offering this to help out the next person who wants to do this.

Just getting any old RMarkdown to render to HTML is pretty easy. Have a look at the Plumber documentation, obviously, but basically you’re looking at

#* @serializer contentType list(type="text/html")
#* @get /test
function(res) {
  include_rmd("test_report.Rmd", res)
}

If you run this API on your machine it will be available at localhost:8000/test (or whatever port Plumber tells you it’s running on). You can see where the /test location is defined, just above the function.
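To actually serve the API you plumb the file and run it. A minimal sketch, assuming the endpoint above is saved in a file called api.R (a placeholder name):

```r
library(plumber)

# "api.R" is a placeholder for whatever file holds the endpoint above
pr <- plumb("api.R")   # parses the #* annotations into a router
pr$run(port = 8000)    # then browse to http://localhost:8000/test
```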

Easy so far. I had two problems. The first one was including parameters. For all I know this is possible with the include_rmd function but I couldn’t work it out so I found I had to use rmarkdown::render in the function. Like this:

#* @serializer contentType list(type="text/html")
#* @get /html
function(team) {
  tmp <- tempfile()
  render("team_quarterly.Rmd", output_file = tmp, output_format = "html_document",
         params = list(team = team))
  readBin(tmp, "raw", file.info(tmp)$size)
}

This API will be available on localhost:8000/html?team=301 (or whatever you want to set team to).

The RMarkdown just looks like this:

---
title: Quarterly report
output: html_document
params:
  team: NA
---

`r params$team`

You can see you define the params in the YAML and then they’re available with params$team, which will be set to whatever your ?team=XXX search string is.

Okay, getting there now. The last headache I had was making a Word document. This is only difficult because I didn’t know to put the correct application type in the serializer bit on the first line that defines the API. You just need this:

#* @serializer contentType list(type="application/vnd.openxmlformats-officedocument.wordprocessingml.document")
#* @get /word
function(team) {
  tmp <- tempfile()
  render("team_quarterly.Rmd", output_file = tmp, output_format = "word_document",
         params = list(team = team))
  readBin(tmp, "raw", file.info(tmp)$size)
}

This will be available at localhost:8000/word?team=301.

That’s it! Easy when you know how. All the code is on GitHub here.

I’m pretty excited about this. I did it to solve a very specific problem that I had for one of my dashboard users but it’s pretty obvious that having an API that can return reports parameterised by query string is going to be a very powerful and flexible tool in a lot of my work.

I’ll get it up on the server over the weekend. If that causes more headaches I guess I’ll be back with another blog post about it next week πŸ™‚

Suppress console output with ggplot, purrr, and RMarkdown

So I posted a while back about producing several plots at once with RMarkdown and purrr and how to suppress the console output in the document.

Well, I just spotted someone on Twitter having a similar problem and it turns out that the solution actually doesn’t work in ggplot! Interesting…

For ggplot, you need the excellent function walk(), which is like map() except it’s called for its side effects (like disk access) rather than for its output per se.

Bish bash bosh. Easy.

```{r, message = FALSE, echo = FALSE}

walk(c("Plot 1", "Plot 2", "Plot 3"), function(x) {
  p <- iris %>%
    ggplot(aes(x = Sepal.Length)) + geom_histogram() +
    ggtitle(x)
  
  print(p)
})
```

A world of #plotthedots and… what else?


Reproduced above is a recent exchange on Twitter. I’d better open by saying this blog post is not impugning the work of Samantha Riley or any other plot the dots people. On the contrary, the whole #plotthedots movement is an important part of a cultural change that is happening within the NHS at the moment. But I would like to take up the rhetorical device Samantha uses, to explore the issues that we have understanding data in the NHS.

Let’s take the tweet at face value. A world where every NHS board converted from RAG to SPC, along with CCGs and regulators. It’s worth noting that this would, in fact, be a substantial improvement on current practice. RAG rating presumably has its place as a management tool, but as a data analytic instrument it is very lacking. RAG ratings treat 30 week waiting lists the same as 18 month ones, and 16 hour waits in A and E the same as 6 hour waits. They give no help with interpretation with regard to trend or variance. In this new world boards will be able to distinguish chance variation from real changes. They will have due regard for both the trend and natural variation in a variable, and be able to adjust their certainty accordingly. This is all to the good.

But let’s think about all the things that we’ve left out here (I don’t doubt the #plotthedots people are quite aware of this, I’m just using the tweet as a jumping off point).

We’ve left out psychometrics. Are the measures reliable and valid? We don’t know.

We’ve left out regression. How much variance does one indicator predict in another? We don’t know.

We’ve left out multiple comparison. We reviewed 15 indicators. One shows a significant change. What is the probability that this is due to chance variation? We don’t know.

We’ve left out experimental design. We’ve reviewed changes in measures collected on four different wards- two of which have implemented a new policy, and two of which have not. Is the experiment well controlled? Are difference due to the intervention? We don’t know.

We’ve left out sampling theory. We have patient experience questionnaires in the main entrance of our hospital and they are distributed on wards on an ad hoc basis. Are the results subject to sampling bias? If yes, how much? We don’t know.

We’re interested in producing graphics to help managers understand patient flow throughout our organisation. Excel can’t do it. What can we do? Nothing! In the bin!
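To put a number on the multiple comparison point above: assuming 15 independent indicators, each tested at the conventional 5% significance level, the chance of at least one spuriously "significant" result is better than a coin flip:

```r
# probability of at least one false positive across 15 independent
# tests, each carried out at the 5% significance level
1 - (1 - 0.05)^15
# roughly 0.54
```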

I’m obviously exaggerating a little here, for effect, but the sad fact is that the NHS is in a very sorry state as far as its capacity for analytics goes. Many individuals who have the job title “analyst” in fact would more properly be called “data engineers”, in the sense that they can tell you very exactly what happened but don’t have a clue why. There’s nothing wrong with that, data engineering is a highly valuable and difficult skill, but it’s not analysis, and in fact career structure and professional recognition for true “analysts” (or, let’s dream really big here, “data scientists”) is sorely lacking everywhere.

I passionately want proper understanding of and engagement with data and statistical concepts right from board all the way across the organisation, and in fact I am busy at the moment in my own Trust offering in depth tutorials and one off lectures on concepts in data and statistics. I strongly support APHA’s aim to introduce rigorous professional accreditation of analysts.

Citing R packages in RMarkdown

I haven’t really done much in the way of citing papers in the last couple of years, I’ve spent my time either messing around with Shiny servers or databases or writing RMarkdown reports- and being horribly ill, of course, haha!- see this blog, passim. Don’t worry, I’m as fit as a flea now πŸ™‚

Anyway, I’ve been citing one or two papers today and I’ve found it very easy to cite both papers and R packages in RMarkdown, so I’m here to tell you not to ignore it because it’s too hard, just do it. Step one: add the bottom line of the following to your YAML header:

---
title: "Validation of complexity scores in AMH"
author: "Chris Beeley"
date: "21 January 2019"
output: html_document
bibliography: bibliography.bib
---

Now create a file called bibliography.bib in the same directory as the RMarkdown. You can either download the .bib off journal websites, or use a program dedicated to it (I didn’t bother, I’m just writing a few sides, not a thesis), or you can even write your own pretty easily by just looking at the template for a similar type of reference and hand editing the fields. Here’s my hand edited bit for a paper that I referenced which didn’t have any snazzy download .bib on the website:

@article{Butler2005,
  author = "Butler, J.",
  title = "Monitoring Community Mental Health Team Caseloads: a systematic audit of practitioner caseloads using a criterion based audit tool",
  journal = "Advancing Practice in Bedfordshire",
  year = "2005",
}

You cite this in your RMarkdown using whatever identifier is on the top line, in this case I put the Butler2005 there, and in the RMarkdown you just write @Butler2005.

You can cite R by just typing citation() into the console and copying the .bib into your .bib file, and you can cite, say, the psych package by typing citation("psych").
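If you’d rather not copy and paste by hand, knitr has a helper that writes package citations straight into a .bib file (a sketch, assuming the psych package is installed; the keys it generates default to R-base, R-psych and so on):

```r
library(knitr)

# writes @Manual entries for R itself and the psych package
# into bibliography.bib
write_bib(c("base", "psych"), file = "bibliography.bib")
```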

Packages will look funny when you cite them because they’re usually referred to by name, not by author, so rewrite your reference like this:

with assistance from the psych [-@psych2018] package.

The minus excludes it from the actual text while still including it in the references at the end.

Just press “knit” and it will put the references in automatically, so you probably want to write

## References

above the end line or something like that.

That’s it! I just learned it from scratch, did it, and wrote this blog post in 10 minutes so you really have no excuse now, do you πŸ˜‰ ?

Dplyr function to replace nested ifelse (like SQL CASE WHEN)

I hate nested ifelse statements in R code. I absolutely hate them. They are just ugly and difficult to read. I found this lovely function in dplyr called case_when (which is like the SQL CASE WHEN if you know that) that banished them forever. It’s easier if I just show you (this is from the documentation)

x <- 1:50
case_when(
  x %% 35 == 0 ~ "fizz buzz",
  x %% 5 == 0 ~ "fizz",
  x %% 7 == 0 ~ "buzz",
  TRUE ~ as.character(x)
)

How wonderful is that? No more nested ifelse!
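For a direct before-and-after comparison, here’s a small made-up example. Conditions are checked in order, and TRUE acts as the catch-all “else”:

```r
library(dplyr)

x <- c(10, 25, 40)

# the nested ifelse way:
ifelse(x < 20, "low", ifelse(x < 35, "mid", "high"))

# the case_when way:
case_when(
  x < 20 ~ "low",
  x < 35 ~ "mid",
  TRUE   ~ "high"
)
# both return: "low" "mid" "high"
```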

Data science accelerator lesson one- build a pipeline and ship the code!

My exciting news is that I was accepted onto the data science accelerator and have been doing it since late December. My project, basically, is all about using natural language processing to better understand the patient experience data that we collect (and, if I have time, the staff experience data too). Here are the goals:

1) Using an unsupervised technique, generate a novel way of categorising the text data to give us a different perspective. We already have tagged data but I would like to interrogate the usefulness of the tags that we have used
2) a. Generate a system that, given a comment or set of comments, can find other comments within the data that are semantically similar. Note that this system will need to run live on the server, since it would be impossible to store the semantic similarity of every comment to every other comment
2) b. Generate a system that, instead of searching by word, searches by similarity to that word
3) Produce a supervised learning algorithm which can be trained on a sample of tagged comments and then produce tags for comments that it has not previously seen
4) a. Produce a sentiment analysis function that can tag every comment in a database with how positive or negative it is
4) b. Produce reporting functions that can compute overall sentiment for a group of documents (e.g. of a particular service area) and optionally describe the change in sentiment over time

I’m not really sure if I’m going to get through all of it but I’ve made a decent start. I’ve made a Trello board and there’s a GitHub too.

One of the things about the project I haven’t mentioned above is that I want to make something that can easily be picked up and used by other Trusts. There are loads of companies who want to charge money for NHS Trusts to use their black box but I’m trying to make something others can use and build on. So a lot of the work at the end will be on that.

Anyway, I’ll share the work at the end but I’ve learned loads already so I thought I’d talk about that. It’s the best thing I’ve done since my PhD in terms of learning (so I recommend you do it!) so there are lots of things I want to talk about.

The first one isn’t super complicated or a revelation to anyone but it’s affected the way I work already. It’s this. Ship code! Ship it! Get it out the door!

Up to now to be honest my agile working has pretty much been that I spend six months on something, I release it, it’s terrible, and then I make improvements. One of the product managers at GDS told me that they ship code every week. Every week! I couldn’t believe it. So I’m trying to work like that. Doesn’t matter if it isn’t perfect, doesn’t matter if some of the functionality is reduced, just get something in the hands of your users. Then if they hate it you can avoid spending a month building something they hate.

And, something related to that, start with a pipeline. Right at the start of an analysis, start building the outputs. This helps you to know what all this analysis is actually going to do. It helps you to make the analysis better. And it gives you code that you can ship. Build something that works and does something and give it to your users. They will start giving you feedback before you’ve even finished the analysis. Too often we start with the analysis and only think about the endpoint when we’ve finished. Maybe it’s the wrong analysis. Maybe what you’re doing is clever but no-one cares. Build a pipeline and get it out the door. Let your users tell you what they want.

More on this as the project proceeds.

New Year’s post

So this is my annual New Year’s post, it’s an idea from David Allen that I’ve done before.

2018 was the year that I was finally (pretty much) better. I had my bowel and spleen removed in August 2017 and bled in quite a scary way, but by the time 2018 rolled around I was running 9 miles, building mileage ready for a marathon in May. It’s been a great year. I did have some pretty scary health problems (that I won’t go into) but it all worked out in the finish and I’m pretty much back to working and being happy with my family just as I was all the way back in 2014 before everything started to go wrong.

So the first big news of 2018 was I got a new job (March). I’m now half time in my old job, and half time in the new one. They both link together, in that I do Shiny code and server admin to facilitate several dashboards that we use for staff and patient experience and some clinical stuff too. I’m absolutely loving both jobs. We’re doing a lot of stuff that is pretty new in the NHS, we’re the first Trust that I’m aware of with a Shiny Pro licence, and I’ve talked to people all over the country about what we’re doing. My Trust is very supportive and it’s all going really well.

I suppose my next big thing was running a marathon in May. It wasn’t as fast as I would have liked (four hours and thirty three minutes), but I did have quite a few pretty serious problems with being ill so I did pretty well considering. I’ve got another one in 2019, more on which later. Next up was the British Transplant Games (July). It was my first time competing and I won bronze at 5K and 1500m, which was very nice. My big goal now is to qualify for the world games, I’m guessing silver would be enough to do that, partly depends on the time I guess, too.

For the first time in my life I have a savings account, which is absolutely great, and I’m trying (and failing) to save three times my salary by February 2019. My wife and kids are a lot happier now I’m better, we all really went through hell so it’s been a great year just doing normal stuff you can’t do when you’re ill, like go abroad.

And the most recent thing that’s exciting is starting the data science accelerator. I keep meaning to blog more about it, I’m overbusy just doing it, but there’s a GitHub with some of the first steps on here. Text analysis is really easy to do, and really hard to do well, so I’m really glad that I’ve got a mentor to guide me through it. I’m working really hard trying to build some robust analysis and hopefully deploy some sort of reusable pipeline that other NHS Trusts can use. I’ve been dabbling in Python, too, which I really want to do more of. I feel like having other languages could help me build more and better stuff more easily. I’ve really bought into the whole agile development thing, as well, that’s a part of the accelerator, and one of the talks by a product manager was fascinating.

So that’s it for 2018. I’m starting 2019 strong. Very strong. My health was a teensy bit wobbly this time last year but I’m as strong as an ox at the moment. It really feels good after being frightened and weak for such a long time. At work I want to tie up all the threads and be a part of an agile data science team, producing robust statistical analyses using interactive web based technologies (like Shiny!).

Oh yes! That reminds me. Of course last year I also wrote a book and I started teaching statistics to Trust staff- advanced stuff in 12 one hour tutorials and a quick review in a 2 and a half hour lecture. The teaching has been going really well, I’ve been getting a lot of good feedback and I’m really hoping it helps my Trust do better with the data it collects.

I live a blessed life. All I want to do in 2019 is keep doing the same stuff I already do and love. Run a faster marathon (sub 4 hour). Get a silver medal. Be part of a bigger, better data science team. Keep doing all the stuff we’re doing with Shiny and text analysis and help other Trusts to do it, too. And more than anything else I want my family to just live a normal life and forget about all that stuff between 2015-2017.

Oh yeah. And I want to solve the Rubik’s cube, too. My eldest got one for Christmas and it’s got me hooked.

That’s me for 2019. Good luck with whatever you’ve got on your plate for the year πŸ™‚

A tidy text talk and some stuff from my Shiny book

This is just a quick post to show some of the stuff that I’ve published/ presented/ put on GitHub recently.

So my Shiny book, Web Application Development with R using Shiny, 3rd edition, is out, and I’ve forked the code from the publisher’s repository onto my own GitHub to make it easier to find.

And I’ve got another book about Shiny out which is all about UI development with Shiny, and I’ve forked the code from that onto my GitHub, too.

I’ve also put the talk I did about text processing at the Nottingham useR onto my GitHub as well.

Find the Shiny book code here.
The UI development book is here.
And the tidy text talk here.

Drawing stacked barcharts the tidyverse way

Don’t judge me, but I do spend quite a lot of time making stacked barcharts of survey data. I’m trying to go full tidyverse (see this blog, passim) and there is a very neat way of doing it in the tidyverse which is very easy to expand out into functions, or purrr, or whatever. Of course, me being me I can never remember what it is so I end up reading the same SO answers over and over again.

So here, for all time, and mainly for my benefit, is the code to take a dataframe with the same Likert items over and over (in this case, Always/ Usually/ Sometimes/ Never) and find the proportions, change to percentages, make the x axis labels at an angle so they fit, put the stacking in the right order, and remove missing values. It’s pretty much completely generic, just give it a dataframe and the factor levels and it will work with any dataset of lots of repeated Likert items.

theData %>% 
  gather() %>% 
  group_by(key) %>% 
  count(value) %>%
  filter(!is.na(value)) %>% 
  mutate(prop = prop.table(n) * 100) %>%
  mutate(value = factor(value, levels = c("Always", "Usually", "Sometimes", "Never"), 
                        labels = c("Always", "Usually", "Sometimes", "Never"))) %>% 
  ggplot(aes(x = key, y = prop, fill = value, order = -as.numeric(prop))) + 
  geom_bar(stat = "identity") + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))

Et voila!