Authenticating using LDAP from Active Directory using Shiny Server Pro

Right, I promised I would write this a very long time ago and I still haven’t done it. I’ve just got back from the NHS-R community conference (see an upcoming post) and I’m in a sharing mood, and besides someone just asked me about this on Twitter.

So: setting up LDAP-based authentication on Shiny Server Pro via Windows Active Directory. Do note that this post is about using Active Directory; if you’re not doing that, a lot of this stuff will not apply.

I made the most horrible mess of this when I tried, because quite frankly I didn’t have the slightest idea what I was doing. I’m sure your setup will be different from mine, but I can offer some general advice. Do read the documentation as well, obviously; this is only going to fill in the blanks, it’s not going to replace the docs.

The first piece of advice I can give you is: don’t try to get it working all at once in the configuration file. Unless you’re some sort of Linux security genius, it’s not going to work. Instead, use the ldapsearch command to make sure that you’ve got the right server and your search string is correct. You’re going to need the help of IT, who can tell you a bit about what the search string might look like. To give you a clue, here’s what I finally got working:

ldapsearch -H ldap:// -x -D "CN=user.name,OU=Group,OU=Group2,OU=Group3,DC=xxxx,DC=xxx,DC=nhs,DC=uk" -b "OU=Users All,DC=xxx,DC=xxxx,DC=nhs,DC=uk" -w password "(&(cn=test.user))"

You can see the location of the LDAP server at the beginning; I had to take the details out because I was told that publicising the server name poses a security risk. That first bit, CN=user.name, is the LDAP user. You can just use yourself, or IT can set you up a special LDAP login (you will probably want to switch to this once you’ve got everything working). The next part of the string is where that user sits in the tree, and the DC parts are the domain and subdomain split up into bits. Again, I’ve redacted them. Imagine it might say something like DC=mytree,DC=myorg if the LDAP server was on mytree.myorg.

Then the string after, the bit that says “OU=Users All…”, shows where the person you’re searching for sits in the tree. You can search for yourself or for someone else. There may be more than one OU group, depending on what IT tell you, and the DC bit is the same as it was in the previous part. Then you need -w password (this is yours or your LDAP user’s password); if you don’t want to put your password into the terminal, use -W instead, which will prompt for it. The last bit, as you can probably guess, is the username of the person you’re searching for. Once it works it will bring back a big list of all the groups and other things that the user is registered in.

So now you’ve got that working, you can edit /etc/shiny-server/shiny-server.conf to add the authentication stuff. You probably already did something with it when setting up application directories and so on.
The thing that I got working was as follows (again with bits redacted, I’m afraid):

auth_active_dir ldap://,DC=xxxx,DC=nhs,DC=uk {

  base_bind "CN=user.name,OU=Location,OU=GroupXXX,OU=GroupYYY,{root}" "LDAP_user_password";
  user_bind_template "{username}@xxx";
  user_filter "sAMAccountName={username}";
  user_search_base "OU=GroupXXX";
  group_search_base "OU=GroupXXX,OU=GroupYYY";
}

Do have a look at the examples in the documentation as well; they will help you to understand what yours might look like. The base_bind line has the actual user name in the first bit, e.g. CN=chris.beeley or CN=ldap.user, and the actual password at the end. This is not very secure. Your connection is also being made over LDAP, not LDAPS. Also not very secure.

Let’s fix both of those problems now. Add an “s” to the scheme at the beginning (ldaps://…). Then you’ll need to point to a certificate on the trusted_ca line. IT should be able to give you one. Note that it’s a ROOT certificate you need; IT gave me loads that were not root certificates, which did not work. I ended up downloading it myself from the intranet through Chrome, which messed up the domain name, and I never quite resolved that, to be honest. It works, and that ended up being enough in the end. Note also that the certificate must be in PEM format.
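If what IT hand you is in binary DER format (often with a .cer extension) rather than PEM, openssl can convert it. The file names below are just placeholders for whatever you were given:

```shell
# Convert a DER (binary) certificate to the PEM format Shiny Server Pro needs.
# File names are placeholders; substitute whatever IT gives you.
openssl x509 -inform der -in root_certificate.cer -out certificate3.pem

# Sanity check: a PEM certificate is plain text and starts with a BEGIN line
head -n 1 certificate3.pem
```

A PEM file should begin with -----BEGIN CERTIFICATE-----; if you see binary junk instead, it isn’t PEM yet.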

auth_active_dir ldaps://,DC=xxxx,DC=nhs,DC=uk {

  trusted_ca /…pathtodir/certificate3.pem;
  base_bind "CN=user.name,OU=Location,OU=GroupXXX,OU=GroupYYY,{root}" "/pathtopassword/password.txt";
  user_bind_template "{username}@xxx";
  user_filter "sAMAccountName={username}";
  user_search_base "OU=GroupXXX";
  group_search_base "OU=GroupXXX,OU=GroupYYY";
}

You can also see that we’ve written the file path to a text file containing the password on the base_bind line. It’s a good idea to modify the permissions of this file so only root can read it, to keep it safe.
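For example (the path here is the placeholder from the config above, and you’ll need root):

```shell
# Make the password file readable by root only.
# /pathtopassword/password.txt is the placeholder path from the config above.
sudo chown root:root /pathtopassword/password.txt
sudo chmod 600 /pathtopassword/password.txt

# Check the result: this should print 600
sudo stat -c "%a" /pathtopassword/password.txt
```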
That’s it! You’ve done it! Well done! That took me weeks.

Wait! One more thing. You now have your staff putting their actual usernames and passwords into a web form over HTTP. This is VERY insecure. We need to secure the connection with SSL. This is pretty easy: you just need a key and a certificate for the server from your IT department. Tell Shiny to listen on 443 (instead of, say, 3838) as below and put in the paths to the key and certificate files. So the top of your config will look like this.

server {

  # Instruct this server to listen on port 443, the default port for HTTPS traffic
  listen 443;

  ssl /etc/shiny-server/KEY.key  /etc/shiny-server/certificate.cer;

… etc.

Your users can now connect at https://serveraddress

That’s really it now. Well done. Have a cup of tea to relax and celebrate a job well done. I apologise that this post is a little light on details, it will be very different I think depending on where you are, but if I’d read this post before I’d started it would have helped me, and I hope it helps you too.

Add label to shinydashboard

I feel like I’m sticking my neck out a bit here, and there may be a simpler way of doing this that I haven’t found, but I’ve looked pretty hard, and “add label to sidebar shiny dashboard” has basically no Google juice at all, and I should know because I’ve been staring at it for half an hour.

Sometimes you want to add a simple, static label to a shinydashboard sidebar. If you just add it with p() it isn’t aligned nicely with the rest of the sidebar controls.

You can add something dynamic with sidebarMenuOutput, and you could do that as a long way round, but I got to thinking that there must be a simple way of doing it. I ended up looking at my shinydashboard in developer view in Chrome, and just stealing the markup from there. Once you’ve done that it’s very simple.

div(class = "shiny-input-container", 
    p(paste0("Data updated: ", date_update)))

This code just shows a pre-calculated value of the date when the data was updated. I stole this idea from a colleague because sometimes the cron job that updates the data chokes and nobody notices for a while.

The analysts’ manifesto

I was at an event a little while ago and there was talk of change coming for healthcare analysts. With the advent of population health management we were going to finally get the recognition we as a profession deserve and would get the training and tools necessary to deliver the improvements we all know that data can give.

I put my hand up and said, in essence, we’ve heard it all before and I’ll believe it when I see it. I’m tired, frankly, of reading policy documents that lay out all these amazing new ways of working that just never happen because we’re not given the time and space by senior management and have to spend our time running around sending all this pointless data to all these different bodies that a lot of the time seem to just throw the data on the pile and then forget about it.

To the speaker’s great credit, they accepted my challenge and I spoke to them later on. We agreed that we would work on a manifesto for healthcare analysts, with me gathering submissions from the analytical community and coordinating it into a document.

This is the first draft. I invite anyone and everyone to criticise, challenge, build on, edit, rewrite, whatever, what is here. This is nowhere near the final version but I thought I would get it out there now and see if anybody has any reactions. I’ve been working with several other analysts so it’s not all me but the writing is and you can blame me for the strident tone.

The analysts’ manifesto

There’ll be more of this in the months to come, we’re going to take it out on the road, so to speak, and in particular I want leaders to see it. Too often this stuff is handed round healthcare analysts at the coalface who already know this stuff. I want our leaders to know what we need and what we can do, and this is the first step.

Python, working together, and productionising code at Nottinghamshire Healthcare NHS Trust

Someone just asked me a question on Twitter and I was most of the way through what would have been several tweets before I thought perhaps it would work better as a blog post. I’m going to answer the question first, and then talk some more about the context and what I’m hoping to achieve where I am and (with the advent of more collaborative working through the ICS) in the wider system. So I’ve been using scikit-learn to train a classifier to predict non attendance at healthcare appointments and the question to me was “What made you choose Python rather than R for this one? Which classifier are you using?”.

I’m using a couple of classifiers, to be honest I’m really just messing around but I always like to tweet and blog at the beginning because you end up having useful conversations that can save you time later on. I’m sort of aping this excellent paper and have got some pretty good preliminary results using gradient boosting, but as I say I am just kind of messing around at the moment.
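To give a flavour of what that messing around looks like, here’s a minimal sketch of a scikit-learn gradient boosting experiment. The data, features, and parameters are all invented stand-ins, not the real appointment dataset:

```python
# Minimal sketch of a gradient boosting classifier experiment with
# scikit-learn, using synthetic data in place of real appointment records.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for appointment features (age, lead time, previous DNAs, etc.)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.85, 0.15],  # DNA is the rarer class
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

# AUC copes better than raw accuracy with an imbalanced outcome
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.2f}")
```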

The reason I’m using Python instead of R is twofold. The first reason is that I have been using Python more and more for text mining (for a very brief outline that I will come back to, see this blog post). Python is a little bit ahead in terms of the text mining algorithms, because that’s what people are using, and for a while I would do all the data cleaning and outputs (think Shiny) in R, just nipping into Python for four lines to run the model. The problem with doing that is that my Python is so bad that I often can’t even make the model run. Or even if it runs I can’t necessarily get the outputs or plots that I need. So I’ve realised I just need to rip off the plaster and learn Python. Properly, end to end. So when this predicting DNA with machine learning thing came along it seemed the perfect project in which to build a Python pipeline. I could do it in R in a fifth of the time, I’m sure, but I wouldn’t be learning to use Python to help with the text mining stuff, so I’m being disciplined and torturing myself learning this new language (it’s hard work but it’s not torture, obviously, I’m having a great time).

There is a bigger reason, however, which is what motivated this blog post. People in the system are using Python. The people who wrote the paper cited above used Python, and they’ve indicated their willingness to look at sharing the code once they’ve tidied it up a bit. Some other people doing something else I haven’t talked about yet (but will) are also using Python. I had a bit of A Moment the other day when I realised that it’s no good just building reusable, sharable technology. That’s only the first step. In a way, that’s the easy bit. Easier bit. We need people out in the system who can actually use it. Can you just go to any provider Trust in the country and say “Hey, I fitted a machine learning model to predict DNA with scikit-learn, here’s the code / a Docker image” and they’ll just say “Thanks very much, we’ll productionise it in the data warehouse next month”? Nope! And they definitely won’t say “That’s very interesting, we’ve compared it with some of our internal testing and we’ve improved the model; we’ve already submitted a pull request”.

We have a saying where I come from, the Involvement team, I’m sure they got it from somewhere but I heard it from them first. “If you want to go fast, go alone. If you want to go far, go together”. Learning Python for me is just a tiny step towards me being part of the solution to the problem of “We have all these amazing machine learning models ready to rock out in the wilds of the NHS, but who on Earth is going to get them set up and improving lives/ saving money?”

As with quite a lot of my blog posts at the moment, I am aware that I’m talking about tackling huge, systemic problems, and I’m in no doubt that I’m the absolute tiniest cog in the whole system, so it does feel a bit weird talking about all these huge, system level changes. But be the change you want to see, that’s what I always say, so I’m doing my best.

A note for Kindle readers of my Shiny book

Someone has been in touch with me to say that the Kindle version has no chapter numbers and they were finding it difficult to figure out which bit of the book went with which bit in the repository.

I’ve made a little video just to run through all the chapters and which code file goes with which bit of the book. While I was doing it I noticed the minimal example from chapter 2 is missing. It’s only small so I don’t think anybody will be too bothered about typing it but I’ve put it on a Gist anyway.

Hopefully that should resolve any problems with the book and the code, do let me know if not, on here or just DM me @chrisbeeley on Twitter.

The video is here:

The future of patient experience data at Nottinghamshire Healthcare

I’ve talked about my plans for patient experience data in quite a few different forums recently, but it just occurred to me that it isn’t written down anywhere.

Anyone who’s spent any time with me in meetings will know I’m a stickler for writing things down, so here it is.

I launched a Shiny application to explore our patient experience data about six years ago. It was pretty early days for Shiny, and it seemed pretty whizzy and exciting at the time. Over time I added more features, but it seemed less and less whizzy and exciting, and to be honest I was too busy nearly dying twice to worry about it (see this blog, passim; don’t worry, I’m absolutely fine now).

Anyway, I’ve now launched a new version built with shinydashboard, which can be found here. It launched a bit early at an event, so excuse any rough edges or bugs.

So what’s the big idea, then? Well, it’s twofold. Firstly, I would like to integrate the staff experience data. My Trust has recently started asking staff about the experience they have working at the Trust, and I have made sure that these datasets are tightly integrated. Ultimately I would like to build a dashboard that shows how staff and patients are doing in the same application. This will be of interest, for example, to ward managers, who want to know how their staff and patients are. Further, it will facilitate analysis of the connection between staff and patient experience. The key design feature is summarising the data and allowing rapid drill down. So a service manager would start on a page with staff and patient experience data, notice something interesting, click it, and then be presented with more detail about that thing. Further clicks would lead to further drill down.

So far, so ordinary. This is what all those private sector dashboard salespeople with the expensive watches they bought with taxpayers’ money all say. What I want to add as secret sauce (which will of course be open source, haha) is what I’ve taken to calling a recommendation engine for patient feedback. So when you buy something on Amazon it will say “people who bought that bought this, too.” And then you buy it and love it (I’ve bought loads of stuff using the Amazon recommendation engine. And listened to great music using Spotify’s).

The big problem we have with experience data is discoverability. We have maybe 200,000 comments. How do you find the one you want? I’m trying to use machine learning to have the computer find other things you might be interested in. Reading some feedback about waiting lists? Here’s some more. Reading angry feedback? Here’s some more angry feedback. And feeding this engine staff and patient feedback data could make for a very powerful system. Lots of people are angry at their doctor? Well, look, all the doctors are fed up too. Join it with other systems. Food’s terrible? Well, look, 10% of the estates staff are off sick at the moment. 10% of the estates staff are off sick? Well, look, three months ago they were all miserable. And so on.
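To make the idea concrete, here’s a very minimal sketch of the “more like this” mechanism, using TF-IDF and cosine nearest neighbours on a handful of invented comments. The real thing would have to cope with 200,000 comments and would want something smarter than a bag of words:

```python
# Toy "people who read that read this" engine for feedback comments,
# using TF-IDF vectors and cosine nearest neighbours from scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

comments = [
    "The waiting list for an appointment is far too long",
    "I have been stuck on the waiting list for months",
    "Staff on the ward were kind and helpful",
    "The food was cold by the time it reached the ward",
]

X = TfidfVectorizer().fit_transform(comments)
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)

# For the first comment, find the most similar other comment
_, idx = nn.kneighbors(X[0])
print(comments[idx[0][1]])  # the other waiting-list comment
```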

It’s a very ambitious dream, but that’s my vision, and I’ll try as far as possible to build technologies that other Trusts can use. That is in itself quite tricky, so progress with that one in particular might be slow, but I’m trying to get more people on board. See how it goes.

Adding line returns in RMarkdown in a loop

Another one that’s for me when I forget. The internet seems strangely reluctant to tell me how to do this, yet here it is buried in the answer to something else.

Sometimes you are writing an RMarkdown document and wish to produce text with line returns between each piece. I can never work out how to do it. It’s very simple: just two spaces and then \n, like this: "  \n". Note that the chunk needs results = "asis" for the output to render as markdown. Here’s some real code with it in.

  team_numbers %>% 
    mutate(print = paste0(TeamN, TeamC, "  \n  \n")) %>% 
    pull(print) %>% 
    cat()


Producing RMarkdown reports with Plumber

I wasn’t going to post this until I got it working on the server but I’ve got the wrong train ticket and am stuck in London St Pancras until 7pm so I thought I’d be productive and put it up now.

So I launched a new version of the patient experience dashboard. I forget if I mentioned it on here or not; I think not. It’s here and the code is here (I should add that I’ve launched a bit early because of an event, so it will be a little buggy; forgive me for that).

One of the things we’ve done is get rid of the static reports that we used to host on the website, which were just generated as HTML, uploaded to the CMS, and left there. We have preferred instead to switch to a dynamic reporting system which can generate different reports from a fresh set of data each time (I’ve barely started to implement different reports, so it’s very bare bones at the moment, but that’s the idea, at least).

One of my users, however, liked the way that the old reports would have links to all the team reports on. So she would send out one weblink to service managers and it would have all of the different teams on it and they could just click on each one as they wished. So I need to replicate this somehow. My thinking has gone all round the houses so I’ll spare you the details but basically I thought originally that I could do this with Shiny, generating URLs in each report that would contain query strings that could then be fed into the report generator, which means that clicking each link would bring back each team report. I can’t say for definite that it’s impossible, but I couldn’t figure out a way to do it because Shiny downloads are much more built around the idea that your user clicks a button to get the report. I could have had the link generate a Shiny application with a button in that you press to get the team report, but that’s two clicks and seems a bit clunky. Really, Shiny is not the correct way to solve this problem.

So I hit upon the idea of doing it with Plumber. There are very few examples that I could find on the internet of using Plumber to generate parameterised reports with RMarkdown, and none at all generating Word documents (which everyone here prefers), so I’m offering this to help out the next person who wants to do this.

Just getting any old RMarkdown to render to HTML is pretty easy. Have a look at the Plumber documentation, obviously, but basically you’re looking at

#* @serializer contentType list(type="application/html")
#* @get /test
function(res){
  include_rmd("test_report.Rmd", res)
}

If you run this API on your machine it will be available at localhost:8000/test (or whatever port Plumber tells you it’s running on). You can see where the /test location is defined, just above the function.

Easy so far. I had two problems. The first one was including parameters. For all I know this is possible with the include_rmd function but I couldn’t work it out so I found I had to use rmarkdown::render in the function. Like this:

#* @serializer contentType list(type="application/html")
#* @get /html
function(team){
  tmp <- tempfile()
  render("team_quarterly.Rmd", tmp, output_format = "html_document",
         params = list(team = team))
  readBin(tmp, "raw", file.info(tmp)$size)
}

This API will be available on localhost:8000/html?team=301 (or whatever you want to set team to).

The RMarkdown just looks like this:

---
title: Quarterly report
output: html_document
params:
  team: NA
---
`r params$team`

You can see you define the params in the YAML and then they’re available with params$team, which will be set to whatever your ?team=XXX search string is.

Okay, getting there now. The last headache I had was making a Word document. This is only difficult because I didn’t know to put the correct application type in the serializer bit on the first line that defines the API. You just need this:

#* @serializer contentType list(type="application/vnd.openxmlformats-officedocument.wordprocessingml.document")
#* @get /word
function(team){
  tmp <- tempfile()
  render("team_quarterly.Rmd", tmp, output_format = "word_document",
         params = list(team = team))
  readBin(tmp, "raw", file.info(tmp)$size)
}

This will be available at localhost:8000/word?team=301.

That’s it! Easy when you know how. All the code is on Git here

I’m pretty excited about this. I did it to solve a very specific problem that I had for one of my dashboard users but it’s pretty obvious that having an API that can return reports parameterised by query string is going to be a very powerful and flexible tool in a lot of my work.

I’ll get it up on the server over the weekend. If that causes more headaches I guess I’ll be back with another blog post about it next week 🙂

Suppress console output with ggplot, purrr, and RMarkdown

So I posted a while back about producing several plots at once with RMarkdown and purrr and how to suppress the console output in the document.

Well, I just spotted someone on Twitter having a similar problem and it turns out that the solution actually doesn’t work in ggplot! Interesting…

For ggplot, you need the excellent function walk(), which is like map() except it’s called for its side effects (like disk access or printing) rather than for its output per se.

Bish bash bosh. Easy:

```{r, message = FALSE, echo = FALSE}

walk(c("Plot1", "Plot 2", "Plot 3"), function(x) {
  p <- iris %>%
    ggplot(aes(x = Sepal.Length)) + geom_histogram() +
    ggtitle(x)
  print(p)
})
```

A world of #plotthedots and… what else?


Reproduced above is a recent exchange on Twitter. I’d better open by saying this blog post is not impugning the work of Samantha Riley or any other plot the dots people. On the contrary, the whole #plotthedots movement is an important part of a cultural change that is happening within the NHS at the moment. But I would like to take up the rhetorical device Samantha uses, to explore the issues that we have understanding data in the NHS.

Let’s take the tweet at face value: a world where every NHS board converted from RAG to SPC, along with CCGs and regulators. It’s worth noting that this would, in fact, be a substantial improvement on current practice. RAG rating presumably has its place as a management tool, but as a data analytic instrument it is very lacking. RAG ratings treat 30 week waiting lists the same as 18 month ones, and 16 hour waits in A and E the same as 6 hour waits. They give no help with interpretation with regard to trend or variance. In this new world, boards will be able to distinguish chance variation from real changes. They will have due regard for both the trend and natural variation in a variable, and be able to adjust their certainty accordingly. This is all to the good.
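To make the contrast concrete, here’s a minimal sketch of the XmR (individuals) control chart calculation that underpins this kind of SPC, with invented numbers. Unlike a RAG threshold, the limits are derived from the natural variation in the data itself:

```python
# Minimal XmR (individuals) control chart calculation: the limits come from
# the mean moving range, so they reflect the natural variation in the data.
import numpy as np

# Invented monthly values for some indicator, with a genuine shift at the end
x = np.array([52, 48, 50, 51, 49, 53, 47, 50, 52, 49, 48, 75], dtype=float)

centre = x.mean()
mean_moving_range = np.abs(np.diff(x)).mean()
upper = centre + 2.66 * mean_moving_range   # standard XmR constant
lower = centre - 2.66 * mean_moving_range

special_cause = (x > upper) | (x < lower)
print(np.where(special_cause)[0])  # only the final point should flag
```

Everything bouncing around inside the limits is common cause variation and needs no explanation; only the final point signals a real change worth investigating.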

But let’s think about all the things that we’ve left out here (I don’t doubt the #plotthedots people are quite aware of this, I’m just using the tweet as a jumping off point).

We’ve left out psychometrics. Are the measures reliable and valid? We don’t know.

We’ve left out regression. How much variance does one indicator predict in another? We don’t know.

We’ve left out multiple comparison. We reviewed 15 indicators. One shows a significant change. What is the probability that this is due to chance variation? We don’t know.
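To put a number on that last one: if, purely for illustration, we assume the 15 tests are independent and each is run at the conventional 5% level, the chance of at least one spurious “significant” result is already better than evens:

```python
# Probability of at least one false positive across 15 independent
# indicators, each tested at alpha = 0.05
alpha, n_indicators = 0.05, 15
p_at_least_one = 1 - (1 - alpha) ** n_indicators
print(round(p_at_least_one, 2))  # 0.54
```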

We’ve left out experimental design. We’ve reviewed changes in measures collected on four different wards, two of which have implemented a new policy and two of which have not. Is the experiment well controlled? Are differences due to the intervention? We don’t know.

We’ve left out sampling theory. We have patient experience questionnaires in the main entrance of our hospital, and they are distributed on wards on an ad hoc basis. Are the results subject to sampling bias? If yes, how much? We don’t know.

We’re interested in producing graphics to help managers understand patient flow throughout our organisation. Excel can’t do it. What can we do? Nothing! In the bin!

I’m obviously exaggerating a little here, for effect, but the sad fact is that the NHS is in a very sorry state as far as its capacity for analytics goes. Many individuals who have the job title “analyst” in fact would more properly be called “data engineers”, in the sense that they can tell you very exactly what happened but don’t have a clue why. There’s nothing wrong with that, data engineering is a highly valuable and difficult skill, but it’s not analysis, and in fact career structure and professional recognition for true “analysts” (or, let’s dream really big here, “data scientists”) is sorely lacking everywhere.

I passionately want proper understanding of and engagement with data and statistical concepts right from the board all the way across the organisation, and in fact I am busy at the moment in my own Trust offering in-depth tutorials and one-off lectures on concepts in data and statistics. I strongly support APHA’s aim to introduce rigorous professional accreditation of analysts.