A tidy text talk and some stuff from my Shiny book

This is just a quick post to show some of the stuff that I’ve published/ presented/ put on GitHub recently.

So my Shiny book, Web Application Development with R using Shiny, 3rd edition, is out, and I’ve forked the code from the publisher’s repository onto my own GitHub to make it easier to find.

And I’ve got another book about Shiny out which is all about UI development with Shiny, and I’ve forked the code from that onto my GitHub, too.

I’ve also put the talk I did about text processing at the Nottingham useR onto my GitHub as well.

Find the Shiny book code here.
The UI development book is here.
And the tidy text talk here.

Drawing stacked barcharts the tidyverse way

Don’t judge me, but I do spend quite a lot of time making stacked barcharts of survey data. I’m trying to go full tidyverse (see this blog, passim) and there is a very neat way of doing it in the tidyverse which is very easy to expand out into functions, or purrr, or whatever. Of course, me being me I can never remember what it is so I end up reading the same SO answers over and over again.

So here, for all time, and mainly for my benefit, is the code to take a dataframe with the same Likert items over and over (in this case, Always/ Usually/ Sometimes/ Never) and find the proportions, change to percentages, make the x axis labels at an angle so they fit, put the stacking in the right order, and remove missing values. It’s pretty much completely generic, just give it a dataframe and the factor levels and it will work with any dataset of lots of repeated Likert items.


library(tidyverse)

theData %>% 
  gather() %>%                            # wide to long: columns `key` and `value`
  group_by(key) %>% 
  count(value) %>%                        # count each response within each item
  filter(!is.na(value)) %>%               # remove missing values
  mutate(prop = prop.table(n) * 100) %>%  # percentages within each item
  mutate(value = factor(value, levels = c("Always", "Usually", "Sometimes", "Never"), 
                        labels = c("Always", "Usually", "Sometimes", "Never"))) %>% 
  # the stacking order follows the factor levels set above
  ggplot(aes(x = key, y = prop, fill = value)) + 
  geom_bar(stat = "identity") + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) # angled x axis labels

Et voila!

[Figure: Rplot01, the stacked barchart produced by the code above]
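Since the post says this is easy to expand out into functions, here is a minimal sketch of what that might look like. The function names likert_props() and likert_plot() and the toy data are made up for illustration. It also swaps in pivot_longer() for gather() and geom_col() for geom_bar(stat = "identity"), which should behave the same way here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: long-format percentages for a dataframe in which
# every column is the same Likert item
likert_props <- function(df, levels) {
  df %>%
    pivot_longer(everything(), names_to = "key", values_to = "value") %>%
    filter(!is.na(value)) %>%                # remove missing values
    count(key, value) %>%                    # count responses per item
    group_by(key) %>%
    mutate(prop = n / sum(n) * 100) %>%      # percentages within each item
    ungroup() %>%
    mutate(value = factor(value, levels = levels))
}

# Hypothetical helper: draw the stacked barchart from those percentages
likert_plot <- function(df, levels) {
  likert_props(df, levels) %>%
    ggplot(aes(x = key, y = prop, fill = value)) +
    geom_col() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
}

# Made-up data to try it on
likert_levels <- c("Always", "Usually", "Sometimes", "Never")
toyData <- data.frame(
  Listened = c("Always", "Usually", "Always", NA),
  Informed = c("Never", "Sometimes", "Sometimes", "Always")
)

props <- likert_props(toyData, likert_levels)
p <- likert_plot(toyData, likert_levels)
```

Keeping the proportions in their own function means you can check the numbers before you draw anything, and the same pair of functions can be fed to purrr if you have a list of dataframes.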

Produce tables in loop

Another one for me before I forget. I seem to have terrible problems writing headings and tables in a loop in RMarkdown. Everything gets crushed together because the line returns don’t seem to work properly and either the table or the headings get mangled. Well, here it is, the thing I just did that worked:


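library(pander) # provides pandoc.table()

# Assumes an RMarkdown chunk with results = "asis", so that the
# cat() output is treated as markdown rather than as code output
for(i in 1:2){

  cat("Heading ", i, "\n")
  
  cat("===========================")
  
  pandoc.table(iris)
}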

That’s it! Don’t put any other weird “\n”s or [HTML line breaks which I now realise WordPress will interpret literally] in. You don’t need them!

The shinyverse

I’m pretty sure I’ve made up the word shinyverse, to refer to all the packages that either use or complement Shiny. This is a non-exhaustive list of shinyverse packages, mainly for my benefit when I inevitably forget, but if it’s useful for you too then that’s good too.

shiny.semantic adds support for the Semantic UI library to Shiny. Useful if you want your Shiny apps to look a bit different or there’s something in the library that you need.

shinyaframe doesn’t particularly sound like the kind of thing that everyone reading this will drop everything to do on a rainy weekend, but it’s awesome so it’s worth mentioning here. It allows you to use Mozilla A-Frame to create virtual reality visualisations with Shiny!

shinycssloaders does very nice simple loading animations for while your Shiny application is thinking about how to draw an output.

shinydashboard is so commonly used that I tend to think of it as what you might call “base-Shiny” (another term I made up). If you’ve never made a Shiny dashboard I suggest you have a look.

shinyWidgets is a lovely package that extends the number of widgets you can use. I’ve used the date picker in the past but there are plenty to choose from.

shinythemes is another one that feels like base-Shiny. Change the appearance of your Shiny apps very easily by selecting one of several themes in the UI preamble.

shinytest is for testing Shiny applications.

shinymaterial allows you to use a UI design based on the Google Material look.

shinyLP allows you to make a landing page for Shiny apps, either to give more information or to allow a user to select one of several applications.

shinyjs. Another base-Shiny one. Easily add canned or custom JavaScript.

shinyFiles allows the user to navigate the file system on the server.

bsplus allows you to add lots of things which are in Bootstrap but not in Shiny to your apps (and to RMarkdown, to boot).

rintrojs allows you to make JavaScript powered introductions to your app. Pretty neat.

crosstalk is seriously neat and allows you to use Shiny widgets that have been programmed to interact with each other, or write this interaction yourself if you’re a package author. Once you or someone else has done this, Shiny (or RMarkdown) outputs can automatically react to each other’s state (filtering, brushing etc.).

Filling in non missing elements by row (Survey Monkey masterclass)

There must be other people out there doing this, so I’ll share this neat thing I found today. Quite often, you’ll have a dataset that has a lot of columns, only one of which will have something in it for each row. Survey Monkey, in particular, produces these kinds of sheets if you use branching logic. So if you work in one bit of the Trust, your team name will be found in one column, but if you branched elsewhere in the survey because you work somewhere else, it is in another column. Lots of columns, but each individual has only one non-missing element across all of them (because they only work in one place and only see that question once).

In the past I’ve done hacky things with finding non missing elements in each column line by line and then gluing it all together. Well, today I’ve got 24 columns, so I wasn’t going to do it like that. Sure enough, dplyr has a nice way of doing it, using coalesce.


coalesce(!!!(staffData %>%
               select(TeamC1 : TeamC24)))

That’s it! It accepts vectors of the same length or, as in this case, a dataframe with !!! around it. Bang bang bang (as it’s pronounced) is like bang bang (!!) except it substitutes lots of things in one place instead of one thing in one place. So it’s like feeding coalesce a lot of vectors, which is what it wants.
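To see what coalesce is doing, here is a minimal sketch on a made-up dataframe. The column contents and the three-column layout are assumptions for illustration; the real sheet has 24 team columns.

```r
library(dplyr)

# Hypothetical survey export: each respondent answered exactly one of the
# three branched team questions, so the other two columns are NA
staffData <- tibble(
  id     = 1:3,
  TeamC1 = c("Ward A", NA, NA),
  TeamC2 = c(NA, "Pharmacy", NA),
  TeamC3 = c(NA, NA, "Estates")
)

# !!! splices the selected columns into coalesce() as separate vectors;
# coalesce() then keeps the first non-missing value in each row
team <- coalesce(!!!select(staffData, TeamC1:TeamC3))

# Attach the combined column back onto the data
staffData$Team <- team
```

Each row ends up with the one team name it actually contains, whichever column it started in.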

I really can’t believe it’s that simple.

In praise of awkward questions

I went to a conference last week, more of a meet up really, and they presented the results of the work that we’ve all been doing, indicating that there were several statistically significant improvements in the expected direction.

I’m sure the analysis is well intentioned and basically correct, so I didn’t really have any problem with it, but my arm shot up anyway, because I wanted to see more details of the analysis. The results were so good that I was just curious to see how they were so good, really: what were the 95% confidence intervals, the sample sizes, the alpha levels? Just the nitty gritty of the analysis, so I could really get all the detail of it.

But they didn’t have it. They didn’t have much on the slides, and they hadn’t brought any supplementary materials.

I don’t have a problem with that either. I think probably the reason why they didn’t is they don’t usually have stats wonks sitting at the back asking awkward questions. So they didn’t feel the need to be prepared.

I cut my teeth (while doing my PhD) at academic conferences. Asking awkward questions about statistical analyses is a spectator sport there. And I love it. Everyone’s a black hat, just waiting to crawl inside your work and blow it apart. It’s like a duel, like a competition. And of course that’s how science works. Everyone’s desperate to prove you wrong, and if you can stay at the top, then fair play to you. Probably something in it.

There isn’t enough of that where I work, in the NHS. It really reinforced to me the need to keep going with what I’m doing. I want to train, face to face, everyone in my whole organisation who works with data. Two hours or fifteen, I want to equip them to ask awkward questions.

And one day, I want to sit at the back of the room with my arms folded and watch a whole roomful of arms shoot up.

THEN I can relax.

Jupyter, Python, interactive web frameworks, and more

A couple of days ago on Twitter I said the following:

“Increasingly, RStudio’s products are so good that I feel a lot of my advice to my organisation is “buy a lot of RStudio products”. I love RStudio (I have a tattoo of their logo on my arm, even!) and they clearly give a lot of stuff away (we used their products for nothing for *years*). But I wish I could at least acknowledge some competition in this arena. As far as I can see, if you want to develop a cutting edge data science team with R, it’s RStudio all the way.

I just feel like a brand ambassador rather than someone giving solid, independent advice. I think it’s good advice, don’t get me wrong, but what are the alternatives if you want to interact with R over an authenticated connection other than Shiny with Server Pro?”

I’ve been thinking about it more and more over the last few days and I think my perspective has shifted because my role in my organisation has changed a little. For quite a long time I’ve been churning out Shiny code (and PHP/ MySQL) to run our patient experience data portal. But now I’ve zoomed out a bit and although I’m still doing that I’m working on a couple of other projects and have started to think about how we build our team: using Git, coding collaboratively, enforcing a code style. I’m starting to think about tools.

Now I love RStudio. As I say in the tweet I have a tattoo of their logo on my arm. This is not a metaphor. An actual tattoo on my actual arm. But I’m starting to get concerned that I’m so focused on R and Shiny that I’m starting to miss the big picture. It’s my job to know about other approaches and to understand the benefits and drawbacks of each. Even if I end up saying “There are two other ways of doing this that don’t involve Shiny, but they’re both too difficult/ expensive/ unreliable/ whatever” then fine. But I just don’t feel comfortable advising my organisation without a wider view.

So I’m going to veer off a bit, test the water. I feel sure this will involve Python, Jupyter, and whatever reactive type programming thing Python people use (see? I’m clueless). I know a bit of Python so it shouldn’t be too arduous. And I’ll poke around the rest of the space too. Julia. Other ways of interacting with R that don’t involve Shiny.

As always, I’ll report back once I’ve made head or tail of it, which could be a while.

Producing several plots at once with RMarkdown and purrr









This is a very simple example of using purrr and RMarkdown to produce several plots all at once.

library(dplyr) # filter(), pull() and %>%
library(purrr) # map()

invisible( # suppress console output from hist()
  map(c(5, 6, 7), function(x) { # values to feed to filter function
    
    iris %>%
      filter(Sepal.Length < x) %>% # just Sepal.Length < 5, 6, and 7
      pull(Petal.Width) %>% # extract Petal.Width vector
      hist(breaks = 20, main = paste0("Histogram of < ", x)) # graph
  })
)
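The same idea can be sketched with ggplot2 instead of hist(). This is just one possible variation, not part of the original example: map() builds a list of plot objects, and walk() prints them one after another, which is what makes them all appear in the RMarkdown output.

```r
library(dplyr)
library(purrr)
library(ggplot2)

cutoffs <- c(5, 6, 7) # values to feed to the filter

# Build one histogram per cutoff and keep them all in a list
plots <- map(cutoffs, function(x) {
  iris %>%
    filter(Sepal.Length < x) %>%
    ggplot(aes(x = Petal.Width)) +
    geom_histogram(bins = 20) +
    ggtitle(paste0("Histogram of < ", x))
})

# walk() calls print() for its side effect and returns its input invisibly,
# so there is no console output to suppress
walk(plots, print)
```

Keeping the plots in a list means you can also save them or arrange them later, rather than only drawing them once.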



Shiny Pro in a Windows/ NHS environment, part 1: The curse of Kerberos

Right, here goes. Let’s tell the story. I’ve been working on this for months, not because it’s particularly hard, but because I really didn’t know what I was doing. We are a health trust in the UK (NHS), and we’ve been using Shiny on a privately hosted Linode server (Ubuntu) in the cloud for a long time. You can see it here. We’ve done this (hosted it in the wild) because none of the data is private. It’s patient experience data. We want it to be public. If anything, we want people to steal it. To be honest it’s looking a bit tatty. I was proud of it in 2012, now not so much. I’m working on a new version.

Anyway, we now want to use Shiny with actual patient data. This means two things. Firstly, we have to host the data behind our firewall. And secondly, we need to authenticate users. We don’t want just anybody in the Trust getting access to it. I should say that in truth, the information governance requirements are fairly mild, since all the data is aggregated and/ or pseudonymised. There is no personal data in the application anywhere. Anyway, we want to authenticate users.

So we went out and bought Shiny Server Pro. As a keen user of R and Shiny, I’m very excited about this. I think it very likely that we are the first Trust in the country to buy a Pro licence. There’s some debate in the Trust about R’s place in our BI ecosystem, which I won’t go into here, but nonetheless, we’re doing it, now, and it’s very exciting for me.

So it was my job to make it all work. This means two things. Firstly, we need the server to authenticate against MS SQL Server so we can fetch the data nightly from a data warehouse. If you use passwords, obviously, it’s easy, but my Trust likes to be more secure than that so Kerberos it is. And secondly we need to use the LDAP features of Pro to authenticate users against the Active Directory user database which the Trust maintains. This post will just be about Kerberos. I’ll write up the LDAP stuff another day (and add a link here once I do). The actual server, in case you’re wondering, is Ubuntu 16.04 running as a virtual machine on the Trust’s completely Windows setup.

Now, if you’re as ignorant as I was at the start it will help you to understand what Kerberos is. Basically, it’s a little program that lives on the Trust’s servers and it issues tickets if you’re allowed to do something. So all you need to do is install a client program which goes to Kerberos and says “Hello, I’m so and so, and my password is such and such, can I have a ticket please?” and Kerberos looks it up and if the server’s name is, as they say, “down”, it gives the server a ticket. This ticket lasts for a certain period of time (I think the Kerberos server gets to configure this) and it will let the client server do all the stuff on the ticket. So I need a ticket that says “let this guy into this bit of the data warehouse”, and Kerberos gives me a ticket, and then I can go to SQL Server, show the ticket, and be let in.

I’m going to do my best to explain how to configure it all. I’m no expert (obviously!) so I will probably miss bits out, but when I was learning I would have found a blog post that gave a practical sequence of what to do very useful, so I hope this one helps you. And if it doesn’t, as usual it will help me in 6 months when I forget it all :-).

First things first, note that the clocks on the client and server must match. Mine did, so I didn’t worry about this. Right, let’s look first at where we’re trying to get to. Right at the top of my R code I want to run this:

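library(RODBC) # provides odbcDriverConnect()

# trusted_connection=yes means no username or password is sent;
# the Kerberos ticket is used instead
odbcDriverConnect(connection =
  "Driver={SQLServer};
  server=serverName;
  database=dbName;
  trusted_connection=yes;")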

This will run fine if you have the right Kerberos ticket, and not at all if you don’t. Next let’s make sure we’ve got some ODBC drivers. As a user of Shiny Server Pro, I get access to RStudio’s paid drivers, so I used them (I also get access to RStudio support, which I have to say is excellent). Microsoft have some for download, too. Whichever drivers you choose, install them. You’ll need to configure the name and file path in /etc/odbcinst.ini (the name of the driver in the odbcDriverConnect call above is defined in this file).
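For illustration, an entry in /etc/odbcinst.ini might look something like the sketch below. The section name matches the Driver={SQLServer} used in the connection string above; the description and the file path are placeholders, not the real location of any particular driver, so use whatever your driver’s installer or documentation gives you.

```ini
; Illustrative entry only. The section name in square brackets is the
; driver name that the connection string refers to.
[SQLServer]
Description = SQL Server ODBC driver (placeholder description)
; Placeholder path: replace with the actual library installed by your driver
Driver = /path/to/your/odbc/driver/library.so
```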

Okay. So far so good. Now we’re getting to the more tricky parts. Next you’ll need a Kerberos client:

sudo apt-get install krb5-user

Ask your IT department about the location of their Kerberos server. You’ll need to put it in the [realms] section of /etc/krb5.conf, like so:

[realms]

EXAMPLE.ORG = {
kdc = example.org:88
}

Obviously replace example.org with whatever your IT department tell you. Note you put it in twice, the first time in capital letters. I’m not going to insult your intelligence by pretending I understand why that is.

And also here:

[libdefaults]

default_realm = EXAMPLE.ORG

Okay, getting there now. You’ll need a username and password from IT. To test Kerberos just run

kinit -p username@EXAMPLE.ORG

Again, the server name is in capitals. Again, I do not know why. You’ll be prompted for your password. Hopefully that should work. If it does, you now have a ticket. You can test your connection to SQL Server using isql (https://docs.oracle.com/cd/E88353_01/html/E37839/isql-1.html).

Wait, you’re still not done! I’m assuming, like me, you want to run the whole thing in a script, either because you want to run overnight as a cron job (as I did) or just because you don’t want to faff around with terminal commands every time you run an R script. Your credentials will expire, so we need to be able to get them again in the script. You can save your password in a script BUT DON’T. It’s bad practice, and I feel sure that your IT department will get cross. You need a keyfile (a keytab, in Kerberos terms). I’m not going to go through all of that, as there’s a fair bit to it, but there is an excellent guide to using a keyfile here. Go and follow it.

That’s it! You’ve done it! Now all you need to do is run the following at the terminal, replacing the /home/chrisbeeley bit with the name of your keyfile, and the scriptName.R with your script.
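As a rough sketch of the general shape of this step (an assumption on my part, not necessarily the exact command used here), kinit can be given -k and -t to read a keyfile instead of prompting for a password. The helper names, the principal, and the keyfile name below are all placeholders. At a terminal the equivalent would be along the lines of kinit -k -t followed by the keyfile path and the principal, and then Rscript scriptName.R.

```r
# Hypothetical helper: build the arguments for a keyfile-based kinit call.
# -k asks kinit to use a keyfile (keytab) rather than prompt for a password,
# -t names the keyfile to use.
kinit_args <- function(principal, keytab) {
  c("-k", "-t", keytab, principal)
}

# Hypothetical wrapper: get a fresh ticket, then run the R script
run_with_ticket <- function(principal, keytab, script) {
  status <- system2("kinit", kinit_args(principal, keytab))
  if (status != 0) stop("kinit failed, so no ticket was obtained")
  source(script)
}

# Placeholder values throughout; not run here
# run_with_ticket("username@EXAMPLE.ORG",
#                 "/home/chrisbeeley/my.keytab",
#                 "scriptName.R")
```

Stopping when kinit fails means a cron job fails loudly at the ticket step rather than later with a confusing database error.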

Before I go, there’s quite a useful page which goes into a bit more detail than I have: https://www.easysoft.com/products/data_access/odbc-sql-server-driver/kerberos.html

This took me WEEKS, because I DID NOT KNOW WHAT I WAS DOING. It’s my sincere wish that it helps you if you want to do it. I’m more than happy to talk it through with anyone out there. As I mentioned, I’m not an expert, but I might have had a similar problem to you. Find me @ChrisBeeley on Twitter. And that goes double for anybody who works in the NHS. It’s one of my career goals to support the adoption of Linux/ R/ Shiny in the NHS and if I can help at all, I will.

Tune in next time for the LDAP story, which was even more hideously difficult (at least, for my tiny brain).

Passing strings as variables from a Shiny app into dplyr

I’ve been sort of waiting until I understood this thoroughly, and I was going to write a very detailed blog post about it, and although I do understand it a lot better now than I did, I’m still not at the point where I would write an authoritative blog post about it confidently because there are too many things that I don’t understand.

However, I’m aware that there are people out there right now who are trying to write dplyr code that takes strings as variable inputs, passed in from a Shiny interface, and I know how difficult it is to Google, because I Googled it myself. So I’m going to at least show you what to do, and talk a bit about it, and then maybe I’ll come back to it when I’ve understood it better myself.

So you quite often have a combo box in Shiny applications that returns a variable name, and you want to ask dplyr to filter/ select/ whatever on that name. So you have the variable name in the form of a string, and you want to pass it into dplyr. The old way, which still works, but is now deprecated, uses the filter_ function which evaluates a whole string as if it were dplyr code. So, for example, you might do this:

library(tidyverse)

input = data.frame("variable" = "cty")

mpg %>%
  filter_(paste(input$variable, " > 18"))

input = data.frame("variable" = "hwy")

mpg %>%
  filter_(paste(input$variable, " > 18"))

I’ve given the dataframe the name “input” to make it look like a Shiny application. So imagine your user clicks on “cty” in your app, which makes input$variable equal to “cty”. Now you just paste that together with a filter condition (“> 18”) and pass the whole thing to filter_(). Now your user perhaps sets the variable to “hwy”, and the calculation can be done with the new value.

So what’s the new way? Well, as I mentioned I don’t understand it deeply, so read the following at your own risk, but essentially what you’re doing is described in this blog post about dplyr programming. You need to quote the variable names which sort of makes them variable-y (told you I didn’t really understand) and then you need to unquote them. Normally you would be doing this:


input = quo(cty)

mpg %>%
  filter(!!(input) > 18)

input = quo(hwy)

mpg %>%
  filter(!!(input) > 18)

So (I think!) you’re saying to R: hey, listen, R, input is cty as a variable name. Wherever you see input, pretend it’s cty but as a variable name. This is called quoting. Then you have to unquote, using !!(). So you’re saying: that thing I mentioned about input being cty quoted? Well, I want it now. Unquote input (using !!()) and then use that variable name in the following.

So you’re basically doing that except! Except you’re quoting a string, not a bare variable name. To quote strings, use sym. Other than that, it’s the same. So do this:


input = sym("cty")

mpg %>%
  filter(!!(input) > 18)

input = sym("hwy")

mpg %>%
  filter(!!(input) > 18)

So now you can pass variable names from comboboxes in Shiny applications using dplyr.
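To make that a bit more concrete, here is a minimal sketch that wraps the sym() approach in a function that takes the variable name as a string, the way input$variable would arrive from a combobox. The function names are made up for illustration. The second version uses the .data pronoun, which is another way dplyr offers of doing the same job.

```r
library(dplyr)
library(rlang)
library(ggplot2) # only needed here for the mpg dataset

# Hypothetical helper: keep rows where the column named by the string
# `variable` is above `threshold`
filter_above <- function(data, variable, threshold = 18) {
  data %>%
    filter(!!sym(variable) > threshold)
}

# Same idea using the .data pronoun instead of sym() and !!
filter_above_pronoun <- function(data, variable, threshold = 18) {
  data %>%
    filter(.data[[variable]] > threshold)
}

city <- filter_above(mpg, "cty")
highway <- filter_above(mpg, "hwy")
```

In a Shiny server function you would call something like filter_above(mpg, input$variable) inside a reactive or a render function, so the filtering is redone whenever the user picks a different variable.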

I’m sorry I don’t have the detail as nailed as I would like. I wouldn’t advise sitting any exams after reading this, but it should get you coding, at least, which is the first step. If the fog ever truly clears for me I promise I’ll come back and write more.