In defence of eating

One weird thing in remote/pandemic times is that people have started turning their cameras off when they eat. I’m all for the right to turn your camera off, and I think you should be able to do that at any time without giving a reason, but I think it’s a shame if people think that they can’t eat on camera. I have therefore been glad to eat several large, difficult-to-eat things in meetings on camera recently, to perhaps give others the idea that it’s okay if they want to do it too.

So far I’ve eaten a footlong Subway, which was kind of falling apart really, and today it was half of a pretty massive pizza, which was also a bit lacking in the structural integrity stakes. So please join me if you wish. We all used to eat in front of each other in The Before Times; it’s totally natural and normal. But also, if you don’t want your camera on because you’re eating or for any other reason, that’s okay too.

In defence of dashboards

I’ve had lively debates about dashboards with various people, including someone in my own team, and somebody on Twitter just mentioned that dashboards are often not used (this blog post will be my response to this tweet, not the first time I’ve answered a tweet with a blog and very on brand for me 😉).

I should acknowledge at the top that I’m The Dashboard Guy. I’ve written books about making dashboards in R. It’s my role in the data science team in which I sit. Shiny in production. It’s my thing. So take all this with as much salt as you wish, I promise I won’t be offended.

People say that dashboards proliferate, and that nobody uses them. That is quite true in a lot of cases. I’d like to suggest why that is so, and talk about when people do use them.

The first thing to say is that in the NHS (which is where I have always worked and always will) many staff are not engaged with data in general. They’re not engaged with data, they’re not engaged with analysis, they’re not engaged with reports, they’re just not engaged. They see data as a punishment, a “test” they cannot win. They can’t see the point of it and it’s just a distraction at best and over-critical and unfair at worst.

So my first question would be what do you replace the dashboards with? What will they engage with? I became The Dashboard Guy because we used to make a 300 page report by hand every quarter. It took literal person weeks and every recipient was only interested in their four page chunk. It was ridiculously inefficient. But everybody did want their four page chunk. Dashboards are useful for data that people want to see.

The team and I have recently built a classification algorithm for free text patient experience data. There is no way on earth anybody could use that without a dashboard because there are thousands of points of data and you are typically only interested in a few hundred at a time. So we built one. If they are using the algorithm at all, they’re using the dashboard (or someone else’s; it’s open source so you can DIY the dashboard if you want). If they’re not, that means they’re not looking at what their patient experience data is about, or they’re reading a 4000 row spreadsheet (and I do know people who have done/are doing that).

So I think dashboards are very useful when you have a large highly structured dataset where everyone wants their own bit, and I’ve deployed perhaps three or four that have certainly been used by the individuals who wanted that data.

But something else that I think they can be used for is putting data science tools in the hands of your users. I just built a dashboard that allows you to pick how many LDA topics you want to use, and then shows you:

  • Term frequency for each topic in a graph
  • Five example comments from each topic

(I should say, it’s very rough and unfinished, but you get the idea). The idea of this is to allow the person who is interested in the data to make the topic model work themselves without writing R code. In this particular project not all the data has come in yet anyway, so the dashboard will contain more data in the future. It may be that the first tranche of data contains four topics, and the final dataset six. Building a dashboard makes the end user part of that decision making process.

In fact, I actually build them for myself. A few years ago I was fitting a lot of structured topic models and it was such a pain in the neck fiddling around with the code that I just made a dashboard for myself so I could just sit and play around with it. The analysis itself took days, tens of hours, and the dashboard took an afternoon (to be fair, I’m probably a little faster than the average person with Shiny at least, because I have a lot of experience with it).

If your users are not using your dashboards, ask yourself why that is so. If they aren’t engaged with data or don’t care about the metrics then you have a bigger problem than dashboards. Would they engage with a report? An email? A bullet point in a team meeting? Build what they want, not what you think they want or should want.

Dashboards give you an opportunity to devolve analysis to your users. I work with a lot of text data. The users are the experts, not me. I want them to be able to do all the stuff I do in code but in a browser: sort, filter, summarise, aggregate, and even tune models (do they want five topics or fifteen? Do they want an overinclusive multilabel classification algorithm that captures all of a theme, or a conservative one that only shows them the highlights?).
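To make the overinclusive versus conservative choice concrete, here is a minimal sketch in Python. It assumes the classifier gives each theme a score between 0 and 1 for every comment, which is an assumption for illustration and not a description of how our algorithm works. The function name, the themes and the scores are all made up. The only point is that a single threshold, which a dashboard could expose as a slider, moves you between the two behaviours.

```python
def assign_labels(scores: dict[str, float], threshold: float) -> list[str]:
    """Return every theme whose score meets the threshold, highest score first."""
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]


# Hypothetical scores for one patient comment (illustrative numbers only).
comment_scores = {"staff attitude": 0.82, "waiting times": 0.41, "food": 0.15}

# A low threshold is overinclusive: it keeps weaker matches too.
print(assign_labels(comment_scores, threshold=0.3))

# A high threshold is conservative: it keeps only the highlights.
print(assign_labels(comment_scores, threshold=0.7))
```

The user who knows the data gets to decide where that threshold sits, rather than me deciding it for them in code.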

In my opinion the dashboard is the icing on the cake. The team and I are building a very large, complex dashboard summarising many types of data, in the hope of driving customers between the types (patient experience data users looking at staff sickness levels, clinical effectiveness data users looking at all-cause mortality). But the work is in uniting the data, in producing analytics that are meaningful and useful. I don’t think it’s terribly important whether it’s a dashboard or a report or just played over the PA every Monday morning (there’s an idea 😉). Engage people in data and analytics first, and then serve that need using all the tools at your disposal.

Make statistics sexy

I’ve been enjoined to make statistics sexy. It’s really taking root in my brain. There are no easy answers in statistics; it’s a long, hard road. It’s rigorous, and honest, and that’s why I love it. Frank Harrell’s talk at NHS-R typified this. The price of truth can be very high, like binning all the data from your £1m study instead of doing what many do, which is slice it into so many pieces that something looks shiny and publish that.
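To put rough numbers on why slicing works so well at producing something shiny, here is a minimal sketch. It assumes the subgroup tests are independent, that there is no real effect anywhere, and that each test uses the conventional 0.05 threshold. Real subgroup analyses are rarely independent, so treat this as an illustration rather than a calculation for any particular study. The function name is my own.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 20, 50):
    rate = familywise_error_rate(k)
    print(f"{k:>2} subgroup tests: {rate:.0%} chance of at least one spurious 'finding'")
```

Under those assumptions, by the time you have run twenty tests you are more likely than not to have found something that isn’t there.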

At the same time there’s a lot of hype in ML and AI. There are serious minded people doing ML properly and making a real difference but there are also a lot of private companies selling the NHS snake oil in a new, fancy bottle and giving them overfit, janky, ungeneralisable models or even just selling them a pile of promises and a dashboard and leaving them with nothing.

How can the “I think you’ll find it’s more complicated than that” brigade compete with the “Our data science team showed that this model increased patient flow by 17%” brigade (but please don’t ask us about pre-existing trend, or the Hawthorne effect, or the other 8 models they deployed that didn’t do much at all)?

I truly have no idea, but I’m going to have a good go at finding out. Rest assured if I discover the answer I shall write it here first 😃

Data saves lives- and so does open code

I’m reading Data Saves Lives. And look, it’s that thing that I say all the time, that I have pinned on my Twitter profile, the thing I’ve heard time and time again for a decade that everybody just stands around nodding at while nothing actually changes. I quote:

“Working in the open

Public services are built with public money, and so the code they are based on should be made available for digital pioneers across the health and care system, and those working with it, to reuse and build on.

Analysts should be encouraged to think from the outset of a project about how work can be shared, or consider ‘coding in the open’ for example, through use of open notebook science. This will include sharing of technical skills and domain knowledge through sites like Cross Validated and StackOverFlow, and sharing code and methodology through platforms like GitHub, will build high quality analytics throughout the system

Our commitments:

  • we will begin to make all new source code that we produce or commission open and reusable and publish it under appropriate licences to encourage further innovation (such as MIT and OGLv3, alongside suitable open datasets or dummy data) (end of 2021)”

My team does this, proudly. Does yours?

Pay and reward

I’m reading No Rules Rules, the Netflix management book. It’s really good, lots of interesting stuff in there about doing things differently and although clearly there’s a lot I can’t do in the NHS, there are lessons in there for sure.

I’ve just read the bit about paying people top of market rate and I’m so happy because it describes loads of people who are on $250,000, crazy money, moping around and leaving because they can get a 40% pay rise somewhere else.

How does 40% on top of $0.25m make you any happier? It’s a nonsense. But the people I work with and recruit don’t chop and change for money. They do what they do because it’s meaningful, and because they believe in the founding principles of the NHS.

So although I can’t pay top of market I don’t need to, either. I just need to ask people to do useful stuff, in the open, for everyone, and to do it their way with whatever support they need. And pay them properly, of course.

That I can do ❤️

That is, in fact, my jam 🎸🎸🎸

Graphs > tables

One phenomenon that I find very strange in the NHS (and elsewhere, probably, I’ve never worked anywhere else) is the obsession people have with having tables of numbers instead of graphs. I have encountered this absolutely everywhere. People really want to know whether something is 21.4% of the whole or 19.2% of the whole and they can’t tell by looking at the beautiful graph that you’ve drawn.

I saw an analysis today which had nine separate tables of proportions. I’m going to go out on a limb and say no human being can understand a thing of such complexity. Nine tables, each with three categories, 27 proportions given. You could fit the whole thing on one graph and it would be readily apparent how they compare with each other.

But no, people want to know whether it’s 13% or 15%, even though in almost all cases that level of precision is far more than the sample can support: the uncertainty around each estimate is bigger than the difference they’re asking about.

Your report needs to say “category A is found twice as often as C, whereas A and B are similar”. Not “category A is found 17.6% of the time, whereas C is found 9.2% of the time. On the other hand category B is found 19.5% of the time”. Just writing it is exhausting me, never mind trying to understand it from cold in a meeting.

There are of course rare exceptions to this rule. Sometimes you really need to know that something is 13.5% of the whole. But you should be asking yourself more questions. How reliable is the measure? What is the sampling error associated with this estimate? Otherwise your 13.5% is 14.6% is 12.3%. And who is usually saying this, if anyone? Me!
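Here is a minimal sketch of what I mean by sampling error, using a Wilson score interval for a proportion. The sample (27 responses out of 200) is made up purely so that it comes out at 13.5%, and the 95% level is just the conventional choice. The function name is my own.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical sample: 27 of 200 responses fall in the category (13.5%).
low, high = wilson_interval(27, 200)
print(f"Point estimate {27 / 200:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

With a sample that size the interval runs from roughly 9% to 19%, so 12.3%, 13.5% and 14.6% are all the same number as far as the data can tell you.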

Stop punishing people with data

NHS data people know all about Goodhart’s law. First stated by Goodhart as the not very catchy

Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes

It was then popularised by Strathern as

When a measure becomes a target, it ceases to be a good measure

So far, so ordinary. What else do we all know? League tables are a bad idea too. Somewhat related to Goodhart’s law, the problem with league tables is that they often ignore nuance. "Bad" schools aren’t bad, they just have an intake that starts way behind. "Bad" surgeons aren’t bad, they just take on the more complex cases or work in a geographical location where there are more complex cases. There are ways of trying to balance out this bias, "value added" in the case of schools (being married to a teacher, I hear a lot about this one), and case mix adjustment in the case of surgeons. Although these methods go some way to ironing out problems there is always the chance that an unmeasured variable is affecting the results and distorting the picture.
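As a rough illustration of case mix adjustment, here is a minimal sketch with two entirely made-up surgeons. It compares each surgeon’s observed deaths with the number you would expect if the pooled death rate for each case type applied to their own mix of cases (an observed over expected ratio). The counts, the two case types and the function names are all hypothetical, and real adjustment models use far more than one variable.

```python
# Hypothetical counts per surgeon: case type -> (operations, deaths).
surgeons = {
    "A": {"routine": (900, 9), "complex": (100, 10)},
    "B": {"routine": (200, 2), "complex": (800, 80)},
}


def crude_rate(strata):
    """Deaths divided by operations, ignoring case type."""
    operations = sum(n for n, _ in strata.values())
    deaths = sum(d for _, d in strata.values())
    return deaths / operations


def reference_rates(all_surgeons):
    """Pooled death rate for each case type across every surgeon."""
    totals = {}
    for strata in all_surgeons.values():
        for case_type, (n, d) in strata.items():
            operations, deaths = totals.get(case_type, (0, 0))
            totals[case_type] = (operations + n, deaths + d)
    return {case_type: d / n for case_type, (n, d) in totals.items()}


def observed_over_expected(strata, rates):
    """Observed deaths relative to those expected given the surgeon's case mix."""
    observed = sum(d for _, d in strata.values())
    expected = sum(n * rates[case_type] for case_type, (n, _) in strata.items())
    return observed / expected


rates = reference_rates(surgeons)
for name, strata in surgeons.items():
    print(f"Surgeon {name}: crude {crude_rate(strata):.1%}, "
          f"O/E {observed_over_expected(strata, rates):.2f}")
```

In this toy example surgeon B looks more than four times worse on the crude rate (8.2% against 1.9%), yet both come out with a ratio of 1.0 once case type is accounted for. It only corrects for what you have measured, which is exactly the problem with unmeasured variables.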

In my own area of work, patient experience data, league tables are a terrible idea because the services are all so different and they serve so many different people. Older people are, on average, more satisfied with healthcare. People who are detained under the Mental Health Act for what can be 10 years are, unsurprisingly, often less positive about their healthcare.

I would like to add another law of data to the canon.

The more data is used as a punishment the less engaged those it punishes will be with all data

I’ve read so many beautifully written and persuasive arguments about the power of data, all, naturally enough, written by True Believers. People who can see how data can be used to inform service delivery and planning. If I’m honest I think I live in an echo chamber, filled with analysts and technical types, all passionate about the insights data can generate. But sadly the reality outside my little bubble is that mostly people have data done "to" them. Every quarter a manager or public body drops out of the sky and asks them for all these numbers.

They’re punished if the numbers are late. They’re punished if the numbers are wrong.

And they’re punished if the numbers don’t show them in a good light even if everybody knows there’s a perfectly good explanation as to why the data looks like that (thinking back to value added and case mix adjustment above).

They get it all done, feeling harried and feeling that the way the data is collected and scrutinised is unfair and makes their team/department look bad. They cross their fingers and pray that it doesn’t come back with red pen on it three days later, and then they forget about it for another quarter.

And the really sad thing is that these people have questions that could be answered with data. Everybody makes suppositions and hypotheses all the time, it’s impossible not to, and with help and understanding they could refine their ideas and be more effective at their job.

But data has crashed through their window like a rabid dog and made a mess of the place, and it never occurs to them to take that same dog to the park to see if it can help them find the jacket they left there last week.

They’re just glad to close the door and lock it and forget about the whole ordeal until the next quarter.

I think we do need to keep promoting the value of data and analytics, and I obviously spend a decent chunk of my time either selling people that dream or (more often) fighting Python dependencies on a server to make that dream a reality.

But I think it’s just as important to stop beating people with data, to try to work with them to understand why their numbers look the way they do, and to try to make all processes that involve data more to do with learning and less to do with punishing people with an unfair analysis of what really goes on in their department.

Shoddy data

You know, naming no names because I’ll get in trouble but someone somewhere has paid for some data from a proprietary vendor and they’re shipping absolutely unusable garbage data.

They won’t fix it because “no-one else has complained”.

HAVE SOME PRIDE IN WHAT YOU’RE DOING. How about that? How about fixing it because it’s an embarrassment? I’m a random idiot in some random NHS Trust in the countryside and I couldn’t sleep if my database looked like what you’re sending.

I swear on my life the NHS could do 99% of this stuff better itself if we just dug in and had a go. Everyone’s in thrall to these people with expensive watches and glossy brochures but it’s all a confidence trick.

Managing data science teams in the trenches

We’re systematically devaluing management and leadership in the analyst space, and I encounter a lot of people who think that a day with no code written is a day wasted. I do write code, but I do a lot of other valuable stuff too. This post is my personal opinion, written as someone who is very new to managing people, so just like my posts on proxied SSL with Apache, there is your caveat emptor 🙂

The people I manage like to get in the zone and write code. I think of it as like they’re digging a trench. They’re not getting out of the trench, they’re not sticking their head out, they’re just digging. And it’s efficient, they’re digging fast, and it’s rewarding too. They’re working on interesting problems and solving them and it feels good. My job is just to stick my head in the trench occasionally.

“What are you digging there? Oh yes, I like it, nice. You know, you could use that other tool over there on that bit. And actually, have you thought about trying this other technique? Let me show you, pass me the chisel. Okay, great. Let me know if you get stuck”

Next trench… “Hey, that looks good, what is it? I like this trench, I like this bit. You know, you’re actually going the wrong way, though, you’ve gone off at an angle. Come up here and I’ll show you. See that trench over there? We’re trying to get there. Just jump back in and I’ll help you get it pointing in the right direction”

Next trench… “Hey, this trench is taking ages, isn’t it?! We need to finish it soon, really. I think we maybe need to not finish some bits of it for now. Which are the really important bits, do you think? Okay, let’s keep this bit. This other bit, I know you love it, and I love it too. It’s really clever what you’ve done. But we can’t ship on time with it. I think we need to come back to it later, or maybe just chalk it up to experience and maybe we can do it next time”

This is pretty basic stuff, really. It feels a bit silly to have to spell it out. The point I’m trying to make is that the people getting in the trench occasionally are helping the people in the trench stay in the trench and that’s good for them and good for the project. And that’s my test for all the stuff that we do. Code review. Standup. Appraisals. Hack days. Overall, is it helping them stay in the trench, digging quickly and solving problems, or is it just a silly distraction?

I’ve never in my life had a manager who has the slightest idea what autoregression is or how to test the assumptions of OLS regression. I’m really happy now that I do manage people and that I do know what those things are, and I hope the people I manage can benefit from my understanding of what they’re doing (even if they usually know way more about it than I do, I can at least keep up if they explain it to me).

(If you’ve read my previous post and you’re paying attention you’ll realise that I’ve just contradicted myself by saying firstly that analysts should do more than just write code all day, and then saying in this post that a manager’s job is to help analysts write code all day. Forgive me. In reality the people in the trench have got other people in the trench, or possibly in a neighbouring trench, and it’s their job to stick their head in that trench and help that person write code instead of needing to get out of their trench, before coming back to their own trench. Indeed, really, the trench visitor is in fact in their own trench, and someone else sticks their head in there occasionally, but the analogy gets a bit convoluted and silly at this point. I think you get the idea, even if it doesn’t translate to actual trenches within trenches within trenches.)

What do we want from senior analysts in the NHS?

I’ve been meaning to write this for ages and I’ve just had a rather silly conversation on Twitter about managers and leaders in the NHS which has inspired me to get on with it. I think most people are agreed that we have A Problem with managers and leaders in the analytic space. Much ink has been spilled on the subject, and I think that the two main conclusions of all this ink are broadly accepted by everyone. They are, briefly:

1: Managers often lack the analytical training and experience to critically engage with the data and analyses with which they are presented (or indeed, to do analyses well themselves) and this leads to decision by anecdote and hunch.

2: Analysts lack a clear career track that rewards them for developing their analytic skills rather than for becoming managers, unlike, say, surgeons, who can be very successful by just being very good surgeons and aren’t expected to become managers (although there’s a bit of nuance even in there that I’m getting to, so stick with me if you’re thinking I’m talking rubbish already).

I’m not really going to talk about point number one except to say that in my view part of the solution to this problem is to have analysts in more senior positions, including at board, and by that I mean actual analysts, not managers of analysts. Many people with much more knowledge and experience (and credibility) than me have said this already, so you didn’t really come here to read that. All this does relate to point number two, however, which I will talk about now.

Having advanced some way in my career as an analyst, I can say that my experience absolutely fits with the general point that people make about analyst careers. Quite honestly, there wasn’t really a career structure that I fitted into when I finished my PhD in 2009, and I’ve sort of bobbed around doing different stuff. I’ve never looked at a job and thought “That’s the job for me, I’m applying for that”. I’ve pretty much just made it up as I went along.

The thing I’m not sure about, however, is this idea that we need to reward analysts just for their analytical skills. Sometimes when I talk to people about this I get the idea that they’re promoting the idea that we’ll have these Python geniuses just sitting in a box doing linear algebra and deep learning and advancing up the career track like that. I think that misunderstands what we want from analysts in the NHS. I believe the likes of Google and Facebook do pay some very, very clever people to just work on really high end statistical and computational tasks, and they use the methods that these people are producing to increase revenue. I think we in the NHS look at that and we imagine that we can make that work for us. But the NHS is not Google, and most analytical teams are far too small to support an individual working that way. There may be scope to employ people in that capacity in the national bodies, which are much larger. I don’t claim to know much about that. But speaking as someone who works in a provider trust and who has worked with people all over the country in trusts and CCGs, I’m pretty confident in my belief that we actually want to reward people for doing other stuff than just clever Python and machine learning.

We do want to employ people with high end technical skills. Source control. Statistics. Machine learning. But once they’re good at that stuff we want more from them. I don’t want them sitting in a box writing code all day. That is a much too narrow definition of working effectively. They need to be training. Mentoring. Doing high level design. Understanding the local, regional, and national landscape. Writing and winning funding applications. Even if they’re not line managing anybody they need to be aware of what more junior colleagues are doing, helping them prioritise and understand their work, and managing projects so they’re delivered on time and to spec.

And therein lies the heart of what I consider to be the mythology around this issue. This stuff is hard. Universities are churning out people who can write Python for data science at a rate of knots now. We’ve got it the wrong way round. Yes, those people at Google are terribly clever, and I couldn’t do what they do in a thousand years. But this stuff is hard too. Teamwork. Mentoring. Communication. Understanding what to do where, with whom, and how. These skills are incredibly valuable. Recruit to and reward technical skills, by all means. But ask yourself if you want more from your team members.