Data science panel questions

I was on a panel the other day talking about data science careers (I guess there will be a link to it somewhere at some point, I’ll share it when I see it) and they’ve sent some questions out after the event because they didn’t get through them all.

I may as well answer them here where everyone can see them rather than just have the people who came read them (hmm… might be a general point about working in the open there somewhere 🤔😉). There’s quite a few so I’ll take a few goes to get through them all.

Q1. “To what extent is being an advanced user of Excel and experience of briefing/ presenting data to decision makers a good foundation for upskilling to data science?”

I don’t think using Excel to an advanced level helps at all. It’s a totally different mental model where each change is laboriously carried through cell by cell (which is why you always have loads of hidden tabs on complicated Excel applications). Using DATA to an advanced level clearly helps, whether it’s in Excel, SPSS, or anything else. You need to learn (the hard way, usually) to ask the same basic questions every time. “Where did this data come from? Is it reliable? Is it complete? What does it look like if I plot it? Are there spurious correlations in it? What is my hypothesis as to why the data looks like it does? How can I *concretely* test my hypothesis against the data?” And so on. I don’t want to get into an Excel rant because everyone’s heard it all before, but I don’t ever touch Excel for any purpose other than totting up a column of numbers, and I consider my reasons for that to be sound. YMMV.

Q2. “Does a data scientist need to be a good statistician?”

Definitionally, a data scientist needs to be better than any computer scientist at statistics, and the corollary applies: they will be worse than any statistician at statistics. Data science is a *very* big field, and some data scientists hardly touch statistics. Some do statistics all day. I think data scientists need to have a really deep understanding of some of the basic points I mention in Q1, and a lot of domain knowledge about the kinds of data found in their field (for example, there are lots of waits of 3 hours and 55 minutes recorded in A&E departments, for an obvious reason). And they need to understand the methods they’re using. If they’re using regression, they need to understand regression. I think at times all data scientists (including myself) are guilty of using methods that they don’t fully understand, and although you can get away with this for a while, eventually you will slip up and look stupid. So do your homework and Keep It Simple, Stupid, would be my advice.
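To make the A&E example a bit more concrete, here is a minimal sketch of how you might check for waits bunching up just under a threshold. The numbers are completely made up for illustration, the 240 minute threshold is my assumption about what the “obvious reason” is (the four-hour target), and the function names are hypothetical rather than from any library.

```python
from collections import Counter

# Made-up waits in minutes, purely for illustration (NOT real A&E data).
waits_minutes = [35, 62, 90, 118, 145, 170, 201, 222, 233, 235,
                 235, 235, 236, 237, 238, 239, 244, 265, 310]

TARGET_MINUTES = 240  # assumed threshold: a four-hour target
BIN_WIDTH = 10        # arbitrary bin size chosen for this sketch


def bin_counts(waits, width=BIN_WIDTH):
    """Count waits per fixed-width bin, keyed by the bin's lower edge."""
    return Counter((w // width) * width for w in waits)


def heaping_ratio(waits, target=TARGET_MINUTES, width=BIN_WIDTH):
    """Waits in the bin just below the target divided by the bin just above."""
    counts = bin_counts(waits, width)
    below = counts[target - width]
    above = counts[target]
    return below / max(above, 1)  # avoid dividing by zero if the bin is empty


# Crude text histogram: "what does it look like if I plot it?"
counts = bin_counts(waits_minutes)
for edge in sorted(counts):
    print(f"{edge:3d}-{edge + BIN_WIDTH - 1:3d} min | {'#' * counts[edge]}")

print("just below vs just above target:", heaping_ratio(waits_minutes))
```

A big ratio here doesn’t prove anything on its own, but it’s exactly the kind of pattern that domain knowledge tells you to go looking for before you fit a model to the data.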

Q3. “To what extent is becoming a data scientist ‘gatekept’ by having a statistical science degree or A level in mathematics?”

I love this question. My experience of the recruitment of data scientists in my industry (the NHS) is that it’s a complete mess. Often data scientists are recruited by people who are not data scientists, because there are not enough data scientists around, and those that are around often haven’t had time to work their way up the ladder. I have quite a hardline view on this. DS is gatekept by quantitative degrees but it shouldn’t be. As I said on the panel, I personally look for abilities in problem definition, communication, stakeholder engagement, and most importantly teamwork. People who can write version controlled code are ten a penny these days (I mean that respectfully, but it’s true). If you want to distinguish yourself, be kind, in other words. You do not need a quant degree to be a data scientist. I know because I have met many great data scientists without quant degrees, but you may find that the people recruiting you do not agree with this. YMMV.