Practical AI – Episode #81

Building a career in Data Science

with Emily Robinson

Emily Robinson, co-author of the book Build a Career in Data Science, gives us the inside scoop about optimizing the data science job search, from creating one’s resume, cover letter, and portfolio to knowing how to recognize the right job at a fair compensation rate.

Emily’s expert guidance takes us from the beginning of the process to conclusion, including being successful during your early days in that fantastic new data science position.

Notes & Links

Comment on the episode page for a chance to win a FREE copy of the eBook! Tell us why you are interested in Data Science and how this book might help you achieve your goals. Our 3 favorite comments will be selected on April 13th!

Transcript

Welcome to another episode of Practical AI. This is Daniel Whitenack. I’m a data scientist with SIL International, and I’m joined, as always, by my co-host, Chris Benson, who is a principal AI strategist at Lockheed Martin. How are you doing, Chris?

I am doing very well today. How’s it going, Daniel?

It’s going well. I think we both have less travel on our calendars than we expected this month, for obvious reasons… I guess if you’re listening to this episode later on, this is Coronavirus season, so much of our travels got canceled, at least on my end. Is it the same on yours?

My March and April - I was gonna be traveling non-stop, all over the U.S., in different places, and pretty much the whole smash has gotten canceled, so everybody is working remotely these days.

Yeah, and that means we have extra time to dig into great topics on Practical AI, and make sure we get some good content out, so that’s exciting… As part of that, today we have an amazing guest with us; we have Emily Robinson, who is a senior data scientist at Warby Parker. Welcome, Emily.

Hi, thanks so much for having me.

We’re having Emily on the show today – she’s coming out with a book, with her co-author Jacqueline Nolis, about building a career in data science, called “Build a Career in Data Science”, from Manning.

We’re gonna dig into those topics here in a second, but before we do that, could you just give us a little bit of information about your background and how you got into data science, Emily?

Absolutely. I was lucky enough when I went to college at Rice University to be there when Hadley Wickham was a professor. For those of your listeners who might be not that familiar with R, or do more in Python, Hadley Wickham is one of the most well-known R programmers. He’s created some of the most popular packages, especially focused on data analysis. I started learning R in university, from classes he had designed.

My major was one I created myself, called Decision Sciences, which was focused on the social sciences, with a minor in stats… And then I went on and got a master’s degree in organizational behavior.

That was actually a part of the Ph.D. program, and I decided two years in that Academia wasn’t necessarily for me, so I off-tracked, and I got my master’s degree. Then I came back to New York, where I’m from, and did a data science bootcamp, Metis.

[00:03:54.24] The reason I was drawn to data science is the process I found was quite similar to the social science research process. You would come up with the question that you want to investigate, find the data, analyze it, and then present it to people, whether that’s in Academia, someone very much in your niche, or sometimes to a professor in a totally different department. What attracted me to data science was being able to do that, but for companies, making an impact, and on a bit of a shorter time scale than the seven years it can sometimes take to publish a paper.

After that bootcamp I went on to work at Etsy, and then at DataCamp, doing data science for both of those companies, and specializing in A/B testing or online experimentation. Now I’m here at Warby Parker… I started back in December, so I’ve been here a little over three months now.

Oh, congratulations. That’s great.

Thank you.

Yeah, what a great opportunity to get that early start in R, like you did… I know you’re still pretty involved in the R community, aren’t you?

Yeah. I really love the R community. Metis, the bootcamp I did, was in Python, and I think it really did help me to learn Python, and also more machine learning in that bootcamp… But I’ve actually been using R since I started working in data science, and a big part of what attracted me to it and has kept me in it is the community, which I find an especially friendly and welcoming community, and especially towards people who might not consider themselves programmers, and are using programming as more of one part of their toolbox, and are more focused on the ends of what they are trying to program than necessarily being like “I need to make the most beautiful code.”

Yeah, so I’ve been involved in the R community, especially with R-Ladies, which is a global organization to encourage gender minorities programming in R. And also in the R community on Twitter, which is again, very active and welcoming. You go to Twitter and you ask for R help, and immediately you have all different people coming in and helping you figure out what your problem is.

Yeah, I can definitely – so I work most of the time in Python, but I have done some things in R, and I was really happy to attend the R Conference in New York (I think it was 2-3 years ago, something like that) that Jared Lander helps put on… That was a great experience, and I was a little bit nervous, because I felt a little bit like an outsider or a poser, because I didn’t have extensive background in R… But it was a great experience, and the community was so welcoming. So I can definitely attest to that, it’s a really great community.

I’m kind of curious… You took us through working for several organizations here, and you’ve gone and written a book, “Build a Career in Data Science”… So what was it that made you want to write this book on building a career in data science, and how did that come about, and how did you get connected with Jacqueline Nolis?

Yeah. And I think this is a great example of – you know, you don’t know what’s necessarily gonna make a huge difference in your life and your career… Jacqueline and I met back at Data Day Texas, which is a conference in Austin, in January 2018. It was sort of interesting, because that conference had mostly been like a graphing conference, but that year they decided they wanted to do an R track… So my brother, David Robinson, who is also an R programmer, Hilary Parker, and Jacqueline and I, among others, were speaking. That’s where Jacqueline and I first met. We attended each other’s talks.

Then a couple months later Jacqueline reached out to me, because Manning had got in touch with her, asking if she was interested in writing a book… And Jacqueline reached out to me and said “I know you’ve done some of this writing in the career advice space. Would you be interested in writing this book with me?” And that was another example of… When I was writing – so I had previously written some blog posts, including some more career-focused, which I think partly came from my background in organizational behavior, that I’d studied some of these topics… And I think that helped Jacqueline beyond meeting me – she had read these pieces, she felt I was a good writer and I had something to say, and that’s how we got started writing this book.

[00:08:12.07] I think the big motivation for me has always been really trying to scale up advice… So I do meet with people one-on-one, I write a blog, but this book felt like a really good way to dedicate a lot of time to thinking about these topics, to learn from Jacqueline, who comes from a little bit of a different background… She’s been in data science longer, she has a Ph.D. in industrial engineering; she has been a data science consultant and a manager, and so on… So having her input.

And then also at the end of every chapter we interview a different data scientist. So we have people who have bachelors, we have people with Ph.D’s, we have folks who are very heavy in machine learning, we have folks very much on the analytics side of data science…

And again, maybe we could have done a blog post series or something like that, but having a book really gave us the ability to dedicate a lot of time to putting a resource out in the world that we wish we had had when we were getting started in data science.

Yeah, and in that process – I mean, you mentioned that Jacqueline’s background was kind of different, and you interviewed a lot of people… Did your perception of people’s track through a data science career and how data science careers are happening these days - did that shift through the process, or were you surprised by certain things?

Yeah, so I think one thing that’s interesting – it will be interesting how this will play out in a few years, because I think Vicki Boykis… We had her write a little blurb in the first chapter, called Data Science is Different Now, based on a blog post she wrote by the same name, where basically she’s saying it’s getting harder to enter data science; with bootcamps, and in master’s degree programs, you have a lot of people entering the field, and it can be really hard to differentiate yourself.

So I do think most people who we interviewed have been in data science at least a year or two years, because we wanted people who had some experience… But I do think it’ll be interesting a few years from now to look – okay, folks who are entering at the moment, who are looking for their first data science job… I do think the landscape is changing some; I think a lot of the principles do remain the same in terms of like networking is going to be really important. Writing a resume. Also, half of our book is once you have the data science job, how do you do well in it?

I think people we talked to – we didn’t come in with very strong expectations, because we already knew it was such a diverse field in terms of backgrounds and interests and career paths… But I’m interested to see more how it will keep changing in terms of – we would see with some of our interviewees… It was very possible five or ten years ago – everyone said “Oh, I don’t necessarily have a typical background”, but there wasn’t a typical background, there wasn’t a data science degree ten years ago, so everyone was coming in with different stuff… And how that will change in 5-10 years from now when people can major in data science, and are we gonna see it’s harder, for example for people from the social sciences to enter.

That raises an interesting point, when you talk about people coming in from different places… What have you found about building a career in data science that’s different from other technical careers? …whether it be software development, or maybe the other sciences. What did you discover along the way that was distinct about building a data science career from other areas?

I think one thing that is distinct is that there’s not necessarily this well-trodden path, in that the field is not as well defined. That can mean, for example – interviews at a company can just totally run the gamut. For computer science folks, for example, if you’re trying to get into a software engineering job, there’s a book “Cracking the Coding Interview”, there’s tons of resources out there, like “What questions does Google ask, or Facebook?” if you look at bigger companies, and even smaller companies have kind of adopted this.

[00:11:53.29] But in data science right now, you might have one company that doesn’t give you any coding in your interview. And another one has you whiteboard code. And another one has a take-home project to do in Python. And another one asks you to derive something mathematical while you’re in the interview. Another one has you do consulting-type problems. I think as a field, that can make it really challenging to enter… That’s coming from – data science is such a broad field, and there’s so many different parts to it, and it’s very easy to feel impostor syndrome, because you know, how are you ever gonna know it? And the answer is “Well, no one knows it.” And because it’s all these separate, overlapping fields that are very deep in their own right, I think it can be quite intimidating, especially when you come up with lists of, you know, “True data scientists need to know these ten algorithms, and they know how to deploy things in the cloud, and they are an expert at managing stakeholder relationships, and have a degree in math.”

Yeah, easy stuff.

Easy stuff, exactly. And I do think one thing that does give me hope though is I do see data science similar to how computer science went, where for example you used to have a webmaster; and that doesn’t exist anymore…

Yeah…

Right? No one’s a webmaster. Things have started to specialize. And it’s not because like no one works in software engineering or runs websites anymore, of course not, but it’s become more specialized. And I do see that starting to happen in data science; at Airbnb, for example, all their data science job postings are always one of inference, analytics or machine learning. Obviously, there’s still some subgroups within there, but I think that’s a very good start to helping people realize that these are distinct jobs. And someone who is a very good fit for a machine learning job there, for example, probably is not a good fit for an inference job, because those two use different skillsets within data science.

That’s an interesting point that you’re raising. Also, going back a moment to when you brought up impostor syndrome… We’re at these early stages of this field, and all of us are coming in – whether you’re coming in soon after college or university, or whether you’re transitioning from another field and you’ve been in the career for a while… that lack of standardization I think affects everybody coming into the field, to some degree. Did you find any similarities or differences for people entering this field based on those different points of origin from which they entered?

Yeah, I mean I do think – I’ve been asked before, since what data science looks like at different companies and in different specialties is very different, “How do I figure out what I wanna do?” If we think of the Airbnb breakdown, which is similar to the one we have in our book of what we see as the different areas of data science - analytics, inference and machine learning. You know, with machine learning – especially production machine learning, putting a recommendation algorithm on the website – often those people come from engineering backgrounds. And inference - it’s often statistics or quantitative social science. And analytics might be some of both of those, but also you’re often more directly dealing with business stakeholders, so maybe you used to be a consultant, or you have domain expertise – you’re doing analytics for marketing and you used to be in marketing.

So I do think there’s some of – you know, depending on your background, that leads more easily into one of these than the others… But on the other hand, I have seen people change. Someone I worked with at Etsy, who was more on the analytics side, what he ended up doing was he – at Etsy you could bootcamp with another team for a month, and sort of shadow them and help them out, learning… In his case, he bootcamped with the software engineering team, so he learned more about software engineering, he contributed his analytics skills, and he used that to help transition into a more production/machine learning/data scientist role.

So that was an interesting case, where he started out doing one thing, and then shifted within data science to a different type of role.

Break

[00:15:52.14] Recently we had a show - I think our last show - which really talked a bit more about options around data science education… But we didn’t get a lot into the day-to-day of being a data scientist, and I know that you’ve already highlighted that that can look very different at different companies, and the positions can be very different.

One of the things that I thought was pretty interesting about your book was that you highlight what some typical companies are like to work at each day… I was wondering if you could maybe share a couple of your favorite examples of those types of companies in terms of the profile and how they’re different.

Yeah. The biggest part we do this in is chapter two, which is what that whole chapter is about. That was fun – so Jacqueline and I split writing the chapters, so each wrote half, and that was Jacqueline’s chapter… But we reviewed each other’s chapters, of course, and gave feedback. And that was fun for me to read, because that was one of the chapters that I felt like there’s not that much material out there on it. So you might have a blog post where someone says “Here’s my experience working at startups”, but they don’t necessarily have experience at a big company or government contractor, so it’s not really contrasting it. You just get a window into one type.

But yeah, so one fun example there is comparing, for example - we talk about it in chapter 9 - onboarding between a small startup and a massive tech company. So if you’re working at a small startup, especially with a very small startup, it’s like, maybe they have the laptop for you that day, maybe they don’t. You’re probably not gonna have any systems set up, you might have to try to figure out how you even plug into the data source; the data source may have been set up to help push data to the website and not for you to analyze… So the first time you try to count how many customers your business had, it could take six minutes for it to return that there’s a million, when of course if you’re writing SQL it should take a millisecond.

Versus if you’re at a massive tech company, you probably have a week full of very structured onboarding, and there’s tons of documentation… You might end up spending a lot of your initial time just reading about what the team has done… Everything is trying to get a handle on the full tech stack, and it’s pretty much impossible. So it’s a very different situation there, and there are pros and cons to each, of course.
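As a rough illustration of the customer-count example mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` table and its columns are hypothetical, and a toy in-memory database will not reproduce the six-minute wait described in the conversation; it only shows the kind of single aggregate query being talked about.

```python
import sqlite3

# Hypothetical schema: a tiny in-memory stand-in for a production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)

# The "how many customers do we have?" question as one aggregate query.
(customer_count,) = conn.execute("SELECT COUNT(*) FROM customers").fetchone()
print(customer_count)
conn.close()
```

On a real system, how long this takes depends on how the data store was set up, which is the point being made about startups whose database exists to serve the website rather than analysis.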

[00:20:01.14] I was talking to someone recently who came in as the first data scientist, and she really deliberately did that. She had some previous experience… She felt that she was basically in control; she could make sure that from the beginning she was setting up in a way that she thought was best, versus having to deal with the tech debt, and in some cases things that were unchangeable now, and she didn’t think were the most efficient.

So I do think what we lay out is less of “You should definitely do this… But you definitely shouldn’t do this…” At the end of chapter two, for example, is “Here are some factors to think about. Maybe the company you’re looking at doesn’t fall into one of the five example types we give… But think about what mentorship opportunities would you have there. How will the pay be? What about the autonomy? What about the learning opportunities?” That can really help you lay out and figure out, given what’s important to you, what type of companies should you be going to.

Do you think because there’s so much hype around data science and AI and machine learning these days – in some cases it seems like companies are trying… Like, they feel like they need a data scientist or an AI person or that sort of thing, because it’s gonna give them an edge, but they haven’t really explored what opportunities they have internally for that yet… Are there ways to understand what a company profile is in terms of their commitment to data science and how essential it is to a part of their business, and how that affects the day-to-day?

Yeah, absolutely. That’s a great point, and actually, our chapter one interviewee, Robert Chang, has shared his first experience in a data science job. He went and worked at the Washington Post, and what they quickly realized was that they had almost no data infrastructure set up… So actually he ended up working on data engineering, basically for the first year, when that hadn’t really been what he wanted to do.

As he said in our interview, he was hoping – he wanted to do data visualization, and he’s like “Well, the New York Times has really cool data visualization stuff… And the Washington Post - that’s a newspaper, so let me go there…” And what he said is now he really recommends to people to ask a lot of questions in the interviews, ask about what’s the data engineering team look like, is there a data engineering team, how big is the data science team, how long have they been around…?

Some of this you can find out online and you can guess - most people could probably realize that Google, Airbnb, most mature tech companies have pretty big data science teams, whereas a legacy company might be less likely to… Or it might be a little harder for them to integrate it.

But you know, once you’ve done your own research - Glassdoor is great. Even looking on LinkedIn, like “Is there someone with the title ‘data scientist’ who works at this company?” Definitely an interview is a really good place, and I would certainly say that it’s important to remember that an interview process is a two-way street. You are also interviewing the company, and it can feel sometimes especially with the hype around data science, and a challenging job market, that “I just wanna be hired as a data scientist. I don’t really care where, as long as they pay me a decent salary…” But you want a place where you can be learning, and thriving… And unfortunately there are some companies that aren’t – I wouldn’t say universally bad, but may not be good, for example, for someone who is new to data science and might benefit from some more structured mentorship.

You raise a really interesting perspective there about the job market and some of the issues that people face in trying to differentiate it a bit in terms of what they’re looking for… How do you distinguish between these different opportunities in the job market? And as part of that, what does the demand look like in each of those areas, and how do you prepare? I noticed that you talk about portfolios in your book… So how do you use some of these tools to address each of these different parts of the job market that you might have an interest in?

[00:24:03.18] Yeah, I think the job market right now is pretty good for people who’ve had at least one year of data science experience… So with a job title like data scientist, or something very similar. I think it is harder for people who are, say, coming out of a bootcamp, or undergraduate, or trying to just change careers, doing online courses, just because there are just more of them. There are fewer people who have had experience working in data science.

Companies use different methods to understand whether someone is really good, and often the easiest one for a recruiter to do who maybe doesn’t know the field that well is just “Did this person have the title that they would have here? …which means that some other company said that they could do this job, or at least thought they could do this job enough to hire them, and they’ve had experience doing it.”

So what do you do if you haven’t had – how do you get over this paradox of needing experience to get experience? Needing a data science job to get a data science job. And I do think that’s where the portfolio piece can really help. A portfolio would be having your code and some projects up on GitHub, and I think ideally on a blog. Why I think that’s important is 1) to have something to show employers, to show that you can do the job that they’re asking you to do. That you can take a question that you come up with, find the data, analyze it, and present it back, whether that’s by writing a blog post or by making a web application. I do think this can really help you stand out, especially if it’s an original portfolio project.

So what I don’t think would be really helpful here is, for example, your script for trying to predict who’s gonna die on the Titanic, which is a very classic Kaggle, beginning data problem… Because that’s not especially original. Who knows if you copied that code from someone else… Versus, “Hey, I’m really interested in fashion, so I use New York Times’ API to pull all the fashion articles, and then I did some topic modeling to see how different trends came in and out of fashion.” That’s a very original idea, it shows some personality too, and I think that can be a really strong way to help you stand out.
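To make the fashion-trends idea above a little more concrete, here is a minimal sketch of the analysis step only. It does not call the New York Times API and it does not do real topic modeling; it swaps in plain per-year term counting as a crude stand-in, and the headlines and stopword list are made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up (year, headline) pairs standing in for articles pulled from an API.
articles = [
    (2018, "Chunky sneakers take over the runway"),
    (2018, "Why chunky sneakers are everywhere"),
    (2019, "Tie-dye returns for summer"),
    (2019, "Sneakers and tie-dye: a street style guide"),
]

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "and", "are", "for", "over", "why", "take"}

def term_counts_by_year(rows):
    """Count non-stopword terms per year (a crude proxy for topic trends)."""
    counts = defaultdict(Counter)
    for year, headline in rows:
        for raw in headline.lower().split():
            word = raw.strip(":,.!?")
            if word and word not in STOPWORDS:
                counts[year][word] += 1
    return counts

trends = term_counts_by_year(articles)
for year in sorted(trends):
    print(year, trends[year].most_common(3))
```

A real portfolio project would replace the toy list with collected articles and the counting step with an actual topic model, then present the result in a blog post or web application as described.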

But at the end of the day, I mentioned a little bit earlier, networking is really important. A lot of jobs go to someone who was referred by someone who’s currently working at that company, or through meeting the person who is hiring, or meeting someone related on that team… And I think trying to expand your data science network can be a really good step in getting that first job.

I’m curious – it seems like when I got into data science I feel like I got lucky, because it was the start of the hype, and there weren’t a lot of people filling those positions… So I kind of got in at that time. Then there was the big data science hype, and now we’re going through a lot of emphasis on AI and neural networks and deep learning… How is that whole wave of influence from the AI side of things shaping the data science job market? Is that putting pressure on people too, on top of all the other things listed in a data science position? …learn TensorFlow, and have implemented their own implementation of this or that as well… How do you see that shaping things?

Yeah, I think absolutely that it’s putting pressure on people, but I think in some ways it’s more pressure from peers. I think there’s some companies who are getting really caught up with that. The companies should know what they’re doing. Generally, most data science problems don’t need AI or deep learning applied to them, and in the cases that they do, sometimes those go to people with, honestly, very specific backgrounds in that.

[00:28:02.08] For example, Google’s self-driving car division - I’m sure they have a ton of research scientists with Ph.D’s in very related fields. But I do think people are putting the pressure on themselves to learn that, and actually I don’t think that’s what you should be focusing on… Because again, most of the problems you’re gonna be faced with in data science, and certainly at the start of your career, won’t need that. And it’s much more important to just get very… My brother, David - we actually interviewed him in chapter four for the portfolio chapter, and his advice is not to focus on that deep learning, like TensorFlow, the cutting edge of the field stuff, but to get very comfortable manipulating data, summarizing it, visualizing it and so on, and making some more basic models… Because that really is like day in, day out, much more what you’re gonna be doing, along with, of course, things like managing stakeholders and communicating, and all of the “softer skills”.

Yeah, so that begs the question – you talk about Dave’s recommendation, and such… If you’re out there and you are searching for a data science position, how do you identify the right position for you? The one that fits your desire, your need. How do you know when you’ve arrived at that?

Yeah, I think it’s tough, because there’s only so much you can know before starting a job, even if you ask good questions in the interview… Jacqueline and I have a post on 12 red flags in data science interviews, talking about what you should look out for in terms of what questions they ask you, or their answers to your questions that we recommend you ask… But given that, there’s still gonna be unknowns. Maybe you end up with a really bad boss, or a dysfunctional department… In that case, I do think it’s important to remember - you can learn even from bad jobs, and to also think about how can you design… Like, given your situation – in a job usually there’s still some room to job-craft, so I really like this book, Designing Your Life, and the authors just came out with a new book, Designing Your Work Life, where they’re talking about thinking about what you wanna do.

A lot of people turn to “Let me sit in a room and think for a while, introspect.” And they’re much more on the design process, which really advocates for “Try things out and iterate, and then reflect.” Don’t just go in a box, like “Oh, would I like this type of thing?” It’s like, “Well, try it.” Think about how much you think you’re gonna enjoy it, reflect on how you actually enjoy it… Go through your day and mark off, in half-hour increments, how your energy level is… Then you can reflect back and you can see “Oh wow, it looks like my energy level was high when I was collaboratively coding or in meetings, and actually pretty low when I was by myself for a couple hours, which maybe I didn’t expect.”

So I do think once you’re in a job, that’s something you can do. Before you get a job… I think generally what makes jobs appealing to people - there are some universal traits. One of them is having autonomy, another one is having the ability to learn… So I think focusing on those and having – and I would say a third thing is mentorship and support… Those would be the types of things I would look at. And if you’re very new to data science, be a little bit flexible. Don’t necessarily say “The only thing I wanna do is make TensorFlow models, and I only wanna do it at Google, and I only wanna do it with a data scientist title.”

Data science happens in a lot of positions that don’t have the data scientist title, so I’d also advise people to maybe let go of that a bit… Because then once you broaden that scope, you might find a lot of really great jobs out there that maybe you wouldn’t have found if you only wanted to have that data scientist title.

That was, in fact, a very narrow set of search criteria right there…

[laughs]

[00:31:47.26] Yeah, but I think it’s a good point, because once you establish yourself… At least my experience has been – let’s say that you do have those ambitions to train state of the art models and all of those things… My experience is even if there’s a company that’s currently doing that, or exploring that, it’s generally not the first thing you’re going to be doing with them… So regardless, I think it’s beneficial to really develop a good understanding of the business processes that are happening there, develop good relationships with the company, and understand what problems are important to them… Because once you have a better understanding of that, eventually maybe it is that you can proof-of-concept some more advanced model or something like that, but you’re not gonna be able to convince people that that’s even worthwhile if you don’t even understand the business processes and if you don’t have good relationships and all of those things within your team.

I was curious - you mentioned certain things that you had thought about in terms of things to avoid, or red flags, and that sort of thing… Given that there’s so many posted data science and AI and machine learning positions out there, I was wondering if there are any sort of tips you had in terms of filtering through that noise. It can be really overwhelming for people, because they see all these positions and they’re so varied. One is talking about “Oh, you need Hadoop and Spark and TensorFlow and PyTorch and reinforcement learning” and whatever, and then the other one is like a totally separate set of tools… Do you have any good resources that people might utilize in terms of searching through job postings? Or maybe it’s about, like you’re saying, networking… Are community events a good way to deal with that sort of noise?

Yeah, I think community events is one way. You’d mentioned earlier, the New York R Conference run by Jared Lander - he also runs a monthly meetup, and at the beginning of each meetup he asks anyone who’s hiring to announce that. So that’s one way.

I do think you can add search terms… For example, rather than just searching for data science or analytics, you could be like “analytics R”, or say you wanna do online experimentation… That might narrow down the options you have to look through.

I’m not sure I have any tricks besides adding those search terms, but I do think once you’re reading job descriptions and start filtering them out, definitely look out for a job that seems to want a unicorn… They want someone, and they say like “You’re gonna be making dashboards, you’re also gonna be doing deep learning models, and you’re also gonna run our online experimentation system, and you’re also gonna do this, and that…” and just this whole laundry list of things that is– for example, if we go back to analytics, inference, and machine learning, it hits all three. Because the problem is not just like… One, it’s unlikely that there’s anyone who’s gonna be really good at all of those, but also you’re gonna be pretty overworked if you’re expected to do all of these very distinct tasks.

When you’re glancing at descriptions, try to see “Okay, does it seem like this is an actual person, or are they talking about a full data science team, and expecting that to be in one position?” And again, that’s going back to the fact that it is a two-way street, and you can also be picky, and not just like you might be less likely to get that job. It’s like, if you did get that job, it’s usually not gonna be a good experience. So I think that’s one way to filter them out.

My little tongue-in-cheek warning is if they say they need ten years of TensorFlow experience…

[laughs]

…that’s a warning sign right there.

Yeah, exactly. This also depends on where you are. San Francisco, New York, a few other cities have the luxury of lots of job postings to sift through, but if you’re in a smaller town and you need remote– like, it might not be the case that you end up having that many. So I don’t know if it’s necessarily a problem that everyone is facing, that’s like “There’s just too many data science jobs.”

[00:35:47.21] Emily, I’ve got a question… I wanna figure out where we are. In job searches, we’re always talking about resumes and cover letters, but I guess my question is we’re in this age where everyone is on LinkedIn, and we’ve actually had episodes where we were talking to organizations that are now doing analysis of job applications with different AI models, and stuff… So are the traditional resumes and cover letters still relevant in this day and age? Have they changed? What should people be thinking about now, as they’re looking at prepping what they need for a job search?

Yeah, absolutely - traditional resumes and cover letters are so important. Cover letters can vary by company, but almost always – even if you’re referred somewhere, you’re gonna have to submit a resume… And it’s not necessarily the case that like “Oh, I was referred, so I’ll just submit whatever. They’re definitely gonna interview me.” The [unintelligible 00:36:41.17] could still say no, but the hiring manager is still gonna look at it and decide whether they wanna spend half an hour talking with you. So I say absolutely, it’s still very important.

The other things, like a LinkedIn, or a blog, or a GitHub, I think can really help bring attention to your profile, and maybe have companies reaching out to you, or enhance your application… But you’re still gonna need that resume or cover letter.

We have a chapter in our book about this, and it was sort of funny, because there’s a lot of advice on resumes out there. Yeah, some of it is field-dependent, but some of it is somewhat universal… But I’ve been surprised at how many resumes I’ve sometimes seen by people that I think would really benefit from following some advice.

I think the big ones here would be like almost always just do a one-page resume. Unless you have many years of relevant experience, fit it onto one page… Because it just shows you can be concise, and it’s a lot easier to scan. And with that “easier to scan” - have some white space, and don’t have it necessarily just filled wall-to-wall with all the text you can cram in there in size ten… Because you want someone who’s glancing at it for, say, ten seconds, to immediately be able to zero in on the important points… And all it needs to do is just get you in that door, get you into the hiring manager interview. From there, it’s gonna be based on your interview, how your interviews go, and maybe some other pieces like your portfolio, like your blog. But still, that resume is still a very key component, and sometimes the cover letter, in getting you in the door in the first place.

So let’s say that you did get in the door and you’re about to start ramping up in a new data science position, AI position, machine learning position, whatever the title is… What are some of the first things to focus on as you’re settling into that new position? I know you focused on this a little bit in the book as well.

Yeah. I mean, this is something that varies by company, of course, how it looks… What you need to do in your first couple of months at a startup where you’re the first data scientist is quite different than if you’re joining a mature and functioning team. With that being said, I think there’s a couple of principles that really apply to any type of new role… And a big one there is really trying to learn as much as you can and not being afraid to ask questions. That doesn’t mean necessarily you should ask a question that you could easily google, like “What is the difference between a vector and a list in R?”, but really don’t be afraid of asking questions, trying to understand “Where does something live? Where could I find docs on past projects? Why do we do things this way? What Slack channels should I be in? What does this data mean?” and so on and so forth. To ask those really with a sense of curiosity and not a sense of, for example – not quite entitlement, but like “Why do we do it this way, instead of this?”, like clearly superior way that I learned in school. You don’t wanna come off immediately as like “Oh wow, I can’t believe you all are idiots. I can’t believe you’re doing it this way, and I’m so glad I’m here to fix it all”, and really try to keep an open mind… Which doesn’t mean, of course, that everything you’re doing is perfect.

[00:39:50.21] Ideally, you came in and you are hired because you have a lot to contribute… And in those first couple months really focusing on learning the most and trying to set yourself up to be productive in the long-term, and not worried too much about “Oh wow, I really immediately have to start delivering, or they’re gonna wonder why they hired me.” Because I think if that was truly the case, that’s usually a sign of not a very good and supportive company, if they’re expecting you to immediately start delivering reports and other things.

Of course, you can start doing small stuff, but they recognize it takes time to ramp up, and you don’t wanna become too focused immediately on the short-term, rather than for example spending a day or two making sure – you know, building some internal functions, for example if you’re the first data scientist, to make it a lot easier for you to load data, which is something you’re gonna be doing every day from now on… And by saving ten minutes every time you do that, it’s really gonna pay back in the long-term.
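
[Editor’s illustration] The kind of small internal data-loading function mentioned above could look something like the following minimal Python sketch. The function name `load_table`, the `DATA_DIR` location, and the choice of CSV files are all hypothetical assumptions made for illustration; nothing here comes from the episode or the book.

```python
import csv
from pathlib import Path

# Hypothetical default folder for shared datasets (an assumption, not from the episode).
DATA_DIR = Path("data")


def load_table(name, data_dir=DATA_DIR):
    """Read <data_dir>/<name>.csv and return its rows as a list of dicts."""
    path = Path(data_dir) / f"{name}.csv"
    with path.open(newline="", encoding="utf-8") as f:
        # DictReader uses the first row of the file as the column names.
        return list(csv.DictReader(f))
```

With a helper like this, a daily task becomes one call such as `load_table("customers")` instead of retyping paths and parsing options each time, which is where the saved minutes would come from.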

I actually wanna extend a part of your answer right there, and that is I’ve heard many people in data science jobs say that the hardest part of the job isn’t about the data itself, it’s really about the people at the organization. You kind of alluded to that in that last answer, in terms of expectations, and such… So if you are going into a position and maybe that organization is not a data-driven organization culturally, the way that some of the leaders in the industry might be, and you’re trying to work with people and show them the value of data-driven methods, how do you go about developing influence with those people and being able to help them see the benefit of driving their own decisions via data, rather than maybe just their own experience, their own sense of ego that “Hey, I’ve already been here and I know what I’m doing” - how do you contend with that?

Yeah… That certainly can be challenging. I think there’s two things that can happen. One is almost like “Oh, who needs data? I have intuition.” The other is like “Great, we have a data scientist… Build a model that will predict which sales customers will churn.” And it turns out they don’t even know how many sales customers are leaving each month, they’re not even quite sure what their ARR is… No one’s even done a descriptive analysis, which might turn up something like “Oh, it turns out we have a big problem with very small customers churning. You don’t need to go and build a fancy model, let’s just get the numbers and understand their… Maybe we don’t even focus on churn at all; it turns out that we’ve been really slowing down in acquiring new customers, and that’s gonna harm us in the long-term.” So sometimes it’s about redirecting that… And I do think something that can help there – or if they’re numbers-averse, starting to figure out “Okay, is there a champion?” For example, at a small company or a startup it’s not unusual for many employees to have direct relationships with people in the C-suite… So for example maybe the sales head is like “Test some numbers”, but maybe you’re finding out that they’re not doing as well as they said they were, or they’re not getting returns on some of their sales hires… And of course, they’re not very motivated and maybe not that interested in hearing that, but the CEO would be, right? Because that’s their bottom line; maybe they’re more metrics-driven.

So that doesn’t say you should never talk to the salesperson and just go above them, but if you are finding sometimes that, you know, like “I’ve been really trying, I’ve been working to develop a relationship, developing empathy”, which is very important, really trying to understand the questions behind the questions, what problems you’re dealing with, if you’re finding that that is not being fruitful… Or maybe seeing “Okay, do I have to go to someone else to talk to them about that, or are there other places I can add value?” Because I do think it’s hard… Like, how much politics are you willing to put up with, basically? Given you try things like have empathy, talk with people, communicate well, really try to understand their problems, at the end of the day sometimes you can do a lot of work and it just won’t necessarily get received that well, and then you do have to make a bit of a decision about what you wanna do with that.
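
[Editor’s illustration] The descriptive analysis mentioned in this answer, getting basic churn numbers by customer size before building any predictive model, could be sketched in Python roughly as follows. The field names `segment` and `churned`, the function name `churn_rate_by_segment`, and the sample records are all made up for illustration; they are assumptions, not data or code from the episode.

```python
from collections import defaultdict


def churn_rate_by_segment(customers):
    """Return {segment: churned / total} for a list of customer records."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for c in customers:
        totals[c["segment"]] += 1
        if c["churned"]:
            churned[c["segment"]] += 1
    return {seg: churned[seg] / totals[seg] for seg in totals}


# Made-up records purely for illustration.
customers = [
    {"segment": "small", "churned": True},
    {"segment": "small", "churned": True},
    {"segment": "small", "churned": False},
    {"segment": "small", "churned": False},
    {"segment": "large", "churned": False},
    {"segment": "large", "churned": False},
]

rates = churn_rate_by_segment(customers)
print(rates)  # {'small': 0.5, 'large': 0.0}
```

A table like this is often enough to show where the problem sits, for instance that churn is concentrated among small customers, before anyone commits to a modeling project.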

[00:44:02.19] So as you’re settling into your position, maybe even you’ve had a couple of different data science positions or you’re understanding more about what you want to do and what you wanna learn over time, what are some good ways to continue your personal development as a data scientist throughout your career? What are some of the things that are kind of easy wins that you can be involved with, or integrate into your workflow, or other things?

I think this really depends on the person. For me, one of the things I really enjoyed doing when I started as a data scientist was to begin speaking. I found a lot of opportunities to do that. As I mentioned, this book with Jacqueline came about because we saw each other speak at a conference, and I never would have met her without that.

So that’s one thing… Some people call it conference-driven development. I know some folks who give talks, for example, who are saying like “Oh, I’m gonna give a talk about the package” for a package they haven’t created yet, and putting that deadline on themselves really helps motivate them.

Another way people keep learning is doing open source. For example, let’s say you’re one of the only data scientists at your company; maybe you wanna get involved in a big open source project, because there you can learn more about “Alright, what’s it like to work with a bit of legacy code?”, to have many collaborators on a project, to have to think about “There’s thousands of users, we can’t just be chaining functions willy-nilly.” That could be one way. Other people like to do online classes…

I will say for me, and I think most data scientists I’ve talked to, I would be wary of just doing an online class without having an application. I think most people can overestimate their learning just by watching lectures, or even doing little problem sets, and learn much better when they then have to take that and apply it to a project that they’re working on, whether that’s a personal project or one at work.

So I think you’ve got a lot of different options, whether you wanna do speaking, or blogging, or wanna do personal projects, or wanna say “I’m not really interested in doing stuff outside my work. My work is really intense. I wanna just focus on projects at my company.” I do think there’s a lot of different ways you can try to keep learning and keep growing your skills.

I’ve got a question that I think especially applies to a lot of companies that may not have that long-term culture of data science, and that is the idea of failure. Failure in applying data science - there are so many factors that can cause a data science initiative to fail, or to go awry, or there’s not enough data… So when one of those hits, how does the data scientist or maybe somebody who’s focusing more on the neural network side, AI side - how do they gracefully deal with and learn from those failed projects? And as part of that, how do they communicate the normalcy of that state to stakeholders within the organization that might not otherwise have arrived at that same understanding that the data scientist has?

[00:47:04.26] Yeah, I think there’s a couple of things with dealing with failure. One of those things is you don’t wanna come in and surprise them, like “Oh yeah, I’ve been working on this for four months and I’ve told you it’s all going well”, or I haven’t told you anything… And I’m like “Surprise, it’s all failed!” You don’t wanna shock people, so how you can avoid that is having fairly frequent check-ins, where you’re like “Okay, here’s our plan. Here’s how we’re progressing with that. Here’s where we ran into an unexpected bug, here’s our plan for getting around that”, etc.

You could be doing that, and maybe it turns out like “Wow, there was some external shock… We were going off for a couple of weeks with this part of the dataset, and we only just got access to this other part of the dataset, and it turns out it’s totally useless, so we can’t do the project.” But at least if you’re checking in frequently, it’s less likely that will happen, and also if you do a good job upfront of explaining the risk. So there is a lot of unknowns, of course, but as much as you can, at the beginning, saying like “Alright, here are some parts we don’t know. How’s data availability? What’s the impact of the gains we’re gonna get?”

And the final step, I would say, is trying to make a balance of projects… Because all data science projects aren’t created equally risky. For example, prediction models can be fairly risky, because there just might not be signal; you might not be able to predict the outcome with the data you have… Versus a more infrastructure-related problem, like setting up a preliminary A/B testing tool, or a more descriptive problem, like “Alright, let’s surface more about our customer data; let’s build these dashboards.” Having that can make sure that you have a little bit of a balance, so it’s not (for example) gonna turn out wild because we took on all really risky projects, and don’t really have much to show for it in terms of things that will benefit the business, and a whole year of work… Because you don’t wanna be in that situation.

Yeah, that makes sense. I hate to end our conversation on the topic of a failed project… [laughter] To give a little bit of brightness at the end here, I do want to, again, mention your book, but also that Manning was nice enough to give the podcast a really great discount code for the book… And there’s so much more in there, there’s so much more about the career path of a data scientist, and interviews, and job offers, and all sorts of things that we didn’t have time to cover.

The discount code is “podpracticalai19”. We’ll put that in the show notes, as well. Make sure and utilize that, and look up the book, and also follow Emily and Jacqueline with a lot of the great content that they’re putting out there in the R community, and elsewhere as well. Thank you so much for joining us and sharing your thoughts, Emily.

Yeah, thank you again for having me.


Our transcripts are open source on GitHub. Improvements are welcome. 💚
