David Yakobovitch joins the show to talk about the evolution of data science tools and techniques, the work he’s doing to teach these things at Galvanize, what his HumAIn Podcast is all about, and more.
David Yakobovitch: Right. If I look at an ML engineer as someone who has software experience in building applications, and a data engineer as someone who could already work with cloud systems or distributed systems… And often the bootcamps and the masters programs just don’t give enough there. That’s why you wanna look at capstones for that.
But I think the challenge is there’s so much information to cover and pack into it that data science has just become this term that encompasses the industry. And how I look at it is, simply put, what used to be big data became predictive analytics, became data science, which is now the AI industry. It’s constantly evolving… But the truth is when you look at data science roles, 60% to 80% of the work is still in the data. It’s cleaning data, it’s labeling data, it’s getting it all set up…
Just at the beginning of August, I actually featured Mark Sears on the HumAIn Podcast. He runs CloudFactory, which is one of the big data labeling companies, operating between [...] and Africa. They have 10,000 people just labeling data. I think the reality that a lot of data scientists don't know until they join a company is that you're not playing with algorithms all day. Maybe you might, but even as an ML engineer - you're gonna be working with APIs, and setting up pipelines and systems before you get to start training and testing and working with other teams to see those results.
So I definitely see a specialization occurring in the field. In fact, I'm now calling out a new subfield emerging in data science, which we're starting to see in some trend reports, called Data Science as a Service. Similar to how we saw Infrastructure as a Service, with things like HashiCorp, Ansible and Terraform, and a lot of deployment options for the cloud, we're gonna start seeing that (and we already are) in Data Science as a Service. We're seeing companies like Neptune and Spell, and [unintelligible 00:19:00.20] and other ones, which have all just recently raised their Series A rounds, and they're helping deploy these systems.
You even see the founders of Anaconda - a couple of them branched off and launched Saturn Cloud, which lets you launch these systems in Docker containers and then do your data science there. Paperspace in Brooklyn got notorious for that, and has been doing a phenomenal job partnering with companies like FastAI, and with the Insight Data Science Fellowship as well.