Metaflow is a joint effort by Netflix and AWS that attempts to solve the discrepancy between what data scientists care about and what they spend their time doing (pictured below). Get the backstory on Netflix’s technology blog.
This post by Lauren Reeder of Segment goes over the different layers to consider when working with a data lake. What’s a data lake, you ask?
A data lake is a centralized repository that stores both structured and unstructured data, letting you keep massive amounts of data in a flexible, cost-effective storage layer.
Her article explains what tools are needed and provides code & SQL statements to get started. 🤟
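If you want a feel for the "store raw now, impose schema at read time" idea before wiring up S3, Glue, and Athena, here's a stdlib-only toy sketch. The event fields and partition layout below are invented for illustration, not taken from Reeder's article:

```python
import json
import tempfile
from pathlib import Path

def write_raw_event(lake_root, source, day, event):
    """Append one raw JSON event to a partitioned 'lake' directory.

    Layout: <lake_root>/<source>/dt=<day>/events.jsonl
    """
    part = Path(lake_root) / source / f"dt={day}"
    part.mkdir(parents=True, exist_ok=True)
    with open(part / "events.jsonl", "a") as f:
        f.write(json.dumps(event) + "\n")

def read_events(lake_root, source, day, fields):
    """Schema-on-read: project only the fields the query needs."""
    path = Path(lake_root) / source / f"dt={day}" / "events.jsonl"
    with open(path) as f:
        return [{k: json.loads(line).get(k) for k in fields} for line in f]

lake = tempfile.mkdtemp()
write_raw_event(lake, "web", "2019-03-25", {"user": "ada", "action": "click", "extra": 1})
write_raw_event(lake, "web", "2019-03-25", {"user": "lin", "action": "view"})
print(read_events(lake, "web", "2019-03-25", ["user", "action"]))
```

A real lake swaps the local directory for object storage and JSON lines for a columnar format like Parquet, but the partition-and-project pattern is the same.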
Since data science has a huge impact on today’s businesses, the demand for DS experts is growing. As I write this, there are 144,527 data science jobs on LinkedIn alone. Even so, it’s important to keep your finger on the pulse of the industry and stay aware of the fastest, most efficient data science solutions.
Click through for key takeaways and trend analysis.
The latest machine learning research from my friends at Fast Forward Labs. Shiou Lin Sam and Nisha Muktewar teach us what meta-learners are and how they learn.
A curated list of applied machine learning and data science notebooks and libraries across different industries. The code in this repository is in Python (primarily using Jupyter notebooks) unless otherwise stated. The catalogue is inspired by awesome-machine-learning.
This is an explainer on how to build a GitHub App that predicts and applies issue labels using TensorFlow and public datasets. Hamel Husain writes:
In order to show you how to create your own apps, we will walk you through the process of creating a GitHub app that can automatically label issues. Note that all of the code for this app, including the model training steps, is located in this GitHub repository.
See also: Issue Label Bot
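At its core, this kind of app is a text classifier over issue titles and bodies. Hamel's app uses TensorFlow; as a stand-in, here's a toy bag-of-words nearest-centroid classifier in pure Python (the labels and example issues are invented) just to show the shape of the problem:

```python
import math
from collections import Counter

def tokens(text):
    return text.lower().split()

def train_centroids(examples):
    """examples: list of (issue_text, label). Builds one
    bag-of-words centroid per label."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(tokens(text))
    return centroids

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_label(centroids, title):
    """Assign the label whose centroid is closest to the new issue."""
    bag = Counter(tokens(title))
    return max(centroids, key=lambda lbl: cosine(bag, centroids[lbl]))

issues = [
    ("crash when saving file", "bug"),
    ("error: null pointer on startup", "bug"),
    ("add dark mode support", "feature"),
    ("please add export to csv", "feature"),
]
model = train_centroids(issues)
print(predict_label(model, "null pointer crash on save"))  # prints "bug"
```

The real app replaces the centroids with a trained deep model and hooks the prediction into GitHub's webhook/Checks machinery, but the input/output contract (issue text in, label out) is the same.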
Google, Intel, and others have recently been targeting AI at the edge with things like Coral and the Neural Compute Stick, but NVIDIA is taking things a step further. They just announced the Jetson Nano, a $99 computer with 472 GFLOPS of compute performance, an integrated NVIDIA GPU, and a Raspberry Pi form factor. According to NVIDIA:
The compute performance, compact footprint, and flexibility of Jetson Nano brings endless possibilities to developers for creating AI-powered devices and embedded systems.
And it’s not only for inference (which is the main target of things like Intel’s NCS). The Jetson Nano can also handle AI model training:
since Jetson Nano can run the full training frameworks like TensorFlow, PyTorch, and Caffe, it’s also able to re-train with transfer learning for those who may not have access to another dedicated training machine and are willing to wait longer for results.
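The transfer-learning trick that quote mentions (keep the pretrained backbone frozen, retrain only a small head on your own data) can be sketched without any framework at all. The "backbone" and data below are toys, not Jetson or TensorFlow code:

```python
import math

# Stand-in for a frozen pretrained backbone: its "weights" never change;
# we only re-train the small head on top. That is the transfer-learning idea.
def frozen_features(x):
    return [x[0] + x[1], x[0] - x[1], 1.0]  # fixed, hand-picked "features"

def train_head(data, epochs=200, lr=0.5):
    """Train a logistic-regression head on top of the frozen features."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            f = frozen_features(x)
            p = 1 / (1 + math.exp(-sum(wi * fi for wi, fi in zip(w, f))))
            for i in range(len(w)):  # gradient step on the head only
                w[i] -= lr * (p - y) * f[i]
    return w

def predict(w, x):
    f = frozen_features(x)
    return int(sum(wi * fi for wi, fi in zip(w, f)) > 0)

data = [([0.0, 0.0], 0), ([0.1, 0.2], 0), ([1.0, 1.0], 1), ([0.9, 1.1], 1)]
w = train_head(data)
print([predict(w, x) for x, _ in data])
```

Because only the tiny head is updated, this kind of re-training is cheap enough for modest hardware, which is exactly the pitch for doing it on a device like the Nano.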
Check it out! You can pre-order now.
China has committed to becoming the world leader in AI by 2030, with goals to build a domestic artificial intelligence industry worth nearly $150 billion (according to this CNN article). Prompted by these efforts, the Semantic Scholar team at the Allen Institute for AI analyzed over two million academic AI papers published through the end of 2018. This analysis revealed the following:
Our analysis shows that China has already surpassed the US in published AI papers. If current trends continue, China is poised to overtake the US in the most-cited 50% of papers this year, in the most-cited 10% of papers next year, and in the 1% of most-cited papers by 2025. Citation counts are a lagging indicator of impact, so our results may understate the rising impact of AI research originating in China.
They also emphasize that US actions are making it difficult to recruit and retain foreign students and scholars, and these difficulties are likely to exacerbate the trend towards Chinese supremacy in AI research.
OpenAI, one of the largest and most influential AI research entities, was originally a non-profit. However, they just announced that they are creating a “capped-profit” entity, OpenAI LP. This capped-profit entity will supposedly help them accomplish their mission of building artificial general intelligence (AGI):
We want to increase our ability to raise capital while still serving our mission, and no pre-existing legal structure we know of strikes the right balance. Our solution is to create OpenAI LP as a hybrid of a for-profit and nonprofit—which we are calling a “capped-profit” company.
The fundamental idea of OpenAI LP is that investors and employees can get a capped return if we succeed at our mission, which allows us to raise investment capital and attract employees with startup-like equity. But any returns beyond that amount—and if we are successful, we expect to generate orders of magnitude more value than we’d owe to people who invest in or work at OpenAI LP—are owned by the original OpenAI Nonprofit entity.
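The cap mechanics are simple to sketch. OpenAI cited a 100x cap for first-round investors (later rounds would differ); the dollar figures below are hypothetical:

```python
def split_returns(invested, total_return, cap_multiple=100):
    """Split a realized return between investors (up to the cap)
    and the nonprofit (everything beyond it).

    cap_multiple=100 matches the figure OpenAI cited for
    first-round investors; it is illustrative here.
    """
    investor_cap = invested * cap_multiple
    to_investors = min(total_return, investor_cap)
    to_nonprofit = max(total_return - investor_cap, 0)
    return to_investors, to_nonprofit

# $10M invested; a hypothetical $5B outcome:
print(split_returns(10e6, 5e9))  # (1000000000.0, 4000000000.0)
```

In other words, investors see startup-like upside until the cap, and the nonprofit captures everything past it — that's the whole "hybrid" structure in one function.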
To some, this makes total sense. Others have criticized the move, arguing that it presents money as the only barrier to AGI, or implies that OpenAI will develop AGI in a vacuum. What do you think?
Learn more about OpenAI’s mission from one of its founders in this episode of Practical AI.
Those of you following AI related things on Twitter have probably been overwhelmed with commentary about OpenAI’s new GPT-2 language model, which is “Too Dangerous to Make Public” (according to Wired’s interpretation of OpenAI’s statements). Is this discussion frustrating or confusing for you?
Well, Ryan Lowe from McGill University has published a nice response article. He discusses the model and results in general, but also gives some perspective on the ethical implications and where the AI community should go from here. According to Lowe:
“The machine learning community really, really needs to start talking openly about our standards for ethical research release”
Claire Jaja, Manager of Data Science at TalentWorks, was curious how many job requirements are actually required, so her team analyzed job postings and resumes for more than 6,000 applications across 118 industries in their database. The results are quite interesting…
Your chances of getting an interview start to go up once you meet about 40% of job requirements.
You’re not any more likely to get an interview matching 90% of job requirements compared to matching just 50%.
…these numbers are about 10% lower, i.e. women’s interview chances go up once they meet 30% of job requirements, and matching 40% of job requirements is as good as matching 90% for women.
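A back-of-the-envelope way to apply these findings: count how many listed requirements you actually meet and compare against the ~40% threshold. This keyword matcher is a toy, not TalentWorks' methodology:

```python
def match_rate(job_requirements, candidate_skills):
    """Fraction of listed requirements the candidate meets
    (naive case-insensitive keyword match)."""
    reqs = {r.lower() for r in job_requirements}
    have = {s.lower() for s in candidate_skills}
    return len(reqs & have) / len(reqs) if reqs else 0.0

def worth_applying(rate, threshold=0.4):
    """Per the TalentWorks analysis: chances rise once you meet ~40%
    of requirements (~30% for women) and plateau around 50%."""
    return rate >= threshold

reqs = ["python", "sql", "spark", "tensorflow", "phd"]
skills = ["Python", "SQL", "pandas"]
rate = match_rate(reqs, skills)
print(rate, worth_applying(rate))  # 0.4 True
```

The takeaway the numbers support: don't self-select out of a posting just because you can't tick every box.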
Go beyond pandas, scikit-learn, and matplotlib and learn some new tricks for doing data science in Python.
I was surprised (and confused) to see wget on this list, but aside from that there are some goodies in here. Gym looks pretty rad, to name just one.
PyCM is a multi-class confusion matrix library written in Python. It accepts both input data vectors and a direct matrix, and serves as a post-classification model evaluation tool covering most per-class and overall statistics.
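To see what PyCM automates, here's a minimal pure-Python confusion matrix computing just two of the many statistics PyCM reports (overall accuracy and per-class recall); the input vectors are purely illustrative:

```python
from collections import defaultdict

def confusion_matrix(actual, predicted):
    """Build counts[true_class][predicted_class], plus overall accuracy
    and per-class recall. Libraries like PyCM compute these and dozens
    more statistics from the same table."""
    table = defaultdict(lambda: defaultdict(int))
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    classes = sorted(set(actual) | set(predicted))
    correct = sum(table[c][c] for c in classes)
    overall_acc = correct / len(actual)
    # recall (TPR) per class; max(..., 1) guards classes absent from `actual`
    recall = {c: table[c][c] / max(sum(table[c].values()), 1) for c in classes}
    return table, overall_acc, recall

actual    = [2, 0, 2, 2, 0, 1]
predicted = [0, 0, 2, 2, 0, 2]
table, acc, recall = confusion_matrix(actual, predicted)
print(acc)        # 4 of 6 correct
print(recall[2])  # 2 of the 3 class-2 samples recovered
```

PyCM's value is that it takes the same two vectors and hands back the full battery of statistics (kappa, MCC, per-class F-scores, and so on) without you re-deriving each one.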
What a difference a few years makes. In 2015, a LinkedIn snapshot of what it calls the skills gap—a mismatch between the skills workers have and the skills employers seek—showed a national surplus in the United States of people with data science skills; as of August 2018, LinkedIn data shows a dramatic shortage.
It’s a good time to be ~~alive~~ a Practical AI listener. 😉
Natasha Noy with the announcement:
Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher’s site, a digital library, or an author’s personal web page.
Open data is such a powerful tool when in the right hands. Hopefully this tool will help us find the datasets we need to make amazing things happen. 🤞
Are you really pushing Kubernetes? No? OpenAI is…
We’ve been running Kubernetes for deep learning research for over two years. While our largest-scale workloads manage bare cloud VMs directly, Kubernetes provides a fast iteration cycle, reasonable scalability, and a lack of boilerplate which makes it ideal for most of our experiments. We now operate several Kubernetes clusters (some in the cloud and some on physical hardware), the largest of which we’ve pushed to over 2,500 nodes. This cluster runs in Azure on a combination of D15v2 and NC24 VMs.
We’ve needed this post for a very long time. Thank you, David Robinson.
When I introduce myself as a data scientist, I often get questions like “What’s the difference between that and machine learning?” or “Does that mean you work on artificial intelligence?”
But that overlap, tho.
The fields do have a great deal of overlap, and there’s enough hype around each of them that the choice can feel like a matter of marketing. But they’re not interchangeable. Most professionals in these fields have an intuitive understanding of how particular work could be classified as data science, machine learning, or artificial intelligence, even if it’s difficult to put into words. Here’s the breakdown…
How do you teach a neural network to code? One screenshot with matching HTML at a time. 😂
Within three years deep learning will change front-end development. It will increase prototyping speed and lower the barrier for building software. The field took off last year when Tony Beltramelli introduced the pix2code paper and Airbnb launched sketch2code.
Currently, the largest barrier to automating front-end development is computing power.