Woo hoo! As we celebrate reaching episode 50, we come full circle to discuss the basics of neural networks. If you are just jumping into AI, then this is a great primer discussion with which to take that leap.
Our commitment to making artificial intelligence practical, productive, and accessible to everyone has never been stronger, so we invite you to join us for the next 50 episodes!
Learn more about neural networks with the following learning resources.
- Deep Learning textbook
- Data Science from Scratch, 2nd Edition
- There are literally too many others to name…
- fast.ai MOOC
- Deep Learning course by Google on Udacity (free)
- Practical Machine Learning with TensorFlow 2.0 Alpha (free Udacity course)
- deeplearning.ai Deep Learning Specialization
- Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning (free Coursera course for TensorFlow 2)
- Intro to TensorFlow for Deep Learning by TensorFlow (free Udacity course for TensorFlow 2)
Welcome to another Fully Connected episode of Practical AI, where my co-host Chris and I keep you fully connected with everything that’s happening in the AI community. We’re gonna take some time to discuss the latest AI news, and we’ll dig into some learning resources to help you level up your machine learning game.
I’m joined today by my co-host, Chris Benson, who’s the chief strategist for AI, high-performance computing and AI ethics at Lockheed Martin, and I’m Daniel Whitenack, a data scientist with SIL International. How’s it going, Chris?
It’s going great, I’m excited today.
Yeah, me too. How’s the week been for you?
The week’s been good, just lots of stuff at work; I was traveling, about a week ago I was up in Boston at LiveWorx. I’ve been doing lots of interesting stuff in high-performance computing and AI ethics. Artificial intelligence is a field that just keeps getting more and more interesting in terms of what we’re doing, and just at large, and all the places you can go in it. It’s a great time to be in it.
Yeah. Well, hopefully I’ll survive; I’m working on a little bit of a jet lag right now. As you know, last week I was in India, which was great. I was in Bangalore. India was great, but getting back from India was quite a chore. It turns out I was on an Air India flight, and they don’t fly through Pakistani air space, for obvious reasons. Then while I was in India, this tension happened between the U.S. and Iran, and the U.S. put restrictions on planes coming to and fro over Iranian air space, which is the re-route that Air India does as it goes around Pakistan, and then over the Arctic back to Chicago… So finding a route back to where I needed to get turned out to be rather interesting. I got back a lot later than expected, and it’s about 2 AM now in Bangalore, so… Bear with me if I start going off on a tangent.
[00:04:03.13] Not a problem. I will wake you up though, because this is a special episode for us. This is our 50th episode.
It is, congratulations on the 50th episode. Pretty crazy.
I know. It’s gone by so fast over the past year, and just the idea that we’ve put out that much content, and that we actually have people that still wanna listen to us after doing that… I’m amazed by that every day.
Yeah, definitely. Thank you to all the listeners. It’s been great to kind of gradually get more and more connected to the listeners, and listeners engaging on our Slack channel, on LinkedIn and other places… It’s just really great to hear that you’re appreciating some of the content, but also great to hear some of your ideas that we’ve been able to filter into the show. Keep those coming, thank you so much for listening. We really appreciate all of you, and really want to engage with all of you in our community… So make sure and check that out at Changelog.com/community.
I’ve just gotta say, when we get messages from people out there, or people engaging us in the communities, and stuff, it is just enormously exciting, because it’s the reason that we’re doing it… And the fact that people are out there not only listening, but saying “Hey, what about this? I’d love to hear that” and “Hey, here’s a suggestion”, or “Hey, I know somebody who’d be great on your show…”, it makes the whole thing wonderful. I know that sounds a little corny, but it’s true.
Yeah, it is super-encouraging. So keep that coming. We’re really excited about episode 50. This is kind of a celebration for us, and so we were talking before the show about what should we do to celebrate episode #50… And what we came up with was kind of to loop all the way back to where things started with AI, and with Practical AI - that’s to devote this celebratory episode to one of our favorite things, which is the neural net.
We’ve talked about a lot of neural nets on the show obviously, and many advanced architectures and applications and all of that, but we’ve never actually just talked about the neural net itself, where it came from, and just kind of in brief and from scratch what a neural net is, what makes it a neural net… And we thought this would be a great episode to circle back to that starting point.
Yeah, absolutely. One of the common comments that we get back, that I’ve had conversations with several people about, including the young man that is at the Chinese restaurant two miles from my house - because he actually listens to the podcast… But he’s not a data scientist. And he made some comments to me a while back. He said “You know, you guys are really good, and so long as you don’t do jargon…”, and we took that as a point to be very careful about that… But he said that sometimes we get a little bit out there for where he’s at. He’s very interested in the topic, but we’ve never really done a true Intro to Neural Net type of a show. And it occurred to me that for those people out there who are trying to jump in and maybe find it a little bit intimidating - I can’t think of a better way to celebrate a milestone episode.
Yup, sounds great. So why don’t we start with giving just a little bit of history about the neural net itself…? Neural nets are not new. They’ve actually been around for quite some time. Do you know when neural nets came onto the scene, Chris?
Sometime around World War II, if I recall correctly… Do you have the specifics?
[00:07:51.24] Yeah, so you can just search Google for “neural net history”, and there’s several lists that come up that have variations of the various dates, and facts, and all of that… But generally, people include a date around the mid-1940’s, when the first computational model for neural networks came out. It was a guy named – I’m sorry if I’m mispronouncing this; I don’t really hear this name too much… Warren McCulloch and Walter Pitts created these computational models, really that paved the way for both modeling biological processes like actual neurons in our brain or neural networks in our brain, and then more practical applications of neural networks.
Yeah. And then I think the next major step was when the Perceptron was invented. That was by a guy named Frank Rosenblatt, in 1958. So for me, we’re getting a little bit closer to my year of birth; not quite there yet, not quite that old… But that really set off one of the early waves of research in this area.
Yeah. People are sometimes surprised because there’s been a lot of talk about neural networks recently, but maybe they didn’t hear it a while back… So these sorts of things have been around for quite some time in research, and like you were saying, moving up through the ’60s and ‘70s, they were a topic of research, but I think that a big shift happened in the 1980’s up to the mid-’90s. This is where things like deep learning and backpropagation, so these larger neural networks and applications to different types of data came around.
Up until this point, people were researching neural networks, but they hadn’t really figured out a way to make them bigger and learn more complicated patterns. Before that, they were pretty limited to divisions between linear class boundaries, and these different things, but as they saw that they needed to model more complicated relationships, they saw that the size of the networks needed to increase, but they didn’t really have a good way of training those sorts of neural networks. That kind of changed in the ’80s.
That’s true. And I have a special affinity for that time period, because in 1992 is actually when I first became aware of neural networks. That’s before the name “deep learning” was applied to it, and it was before anyone was calling them “deep neural networks” necessarily.
My father worked for Lockheed Martin, just like I do. He would have been shocked that I do at this point, but… He worked there, and there was an event that really affected me in a very personal way. That was that there was a fighter plane called the F22; at the time it was the YF22, which is still kind of the world’s top air superiority fighter. In other words, for dogfighting, you might say. There were two prototypes, and one of those prototypes was doing some testing at Edwards Air Force Base, and they were testing avionics on it, and there was a malfunction. Fortunately, it was close to the ground, and the plane slammed into the ground and went skidding in a fiery ball down the runway for quite a ways… The test pilot got out and got away safely, but as the aftermath of that, both of my parents were on the F22 team, the core team that built the avionics for the plane, and my father was assigned to help solve that avionics thing. And as part of that, he was using networks of the day, and he started with feedforward and backpropagation, which we’ll talk about in a few minutes, and moved on to other architectures…
[00:11:57.05] That was really special because he would come home, and there’s all sorts of classified stuff he would not talk about, but in terms of the actual science, he’d come home and introduce me to neural networks, and this was our dinner table talk for a while… As he made progress into different areas, he would explain it to me at night and I would ask questions… And I was a college student at the time… So it was a really interesting way for me to get into it in a very practical problem. Obviously, the problem was solved and the F22 is in service today. Anyway, that’s how I originally came to be aware of them.
Yeah, and that’s where you first started getting the ideas for this great podcast, Practical AI, I’m sure… [laughter] I don’t know if podcasting was not a thing–
Maybe not that far back.
[laughs] This time was really when some interesting things came on the scene. That was recognized actually this year with this year’s Turing Award, which went to LeCun, Hinton and Bengio for things like backpropagation, and ideas around deep learning. That was big news recently… But there was this kind of time period of the 1980’s, up to the mid-1990’s, where things were getting really exciting… And then there was sort of a die-off of interest in these sorts of methods. Some people call this the AI winter.
Yup. One of several.
Yeah. That kind of led up almost to the mid-2000’s. This was a time when these methodologies were known, but the problem was that as these networks got larger and larger, of course, they had more parameters that needed to be fitted, and that required more data and more compute… So there was kind of this lag of the actual data and compute that was needed, and along with that, the adoption that we’ve seen recently. That really kicked into gear maybe in the mid-2000’s and on, where people really had access to a lot of compute, a lot of data, and really were able to plug that into these advanced methods.
Yeah, it really got kicked off by a guy who had been in the field for a while, kind of coming out of this AI winter… By Geoffrey Hinton. He’s one of the legends in this field. He started research, and he continued through that AI winter, while everybody else was turning to other things. I would argue that it was really some of his initial – kind of in this latest wave, since the mid-2000’s, that kind of kicked it off. I credit him with coming out of the AI winter and being at the moment that we’re at now.
Yeah. And you know, recently of course Google has switched from a mobile-first to AI-first approach to their business in general, and that’s kind of sparked a lot of interest from a lot of other industry leaders as well. Pretty much all the big tech companies now, along with a host of startups and smaller companies, have really switched to a focus on AI in terms of research and development and the methodologies that are powering their products.
AI has kind of at this point become a new layer in the software stack, that’s enabling new sorts of functionalities in applications. At the core of most all of those AI systems are neural networks, these things that started back in the ’40s, that were kind of envisioned and built up over time… But the core idea is the neural network.
[00:15:56.23] Now, a lot of people will argue about what AI encompasses, and the sorts of methods that are AI and aren’t AI, and there’s certainly a lot of methods that aren’t just kind of simple neural networks. There’s non-neural network methodologies, there’s a lot of other machine learning type of methodologies… But really the neural network is kind of the core piece that’s powering a bunch of things in industry now, and really is the focus of a lot of the AI research that’s going on… Which is why we’re focusing on them.
That’s true. And I have to say, that was very well said… Because the reality is when you put different people in this field, data scientists and deep learning engineers, and you ask them what AI is, you’re gonna get a lot of different answers.
I was actually at an event where that almost comically demonstrated itself in that way. It was an Adobe event which was a live broadcast on Facebook, and I was one of ten people that came and participated. And there was a lot of stuff we agreed on, but the one thing that all of us had different viewpoints on was exactly what constituted artificial intelligence today. Without delving any further, I just found that fascinating - they introduced us as experts, whether we were or not, but we were positioned in that way, and yet none of us could agree on the basic definition of the field.
Okay, so we’ve talked a little bit about the history of neural networks, and we’ve talked about how they came onto the scene, and really that they’re powering a lot of these big tech innovations… But before we jump into the very specifics of a neural network and what it is, I think it would be useful to just give a real broad definition of supervised learning. There’s a lot of different types of machine learning models out there, some of which are kind of unsupervised, and semi-supervised, but the bulk of models that people get into when they’re first getting into AI and machine learning are supervised machine learning models… And I think that would be a good framework for us to talk about neural networks within.
Yeah. I was just gonna say, when I’m talking about neural networks in an introductory thing, I may allude to some other things that are out there, but supervised learning is definitely the place to start. It’s kind of the basics; you learn the basics, and then you can build on it in a lot of different directions.
Yup. When I’m teaching classes, I normally try to introduce some type of model problem that people can have in the back of their minds. When I’m thinking about supervised learning, you might think about - let’s try to model our sales for the month based on the number of users on our website, or something like that. Now, one way you could do that is by creating a sort of function in code that would take in your number of users and output your sales. Most often, that would include some type of model definition and some parameters.
[00:20:20.19] You might input a number of users and then multiply that by a parameter or a coefficient, and out comes your sales. So that’s a model definition with a parameter. Now, the big thing that separates machine learning functions versus regular code functions is that regular code functions (that definition and parameters) are set by domain knowledge and someone coding them in, whereas in a machine learning context I like to think about those parameters being set by trial and error and an iterative process of looking at a bunch of examples and trying to make predictions for all of those examples, and then fitting or setting those parameters based on this sort of iterative process.
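To make the users-to-sales picture concrete, here is a minimal Python sketch. The example numbers, the candidate grid, and the names `predict_sales`, `total_squared_error` and `fit_by_trial_and_error` are all illustrative assumptions rather than anything from the episode. It contrasts a coefficient typed in by a person with one picked by trying candidates against a handful of examples.

```python
# Hypothetical examples: (number of users, sales for the month).
examples = [(100, 250.0), (200, 500.0), (400, 1000.0)]

def predict_sales(users, coefficient):
    """Model definition: sales = coefficient * users."""
    return coefficient * users

def total_squared_error(coefficient):
    """How far off the model is, summed over all examples."""
    return sum((predict_sales(u, coefficient) - s) ** 2 for u, s in examples)

def fit_by_trial_and_error(candidates):
    """Try each candidate coefficient and keep the one with the least error."""
    return min(candidates, key=total_squared_error)

# Regular code: a person sets the parameter from domain knowledge.
hand_set = 2.0

# Machine learning flavor: the parameter is chosen by looking at examples.
candidates = [i / 10 for i in range(0, 51)]  # 0.0, 0.1, ..., 5.0
learned = fit_by_trial_and_error(candidates)

print(hand_set, learned)
```

Trying every candidate only works for a toy model with one parameter; it is here to show the idea of setting a parameter from examples, not as a practical training method.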
Overall, that’s kind of how I have the picture in my mind. Does that make sense, Chris, or do you have a different view?
No, I think I would see it the same way. I think one easy way to think about it is if you look at solving problems programmatically up until you get to deep learning - in other words, just using more traditional programming - you explicitly are going and giving the program commands on what you’re going to do, and you might think of it in a very simplistic way as lots of “if/then” type statements, lots of case statements, and you’re having to think of all the things… Whereas this way of doing it, where machine learning (the model) is learning what it needs to do is sort of implicit; it’s figuring it out for you, and it’s a different programming paradigm at large, in computer science, beyond just what deep learning is. So if you think of it as the job of training the model is now to go figure out what it needs, as opposed to being told what it needs, it kind of puts you in the right frame for learning this.
I would say some people when we use words like “the computer figures it out” or “the computer learns”, they kind of have this view of like “Oh, I’m gonna go put my laptop in the corner of my office, and then sprinkle some special fairy dust on it, and it’s gonna spontaneously start learning things about the world…”
Fairy dust… Hm.
Yeah… You don’t have some of that laying around in your kitchen, or something?
Yeah, I’ll borrow it from my 7-year-old daughter.
Yeah, maybe. So in reality, there is always a sort of model definition; remember our users-to-sales example - there’s some definition and there’s some number of parameters that parameterize that model definition. It might be a coefficient that we multiply by, or what we call a bias, which is a number that we add onto the definition… But there’s some model definition and those parameters, and what we mean by learning or training isn’t just that our computer is at the right temperature, in the right conditions, and month of the year, and the stars align and it starts learning. It’s that these parameters are set through an iterative process of looking at a bunch of training examples; examples of what the input is, and what the output should be.
There’s a bunch of examples of “There’s this input, and this should be the output. There’s this input, and this should be the output.” There’s a training process, which is just another function written in code, that iteratively looks over all of these examples and fits these parameters such that the model can then make predictions on new examples that it hasn’t seen yet.
[00:24:03.10] So it isn’t that there’s kind of a spontaneous learning that happens; it’s really something much more benign. It’s that there’s a bunch of examples, and computers are good at repetitive tasks. So we just have the computer look at these examples over and over again and tweak these parameters until we get a good set of parameters to parameterize this model definition. Then we can make new predictions. So that first process is called the training process. Then when we make new predictions, that’s called the inference, or prediction process.
In the training process we’ve talked about making the little tweaks, and that’s called error correction. As Daniel was talking about, when we’re in training, we already know what the ground truth is for any given example… And the model is essentially trying to find that with whatever it has learned up to that point. Then it actually says “Okay, I have a result in this training cycle, and then I have the ground truth; there’s a difference in the two, and I’m going to use an error correction algorithm to say ‘What should I do? What tweaking should I do in this case, when my result isn’t what I know to be the ground truth in the training set?’” So it is an algorithm that is driving that tweaking, but it is able to use that algorithm based on the data it’s come upon on that particular cycle.
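The training and inference phases described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the example data, the learning rate, the number of passes, and the names `predict` and `train` are all made up for the sketch, and the update rule is plain gradient descent on the squared error, which is one common choice of error correction rather than the only one.

```python
# Hypothetical training examples: (users in thousands, sales in thousands).
training_examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

def predict(x, weight, bias):
    """Model definition: output = weight * x + bias."""
    return weight * x + bias

def train(examples, learning_rate=0.01, epochs=5000):
    """Training: compare predictions to ground truth and nudge the parameters."""
    weight, bias = 0.0, 0.0  # arbitrary starting guess
    for _ in range(epochs):
        for x, truth in examples:
            error = predict(x, weight, bias) - truth  # delta from ground truth
            # Error correction: move each parameter a little in the
            # direction that shrinks the squared error for this example.
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias

weight, bias = train(training_examples)

# Inference: use the fitted parameters on an input not seen in training.
print(predict(5.0, weight, bias))
```

Because the made-up examples follow sales = 2 * users + 1, the fitted weight and bias should end up close to 2 and 1, and the prediction for 5.0 close to 11.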
Let’s maybe make this a little bit more concrete now. We’ve talked about supervised learning in general, and that there’s this definition and parameters that are set… So what does that look like for a neural network?
In a neural network there’s these kind of sub-units – I have an overall definition, and then I have a bunch of sub-definitions within that. Or you could think about it, if you’re a programmer, like a function that calls a bunch of sub-functions underneath it… And these sub-units or sub-definitions are called neurons.
Each of these neurons has its own inputs and outputs, with its own definition and its own set of parameters… And these parameters for the neuron are often called weights and biases. Again, you can kind of think of my overall definition of my model, containing a bunch of these sub-definitions of neurons that are linked together in some way, and together that assembly of neurons makes up what’s called a neural network architecture.
That architecture just means there’s a bunch of these sub-units. Each of them has a definition and some parameters that can be set. Now, there’s a lot of different ways that you can set up those neurons. Maybe we should look at a common way to set up all these neurons.
Yeah, a fully connected feedforward is a good starting point.
Yeah. Do you wanna start there, Chris?
Sure. Daniel was just talking about these units of neurons, and if you wanna paint a picture in your mind as you listen, you could think of each one of those – the way they’re usually depicted graphically is as a little circle; you can think of it as a little circle that has some stuff inside it, which we’ll talk about in a moment. But you take each of those circles and you line a few circles up into a row. So you have a row of circles, and then at that point you line up a second row, and maybe a third row… So there’s some number of rows, any number of rows that you have there; and there’s some special relationships between each of those layers.
For every neuron in that first layer, it is connected to each of the neurons in the second layer, but to none of the neurons in its own layer. In that second layer you recreate that. So each neuron in a given layer is connected to all the neurons in the previous layer, and all the neurons in the next layer, but none of the neurons in its own layer.
[00:28:08.24] So you can kind of envision this mesh of rows of little circles in that way, and you start from one side to go in as an input, and then you come out the other side. That is the basic image in your mind of how you might think about a fully connected feedforward network.
I’ll note one other thing - these shows where Daniel and I talk about topics on our own, without a guest, you may have noticed that they’re called Fully Connected episodes - this is what we’re referring to. It’s named after this.
You mentioned that each of these nodes or neurons is fully connected in this network, and each one has its own inputs and outputs. Now, if we dig into one of these neurons to think about what’s inside of that bubble - and again, you can think about that visually, like a bubble or a node, or if you’re a programmer you might think about it as one of these sub-functions under a big function… But it has its own inputs and outputs, and if we think about it maybe as just having a couple inputs - let’s say X1 and X2 - what happens inside of that circle, or inside of that neuron? Well, there’s some kind of simple things that happen often.
One way we could think about processing these inputs in the neuron is to just add them up. In a linear regression sort of way, we could multiply each of my inputs - X1 and X2 - by a couple coefficients. Let’s say W1 and W2. Those are often called weights. So I just add up the two things after I multiply them by coefficients, and then I might add in an intercept or a constant, so just W1 times X1 plus W2 times X2 plus something; that something is often called a bias.
In this case, I would have three parameters that parameterize the way I’m combining these inputs. So when each of my X1 and X2 come in, I combine them together in this way. And that’s all good and fine, except most relationships in our world aren’t linear, so it might be good to introduce some non-linearity into this combination. That’s where a thing called an activation function comes in… Which is just a non-linear function that’s kind of applied to this combination of inputs to give it some non-linearity.
Common functions that are used are sigmoid, or ReLU, or hyperbolic tangent etc. that are applied to this combination of inputs. So when my inputs come into that node or that circle, they’re just added up in a special way, passed through that activation function, and then output out the other end. All of my neurons in my network kind of do similar sorts of simple operations that are parameterized in a similar way. So each neuron has a certain number of inputs. They’re combined together using some parameters, and then output as a number. That’s what each neuron does.
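What happens inside a single neuron can be written as one small Python function. This is a sketch with illustrative values; the function name `neuron` and the specific numbers for the inputs, weights and bias are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def neuron(x1, x2, w1, w2, bias, activation):
    """Weighted sum of two inputs plus a bias, passed through an activation."""
    z = w1 * x1 + w2 * x2 + bias  # the linear combination (three parameters)
    return activation(z)          # the non-linearity

# Illustrative values only.
x1, x2 = 1.0, 2.0
w1, w2, bias = 0.5, -1.0, 0.25

print(neuron(x1, x2, w1, w2, bias, sigmoid))
print(neuron(x1, x2, w1, w2, bias, relu))
print(neuron(x1, x2, w1, w2, bias, math.tanh))
```

The same weighted sum goes into each call; only the activation function changes, which is why the three outputs differ.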
That was a very good explanation. So as these inputs start flowing through these layers, and they’re doing this concurrently - so the inputs come in, it hits all the neurons in that first layer, simultaneously everything that Daniel was just talking about happens in each of those neurons in that first layer, and as they go through their transfer function that adds the non-linearity and then they go out, the output of each one of those neurons in that first layer goes to all of the neurons in the second layer, and it’s combined. Remember, since they’re fully connected, there’s lots of inputs potentially coming in, and they’re all summed up again in each neuron, just the way Daniel described.
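The layer-by-layer flow just described can be sketched as a forward pass in plain Python. The layer sizes, weights and biases below are hypothetical, as are the names `layer_forward` and `network_forward`, and sigmoid is picked as the activation only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: every neuron sees every input."""
    outputs = []
    for neuron_weights, neuron_bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + neuron_bias
        outputs.append(sigmoid(z))
    return outputs

def network_forward(inputs, layers):
    """Feed the outputs of each layer into the next, layer by layer."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical 2-input network: a layer of 3 neurons, then a layer of 1 neuron.
layers = [
    ([[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.8, 0.9]], [0.05]),
]

print(network_forward([1.0, 2.0], layers))
```

The loop visits the neurons of a layer one after another for readability; since no neuron in a layer depends on another neuron in the same layer, the same work could be done concurrently.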
[00:31:56.27] So this happens in concurrency at each layer, and it goes through layer by layer by layer, till you get to an output. Then you discover at that point, while you’re going through this training process, that you have some values coming out; you compare that against what you know to be the ground truth, that’s in your training dataset. You know what the result is while you’re trying to train. That’s when your error correction comes in, where you have to say “Okay, well I’ve ended up with an output, and it’s not quite what I was hoping it would be, so I need to change the values throughout the architecture.” The initial thing that most people learn and is most widely used is called backpropagation; that’s where you work your way back through the layers, through a set of algorithms that make little tweaks all the way through your layers, and then hey, guess what - you’ve done one full cycle and it’s time to go to the next row of your data to train that. And you do that whole process over and over again.
Yup. You might think about – if you’re trying to set these weights and biases manually, as a human what we would do is try to make an initial choice for them, try to make some predictions and then see if our predictions were good or bad, and kind of adjust the parameters accordingly… And then just do that over and over. That’s what the computer is doing - essentially, a bunch of trial and error. It’s making some predictions, and of course, there’s more sophisticated ways of updating the weights and biases rather than just kind of randomly making choices for updates, and that’s where this gradient descent comes in… But essentially, we’re just making those corrections.
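For readers who want to see one full cycle, here is a minimal sketch of backpropagation with a gradient descent update on a tiny network. Everything specific is an assumption for illustration: the 2-2-1 layout, the sigmoid activations, the half squared error loss, the starting parameters, the single training example, and the learning rate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Forward pass: 2 inputs -> 2 sigmoid hidden neurons -> 1 sigmoid output."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def loss(x, target, params):
    """Half the squared difference between the output and the ground truth."""
    _, output = forward(x, params)
    return 0.5 * (output - target) ** 2

def backward(x, target, params):
    """Backpropagation: push the output error back through each layer."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(x, params)
    # Error signal at the output neuron (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    grad_w_out = [delta_out * h for h in hidden]
    grad_b_out = delta_out
    # Error signal at each hidden neuron, passed back through w_out.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    grad_b_hidden = delta_hidden
    return grad_w_hidden, grad_b_hidden, grad_w_out, grad_b_out

def update(params, grads, learning_rate):
    """Gradient descent: nudge every weight and bias against its gradient."""
    w_hidden, b_hidden, w_out, b_out = params
    g_wh, g_bh, g_wo, g_bo = grads
    return (
        [[w - learning_rate * g for w, g in zip(ws, gs)]
         for ws, gs in zip(w_hidden, g_wh)],
        [b - learning_rate * g for b, g in zip(b_hidden, g_bh)],
        [w - learning_rate * g for w, g in zip(w_out, g_wo)],
        b_out - learning_rate * g_bo,
    )

# Illustrative starting parameters and a single training example.
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.5, -0.5], 0.0)
x, target = [1.0, 0.0], 1.0

before = loss(x, target, params)
params_after = update(params, backward(x, target, params), learning_rate=0.5)
after = loss(x, target, params_after)
print(before, after)
```

One cycle is a forward pass, a comparison with the ground truth, a backward pass that assigns each parameter its share of the error, and a small update. Training repeats that cycle over the examples many times.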
Now, I think an interesting thing to add in here is we’re always talking about models; we have a neural network model, we have this type of model… So here we’ve talked about the definition of the neural network, we’ve talked about all of the parameters that need to be fit for this neural network, we’ve talked about the training process that trains or fits all of these parameters, and then we’ve talked about the inference or prediction phase in which we use all of that to make predictions.
I’m curious, in your mind, Chris, what do you consider the model? What is the model in your mind amongst all of that?
The way I would think of a model is when you start out with these layers of neurons lined up - and we’re talking about the simplest use case, obviously; you can add a lot of different complexity to this over time, to achieve different architectures, and there are many architectures… When someone talks about a model though, I typically think of a trained architecture. If you think of a fully connected feedforward architecture as being something you’re training, when it gets done it has a purpose; its purpose is to make inferences about a particular set of inputs to give you an output, and that’s what I would call a trained model. It’s the architecture at work, that is deployable.
One thing that we did mention briefly is when you’re training a model, how do you know when you’ve gotten there? I just wanted to note that it’s arbitrary based on your use case, in that we’ve been talking about the fact that when you get to the end of each cycle in training, you have some sort of delta between what you have and what you know to be the truth. So that is an error that you have there; it’s a degree of error, and you have to decide for your use case how much error can you tolerate. If you can tolerate more error, because it’s not a very critical need, and if it happened to be wrong it might not be a terrible thing, then you can probably achieve training quicker and deploy. If it’s a life and death thing, and it has to be extremely accurate, then you need a very small error in your final product, and therefore you may take quite a bit more training to achieve that. I just wanted to note that’s how you know when your training is over - you’ve achieved an acceptable level of error for your use case.
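The idea of stopping once the error is acceptable for the use case can be sketched as a training loop with a tolerance. The data, the learning rate, the tolerance values and the name `train_until` are hypothetical choices for this illustration.

```python
# Hypothetical examples that follow output = 3 * input exactly.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

def mean_squared_error(weight):
    return sum((weight * x - y) ** 2 for x, y in examples) / len(examples)

def train_until(tolerance, learning_rate=0.01, max_epochs=10_000):
    """Run gradient descent until the error is acceptable for the use case."""
    weight = 0.0
    for epoch in range(max_epochs):
        if mean_squared_error(weight) <= tolerance:
            return weight, epoch  # good enough: stop training
        gradient = sum(2 * (weight * x - y) * x for x, y in examples) / len(examples)
        weight -= learning_rate * gradient
    return weight, max_epochs

_, epochs_loose = train_until(tolerance=1.0)    # error-tolerant use case
_, epochs_strict = train_until(tolerance=1e-6)  # high-stakes use case
print(epochs_loose, epochs_strict)
```

The stricter tolerance takes more passes over the data before the loop stops, which mirrors the point that a more demanding use case may need quite a bit more training.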
[00:36:15.12] Chris, you were just talking about the acceptable level of error with a neural network, and I think something that needs to be understood here is that these nodes or these neurons can be assembled in all sorts of – from simple to very complicated ways. You could have layer after layer of these that might be fully connected or might not be fully connected, but as soon as you start adding these things up or assembling them in all sorts of complicated ways, which is really what’s done in deep learning, then you start accumulating a ton of parameters.
In some of these recent models - let’s say transformer models - that have come out for language, there are millions (in fact, hundreds of millions) of parameters that need to be set… So when you’re thinking about the compute and the data that’s needed to actually train these models, or fit all of those parameters, now you can kind of understand why a lot of data and a lot of compute are needed… Because you can’t have like 300 million parameters and then 2,000 training examples and call it good, and say that’s gonna set all of your parameters. You have to have a significant amount of data for you to be able to learn the complicated patterns and fit all of these parameters. So a ton of compute and a ton of data is needed.
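To get a feel for how quickly parameters pile up, here is a small Python sketch that counts weights and biases in a plain fully connected feedforward network. The function name `count_parameters` and the layer sizes are illustrative assumptions, and the count applies only to this simple architecture, not to the transformer models mentioned above.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected feedforward network.

    layer_sizes lists the number of units per layer, inputs first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per neuron
    return total

# Hypothetical architectures, small to large.
print(count_parameters([2, 1]))                # 3: a single neuron with two inputs
print(count_parameters([2, 3, 1]))             # 13
print(count_parameters([784, 512, 512, 10]))   # 669706
```

Even this modest three-layer example has well over half a million parameters to fit, which gives some intuition for why a few thousand training examples would not go far on a model with hundreds of millions.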
Absolutely. I think calling out the scale that you’re talking about there is important, because it is a distinguishing factor between this particular tool in data science and other tools that we’ve all worked with previously. And people say “Okay, I understand that”, and then shortly upon coming into the field you learn that there is special hardware used for the computation… And people have often asked me why is that - GPU, and stuff like that.
That is because of the computations involved. They are not actually complex; it’s an operation from linear algebra called matrix multiplication. But as Daniel just pointed out when he was talking about the scale of the parameters, you might have very large architectures with many, many neurons that are all concurrently doing these mathematical operations, so it lends itself to efficiency to have hardware that is able to do this type of computation much faster than the hardware that came before.
That’s why you hear about GPUs and TPUs, versus something that we may have all grown up with, which was the CPU driving our laptops and stuff. This hardware enables the mathematical operations that have to happen at such scale. The fact that you have that relationship there really distinguishes this particular data science toolbox from others.
And makes it expensive, in some cases.
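The matrix multiplication mentioned above is simple to write down. The sketch below, in plain Python with made-up numbers, computes the weighted sums for one layer: each row of `inputs` is one example and each column of `weights` is one neuron. Bias terms and activation functions are left out to keep the focus on the multiplication, and `matmul` is a hypothetical helper written for this example.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


# Two examples with three features each (illustrative values).
inputs = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]

# Three input features feeding two neurons (illustrative values).
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, 1.0]]

layer_output = matmul(inputs, weights)
print(layer_output)  # [[4.0, 5.0], [10.0, 11.0]]
```

Every entry of the result is an independent sum of products, so none of them has to wait for another. That independence is what hardware built for parallel arithmetic can exploit.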
Yeah, so we’ve talked about the neurons, we’ve talked about architectures or combinations of these neurons, we’ve talked about what it takes to fit all of these parameters of the neurons, but we haven’t actually got to maybe what’s the most important point, which is why do neural networks work?
If you think about what we’ve done, it’s somewhat arbitrary in some ways, in the sense that we’ve just put a bunch of functions all together in a row that combine things over and over. That’s kind of simplifying things, but it’s really what we’re doing. There’s inputs, and those are fed through a bunch of things that combine them over and over, and then output something that combines that output over and over. Why does that sort of thing work?
[00:40:15.21] The way I like to think about it - and I’m curious about how you think about it, and I know there’s more formalisms we can put around it, but… The way I like to think about it is if I have a relationship let’s say between some input and output… I’m thinking of, again, the users and sales example - that might be a fairly simple relationship. It might just be a proportional one, that I can model via one or two parameters. And I just put that in, and there’s a simple relationship there.
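As a sketch of what modeling a simple relationship with one or two parameters can look like, the snippet below fits a straight line by ordinary least squares. The user and sales figures are invented for illustration, and `fit_line` is a hypothetical helper, not something from the episode.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (two parameters)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: sales happen to equal 2 * users + 50.
users = [100.0, 200.0, 300.0, 400.0]
sales = [250.0, 450.0, 650.0, 850.0]

slope, intercept = fit_line(users, sales)
print(slope, intercept)
```

Two numbers describe the whole relationship here, which is the contrast with the much larger parameter counts a neural network uses.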
But there’s a lot more complicated relationships in our world, like if I’m trying to detect a face in an image - there’s a lot of important things there, from color, to edges, to certain features of the face… And it’s really hard for me to write down a definition using my own domain knowledge that kind of is the definition of a model of a face. And so the way I think about neural networks is kind of just saying “Well, okay, I’m not even gonna try to write down this domain knowledge definition. What I’m gonna do is make my model definition as complicated as it needs to be, such that whatever the relationships are between my input and output, whatever those happen to be, I’m able to account for those complexities, because my model is parameterized in such a complex way.”
This takes some of the burden off of the programmer/domain expert and really puts it on the computer in terms of computation and data… Because all the assumption I’m making is that there is a relationship between my input and output, and if my definition is complicated enough, I am gonna be able to parameterize it to model that.
That is actually a great explanation. I really like how you said that. I think it differentiates from a number of other approaches one might take. When we are using neural networks to solve really complex problems, there’s a balancing act that we’re trying to do. The bigger the architecture, the more computation you’re introducing into it by default. But you need it, and there’s actually mathematically the ability – you can have a feedforward network with a single hidden layer… And since we haven’t specifically mentioned the word “hidden” - think about this neural network architecture that we talked about, and you had that input layer of neurons, and then the second layer only takes the output from the first layer and then it passes its output to a third layer, which is your output.
So you have an input, a hidden and an output, and there is a mathematical result called the universal approximation theorem (you can go look it up on Wikipedia), which notes that a feedforward network with a single hidden layer, containing a finite number of neurons, can approximate continuous functions on a bounded input range as closely as you like, given enough neurons and a suitable activation function. That sounds like not a very impressive statement to make, but I think it’s pretty amazing in that it’s saying you can approximate all sorts of different functions out there. And I think that’s really important, because it lends itself to why this is so powerful.
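One way to see the idea behind the theorem is to build a single-hidden-layer network by hand and watch the approximation improve as neurons are added. The sketch below is an illustration, not a proof. It assumes ReLU activations, uses the sine function on the range 0 to pi as the target, and sets the weights directly by piecewise-linear interpolation rather than by training. The names `build_relu_network` and `max_error` are made up for this example.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_network(f, lo, hi, n_hidden):
    """Hand-construct a one-hidden-layer ReLU network that interpolates f on [lo, hi]."""
    step = (hi - lo) / n_hidden
    knots = [lo + i * step for i in range(n_hidden + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(n_hidden)]
    # Each hidden neuron computes relu(x - knot); its output weight is the
    # change in slope that the piecewise-linear curve needs at that knot.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    biases = [-k for k in knots[:n_hidden]]
    out_bias = f(lo)

    def network(x):
        return out_bias + sum(w * relu(x + b) for w, b in zip(out_weights, biases))

    return network


def max_error(f, network, lo, hi, samples=1000):
    """Largest absolute gap between f and the network on an evenly spaced grid."""
    grid = (lo + (hi - lo) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - network(x)) for x in grid)


errors = {n: max_error(math.sin, build_relu_network(math.sin, 0.0, math.pi, n), 0.0, math.pi)
          for n in (4, 16, 64)}
for n, err in errors.items():
    print(n, err)
```

With more hidden neurons the worst-case gap shrinks. The theorem says a good set of weights exists; it does not say how many neurons are needed or how hard those weights are to find by training.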
Going back to it, you said a moment ago, Daniel - you mentioned the fact that it will add complexity, because if you have a really complex function, that one hidden layer might eventually get there, but it may be unreasonable in terms of the time it takes to train it to get there, to get within what is an allowable error for you… So the way we get around that is we either add more neurons, or we add more layers to it.
[00:43:59.16] So we deliberately add complexity before we know what the solution is… And in doing that, it gives this matrix multiplication a lot more options on finding all those little things; “Here’s a line, and here’s a line with another line that creates a shape”, and lo and behold, it turns into part of the face eventually, or something… So by having these layers, that complexity allows you to pick apart pieces of it and do it.
So you’re balancing “How big of a network do I want for computational expense?” versus “What does my problem require?” When you get into this field, you learn that you have to balance that, and then obviously you have various architectures that lend themselves to solving particular problems.
Speaking of getting into this field, I think maybe with the few minutes that we have left here, Chris, it might be good to just talk about, you know, if you’re getting into this field, or if you’ve done some tutorials with neural networks but you don’t have this fundamental understanding of how they operate, how can you get some of that intuition about how neural networks operate?
I know one of the things that I did in the past that was really helpful for me was implementing a simple feedforward neural network from scratch. I did that just for the iris classification problem, which is a very well-defined classic problem in machine learning, where you’re trying to classify types of iris flowers based on the measurements of their petals and sepals. I did this in the Go language, because I was also interested in that, and using that… But I think whatever language you use, it doesn’t really matter; implementing each of these components - the neuron, the activation function, this loop of training - is really useful to gain a fundamental understanding.
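For a sense of scale, a from-scratch network with those three components (neurons, an activation function, and a training loop) fits in a few dozen lines. The sketch below is not the Go implementation mentioned above. It is a hypothetical Python version that assumes sigmoid activations, squared error, and the tiny XOR problem in place of the iris data. The class name, layer sizes, seed, and learning rate are all illustrative choices.

```python
import math
import random


def sigmoid(z):
    """Activation function: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """One hidden layer of sigmoid neurons feeding a single sigmoid output neuron."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        # Each neuron: weighted sum of its inputs plus a bias, then the activation.
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, output

    def train_step(self, x, target, lr):
        hidden, output = self.forward(x)
        # Gradient of 0.5 * (output - target)^2 with respect to the output's weighted sum.
        delta_out = (output - target) * output * (1.0 - output)
        for j, h in enumerate(hidden):
            # Backpropagate through the output weight before changing it.
            delta_h = delta_out * self.w_out[j] * h * (1.0 - h)
            self.w_out[j] -= lr * delta_out * h
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= lr * delta_h * xi
            self.b_hidden[j] -= lr * delta_h
        self.b_out -= lr * delta_out


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# XOR: a small problem that a single neuron cannot solve on its own.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

net = TinyNetwork(n_inputs=2, n_hidden=4)
initial_loss = mean_squared_error(net, data)
for epoch in range(10_000):          # the training loop
    for x, target in data:
        net.train_step(x, target, lr=0.5)
final_loss = mean_squared_error(net, data)
print(initial_loss, final_loss)
```

How low the final error gets depends on the seed, the learning rate, and the number of epochs, so treat the printed numbers as something to experiment with rather than a fixed result.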
If that’s kind of intimidating to you, I might recommend the great book from Joel Grus called Data Science From Scratch. He just released a second edition of that book, and he added in a bunch of things about neural networks, deep learning, recurrent neural networks… But in that book he kind of walks you through some implementations of neural networks from scratch, using Python.
I think that’s a really great way to gain this fundamental intuition, and something that I think would be even good for me to do occasionally in different languages or in different ways to help me keep that intuition.
Yeah, and not only that - there’s so many approaches… I really think it’s such a great time to get into this field right now… I won’t call it mature, but it has matured a lot in the last few years; back when you and I were first looking at it, incidentally, that’s what I did as well, in the same programming language. I created a toy neural network in Go, just to make sure that I understood where I wanted to start from and all the pieces made sense to me. It was more of a science experiment kind of thing… Before moving into frameworks, which is where the real action is.
[00:47:22.00] But there is a lot of learning material, so if you’re into books, there’s all sorts of different books. There’s the Deep Learning textbook, which was written by several of the luminaries in the field. You’ve gotta love your math if you wanna jump into that one; if you’re very comfortable with your linear algebra and your calculus, then that’s a great place to go. If you’re not, then it’s a good reference to try to work toward, but you might wanna find some books that cater to whatever your knowledge level is.
Also, there’s a whole bunch of really fantastic courses online - Coursera has them, Microsoft, Google… There’s a bunch out there. So whatever your approach to learning is, however you consume new information best, I can almost guarantee there’s a high-value way of doing that, one that you can tailor to yourself. I know that didn’t really exist when we were doing ours originally, but in the last 2-3 years it’s just exploded.
Yeah, there’s great online resources. I really like the Machine Learning Crash Course from Google. There’s of course the fast.ai material that’s all online, that people love… So it’s a great time to get into the field, and hopefully this has given you a sense of what neural networks are, or given you a refresher on that, to really encourage you that we can get some intuition about what’s going on under the hood here, and that’s not too far away from you; it’s within reach, so if you have a passion about this stuff, get involved, dive into some resources. Let us know if you need help finding those resources.
I’m just excited about the next 50 episodes that we get to dive into more about AI, Chris.
I am, too. I hope people listening out there will join us in the various communities. We’re on Slack, we’re on LinkedIn, we’re on Twitter, and we really do have a lot of great conversations. As we look toward the next 50 episodes, we really want your input - what do you wanna hear about, who do you wanna hear from, what topics are of interest to you? We really wanna build the next 50 episodes around you.
Yup. And congrats again, Chris. Great to be doing this with you, and looking forward to the future episodes. We’ll see you next week.
See you next week. Thank you.
Our transcripts are open source on GitHub. Improvements are welcome. 💚