The Changelog – Episode #439
Elixir meets machine learning
with José Valim
This week Elixir creator José Valim joins Jerod and Practical AI’s Daniel Whitenack to discuss Numerical Elixir, his new project that’s bringing Elixir into the world of machine learning. We discuss why José chose this as his next direction, the team’s layered approach, influences and collaborators on this effort, and their awesome collaborative notebook project that’s built on Phoenix LiveView.
Alright, I’m joined by José Valim, creator of Elixir and frequent guest on the Changelog. I think this is your fourth time on the show. Welcome back.
Thank you. Thanks for having me again.
Excited to have you. Lots of interesting stuff going on in your neck of the woods. And I’m also joined by – hey, that’s not Adam. That is Practical AI co-host, Daniel Whitenack. What’s up?
Yeah… Practical AI sometimes, with the font, on Zoom, it looks like Practical AL… So when we record on our podcast, normally I’m known as Practical AL.
Well, welcome to the show. I have a Tool Time reference. You’ll be my Al Bundy for this show. But that would be too old for most people to get that one. Or you can be my Adam, I’ll be your Chris Benson, and we’ll co-host this sucker, how about that?
That sounds wonderful. I’m excited to be here.
Well, I had to call in the big guns, because I know very little about this space. In fact, everything I know about the world of artificial intelligence, I learned from producing Practical AI, and by listening to Practical AL do his thing each and every week… So that’s why Daniel is here.
I do know a thing or two about Elixir, but nowhere near as much as José… And here we’re at the intersection of those two worlds, so kind of an exciting time. We’re here first to talk about Nx. So José, what is this Nx thing you’re here to tell us about?
Alright, so Nx stands for Numerical Elixir. Back in November last year we started working on this. I can tell more about the story later… But the important thing is that in February we finally unveiled Nx, which is a library, but also this idea of a collection of libraries to improve Elixir, so we can start doing machine learning, data science, numerical computing, and so on.
[03:49] So I’ll just give an overview of what we have out so far, so everybody is on the same page, and then we’ll expand on that. So we started with Nx, which is the idea and the library itself, and the main abstraction in Nx, as you would expect, is multi-dimensional tensors. When I announced Nx, one of the things that I did was give a talk, and in this talk I built an MNIST classifier, a neural network classifier for the MNIST dataset, from scratch, just using Nx. So you can work with multi-dimensional arrays, tensors, and for those who are not familiar with why multi-dimensional arrays and tensors matter, one simple example I like to give is an image: if you need a data structure to represent that image, you can represent it with a tensor, and it’s going to be a three-dimensional tensor where one of the dimensions is going to be the height, the other is going to be the width, and the third dimension is for the channels, like RGB and so on. And then if you can represent the data like this, you’re going to send this tensor through neural networks, and at the end it’s going to tell you “Hey, is this a dog or a cat?” Or more complex things.
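To make the image example concrete, here is a minimal sketch in plain Python rather than Nx. The names `image` and `shape` are invented for this illustration, and nested lists stand in for a real tensor type.

```python
# A tiny 2x3 RGB image as nested lists: height x width x channels.
# (Illustrative only; a real tensor library stores this far more compactly.)
image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],
]


def shape(tensor):
    """Return the dimensions of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


height, width, channels = shape(image)
bottom_right_pixel = image[1][2]  # row 1, column 2, all three channels
```

Indexing by row, then column, then channel is what makes the structure three-dimensional; a batch of such images would simply add a fourth, leading dimension.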
So that’s where we started. That was the first building block that we built. And one of the things that people ask a lot is that, you know, Elixir is a functional programming language, and functional programming languages promote immutability a lot, which means if you have a multi-dimensional tensor, like if you have a 15 MB image and you need to do something with it, you need to transform this image, each transformation that you do is going to copy the whole image in memory, and do a new copy. So you’re allocating 15 MB every step of the way.
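The cost described here can be sketched in plain Python, assuming an immutable tuple stands in for the image buffer. `brighten` is a hypothetical transformation made up for this example, not an Nx function.

```python
def brighten(pixels, amount):
    """Return a NEW tuple of pixel values; the input is never modified."""
    return tuple(min(255, p + amount) for p in pixels)


original = tuple(range(0, 250, 10))  # stand-in for a large image buffer
step1 = brighten(original, 5)        # allocates a full copy
step2 = brighten(step1, 5)           # allocates another full copy

# Every intermediate result is a separate object of the same size,
# which is the per-step allocation a naive immutable pipeline pays.
distinct_buffers = len({id(original), id(step1), id(step2)})
```

With a 15 MB image, each of those steps would allocate roughly another 15 MB, which is the problem numerical definitions are meant to sidestep.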
So to solve this, what we did – and this is an idea that we’ve seen elsewhere. For example, in the Python community we have Jax, so a lot of the inspirations in Nx come from Jax… So the way we solved this in Nx is that we have this thing called numerical definitions. And what numerical definitions are is that they are a subset of Elixir that can compile and is guaranteed to run on the GPU. That’s how we can have numerical computing in Elixir, and machine learning, and neural networks, because we can effectively look at the Elixir code and say “Hey, I’m going to get all of this compiled to run on the GPU, and it’s going to be really fast.”
So those are the two building blocks. We can come back to this and talk a lot about those things later. And then we released two bindings for Nx. One is EXLA. EXLA is a binding for Google XLA, which stands for Accelerated Linear Algebra. So if you’re using TensorFlow, what is running things under TensorFlow is Google XLA; they’re using Google XLA to compile code to run on the GPU or on the CPU as efficiently as possible… So we have bindings for that. We are also now working on bindings for PyTorch; to be more precise, LibTorch. So PyTorch, from Facebook, has LibTorch, which is the C++ library. We are wrapping that as well.
And two months later - so that was in February - we released two other libraries. One is Axon – so we started with the building blocks, which were tensors, multi-dimensional arrays, numerical definitions… So we released Axon, which is a high-level library for building neural networks… And we just announced Livebook too, which is interactive and collaborative code notebooks for Elixir. So that’s kind of what we have released in the last two months, and it’s just the beginning; there are still a lot of things we wanna do… But we are really starting to work on this ecosystem and building it up.
So José, I’m curious, from the AI perspective - and I’m going to have to admit, for listeners, that I know almost nothing about Elixir, except what I’ve learned on the Changelog Podcast from you, in previous episodes… So I’m curious - from the community standpoint, what was really driving your motivation to spend so much time on these things? And we can dig into the individual components, but like you’re saying, the main components that I think can make this very functional, it sounds like, are there, and are being built… But from the community standpoint, were people requesting this? Were people trying to roll their own neural network stuff in Elixir? From your perspective, what led up to that side of things?
[08:27] That’s a great question. To give some context - one of the things… Like, going way, way back, it always started because of the Erlang Virtual Machine; the only reason that Elixir as a programming language exists is because of the Erlang Virtual Machine… And the Erlang Virtual Machine was built by Ericsson, which is a telecommunication company, for building concurrent, distributed and fault-tolerant software. I don’t want to expand on that; you can check the Elixir website… But all of these things, my love for the Erlang Virtual Machine – so when I created Elixir, I was like “I want to have as many people as possible building on this platform, because I love it, and I think other people are really going to love it and enjoy it, too.”
So I’ve created Elixir, and I’ve always thought – in terms of programming languages, I really think that Python is a really stellar example of tackling a bunch of different problems. I always had in mind that I want that for Elixir and for the Erlang Virtual Machine, for the Erlang ecosystem. I think we can grow diverse to solve all the different kinds of problems. So I come from a web background; I was a member of the Rails core team almost a lifetime ago… When I started with Elixir, I had this obvious web background, and that was the first dimension in which Elixir took off, with the Phoenix web framework… People started using Elixir more and more for the web. Elixir was already a good natural fit for building distributed systems, or anything regarding the network, due to the Erlang heritage… But it was like, “I’ve always wanted to try to expand this.”
The first time I expanded this was back in 2016 we released abstractions for data pipelines and data ingestion. So if you need to consume queues, and you need to do that very efficiently, we released libraries for that, and that brought Elixir to a new domain, which was like data processing, and there are some very nice use cases on our website… For example how change.org is using data abstractions that we wrote back then to process – because if you have a petition that one million people signed, you need to send them an update; now you have to send an email to a million people. How are you going to do that?
So we started that segment, and then the community started to grow, so people started bringing Elixir and the Erlang Virtual Machine to embedded. So there is the Nerves framework; people started bringing that to [unintelligible 00:10:53.25] streaming… And then there’s always the question, “Why not numerical computing? Why not machine learning?” So I always had this interest; I feel like it’s part of my responsibility, part of my job to try to broaden the domains and the areas of the language. The community is also doing that a lot for a bunch of areas… But you know, if there is something where I feel like “Hey, this is a good opportunity. We can do it”, then why not? Let’s do it.
And this all started - just to finish giving more context - when PragProg… I always had this interest. Actually, my thesis, my master’s thesis, was in task classification. But that was 11 years ago… So we were not talking about deep learning at the time yet; I think support-vector machines were still kind of the state of the art. I never went back, but I always had this interest.
[11:49] So in October last year, PragProg announced a book, which is “Genetic Algorithms in Elixir.” And then I was like, “Hey, apparently there is somebody who knows things about AI and machine learning in the Elixir community”, and he is Sean Moriarity. I sent him an email and I was like “Hey, I think the platform could be good for us to do everything in machine learning”, and he said “I agree. Let’s work on it.” And we started working on it. So it’s kind of like “Why not? If we can make it happen, let’s make it happen. Let’s build this, and then later we will continue working on how to package and how to sell this to people and say like ‘Hey, what are the benefits of having those two worlds joined together and working together?’”
So if we stay big-picture but we do a bit of a comparison, trying to understand exactly your aim here… If I was a happy NumPy/PyTorch, that Python data scientist kind of a person, are you hoping that maybe someday the Nx-based and Elixir-based tooling would draw me over to Elixir? Are there aspects where it’s gonna be better positioned than Python? Or are you more just saying “Well, let’s bring this area of computing to existing Elixirists” and hope to give them more tools? Or are you also thinking from the other direction?
Honestly, I never tried to look that far ahead. For me, my goal right now is that for example – imagine you are building an application in Elixir and then you need to do something with machine learning or data science, and you’re like “Oh, I need to go to Python to solve this problem.” If we have the tooling so you don’t have to go there, and you can stay within the community, I would already consider that a tremendous victory, just because that was not an option in the past. So if people there are starting to make this choice, I would already be very happy, and I would be like “Mission accomplished.”
And then we’ll see. Baby steps.
Daniel, what tools do you use in your day-to-day work?
Yeah, I like the framing of how you’ve just framed it, José… Because actually, my team’s toolset - we develop models in Python using TensorFlow and PyTorch, but typically, in terms of the products that we’re building, or what we’re developing - we’re developing either API servers, or something, and for the most part we’re doing that in Go. So a lot of times what happens is exactly what you were saying. So we’re happy writing our API handlers in Go, and everything’s nice and wonderful, and then we basically just have to call into Python to do an inference, potentially.
Now there’s new stuff coming onto the scene in the Go community as well to try to support that same sort of workflow, where – like, I would love to not do that. If I was working in Go and I didn’t have to call into Python, that would be super-cool. And I think that’s still developing.
So I totally get what you’re saying - if you’re working in Elixir, then it would be great for those developers to not have to do this sort of awkward call into Python for inferencing. It’s awkward, and always managing that and monitoring it and all of that is sort of dicey… Also though, I think that there is this sense in the Python community - well, I’ll say the AI community - that Python’s sort of consumed the whole world… But I don’t think that’s necessarily because of any one particularly good reason why it should consume that whole world… Because it’s kind of like all these scientists or grad students working on computational science and working on AI - they’re like “Well, all our stuff that our advisor wrote is in Fortran. I don’t wanna write Fortran, so I’m gonna write this Python stuff that wraps around my Fortran…” and then people just start writing Python a lot, because it’s pretty easy to get into, so they do all their scripting in that… And eventually, this science world just sort of started latching onto Python and building things there.
[16:07] I don’t think the best tools for AI will necessarily be built using Python; actually, I think a lot of my frustrations in life are because of working in Python. And I’m not trying to bash that, because it’s also great, like you’re saying. I think there is an opportunity for both sides of things, I guess, is what I’m getting at.
That’s interesting to hear. José, one of the things you did with Elixir, which I appreciated and I think a lot of people appreciated, because you’ve got a lot of people loving and using the language, is that you took all of these things that influenced you and that you appreciated, and you brought them together - your love for Erlang was the reason. But then you went to your language design and you designed a language and you pulled in ideas from Ruby and ideas from Perl and ideas from functional languages, I’m not sure which ones… But you’ve told this story before, and you can probably reiterate all your influences. And you kind of made what I think is a really beautiful language out of it. But it was based on your history, your knowledge, your taste, what you liked… Here you are, doing numerical stuff, and you’re doing data sciency stuff, and I just wonder, how do you acquire that taste, how do you acquire that knowledge? Do you just know every domain very well, or how did you learn this stuff? I know you said back in school you were doing statistical things, but how have you come up to speed on what would be an awesome way to do numerical Elixir?
Yeah, so this time it has really been Sean and Jekyll. So all the deep learning, how things should work - Sean, he’s really the one leading it… But the main seed that led to this was actually [unintelligible 00:17:47.13] before we started working together, I sent a tweet; I don’t remember, but it was asking about some references… And then he pointed me to the Jax library in Python, which a lot of people are taking to be the next big library, potentially replacing TensorFlow. That’s what some people speculate. But it’s from Google, there’s a lot of traction behind it.
And then I was reading the docs for Jax, so we were saying “Hey, Elixir is a functional programming language, and as a functional programming language, everything’s immutable, so working with multi-dimensional data would actually be very expensive.” But then I’m reading the docs for Jax, which is a Python library, and then they have quotes like “Jax is intended to be used with a functional style of programming.” And then they say “Unlike NumPy arrays, Jax arrays are always immutable.” And then I was like “What is happening here?” So it was like this reference, like hey it’s functional… spider senses were tingling… “Okay, wait, wait, wait… There is something here.” That’s when Sean and I, we jumped with both feet in and we were like “Okay, there is really something here.”
And the whole idea in there is because the way that Jax works and the way that numerical definitions in Nx work is that when you are doing all the operations in your neural network, like “Hey, we need to multiply those tensors, we need to calculate softmax, we need to do the sum” - when you’re doing all those computations, you’re actually not doing those computations at the moment. What you’re doing is that you’re building a computation graph with everything that you wanna do in that neural network. And then they get this computation graph, and when you call that function with a particular tensor with certain dimensions and a certain size, it emits highly specialized code for that particular type of tensor, for that particular graph. And that’s why everything is functional, because what you’re doing is building a graph, you’re not doing any computations… And then you compile that to run on the GPU.
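The "build a graph first, compute later" idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `Node`, `trace` and `describe` are hypothetical names invented here, not the Nx or Jax API, and a real compiler would turn the graph into GPU code instead of a string.

```python
class Node:
    """A symbolic value: it records an operation instead of computing it."""

    def __init__(self, op, inputs=(), shape=None):
        self.op, self.inputs, self.shape = op, inputs, shape

    def __mul__(self, other):
        return Node("multiply", (self, other), self.shape)

    def __add__(self, other):
        return Node("add", (self, other), self.shape)


def trace(fn, *shapes):
    """Call fn with symbolic parameters of the given shapes; return the graph."""
    params = [Node("param", shape=s) for s in shapes]
    return fn(*params)


def describe(node):
    """Render the recorded graph, including the shapes it was traced for."""
    if node.op == "param":
        return f"param{list(node.shape)}"
    args = ", ".join(describe(i) for i in node.inputs)
    return f"{node.op}({args})"


def scale_and_shift(x, w, b):
    # Looks like ordinary arithmetic, but no numbers are touched here.
    return x * w + b


graph = trace(scale_and_shift, (2, 3), (2, 3), (2, 3))
print(describe(graph))
# prints: add(multiply(param[2, 3], param[2, 3]), param[2, 3])
```

Because the shapes are part of the traced graph, tracing the same function with different shapes yields a different graph, which is where the shape-specialized compiled code comes from.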
[19:53] When we saw this idea, it was like “Hey, everything can be functional.” And when it started, it was like a bunch of happy accidents, a book being published… I like to say, I really have a thank you for PragProg, because if they did not publish this book, if somebody read the proposal that Sean sent to PragProg and said “Hey, we don’t need a genetic algorithms book for Elixir”, maybe none of this would have started. And then somebody pointed us to Jax, so it was all those things happening, and that kind of like gave me a path for us to explore and come out of this.
So I said, “We are going to start working, and as we build the tools, we are going to try to find what advantages Elixir can have compared to other programming languages, for example”, and it turned out that as I kept saying what I thought would be a negative aspect, which is immutability, it really turned out to be a feature. And it’s really interesting, because there are some pitfalls in Jax, for example. If you go to the Jax documentation, they have a long list of pitfalls; so there are some pitfalls in the Jax documentation that - they did not happen in the Elixir implementation in Nx, because everything’s immutable.
So the way that Jax works is that – in Python they call it the tape pattern. Basically, as they’re calling methods in an object, it is recording all the methods that you call. In Ruby we know it as method_missing, but there are some operations in Python that cannot be recorded. For example, if you are setting a property in the object, or if you pass that object to a conditional, you don’t know that that object is being used in a conditional. So Jax cannot record that structure in your code, so they have some pitfalls, like “Hey, you have to be careful.” Or if you have a for loop in Jax, what it’s going to do is that it’s going to unroll the loop, and that can lead to very large GPU code. But in Nx everything is immutable, so we don’t have those operations in the first place. And because we have macros, I can actually rewrite the if to be an if that runs on the GPU. This is really cool - in Nx, when you go to the numerical definitions and you look at the code, that code, with no pitfalls, is going to run on the GPU, is going to be sent to the GPU. It’s effectively a subset of Elixir to run on the GPU. So yeah, it started with this small tip, and then it kind of spread from there.
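A toy version of the tape pattern shows both pitfalls. This is a sketch in plain Python with a made-up `Tracer` class, not how Jax is actually implemented: arithmetic lands on the tape, a `for` loop shows up only as its unrolled iterations, and an `if` on a traced value asks for a single True/False, so the two branches can never be recorded.

```python
class Tracer:
    """Records arithmetic performed on it onto a shared tape (a plain list)."""

    def __init__(self, name, tape):
        self.name, self.tape = name, tape

    def __add__(self, other):
        out = Tracer(f"t{len(self.tape)}", self.tape)
        other_name = other.name if isinstance(other, Tracer) else repr(other)
        self.tape.append(f"{out.name} = add({self.name}, {other_name})")
        return out

    def __bool__(self):
        # Python wants one concrete answer right now; the tracer never sees
        # the branches themselves, so the best it can do is refuse.
        raise TypeError("a conditional on a traced value cannot be recorded")


def add_three_times(x):
    for _ in range(3):  # the loop itself is invisible to the tape
        x = x + 1
    return x


tape = []
add_three_times(Tracer("x", tape))
# tape now holds three separate add entries: the loop was unrolled.


def branchy(x):
    if x:  # triggers Tracer.__bool__
        return x + 1
    return x


try:
    branchy(Tracer("y", []))
    branch_recorded = True
except TypeError:
    branch_recorded = False
```

A language with macros can instead see the `if` and the loop as syntax before anything runs, which is the advantage being described for numerical definitions.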
So sitting on top of Nx is Axon, which is Nx-powered neural networks. Do you wanna give us the skinny on that tool, José?
Yeah, so it’s pretty much what the name says, it’s neural networks built on top of Nx. A lot of those things, Sean is the person behind it - Axon, EXLA, it’s all Sean’s work. And what he did for Axon is that he built all of the building blocks of a neural network just using functions. They are regular numerical definitions, and numerical definitions are regular Elixir functions. So he just built a bunch of functions, and then you can compose them together to build the neural networks.
[24:03] So he built all of this – it was really funny, because I think we can still find it in the repo… He created the initial issue, which I think had like 100 checkboxes, which was just like all of the functions that you use, all the initialization functions, optimizers, layers, activations - everything that you have in a neural network that you usually use; he listed all of those, and then he implemented most of those, and then he came up with a higher-level (still inside Axon) API. So you can say “Hey, I have a neural network that is going to be this dense layer, and this convolutional layer, and this activation, and this… And I wanna train it”, and you’re done.
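The "functions composed into a network" idea looks roughly like this in plain Python. It is a hedged sketch, not Axon's API: `dense`, `relu` and `sequential` are names chosen for this example, lists stand in for tensors, and the weights are arbitrary numbers.

```python
from functools import reduce


def dense(weights, bias):
    """Return a layer function computing weights @ x + bias for a list x."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return layer


def relu(x):
    """Activation as a plain function: negative values are clamped to zero."""
    return [max(0.0, v) for v in x]


def sequential(*layers):
    """Compose layer functions left to right into a single model function."""
    return lambda x: reduce(lambda acc, layer: layer(acc), layers, x)


# A two-layer network described only by composing functions.
model = sequential(
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, -2.0]),
    relu,
    dense([[2.0, 1.0]], [0.1]),
)
output = model([3.0, 1.0])  # a single-element list, about 4.1
```

Adding a new activation or layer in this style means writing one more function; nothing else in the stack has to change.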
So the same level of API convenience that you would expect from Keras or from PyTorch is there in Axon, but the building blocks as well. That’s what Axon is about. It’s a little bit out of reach of my understanding… And it’s kind of funny, because I can run the basic examples, but I still don’t have a GPU. And then if you’ve got a convolutional neural network, if you’re going to train it without a GPU, it’s going to take a lot of time. So I cannot run some of the examples, but Sean already added a good amount of examples to the repository.
We have some very classical datasets that people use in machine learning, like MNIST, CIFAR… I don’t know if I’m pronouncing those correctly, Daniel, but you probably know what I mean. The Fashion-MNIST, and so on… And he has examples of – and then [unintelligible 00:25:38.11] ResNet, and this kind of stuff… And there are examples already in the repository. And those things are running in Elixir and compiling and running on the GPU, which is very exciting.
Don’t you have a GitHub Sponsors or a donation button, man? Let’s get this man a GPU. Someone’s gotta get you a GPU.
Yeah, I know, right?
Come on…! [laughter] The world would be a better place if José Valim owned a GPU. I’m gonna put it on record.
Yeah, I was really – just an aside… I was like “I’m going to buy a Linux machine, so I can have the GPU.” And then Apple came out and was like “Oh, we have TensorFlow running on M1.” But they released just the compiled executables, and not the source code… So I’m like, “Do I buy a new machine that is going to take space in my house, and then three months later Apple - the thing is going to be merged in TensorFlow and I’m never going to use it again?” So I’m suffering from decision paralysis; should I invest in this thing or not?
Well, you’ve come to the right place. This is Daniel’s expertise right here. This guy - he builds these things in his house.
Yeah… Unfortunately, it’s all crazy right now. I know we ordered a server, and we had to switch the GPUs because of – I don’t know if you saw NVIDIA’s… They kind of got mad that everybody was putting consumer cards in their enterprise servers, and so that all got switched up, which - I understand their business, but… Yeah, that whole world is crazy right now in terms of actually getting your hands on something as well.
Supply shortages and everything?
Yeah, it’s insane. Just scrolling through this, I’m pretty excited to try this in my little workstation with a GPU. I think it’s cool that – again, I’m coming not from an Elixir standpoint, but I recognize the API; it’s very Keras-like, this high-level API that you’re talking about, where you’re like “Well, I’ve got a dense layer, I’ve got a dropout layer, whatever it is.” That instantly makes sense to me. I feel like I could take this API and create my model definition fairly easily, and I really like that. Being a Python user and coming from that outside world - it makes me want to play with this. If it was a totally different sort of looking API, I think I would be nervous to dive in. But I also see - you have your model struct, you have your layers, you have your high-level API, and you talk about it like it’s just an Elixir struct, and so serializing it to multiple formats is possible; I’m talking about the model itself.
[28:15] So I don’t know a ton about Elixir structs, but serializing it to multiple formats is really interesting to me, because - at least from my perspective, what I’m seeing is a lot of push for interoperability in the AI world, where people publish their model that they wrote in PyTorch on PyTorch Hub, and then I’m over here with TensorFlow, but I can pull it down and convert it using something like ONNX tools or something and use it in TensorFlow… There’s all sorts of frameworks out there, and I think people are generally realizing it’s not gonna be one that wins the day, but interoperability is really important if we’re going to release models and expect people to be able to use them. Was that sort of factoring in your mindset as you were thinking about how to represent models in Axon?
Yeah, definitely. When Sean was working on it, from the design, he was thinking “How can we get an ONNX model, load that into an Elixir data structure so we can get that, and send to the GPU, and have that running on the GPU?”
It goes back to what we were talking about a while ago - I think that the first users of this… Maybe I’m wrong, and I’ll be very glad to be wrong, but I think the first users are going to be like “Hey, we have our data scientists that are super-familiar with this tooling in Python that is very productive, very useful for them, and it’s harder to convince them to migrate… But hey, we are running Elixir in production. I just want to bring that model and run it directly from Elixir.” I think that’s very important for that use case, and the whole purpose of interoperability.
One of the things that I think is really worth talking about with this idea - a lot of people, when they think about Elixir, they think about web. But Elixir is also really good for embedded, thanks to the Nerves framework, and I think there is a lot of potential in this area of having machine learning neural networks running on the edge, and that can be an interesting application as well. And that requires kind of the same ideas, because you’re not going to train it on the device, right? So you need to build the model elsewhere, and do all these steps, and then bring that into the device.
Serialization is there, and I think it’s a matter of time. A lot of those things we are working on. We also started a machine learning working group in the Erlang Ecosystem Foundation for people who are interested in this. So it’s something that we plan to work on, but if somebody is really excited about this - so if you are listening to this show, like “Hey, I wanna try this out, and maybe I can implement ONNX serialization” and you would like to work with us, send a PR; it’s definitely welcome. We can have a link to the Erlang Ecosystem Foundation, the machine learning working group in the foundation. So we have a Slack, people can join, you can talk to us… There’s a lot of work to be done, and the serialization is definitely going to play a big part of it.
So how long have you both been working on Axon? Because it just seems like there’s so much implemented. You were talking about “Hey, we need all of these different layers implemented that people know about.” Typically, I see libraries maybe that have a new API for machine learning or something - it seems like it takes them so long to add operations and add support for different layers and such, and I’m wondering, what was your thought process and approach to building this in a way that you could come out of the gates, supporting as much of that as possible.
[31:59] To give you an idea… So Sean has been working on it in his free time. He started working on Axon as soon as we announced Nx. So he has been working on it for two months in his free time, and it already has a bunch of stuff. If you check the readme, it already has the – I won’t be able to say everything, but the dense layers, dropout, convolutional layers, a bunch of optimizers, like 7-8… So he has been able to add those things really fast. I think one of the reasons for that is because the foundation is just functions built on top of functions, so it’s very easy to compose.
The other thing is also that I think one of the reasons - I’m speculating here, to be clear. I think maybe one of the reasons why some of those libraries take a lot of time to implement a layer is because they are implementing everything; they are going maybe from Python all the way down to the C or C++ code and implementing that.
For us it’s a very layered approach where Axon just works on top of Nx, Nx is the tensor abstraction, and then we have the tensor compiler stuff that compiles for XLA. And working in those different layers - when you’re working in Axon or in Nx, you are really at the high level; you are not really worrying about C, C++, any of that; you’re just saying “Hey, what are the tensor operations that I need to do?” And I think that’s why he was able to be so productive in building all of these features in this short timeframe. And I think adding new activations and layers is relatively straightforward; what takes more time and discussion is where we need to change the topology, because that requires thinking about how the struct is going to represent it – for example if you have a GAN or a recurrent neural network, now you have to think “Oh, if it’s recurrent, now I need to get the data fed back inside”, so you have to think about how you’re going to model that. But it’s mostly just at the high-level representation. So that’s kind of how things have been structured.
Yeah, I cloned the repo, and his first commit was January 25th of 2021…
It’s pretty amazing.
…with a few to follow. And it’s funny, because the first commits are like “Add some functions. More functions. Adding even more common functions.” So he’s just like cranking out these functions, like you said.
Yeah. So that was in January. Okay.
Yeah. So a couple of months.
Yeah. But while working on that, he was still working on EXLA and Nx with me. We started in November; in November it was Sean and I, we were working part-time, so it took us about three months to release Nx and EXLA. And then Sean - he’s still working with Nx and EXLA, and then his focus after we announced it in February changed to be on Axon until we announced it. And now we are probably kind of going back and forth between projects, because there’s still a bunch of things that we want to build in Nx.
One of the things that I really want to work on is streaming. So Elixir is really good for streaming, and I want to have very good abstractions, so we can start streaming data into the GPU for inference, so we don’t have to load everything into memory. Or for example if you have a webcam or a camera that’s either on an embedded device, or you are getting it from WebRTC or something like that, and you want to send that straight to the GPU and stream it… So we can do all this kind of interesting stuff that I think we can do. So yeah, we’re going to be jumping back and forth a lot.
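The streaming idea, feeding data through in pieces rather than loading all of it into memory, can be sketched in Python with generators. This is only an analogy under assumptions: `frames` stands in for a camera or other lazy source, and `fake_infer` stands in for a model; neither is part of Nx.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(source: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield fixed-size batches lazily; only one batch is held at a time."""
    iterator = iter(source)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def frames() -> Iterator[float]:
    """Hypothetical lazy source, standing in for a camera feed."""
    for i in range(10):
        yield float(i)


def fake_infer(batch: List[float]) -> float:
    """Stand-in for running a model on one batch: here, just the mean."""
    return sum(batch) / len(batch)


# Each batch is produced, processed, and discarded before the next one.
results = [fake_infer(batch) for batch in batches(frames(), 4)]
```

The last batch is shorter than the others because the source runs out; a real pipeline would have to decide whether to pad, drop, or process such a partial batch.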
I think it speaks to the power of a solid abstraction too, and like a layered approach when done well, when you get to those higher layers… Like you said, unless you have to change the topology; if you’re just adding and building on top, and not having to drill down through each time, then you can move relatively fast.
[36:04] There’s probably also an aspect of this where it seems like Axon’s API is trying to be familiar, and so a lot of times, at least for me, the slow part of software is getting that API figured out, and rewriting that API so that it’s better… And maybe there’s a step-up because of all these other projects that have come before, that makes it familiar to Daniel and other people who are working in this world.
Exactly. That’s a very good point. And I think on the Axon side, one of the inspirations - I think there’s a project, Thinc, in Python, which is a functional approach…
Yeah, there’s a team in Europe that writes the spaCy library, which is an NLP library, and I think their main backbone for that is Thinc.
I see, yeah. So that has been one of the inspirations as well, and I think there’s PyLightning, or LightningTorch, or something like that, that has also – so yeah, that’s a very good point. So if you can look at what people are doing and say “Hey, this is what I think is good, this is what I think is going to fit very nicely with what we do”, that speeds up the process considerably as well.
I mean, there’s just such diversity in the AI world in terms of the types of models that people are building… But there is a fairly consistent – if you look at the implementations, whether it’s TensorFlow, or PyTorch, or these other frameworks, you can kind of get a pretty quick sense of how they’re building their architecture, looking into the source code…
I’m just looking at some of the layers that are implemented in Axon, and like I said, I think you’ve done a good job at – like, I don’t know how to read Elixir; I can sort of get the sense of what’s happening here, and I think that’s a testament to following some of the good inspiration that’s already out there in the world… And also, I think it will be easier for people maybe that do want to jump and experiment in Elixir from the Python world, and they want to add their own cool layers into Axon - it’s gonna be a lot easier for them to jump in and do that, I think, if they feel like they’re not in a total foreign world, and they recognize some of these components and all of that… So I definitely think that that’s a good call.
I know that some of the data science/AI world kind of operates with a weird set of tooling, that includes these things called Notebooks, and other things… There’s even some functionality related to interactive coding, and cells, and that sort of thing too, isn’t there?
[40:02] Yeah. So there is a separate project; another person has been working on this project, Jonatan Kłosko. When Sean and I started talking like “Hey, we want to build this foundation for machine learning, numerical computing” and then we mapped a bunch of things that we have to do… And there are a bunch of things that we have not started working on yet. For example, we don’t have an equivalent to data frames, so that’s an open question that has to be solved. We don’t have plotting libraries yet… But one of the things that we wanted to do was this idea of the interactive and collaborative notebook.
And I said “I wanna do this. We wanna build this notebook thing as well”, which we called LiveBook. So that’s the LiveBook project. And the way it started was very funny… So we have a project called ExDoc, which generates documentation for Elixir, and we are really proud of it. We think that our documentation just looks great, and it’s standardized… All the projects in the community - they generate this documentation with ExDoc; it has a bunch of great features… And somebody, some time ago opened up an issue, “Hey, this project is using jQuery, jQuery is huge, we probably don’t need to use jQuery anymore.” So somebody opened up this issue in the issues tracker… I was like “Sure, it sounds like a good idea if somebody wants to do it.”
At about the same time, John (another Jonatan) had released something like a notebook for Elixir as well, a very barebones one; so he had some experience from Python and we brought him in, like “Hey, if we are going to do this, how are we going to do it? What are the benefits?” And then we were like “Okay, so one of the things that we wanna do is that we want to leverage the fact that it’s very easy to build collaborative and interactive applications in Elixir.” So it needs to be collaborative from day one, and it is…
There is a video on YouTube of me announcing LiveBook, and it’s really cool, because it shows how LiveBook works, it shows Axon, so there are some good examples… So it’s like “Hey, it needs to be collaborative from day one”, and we really wanted it to be interactive, because one of the things – so for those who are not familiar with Elixir, the Elixir runtime, it’s very easy to extract a lot of information from the runtime, like what your code is doing, we break our code into lightweight threads of execution, so we can inspect each of them…
[44:06] So we wanted to be interactive not only for people that are working with machine learning and numerical computing, but if you want to get data out of an Elixir system, like a production system, and try to say like “Hey, where is my bottleneck?” you should be able to do all that; you should be able to interact with a live system as well, and interact with your neural network that is training. This feature is not there yet, but it’s part of our vision.
And then I said, “Well, what do people complain about in Notebooks?” That’s always part of the research. So if you go to something like Jupyter, what do people usually complain about?
A lot. [laughter]
What DON’T they complain about?
What we heard was like “Well, the format that it writes to disk - it’s not good to diff, it’s not easy to version-control.” So how are we going to solve that? “The dependencies are not explicit and the evaluation order is not clear as well.” So how can we solve all those things? We brought our set of inspirations, we brought the problems, and we started working on how we wanna solve this.
A couple weeks ago we announced it (maybe 1-2 weeks ago), we announced LiveBook. Or maybe it was last week. Anyway, it’s there. Our vision is not complete, but you can see the important parts included there: it’s fully reproducible, the evaluation order is clear, your dependencies need to be explicitly listed, so everybody who gets a notebook knows exactly what they need to install, and the notebook’s going to install it for you.
John - he created a format called LiveMarkdown, which is a subset of Markdown that we use for the notebooks… Which is really great, because now if we change a notebook, we are just changing a markdown file, which means you can put it on version control, people can actually review your changes without having to spin up an instance of that thing and make that work… So for us, it’s a step, again, into this ecosystem, and I think there’s a bunch of things that we want to explore and try out… And we’re just trying to bring a very modern approach to interactive and collaborative notebooks.
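Why a plain-text notebook format reviews well can be shown with a small Python sketch using the standard `difflib` module. The notebook content below is a simplified, made-up Markdown-like document, not the actual LiveMarkdown format, and the file name is hypothetical.

```python
import difflib

# Two versions of a made-up, Markdown-like notebook (not real LiveMarkdown).
before = [
    "# Training notes",
    "",
    "## Setup",
    "",
    "learning_rate = 0.1",
]
after = [
    "# Training notes",
    "",
    "## Setup",
    "",
    "learning_rate = 0.01",
]

# A line-based diff, the same kind of output a version control tool shows.
diff = list(
    difflib.unified_diff(
        before, after, fromfile="notebook.md", tofile="notebook.md", lineterm=""
    )
)

# Keep only the changed lines, skipping the "---" and "+++" file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print("\n".join(diff))
```

Because the notebook is plain text, a one-value edit shows up as one removed line and one added line, which is what makes review possible without running anything.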
And there are other things happening in the space… There’s Jupyter Notebooks, there’s also Pluto.jl, coming from the Julia folks. There is also Deepnote, which is a software-as-a-service… So we’re kind of looking at everything and coming up with our own takes and ideas as well.
That’s awesome. I’m glad that when you looked at this, you took that perspective of not “We need notebooks.” People love notebooks, but what’s wrong with them? Because I think there have been a lot of – there’s notebook kernels for all sorts of different things for Jupyter, but they all suffer from similar issues. And of course, I love Jupyter, and it’s powerful, and people use it with great success… But I think after people have used it for so long and they’ve seen these consistent issues… I think the whole managing state thing that you mentioned, and the execution flow is probably the top one on my list. So now you’re really tempting me to try it out.
It also seems like you release something cool every week. I don’t know how that works… I don’t release something cool every week, so I’m feeling really deficient right now…
[laughs] I’m with you.
I don’t have anything new to release for now…
Until next week.
Daniel, what we need to do is find some really talented university students and inspire them to work on some stuff for you.
I guess so, yeah.
Yeah, Jonatan has been excellent at this… And it was his first LiveView application, so I think it’s both a testament to Jonatan and to LiveView, the fact that he could build this thing in three months, while still studying, working part-time. And go check it out, go check the video. I’m really excited about LiveBook, it’s really interesting.
[48:25] One thing about the notebook is that in my opinion it was a very different approach to how we approached Nx and Axon. For Nx and Axon we were like “Okay, let’s build this and see where this leads us.” But for the notebook it was like “This is an area that Elixir is really good at, and I really want to have our take on this.” I think we can make this ours, our version of this; our vision, our understanding of this… And of course, that requires looking around. But it was a very different thought process. It’s just like “Hey, I think we can build this, and I think we can build this great, because we have great tools for that.”
Just to make it clear - out of the box it works distributed as well. For example, if you have a bunch of people using notebooks, for some reason, and you want to start five machines in production, and have people connect to those machines from anywhere they want, it just works, out of the box. There’s no need for external dependencies. You don’t need to bring Redis, you don’t need to bring a database… So everything was really built using the – again, if you go to the beginning of the talk, where I’m talking about the Erlang Virtual Machine, and they are building telecommunication systems… Imagine you have this platform and you can build collaborative notebooks. So that was kind of our idea, our take.
How does it do that? Because it looks like it only runs on localhost. Maybe there’s a way to – how do you tell it “Hey, I’ve got ten nodes that I want you to run across”? Is that just configuring Phoenix?
By default, we run it on localhost. By default, if you run it on your own machine, you don’t want to expose that and have somebody access the notebook…
Yeah, it’s like a public-facing eval, right?
Yes, right. Imagine you’re at an ElixirConf and somebody would just be “Who is running notebooks here? I can’t–” Right now I think we just need to tweak the configuration file. But one of the things that we are working on, that we are going to get in the release - we are going to ship both Docker images and a command line executable. Then we will have flags for all this kind of stuff.
And most likely, what people want to do is that they want to say “Hey, I am deploying this to Kubernetes, so I’m going to use something that uses the Kubernetes DNS manager to connect the nodes.” In Elixir you would use something like Peerage or Libcluster, which figures out the topology and connects everything for you.
Yeah, and I can definitely confirm that people will want to spin these things up everywhere. Now I’m not surprised when I hear this, but the first time I started hearing production notebooks, I was like “How do you have a production notebook? It’s a notebook. How are you running a notebook in production?” But this is so pervasive, people are like “Oh, this is my production notebook, and this is my dev notebook” and all of these things.
I don’t know if I’d go that far, because I don’t know how to support a notebook in production, but it is such a pervasive idea… It’s cool to see that as a piece of this. And of course, there’s other things too; you were mentioning Pandas, and other things… For people that aren’t familiar, in Python there’s a library called Pandas, which deals with tabular data, and you can do all sorts of cool data munging stuff… So yeah, it’s cool to hear you say that those things are on your mind. And because you release a cool thing every week, maybe that will be next week, or the following one.
[laughs] Yeah, right now I think we are going to tackle graphs, because it’s part of the notebooks… But I’m hoping for the data frames stuff other people are going to step in… And we are having a bunch of related discussions on the Erlang Ecosystem Foundation Machine Learning Working Group, and this kind of stuff.
[52:12] And sure, machine learning, and then we can talk about neural networks… There’s so much work to be done, and so many things to explore… So people that are excited will jump in, and we are going to have a feast. We didn’t talk about clustering, forests, classifiers, regressions… And then we can talk about linear algebra… There’s just so many things in the ecosystem that one can build and explore… There’s a lot of work to do, and we hope people will get more and more excited and they are going to join us in this journey.
Yeah, it seems like if you’ve got the graphing thing going, and you’re talking about Elixir having the natural abilities with web development, with LiveBook and the other things here… You know, a big thing in the AI world is monitoring your training runs with a bunch of cool graphs, with something like a TensorBoard, or something like that… So it seems like – yeah, that would enable a lot of things. It’d be pretty sweet to have your training run going in Axon, you kick it off from a LiveBook, and then you can pull up an interface to see all your nice training plots, and all those things, and that’s all happening in a really nice, unified, robust way.
Yeah, that’s definitely something that we’ll explore at some point. Probably TensorBoard integration as well is something that we are bound to have.
Yeah, it seems like LiveBook really could be your marketing machine; it could be your way in for all the disillusioned notebook sharers out there, who’ve had – like Daniel said, they can do a lot of stuff with Jupyter Notebooks or existing tooling, but there’s pain points with collaboration, with all these things. The fact that one of your headlines is sequential evaluation - to me that seems like… Shouldn’t that be how everything works? It says “Code cells run in a specific order, guaranteeing future users of the same LiveBook…”
Not so quick, Jerod… [laughter]
I’m like, “That’s a feature? Isn’t that how things work?” [laughs]
I mean, it’s kind of the wonderful thing about Jupyter Notebooks and the really hard thing about them. It’s similar – if you go back in history, I don’t know if either of you ever used Mathematica, but it’s a similar idea. You have these cells of execution… It’s really wonderful for experimentation, because you can “Oh, you did this…”, but when you’re in experimentation, you expect things to fail all the time. So you don’t expect to have a script that runs and you unit-test it etc. You expect to try something and fail, and fail over and over and over, until you tweak it enough to where it works. So that’s great, in the notebook environment, if you can tweak things like that.
The problem is then “Oh, what were the 4 million things that I did tweak to get this to go, and what state is saved in my notebook?” I could get it to work and then reboot it and run it from top to bottom and it’s not gonna work again. So it’s the good thing and the bad thing.
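The hidden-state problem can be reproduced in a few lines of Python. This is a toy model under assumptions, not how Jupyter or LiveBook work internally: each "cell" is a string of code, and `run_cells` is a hypothetical helper that executes cells in a given order against one shared namespace.

```python
from typing import Dict, List

# Cells in the order they appear in the notebook, top to bottom.
cells: List[str] = [
    "total = scale * 10",  # cell 0: uses `scale`
    "scale = 3",           # cell 1: defines `scale`, but sits further down
]


def run_cells(order: List[int]) -> Dict[str, object]:
    """Run the cells in the given order in a fresh, shared namespace."""
    state: Dict[str, object] = {}
    for index in order:
        exec(cells[index], state)
    return state


# Interactive session: the author happened to run cell 1 first, then cell 0.
interactive = run_cells([1, 0])


def runs_top_to_bottom() -> bool:
    """Restart and run from the top: cell 0 now fails, `scale` is undefined."""
    try:
        run_cells([0, 1])
        return True
    except NameError:
        return False
```

The interactive order works and the top-to-bottom order does not, even though the notebook text is identical in both cases. Enforcing one evaluation order removes that gap.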
Yeah, and I’m pretty sure that this feature, let’s say, this sequential evaluation is going to be a limitation at some point. People will be like “Hey, I started training my neural network, but now I want to do something else in the same notebook, while the neural network is training. How can I do that?” So we’ll have to come up with ways of branching, but we’ll want to be very explicit on the model. We’ll say “Hey, you can branch here”, or what we have been calling it internally - because everything is organized in sections, we have been thinking maybe we can set up some asides. So an aside may fork from a particular point and execute the code based on those bindings. It’s basically the state of the notebook from that moment on, without ignoring the other side. So it’s something we’ll have to tackle.
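One way to picture an aside forking from a set of bindings is the Python sketch below. The feature is described here as an idea still under discussion, so this is purely illustrative: `fork` is a hypothetical helper, and a deep copy is just one possible way to give a branch its own view of the state.

```python
import copy
from typing import Any, Dict


def fork(bindings: Dict[str, Any]) -> Dict[str, Any]:
    """Give a branch its own copy of the bindings at the fork point."""
    return copy.deepcopy(bindings)


# State of the main notebook flow at the point where the aside branches off.
main: Dict[str, Any] = {"weights": [0.1, 0.2], "epoch": 1}

# The aside starts from the same bindings...
aside = fork(main)

# ...and from here on, each side changes only its own state.
aside["weights"].append(0.3)  # experiment in the aside
main["epoch"] += 1            # main flow keeps going
```

In a runtime with immutable data, as in Elixir, the copy would not be needed, because a branch cannot mutate values the main flow still refers to.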
[56:09] If you look at the issue tracker there are a bunch of things that we have been thinking about. For example, one of the things that I wanna do – so when you persist a notebook, they are persisted to the file system; so one of the issues is for example pluggable file systems, and I want to make a GitHub file system, so you can easily persist your notebooks to GitHub, and that works transparently from LiveBook, without you having to say “Hey, I need to clone”, and stuff like that. We can work directly on the repository, and I think that’s going to be a boon for our collaboration as well. Or not collaboration – I mean, a different kind of collaboration. You put it on GitHub so somebody can fork and implement it.
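The pluggable file system idea can be sketched in Python as a small interface with interchangeable implementations. All names here are hypothetical and unrelated to LiveBook’s actual code; the in-memory store merely stands in for a remote backend such as a Git hosting service.

```python
from pathlib import Path
from typing import Dict, Protocol


class NotebookStore(Protocol):
    """What any storage backend has to provide."""

    def write(self, name: str, content: str) -> None: ...

    def read(self, name: str) -> str: ...


class LocalStore:
    """Persists notebooks as files under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, name: str, content: str) -> None:
        (self.root / name).write_text(content, encoding="utf-8")

    def read(self, name: str) -> str:
        return (self.root / name).read_text(encoding="utf-8")


class InMemoryStore:
    """Stand-in for a remote backend; keeps notebooks in a dictionary."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def write(self, name: str, content: str) -> None:
        self.files[name] = content

    def read(self, name: str) -> str:
        return self.files[name]


def save_notebook(store: NotebookStore, name: str, content: str) -> None:
    """The notebook code depends only on the interface, not the backend."""
    store.write(name, content)
```

A caller would pick a backend once, for example `save_notebook(InMemoryStore(), "notes.md", "# Notes")`, and the rest of the code would not need to know where the notebook ends up.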
I know there’s this thing in the Python world called Binder. Essentially, you could create a GitHub repo with a notebook, and then you click on the little badge and it just pops up a hosted version of that notebook that will run, so you can give it a Docker image or something, with all the dependencies. For someone like me, if there was that tie-in with GitHub and I could just launch a notebook and try Axon, I feel like people would just latch on to that so quickly.
Then the barrier is not like “Oh, Elixir is sort of new to me as a Python person, so I need to figure out the toolchain, but really what I wanna do is I just wanna click Shift+Enter through a few cells and see how it works.” And that’s very powerful.
Yeah, that’s a very good point, something for us to look into.
Well, you guys have done a lot, but there’s a lot left to do… What’s the best place to get involved? Like you said, fertile ground; what did you say, hop in and have a feast, or something? If you’re interested in this space and in Elixir, it sounds like there are lots of ways to get involved and to build out a lot of the stuff that’s lacking… So is there a Discourse forum, or is there a Slack, is there a community around this? Or is it just you and the Dashbit folks working on it? What’s the situation there?
We have the elixir-nx organization on GitHub, but a lot of the discussions are happening in the Erlang Ecosystem Foundation; we have the Machine Learning Working Group… So if you go to the EEF website, you can get all the working groups there, and you’re going to find machine learning… And then you can create an account (it’s free) and then you can join the Slack and we’ll be there. So that’s where we are usually chatting about things.
Originally, a lot of those things were kept confidential, like LiveBook, but now everything – at least everything that Dashbit was working on is out in the public. We don’t have anything, no more secret projects. So that’s the place to go and where we’re talking about things. We have a monthly meeting where we meet and discuss and exchange ideas… So that’s definitely the place.
Is Nx bringing machine learning tools to Erlang, or are there other Erlang but not Elixir efforts in this space? Do you understand what I’m saying?
Is this the first time BEAM-based tooling around numerical computation is happening, or have there been Erlang-only things going on?
I think it’s the first time for the ecosystem. And because you can call Elixir from Erlang with no performance costs whatsoever…
Yeah. It’s pretty cool, right?
You can just call – like, the numerical definitions, they don’t work in Erlang because they translate the Elixir AST. Or not the Elixir AST, but they translate the Elixir execution to the GPU; that wouldn’t work with Erlang. But everything that we are building on top, like Axon, is just building on top of the abstraction, so somebody could go get Axon, call it from Erlang, build a neural network from Erlang, and just run it, and it should just work.
That’s cool. Daniel, anything else from your side of the fence you wanna ask José about before we let him go?
I’m just super-excited about this. Hopefully, there is some cross-over from the Python world. It seems to me like the timing is such that people in the AI world very much are more open to trying things outside of the Python ecosystem than they once were… And so yeah, that’s my hope, and I definitely wanna play around with this, and I appreciate your hard work on this. I’m excited to try it out, and also share it with our Practical AI community.
Awesome. I’m really glad that you are having me on the show, and I was able to share all of those ideas and this work that we have been doing.
You’re welcome back any time. All the links to all the things are in your show notes, so if you wanna check out José’s LiveBook demo on YouTube, we’ve got the link to that. We’ll hook you up with the link to the Erlang Ecosystem Foundation if you wanna get involved… Of course, Axon and Nx are linked up as well, so… That’s all. Thanks everybody for joining us, and we’ll talk to you again next time.
Our transcripts are open source on GitHub. Improvements are welcome. 💚