Practical AI – Episode #156
Photonic computing for AI acceleration
with Nick Harris, CEO of Lightmatter
There are a lot of people trying to innovate in the area of specialized AI hardware, but most of them are doing it with traditional transistors. Lightmatter is doing something totally different. They’re building photonic computers that are more power efficient and faster for AI inference. Nick Harris joins us in this episode to bring us up to speed on all the details.
Sponsors
Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com
LaunchDarkly / TrajectoryConf – Software powers the world. LaunchDarkly empowers all teams to deliver and control their software. DevOps and feature management are reimagining how we build and release new products. On November 9th and 10th, LaunchDarkly is hosting Trajectory Conference 2021 — a two-day event for software innovators who want to break orbit, not systems. Trajectory is a fully-virtual conference that focuses on the technology, people, and processes that continuously deliver better user experiences and more powerful software. Register for free at trajectoryconf.com
Linode – Get $100 in free credit to get started on Linode – Linode is our cloud of choice and the home of Changelog.com. Head to linode.com/changelog OR text CHANGELOG to 474747 to get instant access to that $100 in free credit.
Transcript
Play the audio to listen along while you enjoy the transcript. 🎧
Welcome to another episode of Practical AI. This is Daniel Whitenack. I am a data scientist with SIL International, and I’m joined as always by my co-host, Chris Benson, who is a tech strategist at Lockheed Martin. How are you doing, Chris?
Doing very well, Daniel. Excited about today’s show. I think we have some pretty good stuff coming up here. I’m just kind of raring to go.
We’ve talked about a lot of computers on this show, and accelerators, and cards, and GPUs, and VPUs, and all sorts of Us, but we have yet to talk about photonic computers. That’s going to be the topic of today’s conversation. Today, we’ve got with us Nick Harris, who is CEO of Lightmatter, and we’re going to hear a lot more about that and photonic computers today. Welcome, Nick. It’s great to have you here.
Thanks for having me, Daniel and Chris. And nice to meet you.
A lot of people might not have heard of photonic computers, or even know where they sit in the sort of landscape or ecosystem of accelerators, and types of processors, and all this stuff… So maybe just give a little bit of background on how you came to encounter this topic, and then also maybe some of the motivations behind this whole topic of photonic computing.
Yeah. So I’ll start out with just talking about the AI field as it’s growing. Around 2012, we had AlexNet come out. AlexNet was a neural net that was able to do image recognition tasks that were really incredible - there was no other technique that could really keep up with it - and it made a great business case for companies like Microsoft, Google, Facebook, Amazon, and so on, to start building neural networks to solve special kinds of problems; things that were very hard to write code for, but that neural networks could solve.
Over the past 10 years now, we’ve had a really fast rate of progress in the amount of compute that’s going into these state-of-the-art neural networks. It’s motivated a lot of companies to start building accelerator chips to try to speed up these computations and make it so that we can make bigger and bigger neural nets. Some people’s objective on this bigger and bigger scale is really to try to build something that’s on the scale of a human brain. That’s kind of the trajectory where things are going.
[03:57] There are huge capital expenditures that go into building these kinds of AI supercomputers that can do incredible tasks, and what we’re doing at Lightmatter is we’re trying to take a crack at that market, with a unique twist. We’re using photonics to do computation. I can tell you how I got to that place.
Please do.
Yeah, I was an engineer at Micron, working on transistor device physics. Basically, I was getting a lot of exposure to the fab processes, and some of the challenges that companies - not just Micron - were having in shrinking transistors and getting more performance out of them year over year. I decided to go to graduate school and ended up at MIT, studying quantum computing. It’s a bit of a leap. What Lightmatter is doing isn’t quantum computing, but I did spend those years building photonic quantum computers, and that happened right at the advent of that AlexNet neural network coming out. What we realized in Dirk Englund’s group at MIT was that you could actually use the same kind of processor we were building for quantum computing to do computing on traditional neural networks, using just lasers, and the benefits would be pretty massive.
Could you tell us what that implies? What does that mean? How do you use lasers for computing? Can you kind of bridge that gap between those two?
Yeah, there’s a little bit of history on this… We’re not the first people to look at photonic computing. People first realized, I think in the ‘80s, that you might be able to do neural network evaluation with photonics. That field kind of petered out for a while, but we came at it from a totally different approach. We were looking at quantum computers, and we had a different motivation and a different set of funding pitches for getting money to build these machines. We really took integrated photonics - that’s a field that most people experience, if at all, through silicon photonics transceivers. These are devices in data centers that send your communications, your data between different server racks; maybe they’re hosting a game you’re playing, maybe they’re mediating your iMessages, something like that. We used that core technology to start doing computing.
Now, how do you do it with lasers? In the quantum computing case, we weren’t using lasers, we were using single photons, but that’s a story for another day. Lasers are devices that generate light. You give them an electrical signal, and they’re able to start generating light, and they’re used in all traditional communications. The internet that we’re talking over right now is over fiber optics, and the communication is sent using lasers to do that. So we’re able to leverage that same infrastructure, lasers and silicon photonics, to actually do the core computations that happen in deep learning. These computations are tractable really for anybody to understand. Deep learning as a field is really just relying on multiplication and addition to do evaluations. So the mathematics behind deep learning are relatively straightforward. Understanding how to come up with these things isn’t straightforward, but when you actually run these programs, it’s a lot of multiplication and addition. And indeed, you can use lasers and silicon photonics to do multiplies and addition. I’m happy to go into how that works a bit later on.
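To make the “multiplication and addition” point concrete, here is a minimal sketch in plain NumPy - not tied to any photonic hardware - of the arithmetic at the heart of one dense neural-network layer:

```python
import numpy as np

# A dense neural-network layer is just multiplies and adds: y = W @ x + b,
# followed by a simple nonlinearity. The matrix-vector product is the part
# that can, in principle, be carried out with light.
rng = np.random.default_rng(0)
x = rng.random(4)        # input activations
W = rng.random((3, 4))   # layer weights
b = rng.random(3)        # biases

# Element-by-element view: every output is a sum of products.
y_manual = np.array([sum(W[i, j] * x[j] for j in range(4)) + b[i] for i in range(3)])

# Same computation, written as a matrix-vector multiply.
y = W @ x + b
assert np.allclose(y, y_manual)

print(np.maximum(y, 0.0))  # ReLU, applied electronically in practice
```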
Yeah, I’m curious… Probably, the main thing people think about is like GPUs, with computation on deep learning… And you’re exactly right - I was just pulling up, while you were talking, the recent work from Microsoft and NVIDIA, I think on the Megatron model. They just have a figure - it’s like, I think most people in our audience have heard of BERT, heard that in the context of AI, which had 340 million parameters in the model. GPT-3, which I think, Chris, you just had a conversation with someone about on –
Last week, yeah.
[07:57] Yeah, 175 billion parameters. And then this Megatron, which is the new natural language generation model that was talked about just recently - 530 billion parameters. Just to comment and follow up on what you were saying - the scale of these large models is pretty insane. I don’t know – I think generally, people have in their mind racks and racks of GPUs when they’re thinking of this…
With a photonic computer - and this is maybe just a very simple question, but to have something in people’s mind… So I’ve been near some quantum computing labs where they’ve got dilution fridges, and it looks like this whole vacuum system, and all that. And then I think about laser labs, with all these lasers, and you’ve got these lenses and stuff… That seems very different to me than that “Oh, I’m going into the data center. I see these racks of computers.”
Just as a very simple question, if you were to go look at a set of photonic computers, what does that look like form factor-wise and connection-wise, in comparison to some of these other things?
Well, I would say the biggest deployments of AI are in hyperscaler environments and in cloud environments. To play in those spaces, you need to avoid doing anything weird. A big dilution refrigerator or a vacuum - that’s going to be a problem. And if you look at photonic computers, the things that we’re developing, they just look like a normal silicon chip. They do have optical fibers that come out. You do need to get light into the processors, so that it can do the calculations, but it’s really just a standard computer chip that looks like a card sitting in a server, but there’s a little addition of having a laser in there.
Maybe just on the larger picture here - there are so many different ways that people are looking at powering AI computation, because the market opportunity is so big… And we’re all trying to power these 500 billion weight neural networks. It’s going to take a lot of computing power, and that power is the principal problem. Computer chips are getting way too hot. That’s really one of the fundamental things that we’re interested in trying to help with.
When you say it’s one of the principal things - is that the primary value proposition when you’re saying, “Go with photonics”? It’s laser-driven, it doesn’t heat up, you don’t have the massive cooling problem that you have to deal with in a data center - is that the primary thing? Or are there other performance characteristics or non-performance characteristics that play into the field in general, in terms of the field itself having a value proposition?
Yeah, so I can say some really general things about computers. If you have one processor and you run a program on that processor, you will get one processor’s worth of performance. If you take two processors and try to scale up, you’ll get just less than two processors’ worth of performance. By the time you get to 1,000 nodes, you’re looking at something that’s doing roughly the amount of computation that half that number of units would do.
Yeah.
So, what I’m getting at with that is, as you try to power bigger and bigger neural networks, you need to scale up. And the reason that you need to scale up is that the individual computer chips that we build today, Intel’s chips, AMD’s chips, NVIDIA’s chips - they all consume a lot of power, and you kind of have to spread them out; and they’re getting to be really big. So this heat problem, in addition to what’s known as Amdahl’s Law scaling, where every time you add a unit of compute, you do not get a unit of performance out - those two things work together. So really, power efficiency is tied to compute scaling, it’s tied to compute per chip. There’s really a maximum amount of heat that you can pull out of a processor. So it’s just this whole story that’s built around - you can’t be trying to dissipate kilowatts of power in a single chip. You’re not going to be able to cool it, and you’re certainly not going to be able to continue a roadmap where you’re pumping more power through chips. It’s performance, energy efficiency, and all that stuff is fundamentally linked, and they sort of hamper your ability to scale out.
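For reference, the scaling penalty described here can be put into numbers with the textbook Amdahl’s law formula; the 99.9% parallel fraction below is an illustrative assumption, chosen so the result echoes the “half the number of units” figure at 1,000 nodes:

```python
def amdahl_speedup(n_units: int, parallel_fraction: float) -> float:
    """Textbook Amdahl's law: speedup from n_units processors when only
    parallel_fraction of the work can be spread across them."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_units)

# Illustrative assumption: 99.9% of the work parallelizes cleanly.
for n in (1, 2, 10, 100, 1000):
    print(f"{n:>5} units -> {amdahl_speedup(n, 0.999):7.1f}x speedup")
# 1,000 units deliver roughly 500 units' worth - about half, as mentioned above.
```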
[12:06] I know that we’ve talked on the show a couple times about the environmental impacts of what we’re doing with these large models, and I think there was that one study - I forget how old it is now… It’s probably a couple years old now - where training one of these large language models once was like running five cars into the ground over their whole lifetime, and we’re training them multiple times. So as you’re talking about power, a lot of what I’m thinking is related to that sustainability side of things. In terms of that power requirement - I mean, lasers still require power, I’m assuming… So I don’t really have a sense of what scales we’re looking at with GPU-accelerated AI versus photonic-based…
Yeah, maybe we can start with the bigger picture around the energy scaling problem.
Sounds good.
When you double the number of transistors on a computer chip - that’s Moore’s law. It should happen every 18 months. Maybe we’re a little bit behind schedule on that, but Moore’s law is mostly okay. Every time you double the number of transistors on the chip, they need to use less energy in order for that chip to not get really hot. Since around 2005, as we’ve shrunk the transistor, the amount of energy each one uses hasn’t shrunk commensurately, and so the chips are getting hotter and hotter and hotter. It’s pushed us to a spot where - you’ve heard a lot about system-on-chip, and Apple’s recent announcement of the M1 chips.
Yeah.
These are system-on-chip platforms; they have lots of different functionality, but it turns out that if you turned all of those functions on at the same time, you would really hit the thermal limits of the system. This is a result of the fact that the energy scaling in transistors hasn’t continued with their shrinking. That’s called Dennard scaling. So that’s what’s toast right now.
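As a back-of-the-envelope illustration of why the end of Dennard scaling matters - the dynamic-power relation P ∝ C·V²·f is standard CMOS textbook material, and the 0.7-per-node scaling factor is the classic illustrative value, not a figure from any specific fab:

```python
# Dynamic power per transistor scales roughly as P ~ C * V^2 * f.
# Under classic Dennard scaling, each node shrink (linear factor s ~ 0.7)
# cut capacitance and voltage while raising frequency, and power density
# stayed flat. With voltage scaling stalled, that balance breaks.
def relative_power_density(generations: int, s: float = 0.7, voltage_scales: bool = True) -> float:
    density = 1.0
    for _ in range(generations):
        c = s                             # gate capacitance shrinks with the device
        v = s if voltage_scales else 1.0  # supply voltage: shrinks (Dennard) or stalls (post-2005)
        f = 1.0 / s                       # the clock boost each shrink used to bring
        area = s * s                      # the same transistor now occupies s^2 the area
        density *= (c * v * v * f) / area
    return density

print(relative_power_density(4, voltage_scales=True))   # ~1.0: constant power density, the Dennard era
print(relative_power_density(4, voltage_scales=False))  # ~17x: why chips hit a thermal wall
```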
When you look at solutions from NVIDIA, for example, they’re really pushing the limits of what’s possible to cool. I can tell you, we have a packaging team at Lightmatter; their job is to make sure that you can pull the heat out of your computer chips, and we’re all very impressed with how much heat they’re able to get out of the A100 processor. It’s something like 450 watts.
There’s a new chip from Intel, Ponte Vecchio, and that chip is 600 watts. Really cool technology, it’s awesome, but this power thing - it’s a real challenge. It turns out that once you get to those kinds of numbers, Ponte Vecchio is water-cooled. So you really have to go from the heatsink and a fan, which is what you’ll find in your computer at home, unless you’re an enthusiast and you love water cooling… You have to move towards water cooling and after that, you’re seeing advertisements from Azure, Microsoft Azure, where they’re doing immersion cooling. So they take computer chips, and put them underneath apparently edible oil. Don’t ask me how I know it’s edible.
From the diner?
Yeah, I don’t know if it’s fried grease, but it’s apparently edible. I don’t know why you would ever eat it. Maybe it’s like a reusable type material.
I don’t know why they would advertise it as edible. It’s an odd characteristic to note.
It shows safety, I guess. If you can eat it, it’s probably pretty safe. It sounds like a long chemical word, so… Edibility is probably a good sign.
The basic point here is that if you look at technologies for powering AI right now, they are all based on transistors. There’s never been a computer that’s not based on transistors. This is the way the world does computation. But we’ve run into this fundamental challenge around shrinking how much energy the chips are using, and you can’t really do that going forward. So if you look at the Department of Energy’s estimate for energy consumption in 2030 - so in about eight years - 10% of the entire planet’s energy consumption will be on compute and interconnect. And if you know about compound annual growth rates, by 2040 you’re talking about the overwhelming majority of the power being used on this. You always have to think in business about the use case that could motivate using that much of the planet’s power, because that would cost a lot of money.
[16:15] I think what happens is that you’ll start to see progress in AI slow, because of the heat problem, because it puts a lot of financial pressure on data centers and the people who scale these things. To be clear, those neural networks that Daniel was talking about - those are trained by the biggest companies in the world. The supercomputers cost hundreds of millions of dollars, and running those models costs something like $10 million. These are massive-scale problems, and I would go out on a limb and say it’s already probably an uncomfortable amount of money that those companies are spending on these things.
That’s the state of things today. Everything’s based on transistors, and you’ve really got Dennard scaling, which, by the way, is underpinned by quantum mechanics. It’s not something that we can fix. TSMC doesn’t have a solution for how they’re going to get rid of this pesky little energy scaling problem. I wish they did, but that’s kind of where photonic computing comes in. We don’t really have to worry about the quantum tunneling effects, because we’re not using transistors. It’s a totally different type of device, and so that part of the scaling doesn’t matter so much. Hopefully, that starts to give you a picture. We just sort of said transistors - that’s how people do computation. For deep learning, we think we can do it with optics. We think we’ll use a lot less energy, and we’ll get rid of that whole energy scaling problem - just walk around it, because we’re using different physics.
Nick, you just started getting to where I was really interested in going next, which is - okay, we think we can do these computations related to deep learning and AI with photonics… I’m curious, I guess a first question is where we’re at in that process. Is this being done, has it been being done - and I suspect you’ve done certain things? How far are we from doing training and/or inference in a reasonable way with photonics?
Alright, so there’s been a history here… I’ve been building these systems as part of my graduate work, and now the company at Lightmatter for about a decade. And over that time, we’ve demonstrated a number of the applications running language processing on these chips - that was done at MIT - along with a bunch of other cool applications of the processor technology.
At Lightmatter, we announced the Mars chip, and that was at Hot Chips 2020. Hot Chips is a conference on computer architecture, and that chip is capable of running state-of-the-art neural networks. What we’re up to right now at Lightmatter is we are gearing up to start delivering our processors to customers; very big companies that are interested in energy-intensive AI and trying to power their roadmap. They really care a lot about how do you get to the trillion parameter neural network, and beyond.
[20:04] So we’re very far along in this journey; we’re quite confident that you’re going to get to see it, and we’ve built lots of prototypes along the way, peer-reviewed stuff in Academia… It’s been published in Nature, Nature Photonics, Nature Physics… All the Natures. So it’s real stuff, and we’re going to be selling it.
Yeah, and before we go into the actual, like, what you’ll be selling, and how it’s related to AI and such, I’m curious - that’s been a long road… I’m sure it hasn’t been all smooth sailing. Have there been points along that path where you’re like, “Oh, this is like… I don’t think we’re going to make it”? Or what have been major bumps in the road or major achievements along the way that have happened?
Entrepreneurial war stories here.
Yeah.
Yeah, building companies is a lot of work. Hiring people, hiring really good people - it takes a lot of time. I have a lot of entrepreneurial war stories. In terms of challenges along the way, I’d say that the first couple of years at the company we were spending a lot of time trying to figure out exactly what the photonic compute architecture would look like. We knew it was silicon photonics, we had GlobalFoundries as our partner fab, but we had to figure out exactly what we were going to build. And there are a lot of ways to build these processors, make no mistake - we’ve patented about 70 ways to do it. But there’s only one that really works well, from what we’ve seen. So narrowing that down was a lot of work. Building teams is super-hard, especially building teams where you’ve invented a new field. Silicon photonics is a technology that’s been deployed for about 10 years now - data centers have been using it, and long-haul, across-the-country communications have been using it - but people were not trained in using silicon photonics for building computers, because that didn’t really exist. So that’s some of the big stuff.
The other piece is our – you know, you have to build a supply chain for these things, and it’s not straightforward. There are a lot of people who touch the hardware, from start to finish. And some of the steps are with companies that are smaller, and you have to work with them to try to get things productized and stable, and all that sort of thing. You can think about lasers, for example.
So can you describe a little bit – for those of us who are trying to wrap our minds around what this kind of chip is like…? Obviously, you can’t describe in detail the internals, nor would you want to, but give me a sense of what a photonic chip is like. What are some of the considerations that go into it, for those of us who are brand new to this field, and just trying to—I know at the end of the day you’re producing a board that’s going to go into my computer, but what’s this little photonic magical thing that’s inside that? How does that work? What’s different about it from the way I might already be thinking about these other architecture chips that we’ve talked about?
I can say something—
Okay.
—sort of boring, which is… Have you heard of Google TPU? So GV, formerly Google Ventures, is an investor, and our technology looks a lot like a TPU.
Okay, yeah.
It is a matrix processor, with a lot of other things wrapped around it. It’s not just doing matrix processing - there are a lot of things wrapped around it, emphasis on that - but it’s basically doing linear algebra. We have multiple processor cores. Our product’s actually a quad-core computer, just like an Intel quad-core chip, and each one of those cores is doing linear algebra. If you were to look at the chip and just follow the light, what you would see is values from an image; let’s say we have a pixel that’s red, green or blue, and there’s an intensity for each pixel, from zero to one. What you’d see is a brightness of light corresponding to that intensity number, if we’re doing image processing, being distributed around the chip, and then you’d see these photonic units in a two-dimensional array receiving the signals and doing multiplication and addition. That’s basically it. So it would look to you a lot like a TPU - a 2D array of multiply-accumulate units - except you would see light being distributed to all of the individual components.
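As a mental model only - a toy NumPy sketch, not Lightmatter’s actual design - here is that “follow the light” picture in code: normalized pixel intensities fanned out across a 2D grid of multiply-accumulate units:

```python
import numpy as np

# Toy model of a TPU-style 2D multiply-accumulate array.
# Inputs are pixel intensities in [0, 1]; each grid element holds one
# weight, multiplies the intensity routed to it, and the products are
# accumulated down each column into one output value.
rng = np.random.default_rng(1)
pixels = rng.random(8)           # a strip of normalized pixel intensities
weights = rng.random((8, 4))     # an 8x4 grid of MAC units, one weight each

products = pixels[:, None] * weights  # every unit multiplies in parallel
outputs = products.sum(axis=0)        # accumulate along each column

# The whole array is equivalent to a matrix-vector product.
assert np.allclose(outputs, pixels @ weights)
print(outputs)
```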
[24:19] You talked a lot about heat and how it’s sort of the motivation for this. I imagine that beyond the heat problem, with this sort of chip there are probably a lot of interference-type things to think about if we’re dealing with light. Are there different considerations - maybe you have some benefits on the heat side of things, but are there other shielding or interference or other challenges you have to take into account in building the physical unit, that you wouldn’t have to take into account with another type of computer?
Yes and no. We’re using light. Light signals can’t interfere with each other. That’s kind of an interesting property. However, whenever you build a metal wire, it ends up being kind of an antenna. And so in chip design, there are these things called antenna rules, and digital chips have to worry about these, too. We have to observe these things, so that we don’t get unwanted signals from radio hosts or something getting into the chip and messing up the optical computation through the wires. But otherwise, I think that it really just looks like a mixed-signal chip. There’s analog circuits, there’s digital circuits, and then the photonic circuits, which for all practical purposes, are just analog-type circuits. So you have to be careful about coupling unwanted signals through antennas that you didn’t intend to be there, but otherwise - no, not really.
So I could still have all my RGB lighting in my computer case and have flashing, cool stuff?
Yes. Yeah, if that’s the question - absolutely. It turns out it’s very hard… We have these things – and I should have mentioned this… We have these things called waveguides. They’re about 300 nanometers wide, and about 200 nanometers tall, and they’re optical wires that are on our chips. They’re really tiny, and we have hundreds of thousands of them on the chip. It’s very hard to get light into those. And so if you shined a flashlight on our waveguides, you get nothing into the waveguide. You have to really want to get the light in there for that to work out, which is fortunate and unfortunate.
I’m kind of curious, if you’re looking at the wire world that we’ve been in for so long, and the physics, as it gets smaller and smaller, and they’re running into all challenges, heat and otherwise, how does that translate? You just talked about how small that optical wire is, if you will, and the fact that because of that, you don’t get the light into it… What are some of the – kind of in the science of this, what are some of the limitations as you are constantly trying to get to smaller and smaller architectures to scale out? What are the things you have to think about?
Yeah, it’s an interesting question. At first glance, you would be like - alright, you’ve got a new computer. We’re going to want to shrink it every 18 months and have a photonic Moore’s law or something, some scaling law. You could go down that path, and the way you would do it is you’d start out—we work at a 1,550 nanometers wavelength. You could go down to 1,310 nanometers, and then you could go down to 900 nanometers… By the way, your eye will just start to pick this up, and it will look like red… And so on down. And then maybe you get to ultraviolet, which is what gives you skin cancer. And then we keep going. We get to X-rays. And the whole time, these optical components are shrinking; they’re going from 300 by 200 nanometers, eventually at the X-ray scale, to a few nanometers dimensions. You could do that, it’s possible, but it turns out that the light sources that you need to follow that path would be pretty tricky. And when you build optical wires, they care a lot about the quality that you build those wires with. The light wants those wires to be very smooth; so if you’re not really careful about the quality of those wires as you shrink it, you’re going to have a hard time. And luckily, it turns out we don’t need to shrink them at all.
[28:18] Well, you just took the next question right out of my mouth on that one, so keep going, please.
Yeah, so it turns out that, in a traditional computer, if I said that your CPU is clocked at three gigahertz, you’d be like, “Yeah, that totally makes sense.” If I say 20 gigahertz, you’re like, “Hmm, I’ve never heard of that.” It probably will never be a thing, because I remember - when I was in undergrad in 2005, I had a three-gigahertz processor then, if I recall correctly. So that frequency hasn’t been scaling. With optics, the frequency you can operate at is very high. We work at about 1550 nanometers. That corresponds to a frequency of 193 terahertz. That’s a lot of bandwidth.
Yeah.
It turns out that you will never practically get anywhere near that kind of bandwidth, because you have to talk to the thing. Ultimately, these things have to talk to electronic computers, because that’s how the world works. And so you get limited. Maybe you can do 20 gigahertz, 50 gigahertz, something like this. But the point is that we’re able to turn up the clock frequency to obviate the need for shrinking these things; because we’re going to give you more performance per unit area through the clock frequency.
And there’s one other thing. So yeah, we can go fast and clock, but we can do something else special, which is we can use multiple colors at exactly the same time. If you remember, we have this 2D array that’s doing matrix computation. Each one of the elements in the array can process different colors of light at the same time. Each element could do three colors at the same time, 16 colors at the same time, something like this. What that means is that in the same area, that same array size, two colors of light - you’ve just doubled the compute in the unit area. Three colors of light, three times the compute per unit area, and you see – there’s really no need, in principle, for this sort of shrinkage of the photonic devices.
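Two quick sanity checks on the numbers in that answer - the carrier frequency implied by each wavelength mentioned, and the way extra colors multiply compute per unit area (the color counts are the example values from the conversation):

```python
C = 299_792_458  # speed of light, m/s

# Optical carrier frequency: f = c / wavelength.
for nm in (1550, 1310, 900):
    print(f"{nm} nm -> {C / (nm * 1e-9) / 1e12:.0f} THz")
# 1550 nm works out to ~193 THz, the figure quoted above.

# Wavelength-division parallelism: if every element in the array can
# process k colors at once, compute per unit area scales by k.
for k in (1, 2, 3, 16):
    print(f"{k} color(s) -> {k}x compute per unit area")
```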
Well, as Chris knows, I’m always interested in the very practicalities of things, and I’m imagining, “Okay, we have this processor, and it works, and it can do some computations”. You have it integrated into some system… But ultimately, I have TensorFlow, and I know how to run TensorFlow on a regular computer or on a GPU, because there’s underlying libraries, which eventually get mapped into some type of machine code that is mapped onto this processor. So maybe just describe – like, okay, you have this photonic computer… How do you even start to think about that process of integrating software into the system, and making sure you support tooling that people want to use, and that sort of thing?
Yeah, that’s a huge amount of work. Building the photonic computer is really hard in terms of the physics and engineering, but building a software stack that integrates with PyTorch and TensorFlow is a ton of work as well. It’s something that we’ve bitten off, and we call it IDIOM. IDIOM is Lightmatter’s software development kit. What we allow you to do is take neural networks that you’ve built in PyTorch or TensorFlow, import our libraries, and we have a compiler that can take the emissions from PyTorch and TensorFlow and build machine code that runs on Envise. Envise is our photonic computer.
So we have our own instruction set architecture, and our compiler emits that; ISA is what people call it. Instruction Set Architecture. It’s a ton of work. We have a pretty big software team here. I expect by the end of next year that we’ll have significantly more software engineers than hardware engineers, even though we’re building these crazy chips. Generally, that’s the trend… Certainly in the machine learning space, but in computer chips in general, because people - they don’t really care about computer hardware. They just care that it’s fast and it keeps getting better, and that it doesn’t have errors, and it’s not annoying to use.
[32:13] So we’re very focused on just delivering the same experience that you’re used to with PyTorch and TensorFlow. And you can just use our alien technology and not worry about what’s under the lid; but it’s a very big effort.
If we’re thinking about that, just to clarify for a moment… If we’re going to PyTorch or TensorFlow and we have your chip in the system, and we have IDIOM there, it’s basically taking that, converting it into something that works on your hardware in that context. But from our standpoint, as data scientists or software developers or deep learning engineers, our workflow is more or less the same, if I’m understanding you correctly? Is that accurate?
That’s right. Right now, we’re targeting inference. The way you’d imagine this working is you’ve got a neural network that you’ve trained in PyTorch or TensorFlow, and then we have a bridge to ONNX, and then ONNX to our compiler, and you’re good to go.
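For a rough sense of that workflow from the user’s side, here is a hedged sketch. The PyTorch-to-ONNX export uses standard PyTorch/torchvision APIs; the `idiom_compile` call at the end is a hypothetical placeholder, since Lightmatter’s actual compiler interface isn’t described in this conversation:

```python
import torch
import torchvision

# Standard workflow: take a trained model in PyTorch...
model = torchvision.models.resnet50(weights=None).eval()
example_input = torch.randn(1, 3, 224, 224)

# ...export it to ONNX, the bridge format mentioned above...
torch.onnx.export(model, example_input, "resnet50.onnx", opset_version=13)

# ...and hand the ONNX graph to the vendor compiler, which emits machine
# code for the accelerator's instruction set. The names below are
# hypothetical placeholders, not Lightmatter's real API.
# program = idiom_compile("resnet50.onnx", target="envise")
# outputs = program.run(example_input.numpy())
```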
That makes sense.
Yeah. So the chip can do training, but we’re not focused on it right now. They’re very, very different markets, with very different things that you optimize for. Happy to talk about that, but inference and training are really drastically different beasts.
Yeah, maybe you could touch on that a little bit, because I think, at least for people getting into deep learning, a lot of the tutorials and whatever they see are all about training, right? So it becomes this perception that training is the major problem to solve as a data scientist or an AI researcher at your day job. Could you maybe - based on some of your work with Lightmatter, but also just working with clients and understanding their needs - speak to that, in terms of why inferencing is such a big market, but also really important for practical use cases of AI?
Yeah, so training is R&D mode. You’re building a tool that takes inputs, and then tells you something about those inputs. If that’s all we ever did in machine learning, no companies would fund this stuff, because the ultimate goal of building those models is that you want to use them for something. If you’re Google or Facebook, these neural networks get deployed at massive scale, and their job is to take a user query, run a neural network on the query, give you results. And sometimes there are lots of neural networks in that chain, and that is where the overwhelming majority of the energy footprint of AI will be over time.
Training is a really big deal. You gave the example - I think MIT Tech Review did that analysis that you were quoting, where it was like five cars over their whole lifetime is how much energy the training was using, and I think the equivalence was in carbon emissions. So yeah, inference is deployment, which is where most of the scale is, and training is just doing the R&D, that lets you do that thing that makes you money. Because the training stuff doesn’t make you money.
As you go, I actually want to go back for one second and touch on something to combine two things that you said. As we’re doing this and we get to, as practitioners, keep the same workflow (more or less) that we’re doing, and we are still able to use the tools that we’re already using to be productive, we’ve now been supercharged by having these photonic processors in there… I may not be using the right terminology on that. And we’ve talked about scaling without having that limitation from – the thermal limitation that we’ve talked about. To go back for just a second, out of curiosity, is your limiting factor then the receptor of the light inside there being able to detect the number of different lightwave frequencies, the number of different colors that are available, so that in theory, so long as you can build better and better receivers for more and more colors in there, they can distinguish—
Yes.
—then you’re essentially unbounded by that constraint, until you hit whatever that practical limit may eventually be?
Yeah, that’s a great question. So what are the ultimate bounds on using color and frequency.
[36:09] Yeah.
Detectors are pretty broadband, so they can detect a large number of different colors at the same time. No problem there, but there’s a device called the multiplexer (MUX), or a demultiplexer (demux). If you can think of the Pink Floyd album cover of a prism—
Yeah.
—and there’s a white light that comes in, and a rainbow that comes out the other side - that’s identically what I’m talking about. That specific kind of device that takes in a stream of multiple colors and spreads it out into the individual constituent colors. That device is what will limit you from getting to 64 colors or 128 colors. Eventually, it gets really hard to accurately separate those colors. When you’re at this point, scale-out becomes really important. By the way, that’s a long ways away—
Okay.
—in terms of the roadmap for photonic computing. When we’re at 64 colors, the amount of compute on the chip is something like hundreds of petaOPS.
Wow.
It’s crazy. So like decadal timescales. But then, when you get there, what’s really important is that you can efficiently take lots of cores, lots of processor cores, and connect them to each other. Earlier, I was talking about Amdahl’s law, and the fact that you add another unit of compute and, unfortunately, because of communications, you don’t get another unit of throughput. We’ve invented an interconnect technology that - surprise, surprise - uses light. It’s a wafer-scale computer chip, 8 inches by 8 inches, so it’s practically the size of your laptop screen, and it allows your processors to talk optically, and it can dynamically configure how they’re connected. Do you remember those old switchboard rooms where people are making phone calls, and they need to connect to some other…?
Yeah.
So imagine you have someone doing that, but at the microsecond time scale; they’re able to unplug the optical waveguides built into this computer chip, and plug them into different locations at crazy high speeds, and that allows you to scale it. So whenever you run out of colors, scale-out is going to be really important; and we’ve invented a technology for that, and it’s called Passage.
That’s really cool. I love that analogy, and I love the fact that you’re also thinking about scale out not in terms of always making things smaller and pushing those boundaries, but doing it in other creative ways as well. If I’m understanding right, you have the chip… So in terms of what you’re bringing to market and what you’re doing, you have the chip, or processor, you have this Passage technology, and then the IDIOM software integration.
I’m wondering if you could just highlight a few use cases or tests that you’ve done with these technologies, in terms of models that we might be aware of - inference time, what differences are you seeing, and how is that impacting actual inferencing?
Yeah, so comparing our technology to competitors.
Right.
On the IDIOM side, it’s a software stack. The goal is really just to make sure that it doesn’t slow down the hardware compared to what it’s theoretically capable of. So we’ll leave that on the table. On the Envise side, on our website you can see a comparison between the Envise server and NVIDIA’s DGX A100 server. We use a very small fraction of the power - that chip’s quoted at about 80 watts, compared to NVIDIA’s 450 watts - and a pretty significant boost on ResNet50, BERT, and then I believe we also looked at DLRM. I don’t have the numbers off the top of my head, but in many cases it’s multiple times faster. But what’s extremely important from a scale-up perspective is if you look at the power. Just think about it this way - we can up the performance; we’re not anywhere near that power envelope, so we can keep going up there. There’s a point though - and this is why we invented Passage - where making Envise faster is great, but it’s just sitting there, waiting for work. It’s like this ultra-capable person, and you hand them a paper once a day, and they’re just bored out of their mind. So we really had to invent Passage as a way to keep Envise busy. So that should give you an idea.
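The wattages quoted there make the performance-per-watt argument easy to sanity-check. In this sketch the power numbers are the ones from the conversation, while the throughput values are hypothetical placeholders, since the exact benchmark figures weren’t quoted:

```python
# Performance per watt = throughput / power.
envise_power_w = 80    # quoted above
a100_power_w = 450     # quoted above

envise_throughput = 3.0  # hypothetical relative throughput
a100_throughput = 1.0    # hypothetical baseline

power_ratio = a100_power_w / envise_power_w
perf_per_watt_gain = (envise_throughput / envise_power_w) / (a100_throughput / a100_power_w)
print(f"power: {power_ratio:.1f}x lower")              # ~5.6x
print(f"perf/watt: {perf_per_watt_gain:.1f}x better")  # ~16.9x under these placeholder numbers
```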
[40:13] It does. I know this is day-to-day stuff for you, but this is quite remarkable for those of us not in the field. Where do you see this going? I mean, is this what everything inevitably has to go to? Do you think that the field will stay with lots of different technologies that are competing? What does the future look like to you? How should the rest of us be thinking about the years forward in our career, versus where we’re at now? How much change is coming?
So what we’re doing is targeted at AI. For general-purpose computing, where you’re running Windows and you’re playing a video game, photonic computing probably won’t be able to contribute meaningfully. There are some really hard challenges in running general programs. To be Turing-complete, you need to have these nonlinear operations. If you’re a programmer, you can think about branching - this ability to have conditionality, if-then. Doing these nonlinear, conditional-type behaviors with optics is super-hard.
You mentioned a future where potentially there are lots of competing technologies… I think that’s exactly what will happen. We’ve had this incredible platform with transistors and Moore’s law scaling. That’s been so general purpose. We use it for everything. What you’re going to see going forward is you’ll have photonic computers for AI, and our plan is to help them dominate that field, because we think that they’re very well-suited to the mathematics that underlies deep learning. And I think you’re going to see quantum computers. I love quantum computing, I did my PhD in quantum computing, but I’m not as bullish on the right-now timeline. I think they’ve got quite a while left, but I’m rooting for everybody working on it.
In general, I just think it’s the case that we’ve reached the limit of transistors being able to do everything well enough, and you’re going to see a bunch of different types of technology platforms, all competing. You’ll see analog electronics at the edge, you’ll see digital electronics for running your games, you’ll see photonic compute units for doing your deep learning…
My goal is to have all of Google run on Lightmatter’s processors. So when you say, “Hey, Google”, that goes through Envise. That’s where I want to get to. So you can kind of see… This is how I see it. These technologies are all just suited to different kinds of problems.
We’re really looking forward to seeing how you bring this technology to market. It’s incredibly impressive. I’m blown away by the amount of work that’s gone into this, and just the amazing thought process that goes along with it. We’ll include links that we’ve talked about in our show notes. For everyone listening out there, definitely check out Lightmatter. See what they’re doing. There’s links that we’ll post there with some of these benchmarks and other things.
But Nick, it’s been a real pleasure to talk to you. It’s just amazing stuff that you’re doing, and I’m really looking forward to maybe talking next year and seeing how far you got, and how many Envise processors are running in Google.
Awesome. Well, thanks for having me, guys.
Our transcripts are open source on GitHub. Improvements are welcome. 💚