Practical AI – Episode #160

Friendly federated learning 🌼

with Daniel Beutel, one of the creators of Flower

This episode is a follow up to our recent Fully Connected show discussing federated learning. In that previous discussion, we mentioned Flower (a “friendly” federated learning framework). Well, one of the creators of Flower, Daniel Beutel, agreed to join us on the show to discuss the project (and federated learning more broadly)! The result is a really interesting and motivating discussion of ML, privacy, distributed training, and open source AI.

Featuring

Sponsors

RudderStack – Smart customer data pipeline made for developers. RudderStack is the smart customer data pipeline. Connect your whole customer data stack. Warehouse-first, open source Segment alternative.

Me, Myself, and AI – A podcast on artificial intelligence and business produced by MIT Sloan Management Review and Boston Consulting Group. Each episode, Sam Ransbotham and Shervin Khodabandeh talk to AI leaders from organizations like Nasdaq, Spotify, Starbucks, and IKEA. Me, Myself, and AI is available wherever you get your podcasts. Just search Me, Myself, and AI.

Notes & Links

Transcript

Click here to listen along while you enjoy the transcript. 🎧

Welcome to another episode of Practical AI. This is Daniel Whitenack. I’m a data scientist with SIL International, and I’m joined as always by my co-host, Chris Benson, who is a tech strategist at Lockheed Martin. How was your Thanksgiving, Chris? It was U.S. Thanksgiving, for those listeners that aren’t in the U.S. and might not be aware.

It was very good. Nice family stuff, flew around on planes, things like that. Now we’re into the holiday season, and looking forward to seeing what kind of machine learning gifts are under the tree this year.

Yes. Well, in the spirit of distributing machine learning to all the boys and girls, maybe not by Santa… [laughter] But a couple weeks ago, you and I had a conversation about federated learning. Now, neither you nor I is an expert in that area or a practitioner in that area, although I think it was a good conversation. But today we’re privileged to have Daniel Beutel with us, who is one of the creators of Flower, which is one of the open source federated learning frameworks that we talked about. He’s a co-founder at Adap and a visiting researcher at the University of Cambridge. Welcome, Daniel.

Thanks. Thanks for having me.

Yeah. Well, as you heard, Chris and I were talking about federated learning without being experts in federated learning… So maybe to follow up on that conversation and maybe for people that didn’t hear that conversation, could you just give us a sketch of what federated learning is, and then we can take it from there?

Yeah, of course, I’m happy to. So federated learning is a way to train models across multiple data sets. That’s the very easy take on it. So you might be wondering, how does this work? The way you do it in federated learning - and let’s just start off by giving an example… Let’s say we have, for example, a group of hospitals, they have some in-house data, but due to regulations, they cannot share this data, and they cannot put this data in the cloud, and they can’t use the usual machine learning workflow where you basically collect all of the data in a central repository and then train your model on it. So that’s not an option for them. So they might be interested in using federated learning, and how would a federated learning setup then work in such a scenario?

So the way it works is that you have your plain old machine learning model - let’s say it’s a neural network, for example a CNN that does some kind of image classification; maybe you want to look at radiology images, for example… And you would initialize this model in a central place - let’s call this a central server - and the central server would, after initializing the model, send this model out to all of the participating hospitals. So it would send the freshly initialized model - there are other variants of it, just to say this for the sake of completeness… But in our initial example, just to explain the very basic version of it, they would send out the initialized model, so the model that hasn’t learned anything yet. The model would then be trained locally, within each hospital, on the data that is available locally… So each hospital, obviously, has a different data set. They would train the model, not until convergence, but they would only train it for a little while. So let’s say they would train it for one or two epochs. And after they train the model for one or two epochs, they would send the updated model parameters or the gradients that they accumulated back to the central server. So that way, they don’t have to share the data; the data stays where it originated. The data always stays within each participating hospital, and the central server would only get the refined model parameters, so the model parameters that have been trained for one or two epochs; it would get that from all of the participating hospitals. And what the central server then does is it aggregates those parameters - in the simplest version, just as a weighted average of these parameters.

What I’ve just described - this cycle of initializing the model, sending it out, training it locally, collecting the updated parameters, and then aggregating the parameters - that is one single round of federated learning. And then what you usually do is you perform these rounds over and over again, until the model converges. And the interesting part about it, why organizations actually do this, is that they get access to a lot more data than they had before. So we’ve probably all had this experience, especially in practical AI projects, that oftentimes there is just not enough data, and having more data beats any fancy model architecture. So in this case, federated learning solves this data access problem. They can collaborate on the model training without having to share the underlying training data. Yeah. That’s the gist of it.
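
The weighted-averaging step Daniel describes is commonly known as FedAvg. As a rough sketch - a toy NumPy illustration, not Flower’s actual implementation - a single aggregation might look like this:

```python
import numpy as np

def fedavg(updates):
    """Weighted average of client updates.

    `updates` is a list of (num_examples, params) pairs, where
    `params` is a list of NumPy arrays (one per model layer).
    """
    total_examples = sum(n for n, _ in updates)
    num_layers = len(updates[0][1])
    return [
        sum(n * params[layer] for n, params in updates) / total_examples
        for layer in range(num_layers)
    ]

# One simulated round: three "hospitals" send back locally trained
# parameters along with how many examples they trained on.
client_updates = [
    (100, [np.array([1.0, 1.0])]),
    (300, [np.array([2.0, 2.0])]),
    (600, [np.array([3.0, 3.0])]),
]
new_global_model = fedavg(client_updates)
print(new_global_model[0])  # [2.5 2.5]: (100*1 + 300*2 + 600*3) / 1000
```

Clients with more training examples pull the average further toward their parameters, which is the “weighted” part of the weighted average.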

That’s a good explanation. That’s much better than the one we were trying a few weeks ago. [laughs]

Yeah. We should link this episode to that one, because it took us half an hour to get there. [laughs]

We just need to voice over what he just said to what we said. Yeah, totally.

I may have left out a ton of detail, right?

I get it, but we can ask you questions and find out what some of that is - looking forward to that. So as a starter: it’s very clear, given that the data is distributed in terms of where it’s located, and given laws and regulations and other such things that may constrain the training process with privacy concerns and so on, what the advantage of federated learning is. What might be considered some disadvantages? Or maybe another way of asking it: when you do consolidate the model, after you’ve done the federated learning, what is the delta in the trained model versus if you had not done that - if you had been able to aggregate all the data into one spot, and train it in the traditional way we did before federated learning? What’s the difference in what you get as an output - or is there much of one?

Yeah, there is a difference. The biggest difference, I should say, is obviously in convergence time, because you have these rounds of communication, and also the averaging process has some impact there. Often, as researchers, we make these comparisons between centralized learning and federated learning. The interesting bit is that this comparison is somewhat artificial, because it’s not something that one would face in reality very often. It’s either federated learning or nothing. We’ve seen this in the past, right? If you look a little bit at the journey that machine learning and deep learning took - around the summer of 2012 we realized that by making these models bigger, we suddenly get better accuracy. So there was this ImageNet moment, and then a couple of other moments like this afterwards. And we saw that we can achieve ever greater accuracy and other performance metrics with these models.

The thing is, when we read a research paper, for example, and when we look at these recent advances, it’s often quite fascinating, and it’s often in the context of web-scale companies like Google or Facebook, who have these massive amounts of data in-house. But then, often in practice, there’s this realization that, “Okay, I read about this cool technique, I’m trying to apply it to my problem”, and suddenly, I don’t get the amazing results that I expected. So the question is, what happened, right? And in many cases, the answer is really that the amount of data and the diversity that you have in your local data set is just not enough. And the interesting thing, the thing that got us very interested in federated learning, was this realization that for many of those cases, you might not have a large data set on your own, but there are a lot of others just like you, who are facing the same challenge, and who might want to train the same model - they also have some data, but not enough data for a very good model.

I mean, we could obviously solve this if we could put all of this data in a single destination, in a single cloud account, and then train a model on it. But that’s something that just doesn’t happen. It doesn’t happen for regulatory reasons, it doesn’t happen for confidentiality reasons… For example, corporations have a lot of financial data, and they might want to have models that predict certain aspects of these data, but again, it’s a thing of confidentiality. It’s something they would never share. And with the types of use cases that federated learning gets used for, sometimes we are surprised ourselves by where exactly these companies are hesitant to share data.

For example, there was one case where a couple of manufacturing companies - they’re all operating the same manufacturing machine, and they want to train a model that does predictive maintenance, basically, for this machine to predict whether this machine is likely going to fail, so whether they need to do some manual maintenance or something like that. And one would think that this is the case where they could just collaborate and they could just put all of their machine sensory data in a cloud account and train a predictive maintenance model. No, they don’t. Why don’t they collaborate on this? Well, the reason is that the data that they have from running these machines could allow others to see how often they run these machines, which could allow others to draw some conclusions about how many parts they’re producing, which is highly confidential. So even in those seemingly easy cases, in reality, it’s not that easy.

So that’s almost a perfect lead-in for what I wanted to ask next, which is that federated learning, it sounds like, offers different business models from some of the things we’ve done in the past - even direct competitors cooperating. So have you seen this start to happen yet, or maybe consortiums come into being? They may include direct competitors who are all in the same line of business, who want to protect their data so that they don’t give away competitive intelligence, and federated learning through a consortium, or some other structure similar to that, might be a way for everyone to benefit and get the new model without giving away the secret sauce, so to speak. Do you expect to see more of that kind of thing?

We are sort of seeing this. The way it usually starts out is that organizations who are maybe not direct competitors - maybe they are somewhat in the same space, but they’re not the toughest competitors - start to get together. In some cases, it can even be the sub-organizations of a larger enterprise, for example, because they are also often facing these restrictions on sharing data… But then we also see that, sometimes, even really strong competitors get together, because they see something else as a threat to their business model, and they see that this is a way to collaborate without sharing, as you call it, the secret sauce.

And the interesting bit is that the way I described federated learning in the beginning - this is really end-to-end federated learning, where you initialize the model globally, and then you train the model end-to-end, with all participating parties.

This is not the only model that’s possible. I want to describe one which I think is quite interesting, especially for this case where you have competing organizations collaborating, and it’s one where you train a certain part of the model in a federated fashion, across multiple data sets, and then the other parts of the model you just train yourself, on your local data. So this is pretty interesting, because in such a federation you can, for example, train the entire backbone of a model, but then the last few layers, the head of the model - you don’t train this in a federated fashion; you leave that up to each of the participating organizations to do themselves. So everyone ends up with a similar, yet different model, and everyone has something where they say, “Okay, we benefit from this federation, but we are not giving away everything.”
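
As a toy illustration of this split setup (the structure and names here are hypothetical, not any framework’s API): the backbone parameters are averaged across organizations, while each organization’s head is never shared:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "models": a shared backbone vector plus a per-organization head.
clients = {
    name: {"backbone": rng.normal(size=4), "head": rng.normal(size=2)}
    for name in ("org_a", "org_b", "org_c")
}

def federate_backbone(clients):
    """Average only the backbone across clients; heads stay local."""
    avg = np.mean([m["backbone"] for m in clients.values()], axis=0)
    for m in clients.values():
        m["backbone"] = avg.copy()  # heads are never shared or averaged

federate_backbone(clients)
# Every organization now holds the same backbone, but keeps its own head,
# so each ends up with a similar yet different model.
```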

The one important thing to mention though is that there are different types of federated learning. So you can roughly categorize it into two different types. One is this cross-silo type that we just talked about, where different organizations collaborate with each other. The other type that we often see in the scientific literature is the cross-device setting, where typically you would have one organization - think about Google or Apple, for example - and this organization would have access to a large number of devices; for example, mobile devices, like Android phones or iOS phones. The goal, in this case, is also to train a model across all of these devices, and these devices hold data that, again, you wouldn’t want to upload to the cloud. So this is the cross-device setting, where a single organization trains these models without access to the underlying training data.

So Daniel, I think we mostly talked about some of the data-centric motivations for federated learning - maybe privacy-focused, or whatever it is, competitive types of advantages… But I’m also thinking of the devices on which the actual training is happening. So if I’m thinking of the centralized model, I’m thinking, “Oh, I’m going to spin up a pod of GPUs, a really expensive pod of GPUs, and do all my training there, and get my data there somehow.” So am I correct that you could have some sort of infrastructure savings with this, where the actual computation is happening on those edge devices and you’re doing a smaller amount of aggregation and updating of the model centrally? Could you talk to that a little bit - what people have seen, and how they look at infrastructure in that way?

Yes. That’s a very interesting question. The answer is, as almost always in engineering, it depends. So as you noted correctly, in the centralized setting you have a pretty well-defined stack, and there’s not a lot that changes from one setup to another. You usually have some kind of an x86 processor, and then usually you have an Nvidia GPU attached to that, you have Linux running on that machine, and then the biggest choice you have is whether to use TensorFlow or PyTorch, or JAX nowadays. In a federated setting, that’s quite different. In a federated setting, you can have anything as a client, starting from even a tiny embedded device - there’s research going on in that direction. Then you can have something like an Apple Watch or a mobile phone, or you can have some bigger device like a tablet or a laptop. You can have your standard x86 server that I just described, or you can even have a much larger compute cluster if you’re in the cross-silo setting, where you have a ton of data, and one of these organizations has massive in-house infrastructure. You can have an HPC cluster as a client.

So this is obviously quite interesting, and also challenging, from an infrastructure and just a software perspective. There is some recent research, for example, from a group in Cambridge that I’m involved with, about the CO2 impact of these workloads - and this is, obviously, quite related to your question - comparing the CO2 impact of federated workloads versus centralized workloads. And the interesting bit is that you can’t say it in general. Actually, it’s quite an interesting thing, because I originally expected federated learning to do much worse, because you have these communication rounds, and it takes longer to converge, so obviously, it must have a higher CO2 impact… It turned out that that’s not necessarily the case. The reason is, once you hear it, quite obvious, but it was surprising to me.

In the central setting, a major contributor to the CO2 emissions is the cooling. So you have active cooling of your GPU clusters. In the federated setting, you don’t necessarily have cooling. You have additional cost for communication, but then if you have mobile edge devices, these edge devices are usually passively cooled. So they’re running the workload and they produce the result without ever needing energy for cooling. So that can be quite good, but obviously, it depends a lot on the workload, the type of model you train, the number of communication rounds you do, and other aspects.

In terms of infrastructure costs, that answers this question as well, because in some cases - if you have, for example, the cross-device setting - then obviously, if you are not the one operating these devices, you don’t have to pay for the energy that goes into training. Usually, when companies do this, they’re very careful about it. They do it in a very careful way. So they wait until the device is plugged in, until the device is connected to WiFi, until it’s fully charged and idle, and only then do they do the federated learning, to not impact the user experience, or to not drain the battery, or things like that.

In the cross-silo setting, I wouldn’t say that there’s much of a difference. In terms of infrastructure, you need – well, each company needs the infrastructure they would need anyways, and then you need one additional server. So that’s pretty similar. Especially in the cross-silo setting where you often have large models, you do have a lot of network bandwidth that you need. So that’s something that you should consider.

You talked a little bit about the training time. You talked a little bit about what’s happening on the device. I think what’s happening in the back of my mind is I’m thinking, “Okay, I’ve got all of these devices and there’s various axes along which things could change”, right? I could have the computational power of that edge device, or the client. And then I’ve got also the number of samples that are available for training on that device. I’m thinking if– and maybe you could speak to this… So I’m thinking, in the scenario of a low-power edge device or a phone, I’m going to have very few samples, which might be a quick update on that device of the model and communicate the parameters back… Whereas, as the amount of data that you have on the client is larger, you need more computational power, or at least more time to do the update. Is that how that trade-off happens in practice?

Yes, absolutely. I mean, obviously, if you have more data on a device, you need more time to train the model on that data. But this is actually also a very interesting aspect, not just in terms of practical things like communication bandwidth and so on - it’s also quite interesting from a more fundamental perspective. Namely, in both the cross-device setting and the cross-silo setting, the data in these partitions, as we like to call them, is usually coming from different distributions. It’s what we like to call non-IID data. And this actually has an impact on the learning process. There are certain scenarios - very rare in a practical setting, but they exist - where the data distribution within each partition can be so different that it’s just not possible for these workloads to converge. And this is something where a lot of research is going on into how to make federated learning more robust towards such scenarios. And yeah, the practical aspects of it are also quite interesting, because you can have multiple clients in the same workload where one of these clients has very few data examples, and another client has tons of data examples…

For example, we all know that one type of person that takes very few photos when they’re on vacation, and we all know that other type of person who takes a ton of photos when they’re on vacation. So this is a very practical example for different amounts of data on each device. In such a scenario, when you instruct a client to do, for example, one epoch on their data, then obviously the update will be coming back much, much faster.

So what you want to have in your entire system is some robustness towards clients who either take a very long time because they have so much data, or towards even real stragglers who, I don’t know, maybe the device is suddenly getting busy with other things, which delays the update coming back. So this is something where your software infrastructure needs to be able to handle these kinds of cases.

And it’s also something where you need appropriate ways of handling it on the server side. The easiest thing to do is obviously to discard those clients that are taking a long time, that are stragglers, but then there are more clever ways to approach this - for example, to let this client know, “Hey, your time is running out. We are about to close a round. Why don’t you submit your partial update?” But then your server side and the aggregation logic need to be able to handle those partial updates coming from clients.
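
A server-side policy like the one Daniel sketches - drop stragglers that miss the round deadline, but accept and down-weight partial updates - might look roughly like this toy NumPy sketch (the deadline and the down-weighting rule here are illustrative assumptions, not any particular framework’s behavior):

```python
import numpy as np

ROUND_DEADLINE = 10.0  # seconds the server waits before closing the round

def close_round(reports):
    """Aggregate whatever arrived in time.

    Each report is (seconds_taken, num_examples, epochs_completed, params).
    Clients that miss the deadline are dropped; clients that sent a
    partial update (fewer epochs than requested) are down-weighted.
    """
    kept = [r for r in reports if r[0] <= ROUND_DEADLINE]
    if not kept:
        return None  # nobody reported in time; skip this round
    weights = [n * epochs for _, n, epochs, _ in kept]
    total = sum(weights)
    return sum(w * params for w, (_, _, _, params) in zip(weights, kept)) / total

reports = [
    (3.2, 50, 1.0, np.array([1.0])),    # fast client, full epoch
    (9.8, 50, 0.5, np.array([2.0])),    # slow client, partial update
    (42.0, 500, 1.0, np.array([9.0])),  # straggler: missed the deadline
]
print(close_round(reports))  # the straggler is excluded entirely
```

Here the straggler’s large update never enters the average, and the partial update counts for only half as much as a full epoch over the same amount of data.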

So Daniel, you’ve talked a little bit about certain client devices being stragglers from one perspective, but I’m curious in terms of how the federated learning community is thinking about things like bias in data… So if I am a data scientist in a central location, I’m seeing maybe updates to my model, but I’m not seeing the data that is producing those updates to the weights and biases of my model. So if there’s bias in terms of those end client devices, like maybe 97% of my client devices are being operated by males, and I have some gender bias in the data that’s coming back… Are there ways that the community is thinking about that, and ways to address that sort of– I guess, maybe there’s a term for it. I’m thinking of it like client bias. Yeah, any thoughts there?

Yes, absolutely. It’s a very good question, and it’s a very important question. There are different ways to think about it. One topic that the community thinks about a lot is how to address that from an algorithmic perspective. So there are approaches - for example, q-Fair federated learning - that tackle this from an algorithmic perspective. So when you collect updates, you can do this in a certain way - there are many different approaches, but one thing you could do is try to address that through the averaging process. It’s a weighted average. So there are ways to influence this.

Another perspective is a more intuitive and more practical one, in the sense that you can think of federated learning, compared to centralized learning, as a way to actually help overcome bias - not overcome it completely, that’s not what I mean, but help to overcome it - in the sense that you can suddenly get access to more training data, and hopefully more representative training data, and then you can make better decisions about how to train your model, what pieces of data to include in your training process, how to sample these data examples that you have on the clients, and a lot of related questions.
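
One simple way to picture this kind of fairness-aware weighted averaging - loosely inspired by q-Fair federated learning, though the actual algorithm is more involved - is to upweight clients the current global model serves worst:

```python
import numpy as np

def fair_weights(client_losses, q=1.0):
    """Aggregation weights that grow with each client's current loss.

    q = 0 gives uniform weights; larger q shifts weight toward clients
    the global model currently serves worst. (Toy sketch only.)
    """
    w = np.asarray(client_losses, dtype=float) ** q
    return w / w.sum()

losses = [0.2, 0.2, 0.8]  # one group of clients is served much worse
print(fair_weights(losses, q=0.0))  # uniform weighting
print(fair_weights(losses, q=1.0))  # skewed toward the high-loss client
```

The knob `q` trades off average accuracy against how evenly the model performs across clients.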

Well, Daniel, this is Practical AI, so we definitely should get into the practicalities of how federated learning can be implemented… And I think you’re probably one of the best people to speak to that, because you’ve been heavily involved as one of the creators of the Flower framework. So maybe just to start out our discussion around that, could you talk about the backstory of Flower, the motivation behind it, and what it is?

Yeah, absolutely - I’ll try to do my best. So when we started out on this journey, obviously, we got excited about federated learning for the reasons I was stating earlier. We were actually in real industry projects, and we were facing these challenges where we saw that the data these organizations had in-house was just not enough; but we saw that there were other organizations who had similar challenges, and we saw the potential to build collaborative approaches there. And the only way these kinds of collaborations would be feasible would be if the underlying training data did not have to be shared. This was the setting that we were in when we first looked at federated learning. And at the time, we obviously looked at existing solutions, but there wasn’t really a solution that was a good fit for our requirements.

One of our requirements - obviously, because we were looking at these practical problems - was that we could build a system that we could then, at a later point, move into production. So obviously, you would start out with some prototyping and see if you could get such a workload to converge. But then at a later point, if you cannot move this into production, then why would you invest in this, right? So this was one of our hard requirements.

And then for moving a federated learning or federated analytics workload into production, there are a ton of associated challenges. I was hinting at this large heterogeneity that we see on the client side - being able to integrate with an embedded device, a mobile device, a server, an HPC cluster… This was on our priority list. So at the time, we didn’t really see any solution that was a fit for the requirements that we had. We had to shift our focus a little bit away from building this one particular system that we had in mind, and we shifted our focus to, first, building the infrastructure that we had in mind for it. We built a prototype of that, and then out of that prototype we gathered a lot of learnings, obviously, and eventually, at the beginning of last year, Taner, my co-founder, and I said, “Okay, let’s start a company and build this infrastructure, to bring these advances that we see, and this huge potential, and make this really accessible for others to use as well.”

It’s probably obvious by now that one of the reasons the Flower framework is there is that we want to enable everyone to build such workloads, because there are a lot of details going on under the hood that are not easy to implement… And if you just want to do federated learning, it would obviously be a huge hurdle to first build this infrastructure before you then build the actual workload.

We wanted to make this easy. We want to make it easy to start in research, and then gradually enhance these workloads and move them to production, eventually, and then to operate them in production. This is also something we haven’t quite seen in other frameworks. Other frameworks that we’ve seen are usually focused on one thing, for example, focused on being a good simulation engine, but then you can’t take these workloads and move them into production.

The other opportunity that we saw - and this is part of this user journey, making it easy to start to prototype something - is the opportunity to be compatible with all of the machine learning frameworks that we are seeing out there. So we see huge excitement about TensorFlow and PyTorch - obviously, those are the dominating frameworks, I should say. Now there’s a lot of excitement about JAX by many people. And there are these other frameworks which are also relevant, sometimes for very specific cases. And the opportunity that we saw is based around this story: you have an existing machine learning project - what’s the minimal amount of code changes that you have to make in order to federate this thing? And we have code examples on that, where you can take an existing workload and then federate it in less than 20 lines of code, which - actually, I still find it amazing, given the amount of things that are going on under the hood.

Yeah. And you mentioned supporting all of these different frameworks, which does seem like a big task. And I’m kind of looking through the Flower usage examples and the documentation… I also love just– I mean, you explicitly say it’s a friendly framework, which I think is great. You talked about accessibility. You’ve got a very friendly Flower logo. So yeah, I think it puts up an inviting front for people, which I think is cool, because it can be, like you said, a very overwhelming, complicated thing to get into.

You were talking about supporting these different frameworks, and maybe you could give a sense of– it seems like a big task to support all of those in this way. And I see that the main way in which you wrap things with Flower is creating this Python class, maybe, that wraps certain methods. And within those, you can define your own TensorFlow or PyTorch or whatever ways to fit, or get parameters of a model or whatever it is. Did you purposely create that structure because you had this vision of supporting the multiple frameworks? Am I representing that accurately?

Yes, absolutely. We call it Flower, the Friendly Federated Learning Framework, exactly for that reason. So we want to be friendly in many different dimensions, actually. We want to be friendly when it comes to different machine learning frameworks, we want to be friendly when it comes to different device types, we want to be friendly when it comes to different transport mechanisms. This is not something that is upfront on the website, but we have different transport mechanisms built in, and you can actually swap these out. So building in support for different frameworks - this was something that we intended to do from the very beginning. And there are different layers to this that are important, or at least interesting, to understand.

One layer is the client class that you just described. So when you build your client in Python, you would create a subclass of Client - flwr.client.Client is what the class is called - or a subclass of flwr.client.NumPyClient, which is even easier to implement. You basically just need to add these few lines of code that then call into your existing machine learning pipelines, which is, on the one hand, a simple concept, but on the other hand, a very powerful concept, because when you implement these classes, it allows you to call arbitrary Python libraries. One good and important example of that is supporting differential privacy.
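
For illustration, here is a minimal sketch of what such a client subclass can look like. The method names mirror Flower’s documented NumPyClient interface, but the base class below is a local stub (so the snippet runs without `flwr` installed), the “training” is a fake one-step linear model, and exact signatures vary between Flower versions:

```python
import numpy as np

class NumPyClient:
    """Local stand-in for flwr.client.NumPyClient, so this sketch runs
    without the flwr package. The method names mirror Flower's interface."""
    def get_parameters(self, config): ...
    def fit(self, parameters, config): ...
    def evaluate(self, parameters, config): ...

class HospitalClient(NumPyClient):
    """Wraps an existing local pipeline - here, a toy linear model."""

    def __init__(self, local_data):
        self.x, self.y = local_data
        self.w = np.zeros(self.x.shape[1])

    def get_parameters(self, config):
        return [self.w]

    def fit(self, parameters, config):
        (self.w,) = parameters  # load the global model sent by the server
        # One local "epoch": a single least-squares gradient step.
        grad = self.x.T @ (self.x @ self.w - self.y) / len(self.y)
        self.w = self.w - 0.1 * grad
        return [self.w], len(self.y), {}  # params, num_examples, metrics

    def evaluate(self, parameters, config):
        (w,) = parameters
        loss = float(np.mean((self.x @ w - self.y) ** 2))
        return loss, len(self.y), {}

# One simulated fit call, as a server round would trigger it.
client = HospitalClient((np.ones((4, 2)), np.ones(4)))
params, num_examples, _ = client.fit([np.zeros(2)], {})
```

In a real Flower setup, the same subclass (inheriting from the actual `flwr.client.NumPyClient`) would be handed to Flower’s client startup function, and the server would drive the `fit`/`evaluate` calls.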

We sometimes get requests, “Hey, does Flower come with differential privacy built in?” And actually, the answer is, we don’t have to, because for a PyTorch-based workload, for example, you can use a library called Opacus, which gives you a differentially private SGD optimizer that you can plug into your workload, and then you can just use it. And the amazing thing about this is that the Flower framework doesn’t even have to change. If there’s a new library coming out, a new approach for what you can do on the client side, you can just integrate it with arbitrary code.
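
The core per-step idea behind such a differentially private optimizer - clip each gradient to a bounded norm, then add calibrated Gaussian noise - can be sketched in a few lines (a toy version; Opacus applies this per-example inside PyTorch and also tracks the cumulative privacy budget, which this sketch does not):

```python
import numpy as np

rng = np.random.default_rng(42)

def privatize_gradient(grad, clip_norm=1.0, noise_multiplier=1.1):
    """Clip a gradient to a maximum L2 norm, then add Gaussian noise.

    This is the per-step core of DP-SGD; the clipping bounds any single
    contribution, and the noise masks what remains.
    """
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise

g = np.array([3.0, 4.0])  # L2 norm 5.0, so it gets clipped down to 1.0
private_g = privatize_gradient(g)
```

In a federated client, a step like this would run locally before the update is ever sent to the server, so the server only sees the noised, clipped result.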

The other layer that is maybe interesting to understand, maybe not so much for researchers who do most of their day-to-day work in Python, but for others who want to integrate this more deeply, in the automotive setting or something similar, is that they wouldn’t want to use Python for their on-device processing. They would want to use a different language, for example C. In the automotive world, there’s this C dialect called MISRA C that you have to use for safety purposes. For example, it prevents you from using recursion and other things that are considered unsafe in the automotive world. And in those scenarios, you can still integrate your device with Flower by directly handling the events that are coming from the server.

So in the end, Flower has been designed in a way where the client side is actually rather easy to implement. And if you have something that is running on C or C++, all you would have to do is establish a connection to the server; the server would then occasionally select this client, and when it selects a client, it sends it a message. You, on the client side, have to handle this message. You can do your processing - it doesn’t have to be any of the well-known machine learning frameworks; you can hand-code the type of model that you have - and then you send back a message containing your update, for example the gradients that you collected.
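The connect/select/handle/reply loop described above can be sketched as a simple message dispatcher. Everything here is hypothetical - the dict-based message format and the `fit`/`disconnect` message names are illustrative stand-ins, not Flower's actual wire protocol - but it shows how little a hand-rolled client needs to do.

```python
def handle_server_message(message, local_train):
    """Dispatch one server message the way a hand-rolled client might.
    `message` is a hypothetical dict, not Flower's real wire format;
    `local_train` is whatever local training routine the device has."""
    if message["type"] == "fit":
        # Server selected this client: train locally, send back the update
        new_weights, num_examples = local_train(message["weights"])
        return {"type": "fit_result", "weights": new_weights,
                "num_examples": num_examples}
    if message["type"] == "disconnect":
        return {"type": "goodbye"}
    raise ValueError(f"unknown message type: {message['type']}")
```

The same handler structure translates directly to C or C++: a switch over message types, a call into local code, and a reply.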

That’s awesome. I love that client-agnostic focus. It’s cool.

One of the things I was curious about – because as a practitioner, I’m kind of in and out, and I’ll do other things in my job, and when I’m coming back in, I’m having to kind of go, “How did I do that before?”, and stuff. And one of the things that I’ve noticed in the industry is that the barriers to be able to access or utilize machine learning are getting lower, and there’s a lot of tools around usability coming out. What does the story look like for Flower and maybe for federated learning at large, as you have more users out there of various technical capability, and maybe gradually having that technical requirement going lower and lower as the tooling gets better? How will federated learning fit into that world where more users with less specific skill in this area are accessing these tools and creating models of various types? What does that look like there?

That’s a great question. So I sometimes say that we are still in a pre-TensorFlow era when it comes to federated learning. It was the case for a long time that if you wanted to build a federated learning workload, you usually had a research scientist type of person start out by prototyping it, making a simulation of it, and if that converged, then you could make the decision to have this in production - but then you would basically start from scratch and implement it in a, quote-unquote, “real system”, with Java or C++ or something like that. So you had to build these systems by hand. For example, there’s a blog post that compares federated learning frameworks, and before Flower was around, the conclusion was really that if you want to build a federated learning system in a real production environment, then your best option is to just build it from scratch by hand. They’ve recently updated this blog post to say that, for their scenario, they chose to use Flower. Obviously, I’m happy about that, but it’s still not a super, super polished experience.

So Flower makes it a lot easier to start out on that journey, but there are still a couple of moving pieces that you should understand to make informed decisions about how to configure your workload, for example. That’s obviously one of our priorities - to make this even easier, to make it even less likely that if you are not an expert on this, you end up configuring or building something that might not be a good choice in production.

One of the things that we take very seriously is that we build in the right defaults. One default the Flower framework follows, which is our go-to recommendation for certain types of workloads, is that when the server gets updates from clients, it does not persist these updates in any way. These individual updates from clients could allow you to peek into them and draw at least some minor conclusions about a client’s data set, and therefore the recommendation is to receive these updates, keep them only in memory, and only for the minimum amount of time absolutely necessary. So once you have aggregated an update with other updates, you can safely discard it.
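The aggregation step described here - keep client updates in memory only until they are averaged, then discard them - is essentially federated averaging. A minimal sketch, where each update is a (weights, num_examples) pair and only the weighted aggregate survives the function call:

```python
import numpy as np

def fed_avg(updates):
    """Weighted-average client updates, where each update is a
    (weights, num_examples) pair. Only the aggregate is returned;
    the individual updates go out of scope and are never persisted."""
    total_examples = sum(n for _, n in updates)
    return sum(w * (n / total_examples) for w, n in updates)
```

Because the individual updates are local variables, nothing about any single client's contribution outlives the aggregation - which is the privacy property the default is designed to preserve.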

And another very related thing is that, for example, the server does not log any client-specific metrics by default. Those are things that we are trying to build in, so that if you just start the server with all defaults, it takes a sensible approach. But then, obviously, there are more advanced users who want to customize it, so the perspective is: make the defaults as safe as we can, and then allow more advanced users to customize these workloads.

So as we close out here, I’m interested to hear about what is one or a couple things that really excites you about the future of Flower? And maybe its applications within the wider context of federated learning. What’s the one thing or the couple things that really get you excited about where this is headed, or maybe within the roadmap of Flower?

There are a ton of things that get me excited, both from a research perspective and from a practical perspective. From a research perspective, we just launched a preview of a new feature that we are calling the virtual client engine. The virtual client engine manages clients as virtual clients, so those clients don’t all actually exist in memory at once. And what this gives you - it sounds pretty trivial, but what this gives you is amazing scalability for your research workloads.

We did a survey of research papers and looked at the scale of the workloads in those experiments, and the vast majority of papers used up to 100 clients, and also up to 100 clients doing work concurrently - training concurrently, for example. So you can have a large client pool, a pool of, I don’t know, 10,000 clients, but then only 100 of them would participate in the same round.
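That pool-versus-round distinction boils down to a per-round sampling step: from the full registered pool, select a subset to train in each round. A tiny sketch using the numbers from the conversation (the function name and pool layout are illustrative, not Flower API):

```python
import random

def select_clients(pool_ids, clients_per_round, seed=None):
    """Sample, without replacement, the subset of the client pool
    that participates in this training round."""
    rng = random.Random(seed)
    return rng.sample(pool_ids, clients_per_round)

pool = [f"client-{i}" for i in range(10_000)]        # 10,000 registered clients
round_clients = select_clients(pool, 100, seed=42)   # 100 train this round
```

The scalability challenge Daniel describes next is what happens when `pool` grows from thousands to tens of millions of entries.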

This is something that is likely due to resource constraints, because those workloads can get very heavy, and the systems that we read about from industry are at a vastly different scale - they have millions, or tens of millions, or even hundreds of millions of clients in such a workload. And this is quite an interesting and important challenge to address, because obviously we want research that eventually translates to the real world, to a practical setting. And if the scale in research is very different from the scale in practical settings, it’s less likely that the research we are conducting will translate into the practical setting.

So the virtual client engine is one thing where we demonstrated, on quite average hardware actually, a workload with 15 million clients in it, and 1,000 of these clients training concurrently, and it just worked super well. So I’m quite excited about that one, and especially excited to see what the community is going to do with it. That’s from a research perspective, and we have a couple of things in the pipeline that we are going to announce over the coming months.

Also, from a practical perspective, that’s maybe even more exciting in terms of the real outcomes that we are going to see. One initiative that we are participating in, for example, is called MedPerf, which is hosted by MLCommons, the organization that emerged out of MLPerf. MedPerf is a way to use federated evaluation to get a better understanding of the performance of medical AI models, and it also requires federated infrastructure. We put the MedPerf paper on arXiv a couple of weeks ago; it’s a super-interesting read, I recommend it. And this is something where you can really see that the real-world impact and real-world improvements we are going to see from this can be very profound, because getting better performance estimates in medical AI is actually a very fundamental challenge. And once we have these better estimates, it is much safer to roll out medical AI models much, much faster. And apart from MedPerf, there are also a couple of other initiatives in the medical AI space and the drug discovery space that I’m very excited about, because any advance our infrastructure helps generate can have a very profound impact on society as a whole. So that’s something I’m quite keen on contributing to.

Well, Daniel, I’m super excited about all of the things that you’ve mentioned in terms of things on the roadmap of research with Flower or practical uses of Flower and federated learning, and I really appreciate you joining us and talking us through everything on the podcast. I appreciate it, and we’ll include some show notes in our show posting for Flower and all the wonderful things that you’ve talked about… But yeah, thank you so much for joining and looking forward to keeping tabs on Flower.

Thanks for having me.

Our transcripts are open source on GitHub. Improvements are welcome. 💚