In the realm of cross-device federated machine learning, it's common for clients to intermittently connect and disconnect from the federated learning network. This variability can stem from factors such as device availability, network conditions, and user behavior.
In a series of posts featuring code, we will delve into cross-device federated learning using the FEDn framework. More details about the framework can be found in its GitHub repository, linked at the end of this post.
Our discussion will span the entire spectrum from research to real-world deployment, addressing aspects such as local experimentation, handling intermittently available clients, and scaling to secure, production-grade deployments.
In this first example, we will set up a local development environment for experimenting with cross-device use cases on a single machine. We will run intermittent clients solving a classification problem using incremental learning. In future posts in the series we will show how to seamlessly add production-grade features and scale the setup in a production-like setting via FEDn Studio.
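To make the incremental-learning idea concrete, here is a minimal sketch of how a single client could update a classifier on each freshly collected batch of data. The model choice (scikit-learn's SGDClassifier) and the synthetic data are illustrative assumptions for this post, not the exact model or dataset used in the example code.

```python
# Illustrative sketch of incremental learning on one client, assuming a
# scikit-learn classifier with a partial_fit API (not the exact model
# from the FEDn example).
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(42)
classes = np.array([0, 1])  # all classes must be declared for the first partial_fit call

# A simple logistic-regression-style classifier trained incrementally.
model = SGDClassifier(loss="log_loss", learning_rate="constant", eta0=0.01)

def new_local_batch(n=32):
    """Simulate a freshly collected batch of labeled training points."""
    X = rng.normal(size=(n, 10))
    y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)
    return X, y

for cycle in range(5):
    X, y = new_local_batch()
    # Update the model on the new batch only; earlier batches are not revisited.
    model.partial_fit(X, y, classes=classes)
```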
The figure below illustrates the scenario. We will run clients that:

- collect a fresh batch of local training data while offline from the FL network,
- connect to the network and make themselves available to participate in training on that batch, and
- disconnect again.

This completes one cycle in our setup, which we then repeat for a configurable number of cycles.
These clients will collectively solve a classification problem. We assume that each client collects a fresh batch of training points during each cycle. During this time (a random delay of at most 10 seconds) they remain offline from the FL network. They then connect for 120 seconds, making themselves available to participate in training on that local data batch. Naturally, you can modify these parameters to your liking. The global model trained this way (using FedAdam) is compared to the alternative in which clients instead send their collected data batches to a central server.
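The cycle above can be driven by a small controller script on each client machine. The sketch below shows one way to do it under the stated parameters (up to 10 seconds offline, 120 seconds online); the command used to launch a FEDn client (CLIENT_CMD) is a placeholder, since the exact invocation depends on your FEDn version and client configuration.

```python
# Sketch of the intermittent-client cycle described above. CLIENT_CMD is a
# placeholder; adjust it to your FEDn version and client configuration.
import random
import subprocess
import time

CLIENT_CMD = ["fedn", "client", "start"]  # placeholder, see the FEDn docs for exact flags
N_CYCLES = 20          # configurable number of cycles
OFFLINE_MAX_S = 10     # max random offline delay while collecting data
ONLINE_S = 120         # time the client stays connected and available

for cycle in range(N_CYCLES):
    # Offline phase: the client collects a fresh batch of training data.
    time.sleep(random.uniform(0, OFFLINE_MAX_S))

    # Online phase: start the client so it can join any training rounds
    # dispatched by the server during the next ONLINE_S seconds.
    proc = subprocess.Popen(CLIENT_CMD)
    time.sleep(ONLINE_S)

    # Disconnect again by stopping the client process.
    proc.terminate()
    proc.wait()
```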
As can be seen, under these intermittent client conditions, FEDn trains a federated model (red) that reaches the same level of performance as the centralized baseline trained on all data on the server (blue), with convergence close to a simulated case where one client (orange) or all clients (green) send their collected data to a central server.
Here we used FedAdam with a fixed learning rate of 1e-2 as the server-side aggregator. It is possible that hyperparameter tuning, or adapting the learning rate, could improve convergence further. That was not the focus of this experiment, though; the objective was to set up a local experiment environment where clients connect intermittently and to validate the robustness of FEDn in this scenario. In future parts of this series we will build on this in different ways as we explore various aspects of cross-device FL.
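For readers unfamiliar with FedAdam (Reddi et al., "Adaptive Federated Optimization"), the server treats the averaged client update as a pseudo-gradient and applies an Adam-style step to the global model. Below is a minimal NumPy sketch of that update; the learning rate 1e-2 matches the value above, while beta1, beta2, and tau are the commonly used defaults and are assumptions, not values taken from this experiment.

```python
# Minimal NumPy sketch of the server-side FedAdam update.
# lr=1e-2 is the value used above; beta1, beta2, tau are assumed defaults.
import numpy as np

def fedadam_step(global_w, client_ws, m, v, lr=1e-2, beta1=0.9, beta2=0.99, tau=1e-3):
    """One FedAdam round: aggregate client models and apply an Adam-style update."""
    # Pseudo-gradient: average client update relative to the current global model.
    delta = np.mean([w - global_w for w in client_ws], axis=0)

    # Adam-style first and second moment estimates of the pseudo-gradient.
    m = beta1 * m + (1 - beta1) * delta
    v = beta2 * v + (1 - beta2) * delta ** 2

    # Adaptive update of the global model on the server.
    new_global_w = global_w + lr * m / (np.sqrt(v) + tau)
    return new_global_w, m, v
```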
The code, as well as additional experiments and examples of how to use the FEDn Python API to analyze training results, can be found in our GitHub repository:
https://github.com/scaleoutsystems/fedn
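As a taste of that API, the sketch below connects to a local FEDn deployment and pulls the validation records logged during training for further analysis. The host, port, and method names are assumptions based on the FEDn examples; consult the repository above for the exact interface in your version.

```python
# Hedged sketch of retrieving results with the FEDn Python API. Host, port,
# and method names are assumptions; verify against the FEDn examples/docs.
from fedn import APIClient

# Connect to the local FEDn REST API (port assumed from the local setup).
client = APIClient(host="localhost", port=8092)

# Retrieve validation records logged during training for plotting/analysis.
validations = client.get_validations()
print(validations)
```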
A core design goal of FEDn is to meet the requirements of research and development while ensuring a sustainable route to real-world distributed deployments. Specifically, our aim is to assist researchers and developers in transitioning from initial experimentation to secure distributed deployments, in which elements like token-based client authentication and managed operation of the server side in Federated Learning (FL) are handled by the framework. Our approach within the framework is to offer a versatile local sandbox, designed to run efficiently on individual laptops and workstations. This local setup is not merely a simulation of FEDn; it is a genuine distributed setup running on localhost. Consequently, projects developed in this manner can be deployed without any code modifications to FEDn Studio, a production version of FEDn that runs on any Kubernetes cluster. Scaleout provides a public instance of Studio as a Software as a Service (SaaS) to streamline the development phase as much as possible.