Why AI Misses What Matters in a Storm

The case for expert models over all-in-one systems

Muhammad Usama

Salman Toor

June 24, 2026

While self-driving cars generate up to 4 terabytes of data for every single hour on the road, roughly equivalent to streaming 1,600 hours of HD video, teaching them to drive safely in all environmental conditions remains one of the biggest hurdles in artificial intelligence. When we train standard, all-in-one neural networks on real-world driving data, they work seamlessly on a sunny afternoon but often struggle to generalize across extreme edge cases like heavy rain, blinding sleet, or dense snow.

To build truly safe autonomous systems, we have to look closely at the data and completely rethink how AI "thinks" about the weather.

The High Stakes of Road Safety

Why does this matter so much? Because the real world is unpredictable, and the stakes couldn't be higher.

According to global status reports on road safety, traffic crashes still claim roughly 1.19 million lives annually worldwide. While highly regulated regions have seen modest decreases in road fatalities over recent years, a staggering 53% of these fatalities still occur on rural roads. These are environments where unpredictable weather, lack of street lighting, and high speeds mix into a dangerous cocktail.

For an autonomous car to drop those numbers to zero, it cannot afford to be a "fair-weather driver."

Recognizing the critical need for this research, Vinnova (Sweden's innovation agency) is funding this project, Mixture of Experts models Tailored for Fleet Intelligence, with Scaleout leading the development alongside core partners AI Sweden and Zenseact. This collaboration brings together cutting-edge AI orchestration, public innovation support, and massive real-world driving data to push the boundaries of vehicle safety.

The Unsung Heroes of Autonomous Driving: Datasets Like ZOD

We can't solve the road safety challenges without high-quality data. This is why multi-modal datasets like the Zenseact Open Dataset (ZOD) are absolute game-changers for the industry.

Clear · 59% of data
Cloudy · 20% of data
Rain · 17% of data
Adverse conditions · 3% of data

Image samples: Zenseact

Building a repository like ZOD is anything but trivial. It isn't as simple as strapping a camera to a dashboard and driving around the block. Creating a production-ready benchmark dataset requires an immense engineering pipeline:

  • Collection: Deploying fleets of vehicles equipped with high-resolution cameras, LiDAR, and radar, all well calibrated to capture high-frequency data simultaneously.
  • Annotation: Human-in-the-loop and automated labeling of millions of frames. Objects (pedestrians, vehicles, signs) must be meticulously boxed and tracked in 3D space with millimeter precision.
  • Systematic Preprocessing: Synchronizing timestamps across completely different sensor types, filtering out sensor noise, and anonymizing faces and license plates to satisfy strict global privacy laws.

The industry very much needs more of these large-scale, open-source efforts if we ever want autonomous vehicles to scale safely and globally. Yet, even when pulling off this massive engineering feat, real-world data presents a brutal, built-in bottleneck: severe class imbalance. Clear weather makes up 59% of the data, cloudy 20%, and rain 17%, while highly adverse conditions (fog, snow, storms) account for only 3%.
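
To make the imbalance concrete, here is a minimal Python sketch that uses the percentages above. The batch size of 32 and the uniform random sampling are illustrative assumptions, not details of how models are actually trained on ZOD.

```python
# Weather shares quoted above (percent of data); they sum to 99 due to rounding.
WEATHER_SHARE = {"clear": 59, "cloudy": 20, "rain": 17, "adverse": 3}

def imbalance_ratio(shares, majority="clear", minority="adverse"):
    """How many majority-class frames exist per minority-class frame."""
    return shares[majority] / shares[minority]

def prob_batch_has_none(shares, label, batch_size):
    """Chance that a uniformly sampled batch contains no frame of `label`."""
    p = shares[label] / sum(shares.values())
    return (1.0 - p) ** batch_size

BATCH_SIZE = 32  # hypothetical batch size, chosen for illustration only

ratio = imbalance_ratio(WEATHER_SHARE)
p_none = prob_batch_has_none(WEATHER_SHARE, "adverse", BATCH_SIZE)
print(f"clear frames per adverse frame: {ratio:.1f}")
print(f"chance a batch of {BATCH_SIZE} has no adverse frame: {p_none:.0%}")
```

Under these assumptions, more than a third of training batches would contain no adverse-weather frame at all, so many update steps would never see the hardest conditions.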

The "Jack of All Trades, Master of None" Problem

Because bad weather only makes up a tiny fraction of the training data, standard AI models often suffer from a frustrating phenomenon known as Representational Interference, which drastically impacts the core task of object detection.

To understand representational interference, imagine you are trying to learn how to play two different instruments at the exact same time, say, the piano and the drums.

The heavy, rhythmic muscle memory required to keep a steady beat on a bass drum with your foot might actively disrupt the delicate, precise finger control you need for a classical piano piece. Because your brain is trying to optimize for both tasks simultaneously using the same set of muscles, the skills clash. You may end up playing both poorly.

In AI, representational interference (often called gradient interference) happens when a single, monolithic neural network tries to learn completely different concepts using the exact same pool of digital "neurons" (weights).

When a standard AI model, like a baseline DINO object detector with a ResNet-50 backbone, is fed heavily unbalanced driving data, it tries to please everyone at once:

  • The Majority Rules: Because 59% of the data is clear and sunny, the model's feedback loops (called gradients, which act as mathematical corrections) are overwhelmingly focused on optimizing for perfect conditions. It learns exactly what a pedestrian looks like in crisp, bright daylight.
  • The Clash: When the model encounters a rare 3% adverse condition, like a pedestrian obscured by heavy snow, it tries to adjust its weights to understand this blurry, low-contrast shape.
  • The Interference: The corrections needed to understand the snowy image conflict with the corrections needed for the sunny images. Because the sunny data has the vast majority of the "votes," its gradients override or cancel out the snowy gradients.
  • The Bottleneck: The model is forced to compromise. In trying to find a middle ground that satisfies both scenarios, it averages out its learning. It defaults to assuming conditions are good, failing to specialize in the exact adverse conditions where human lives rely on it most.
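
The four steps above can be illustrated with a toy calculation. This is a hand-built sketch, not the training code behind any result in this article: the two gradient vectors are made-up values chosen to point in partly conflicting directions, and the 59:3 weighting simply reuses the clear and adverse data shares.

```python
import numpy as np

# Hypothetical per-condition gradients on the same shared weights.
# The values are invented so that the two directions partly conflict.
grad_clear = np.array([1.0, 0.5, -0.2])
grad_adverse = np.array([-0.8, 0.6, 0.9])

def cosine(a, b):
    """Cosine similarity; a negative value means the updates pull apart."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mixed_gradient(g_major, g_minor, n_major, n_minor):
    """Average gradient when every sample contributes one equal 'vote'."""
    return (n_major * g_major + n_minor * g_minor) / (n_major + n_minor)

# Weight the two gradients by the clear (59) and adverse (3) data shares.
g_mix = mixed_gradient(grad_clear, grad_adverse, n_major=59, n_minor=3)

conflict = cosine(grad_clear, grad_adverse)
align_clear = cosine(g_mix, grad_clear)
align_adverse = cosine(g_mix, grad_adverse)

print(f"clear vs adverse gradient: {conflict:+.2f}")
print(f"shared update vs clear:    {align_clear:+.2f}")
print(f"shared update vs adverse:  {align_adverse:+.2f}")
```

In this toy setup the shared update is almost perfectly aligned with the clear-weather gradient and points away from the adverse-weather one, which is the "majority vote" effect in miniature.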

Enter CAMoE: The Panel of Weather Experts

To fix this, we need an alternative to forcing one single AI model to learn conflicting representations of the world. Instead, we can introduce a specialized architecture called Context-Aware Mixture-of-Experts (CAMoE).

Standard MoE routing: every input is sent through the same shared pool of experts, regardless of weather context.

CAMoE routing: a context-aware gating network reads the conditions in each frame and routes it to the matching weather experts, instead of one shared pool.

Think of CAMoE not as a single driver, but as a boardroom of specialized experts led by a brilliant manager.

Instead of relying on one general-purpose neural network to handle every scenario, CAMoE breaks the system down into two main components:

  • The Experts: Multiple smaller, specialized neural networks. We might have a "Sunny Expert," a "Heavy Rain Expert," and a "Nighttime Expert." Each one is trained to spot pedestrians, cars, and obstacles in its specific domain.
  • The Gating Network (The Manager): A context-aware algorithm that looks at the incoming video feed, instantly identifies the environmental context (e.g., "It's 10 PM and pouring"), and dynamically routes the data to the right expert, or blends the opinions of a few experts.
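
The expert-plus-manager split described above could be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the CAMoE implementation: the context features, the gate weights, the three toy experts, and the top-2 blending rule are all hypothetical stand-ins.

```python
import numpy as np

EXPERT_NAMES = ["sunny", "heavy_rain", "nighttime"]

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gate(context, gate_weights, top_k=2):
    """Score every expert from the context, keep the top_k, renormalise."""
    scores = softmax(gate_weights @ context)
    chosen = np.argsort(scores)[::-1][:top_k]        # best experts first
    weights = scores[chosen] / scores[chosen].sum()  # blend weights sum to 1
    return chosen, weights

def route(features, context, experts, gate_weights, top_k=2):
    """Run only the selected experts and blend their outputs."""
    chosen, weights = gate(context, gate_weights, top_k)
    output = sum(w * experts[i](features) for i, w in zip(chosen, weights))
    return output, chosen, weights

# Hypothetical context vector: [brightness, rain intensity, darkness].
context = np.array([0.1, 0.9, 0.8])  # e.g. "10 PM and pouring"
# Hypothetical gate: each row makes one expert respond to one context cue.
gate_weights = 4.0 * np.eye(3)
# Toy experts: each one just scales the features by a different factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0)]

features = np.ones(4)
output, chosen, weights = route(features, context, experts, gate_weights)
print([EXPERT_NAMES[i] for i in chosen], np.round(weights, 2))
```

With this rainy, dark context the gate selects the heavy-rain and nighttime experts and never runs the sunny one. A real gating network would learn its weights during training rather than use a hand-set matrix.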

Safer Driving Without the "AI Tax"

The best part about the CAMoE architecture? It achieves performance boosts in adverse weather without a massive increase in parameter count (or computational cost).

Metric       Monolithic DINO   CAMoE
mAP 50       61.4              65.7
mAP 50:95    33.9              37.3

In traditional AI, making a model smarter usually means making it bigger, which requires massive amounts of expensive computing power inside the car. But because CAMoE only activates the relevant "experts" for any given frame of data, the car's computer isn't working any harder than a standard model would. It's just working smarter.
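
The reasoning behind sparse activation can be written as simple bookkeeping. The sketch below separates the parameters a model stores from the ones a single frame actually uses. All sizes are hypothetical round numbers chosen for illustration, not CAMoE's real parameter counts, and the small cost of the gating network itself is ignored.

```python
def moe_parameter_budget(shared, per_expert, num_experts, top_k):
    """Return (stored, active) parameter counts for a sparsely routed model."""
    stored = shared + num_experts * per_expert  # everything kept in memory
    active = shared + top_k * per_expert        # what one frame actually uses
    return stored, active

# Hypothetical sizes, in millions of parameters, for illustration only.
stored, active = moe_parameter_budget(
    shared=40, per_expert=2, num_experts=4, top_k=1
)
print(f"stored: {stored}M, active per frame: {active}M")
```

The point of the sketch is that adding experts grows the stored model, while the per-frame compute depends only on how many experts the gate activates.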

Conclusion: Smarter Edge AI for the Real World

Handling highly variable and sensitive data is a fundamental, real-world challenge in production computer vision for autonomous driving. The street doesn't care if our AI model was mostly trained on sunny California roads; it expects perfection when navigating a pitch-black, rain-slicked highway.

By utilizing the Context-Aware Mixture-of-Experts architecture, we can keep the global model converging effectively across diverse environmental conditions. Instead of forcing a single neural network to compromise its vision, we give it the tools to adapt.

This targeted specialization ensures that autonomous systems can adapt to adverse environments without the need for computationally expensive monolithic models. It paves the way for a safer, more robust generation of Edge AI.

It brings us one step closer to self-driving technology we can trust in any weather, anywhere on Earth.

Muhammad Usama

ML Researcher at Scaleout.

Salman Toor

Co-Founder & CTO at Scaleout, and Associate Professor at Uppsala University.