Why AI Misses What Matters in a Storm
The case for expert models over all-in-one systems
Muhammad Usama
Salman Toor
June 24, 2026
Self-driving cars generate up to 4 terabytes of data for every hour on the road, roughly equivalent to streaming 1,600 hours of HD video. Yet teaching them to drive safely in all environmental conditions remains one of the biggest hurdles in artificial intelligence. When we train standard, all-in-one neural networks on real-world driving data, they work well on a sunny afternoon but often fail to generalize to extreme edge cases like heavy rain, blinding sleet, or dense snow.
To build truly safe autonomous systems, we have to look closely at the data, and completely rethink how AI "thinks" about the weather.
The High Stakes of Road Safety
Why does this matter so much? Because the real world is unpredictable, and the stakes couldn't be higher.
According to global status reports on road safety, traffic crashes still claim roughly 1.19 million lives annually worldwide. While highly regulated regions have seen modest decreases in road fatalities over recent years, a staggering 53% of these fatalities still occur on rural roads. These are environments where unpredictable weather, lack of street lighting, and high speeds mix into a dangerous cocktail.
For autonomous cars to help bring those numbers down toward zero, they cannot afford to be "fair-weather drivers."
The Unsung Heroes of Autonomous Driving: Datasets Like ZOD
We can't solve the road safety challenges without high-quality data. This is why multi-modal datasets like the Zenseact Open Dataset (ZOD) are absolute game-changers for the industry.
Image samples: Zenseact
Building a repository like ZOD is anything but trivial. It isn't as simple as strapping a camera to a dashboard and driving around the block. Creating a production-ready benchmark dataset requires an immense engineering pipeline:
- Collection: Deploying fleets of vehicles equipped with high-resolution cameras, LiDAR, and radar, all well calibrated to capture high-frequency data simultaneously.
- Annotation: Human-in-the-loop and automated labeling of millions of frames. Objects (pedestrians, vehicles, signs) must be meticulously boxed and tracked in 3D space with millimeter precision.
- Systematic Preprocessing: Synchronizing timestamps across completely different sensor types, filtering out sensor noise, and anonymizing faces and license plates to satisfy strict global privacy laws.
The industry badly needs more of these large-scale, open-source efforts if autonomous vehicles are ever to scale safely and globally. Yet even after pulling off this massive engineering feat, real-world data presents a brutal, built-in bottleneck: severe class imbalance. The weather distribution is heavily skewed: Clear Weather 59%, Cloudy 20%, Rain 17%, and Highly Adverse (Fog, Snow, Storms) only 3%.
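To make the imbalance concrete, here is a small sketch that turns the weather shares quoted above into expected frame counts per training batch. The batch size of 64 and the helper names are illustrative assumptions, not details of any ZOD training setup.

```python
# Weather shares as quoted in the text (they add up to 99%, presumably from rounding).
weather_share = {
    "clear": 0.59,
    "cloudy": 0.20,
    "rain": 0.17,
    "highly_adverse": 0.03,
}

BATCH_SIZE = 64  # hypothetical batch size, chosen only for illustration


def expected_per_batch(share: float, batch_size: int = BATCH_SIZE) -> float:
    """Expected number of frames of one weather type in a uniformly sampled batch."""
    return share * batch_size


def imbalance_ratio(shares: dict, major: str, minor: str) -> float:
    """How many `major` frames the model sees for every `minor` frame."""
    return shares[major] / shares[minor]


for name, share in weather_share.items():
    frames = expected_per_batch(share)
    print(f"{name:>15}: {frames:5.1f} frames per batch of {BATCH_SIZE}")

ratio = imbalance_ratio(weather_share, "clear", "highly_adverse")
print(f"clear : highly_adverse is roughly {ratio:.1f} : 1")
```

Under uniform sampling, a batch of this size would contain fewer than two highly adverse frames on average, against almost 38 clear-weather frames.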
The "Jack of All Trades, Master of None" Problem
Because bad weather only makes up a tiny fraction of the training data, standard AI models often suffer from a frustrating phenomenon known as Representational Interference, which drastically impacts the core task of object detection.
To understand representational interference, imagine you are trying to learn how to play two different instruments at the exact same time, say, the piano and the drums.
The heavy, rhythmic muscle memory required to keep a steady beat on a bass drum with your foot might actively disrupt the delicate, precise finger control you need for a classical piano piece. Because your brain is trying to optimize for both tasks simultaneously using the same set of muscles, the skills clash. You may end up playing both poorly.
In AI, representational interference (often called gradient interference) happens when a single, monolithic neural network tries to learn completely different concepts using the exact same pool of digital "neurons" (weights).
When a standard AI model, like a baseline DINO object detector with a ResNet-50 backbone, is fed heavily unbalanced driving data, it tries to please everyone at once:
- The Majority Rules: Because 59% of the data is clear and sunny, the model's feedback loops (called gradients, which act as mathematical corrections) are overwhelmingly focused on optimizing for perfect conditions. It learns exactly what a pedestrian looks like in crisp, bright daylight.
- The Clash: When the model encounters a rare 3% adverse condition, like a pedestrian obscured by heavy snow, it tries to adjust its weights to understand this blurry, low-contrast shape.
- The Interference: The corrections needed to understand the snowy image conflict with the corrections needed for the sunny images. Because the sunny data has the vast majority of the "votes," its gradients override or cancel out the snowy gradients.
- The Bottleneck: The model is forced to compromise. In trying to find a middle ground that satisfies both scenarios, it averages out its learning. It defaults to assuming conditions are good, failing to specialize in the exact adverse conditions where human lives rely on it most.
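The four steps above can be illustrated with a toy calculation. The sketch below uses two made-up gradient vectors for a shared pair of weights, one from clear-weather frames and one from snowy frames, and combines them in proportion to the 59% and 3% shares from the dataset. The vectors are invented purely for illustration; real detector gradients have millions of dimensions and are not this cleanly opposed.

```python
import math


def cosine(a, b):
    """Cosine similarity: +1 means same direction, -1 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def weighted_average(grads, weights):
    """Average gradient vectors, weighting each by its share of the data."""
    total = sum(weights)
    dim = len(grads[0])
    return [
        sum(w * g[i] for g, w in zip(grads, weights)) / total
        for i in range(dim)
    ]


# Made-up gradients for the same two shared weights (illustrative only).
g_clear = [1.0, 0.2]   # correction requested by clear-weather frames
g_snow = [-0.8, 0.6]   # correction requested by snowy frames

# Data shares from the text: 59% clear versus 3% highly adverse.
combined = weighted_average([g_clear, g_snow], [0.59, 0.03])

print("clear vs snow    :", round(cosine(g_clear, g_snow), 3))
print("update vs clear  :", round(cosine(combined, g_clear), 3))
print("update vs snow   :", round(cosine(combined, g_snow), 3))
```

A negative similarity between the two gradients means they pull the shared weights in conflicting directions. Because the clear-weather gradient carries almost all of the weight, the combined update follows it closely and points away from what the snowy frames asked for.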
Enter CAMoE: The Panel of Weather Experts
To fix this, we need an alternative to forcing one single AI model to learn conflicting representations of the world. We can instead introduce a specialized architecture called Context-Aware Mixture-of-Experts (CAMoE).
Think of CAMoE not as a single driver, but as a boardroom filled with specialized experts and a brilliant manager.
Instead of relying on one general-purpose neural network to handle every scenario, CAMoE breaks the system down into two main components:
- The Experts: Multiple smaller, specialized neural networks. We might have a "Sunny Expert," a "Heavy Rain Expert," and a "Nighttime Expert." Each one is trained to spot pedestrians, cars, and obstacles in its specific domain.
- The Gating Network (The Manager): A context-aware algorithm that looks at the incoming video feed, instantly identifies the environmental context (e.g., "It's 10 PM and pouring"), and dynamically routes the data to the right expert, or blends the opinions of a few experts.
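The expert-plus-manager idea can be sketched in a few lines. This is a generic top-k mixture-of-experts routing example, not the CAMoE implementation: the gate scores are supplied by hand rather than computed from the video feed, the "experts" are trivial scaling functions, and every name (`top_k_gate`, `moe_forward`, `make_scaling_expert`) is a hypothetical choice made for this illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Expert = Callable[[Sequence[float]], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw gate scores into probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(scores: Sequence[float], k: int) -> Dict[int, float]:
    """Keep the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(features: Sequence[float], gate_scores: Sequence[float],
                experts: Sequence[Expert], k: int = 2) -> List[float]:
    """Run only the selected experts and blend their outputs by gate weight."""
    weights = top_k_gate(gate_scores, k)
    output = [0.0] * len(features)
    for idx, weight in weights.items():
        expert_out = experts[idx](features)  # unselected experts are never called
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output


def make_scaling_expert(scale: float) -> Expert:
    """Toy stand-in for an expert network: multiplies every feature by `scale`."""
    return lambda feats: [scale * f for f in feats]


# Hypothetical experts in the order: sunny, heavy rain, nighttime.
experts = [make_scaling_expert(1.0), make_scaling_expert(2.0), make_scaling_expert(3.0)]
gate_scores = [0.1, 2.0, 1.5]  # hand-picked scores, as if the gate saw "night and rain"
features = [1.0, 2.0]

blended = moe_forward(features, gate_scores, experts, k=2)
print("routing weights:", top_k_gate(gate_scores, 2))
print("blended output :", blended)
```

With these scores the sunny expert is skipped entirely, and the output is a weighted blend of the rain and nighttime experts. In a real system the gate would itself be a small learned network, and each expert would be a detection sub-network rather than a scaling function.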
Safer Driving Without the "AI Tax"
The best part about the CAMoE architecture? It achieves performance boosts in adverse weather without a massive increase in parameter count (or computational cost).
| Metric | Monolithic DINO | CAMoE |
|---|---|---|
| mAP 50 | 61.4 | 65.7 |
| mAP 50:95 | 33.9 | 37.3 |
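To put the table in perspective, the short calculation below derives the absolute and relative gains from the four reported values. Only the numbers in the table are used; the `gains` helper is an illustrative name.

```python
# (Monolithic DINO, CAMoE) scores taken directly from the table above.
results = {
    "mAP 50": (61.4, 65.7),
    "mAP 50:95": (33.9, 37.3),
}


def gains(baseline: float, improved: float):
    """Return the absolute gain in points and the relative gain in percent."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


for metric, (dino, camoe) in results.items():
    absolute, relative = gains(dino, camoe)
    print(f"{metric}: +{absolute:.1f} points ({relative:.1f}% relative)")
```

That works out to about 4.3 points (roughly 7% relative) on mAP 50 and about 3.4 points (roughly 10% relative) on the stricter mAP 50:95.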
In traditional AI, making a model smarter usually means making it bigger, which requires massive amounts of expensive computing power inside the car. But because CAMoE only activates the relevant "experts" for any given frame of data, the car's computer does roughly the same amount of work per frame as it would with a standard model. It's just working smarter.
Conclusion: Smarter Edge AI for the Real World
Handling highly variable and sensitive data is a fundamental, real-world challenge in production computer vision for autonomous driving. The street doesn't care if our AI model was mostly trained on sunny California roads; it expects perfection when navigating a pitch-black, rain-slicked highway.
By utilizing the Context-Aware Mixture-of-Experts architecture, we can keep the global model converging effectively across diverse environmental conditions. Instead of forcing a single neural network to compromise its vision, we give it the tools to adapt.
This targeted specialization lets autonomous systems adapt to adverse environments without resorting to ever-larger, computationally expensive monolithic models. It paves the way for a safer, more robust generation of Edge AI.
It brings us one step closer to self-driving technology we can trust in any weather, anywhere on Earth.