When a vision system keeps learning after it ships, accuracy on the new domain is the easy question. The four numbers that actually keep it safe are quieter than that.

Continuous-learning metrics matter whenever a model has to keep learning after it ships, rather than being trained once and frozen. That is the normal condition for vision systems that move through the world. An ADAS stack drives into new countries, seasons, weather, and road furniture. A defence sensor is redeployed to a new theatre, meets unfamiliar platforms, camouflage, or adversary tactics, or is swapped onto different optics.

In every case the data seen in the field drifts away from the data the model was trained on — and retraining from scratch on the entire history each time is usually off the table, blocked by data volume, edge-compute limits, latency, privacy, or classification constraints.

Once updates have to be incremental, you need measures that answer not only "is the new accuracy good?" but "what did this update cost us elsewhere?" In safety- and mission-critical systems, an update that silently breaks an existing capability is far more dangerous than one that simply fails to improve.

The stability–plasticity tradeoff

Every continual learner is caught between two opposing pressures. Plasticity is the capacity to absorb new information — to actually change when the world changes. Stability is the capacity to hold on to what has already been learned.

[Figure: The Stability–Plasticity Tradeoff. Every update balances keeping what already works (stability: hold the old) against learning something new (plasticity: learn the new). Too rigid leads to loss of plasticity; a model that overwrites suffers catastrophic forgetting.]

Fig. 1: The inherent opposition between adaptation and retention.

The two pull against each other, and there is no single correct setting — the right balance depends on how fast your domain shifts and how costly forgetting is for your application. The whole purpose of the measures below is to make this tradeoff visible and quantified, rather than something you discover only after a failure in the field.

The four measurements

Two of the numbers look backward in time — what the latest update did to what you already had. Two look forward — what the system is now poised to learn next. Together they sit on a single timeline around the task you are training right now.

[Figure: The Dynamics of Continual Learning. Learning a new task right now alters your past knowledge and shapes how easily you will adapt to future challenges. Example domains (urban, night, fog, rain, rural) sit along a timeline marked earlier, now, and later. Earlier, backward: the effect of new training on what was already learned. Now, plasticity: how well the system absorbs the immediate task in front of it. Later, forward: a head start on future domains not yet experienced.]

Fig. 2: Timeline framework tracking learning updates.

Plasticity

Can it still learn?

How good does it get immediately after seeing the new data?

Plasticity measures how well the system learns the task or domain it is being trained on right now. The intuition: drop the model into a genuinely new condition — a new camera, a new operating region — and see how close it comes to the performance you would reach if you had trained a fresh model on that condition alone.

High plasticity means the system adapts readily and makes good use of new data. Low plasticity — sometimes called loss of plasticity — means it has gone rigid and can no longer take on new conditions effectively, a well-known failure mode in systems updated many times over a long deployment.
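One way to turn that intuition into a number is sketched below. It is a minimal illustration and not a definition taken from this article: it assumes you log an accuracy matrix `acc`, where `acc[i][j]` is accuracy on domain `j` measured right after training stage `i`, and that you can afford a reference model trained on each domain alone (`fresh_acc`). The names and the numbers are hypothetical.

```python
def plasticity(acc, fresh_acc):
    """Mean ratio of just-trained accuracy to a fresh single-domain model.

    acc[j][j]    : accuracy on domain j right after the stage that trained on it
    fresh_acc[j] : accuracy of a reference model trained on domain j alone
    A value near 1.0 means the continually updated model still learns new
    domains about as well as a fresh one; lower values suggest rigidity.
    """
    ratios = [acc[j][j] / fresh_acc[j] for j in range(len(fresh_acc))]
    return sum(ratios) / len(ratios)


# Illustrative, made-up numbers for three stages: urban, night, fog.
acc = [
    [0.90, 0.40, 0.30],  # after training on urban
    [0.80, 0.85, 0.45],  # after training on night
    [0.70, 0.80, 0.72],  # after training on fog
]
fresh_acc = [0.90, 0.90, 0.90]  # assumed single-domain reference models

score = plasticity(acc, fresh_acc)
print(f"plasticity: {score:.3f}")
```

In this toy run the diagonal drifts from 0.90 to 0.72 while the reference stays at 0.90, which is the pattern a loss of plasticity would produce. A difference rather than a ratio would work equally well; the choice here is arbitrary.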

Forgetting

What did we lose?

How much did old capabilities decay while we learned new ones?

Forgetting measures how much performance on previously mastered tasks decays once the system has moved on. The intuition: recall the best the model ever did on, say, night-time pedestrian detection, then check how it does on that same task today after months of further updates — the gap is forgetting. It is reported as a loss, so smaller is better.

This is the metric that matters most for safety. A system can show excellent accuracy on its latest domain while having quietly degraded on a capability you still depend on — and forgetting is what surfaces that hidden regression before it shows up as a missed detection in the field.
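The "best ever versus today" gap can be sketched as follows. This follows one common formulation from the continual-learning literature (best earlier accuracy minus current accuracy, averaged over old tasks); the article itself gives no formula, so treat the function name, the matrix layout `acc[i][j]` (accuracy on domain `j` after training stage `i`), and the numbers as assumptions.

```python
def forgetting(acc):
    """Per-domain and mean forgetting after the final training stage.

    For each earlier domain j, take the best accuracy recorded on j from the
    stage that trained on it up to the second-to-last stage, then subtract
    the accuracy on j after the last stage. Smaller is better.
    """
    last = len(acc) - 1
    per_domain = []
    for j in range(last):
        best_earlier = max(acc[i][j] for i in range(j, last))
        per_domain.append(best_earlier - acc[last][j])
    return per_domain, sum(per_domain) / len(per_domain)


# Illustrative, made-up numbers for three stages: urban, night, fog.
acc = [
    [0.90, 0.40, 0.30],  # after training on urban
    [0.80, 0.85, 0.45],  # after training on night
    [0.70, 0.80, 0.72],  # after training on fog
]

per_domain, mean_forgetting = forgetting(acc)
print(per_domain, mean_forgetting)
```

Reporting the per-domain values alongside the mean is deliberate: an average can hide one badly degraded capability behind several healthy ones.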

Fig. 3: Measuring the regression gap in historical tasks.

Backward transfer

Does new learning help or hurt the old?

The fuller, signed picture — of which forgetting is only the bad half.

Backward transfer looks backward in time and asks how learning newer tasks affected the older ones. Negative backward transfer means new learning hurts old performance — interference, where an update overwrites existing capabilities. Positive backward transfer is the happy case — learning a new but related domain actually improved an older one. Learning to detect vehicles in fog might also sharpen detection in rain, so knowledge has flowed usefully backward. Near zero means the tasks sit in isolation, with new learning neither helping nor harming the old.
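A signed version of the same comparison is sketched below, using a widely used formulation: for each older domain, accuracy after the final stage minus accuracy right after that domain was first learned, averaged over the older domains. The article does not commit to a formula, so this is an assumption, as are the `acc[i][j]` layout (accuracy on domain `j` after stage `i`) and the made-up numbers.

```python
def backward_transfer(acc):
    """Mean signed change on older domains caused by later training.

    Negative: later updates hurt older domains (interference).
    Positive: later updates improved them.
    Near zero: little interaction either way.
    """
    last = len(acc) - 1
    deltas = [acc[last][j] - acc[j][j] for j in range(last)]
    return sum(deltas) / len(deltas)


# Illustrative, made-up numbers for three stages: urban, night, fog.
acc = [
    [0.90, 0.40, 0.30],  # after training on urban
    [0.80, 0.85, 0.45],  # after training on night
    [0.70, 0.80, 0.72],  # after training on fog
]

bwt = backward_transfer(acc)
print(f"backward transfer: {bwt:+.3f}")
```

Unlike forgetting, this compares against the accuracy right after learning rather than the best ever seen, so it can come out positive when a later domain lifts an earlier one.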

Forward transfer

Is experience compounding?

How much does everything learned so far prepare it for what's next?

Forward transfer looks the other way and asks how much everything the system has already learned prepares it for a task it has not properly trained on yet. The intuition: before you give the model real training data for a brand-new domain, how much better than a from-scratch, knows-nothing model does it already do, purely on the strength of related past experience?

Positive forward transfer means accumulated experience generalises and gives the system a head start — a model that has already seen many environments should need less data and adapt faster to the next one. For fielded systems this is a strong indicator of how quickly and cheaply you will be able to bring a new deployment online.
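The head-start idea can be sketched with the same kind of accuracy matrix. This follows a common formulation: for each domain after the first, accuracy just before training on it minus the accuracy of an untrained baseline, averaged. The baseline values (`baseline_acc`), the `acc[i][j]` layout (accuracy on domain `j` after stage `i`), and the numbers are all assumptions for illustration.

```python
def forward_transfer(acc, baseline_acc):
    """Mean head start on domains the model has not yet trained on.

    acc[j - 1][j]   : accuracy on domain j just before the stage that trains on it
    baseline_acc[j] : accuracy of a from-scratch, untrained model on domain j
    Positive values mean past experience already helps on the next domain.
    """
    gains = [acc[j - 1][j] - baseline_acc[j] for j in range(1, len(acc))]
    return sum(gains) / len(gains)


# Illustrative, made-up numbers for three stages: urban, night, fog.
acc = [
    [0.90, 0.40, 0.30],  # after training on urban
    [0.80, 0.85, 0.45],  # after training on night
    [0.70, 0.80, 0.72],  # after training on fog
]
baseline_acc = [0.10, 0.10, 0.10]  # assumed untrained-model accuracy

fwt = forward_transfer(acc, baseline_acc)
print(f"forward transfer: {fwt:+.3f}")
```

Here the model scores 0.40 on night before ever training on it and 0.45 on fog before its turn, against a 0.10 baseline, so experience is compounding in this toy example.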

Seeing the tension

The cleanest way to see the tradeoff is to watch what happens when you act on one metric. Say your fielded model keeps forgetting night-time pedestrian detection every time it learns a new desert or fog domain, and you decide to fix it.

The standard moves all amount to making the model more rigid: anchor the weights that mattered most for the old tasks so the optimizer is penalised for changing them; freeze the backbone and only let the head move; or replay a buffer of old examples heavily enough that their gradients dominate. Any of these will do exactly what you wanted — forgetting drops and backward transfer climbs toward zero or positive.
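The first of those moves, anchoring important weights, can be sketched as a quadratic penalty added to the training loss. This is a simplified illustration in the spirit of regularisation methods such as elastic weight consolidation, not the method used by any particular system described here; `anchored_loss`, `importance`, and `strength` are hypothetical names, and real implementations operate on framework tensors rather than plain lists.

```python
def anchored_loss(task_loss, params, anchor_params, importance, strength):
    """New-task loss plus a penalty for moving weights that mattered before.

    params        : current weight values
    anchor_params : weight values saved after the old tasks were learned
    importance    : per-weight estimate of how much the old tasks rely on it
    strength      : the rigidity lever; 0 recovers plain fine-tuning
    """
    penalty = sum(
        w * (p - a) ** 2
        for p, a, w in zip(params, anchor_params, importance)
    )
    return task_loss + 0.5 * strength * penalty


anchor = [1.0, 1.0]
importance = [5.0, 0.1]  # first weight matters a lot to old tasks, second barely

# Moving the unimportant weight by 1.0 is cheap...
cheap = anchored_loss(0.3, [1.0, 2.0], anchor, importance, strength=2.0)
# ...moving the important weight by the same amount is heavily penalised.
costly = anchored_loss(0.3, [2.0, 1.0], anchor, importance, strength=2.0)
print(cheap, costly)
```

Raising `strength` protects old capabilities more firmly, at the price of leaving the optimizer less room to fit the new domain.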

The cost shows up elsewhere. A model made rigid in this way has far less capacity left to reshape when a genuinely new theatre arrives, so plasticity falls. And because its representations are now pinned to the structure of past domains, they generalise poorly to the unfamiliar one, so forward transfer suffers too. Push the lever the other way to recover that ground (unfreeze the backbone, raise the learning rate, fully fine-tune on each new domain) and plasticity and forward transfer come straight back. But the very same freedom lets new gradients overwrite the old competencies, and forgetting and backward transfer slide back into the red.

[Interactive figure: Experience Replay. To prevent forgetting, the system maintains a memory buffer of past experiences, interleaving old data with new training. A slider moves between stability and plasticity and reports plasticity, forgetting, backward transfer, and forward transfer at each setting. At the locked-down end, old capabilities are protected and forgetting is near zero, but plasticity and forward transfer have collapsed.]

Fig. 4: Moving parameters between rigid stability and highly adaptive plasticity.

There is no setting that wins on all four at once. You are choosing where on that line to sit — and the only way to choose well is to be able to see all four numbers at the same time.

Viktor Valadi