Predictive Agent Interaction Representation for SMART-Style Traffic Simulation
2026-04-01
A checkpoint-first plan for improving SMART-style multi-agent traffic simulation with a training-time predictive interaction regularizer and counterfactual stress diagnosis.
SMART baseline · CAT-K reference · PAIR fine-tuning · Counterfactual stress tests
\mathcal{L}_{\text{SMART-PAIR}} = \mathcal{L}_{\text{task}} + \lambda_{\text{PAIR}}\mathcal{L}_{\text{PAIR}}
Keep the deployed SMART generator unchanged. Add a training-only objective that forces hidden states to predict future multi-agent interaction structure.
Inference footprint: unchanged SMART rollout path. PAIR is a fine-tuning regularizer, not a bigger deployed simulator.
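A minimal sketch of how the combined objective could be gated so that the rollout path never sees PAIR. The `enabled` flag mirrors the `pair.enabled` setting named in the invariants; the `PairConfig` class, the `weight` field, its placeholder value, and `total_loss` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairConfig:
    enabled: bool = False  # disabled by default, so the baseline path is untouched
    weight: float = 0.1    # lambda_PAIR; placeholder value, not taken from the source


def total_loss(task_loss: float, pair_loss: float, cfg: PairConfig, training: bool) -> float:
    """Return L_task + lambda_PAIR * L_PAIR during training, and L_task otherwise."""
    if training and cfg.enabled:
        return task_loss + cfg.weight * pair_loss
    # Evaluation and rollout, or PAIR disabled: the regularizer contributes nothing.
    return task_loss
```

With the default config the function returns the task loss unchanged, which is the behavior the unchanged-inference claim relies on.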
SMART treats driving as next-token prediction over learned motion tokens and map context.
CAT-K improves closed-loop behavior by selecting model-likely tokens closer to logged futures.
Logged-future realism may not fully test whether agents react correctly when another agent changes behavior.
Research claim boundary: we are not claiming CAT-K is weak. We are asking whether logged-future metrics under-test changed-interaction response quality.
\hat{\tau}_{1:N} \sim p_{\theta}(\tau_{1:N}\mid H,M)
Can the simulator sample joint futures that look like the dataset?
\hat{\tau}_{-j}^{cf} \sim p_{\theta}(\tau_{-j}\mid H,M,\operatorname{do}(\tau_j=\tilde{\tau}_j))
If one agent behaves differently, do the nearby agents react plausibly?
\text{closeness to logged future}\;\neq\;\text{correct response under changed interaction}
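The difference between the two sampling regimes can be sketched as one rollout loop with an optional override. This is an illustration under assumptions, not the SMART rollout code: the `policy(history, rng)` interface, the dict-of-tokens step format, and the name `rollout` are all hypothetical. The override approximates the do() operator by replacing the forced agent's token at every step, so the other agents condition on the forced behavior from the next step onward.

```python
import random


def rollout(policy, history, num_steps, forced=None, seed=0):
    """Autoregressive joint rollout with an optional do()-style override.

    policy(history, rng) -> {agent_id: token} for the next step (hypothetical interface).
    forced: {agent_id: [token_0, ..., token_{num_steps-1}]} for intervened agents.
    """
    rng = random.Random(seed)
    forced = forced or {}
    for agent_id, tokens in forced.items():
        if len(tokens) < num_steps:
            raise ValueError(f"forced sequence for {agent_id!r} is shorter than num_steps")
    history = [dict(step) for step in history]  # copy so the caller's history is not mutated
    generated = []
    for t in range(num_steps):
        step = dict(policy(history, rng))
        # Intervention: overwrite the sampled token of each forced agent.
        for agent_id, tokens in forced.items():
            step[agent_id] = tokens[t]
        history.append(step)
        generated.append(step)
    return generated


def toy_policy(history, rng):
    """Toy stand-in for the model: the follower repeats the lead's previous token."""
    last = history[-1]
    return {"lead": 1, "follower": last["lead"]}
```

Calling `rollout(toy_policy, [{"lead": 1, "follower": 1}], 3)` gives the logged-style sample, while passing `forced={"lead": [9, 9, 9]}` gives the counterfactual one; only a policy that actually reads the other agent's history will produce a different follower response.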
\mathcal{L}_{\text{PAIR}} = 2 - 2\cos\left(\hat{z}_{i,r},\operatorname{sg}(z^+_{i,r})\right)
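A small NumPy sketch of this loss, assuming that predicted and target latents arrive as rows of two equally shaped arrays and that the per-row losses are averaged (the reduction is an assumption; the formula above only defines the per-pair term). NumPy has no gradients, so the stop-gradient sg(·) appears only as a comment; in an autodiff framework the target would be detached before this computation.

```python
import numpy as np


def pair_loss(z_hat, z_target, eps=1e-8):
    """Mean over rows of 2 - 2*cos(z_hat, z_target).

    z_target plays the role of sg(z^+): it is treated as a constant input here.
    """
    z_hat = np.asarray(z_hat, dtype=np.float64)
    z_target = np.asarray(z_target, dtype=np.float64)
    # Normalize each row to unit length; eps guards against zero vectors.
    u = z_hat / (np.linalg.norm(z_hat, axis=-1, keepdims=True) + eps)
    v = z_target / (np.linalg.norm(z_target, axis=-1, keepdims=True) + eps)
    cos = np.sum(u * v, axis=-1)
    return float(np.mean(2.0 - 2.0 * cos))
```

For unit vectors, 2 - 2cos(u, v) equals the squared Euclidean distance between u and v, so the loss ranges from 0 (aligned) through 2 (orthogonal) to 4 (opposite) and ignores latent magnitude.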
BC and closed-loop fine-tuned reference checkpoints are available locally.
Small fixed cache is enough for load, rollout, and validation-prefix checks.
Diverse lead-braking pairs across 10 scenarios for first diagnosis.
Implemented behind disabled-by-default config flags.
| Model | Open-loop acc ↑ | Open-loop loss ↓ | Closed-loop ADE ↓ | RMM ↑ | Interaction ↑ | Map ↑ | Scenarios |
|---|---|---|---|---|---|---|---|
| BC | 0.8198 | 2.3412 | 0.8633 | 0.6966 | 0.7701 | 0.7724 | 8 |
| CLSFT | 0.8163 | 2.9705 | 0.7836 | 0.7182 | 0.7889 | 0.8076 | 8 |
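Interpretation: relative to BC, the closed-loop fine-tuned (CLSFT) checkpoint improves closed-loop ADE, RMM, interaction, and map metrics on this small fixed 8-scenario prefix, while its open-loop accuracy is slightly lower and its open-loop loss is higher. This makes it a credible reference before testing PAIR.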
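The size of each difference can be read off the table with a short relative-change computation. The values below are copied from the two table rows; the dictionary keys and the helper name are illustrative only. With 8 scenarios these ratios describe the prefix, not a statistically supported effect.

```python
# Values copied from the BC and CLSFT rows of the table above.
bc = {"open_acc": 0.8198, "open_loss": 2.3412, "closed_ade": 0.8633,
      "rmm": 0.6966, "interaction": 0.7701, "map": 0.7724}
clsft = {"open_acc": 0.8163, "open_loss": 2.9705, "closed_ade": 0.7836,
         "rmm": 0.7182, "interaction": 0.7889, "map": 0.8076}


def relative_change(base, new):
    """Return (new - base) / base for every metric in base."""
    return {name: (new[name] - base[name]) / base[name] for name in base}


changes = relative_change(bc, clsft)
for name, value in changes.items():
    print(f"{name}: {value:+.1%}")
```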
A lead vehicle is forced to follow a plausible braking token sequence. The model must generate the follower response.
\tilde{\tau}_{lead} = \operatorname{Brake}(\tau_{lead}^{*}), \qquad \hat{\tau}_{follower}^{cf} \sim p_{\theta}(\cdot \mid \operatorname{do}(\tilde{\tau}_{lead}))
Example pair: scenario 17a010edfe6d47b3, follower 15, lead 22.
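The shape of this test can be sketched in one dimension. This is a deliberate simplification: the harness described above forces a braking token sequence, whereas the sketch applies a constant deceleration to a speed profile and measures the gap along a single lane. The function names, the deceleration value, and the 1-D geometry are assumptions for illustration.

```python
def brake(speeds, start, decel, dt):
    """Copy a speed profile (m/s) and apply constant deceleration from index `start`."""
    out = list(speeds)
    for t in range(start, len(out)):
        prev = out[t - 1] if t > 0 else speeds[0]
        out[t] = max(0.0, prev - decel * dt)  # speed never goes negative
    return out


def min_gap(lead_speeds, follower_speeds, initial_gap, dt):
    """Smallest bumper-to-bumper gap (m) over the rollout on a 1-D lane."""
    gap = initial_gap
    smallest = gap
    for v_lead, v_follow in zip(lead_speeds, follower_speeds):
        gap += (v_lead - v_follow) * dt
        smallest = min(smallest, gap)
    return smallest


def collides(lead_speeds, follower_speeds, initial_gap, dt):
    """A collision is counted when the gap closes to zero or below."""
    return min_gap(lead_speeds, follower_speeds, initial_gap, dt) <= 0.0


logged_lead = [10.0] * 10
forced_lead = brake(logged_lead, start=2, decel=4.0, dt=0.5)
non_reactive_follower = [10.0] * 10  # ignores the lead and keeps its logged speed
reactive_follower = forced_lead      # idealized follower that matches the lead
```

A follower that replays its logged speed collides under the forced braking, while one that tracks the lead keeps the initial gap; the stress test asks which of these the model's sampled follower resembles.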
On the 20-pair diagnostic subset, the fine-tuned checkpoint shows more forced lead-braking collisions than the BC checkpoint despite similar normal-condition gaps.
This suggests the stress harness may reveal behavior that the aggregate realism metrics on the small fixed prefix do not capture.
The subset is small. The intervention family is narrow. No videos were generated in this pass. The result must be replicated with stronger controls before being treated as evidence.
Decision gate: do not claim PAIR helps until matched PAIR, shuffled PAIR, BC, and CLSFT are compared on the same frozen stress suite.
\theta_{PAIR} = \operatorname{FT}\left( \theta_{BC}, \mathcal{L}_{task}+\lambda\mathcal{L}_{PAIR} \right)
Target latents come from the true future token structure.
\theta_{shuffle} = \operatorname{FT}\left( \theta_{BC}, \mathcal{L}_{task}+\lambda\mathcal{L}_{PAIR}^{shuffle} \right)
Same capacity and training path, but future targets are mismatched.
If matched PAIR improves stress behavior while shuffled PAIR does not, the evidence points toward future-interaction structure rather than extra parameters or extra optimization steps.
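One way to build the mismatched targets is sketched below. The source does not say how the shuffle is done, so the cyclic shift within a batch is an assumption, as are the function name and the choice to shuffle along the first axis. A shift by a nonzero offset guarantees that no row keeps its own target, which a plain random permutation does not.

```python
import numpy as np


def shuffle_targets(z_plus, seed=0):
    """Mismatch target latents by cyclically shifting rows by a random nonzero offset.

    Returns the shuffled array and the index permutation that was applied.
    """
    z_plus = np.asarray(z_plus)
    n = z_plus.shape[0]
    if n < 2:
        raise ValueError("need at least two rows to mismatch targets")
    rng = np.random.default_rng(seed)
    shift = int(rng.integers(1, n))    # upper bound is exclusive, so shift is in [1, n-1]
    perm = (np.arange(n) + shift) % n  # no index maps to itself
    return z_plus[perm], perm
```

Because the shuffled targets keep the same marginal distribution and the same loss, head, and schedule, any gap between matched and shuffled PAIR is attributable to the pairing between a state and its own future.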
| Variant | Initialization | Extra objective | Purpose |
|---|---|---|---|
| BC | author BC checkpoint | none | behavior-cloning baseline |
| CLSFT / CAT-K reference | author fine-tuned checkpoint | closed-loop fine-tuning | strong reference |
| SMART-PAIR | author BC checkpoint | matched PAIR | proposed method |
| Shuffled PAIR | author BC checkpoint | shuffled PAIR | negative control |
Invariant 1: with pair.enabled=false, author checkpoints must strict-load with zero missing or unexpected keys.
Invariant 2: PAIR must be training-only unless an explicit inference-time reranking experiment is introduced.
Invariant 3: no restricted checkpoints, processed WOMD files, or private raw logs enter public slides or GitHub history.
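Invariant 1 can be checked before any weights are loaded by comparing parameter names. The sketch below is framework-agnostic and works on plain key lists; the helper names and the example key `pair_head.weight` are hypothetical. If the codebase is PyTorch-based, `load_state_dict(..., strict=True)` enforces the same condition at load time.

```python
def key_diff(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names, each as a sorted list.

    missing:    expected by the model but absent from the checkpoint.
    unexpected: present in the checkpoint but unknown to the model.
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


def assert_strict_loadable(model_keys, checkpoint_keys):
    """Raise if the checkpoint would not strict-load into the model."""
    missing, unexpected = key_diff(model_keys, checkpoint_keys)
    if missing or unexpected:
        raise RuntimeError(f"missing={missing}, unexpected={unexpected}")
```

The practical consequence is that a disabled PAIR head must not register parameters at all: if the model built with the flag off still exposes a key such as `pair_head.weight`, the author checkpoint reports it as missing and the invariant fails.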
SMART-PAIR is a conservative modification: keep SMART’s deployed autoregressive generator, but regularize the training representation toward future interaction structure.
A simulator should be judged not only by logged-future realism, but also by whether it reacts plausibly when another agent’s future changes.
\text{standard realism} + \text{counterfactual reaction quality} + \text{representation evidence}