X-Cache · Cross-Chunk Block Caching

Urban

Dense traffic, pedestrians, storefronts.

Seven urban clips with the highest static texture density in the test split. The cosine gate runs near-saturated; skip rate stays at 71.4% and the residual sits on lane edges and far-field foliage.

7 clips
264 frames each
PSNR 51.4 dB
skip 71.4%
speedup 2.7×
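
As a rough sanity check on how the skip rate relates to the speedup, the sketch below compares the ideal speedup implied by skipping 71.4% of blocks with the reported 2.7×. It assumes every block costs the same and that a reused block costs nothing, neither of which the page states. The function names are hypothetical.

```python
def ideal_speedup(skip_rate: float) -> float:
    """Speedup if skipped blocks were free and all blocks cost the same (assumption)."""
    if not 0.0 <= skip_rate < 1.0:
        raise ValueError("skip_rate must be in [0, 1)")
    return 1.0 / (1.0 - skip_rate)


def implied_overhead(skip_rate: float, observed_speedup: float) -> float:
    """Fraction of baseline time not explained by the recomputed blocks alone."""
    return 1.0 / observed_speedup - (1.0 - skip_rate)


# Urban numbers from the stats above: skip 71.4%, speedup 2.7x.
urban_ideal = ideal_speedup(0.714)
urban_overhead = implied_overhead(0.714, 2.7)
print(f"ideal {urban_ideal:.2f}x, overhead {urban_overhead:.1%} of baseline")
```

Under these assumptions the ideal figure is about 3.5×, so the gap to 2.7× would correspond to roughly 8% of baseline time spent on work that is never skipped (gating, protected blocks, and similar). This is an interpretation, not a measurement from the page.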

The top-right strip of the player is the live DiT block ribbon: gold = anchor, blue = recomputed, green = reused, indigo = KV-update protected.

Highway

Elevated express ring & ordinary motorway.

Three clips. Long depth of field, rapid forward motion, sparse foreground. The gate skips 71.6% of blocks — and decoded 7-cam PSNR climbs to 54.7 dB because most pixels are sky/asphalt that absorb latent perturbations cleanly.

3 clips
PSNR 54.7 dB
skip 71.6%
speedup 2.7×

U-turn

Maximum cross-chunk motion in the split.

Three clips where the ego vehicle executes a sharp heading change. Adjacent chunks are the most different in the dataset — yet the cross-chunk fingerprint still survives, with skip rate 71.3% and no chunk-boundary drift visible in the per-frame PSNR trace.

3 clips
PSNR 52.0 dB
skip 71.3%
speedup 2.7×

Scenario / camera	PSNR ↑ (dB)	SSIM ↑	LPIPS ↓	Skip	DiT	Speed
Urban street · n=7
F-C	53.83	0.9988	3.6e-4	71.4 %	1.392 s	2.7×
F-W	50.27	0.9987	4.3e-4
S-FL	49.49	0.9985	5.1e-4
S-FR	48.69	0.9984	5.2e-4
S-RL	48.59	0.9985	4.8e-4
S-RR	48.07	0.9985	5.2e-4
Rear	51.77	0.9986	4.7e-4
7-cam	51.37	0.9990	3.3e-4
Highway · n=3
F-C	54.87	0.9989	2.6e-4	71.6 %	1.365 s	2.7×
F-W	54.38	0.9988	2.3e-4
S-FL	53.08	0.9987	2.8e-4
S-FR	52.20	0.9987	2.9e-4
S-RL	52.48	0.9987	2.5e-4
S-RR	51.90	0.9986	3.0e-4
Rear	53.42	0.9987	3.2e-4
7-cam	54.67	0.9991	1.9e-4
U-turn · n=3
F-C	54.60	0.9987	4.3e-4	71.3 %	1.364 s	2.7×
F-W	51.79	0.9987	3.6e-4
S-FL	49.29	0.9985	4.6e-4
S-FR	49.18	0.9985	4.7e-4
S-RL	48.87	0.9985	4.0e-4
S-RR	48.82	0.9984	4.9e-4
Rear	52.51	0.9986	4.2e-4
7-cam	52.04	0.9990	3.1e-4
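
To put the PSNR column in pixel terms, the sketch below uses the standard definition PSNR = 10·log10(peak² / MSE) and inverts it to get the root-mean-square error. It assumes 8-bit frames with a peak value of 255, which the page does not specify. The function names are hypothetical.

```python
import math


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equal-length pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def rmse_from_psnr(psnr_db, peak=255.0):
    """Invert the PSNR definition to get RMS error in pixel units."""
    return peak / (10.0 ** (psnr_db / 20.0))


# RMS error implied by the three 7-cam rows, under the 8-bit assumption.
for name, db in [("urban", 51.37), ("highway", 54.67), ("u-turn", 52.04)]:
    print(f"{name}: {rmse_from_psnr(db):.2f} gray levels")
```

With a peak of 255, every 7-cam row corresponds to an RMS error below one gray level.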

X-Cache v1.0

Real-time world simulation breaks every existing cache.

Cache along a different axis.

↦ same chunk · adjacent step

↧ same step · adjacent chunk

Block-level residual cache

Structure & action-aware fingerprint

Dual-metric gate

Per-(t, b) adaptive threshold
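
The four components above can be sketched together as a toy cache keyed by (step, block), so that an entry written by one chunk is looked up by the next chunk at the same denoising step. This is a minimal sketch, not the X-Cache implementation. The page names a cosine gate and says the gate is dual-metric; the second metric here (relative L2), the threshold values, the per-key override table, and all class and function names are assumptions. The fingerprint is passed in as an opaque vector because the page does not say how it is built.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rel_l2(a, b):
    """Relative L2 distance ||a - b|| / ||b|| (assumed second metric)."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    den = math.sqrt(sum(y * y for y in b))
    return num / den if den else math.inf


@dataclass
class ResidualCache:
    cos_min: float = 0.99   # hypothetical default cosine threshold
    l2_max: float = 0.10    # hypothetical relative-L2 threshold
    cos_min_by_key: dict = field(default_factory=dict)  # per-(t, b) overrides
    entries: dict = field(default_factory=dict)  # (t, b) -> (fingerprint, residual)

    def run_block(self, step, block, x, fingerprint, block_fn):
        """Return (output, reused). Reuse adds the cached residual to x."""
        key = (step, block)
        cached = self.entries.get(key)
        if cached is not None:
            old_fp, residual = cached
            cos_ok = cosine(fingerprint, old_fp) >= self.cos_min_by_key.get(key, self.cos_min)
            l2_ok = rel_l2(fingerprint, old_fp) <= self.l2_max
            if cos_ok and l2_ok:  # both metrics must pass
                return [xi + ri for xi, ri in zip(x, residual)], True
        y = block_fn(x)  # full compute, then refresh the cache entry
        self.entries[key] = (list(fingerprint), [yi - xi for yi, xi in zip(y, x)])
        return y, False


cache = ResidualCache()
double = lambda x: [2.0 * v for v in x]  # stand-in for a DiT block
y0, reused0 = cache.run_block(0, 3, [1.0, 2.0], [1.0, 0.0], double)   # first chunk
y1, reused1 = cache.run_block(0, 3, [1.5, 2.5], [1.0, 0.01], double)  # next chunk
```

In this toy run the second call reuses the residual [1.0, 2.0] and returns [2.5, 4.5] where full compute would give [3.0, 5.0]. That difference is the approximation the gate is meant to keep small.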

What happens inside one chunk.

Sample noise & init context

Compute fingerprints

Gate decision

KV update — protected

Four guardrails that keep approximation contained.

KV-update protection

Anchor block (F_n = 1)

Step-0 protection (optional)

Adaptive threshold floor
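
One way to read the four guardrails is as overrides applied before the gate result is honored. The sketch below maps each block to one of the four ribbon colors listed earlier on the page. The priority order, the choice of block 0 as anchor, the floor value, and all names are assumptions. The page does not define F_n or how the threshold adapts.

```python
from enum import Enum


class BlockState(Enum):
    ANCHOR = "gold"
    RECOMPUTED = "blue"
    REUSED = "green"
    KV_PROTECTED = "indigo"


def classify_block(step, block, gate_passed, *, in_kv_update,
                   anchor_blocks=frozenset({0}), protect_step0=True):
    """Apply the guardrails in an assumed priority order, then the gate."""
    if in_kv_update:               # KV-update protection: never reuse here
        return BlockState.KV_PROTECTED
    if block in anchor_blocks:     # anchor blocks are always computed
        return BlockState.ANCHOR
    if protect_step0 and step == 0:  # optional step-0 protection
        return BlockState.RECOMPUTED
    return BlockState.REUSED if gate_passed else BlockState.RECOMPUTED


def floored_threshold(adaptive_value, floor=0.95):
    """Keep an adaptive similarity threshold from dropping below a floor."""
    return max(adaptive_value, floor)
```

For example, `classify_block(3, 5, True, in_kv_update=False)` yields `BlockState.REUSED`, while the same call with `in_kv_update=True` yields `BlockState.KV_PROTECTED` regardless of the gate result.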

Drag the curtain. There is no quality drop.

Same seed, same conditioning, 2.7× faster — and the pixels prove it.