06 — Implementation Roadmap, Risks & Testing
Part of the Lattice networking suite design set. See also: README · 00 Overview · 01 High-Level Design · 02 Netcode LLD · 03 Auth Service · 04 Social Library · 05 Engine Integration
1. Purpose & Scope
This document lays out the order in which Lattice gets built and how we de-risk it. It turns the architecture described in 01-high-level-design.md and 02-netcode-lld.md into a phased delivery plan with concrete deliverables and exit criteria, a module dependency graph, a risk register, a testing/QA strategy, the supporting tooling, a directional team shape, and the success targets we measure against.
All parameters (60 Hz tick, 20–30 Hz snapshots, ~100 ms interp buffer, ~1200 B MTU, four reliability channels, X25519 + ChaCha20-Poly1305, Ed25519 tokens, STUN + hole punch + relay) are inherited unchanged from the shared design brief and the Overview. This doc does not re-specify them; it schedules their implementation.
No calendar dates. All sequencing here is relative. Durations are abstract units ("d" buckets in the Gantt) expressing rough proportional effort and ordering, not committed dates. Treat them as a dependency-aware ordering, not a contract.
2. Guiding Strategy
Five principles drive the ordering of everything below.
- Data-plane core first, headless, before any engine binding. The hardest, highest-risk code (transport, reliability, prediction/rollback, replication) is built and proven inside lattice-core against a tiny headless harness: a console process that links the core directly with no engine, no renderer, no GC. We do not write a single line of Unity/Unreal/Godot glue until the core can run an authoritative session end-to-end against itself. This keeps the inner loop fast (compile + run in seconds), keeps determinism testable, and prevents engine quirks from masquerading as netcode bugs.
- Walking skeleton early. As soon as Phase 1 lands, we wire a thin but end-to-end path (socket → handshake → connection → a single replicated counter) through the headless harness. It does almost nothing, but it exercises every layer's seams. Every later phase thickens this skeleton rather than building a new pillar in isolation.
- Vertical slices over horizontal layers. We prefer "one feature working all the way through" to "one layer fully complete." The MVP (§7) is a vertical slice: a dedicated-server authoritative shooter with prediction + rollback in Unity. Reaching it touches transport, replication, prediction, the C ABI, and one binding, proving the whole stack thin before we make any one part wide.
- In-process loopback transport from day one. A swappable loopback/MemoryTransport that delivers packets in-process (with a programmable network-condition shim) lets us test reliability, replication, and rollback deterministically and fast, without sockets. The real UDP transport and the loopback transport implement the same internal interface, so most tests run against both.
- Bake in observability and replay before scale. Record/replay, the packet inspector, and the core metrics (RTT, bandwidth, rollback frequency, tick budget, CCU) are tooling we build alongside the features they observe, not after. You cannot debug rollback or interest management at scale without them.
flowchart LR
A["Headless harness<br/>+ loopback transport"] --> B["Walking skeleton<br/>(thin end-to-end path)"]
B --> C["Vertical slice / MVP<br/>(authoritative slice + prediction)"]
C --> D["Widen: topologies,<br/>shared authority, services"]
D --> E["Scale & harden:<br/>interest mgmt, load, security"]
E --> F["Stretch: web / WASM /<br/>WebTransport"]
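The "same internal interface" idea behind the loopback transport can be sketched in a few lines. This is an illustration in Python rather than the C++ core, and every name here (Transport, MemoryTransport.pair, replicate_counter) is hypothetical, chosen only to show the shape: code written against the interface cannot tell whether it is talking to a socket or to an in-process queue.

```python
from collections import deque
from typing import Deque, Optional, Protocol, Tuple


class Transport(Protocol):
    """Hypothetical minimal interface shared by the UDP and loopback transports."""

    def send(self, payload: bytes) -> None: ...
    def recv(self) -> Optional[bytes]: ...


class MemoryTransport:
    """In-process endpoint: send() appends to the peer's inbox, no sockets involved."""

    def __init__(self) -> None:
        self._inbox: Deque[bytes] = deque()
        self._peer: Optional["MemoryTransport"] = None

    @staticmethod
    def pair() -> Tuple["MemoryTransport", "MemoryTransport"]:
        """Create two endpoints wired to each other."""
        a, b = MemoryTransport(), MemoryTransport()
        a._peer, b._peer = b, a
        return a, b

    def send(self, payload: bytes) -> None:
        if self._peer is None:
            raise RuntimeError("transport is not connected")
        self._peer._inbox.append(bytes(payload))  # copy, like a real datagram

    def recv(self) -> Optional[bytes]:
        return self._inbox.popleft() if self._inbox else None


def replicate_counter(server: Transport, client: Transport, ticks: int) -> int:
    """Walking-skeleton stand-in: server sends a counter each tick, client applies it."""
    seen = -1
    for counter in range(ticks):
        server.send(counter.to_bytes(4, "little"))
        packet = client.recv()
        if packet is not None:
            seen = int.from_bytes(packet, "little")
    return seen
```

Because replicate_counter only depends on the Transport protocol, the same function could be driven by a UDP-backed implementation without changes, which is the property that lets most tests run against both transports.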
3. Module Dependency Graph
What must exist before what. Edges mean "depends on / must precede." The spine runs top-to-bottom through lattice-core; the control plane and data-plane binaries hang off it; bindings come last.
graph TD
subgraph Core["lattice-core (C++20)"]
SOCK["UDP socket + I/O loop"]
LOOP["loopback / MemoryTransport"]
REL["reliability channels<br/>(4 channels, frag/reassembly)"]
CRYPTO["crypto handshake<br/>(X25519 + ChaCha20-Poly1305)"]
CONN["connection lifecycle<br/>(handshake, keepalive, disconnect)"]
SER["serialization<br/>(bit-pack, quantize, delta)"]
REP["replication<br/>(NetworkObject, snapshots)"]
PRED["prediction / rollback /<br/>reconciliation + lag comp"]
AOI["interest management (AoI)"]
OWN["ownership + authority transfer<br/>(shared/distributed mode)"]
NAT["NAT (STUN, hole punch,<br/>relay client)"]
ABI(["extern \"C\" C ABI"])
end
SOCK --> REL
LOOP --> REL
CRYPTO --> CONN
REL --> CONN
CONN --> SER
SER --> REP
REP --> PRED
REP --> AOI
PRED --> OWN
REP --> OWN
CONN --> NAT
REP --> ABI
PRED --> ABI
AOI --> ABI
OWN --> ABI
ABI --> BU["lattice-unity"]
ABI --> BUE["lattice-unreal"]
ABI --> BG["lattice-godot"]
ABI --> BW["lattice-web (WASM)"]
NAT --> RELAY["lattice-relay"]
RELAY --> P2P["P2P host + relay fallback<br/>+ host migration"]
OWN --> P2P
subgraph Control["Control plane (.NET 8)"]
AUTH["lattice-auth"]
DIR["lattice-director<br/>(matchmaking + fleet)"]
SOC["lattice-social"]
end
AUTH --> DIR
DIR --> GS["lattice-gameserver fleet"]
REP --> GS
AUTH --> GS
DIR --> MM["matchmaking flow<br/>(connect to session)"]
BU --> MM
P2P --> MM
BU --> DEMO["reference demo<br/>(shooter slice)"]
GS --> DEMO
Key precedence rules encoded above:
- Transport before everything: socket + loopback → reliability → connection. Nothing replicates until a connection exists.
- Serialization before replication; replication before prediction. You cannot predict/reconcile state you cannot snapshot and delta.
- Replication before both interest management and ownership. AoI filters replicated objects; ownership annotates them.
- Prediction (authoritative mode) before shared/distributed authority. Server-authoritative is the proving ground; distributed authority (ownership transfer) reuses its machinery.
- Relay before P2P fallback; NAT before relay client. Direct hole punching is attempted first, relay is the guaranteed fallback, and host migration depends on ownership transfer.
- lattice-auth before lattice-director; both before the matchmaking flow. Tokens are issued by auth, consumed by director and game server, before any client can be placed.
- The C ABI gates all bindings. No binding work begins until the ABI surface is stable enough to wrap.
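The precedence rules can be checked mechanically. The sketch below transcribes the edges of the graph above into a dictionary (node names are the graph's own short IDs) and uses Python's standard graphlib module to derive one valid build order; the helper names build_order and violations are illustrative only.

```python
from graphlib import TopologicalSorter

# Each key depends on the modules in its set (edges transcribed from the graph above).
DEPENDS_ON = {
    "REL": {"SOCK", "LOOP"},
    "CONN": {"CRYPTO", "REL"},
    "SER": {"CONN"},
    "REP": {"SER"},
    "PRED": {"REP"},
    "AOI": {"REP"},
    "OWN": {"PRED", "REP"},
    "NAT": {"CONN"},
    "ABI": {"REP", "PRED", "AOI", "OWN"},
    "BU": {"ABI"},
    "BUE": {"ABI"},
    "BG": {"ABI"},
    "BW": {"ABI"},
    "RELAY": {"NAT"},
    "P2P": {"RELAY", "OWN"},
    "DIR": {"AUTH"},
    "GS": {"DIR", "REP", "AUTH"},
    "MM": {"DIR", "BU", "P2P"},
    "DEMO": {"BU", "GS"},
}


def build_order(depends_on):
    """Return one ordering in which every module follows all of its dependencies.

    TopologicalSorter raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


def violations(order, depends_on):
    """List (dependency, module) pairs where the dependency is scheduled too late."""
    position = {name: i for i, name in enumerate(order)}
    return [
        (dep, mod)
        for mod, deps in depends_on.items()
        for dep in deps
        if position[dep] > position[mod]
    ]
```

A check like this could run in CI against any proposed phase ordering, so a reshuffled plan that schedules a binding before the ABI fails fast.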
4. Phased Plan
Each phase lists deliverables, exit criteria (the gate that says "done, move on"), and which design docs it implements. Phases are ordered by dependency; some overlap is expected and shown in the Gantt (§5).
Phase 0 — Foundations
Implements: repo/build scaffolding underpinning all docs; ABI contract from 05.
Deliverables
- Monorepo layout per the README (lattice-core/, bindings/, data-plane/, control-plane/, game-sim/, docs/).
- Native build (CMake) for lattice-core on Windows/Linux/macOS; .NET 8 build for control-plane projects; one CI pipeline that builds + tests both toolchains on every push.
- C ABI scaffold: extern "C" header, opaque-handle pattern, error-code convention, and a generated-binding smoke test (one trivial call from C#).
- Headless test harness: a console host that links lattice-core directly and can spin up N in-process peers.
- Network-condition simulator primitive: a transport shim that injects latency, jitter, loss, reorder, and duplication, usable by both loopback and UDP paths.
- Unit-test framework, formatter/linter, sanitizer builds (ASan/UBSan/TSan) wired into CI.
Exit criteria
- git clone → one command builds core + services + runs the (empty) test suite green in CI on all three OSes.
- A C# program can call into the C ABI and get a value back across P/Invoke.
- The condition simulator can drop/delay/reorder packets in a unit test with reproducible seeds.
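A minimal sketch of what "reproducible seeds" means for the condition simulator, assuming a shim that sits between sender and receiver and is driven by a caller-supplied clock. The class and field names (Conditions, ConditionShim, run_trace) and the specific probabilities are hypothetical. Reordering is not a separate knob here: it falls out of jitter, because a later packet can draw a shorter delay than an earlier one.

```python
import heapq
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Conditions:
    latency_ms: float = 50.0
    jitter_ms: float = 10.0
    loss: float = 0.05        # probability a packet is dropped
    duplicate: float = 0.01   # probability a packet is delivered twice


class ConditionShim:
    """Delays, drops, and duplicates packets using a private seeded RNG."""

    def __init__(self, conditions: Conditions, seed: int) -> None:
        self.c = conditions
        self.rng = random.Random(seed)   # never the global RNG, so runs are repeatable
        self._queue: List[Tuple[float, int, bytes]] = []
        self._order = 0                  # tie-breaker so payloads are never compared

    def send(self, now_ms: float, payload: bytes) -> None:
        if self.rng.random() < self.c.loss:
            return
        copies = 2 if self.rng.random() < self.c.duplicate else 1
        for _ in range(copies):
            jitter = self.rng.uniform(-self.c.jitter_ms, self.c.jitter_ms)
            due = now_ms + max(self.c.latency_ms + jitter, 0.0)
            heapq.heappush(self._queue, (due, self._order, payload))
            self._order += 1

    def deliver(self, now_ms: float) -> List[bytes]:
        """Pop every packet whose delivery time has been reached, earliest first."""
        out = []
        while self._queue and self._queue[0][0] <= now_ms:
            out.append(heapq.heappop(self._queue)[2])
        return out


def run_trace(seed: int, packets: int = 200) -> List[bytes]:
    """Send one numbered packet per 60 Hz tick through a lossy, jittery link."""
    shim = ConditionShim(Conditions(loss=0.3, jitter_ms=40.0), seed)
    received: List[bytes] = []
    for i in range(packets):
        now = i * 1000 / 60
        shim.send(now, i.to_bytes(2, "big"))
        received += shim.deliver(now)
    received += shim.deliver(float("inf"))   # flush whatever is still in flight
    return received
```

Two calls to run_trace with the same seed return the same sequence of packets, which is what lets a failing test be replayed exactly.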
Phase 1 — Transport Core
Implements: 02 Netcode LLD §§ transport, reliability, fragmentation, crypto.
Deliverables
- Non-blocking UDP socket + I/O loop; loopback/MemoryTransport sharing the same interface.
- Four reliability channels (Unreliable / Unreliable-Sequenced / Reliable-Unordered / Reliable-Ordered) with sequence numbers + ACK bitfields.
- Fragmentation/reassembly keeping packets ≤ ~1200 B MTU.
- Crypto handshake (X25519 key agreement, ChaCha20-Poly1305 AEAD, replay protection).
- Connection lifecycle: connect, keepalive/heartbeat, timeout, graceful + ungraceful disconnect.
- Basic congestion control / pacing tuned for tick-rate traffic.
- Walking skeleton: handshake → connection → one replicated integer ticking across the harness.
Exit criteria
- Reliable-Ordered delivers a stream intact under 30% loss + 150 ms RTT + reorder in the simulator.
- Encrypted session established and a tampered/replayed packet is rejected.
- Fuzzing the packet parser yields no crashes (ASan clean).
- Walking skeleton runs two in-process peers exchanging state continuously.
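To make "sequence numbers + ACK bitfields" concrete, here is one common way such a scheme is built. The widths are assumptions for illustration (16-bit sequence numbers, a 32-packet bitfield); the actual wire format belongs to 02 Netcode LLD. Each outgoing header carries the newest sequence received plus a bitfield covering the 32 sequences before it, so a single lost ACK rarely hides a delivery.

```python
SEQ_BITS = 16                 # assumed width, not taken from the LLD
SEQ_MOD = 1 << SEQ_BITS
HALF = SEQ_MOD // 2
ACK_WINDOW = 32               # assumed bitfield width


def seq_newer(a: int, b: int) -> bool:
    """True if sequence a is more recent than b, allowing for wrap-around."""
    return a != b and ((a - b) % SEQ_MOD) < HALF


class AckTracker:
    """Receiver side: newest sequence seen plus a bitfield of the 32 before it."""

    def __init__(self) -> None:
        self.latest = None
        self.bits = 0   # bit i set means sequence (latest - 1 - i) was received

    def on_receive(self, seq: int) -> None:
        if self.latest is None:
            self.latest = seq
            return
        if seq_newer(seq, self.latest):
            shift = (seq - self.latest) % SEQ_MOD
            # The previous 'latest' slides into the bitfield at position shift - 1.
            mask = (1 << ACK_WINDOW) - 1
            self.bits = ((self.bits << shift) | (1 << (shift - 1))) & mask
            self.latest = seq
        else:
            behind = (self.latest - seq) % SEQ_MOD
            if 1 <= behind <= ACK_WINDOW:
                self.bits |= 1 << (behind - 1)   # late arrival inside the window

    def header(self):
        return self.latest, self.bits


def acked_sequences(latest: int, bits: int):
    """Sender side: expand an (ack, bitfield) header into acknowledged sequences."""
    acked = {latest}
    for i in range(ACK_WINDOW):
        if bits & (1 << i):
            acked.add((latest - 1 - i) % SEQ_MOD)
    return acked
```

For example, after receiving 65534, 65535, and 1 (with 0 lost), the header acknowledges exactly those three; when 0 arrives late, the next header includes it as well. Sequence-number wrap is one of the bugs the soak runs in §8 watch for, so the wrap-aware comparison is worth unit-testing on its own.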
Phase 2 — Replication MVP
Implements: 02 Netcode LLD §§ serialization, replication, snapshots, interpolation.
Deliverables
- Serialization: bit-packing, quantization (bounded positions, compressed quaternions), delta vs. last-acked baseline.
- NetworkObject / NetworkBehaviour model and [Networked]-style replicated property registration (via the ABI).
- Snapshot generation at 20–30 Hz, delta-compressed; per-client baseline/ack tracking.
- One authoritative server + dumb clients: clients render interpolated remote state (~100 ms buffer); no prediction yet.
- Spawn/despawn replication and initial-state (full snapshot) delivery.
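The serialization deliverable combines two ideas that are easy to show in isolation: quantizing a bounded float into a fixed number of bits, and sending only the fields that changed relative to the last acknowledged baseline. The sketch below is a simplified illustration with hypothetical names and a dictionary standing in for a replicated object; it ignores bit-packing and field removal.

```python
def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a bounded float onto an unsigned integer of the given width."""
    clamped = min(max(value, lo), hi)
    steps = (1 << bits) - 1
    return round((clamped - lo) / (hi - lo) * steps)


def dequantize(q: int, lo: float, hi: float, bits: int) -> float:
    """Inverse of quantize; the round trip loses at most half a quantization step."""
    steps = (1 << bits) - 1
    return lo + q / steps * (hi - lo)


def delta_encode(baseline: dict, current: dict) -> dict:
    """Keep only the fields whose value differs from the acked baseline."""
    return {k: v for k, v in current.items() if baseline.get(k) != v}


def delta_apply(baseline: dict, delta: dict) -> dict:
    """Rebuild the current state from the baseline plus the changed fields."""
    return {**baseline, **delta}
```

With a range of -512 to 512 and 16 bits, one step is about 0.0156 units, so the worst-case round-trip error is about 0.0078 units. That bound is the kind of "quantization precision bound" the unit tests in §8.1 would assert.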
Exit criteria
- A server can replicate 100+ moving NetworkObjects to multiple clients with smooth interpolation under jitter/loss.
- Delta compression measurably reduces bandwidth vs. full snapshots (recorded in metrics).
- Late-joining client receives correct full state then resumes deltas.
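The "dumb client" rendering path can be sketched as a small interpolation buffer: the client renders remote state as of roughly 100 ms in the past and blends linearly between the two snapshots that bracket that time. This is a one-dimensional illustration with hypothetical names; a real buffer would hold full object states and discard old samples.

```python
from bisect import bisect_right
from typing import List, Optional

INTERP_DELAY_MS = 100.0   # the ~100 ms buffer from the shared parameters


class InterpolationBuffer:
    def __init__(self) -> None:
        self._times: List[float] = []
        self._values: List[float] = []

    def push(self, server_time_ms: float, value: float) -> None:
        """Insert a snapshot sample, keeping samples ordered by server time."""
        i = bisect_right(self._times, server_time_ms)
        self._times.insert(i, server_time_ms)
        self._values.insert(i, value)

    def sample(self, now_ms: float) -> Optional[float]:
        """Value at render time (now minus the delay), linearly interpolated."""
        if not self._times:
            return None
        t = now_ms - INTERP_DELAY_MS
        i = bisect_right(self._times, t)
        if i == 0:
            return self._values[0]    # render time precedes the oldest sample
        if i == len(self._times):
            return self._values[-1]   # starved: hold the newest sample
        t0, t1 = self._times[i - 1], self._times[i]
        alpha = (t - t0) / (t1 - t0)
        return self._values[i - 1] + alpha * (self._values[i] - self._values[i - 1])
```

Sorting on insert means a snapshot that arrives out of order under jitter still lands in the right place, and holding the newest sample when starved is the point where Phase 3's extrapolation would later take over.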
Phase 3 — Prediction, Rollback & Lag Compensation
Implements: 02 Netcode LLD §§ prediction/rollback/reconciliation, lag compensation; completes server/host-authoritative mode.
Deliverables
- Client input/command buffer keyed by tick; client-side prediction of the local player.
- Server reconciliation: rollback to confirmed tick + re-simulate buffered inputs on misprediction.
- Extrapolation / dead reckoning ("estimated physics for destination") for remote entities when fresh data is missing.
- Server-side lag compensation (rewind world to shooter's view for hit validation).
- Tick-budget instrumentation: per-tick rollback cost, rollback frequency, re-sim depth.
Exit criteria
- Local player feels responsive (no input lag) at 150 ms RTT; corrections are visually smooth.
- A hitscan shot registers correctly against a moving target under lag compensation in a replay test.
- Rollback re-simulation stays within the per-tick CPU budget (§9) for the target object count.
- Server/host-authoritative mode is feature-complete and matches Photon-Fusion-class behaviour in the harness.
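The prediction and reconciliation loop described in the deliverables can be reduced to a toy example. The sketch assumes a deterministic step function shared by client and server, and a server message meaning "this is the state after processing the input for tick N". All names are hypothetical, and the one-dimensional integer state stands in for a real simulation.

```python
from typing import Dict


def step(position: int, move: int) -> int:
    """Deterministic toy simulation step shared by client and server."""
    return position + move


class PredictingClient:
    def __init__(self) -> None:
        self.position = 0
        self.tick = 0
        self.inputs: Dict[int, int] = {}    # tick -> input not yet confirmed
        self.history: Dict[int, int] = {}   # tick -> predicted position after that tick
        self.last_confirmed = -1
        self.rollbacks = 0                  # instrumentation: rollback frequency
        self.resim_depth = 0                # instrumentation: inputs replayed last time

    def apply_local_input(self, move: int) -> None:
        """Predict immediately and remember the input for possible re-simulation."""
        self.inputs[self.tick] = move
        self.position = step(self.position, move)
        self.history[self.tick] = self.position
        self.tick += 1

    def on_server_state(self, confirmed_tick: int, confirmed_position: int) -> bool:
        """Reconcile with the server; returns True if a rollback was needed."""
        if confirmed_tick <= self.last_confirmed:
            return False                    # stale or duplicate message
        self.last_confirmed = confirmed_tick
        predicted = self.history.get(confirmed_tick)
        for t in [t for t in self.inputs if t <= confirmed_tick]:
            del self.inputs[t]
            self.history.pop(t, None)
        if predicted == confirmed_position:
            return False                    # prediction held, nothing to correct
        # Misprediction: rewind to the authoritative state, replay unconfirmed inputs.
        position = confirmed_position
        for t in sorted(self.inputs):
            position = step(position, self.inputs[t])
            self.history[t] = position
        self.position = position
        self.rollbacks += 1
        self.resim_depth = len(self.inputs)
        return True
```

The two counters correspond to the tick-budget instrumentation deliverable: rollback frequency and re-sim depth are the numbers that tell us whether re-simulation fits inside a tick.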
Phase 4 — Topologies (Dedicated + P2P + Relay + Migration)
Implements: 01 High-Level Design §§ topologies; 02 § NAT/relay.
Deliverables
- game-sim shared library compiled into both lattice-gameserver and the client (P2P host) — byte-for-byte identical sim.
- Listen-server / P2P host topology selectable at runtime.
- NAT traversal: STUN-style reflexive discovery + UDP hole punching.
- lattice-relay (TURN-like) as guaranteed fallback when hole punching fails.
- Host migration: detect host loss, transfer session + authority, resume play.
Exit criteria
- The same sim binary runs as dedicated server and as P2P host with identical results in a replay-equivalence test.
- Two peers behind simulated NATs connect directly when possible and automatically fall back to relay when not.
- Host migration completes without a full session teardown; clients continue within an acceptable stall window.
Phase 5 — Shared / Distributed Authority
Implements: 00 Overview § decision 3.2; 02 § ownership/authority.
Deliverables
- Per-object ownership + authority mode on NetworkObject (mixed modes within a session).
- Authority transfer protocol (request/grant/handover) with conflict resolution.
- Eventual-consistency reconciliation for distributed-authority objects.
- Ownership-scoped RPC routing.
Exit criteria
- Two peers can pick up / hand off ownership of the same object without state divergence or duplication.
- A session mixes server-authoritative and shared-authority objects correctly side by side.
- Authority transfer survives the network-condition simulator (loss/reorder during handover).
Phase 6 — Control Plane
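One way to get conflict-free handover is to attach an epoch (a fencing token) to each object and bump it on every transfer. This is an assumed technique for illustration, not the protocol specified in 02; the names OwnedObject and AuthorityArbiter are hypothetical. A request is granted only if the requester observed the current epoch, so of two concurrent requests exactly one wins, and writes from a previous owner are rejected because they carry an old epoch.

```python
from dataclasses import dataclass


@dataclass
class OwnedObject:
    owner: str
    epoch: int = 0   # bumps on every handover; acts as a fencing token
    state: int = 0


class AuthorityArbiter:
    """Hypothetical arbiter (for example the server or session host)."""

    def __init__(self, obj: OwnedObject) -> None:
        self.obj = obj

    def request(self, peer: str, seen_epoch: int) -> bool:
        """Grant only if the requester saw the current epoch and is not the owner."""
        if seen_epoch != self.obj.epoch or peer == self.obj.owner:
            return False
        self.obj.owner = peer
        self.obj.epoch += 1
        return True

    def write(self, peer: str, epoch: int, state: int) -> bool:
        """Accept state only from the current owner at the current epoch."""
        if peer != self.obj.owner or epoch != self.obj.epoch:
            return False
        self.obj.state = state
        return True
```

If peers B and C both request an object owned by A at epoch 0, B's request (processed first) is granted and moves the epoch to 1, while C's is refused and C must retry against the new epoch. A delayed write from A tagged with epoch 0 is dropped, which covers the "loss/reorder during handover" exit criterion in miniature.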
Implements: 03 Auth Service; 01 § director/fleet.
Deliverables
- lattice-auth: identity + Ed25519-signed session tokens; dual-node deployment (HA, no single point of failure).
- lattice-director: matchmaking, session directory, fleet orchestration; allocates game servers and relay capacity.
- Token validation path: game server validates tokens against auth before admitting players.
- PostgreSQL + Redis wiring (accounts, sessions, presence).
Exit criteria
- Client logs in → receives a signed token → is matched by director → placed onto a game server it can join, fully end-to-end.
- Auth survives one node failing (the surviving node serves tokens).
- Director allocates a relay when a P2P session needs fallback.
Phase 7 — First Engine Binding (Unity → Unreal → Godot)
Implements: 05 Engine Integration.
Deliverables
- lattice-unity: native plugin + idiomatic C# API (P/Invoke) over the C ABI; MonoBehaviour-style NetworkObject/[Networked] mapping.
- End-to-end vertical slice / MVP (§7) shipped in Unity: dedicated-server authoritative shooter with prediction + rollback, using auth + director.
- Then lattice-unreal (UE module) and lattice-godot (GDExtension), each reaching the same vertical slice.
- Cross-engine conformance harness (§8) exercising each binding against the same scenarios.
Exit criteria
- The Unity build plays the authoritative shooter slice with prediction, against a real dedicated server, placed via director, authenticated via auth.
- Unreal and Godot reach the same slice; the conformance suite passes identically across all three bindings.
- No binding requires changes to the C ABI surface beyond additive ones (ABI stays stable).
Phase 8 — lattice-social
Implements: 04 Social Library.
Deliverables
- lattice-social: friends, presence, parties, invites over WSS; social graph in PostgreSQL; Redis/NATS presence + messaging.
- Standalone usage (service-only) and integrated usage (surfaced through the bindings, e.g. party → session join).
- Presence updates tied to session lifecycle from the director.
Exit criteria
- Two players add each other, see presence, form a party, and party-join into the same session via director.
- Social runs standalone (independent of a game session) and integrated, with the same API.
Phase 9 — Hardening, Scale & Security
Implements: cross-cutting non-functional requirements across all docs.
Deliverables
- Interest management tuned for scale (grid/AoI per-client filtering) validated at target CCU.
- Load testing to target CCU per game server and per relay (§9).
- Soak / long-run stability (memory leaks, drift, reconnection storms).
- Security audit: handshake, token handling, relay abuse, P2P/shared-authority cheat surface; fuzzing all wire parsers.
- Full observability dashboards (RTT, bandwidth/player, rollback frequency, CCU, tick budget, relay throughput).
Exit criteria
- Target CCU per game server sustained within the tick budget; per-relay throughput target met.
- 24h+ soak run with no leaks, no unbounded drift, clean reconnection handling.
- Security review actioned; no high-severity findings open; parsers fuzz-clean.
Phase 10 — (Stretch) Web / WASM / WebTransport
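Grid-based interest management, as named in the deliverables, can be illustrated with a uniform 2D grid: objects are bucketed by cell, and each client only receives objects in the cells around its own position. The cell size, the one-cell radius, and the class name InterestGrid are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

CELL_SIZE = 32.0   # hypothetical cell edge length in world units


def cell_of(x: float, y: float) -> Tuple[int, int]:
    """Grid cell containing a world position (floor division handles negatives)."""
    return int(x // CELL_SIZE), int(y // CELL_SIZE)


class InterestGrid:
    def __init__(self) -> None:
        self._cells: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
        self._where: Dict[int, Tuple[int, int]] = {}

    def update(self, object_id: int, x: float, y: float) -> None:
        """Move an object to the cell containing (x, y); no-op if the cell is unchanged."""
        new_cell = cell_of(x, y)
        old_cell = self._where.get(object_id)
        if old_cell == new_cell:
            return
        if old_cell is not None:
            self._cells[old_cell].discard(object_id)
        self._cells[new_cell].add(object_id)
        self._where[object_id] = new_cell

    def relevant_to(self, x: float, y: float, radius_cells: int = 1) -> Set[int]:
        """Object IDs in the block of cells centred on the viewer's cell."""
        cx, cy = cell_of(x, y)
        found: Set[int] = set()
        for dx in range(-radius_cells, radius_cells + 1):
            for dy in range(-radius_cells, radius_cells + 1):
                found |= self._cells.get((cx + dx, cy + dy), set())
        return found
```

The per-client query touches a fixed number of cells regardless of how many objects exist in the world, which is why this kind of filter is what makes the CCU targets in §9 plausible under load.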
Implements: 00 Overview § decision 3.3 (web backend); 05 § lattice-web.
Deliverables
- lattice-web: core compiled to WASM (Emscripten); QUIC/WebTransport backend implementing the transport interface (browsers cannot do raw UDP).
- Browser vertical slice reaching feature parity with the native slice where the platform allows.
Exit criteria
- A browser client joins the authoritative slice over WebTransport with acceptable latency/jitter behaviour.
- The WASM build passes the cross-engine conformance suite for the supported feature subset.
5. Phase Sequencing (Gantt)
Relative ordering and overlap. Durations are abstract units, not calendar dates; they encode proportional effort and dependencies only.
gantt
title "Lattice — Relative Phase Sequencing (abstract durations, not dates)"
dateFormat X
axisFormat %s
section Core data plane
"P0 Foundations" :p0, 0, 3
"P1 Transport core" :p1, after p0, 5
"P2 Replication MVP" :p2, after p1, 5
"P3 Prediction + rollback" :p3, after p2, 6
"P4 Topologies + relay + migration" :p4, after p3, 5
"P5 Shared / distributed authority" :p5, after p4, 4
section Control plane
"P6 Auth + director (overlaps P3-P4)" :p6, after p2, 5
section Bindings & demo
"P7a Unity binding + MVP slice" :p7a, after p3, 4
"P7b Unreal binding" :p7b, after p7a, 3
"P7c Godot binding" :p7c, after p7b, 3
"P8 Social (standalone + integrated)" :p8, after p6, 4
section Hardening & stretch
"P9 Hardening / scale / security" :p9, after p5, 5
"P10 Web / WASM / WebTransport (stretch)" :p10, after p9, 4
Notable overlaps: the control plane (P6) can proceed in parallel once Replication MVP (P2) exists, since auth/director are largely independent of prediction internals. The Unity binding + MVP slice (P7a) can start as soon as prediction (P3) is feature-complete and the ABI is stable, even while topologies (P4) and shared authority (P5) continue in the core.
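The relative schedule implied by the Gantt can be worked out by hand or with a few lines of code. The sketch below reads the last field of each Gantt row as a duration in abstract units and each "after" clause as the single predecessor; the function name schedule is illustrative.

```python
# (duration, predecessor) transcribed from the Gantt above; None means "starts at 0".
PHASES = {
    "p0": (3, None),
    "p1": (5, "p0"),
    "p2": (5, "p1"),
    "p3": (6, "p2"),
    "p4": (5, "p3"),
    "p5": (4, "p4"),
    "p6": (5, "p2"),
    "p7a": (4, "p3"),
    "p7b": (3, "p7a"),
    "p7c": (3, "p7b"),
    "p8": (4, "p6"),
    "p9": (5, "p5"),
    "p10": (4, "p9"),
}


def schedule(phases):
    """Return {phase: (start, finish)} with each phase starting when its predecessor ends."""
    spans = {}

    def finish(name):
        if name not in spans:
            duration, pred = phases[name]
            start = 0 if pred is None else finish(pred)
            spans[name] = (start, start + duration)
        return spans[name][1]

    for name in phases:
        finish(name)
    return spans
```

Under that reading, the core spine (P0 to P5) finishes at unit 28, the Unity MVP slice (P7a) spans units 19 to 23, the control plane (P6) spans 13 to 18, hardening (P9) ends at 33, and the web stretch (P10) ends at 37. These are proportions, not dates.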
6. The Walking Skeleton
The earliest end-to-end artifact (delivered at Phase 1, thickened thereafter):
sequenceDiagram
participant C as "Client peer (harness)"
participant S as "Server peer (harness)"
C->>S: "UDP handshake (X25519)"
S-->>C: "session keys established"
C->>S: "connect / keepalive"
loop "every tick (60 Hz)"
S->>S: "advance counter"
S-->>C: "snapshot (Reliable-Ordered for demo)"
C->>C: "apply + display counter"
end
It replicates a single integer and nothing more, but it exercises socket, handshake, connection, serialization, and snapshot delivery seams. Every subsequent phase replaces a stubbed piece of this path with the real thing.
7. MVP vs. Full vs. Stretch
| Tier | Definition | What it proves | Phases |
|---|---|---|---|
| MVP | A dedicated-server authoritative shooter slice with client prediction + rollback + lag compensation, playable in Unity, with players authenticated via lattice-auth and placed via lattice-director. | The whole stack works thin, end-to-end: custom encrypted UDP transport → replication → prediction/rollback → C ABI → one engine binding → control plane. This is the smallest thing that validates the architecture and the Photon-Fusion-class claim. | P0–P3, slice of P6, P7a |
| Full | Both authority models (server/host and shared/distributed); all three native topologies (dedicated, listen-server, P2P-relayed) with NAT traversal, relay fallback, and host migration; Unity + Unreal + Godot bindings; lattice-social; interest management, load testing, and security hardening at target scale. | The complete product as scoped across 00–05. | P0–P9 |
| Stretch | Web: lattice-web (WASM) with a QUIC/WebTransport backend reaching the browser. | Reach beyond native engines without compromising the native path. | P10 |
The MVP is the project's primary risk-buy-down: if prediction/rollback over the custom transport feels right in a real engine at realistic latency, the central technical bet is proven and everything after is widening, not gambling.
8. Testing & QA Strategy
Quality is built per-layer, not bolted on. The strategy mirrors the dependency graph: lower layers get the most exhaustive automated coverage because everything above depends on them.
8.1 Test types
| Layer / concern | Approach |
|---|---|
| Unit tests | Per-module: serialization round-trips, quantization precision bounds, ACK/sequence logic, delta encode/decode, ring buffers, crypto vectors. Run on every push, all three OSes. |
| Determinism / replay tests | Record inputs + tick stream; re-run the sim and assert byte-for-byte identical state. The same recording must reproduce identically across platforms (the determinism guarantee that rollback and P2P depend on). Replay equivalence between dedicated-server and P2P-host sim binaries. |
| In-process loopback tests | The MemoryTransport runs N peers in one process for fast, deterministic integration tests of reliability, replication, prediction, and authority transfer — no sockets, seedable, CI-friendly. |
| Network-condition simulation | Every integration test can be run through the condition shim: programmable latency, jitter, packet loss, reorder, duplication with reproducible seeds. Matrix the core scenarios across representative profiles (LAN, good broadband, mobile, lossy/high-RTT). |
| Soak / long-run | 24h+ runs watching for memory leaks, state drift, sequence-number wrap bugs, reconnection storms, and host-migration churn. |
| Load / scale tests | Synthetic-client swarms driving CCU per game server and per relay to target (§9), measuring tick budget, bandwidth, and rollback frequency under load. |
| Security tests | Wire-parser fuzzing (ASan/UBSan), handshake tampering/replay, token forgery/expiry, relay abuse/amplification, and P2P/shared-authority cheat-surface probing (illegal authority claims, ownership-transfer abuse, fabricated state). |
| Cross-engine conformance | One scenario suite run against every binding (Unity, Unreal, Godot, web) asserting identical observable behaviour, so a binding can't silently diverge from the core. |
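The determinism / replay row can be illustrated with a toy recording: run the same inputs through the sim twice, digest the state after every tick, and report the first tick where the digests differ. The sim here is integer-only and the names (simulate, first_divergence) are hypothetical; the packing format is an arbitrary fixed layout chosen so the digest does not depend on the host platform.

```python
import hashlib
import struct
from typing import Iterable, List, Optional, Tuple

Input = Tuple[int, int]   # (tick, acceleration applied on that tick)


def simulate(inputs: Iterable[Input], ticks: int) -> List[bytes]:
    """Toy integer-only sim; returns a digest of the state after every tick."""
    by_tick = dict(inputs)
    position, velocity = 0, 0
    digests = []
    for tick in range(ticks):
        velocity += by_tick.get(tick, 0)
        position += velocity
        # Fixed little-endian layout: int32 tick, int64 position, int64 velocity.
        state = struct.pack("<iqq", tick, position, velocity)
        digests.append(hashlib.sha256(state).digest())
    return digests


def first_divergence(a: List[bytes], b: List[bytes]) -> Optional[int]:
    """Index of the first tick whose digests differ, or None if all compared ticks match."""
    for tick, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return tick
    return None
```

Reporting the first diverging tick, rather than only pass/fail, is what makes a replay failure debuggable: the recording can be re-run up to that tick with the packet inspector attached.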
8.2 Metrics & observability
First-class, emitted by the core and surfaced in dashboards from early phases: RTT, bandwidth per player (and per snapshot), rollback frequency + re-sim depth, per-tick CPU / tick budget headroom, CCU per server and per relay, relay throughput, packet loss/retransmit rates, and NAT/relay fallback rates. These metrics are also the acceptance instruments for the exit criteria in §4 and the targets in §9.
flowchart LR
CORE["lattice-core<br/>(instrumented)"] --> M["metrics emitter"]
GS["lattice-gameserver"] --> M
RLY["lattice-relay"] --> M
M --> DASH["dashboards:<br/>RTT, BW/player, rollback freq,<br/>tick budget, CCU, relay tput"]
M --> ALERT["alerts on<br/>budget / loss / CCU thresholds"]
9. Success Metrics & Targets
Directional targets that the exit criteria (§4) and load tests (§8) measure against. These express the Photon-Fusion-class performance goal concretely.
| Metric | Target | Notes |
|---|---|---|
| Sim tick rate | 60 Hz fixed | Per brief default. |
| Snapshot rate | 20–30 Hz | Delta-compressed against last ack. |
| Interpolation buffer | ~100 ms | Tunable per game. |
| Packet size | ≤ ~1200 B (sub-MTU) | Fragment above this. |
| Added latency budget (transport) | Single-digit ms over raw RTT | Pacing/processing overhead kept minimal. |
| Rollback cost per tick | Re-simulation stays within the per-tick CPU budget at target object count | Measured as tick-budget headroom under worst-case correction depth. |
| Bandwidth per player | Single-digit to low-tens of KB/s typical | Driven by AoI + delta + quantization; measured per snapshot rate. |
| CCU per game server | Hundreds of concurrent players per instance (game-dependent) | Validated under load with interest management on. |
| CCU / throughput per relay | High concurrent relayed sessions per relay node | Relay is forward-only; bandwidth, not CPU, is the usual ceiling. |
| NAT direct-connect success | Majority of peers connect directly; 100% connectivity via relay fallback | Direct rate is best-effort; relay guarantees the floor. |
| Determinism | Byte-for-byte identical replays cross-platform | Hard requirement for rollback + P2P. |
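A few of these targets constrain each other, and the arithmetic is worth spelling out. The sketch below derives the per-tick time budget, how many snapshots fit in the interpolation buffer, and the bandwidth ceiling if every snapshot filled one maximum-size packet. It assumes 1 KB = 1000 bytes and counts server-to-client snapshot payload only, ignoring headers and client input traffic.

```python
TICK_HZ = 60
SNAPSHOT_HZ = (20, 30)
INTERP_BUFFER_MS = 100
MAX_PACKET_BYTES = 1200

# Wall-clock time available to simulate one tick, including any rollback re-sim.
tick_budget_ms = 1000 / TICK_HZ

# Simulation ticks covered by each snapshot at 20 Hz and 30 Hz.
ticks_per_snapshot = [TICK_HZ / hz for hz in SNAPSHOT_HZ]

# Snapshots that arrive within one interpolation-buffer window.
snapshots_in_buffer = [INTERP_BUFFER_MS * hz / 1000 for hz in SNAPSHOT_HZ]

# Bandwidth if every snapshot were one full-size packet (KB/s, 1 KB = 1000 B).
ceiling_kb_per_s = [hz * MAX_PACKET_BYTES / 1000 for hz in SNAPSHOT_HZ]


def snapshot_bytes_for(budget_kb_per_s: float, snapshot_hz: int) -> float:
    """Average snapshot size that fits a given per-player bandwidth budget."""
    return budget_kb_per_s * 1000 / snapshot_hz
```

The tick budget comes to about 16.7 ms. A 100 ms buffer holds two snapshots at 20 Hz and three at 30 Hz. One full packet per snapshot would cost 24 to 36 KB/s, so staying near 10 KB/s per player means averaging roughly 500 B per snapshot at 20 Hz or about 333 B at 30 Hz, which is why the bandwidth target leans on AoI, delta compression, and quantization.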
10. Risk Register
Likelihood and impact are Low / Medium / High. Risks are ordered roughly by overall severity.
| # | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| R1 | Determinism across platforms — floating-point / compiler / arch differences break replay & rollback. | High | High | Determinism test suite from Phase 0; control FP (avoid undefined ordering, consider fixed-point or strict FP for sim math); replay-equivalence gate in CI across all OSes; isolate sim math from platform libs. |
| R2 | Rollback CPU cost — deep/frequent re-simulation blows the per-tick budget at scale. | Medium | High | Tick-budget instrumentation from Phase 3; bound re-sim depth; cheap deterministic sim; partial/scoped rollback (only mispredicted objects); profile under load in Phase 9; AoI reduces predicted set. |
| R3 | NAT traversal success rates — hole punching fails for symmetric/CGNAT, pushing load onto relay. | High | Medium | Relay fallback is a guaranteed floor (100% connectivity); measure direct-connect rate in load tests; multiple STUN-style probes; tune hole-punch strategy; capacity-plan relay for realistic failure rates. |
| R4 | Binding maintenance burden across 4 engines — keeping Unity/Unreal/Godot/Web in sync with core. | High | Medium | Keep the C ABI thin, additive-only, and stable; generate as much binding glue as possible; cross-engine conformance suite (§8) catches drift; per-engine owners; stagger bindings (Unity first, prove the pattern). |
| R5 | Security / cheat surface in P2P & shared-authority modes: a peer with authority can cheat. | High | High | Server-authoritative is the default for competitive play; validate all authority claims and ownership transfers; signed tokens (Ed25519); encrypt + authenticate the wire; harden the relay against amplification abuse; explicit cheat-surface testing (§8.1); document P2P trust model honestly (per 00 non-goals). |
| R6 | Scope / timeline — six-plus subsystems and four bindings is large; risk of spreading thin. | Medium | High | Vertical-slice/MVP discipline (§7) proves the architecture before widening; strict phase exit criteria; control plane and bindings overlap only after their core dependencies are stable; web is explicitly stretch (P10). |
| R7 | Relay bandwidth cost — relayed sessions are pure egress; cost scales with relay traffic. | Medium | Medium | Maximize direct-connect rate (R3) so relay is fallback, not default; per-relay throughput targets + capacity planning (§9); director allocates relay only when needed; monitor relay throughput as a first-class metric. |
| R8 | Congestion control mistuning — a custom CC tuned for tick traffic misbehaves on real networks. | Medium | Medium | Test across the network-condition matrix (§8); pacing tuned for tick-rate not bulk; soak + load testing; conservative defaults with telemetry-driven tuning. |
| R9 | Host migration correctness — authority/state lost or duplicated when a host drops. | Medium | Medium | Build on proven ownership-transfer machinery (P5 before relying on it broadly); test migration under loss/reorder; bounded stall window in exit criteria; replay tests across migration events. |
| R10 | ABI churn breaking bindings — core API changes ripple into every binding. | Medium | Medium | Freeze the ABI surface early; additive-only evolution with versioning; opaque handles hide internals; conformance suite + smoke tests fail fast on breakage. |
quadrantChart
title "Risk exposure (likelihood x impact)"
x-axis "Lower likelihood" --> "Higher likelihood"
y-axis "Lower impact" --> "Higher impact"
quadrant-1 "Manage closely"
quadrant-2 "Plan for"
quadrant-3 "Monitor"
quadrant-4 "Mitigate actively"
"R1 Determinism": [0.85, 0.9]
"R5 P2P cheat surface": [0.8, 0.88]
"R2 Rollback CPU": [0.55, 0.85]
"R6 Scope/timeline": [0.5, 0.82]
"R3 NAT success": [0.8, 0.55]
"R4 Binding burden": [0.78, 0.52]
"R7 Relay cost": [0.5, 0.5]
"R8 Congestion tuning": [0.5, 0.48]
"R9 Host migration": [0.5, 0.5]
"R10 ABI churn": [0.45, 0.5]
11. Tooling
Built alongside the features they support, not after.
| Tool | Purpose | First needed |
|---|---|---|
| Network-condition simulator | Programmable latency/jitter/loss/reorder/duplication shim, seedable; drives both loopback and UDP transports. | Phase 0 |
| Headless test harness | Console host linking lattice-core directly; spins up N in-process peers for fast iteration and integration tests. | Phase 0 |
| Record / replay | Capture inputs + tick stream; deterministically re-run for debugging, determinism tests, and regression. | Phase 1–3 |
| Packet inspector / visualizer | Decode and visualize the wire format (channels, acks, fragments, snapshot deltas) for debugging transport/replication. | Phase 1–2 |
| Metrics dashboards | RTT, bandwidth/player, rollback frequency, tick budget, CCU, relay throughput. | Phase 2 onward |
| Load-test client swarm | Synthetic clients to push CCU per server / per relay to target. | Phase 9 |
| Reference demo game | The authoritative shooter slice (the MVP) — doubles as integration test, conformance target, and showcase. | Phase 7a |
12. Team Shape (Directional)
Directional and role-based, not date- or headcount-committed. Roles can be combined on a small team or split on a larger one.
| Role | Focus | Engaged during |
|---|---|---|
| Core net engineers (2–3) | lattice-core: transport, reliability, serialization, replication, prediction/rollback, interest management, ABI. The critical path. | P0–P5, P9 |
| Services engineer (1–2) | lattice-auth, lattice-director, fleet orchestration, storage; later assists lattice-social. | P6, P8 |
| Per-engine binding owners (1 per engine) | Idiomatic Unity / Unreal / Godot / Web wrappers; keep bindings conformant as the ABI evolves. | P7, P10 |
| Tooling / infra engineer (1) | Build/CI, network simulator, record/replay, packet inspector, metrics, load harness. | P0 onward |
| QA / test engineer (1) | Determinism/replay suites, condition-matrix runs, soak/load, conformance, security testing. | P0 onward |
| Demo / integration engineer | The reference shooter slice; often the binding owner for the lead engine. | P7 |
The critical path runs through the core net engineers; bindings, services, and tooling are scheduled to depend on stable core milestones (see §3, §5). Security testing in Phase 9 may pull in a specialist or external audit.
13. Summary
Build the data-plane core first, headless, and prove the architecture with a single vertical slice — a dedicated-server authoritative shooter with prediction + rollback in Unity (the MVP) — before widening into more topologies, shared authority, more engines, social, and scale. The module dependency graph (§3) fixes the ordering: transport → reliability → connection → serialization → replication → prediction → interest management/ownership → bindings, with auth/director gating matchmaking and relay gating P2P fallback. The biggest risks — cross-platform determinism, rollback CPU cost, and the P2P/shared-authority cheat surface — are bought down early and continuously by determinism/replay tests, tick-budget instrumentation, server-authoritative defaults, and a network-condition simulator that every integration test can run through. Web is explicitly the stretch tier.