
02 — Netcode Low-Level Design (lattice-core)

Status: Design (authoritative for the transport + netcode core).

Scope: The internals of lattice-core: the C++20 library that every Lattice data-plane binary (lattice-gameserver, lattice-relay) and every engine binding (lattice-unity, lattice-unreal, lattice-godot, lattice-web) links against.

Audience: Core/netcode engineers, the binding-API author (05-engine-integration.md), and the roadmap author (06-implementation-roadmap.md).

Read first: 00-overview.md (vocabulary, locked decisions) and 01-high-level-design.md (topologies, data plane vs control plane).

Cross-references: token formats from 03-auth-service.md; the C ABI surface consumed by 05-engine-integration.md; phasing in 06-implementation-roadmap.md.

This is the deepest document in the set. It specifies wire formats, state machines, and algorithms precisely enough to implement from. Where a parameter is given, it is the default; everything tunable is marked (configurable) and surfaced through the config struct exposed across the C ABI (see §19).


Table of Contents

  1. Design goals & invariants
  2. Layered architecture
  3. Packet & header formats
  4. Connection lifecycle & handshake
  5. Reliability system & channels
  6. Congestion control & bandwidth management
  7. Fragmentation & reassembly
  8. Encryption, replay protection & key rotation
  9. NAT traversal & relay protocol
  10. Simulation model & clock synchronization
  11. Client-side prediction & rollback reconciliation
  12. Remote-entity smoothing: interpolation & dead reckoning
  13. Lag compensation
  14. State replication
  15. Interest management / Area of Interest
  16. RPC system
  17. Authority models
  18. The shared simulation library mechanism
  19. Public C ABI surface
  20. Serialization codegen
  21. Networked type system & custom types
  22. Defaults & tuning summary

1. Design goals & invariants

  • Single implementation. Reliability, congestion control, crypto, serialization, replication, prediction/rollback, and the fixed-tick simulation loop are implemented exactly once in C++20 and shared by server, P2P host, and client.
  • Stable C ABI boundary. No C++ types, exceptions, or STL containers cross the public surface. Everything is POD structs, opaque handles, and extern "C" functions (§19).
  • Photon-Fusion-class feel. Prediction, snapshot interpolation (~100 ms buffer), extrapolation/dead reckoning for late packets ("estimated physics for destination"), and server-side lag compensation are built in, not bolted on.
  • Both authority models coexist per-object. An object can be server/host-authoritative or shared/distributed-authoritative; the replication and prediction layers branch on a per-object AuthorityMode (§17).
  • Zero heap churn in the hot path. Per-tick work uses pre-sized ring buffers and arena allocators; the only allocations on the data plane happen at connection setup and on capacity growth.
  • Endian-safe, version-tolerant wire format. All multi-byte integers are little-endian on the wire (the overwhelming majority of targets are LE; BE hosts byte-swap at the serialization boundary). Every datagram carries a protocol version and the connection carries a negotiated schema hash (§20).

Hard invariants (must hold for any conforming implementation):

| # | Invariant |
|---|---|
| I1 | The simulation advances only in whole fixed ticks; render time is decoupled and interpolates between sim states. |
| I2 | Identical sim code produces identical results given identical (tick, inputs, object-state) on every role (server, host, predicting client). |
| I3 | Every encrypted packet's nonce is unique for the lifetime of a key; reuse is a fatal protocol error. |
| I4 | A reliable message is delivered exactly once and (on ordered channels) in order. |
| I5 | Authority over an object is held by exactly one peer at any tick; transfer is atomic w.r.t. tick. |

2. Layered architecture

lattice-core is a strict layer stack. Each layer depends only on the layer below and exposes a narrow internal interface upward. The public C ABI sits on top; the OS socket sits at the bottom.

flowchart TB
    subgraph PUB["Public C ABI (abi/)"]
        ABI["extern C : handles, POD structs, callbacks"]
    end
    subgraph SIM["Simulation (simulation/)"]
        TICK["Fixed-tick loop, clock sync, input buffering"]
        PRED["Prediction, rollback, reconciliation, lag comp"]
    end
    subgraph REP["Replication (replication/)"]
        NETOBJ["NetworkObject / NetworkBehaviour, dirty tracking"]
        SNAP["Snapshot ring buffer, baseline+delta, AoI"]
        RPC["RPC routing"]
    end
    subgraph SER["Serialization (serialization/)"]
        BITS["Bitstream packers, quantization, codegen"]
    end
    subgraph FRAG["Fragmentation (transport/)"]
        FR["Split / reassemble messages over MTU"]
    end
    subgraph RELI["Reliability / Channels (reliability/)"]
        CH["4 channels: seq numbers, ACK bitfield, RTO, ordering"]
    end
    subgraph CONN["Connection (transport/)"]
        CONNMGR["Lifecycle state machine, heartbeats, MTU discovery"]
    end
    subgraph CRY["Crypto (transport/)"]
        AEAD["X25519 handshake, ChaCha20-Poly1305 AEAD, replay window"]
    end
    subgraph SOCK["Socket (transport/)"]
        UDP["Non-blocking UDP socket(s) + pluggable QUIC/WebTransport backend"]
    end

    ABI --> SIM
    SIM --> REP
    REP --> SER
    SER --> FRAG
    FRAG --> RELI
    RELI --> CONN
    CONN --> CRY
    CRY --> SOCK
    SOCK -->|"datagrams"| NET(("Network / NAT / lattice-relay"))

2.1 Per-layer responsibilities

| Layer | Module | Responsibility | Does not do |
|---|---|---|---|
| Socket | transport/socket | Open non-blocking UDP socket(s), recv/send raw datagrams, dispatch by 4-tuple, pluggable backend (native UDP, or QUIC/WebTransport for web). | Reliability, ordering, crypto. |
| Crypto | transport/crypto | X25519 ECDH, ChaCha20-Poly1305 AEAD per packet, nonce management, replay window, key rotation (§8). | Anything below the AAD-protected header. |
| Connection | transport/connection | Lifecycle FSM, handshake driving, heartbeats, RTT/timeout bookkeeping, Path-MTU discovery, peer registry (§4). | Channel semantics. |
| Reliability/Channels | reliability/ | Four delivery channels: sequence numbers, ACK + ACK bitfield, RTO/RTT estimation, retransmission, ordering buffers (§5). | Splitting messages > MTU. |
| Fragmentation | transport/fragment | Split a serialized message > MTU into fragments, reassemble on receipt (§7). | Choosing what to send. |
| Serialization | serialization/ | Bit-packed writers/readers, quantization (compressed floats, smallest-three quats), codegen for [Networked] members (§20). | Knowing object identity. |
| Replication | replication/ | NetworkObject/NetworkBehaviour model, dirty tracking, baseline+delta snapshots, snapshot ring buffer, AoI, RPC routing, authority (§14–§17). | Stepping physics. |
| Simulation | simulation/ + prediction/ | Fixed-tick loop, clock sync, input sampling/buffering, prediction, rollback/reconciliation, dead reckoning, lag compensation (§10–§13). | Game-specific rules (that lives in game-sim/). |
| Public C ABI | abi/ | The extern "C" surface: create/destroy worlds & connections, register networked types, pump ticks, marshal callbacks (§19). | Implementing any of the above. |

The boundary between simulation/ (engine-provided loop + buffers) and game-sim/ (the shared simulation library, §18) is the most important seam in the suite: game-sim/ is the user's deterministic game code; simulation/ drives it identically on every role.


3. Packet & header formats

Everything on the wire is a single UDP datagram (the QUIC/WebTransport backend tunnels the same logical frames inside QUIC streams/datagrams). A datagram is:

[ Outer datagram header ][ AEAD-encrypted payload ][ 16-byte Poly1305 tag ]

The outer header is AAD (authenticated but not encrypted) so a receiver can route, derive the nonce, and reject replays before decrypting. The payload — one or more channel messages — is encrypted.

3.1 Outer datagram header (AAD, cleartext)

| Field | Size | Description |
|---|---|---|
| protocol_id | 4 B | Magic + protocol version; mismatched versions are dropped silently. Little-endian. |
| packet_type | 1 B | 0x00 Handshake, 0x01 Payload, 0x02 Keepalive, 0x03 Disconnect, 0x04 ChallengeResponse, 0x05 RelayCtrl. |
| connection_id | 8 B | Server-assigned per-connection id (0 during initial handshake). Enables NAT-rebind survival without re-handshake. |
| key_epoch | 1 B | Which session key generation encrypted this packet (rotation, §8.4). |
| packet_seq | 8 B (var) | Per-direction monotonically increasing packet sequence; also the AEAD nonce source and the replay-protection counter. Varint-encoded on the wire (1–9 B). |

After decryption the payload begins. Bit-packing applies inside the payload, not to the outer header (the outer header is byte-aligned for cheap routing).

3.2 Payload framing — channel messages

The decrypted payload is a back-to-back sequence of channel messages, each:

| Field | Size | Description |
|---|---|---|
| channel_id | 4 bits | 0 Unreliable, 1 Unreliable-Sequenced, 2 Reliable-Unordered, 3 Reliable-Ordered (room for 16 logical channels; §5). |
| msg_flags | 4 bits | bit0 is_fragment, bit1 is_rpc, bit2 is_snapshot, bit3 reserved. |
| (per-channel header) | var | See §3.3. |
| length | 14 bits | Body length in bytes (max 16 383; bounded by MTU anyway). |
| body | var | Bit-packed serialized content. |

Multiple small messages coalesce into one datagram up to the safe MTU payload (~1200 B, configurable). This amortizes the per-datagram overhead: the 15–23-byte outer header (§3.1, depending on the packet_seq varint length) plus the 16-byte tag.
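
As a small illustration of the first two framing fields, the sketch below packs channel_id and msg_flags into a single byte. The document does not fix the bit order, so placing channel_id in the high nibble is an assumption, and the names `MsgFlags`, `pack_msg_prefix`, `prefix_channel`, and `prefix_flags` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical flag bits mirroring the msg_flags row of the table above.
enum MsgFlags : std::uint8_t {
    kIsFragment = 1u << 0,
    kIsRpc      = 1u << 1,
    kIsSnapshot = 1u << 2,
};

// ASSUMPTION: channel_id sits in the high nibble, msg_flags in the low nibble.
constexpr std::uint8_t pack_msg_prefix(std::uint8_t channel_id, std::uint8_t flags) {
    return static_cast<std::uint8_t>(((channel_id & 0x0F) << 4) | (flags & 0x0F));
}

constexpr std::uint8_t prefix_channel(std::uint8_t prefix) {
    return static_cast<std::uint8_t>(prefix >> 4);
}

constexpr std::uint8_t prefix_flags(std::uint8_t prefix) {
    return static_cast<std::uint8_t>(prefix & 0x0F);
}
```

Under that assumed layout, an RPC on Reliable-Ordered (channel 3) would carry the prefix byte 0x32.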

3.3 Per-channel headers

Unreliable (channel 0) carries no extra header. The others:

Unreliable-Sequenced (1) — newest wins, old drops:

| Field | Size | Description |
|---|---|---|
| sequence | 16 bits | Per-channel sequence; receiver discards anything ≤ last seen. |

Reliable-Unordered (2) / Reliable-Ordered (3):

| Field | Size | Description |
|---|---|---|
| message_id | 16 bits | Per-channel reliable message id (wraps; compared with sequence-difference arithmetic). |
| ack_seq | 16 bits | Highest packet_seq the sender has received from this peer (piggy-backed ACK). |
| ack_bitfield | 32 bits | ACKs the 32 packets before ack_seq (bit n = ack_seq − (n+1) received). |

ACKs ride on every reliable message and on keepalives, so a single lost ACK is recovered by the next packet — there are no standalone ACK packets in steady state.

3.4 ACK bitfield semantics

ack_seq      = 1042        // latest packet_seq received from peer
ack_bitfield = 0b...1011   // bit0 -> 1041 received, bit1 -> 1040 received,
                           // bit2 -> 1039 MISSING, bit3 -> 1038 received, ...

A single ACK therefore confirms up to 33 packets. Because ACKs are redundant across packets, the sender treats a packet as ACKed the first time any received ACK covers it.
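
The bit layout above can be sketched in a few lines of C++. This is only an illustration of the encoding rule (bit n covers ack_seq − (n+1)); `build_ack_bitfield`, `ack_covers`, and the `was_received` callback are hypothetical names, not part of lattice-core.

```cpp
#include <cstdint>

// Builds the 32-bit ack_bitfield for a given ack_seq.
// `was_received(seq)` is a hypothetical callback: "did we receive packet seq?".
template <typename ReceivedFn>
std::uint32_t build_ack_bitfield(std::uint16_t ack_seq, ReceivedFn was_received) {
    std::uint32_t bits = 0;
    for (std::uint32_t n = 0; n < 32; ++n) {
        // Bit n covers ack_seq - (n + 1); the cast makes the value wrap at 16 bits.
        const auto seq = static_cast<std::uint16_t>(ack_seq - (n + 1));
        if (was_received(seq)) bits |= (1u << n);
    }
    return bits;
}

// Returns true if the (ack_seq, ack_bitfield) pair acknowledges `seq`.
inline bool ack_covers(std::uint16_t ack_seq, std::uint32_t bitfield, std::uint16_t seq) {
    if (seq == ack_seq) return true;
    const auto behind = static_cast<std::uint16_t>(ack_seq - seq);  // 1..65535 here
    return behind <= 32 && ((bitfield >> (behind - 1)) & 1u) != 0;
}
```

With the values from the example (ack_seq = 1042; 1041, 1040, and 1038 received), the low four bits of the bitfield come out as 0b1011.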

3.5 Fragment header

Present when msg_flags.is_fragment is set (see §7):

| Field | Size | Description |
|---|---|---|
| fragment_group | 16 bits | Id shared by all fragments of one message. |
| fragment_index | 8 bits | 0-based index of this fragment. |
| fragment_count | 8 bits | Total fragments (≤ 256 ⇒ max ~300 KB per message at 1200 B each). |

3.6 Handshake packet bodies

Handshake (packet_type 0x00/0x04) bodies are byte-aligned (not bit-packed) and described in §4.2.


4. Connection lifecycle & handshake

4.1 State machine

stateDiagram-v2
    [*] --> Disconnected
    Disconnected --> Connecting : connect(endpoint, token)
    Connecting --> Handshake : ConnectionRequest sent, Challenge received
    Handshake --> Connected : key confirmation OK, token valid
    Connecting --> Timeout : no response (retry budget exhausted)
    Handshake --> Timeout : challenge/confirm timeout
    Handshake --> Disconnected : token rejected / version mismatch
    Connected --> Connected : heartbeat / payload (RTT, MTU probe)
    Connected --> Disconnecting : disconnect() called
    Connected --> Timeout : no packet within timeout window
    Disconnecting --> Disconnected : Disconnect ack or grace elapsed
    Timeout --> Disconnected
    Disconnected --> [*]

| State | Meaning | Entry actions | Exit triggers |
|---|---|---|---|
| Disconnected | No association. | Free per-connection state. | connect(). |
| Connecting | Request sent, awaiting challenge. | Send ConnectionRequest (carries client ephemeral X25519 pubkey + connect token); start retransmit timer (250 ms, exp backoff, max 10 tries (configurable)). | Challenge received → Handshake; budget exhausted → Timeout. |
| Handshake | ECDH done locally; confirming keys + validating token. | Compute shared secret, derive session keys, send ChallengeResponse (encrypted confirmation). | Confirm OK → Connected; rejected → Disconnected; timeout → Timeout. |
| Connected | Steady state. | Assign connection_id; start heartbeat (every 1 s if idle) and timeout timer (10 s); begin Path-MTU discovery. | disconnect() → Disconnecting; silence → Timeout. |
| Disconnecting | Graceful close. | Send 3× Disconnect (unreliable, redundant); 200 ms grace. | Ack or grace → Disconnected. |
| Timeout | Lost peer. | Surface OnDisconnected(reason=Timeout) callback. | → Disconnected. |

4.2 Handshake wire exchange (encrypted, token-gated)

The handshake establishes a confidential, authenticated channel and proves the client holds a valid connect token issued by lattice-director / signed per 03-auth-service.md (Ed25519). It is a challenge–response to defeat spoofing and amplification.

sequenceDiagram
    autonumber
    participant C as Client
    participant S as Server / Host
    C->>S: ConnectionRequest { proto_id, client_eph_pub (X25519, 32B), connect_token }
    Note over S: Verify token signature (Ed25519) + expiry + audience.<br/>Do NOT allocate heavy state yet (anti-DoS).
    S->>C: Challenge { server_eph_pub (32B), challenge_nonce (16B, HMAC of client addr) }
    Note over C: ECDH(client_eph_priv, server_eph_pub) -> shared secret.<br/>HKDF -> {tx_key, rx_key}. 
    C->>S: ChallengeResponse { AEAD( echo challenge_nonce, token_id ) }
    Note over S: Reconstruct shared secret, decrypt+verify.<br/>Token bound to this address. Allocate connection_id.
    S->>C: ConnectionAccepted { AEAD( connection_id, server_tick, mtu_hint ) }
    Note over C,S: Connected. Subsequent packets use packet_type=Payload.

Key schedule. shared = X25519(eph_priv, peer_eph_pub). Then HKDF-SHA256(shared, salt = challenge_nonce, info = "lattice/v1/" + role) yields directional keys: each side encrypts with its own tx_key and decrypts with rx_key, so the two directions never share a key (eliminates a class of nonce-reuse bugs). key_epoch starts at 0.

Anti-DoS. The server allocates no per-connection memory until ChallengeResponse verifies. challenge_nonce is HMAC(server_secret, client_addr || time_bucket) so the server is stateless across step 2→3 (cookie pattern). The connect token is single-use (token_id recorded in Redis for its short TTL; see 03-auth-service.md).

4.3 Heartbeats & timeout

In Connected, if no packet has been sent for 1 s, send a Keepalive (carries the piggy-backed ACK + RTT probe timestamp). If no packet has been received for timeout (default 10 s, configurable) the connection transitions to Timeout. NAT rebind (source address changes but connection_id matches and AEAD verifies) updates the stored endpoint without re-handshaking — survives Wi-Fi/cellular handoff.

4.4 Path-MTU discovery

Start at the conservative safe payload (1200 B). Periodically probe larger sizes with a padded, DF-flagged Keepalive; on ACK, raise the cap; on repeated loss at a size, back off. The discovered MTU bounds fragmentation (§7) and coalescing.


5. Reliability system & channels

Lattice exposes four reliability channels (reliable.io / Gaffer-on-Games lineage). The caller picks a channel per message; the engine never silently upgrades a channel.

| # | Channel | Guarantee | Typical use |
|---|---|---|---|
| 0 | Unreliable | May drop, may dup, may reorder. | Voice frames, cosmetic effects. |
| 1 | Unreliable-Sequenced | May drop; never delivers stale (older than last seen). | Snapshots, dead-reckoning state; newest wins. |
| 2 | Reliable-Unordered | Delivered exactly once; any order. | Independent events (pickups, one-shot RPCs). |
| 3 | Reliable-Ordered | Delivered exactly once, in order. | Chat, sequenced game events, critical RPCs. |

Snapshots ride channel 1 by default: a missing snapshot is simply superseded by the next one (interpolation/extrapolation covers the gap, §12), so retransmitting stale world state is pointless. Reliable channels are for events.

5.1 RTT / RTO estimation

Per connection, RTT is sampled from ACK round-trips on probe-timestamped packets (Karn's algorithm: ignore samples from retransmitted packets). Smoothed RTT + variance per RFC 6298:

rttvar = (1 - beta)  * rttvar + beta  * |srtt - sample|  // beta = 1/4; uses srtt from BEFORE this sample
srtt   = (1 - alpha) * srtt   + alpha * sample           // alpha = 1/8
rto    = clamp(srtt + 4 * rttvar, RTO_MIN=50ms, RTO_MAX=1s)
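
A minimal C++ sketch of this estimator follows. It uses the RFC 6298 first-sample initialisation (srtt = sample, rttvar = sample / 2) and updates rttvar before srtt, as the RFC specifies. The `RttEstimator` type and its member names are hypothetical, and times are plain milliseconds in a double for brevity.

```cpp
#include <algorithm>
#include <cmath>

constexpr double kAlpha    = 1.0 / 8.0;
constexpr double kBeta     = 1.0 / 4.0;
constexpr double kRtoMinMs = 50.0;     // RTO_MIN default from the text
constexpr double kRtoMaxMs = 1000.0;   // RTO_MAX default from the text

struct RttEstimator {
    bool   has_sample = false;
    double srtt_ms    = 0.0;
    double rttvar_ms  = 0.0;

    // Feed one RTT sample. The caller is expected to skip samples that
    // came from retransmitted packets (Karn's algorithm).
    void on_sample(double sample_ms) {
        if (!has_sample) {
            // RFC 6298 first measurement: SRTT = R, RTTVAR = R / 2.
            srtt_ms    = sample_ms;
            rttvar_ms  = sample_ms / 2.0;
            has_sample = true;
            return;
        }
        // rttvar first, so it sees the srtt from before this sample.
        rttvar_ms = (1.0 - kBeta) * rttvar_ms + kBeta * std::abs(srtt_ms - sample_ms);
        srtt_ms   = (1.0 - kAlpha) * srtt_ms + kAlpha * sample_ms;
    }

    double rto_ms() const {
        return std::clamp(srtt_ms + 4.0 * rttvar_ms, kRtoMinMs, kRtoMaxMs);
    }
};
```

On a steady 100 ms link the RTO starts at 300 ms after the first sample and tightens as rttvar decays; a very fast link is held at the 50 ms floor.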

5.2 Send / ACK / resend pseudocode

// ---- SEND (per reliable message) -------------------------------------------
function send_reliable(channel, bytes):
    msg.id        = channel.next_id++          // 16-bit, wraps
    msg.payload   = bytes
    msg.sent_time = now()
    msg.acked     = false
    channel.unacked.push(msg)                  // ring buffer keyed by msg.id
    enqueue_for_packing(channel.id, msg)       // coalesced into next datagram

// Each outbound datagram records which (channel,msg.id) it carried,
// keyed by the datagram's packet_seq, in sent_packets[packet_seq].

// ---- ON ACK (parse ack_seq + ack_bitfield from any inbound reliable msg) ----
function on_ack(ack_seq, ack_bitfield):
    process_one_ack(ack_seq)
    for n in 0..31:
        if bit_set(ack_bitfield, n):
            process_one_ack(ack_seq - (n + 1))

function process_one_ack(packet_seq):
    pkt = sent_packets[packet_seq]
    if pkt == null or pkt.processed: return
    pkt.processed = true
    update_rtt(now() - pkt.sent_time)          // Karn: skip if pkt was a retransmit
    for (channel_id, msg_id) in pkt.carried:
        m = channels[channel_id].unacked.find(msg_id)
        if m and not m.acked:
            m.acked = true
            channels[channel_id].unacked.remove(msg_id)
    congestion_on_ack(pkt)                      // see §6

// ---- RESEND (called once per tick) -----------------------------------------
function tick_resend():
    for channel in reliable_channels:
        for msg in channel.unacked:
            if not msg.acked and now() - msg.sent_time >= rto:
                msg.sent_time = now()
                msg.retransmitted = true        // poisons RTT sample (Karn)
                enqueue_for_packing(channel.id, msg)
                congestion_on_loss()            // RTO firing == loss signal

5.3 Ordering buffers

  • Reliable-Ordered (3): receiver keeps a reassembly window indexed by message_id; delivers contiguously from next_expected_id, buffering out-of-order arrivals until the gap fills. Window size bounds memory (default 256 in-flight).
  • Reliable-Unordered (2): receiver keeps a dedup set (sliding bitfield of recently seen message_ids) and delivers immediately, dropping duplicates.
  • Unreliable-Sequenced (1): receiver keeps only last_seen_seq; delivers iff seq_greater(incoming, last_seen_seq) using wrap-aware comparison; updates last_seen_seq.

Wrap-aware comparison (16-bit): seq_greater(a,b) = ((a > b) && (a - b <= 0x8000)) || ((a < b) && (b - a > 0x8000)).
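
The comparison translates directly into C++. The sketch below pairs it with a tiny Unreliable-Sequenced receive filter. `SequencedReceiver` and its `has_last` flag (which accepts the very first packet unconditionally) are illustrative assumptions rather than the library's actual types.

```cpp
#include <cstdint>

// Wrap-aware "a is newer than b" for 16-bit sequence numbers,
// written exactly as the formula above.
constexpr bool seq_greater(std::uint16_t a, std::uint16_t b) {
    return ((a > b) && (a - b <= 0x8000)) || ((a < b) && (b - a > 0x8000));
}

// Unreliable-Sequenced receive filter: deliver only strictly newer messages.
struct SequencedReceiver {
    bool          has_last  = false;   // ASSUMPTION: first packet is always accepted
    std::uint16_t last_seen = 0;

    bool should_deliver(std::uint16_t incoming) {
        if (has_last && !seq_greater(incoming, last_seen)) {
            return false;              // stale or duplicate
        }
        last_seen = incoming;
        has_last  = true;
        return true;
    }
};
```

Sequence 0 arriving after 65535 is treated as newer, so the filter keeps working across the 16-bit wrap.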


6. Congestion control & bandwidth management

Two cooperating mechanisms: a congestion mode that scales the aggregate send rate, and a priority accumulator that allocates the resulting budget across objects.

6.1 RTT-based good/bad mode

A simple, robust controller (Gaffer-on-Games style) that avoids buffer-bloat by reacting to RTT, augmented by loss from the ACK stream:

GOOD mode: RTT under threshold (e.g. < 250 ms) and loss low.
           -> raise send rate toward MAX (e.g. 60 -> ... -> target pps).
BAD mode:  RTT over threshold or sustained loss.
           -> drop to a conservative rate (e.g. 10 pps) immediately.

Hysteresis: must stay GOOD for >= T_good (e.g. 1s, doubling up to 60s) before
            promoting; any BAD trigger demotes instantly and resets the timer.

The send rate is expressed as a bandwidth budget (bytes/sec), derived from packets_per_sec * mtu. Mode changes scale that budget.
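
A simplified sketch of the mode switch, using the "e.g." figures above as defaults. Two things are assumptions for illustration: the promotion delay doubles on every demotion (the text only says it doubles up to 60 s), and the rate jumps straight to the GOOD value instead of ramping. `CongestionMode` and `NetMode` are hypothetical names.

```cpp
#include <algorithm>

enum class NetMode { Good, Bad };

struct CongestionMode {
    // Illustrative defaults taken from the "e.g." values above; all configurable.
    double rtt_threshold_ms = 250.0;
    double good_pps         = 60.0;
    double bad_pps          = 10.0;
    double t_good_s         = 1.0;    // clean streak required before promotion
    double t_good_max_s     = 60.0;

    NetMode mode           = NetMode::Good;
    double  clean_streak_s = 0.0;     // time conditions have been good while in Bad

    // Call once per tick with the tick duration and current measurements.
    void update(double dt_s, double srtt_ms, bool sustained_loss) {
        const bool bad_signal = srtt_ms > rtt_threshold_ms || sustained_loss;
        if (bad_signal) {
            if (mode == NetMode::Good) {
                // ASSUMPTION: every demotion doubles the promotion delay.
                t_good_s = std::min(t_good_s * 2.0, t_good_max_s);
            }
            mode           = NetMode::Bad;   // demote instantly
            clean_streak_s = 0.0;            // and reset the timer
        } else if (mode == NetMode::Bad) {
            clean_streak_s += dt_s;
            if (clean_streak_s >= t_good_s) mode = NetMode::Good;
        }
    }

    double packets_per_sec() const {
        return mode == NetMode::Good ? good_pps : bad_pps;
    }

    // Bandwidth budget in bytes/sec, derived as packets_per_sec * mtu.
    double budget_bytes_per_sec(double mtu_bytes) const {
        return packets_per_sec() * mtu_bytes;
    }
};
```

At a 1200 B MTU this yields 72 000 B/s in GOOD mode and 12 000 B/s in BAD mode.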

6.2 Pacing

The budget is paced, not bursted: a token-bucket meters bytes out per tick so we never dump a whole snapshot at once (which would spike RTT and trip BAD mode). Tokens accrue at the budget rate, capped at ~2 datagrams of burst.
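
A token bucket of this kind fits in a few lines. The sketch below is illustrative only; `TokenBucket` and its field names are hypothetical, and the rate would be the congestion-controlled budget from §6.1.

```cpp
#include <algorithm>
#include <cstddef>

struct TokenBucket {
    double rate_bytes_per_sec;   // congestion-controlled budget (§6.1)
    double burst_cap_bytes;      // e.g. roughly two datagrams
    double tokens = 0.0;

    // Accrue tokens for the elapsed time, never exceeding the burst cap.
    void refill(double dt_s) {
        tokens = std::min(tokens + rate_bytes_per_sec * dt_s, burst_cap_bytes);
    }

    // Spend tokens for one datagram; returns false if it must wait for a later tick.
    bool try_consume(std::size_t bytes) {
        const double need = static_cast<double>(bytes);
        if (tokens < need) return false;
        tokens -= need;
        return true;
    }
};
```

A caller would `refill` once per tick and then send queued datagrams while `try_consume` succeeds. After a long idle period the cap limits the burst to about two datagrams.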

6.3 Priority / accumulator-based per-object allocation

Snapshots usually want to send more object state than the budget allows. Each replicated object carries a priority accumulator:

per object o (relevant to client c):
    priority(o) = base_priority(o)                 // designer-set / type default
                * distance_factor(o, c)            // AoI: closer = higher (§15)
                * staleness_factor(o)              // ticks since last sent to c
                * authority_factor(o)              // owned/predicted objects boost

each snapshot build:
    for o in relevant(c): o.accum += priority(o)
    sort relevant(c) by accum desc
    while budget_remaining > size_estimate(next o):
        write_delta(o); budget_remaining -= size; o.accum = 0   // reset after send
    // unsent objects keep their accumulated priority -> sent soon, no starvation

This guarantees: high-priority/near objects update every snapshot; distant/low objects update less often but never starve (accumulator monotonically rises until sent). The budget itself is the congestion-controlled value from §6.1.
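
The accumulator loop can be sketched as follows. The per-object `priority` is assumed to be the already-multiplied product of the factors above, and `ReplicatedObject` and `build_snapshot` are hypothetical names. A production version would avoid the per-call vector allocations, in line with the zero-heap-churn goal.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct ReplicatedObject {
    int         id;
    double      priority;     // product of the factors above, computed elsewhere
    std::size_t delta_size;   // size_estimate(o) in bytes
    double      accum = 0.0;  // priority accumulator
};

// Returns the ids written into this snapshot, highest accumulated priority first.
inline std::vector<int> build_snapshot(std::vector<ReplicatedObject>& relevant,
                                       std::size_t budget_bytes) {
    for (auto& o : relevant) o.accum += o.priority;

    std::vector<ReplicatedObject*> order;
    order.reserve(relevant.size());
    for (auto& o : relevant) order.push_back(&o);
    std::stable_sort(order.begin(), order.end(),
                     [](const ReplicatedObject* a, const ReplicatedObject* b) {
                         return a->accum > b->accum;
                     });

    std::vector<int> sent;
    for (ReplicatedObject* o : order) {
        // Mirrors `while budget_remaining > size_estimate(next o)`.
        if (budget_bytes <= o->delta_size) break;
        budget_bytes -= o->delta_size;
        o->accum = 0.0;                 // reset after send
        sent.push_back(o->id);
    }
    return sent;
}
```

With a budget that fits only one object per snapshot, a priority-10 object wins ten snapshots in a row; on the eleventh, the priority-1 object's accumulator has reached 11 and it is sent instead, which is the no-starvation property.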


7. Fragmentation & reassembly

Messages larger than the path MTU payload (default ~1200 B) are fragmented at the serialization/fragmentation boundary — above reliability, so each fragment is an ordinary channel message and benefits from per-channel ACK/retransmit.

  • Split: chop the serialized message into ceil(len / mtu_payload) fragments (≤ 256). Each carries the fragment header (§3.5) and is sent on the same channel as the parent (reliable messages ⇒ reliable fragments).
  • Reassemble: the receiver keys a reassembly buffer by (peer, channel, fragment_group), tracks a received-bitset of size fragment_count, and assembles when all bits set. A per-group timeout (e.g. 2 s) reclaims abandoned partials.
  • Unreliable fragments are best-effort: if any fragment is lost the whole group is dropped at timeout (acceptable for the rare oversized unreliable message; snapshots are delta-compressed precisely to avoid fragmentation).

function reassemble(peer, channel, frag):
    key = (peer, channel, frag.fragment_group)
    if key not in groups:
        groups[key] = new_group(frag.fragment_count)    // store it, so later fragments find the same group
    g = groups[key]
    if g.bitset[frag.fragment_index]: return            // dup
    g.bitset[frag.fragment_index] = true
    g.parts[frag.fragment_index]  = frag.body
    g.received += 1
    if g.received == frag.fragment_count:
        msg = concat(g.parts)
        deliver_to_channel(channel, msg)
        groups.erase(key)
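
The sending side is the mirror image. The following sketch of the split step takes the fragment limit as a parameter rather than hard-coding it; the `Fragment` struct and `split_message` are hypothetical names, and the header fields are held in ordinary integers rather than bit-packed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct Fragment {
    std::uint16_t group;   // fragment_group
    std::uint8_t  index;   // fragment_index (0-based)
    std::uint16_t count;   // fragment_count, held wider than the wire field
    std::vector<std::uint8_t> body;
};

// Splits `msg` into ceil(len / mtu_payload) fragments.
// Returns std::nullopt for an empty message or one needing too many fragments.
inline std::optional<std::vector<Fragment>> split_message(
        const std::vector<std::uint8_t>& msg, std::uint16_t group,
        std::size_t mtu_payload, std::size_t max_fragments) {
    if (mtu_payload == 0 || msg.empty()) return std::nullopt;
    const std::size_t count = (msg.size() + mtu_payload - 1) / mtu_payload;
    if (count > max_fragments) return std::nullopt;

    std::vector<Fragment> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t begin = i * mtu_payload;
        const std::size_t end   = std::min(begin + mtu_payload, msg.size());
        out.push_back(Fragment{
            group, static_cast<std::uint8_t>(i), static_cast<std::uint16_t>(count),
            std::vector<std::uint8_t>(msg.begin() + static_cast<std::ptrdiff_t>(begin),
                                      msg.begin() + static_cast<std::ptrdiff_t>(end))});
    }
    return out;
}
```

A 2500-byte message at a 1200-byte payload limit yields three fragments of 1200, 1200, and 100 bytes.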

8. Encryption, replay protection & key rotation

8.1 Cipher suite

  • Key exchange: X25519 ECDH (ephemeral on both sides ⇒ forward secrecy).
  • AEAD: ChaCha20-Poly1305 (libsodium crypto_aead_chacha20poly1305_ietf).
  • KDF: HKDF-SHA256 → directional tx_key/rx_key.
  • Tokens: Ed25519-signed connect tokens (issued by control plane, 03-auth-service.md).

8.2 Per-packet AEAD

nonce = LE96( key_epoch (8 bits) || packet_seq (88 bits) )   // 12-byte IETF nonce
ciphertext, tag = AEAD_encrypt(key = tx_key,
                               nonce = nonce,
                               plaintext = payload,
                               aad = outer_datagram_header)   // §3.1

The outer header is AAD: tampering with connection_id, key_epoch, or packet_seq fails the tag check. Because packet_seq is monotonic and never reused per key, nonces are unique by construction (invariant I3).
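
One possible byte layout for the 12-byte nonce is sketched below. The exact placement is an assumption, since the text only gives the field widths: byte 0 holds key_epoch, bytes 1 to 8 hold packet_seq in little-endian order, and the top three bytes of the 88-bit field stay zero because packet_seq is at most 64 bits. `make_nonce` is a hypothetical helper.

```cpp
#include <array>
#include <cstdint>

// ASSUMED layout: [0] = key_epoch, [1..8] = packet_seq (little-endian),
// [9..11] = zero (unused upper bits of the 88-bit sequence field).
inline std::array<std::uint8_t, 12> make_nonce(std::uint8_t key_epoch,
                                               std::uint64_t packet_seq) {
    std::array<std::uint8_t, 12> nonce{};   // zero-initialised
    nonce[0] = key_epoch;
    for (int i = 0; i < 8; ++i) {
        nonce[1 + i] = static_cast<std::uint8_t>(packet_seq >> (8 * i));
    }
    return nonce;
}
```

Any fixed, injective mapping from (key_epoch, packet_seq) to 12 bytes preserves invariant I3; what matters is that both peers use the same layout.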

8.3 Replay protection

The receiver keeps a sliding window (default 1024 packets) of seen packet_seq per (connection, key_epoch):

function accept_packet_seq(seq):
    if seq > window.high:
        window.slide_to(seq); window.mark(seq); return ACCEPT
    if window.high - seq >= window.size:  return DROP_TOO_OLD   // subtraction form avoids unsigned underflow
    if window.is_set(seq):                return DROP_REPLAY
    window.mark(seq);                     return ACCEPT

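This tolerates reordering within the window while rejecting duplicates and old replays. The read-only part of the check (too old, already seen) can run on the cleartext header as a cheap early reject before paying for decryption. The window itself must be marked or slid only after the Poly1305 tag has verified; otherwise a forged datagram carrying a large packet_seq could advance the window and cause genuine packets to be dropped as too old.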
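
One way to structure such a window is to split it into a read-only `check` and a mutating `commit`, so that state changes only for authenticated packets. The sketch below does this with a 1024-bit ring; `ReplayWindow`, `ReplayVerdict`, and the check/commit split are illustrative assumptions, not the documented API.

```cpp
#include <bitset>
#include <cstdint>

enum class ReplayVerdict { Accept, DropTooOld, DropReplay };

class ReplayWindow {
public:
    static constexpr std::uint64_t kSize = 1024;   // default window from the text

    // Read-only pre-check; safe to run before the tag is verified.
    ReplayVerdict check(std::uint64_t seq) const {
        if (!any_ || seq > high_) return ReplayVerdict::Accept;
        if (high_ - seq >= kSize) return ReplayVerdict::DropTooOld;
        return seen_[seq % kSize] ? ReplayVerdict::DropReplay : ReplayVerdict::Accept;
    }

    // Call only after the AEAD tag verified and check(seq) returned Accept.
    void commit(std::uint64_t seq) {
        if (!any_ || seq > high_) {
            const std::uint64_t advance = any_ ? seq - high_ : kSize;
            if (advance >= kSize) {
                seen_.reset();                       // whole window is stale
            } else {
                // Clear the slots of the sequence numbers we skipped over.
                for (std::uint64_t s = high_ + 1; s < seq; ++s) seen_[s % kSize] = false;
            }
            high_ = seq;
            any_  = true;
        }
        seen_[seq % kSize] = true;
    }

private:
    std::bitset<kSize> seen_;
    std::uint64_t      high_ = 0;
    bool               any_  = false;
};
```

Per the text, one such window would exist per (connection, key_epoch) and be discarded when the epoch is retired.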

8.4 Key rotation

Long sessions rotate keys to bound the data encrypted under one key and to cap nonce-space usage. Either side may initiate a re-key (in-band reliable control message carrying a fresh ephemeral pubkey); both derive epoch+1 keys via HKDF over the new shared secret, then bump key_epoch. Both epochs are briefly accepted (overlap window) so in-flight packets under the old key still decrypt; the old epoch is retired after the window. The replay window is reset per epoch. Rotation is recommended on a time or byte-count threshold (e.g. every 15 min or 1 GiB, configurable).


9. NAT traversal & relay protocol

Lattice supports dedicated-server (no NAT problem — the server has a public address), listen-server/P2P-host, and relayed topologies (01-high-level-design.md). For P2P, the path is: discover reflexive addresses → attempt direct hole punch → fall back to lattice-relay.

flowchart LR
    subgraph CP["Control plane"]
        DIR["lattice-director (rendezvous/signaling)"]
        STUN["STUN-style reflexive (UDP 3478)"]
    end
    A["Peer A (NAT A)"]
    B["Peer B (NAT B)"]
    RLY["lattice-relay (UDP 7777, TURN-like)"]

    A -- "1: who am I?" --> STUN
    B -- "1: who am I?" --> STUN
    A -- "2: candidates via signaling" --> DIR
    B -- "2: candidates via signaling" --> DIR
    DIR -- "3: exchange candidates + relay alloc token" --> A
    DIR -- "3: exchange candidates + relay alloc token" --> B
    A -. "4a: simultaneous-open hole punch (direct)" .-> B
    B -. "4a: simultaneous-open hole punch (direct)" .-> A
    A == "4b: fallback if punch fails" ==> RLY
    B == "4b: fallback if punch fails" ==> RLY
    RLY == "forward by allocation, token-authed" ==> A
    RLY == "forward by allocation, token-authed" ==> B

9.1 Reflexive discovery (STUN-style)

Each peer sends a binding request to a STUN-style responder (UDP 3478); the response echoes the peer's reflexive (public) ip:port as seen from outside its NAT. Each peer thus learns its host (local) and reflexive candidates and submits them to lattice-director, which acts as the rendezvous point.

9.2 Simultaneous-open hole punching

The director relays each peer's candidate set to the other. Both peers then send the normal ConnectionRequest handshake (§4.2) to all of the other's candidates simultaneously. Most NATs, having just sent an outbound packet to the peer's address, will accept the inbound one (the "hole"). The first candidate pair that completes the encrypted handshake wins; others are abandoned. The handshake's token gate doubles as authentication for the direct path — no separate P2P auth needed.

9.3 Relay fallback (TURN-like) — lattice-relay

If hole punching fails within a budget (e.g. 5 s / N attempts), peers fall back to lattice-relay (UDP 7777). The relay is dumb and oblivious to game content — it forwards encrypted datagrams by allocation, never decrypting them (the AEAD session is end-to-end between the peers; the relay only sees outer headers).

Allocation & forwarding protocol (packet_type 0x05 RelayCtrl):

| Message | Direction | Fields | Purpose |
|---|---|---|---|
| Allocate | peer → relay | relay_token (Ed25519-signed by director), session_id | Reserve a relay slot; relay verifies token, returns an allocation_id + relayed address. |
| AllocateOk | relay → peer | allocation_id, relayed_addr | The public ip:port the other peer should send to. |
| Bind | peer → relay | allocation_id, peer_allocation_id | Pair two allocations into a forwarding channel. |
| Forward | peer ↔ relay | (opaque encrypted datagram) | Relay rewrites src/dst per the bound allocations and forwards. |
| Refresh | peer → relay | allocation_id | Keep-alive; allocation expires (e.g. 30 s) without it. |

Abuse prevention: every Allocate requires a short-lived, audience-scoped, Ed25519 relay token from lattice-director (consistent with 03-auth-service.md). The relay enforces per-token bandwidth/lifetime quotas and drops Forward frames whose connection_id doesn't match a bound allocation, so it can't be used as an open reflector or amplifier. Because payloads stay AEAD-encrypted end-to-end, a compromised relay cannot read or forge game state.

The relay is path-only. It changes addressing, not semantics: from the peers' point of view a relayed connection is identical to a direct one — same handshake, same channels, same crypto. This keeps the shared simulation library (§18) blind to topology.


10. Simulation model & clock synchronization

10.1 Fixed-tick loop, decoupled from render

The simulation advances in fixed ticks (default 60 Hz, dt = 1/60 s, configurable). Rendering runs at the display rate and interpolates between the two most recent sim states (invariant I1). The classic accumulator loop:

accumulator += frame_time           // real wall-clock delta since last frame
while accumulator >= DT:
    sample_input_for_tick(current_tick)
    simulate_tick(current_tick)      // calls into game-sim/ (§18)
    current_tick += 1
    accumulator -= DT
alpha = accumulator / DT             // 0..1
render(interpolate(prev_state, cur_state, alpha))

simulate_tick is identical code on server, host, and predicting client — it is the game-sim/ entry point invoked through simulation/.
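
The same accumulator loop as a small C++ helper. `FixedTickClock` and the `step` callback are hypothetical; in the real loop the callback would sample input and call into game-sim/.

```cpp
#include <cstdint>

struct FixedTickClock {
    double        dt_s          = 1.0 / 60.0;   // default 60 Hz
    double        accumulator_s = 0.0;
    std::uint64_t current_tick  = 0;

    // Runs however many whole ticks fit into the elapsed frame time, calling
    // step(tick) for each, and returns the render interpolation alpha in [0, 1).
    template <typename StepFn>
    double advance(double frame_time_s, StepFn step) {
        accumulator_s += frame_time_s;
        while (accumulator_s >= dt_s) {
            step(current_tick);      // sample input + simulate_tick in the real loop
            ++current_tick;
            accumulator_s -= dt_s;
        }
        return accumulator_s / dt_s;
    }
};
```

A production loop would typically also clamp `frame_time_s` after a long stall so that a single slow frame cannot queue up an unbounded number of ticks; that clamp is omitted here.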

10.2 Clock sync between client and authority

Clients must run ahead of the authority by roughly the one-way delay so that tick-stamped inputs arrive at the server just before the server simulates that tick (Fusion-style). Lattice keeps a continuously-estimated offset:

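// piggy-backed on keepalives/snapshots:
server_tick_sent    = snapshot.server_tick                 // stamped when the server sent the packet
est_one_way         = srtt / 2
server_tick_now     = server_tick_sent + est_one_way / DT  // the server kept ticking while the packet was in flight
target_client_tick  = ceil(server_tick_now + est_one_way / DT) + input_lead
// input_lead (e.g. 1-2 ticks) absorbs jitter so inputs aren't late.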

The client nudges its tick rate slightly (speeds up / slows down by a fraction of a tick per frame) to converge current_tick toward target_client_tick without hard snapping — a software PLL. Large discontinuities (e.g. after a stall) force a resync.
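
The nudge itself can be as simple as a clamped proportional controller on the tick error. In the sketch below the gain, the ±5% speed limit, and the 30-tick resync threshold are all placeholder assumptions, and `TickNudge` is a hypothetical name; the target tick is taken as an input.

```cpp
#include <algorithm>
#include <cmath>

struct TickNudge {
    double max_adjust       = 0.05;   // ASSUMPTION: at most +/-5% speed change
    double gain             = 0.01;   // ASSUMPTION: speed change per tick of error
    double resync_threshold = 30.0;   // ASSUMPTION: error (ticks) forcing a hard resync

    // Returns the factor to multiply frame_time by before it feeds the tick
    // accumulator. Sets hard_resync when the error is too large to slew.
    double time_scale(double current_tick, double target_tick, bool& hard_resync) const {
        const double error = target_tick - current_tick;   // > 0: client is behind
        hard_resync = std::abs(error) > resync_threshold;
        if (hard_resync) return 1.0;                        // caller snaps the tick instead
        return 1.0 + std::clamp(error * gain, -max_adjust, max_adjust);
    }
};
```

Scaling the frame time rather than the tick length keeps DT fixed for the simulation, so the determinism requirement (I2) is unaffected by the slewing.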

10.3 Input sampling, tick-stamping & buffering

  • Each tick the client samples the input device into a compact input command struct (game-defined) and tick-stamps it.
  • The client sends a sliding redundant window of the last N input commands (e.g. last 3, configurable) every input packet on Unreliable-Sequenced — losing one input packet is recovered by the next because it re-includes recent commands. Inputs are never sent reliably (a late input is useless).
  • The server keeps a small jitter buffer of received inputs per client. It consumes one input per client per server tick. If the buffer underruns (input not yet arrived), it repeats the last input (or applies a game-defined "no input" default) and flags the tick for that client; an over-full buffer (client running too far ahead) gently slows the client via the clock-sync nudge.
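
The server-side behaviour in the last bullet can be illustrated with a small buffer that tolerates the redundant window and repeats the last input on underrun. `InputCommand` here is a stand-in for the game-defined struct and `InputJitterBuffer` is a hypothetical name. `std::map` is used only for brevity; a real implementation would use a fixed ring buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Stand-in for the game-defined input command.
struct InputCommand {
    std::int8_t move_x = 0;
    bool        jump   = false;
};

class InputJitterBuffer {
public:
    // Accepts commands from a redundant window: ticks already consumed or
    // already buffered are ignored, so re-sent commands are harmless.
    void receive(std::uint64_t tick, const InputCommand& cmd) {
        if (tick < next_tick_) return;    // too late, already simulated
        pending_.emplace(tick, cmd);      // emplace keeps the first copy
    }

    // Consumes the input for the next server tick. On underrun the last
    // input is repeated and `underrun` is set so the tick can be flagged.
    InputCommand consume(bool& underrun) {
        auto it  = pending_.find(next_tick_);
        underrun = (it == pending_.end());
        if (!underrun) {
            last_ = it->second;
            pending_.erase(it);
        }
        ++next_tick_;
        return last_;
    }

    // Buffer depth; a persistently deep buffer means the client runs too far ahead.
    std::size_t depth() const { return pending_.size(); }

private:
    std::map<std::uint64_t, InputCommand> pending_;
    std::uint64_t next_tick_ = 0;
    InputCommand  last_{};
};
```

The `depth()` value is the signal that would feed back into the clock-sync nudge when the client is running too far ahead.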

11. Client-side prediction & rollback reconciliation

For locally-owned, predicted objects (the player's own pawn under server/host authority, or any object this peer has state-authority over), the client simulates immediately rather than waiting a round-trip — then reconciles against the authoritative snapshot.

11.1 Data structures

  • Input ring buffer: inputs[tick] = InputCommand, retained for the last PREDICTION_WINDOW ticks (e.g. 64).
  • Predicted-state ring buffer: states[tick] = PredictedState (the predicted result of applying inputs[tick]), same window.
  • The authoritative snapshot carries, per owned object, the last_processed_input_tick the server applied (the ACK of inputs in state-space).

11.2 Algorithm (predict → compare → rewind → resimulate)

// ---- Each client tick: predict forward -----------------------------------
function client_tick(t):
    inputs[t] = sample_input()
    apply_local(predicted_objects, inputs[t])     // game-sim/ simulate_tick on owned objs
    states[t] = snapshot_local(predicted_objects)
    send_input_window(t)                          // §10.3

// ---- On authoritative snapshot for predicted objects ---------------------
function on_authoritative_snapshot(snap):
    ack_tick = snap.last_processed_input_tick
    // 1. Compare server's authoritative state at ack_tick to what we predicted.
    if approx_equal(states[ack_tick], snap.state):
        discard_inputs_up_to(ack_tick)            // prediction was right; nothing to do
        return
    // 2. MISPREDICTION: rewind owned objects to the authoritative state.
    set_state(predicted_objects, snap.state)      // overwrite with truth at ack_tick
    // 3. Re-apply every input AFTER ack_tick, re-running the SAME sim code.
    for t in (ack_tick + 1) .. current_tick:
        apply_local(predicted_objects, inputs[t])
        states[t] = snapshot_local(predicted_objects)  // refresh corrected history
    discard_inputs_up_to(ack_tick)
    // 4. Optional: feed the correction delta into the smoother (§12.4) so the
    //    visual position eases to the corrected one instead of snapping.

Because apply_local is the same game-sim/ code the server ran (invariant I2), matching inputs yields matching state and approx_equal holds the vast majority of ticks — reconciliation does real work only on genuine mispredictions (a collision the client didn't foresee, another player's interaction, etc.).

11.3 Misprediction correction timeline

sequenceDiagram
    autonumber
    participant CL as "Client (predicting)"
    participant SV as "Server (authority)"
    Note over CL: tick 100: input I100 -> predict P100 (move right)
    CL->>SV: input I100 (tick-stamped, redundant window)
    Note over CL: ticks 101..104: keep predicting P101..P104
    Note over SV: server applies I100 at its tick 100,<br/>but a wall stops the pawn (client didn't know)
    SV-->>CL: snapshot { state@100 = blocked, last_processed_input = 100 }
    Note over CL: now at tick 105. Compare P100 (moved) vs server (blocked) -> MISMATCH
    Note over CL: rewind owned pawn to server state@100 (blocked)
    Note over CL: re-apply I101..I104 over corrected state -> P101'..P104'
    Note over CL: hand correction delta to smoother -> ease over a few frames (no snap)

12. Remote-entity smoothing: interpolation & dead reckoning

Objects this client does not own/predict (other players, server-driven NPCs) are rendered from received snapshots. Two regimes, plus blending to hide errors.

12.1 Snapshot interpolation (render in the past)

The client holds an interpolation buffer and renders at render_time = now − INTERP_DELAY (default ~100 ms, configurable — roughly one snapshot interval plus jitter margin at a 20–30 Hz send rate). It finds the two buffered snapshots straddling render_time and interpolates:

render_time = client_now() - INTERP_DELAY
(s0, s1)    = bracketing_snapshots(buffer, render_time)
t           = (render_time - s0.time) / (s1.time - s0.time)   // 0..1
draw(lerp(s0.pos, s1.pos, t), slerp(s0.rot, s1.rot, t))

Rendering ~100 ms in the past means there are almost always two snapshots to interpolate between → buttery motion, no extrapolation guesswork in the common case.

12.2 Extrapolation / dead reckoning for late packets — "estimated physics for destination"

When the next snapshot is late (none available past render_time), Lattice does not freeze the object — it extrapolates from the last known state using its velocity (and acceleration, if replicated), i.e. dead reckoning. This directly satisfies the brief's "estimated physics for destination" requirement: the receiver estimates where the object should be now by integrating its last-known motion forward.

// last authoritative state s_last at time T_last with velocity v (and accel a)
dt_extra = clamp(render_time - T_last, 0, EXTRAP_MAX)   // EXTRAP_MAX e.g. 250 ms
est_pos  = s_last.pos + v * dt_extra + 0.5 * a * dt_extra^2
est_rot  = integrate(s_last.rot, s_last.angvel, dt_extra)
draw(est_pos, est_rot)

EXTRAP_MAX caps how far we dead-reckon (beyond it, hold position / fade) so a long outage doesn't fling objects across the map.

12.3 Choosing interpolate vs extrapolate

if a snapshot exists at/after render_time:  INTERPOLATE  (§12.1)   // normal
else if (render_time - T_last) <= EXTRAP_MAX: EXTRAPOLATE (§12.2)  // late packet
else:                                         HOLD last pose       // long gap
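
The three-way choice can be written as one sampling function. This sketch is one-dimensional for brevity and assumes a non-empty buffer sorted by strictly ascending time; `Snapshot`, `Sample`, and `sample_remote` are hypothetical names. In the HOLD case it keeps the pose reached at EXTRAP_MAX, which is one reading of "hold last pose" that avoids a visible pop.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Snapshot { double time; double pos; double vel; };

enum class Regime { Interpolate, Extrapolate, Hold };

struct Sample { Regime regime; double pos; };

constexpr double kExtrapMax = 0.250;  // seconds, the EXTRAP_MAX example value

inline Sample sample_remote(const std::vector<Snapshot>& buffer, double render_time) {
    assert(!buffer.empty());
    const Snapshot& last = buffer.back();
    if (render_time <= last.time) {
        // First snapshot at or after render_time.
        auto it = std::lower_bound(
            buffer.begin(), buffer.end(), render_time,
            [](const Snapshot& s, double t) { return s.time < t; });
        if (it == buffer.begin()) return {Regime::Interpolate, it->pos};  // clamp to oldest
        const Snapshot& s1 = *it;
        const Snapshot& s0 = *(it - 1);
        const double t = (render_time - s0.time) / (s1.time - s0.time);   // 0..1
        return {Regime::Interpolate, s0.pos + (s1.pos - s0.pos) * t};
    }
    // No snapshot past render_time: dead-reckon, capped at kExtrapMax.
    const double gap = render_time - last.time;
    const double dt  = std::min(gap, kExtrapMax);
    const double est = last.pos + last.vel * dt;
    return {gap <= kExtrapMax ? Regime::Extrapolate : Regime::Hold, est};
}
```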

12.4 Blending / error correction (no snapping)

When a fresh snapshot arrives after extrapolation, the estimated pose and the new authoritative pose usually differ. Snapping is jarring, so Lattice eases: it keeps a visual-position offset = (rendered − authoritative) at the moment of correction and decays it to zero over a short window (e.g. 100–200 ms) using exponential smoothing or a critically damped spring. The same smoother handles the prediction correction delta from §11.2 step 4, so both owned and remote corrections are visually graceful.
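
A minimal sketch of the exponential-smoothing variant, one-dimensional for brevity. `CorrectionSmoother` and its `half_life` default are hypothetical; a critically damped spring would replace the body of `update`.

```cpp
#include <cassert>
#include <cmath>

// Visual offset that decays exponentially toward zero.
struct CorrectionSmoother {
    double offset    = 0.0;   // rendered minus authoritative, in world units
    double half_life = 0.05;  // seconds; hypothetical default, not from the design

    // Call at the moment a correction lands: absorb the jump into the offset.
    void on_correction(double rendered_pos, double authoritative_pos) {
        offset = rendered_pos - authoritative_pos;
    }

    // Call every render frame; returns the position to draw.
    double update(double authoritative_pos, double frame_dt) {
        assert(half_life > 0.0);
        offset *= std::exp2(-frame_dt / half_life);  // frame-rate independent decay
        return authoritative_pos + offset;
    }
};
```

Expressing the decay as a half-life makes the result independent of frame rate: two 25 ms frames shrink the offset by the same factor as one 50 ms frame.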


13. Lag compensation

In server/host-authoritative mode, hit/interaction validation must account for the fact that a client acted on what it saw — which is INTERP_DELAY + latency in the past. The authority therefore rewinds.

  • The authority keeps a position history ring per lag-compensated object: for the last ~1 s of ticks it stores each object's transform (and hitbox) at that tick.
  • A client action (e.g. a shot) is tick-stamped with the client's render_time view. The server reconstructs the world as the client saw it:
function validate_hit(shooter, ray, client_view_tick):
    // rewind candidate targets to where they were at the client's view tick
    for target in candidates(shooter):
        hist = position_history[target].sample(client_view_tick)  // interp between stored ticks
        restore_hitbox(target, hist)
    result = raycast(ray)                 // against rewound hitboxes
    for target in candidates(shooter): restore_hitbox(target, current)     // un-rewind
    return result

The rewind clamps to the history window and to a max compensation (e.g. 250 ms) to bound how much a high-latency client can "shoot into the past." This is the server-authoritative counterpart to client prediction: prediction makes your actions feel instant; lag compensation makes them land fairly. (Shared-authority objects are validated by their owner; see §17.)
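
The history lookup and both clamps can be sketched as follows, with a one-dimensional position standing in for the transform and hitbox. `PositionHistory` and its parameters are hypothetical names; ticks are assumed to be recorded in strictly ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

struct HistoryEntry { uint32_t tick; double pos; };

class PositionHistory {
public:
    explicit PositionHistory(std::size_t capacity) : capacity_(capacity) {}

    void record(uint32_t tick, double pos) {  // call once per tick
        entries_.push_back(HistoryEntry{tick, pos});
        if (entries_.size() > capacity_) entries_.pop_front();
    }

    // Sample at a possibly fractional tick. The request is clamped to the
    // stored window and to at most max_rewind_ticks behind now_tick.
    double sample(double view_tick, uint32_t now_tick, double max_rewind_ticks) const {
        assert(!entries_.empty());
        const double newest = entries_.back().tick;
        const double oldest = entries_.front().tick;
        const double floor_tick =
            std::min(newest, std::max(oldest, now_tick - max_rewind_ticks));
        const double t = std::clamp(view_tick, floor_tick, newest);
        for (std::size_t i = 1; i < entries_.size(); ++i) {
            const HistoryEntry& a = entries_[i - 1];
            const HistoryEntry& b = entries_[i];
            if (t <= b.tick) {  // interpolate between the two stored ticks
                const double u = (t - a.tick) / (b.tick - a.tick);
                return a.pos + (b.pos - a.pos) * u;
            }
        }
        return entries_.back().pos;
    }

private:
    std::size_t              capacity_;
    std::deque<HistoryEntry> entries_;
};
```

At a 60 Hz tick rate the 250 ms example cap corresponds to `max_rewind_ticks = 15`.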


14. State replication

14.1 Object model

  • NetworkObject — the unit of replication and authority. Has a stable network_id (server/host-assigned), a prefab/type id, an owner (peer), and an AuthorityMode (§17). Holds one or more NetworkBehaviours.
  • NetworkBehaviour — a component holding [Networked] properties and RPC endpoints. Engine bindings map this onto MonoBehaviour / UObject / Godot Node (05-engine-integration.md); the core sees a flat schema.
  • [Networked] property — a replicated field. The serializer codegen (§20) generates pack/unpack + dirty-detection for each.

14.2 Dirty tracking

Each NetworkBehaviour holds a dirty bitmask (one bit per [Networked] member). Writing a member through its generated setter (or a per-tick value-diff for plain fields) sets its bit. Snapshot building reads only dirty members; clean members are delta-elided.
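
As an illustration of what a generated setter might do, the sketch below uses a hypothetical `HealthBehaviour` with two members. It assumes that writing an unchanged value does not set the bit, and it clears the mask in one place; with per-client baselines (§14.3) the real clearing policy would be more involved.

```cpp
#include <cstdint>

// Hypothetical behaviour; in practice the setters and bit constants would be
// emitted by the serializer codegen (§20).
class HealthBehaviour {
public:
    enum DirtyBit : uint32_t { kHealth = 1u << 0, kAmmo = 1u << 1 };

    void set_health(uint16_t v) {
        if (v != health_) { health_ = v; dirty_ |= kHealth; }
    }
    void set_ammo(uint16_t v) {
        if (v != ammo_) { ammo_ = v; dirty_ |= kAmmo; }
    }

    uint32_t dirty() const { return dirty_; }

    // The snapshot builder calls this after it has read the dirty members.
    uint32_t take_dirty() {
        const uint32_t d = dirty_;
        dirty_ = 0;
        return d;
    }

private:
    uint16_t health_ = 100;
    uint16_t ammo_   = 30;
    uint32_t dirty_  = 0;
};
```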

14.3 Baseline + delta compression

The authority keeps, per client, the last acknowledged snapshot tick (the baseline). Each new snapshot is encoded as a delta against that baseline: only members whose value differs from the baseline are written, prefixed by a changed-member bitmask. The client ACKs snapshots (snapshots ride the ACK machinery via the channel headers), advancing the baseline. A new or just-relevant client gets a full (baseline-from-zero) snapshot first.

delta(object, baseline_tick, target_tick):
    base = history[object][baseline_tick]   // or zero-baseline for fresh objects
    cur  = history[object][target_tick]
    write changed_mask                      // which members differ base->cur
    for m in members where cur[m] != base[m]:
        write_quantized(m, cur[m])          // §14.6

14.4 Snapshot ring buffer & tick-based property history

The authority retains a ring buffer of recent ticks of each object's state (used for deltas, lag compensation §13, and rollback §11). Default depth ~1 s of ticks (≈60 entries at 60 Hz, configurable). Property history is tick-indexed so any subsystem can ask "what was member X at tick T?".

14.5 Eventual consistency for shared mode

For shared/distributed-authority objects, each owner is the source of truth for its objects and broadcasts their state; non-owners converge toward the latest received value (last-writer-wins with tick tiebreak, §17.2). There is no single global serialization point, so the system is eventually consistent: brief divergence during contention/transfer, resolved deterministically by tick. Snapshots remain delta-compressed per the same machinery.

14.6 Quantization & bit-packing

To shrink snapshots:

| Type | Encoding | Notes |
|---|---|---|
| Bool | 1 bit | |
| Enum / small int | ceil(log2(range)) bits | Range declared in schema. |
| Position float | Compressed float: quantize to world-bounded fixed-point (e.g. [-4096, 4096] at 1 mm ⇒ ~23 bits/axis) | Per-field bounds & precision in schema. |
| Velocity | Compressed float, coarser bounds | Used by dead reckoning (§12.2). |
| Rotation (quat) | Smallest-three: drop largest component (2-bit index + 3× ~10-bit) ⇒ ~32 bits | Reconstruct dropped component from unit-length. |
| Strings/blobs | length-prefixed bytes | Avoid in hot snapshots; prefer ids. |

All packing is little-endian and occurs in the bitstream writer; big-endian hosts byte-swap scalars before packing (endian safety, §20.3).
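
The position row can be checked with a small quantizer. This is a sketch under assumptions: `FloatQuantizer` is a hypothetical name, and the bit count includes both endpoints of the range (the `+ 1.0` below), which matters only when the step count is an exact power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct FloatQuantizer {
    double min;
    double max;
    double precision;

    // Bits needed to index every step in [min, max], both ends included.
    int bits() const {
        assert(max > min && precision > 0.0);
        const double steps = std::floor((max - min) / precision) + 1.0;
        return static_cast<int>(std::ceil(std::log2(steps)));
    }

    // Clamp to the declared bounds, then round to the nearest step.
    uint32_t encode(double v) const {
        const double clamped = std::clamp(v, min, max);
        return static_cast<uint32_t>(std::llround((clamped - min) / precision));
    }

    double decode(uint32_t q) const { return min + q * precision; }
};
```

For bounds of [-4096, 4096] and a precision of 0.001 (1 mm if the unit is metres), `bits()` yields 23, matching the table, and a round trip through `encode`/`decode` stays within half a step of the input.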


15. Interest management / Area of Interest

To scale concurrent users (CCU), the authority sends each client only the objects relevant to it. Relevancy is computed against a spatial partition (uniform grid by default; octree for sparse 3D worlds, configurable).

flowchart TB
    subgraph WORLD["World partitioned into grid cells"]
        direction TB
        R1["[ c0,0 ]  [ c1,0 ]  [ c2,0 ]  ( c3,0 )"]
        R2["[ c0,1 ]  [ c1,1 P ]  [ c2,1 ]  ( c3,1 )"]
        R3["[ c0,2 ]  [ c1,2 ]  [ c2,2 ]  ( c3,2 )"]
        R4["( c0,3 )  ( c1,3 )  ( c2,3 )  ( c3,3 )"]
    end
    P["Client viewer P in cell ( 1,1 )"]
    AOI["AoI = viewer cell + neighbour ring (bracketed cells)"]
    P --> AOI
    AOI --> SUB["Subscription set: objects in bracketed cells -> replicated"]
    R4 -. "out of AoI -> scoped OUT" .-> NONE["not replicated"]

15.1 Mechanism

  1. Insert each NetworkObject into its grid cell on move (cheap cell re-bucket).
  2. For each client, compute its AoI: the viewer's cell plus a neighbour ring sized to view distance (or a query against the octree).
  3. The subscription set = objects in AoI cells, filtered by per-object/per-type relevancy rules (e.g. always-relevant objects, team scoping, ownership).
  4. Feed the subscription set into the priority accumulator (§6.3); distance modulates priority so near objects update more often.
  5. On entering/leaving AoI, emit spawn/despawn to the client (full baseline on spawn, despawn message on leave).
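
Steps 1 to 3 can be sketched with a uniform 2D grid. `AoiGrid`, `Cell`, and `NetId` are hypothetical names; relevancy rules, priority, and spawn/despawn events are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

using Cell  = std::pair<int, int>;
using NetId = uint64_t;

class AoiGrid {
public:
    explicit AoiGrid(double cell_size) : cell_size_(cell_size) {
        assert(cell_size > 0.0);
    }

    Cell cell_of(double x, double y) const {
        return Cell{static_cast<int>(std::floor(x / cell_size_)),
                    static_cast<int>(std::floor(y / cell_size_))};
    }

    // Step 1: re-bucket an object when it moves to a different cell.
    void move(NetId id, double x, double y) {
        const Cell next = cell_of(x, y);
        auto it = where_.find(id);
        if (it != where_.end()) {
            if (it->second == next) return;  // same cell: nothing to do
            cells_[it->second].erase(id);
        }
        cells_[next].insert(id);
        where_[id] = next;
    }

    // Steps 2-3: objects in the viewer's cell plus `ring` cells in each direction.
    std::set<NetId> subscription(double vx, double vy, int ring) const {
        const Cell c = cell_of(vx, vy);
        std::set<NetId> out;
        for (int dx = -ring; dx <= ring; ++dx) {
            for (int dy = -ring; dy <= ring; ++dy) {
                auto it = cells_.find(Cell{c.first + dx, c.second + dy});
                if (it != cells_.end()) out.insert(it->second.begin(), it->second.end());
            }
        }
        return out;
    }

private:
    double                          cell_size_;
    std::map<Cell, std::set<NetId>> cells_;
    std::map<NetId, Cell>           where_;
};
```

Diffing the set returned on consecutive ticks gives the enter and leave events that drive step 5.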

15.2 Why it scales

Replication cost per client becomes O(objects in AoI) rather than O(all objects); combined with the congestion-controlled budget and priority accumulator, the server degrades gracefully under load (distant objects simply update less often). Scoping also reduces cheating surface (clients never receive state they can't see).


16. RPC system

RPCs are event messages (one-shot actions/notifications), complementary to property replication (continuous state). They are routed by target and carried on the appropriate reliability channel.

16.1 Reliability

  • Reliable RPC → Reliable-Ordered (3) or Reliable-Unordered (2). Use for actions that must happen (door opened, item granted).
  • Unreliable RPC → Unreliable (0) / Unreliable-Sequenced (1). Use for cosmetic/transient events (a footstep sound) where loss is fine.

16.2 Target routing

| Target | Meaning | Typical sender |
|---|---|---|
| ToServerOrHost | Client → authority (request an action). | Predicting client. |
| ToOwner | Authority → the object's owning client. | Server/host. |
| ToAll | Authority → every relevant client (AoI-scoped). | Server/host. |
| ToTarget(peer) | Authority → a specific client. | Server/host. |

Clients may only originate ToServerOrHost RPCs on objects (the authority validates and re-broadcasts), preventing clients from forging ToAll. RPC payloads use the same codegen serializer as properties.

16.3 Ordering vs state, and RPC-vs-property guidance

  • An RPC is delivered between the property states of the two ticks that bracket it; if exact state coupling matters, prefer a [Networked] property (state is self-correcting via snapshots; a dropped unreliable RPC is gone forever).
  • Use a property for anything with a current value (health, position, ammo) — it's delta-compressed, late-joiner-safe, and self-heals on packet loss.
  • Use an RPC for momentary events with no persistent value (play VFX, "you were kicked").
  • Reliable RPCs tied to state should be made idempotent or carry the tick they apply at, so a retransmit can't double-apply.

17. Authority models

Both models are first-class and decided per NetworkObject via AuthorityMode; they coexist in the same world.

17.1 Server/host-authoritative pipeline

client samples input -> sends tick-stamped input (Unreliable-Sequenced)
   -> client PREDICTS owned object locally (§11)
authority receives input -> jitter buffer -> applies in fixed tick (game-sim/)
   -> authority is source of truth -> writes authoritative state
   -> builds AoI-scoped, delta-compressed snapshot (§14) -> sends (Unreliable-Sequenced)
client receives snapshot -> RECONCILES owned object (rewind+resim, §11)
   -> remote objects via interpolation/dead-reckoning (§12)
hit validation uses lag compensation (§13)

Authority = the server (dedicated) or the host (P2P listen-server). The host is just a client that also runs the authoritative role for the session (same code — §18).

17.2 Shared/distributed authority

Each object has an owner holding state authority; the owner simulates and broadcasts it, others converge (eventual consistency, §14.5).

  • Ownership tokens. Authority over an object is represented by an (authority_owner_id, authority_tick) pair stamped on its state, in which authority_tick only ever increases. The pair is the token: it travels with snapshots and resolves conflicts.
  • State-authority transfer protocol. A peer requests authority (RequestAuthority, reliable, to the current owner or the host arbiter); on grant, the new owner stamps its state with its own authority_owner_id and a higher authority_tick. The grant is broadcast; from the transfer tick on, only the new owner's writes are accepted (invariant I5: exactly one owner per tick). A short overlap accepts the old owner's in-flight state stamped before the transfer tick.
  • Conflict resolution. Concurrent writes are resolved last-writer-wins with tick tiebreak: higher authority_tick wins; equal ticks break by authority_owner_id (deterministic everywhere). This makes convergence order-independent.
sequenceDiagram
    autonumber
    participant A as "Peer A (current owner)"
    participant H as "Host / arbiter"
    participant B as "Peer B (requester)"
    B->>H: RequestAuthority(object 42)
    H->>A: NotifyAuthorityChange(object 42 -> B @ tick T)
    H->>B: GrantAuthority(object 42, authority_tick = T)
    Note over A: stop writing object 42 after tick T
    Note over B: own object 42 from tick T, stamp state with (B, >=T)
    B-->>H: state(object 42, owner=B, authority_tick=T+1)
    H-->>A: relayed state (A accepts: higher authority_tick)
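
The conflict-resolution rule from §17.2 reduces to a lexicographic comparison of tokens. In this sketch the higher authority_owner_id wins on equal ticks; that direction is an assumption, since the rule only requires the tiebreak to be deterministic. `AuthorityToken`, `supersedes`, and `SharedState` are hypothetical names.

```cpp
#include <cstdint>
#include <tuple>

struct AuthorityToken {
    uint64_t authority_owner_id;
    uint32_t authority_tick;
};

// True if `incoming` should replace `held`: higher tick wins, and on equal
// ticks the higher owner id wins (assumed tiebreak direction).
inline bool supersedes(const AuthorityToken& incoming, const AuthorityToken& held) {
    return std::tie(incoming.authority_tick, incoming.authority_owner_id) >
           std::tie(held.authority_tick, held.authority_owner_id);
}

struct SharedState {
    AuthorityToken token{0, 0};
    double         value = 0.0;

    // Accept a write only if its token beats the one already held.
    bool apply(const AuthorityToken& t, double v) {
        if (!supersedes(t, token)) return false;
        token = t;
        value = v;
        return true;
    }
};
```

Because the comparison is a strict total order over tokens, two peers that receive the same writes in different orders end with the same value.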

17.3 How both coexist per-object

The replication layer branches on AuthorityMode per object: server-authoritative objects flow through the predict/reconcile/lag-comp pipeline (§11/§13); shared-authority objects flow through the ownership-token/eventual-consistency pipeline (§17.2). A single world can hold both (e.g. server-authoritative characters + shared-authority physics props). Prediction is available to whoever holds authority — a shared-authority owner predicts its own objects exactly as a server-auth client predicts its pawn.


18. The shared simulation library mechanism

The locked decision: the same game/backend simulation code compiles into both the dedicated server and the client (a client can be a P2P host) — "compile-time shared simulation library." This is game-sim/ (repo layout).

18.1 What is shared vs injected

flowchart LR
    GS["game-sim/  (deterministic simulate_tick, networked schema)"]
    CORE["lattice-core (transport, replication, prediction)"]
    subgraph BUILDS["Same code, three roles"]
        SRV["lattice-gameserver : Role = DedicatedServer (authority, no local view)"]
        HOST["client as P2P host : Role = Host (authority + local player)"]
        CLI["client : Role = Client (predicts, no authority)"]
    end
    GS --> CORE
    CORE --> SRV
    CORE --> HOST
    CORE --> CLI
  • Shared at compile time: simulate_tick, the [Networked] schema, RPC handlers, game rules. Byte-for-byte the same object code in every binary (invariant I2).
  • Injected at runtime: a RoleContext describing topology and authority, not logic:
typedef enum { LATTICE_ROLE_DEDICATED, LATTICE_ROLE_HOST, LATTICE_ROLE_CLIENT } lattice_role;

typedef struct {
    lattice_role role;
    int          has_local_view;     // server may be headless (no rendering/prediction view)
    int          is_authority;       // owns server/host-auth objects this session
    int          relays_for_peers;   // host fans out to other clients
    uint64_t     authority_peer_id;  // who is the session authority
} lattice_role_context;

The simulation calls the same functions; what differs is who's authoritative, who predicts, and who relays — all decided by branching on RoleContext inside lattice-core, never inside game-sim/. A P2P host is literally LATTICE_ROLE_HOST: authority + a local player + fan-out to peers, running the identical simulate_tick.

18.2 Determinism considerations

Because the same code runs in multiple places and rollback resimulates history, determinism must be controlled:

  • Float vs fixed-point. Floats are not bit-identical across compilers/architectures. Lattice tolerates this for server-authoritative play (the server is the single source of truth; clients reconcile, so tiny float drift is corrected each snapshot). For shared-authority or deterministic-lockstep-style modes where peers must match exactly, the schema supports a fixed-point numeric mode (q32.32) and the math used in simulate_tick must be the fixed-point library — opt-in per project, configurable.
  • Deterministic ordering. Object iteration during a tick is by ascending network_id (never hash-map iteration order). RPC and input application order is tick-stamped and sorted. No wall-clock or RNG without a seeded, replicated PRNG.
  • No hidden nondeterminism. simulate_tick must not read uncontrolled globals (time-of-day, threads, locale). The core provides deterministic time (tick) and a seeded RNG handle.
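
A seeded RNG handle could be as small as the generator below. SplitMix64 is used purely as an example of a generator whose output depends only on its seed and call count; the design does not name an algorithm, and `SimRng` is a hypothetical type.

```cpp
#include <cstdint>

// SplitMix64: integer-only, so it produces the same sequence on every
// compiler and architecture for a given seed.
struct SimRng {
    uint64_t state;

    explicit SimRng(uint64_t seed) : state(seed) {}

    uint64_t next() {
        uint64_t z = (state += 0x9E3779B97F4A7C15ull);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
        return z ^ (z >> 31);
    }
};
```

For rollback to reproduce the same draws, `state` would have to be part of the saved and restored simulation state.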

18.3 What differs at runtime (summary)

| Concern | Dedicated server | P2P host | Client |
|---|---|---|---|
| Runs simulate_tick | yes | yes | yes (prediction) |
| Holds authority | yes | yes | only over its shared-auth objects |
| Predicts | n/a | local player | owned objects |
| Relays to peers | (clients connect directly) | yes | no |
| Local view/render | usually headless | yes | yes |

19. Public C ABI surface

The entire core is reachable only through extern "C" (no C++/STL/exceptions cross the boundary). The binding-API author (05-engine-integration.md) wraps these idiomatically. Illustrative (not exhaustive) surface:

// ---- Opaque handles --------------------------------------------------------
typedef struct lattice_world  lattice_world;   // a replicated simulation instance
typedef struct lattice_conn   lattice_conn;    // a connection/peer
typedef uint64_t lattice_netid;                // NetworkObject id

// ---- Lifecycle -------------------------------------------------------------
lattice_world* lattice_world_create(const lattice_world_config* cfg,
                                    const lattice_role_context*  role);
void           lattice_world_destroy(lattice_world*);

// ---- Connection management -------------------------------------------------
lattice_conn*  lattice_connect(lattice_world*, const char* endpoint,
                               const uint8_t* connect_token, size_t token_len);
void           lattice_disconnect(lattice_conn*);

// ---- Main pump (called by the host loop; never blocks) ---------------------
void           lattice_world_recv(lattice_world*);              // drain sockets
void           lattice_world_tick(lattice_world*, uint32_t tick); // step sim + reconcile
void           lattice_world_send(lattice_world*);              // build+pace snapshots/acks

// ---- Replication registration (driven by codegen, §20) --------------------
void           lattice_register_type(lattice_world*, const lattice_type_desc*);
lattice_netid  lattice_spawn(lattice_world*, uint32_t type_id, lattice_authority_mode);
void           lattice_despawn(lattice_world*, lattice_netid);

// ---- Property access (bitstream-backed; codegen emits typed wrappers) ------
void           lattice_set_prop(lattice_world*, lattice_netid, uint16_t prop_id,
                                const void* data, size_t len);   // marks dirty
size_t         lattice_get_prop(lattice_world*, lattice_netid, uint16_t prop_id,
                                void* out, size_t cap);

// ---- RPC -------------------------------------------------------------------
void           lattice_rpc(lattice_world*, lattice_netid, uint16_t rpc_id,
                           lattice_rpc_target, lattice_channel,
                           const void* args, size_t len);

// ---- Callbacks (engine implements; no exceptions may propagate out) --------
typedef struct {
    void (*on_connected)   (void* ctx, lattice_conn*);
    void (*on_disconnected)(void* ctx, lattice_conn*, int reason);
    void (*on_spawn)       (void* ctx, lattice_netid, uint32_t type_id);
    void (*on_despawn)     (void* ctx, lattice_netid);
    void (*on_rpc)         (void* ctx, lattice_netid, uint16_t rpc_id,
                            const void* args, size_t len);
    void (*on_authority_changed)(void* ctx, lattice_netid, uint64_t new_owner);
} lattice_callbacks;
void           lattice_set_callbacks(lattice_world*, lattice_callbacks, void* ctx);

ABI rules for the binding author:

  • Treat handles as opaque; never dereference.
  • The pump order is fixed: recv → tick → send per host frame; call tick once per fixed sim step (the binding owns the accumulator, §10.1, or delegates it to a core helper).
  • Callbacks fire inside the pump on the calling thread; do no blocking work in them.
  • All buffers are caller-owned; the core copies what it retains.
  • Numeric layout across the boundary is little-endian POD; no STL types ever appear.
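
A binding-side accumulator that respects the pump order might look like the sketch below. `Pump` holds stand-ins for lattice_world_recv, lattice_world_tick, and lattice_world_send so the example compiles on its own. `FixedStepDriver`, the per-frame step cap, and the policy of dropping the backlog after a stall are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Stand-ins for the three pump calls of the C ABI.
struct Pump {
    std::function<void()>         recv;
    std::function<void(uint32_t)> tick;
    std::function<void()>         send;
};

class FixedStepDriver {
public:
    FixedStepDriver(int64_t step_us, int max_steps_per_frame)
        : step_us_(step_us), max_steps_(max_steps_per_frame) {
        assert(step_us > 0 && max_steps_per_frame > 0);
    }

    // Call once per host frame with the elapsed time in microseconds.
    // Returns the number of sim steps that ran this frame.
    int frame(int64_t frame_us, const Pump& pump) {
        pump.recv();
        acc_us_ += frame_us;
        int steps = 0;
        while (acc_us_ >= step_us_ && steps < max_steps_) {
            pump.tick(next_tick_++);
            acc_us_ -= step_us_;
            ++steps;
        }
        // Hit the cap after a stall: drop whole-step backlog, keep the remainder.
        if (acc_us_ >= step_us_) acc_us_ %= step_us_;
        pump.send();
        return steps;
    }

private:
    int64_t  step_us_;
    int      max_steps_;
    int64_t  acc_us_    = 0;
    uint32_t next_tick_ = 0;
};
```

Integer microseconds keep the accumulator free of floating-point drift; a frame shorter than one step still runs recv and send but no tick.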


20. Serialization codegen

[Networked] members and RPC signatures are compiled into bitstream packers so that hot-path (de)serialization has no reflection cost.

20.1 Approach

A schema (IDL or attribute/macro-driven, per engine) declares each networked type, its members, their wire types, bounds, and precision. A codegen step emits, per type:

  • pack(writer, obj, dirty_mask) / unpack(reader, obj, changed_mask) — bit-level, delta-aware.
  • A dirty_mask enum (one bit per member) for §14.2.
  • Per-member quantizers from declared bounds/precision (§14.6).
  • A lattice_type_desc registered via lattice_register_type (§19).

Macro-driven C++ (LATTICE_NETWORKED(Type, (member, type, bounds)...)) is the default for native code; the IDL path generates equivalent code plus per-engine binding stubs ([Networked] C# attributes for Unity, UPROPERTY(Replicated)-style for Unreal, etc. — 05-engine-integration.md).

20.2 Schema/version negotiation

Every type schema hashes to a schema_id; the connection negotiates a top-level schema hash during/after handshake. A mismatch is a hard, early connection rejection (no silent corruption). This pairs with protocol_id (§3.1).

20.3 Endian safety

The bitstream writer/reader define the canonical wire byte order as little-endian for all byte-aligned scalars; bit-packed fields are written MSB-first within a byte consistently on both ends. Big-endian hosts byte-swap scalars at the (un)pack boundary only — the wire format is identical regardless of host endianness. Quantized fields are integers on the wire, so they're endian-defined by the same rule.
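
The MSB-first rule for bit-packed fields can be illustrated with a minimal writer/reader pair. This is a sketch, not the core's BitWriter/BitReader: it handles only fields of up to 32 bits and omits the byte-aligned little-endian scalar path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class BitWriter {
public:
    // Append the low `count` bits of `value`, most significant of those bits first.
    void write_bits(uint32_t value, int count) {
        assert(count >= 0 && count <= 32);
        for (int i = count - 1; i >= 0; --i) {
            if (bit_pos_ % 8 == 0) bytes_.push_back(0);  // start a new byte
            const uint8_t bit = (value >> i) & 1u;
            bytes_.back() |= static_cast<uint8_t>(bit << (7 - bit_pos_ % 8));
            ++bit_pos_;
        }
    }
    const std::vector<uint8_t>& bytes() const { return bytes_; }
    std::size_t bit_count() const { return bit_pos_; }

private:
    std::vector<uint8_t> bytes_;
    std::size_t          bit_pos_ = 0;
};

class BitReader {
public:
    explicit BitReader(const std::vector<uint8_t>& bytes) : bytes_(bytes) {}

    // Read `count` bits in the same MSB-first order the writer used.
    uint32_t read_bits(int count) {
        assert(count >= 0 && count <= 32);
        uint32_t value = 0;
        for (int i = 0; i < count; ++i) {
            assert(bit_pos_ / 8 < bytes_.size());
            const uint8_t byte = bytes_[bit_pos_ / 8];
            value = (value << 1) | ((byte >> (7 - bit_pos_ % 8)) & 1u);
            ++bit_pos_;
        }
        return value;
    }

private:
    const std::vector<uint8_t>& bytes_;
    std::size_t                 bit_pos_ = 0;
};
```

Writing the 3-bit value 5 followed by the 10-bit value 1023 produces the bytes 0xBF 0xF8 on any host, because the bit order is fixed by the writer rather than by the CPU.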

20.4 Example (illustrative)

// Declaration (macro-driven)
LATTICE_NETWORKED(PlayerState,
    (position, vec3,   BOUNDS(-4096, 4096, /*precision_mm*/ 1)),
    (rotation, quat,   SMALLEST_THREE(/*bits_per_comp*/ 10)),
    (velocity, vec3,   BOUNDS(-256, 256, /*precision*/ 0.01f)),
    (health,   uint16, BITS(10)),                  // 0..1023
    (flags,    uint8,  BITS(4)))

// Codegen emits (sketch):
void pack(BitWriter& w, const PlayerState& s, uint32_t dirty) {
    w.write_bits(dirty, PlayerState_DIRTY_BITS);
    if (dirty & DIRTY_position) write_quantized_vec3(w, s.position, /*bounds*/...);
    if (dirty & DIRTY_rotation) write_smallest_three(w, s.rotation, 10);
    if (dirty & DIRTY_velocity) write_quantized_vec3(w, s.velocity, /*bounds*/...);
    if (dirty & DIRTY_health)   w.write_bits(s.health, 10);
    if (dirty & DIRTY_flags)    w.write_bits(s.flags,  4);
}

21. Networked type system & custom types

The brief and the user require that lattice-core replicate custom classes/structs, with an explicit API for declaring custom variables of a defined primitive set. This section is the contract for both the declarative codegen path (§20) and an explicit function-based path, and for the two kinds of custom user type: value structs (INetworkStruct) and reference types (NetworkObject + NetworkBehaviour, §14).

The explicitly-requested primitive set is: bool, int (32-bit), long (64-bit), float, double, Vector2, Vector3, Vector4. Quaternion, string/fixed-string, byte[], and enums are also supported. All of these are first-class both as standalone [Networked] members and as fields of a custom value struct.

21.1 Supported primitive networked types

The "wire size" column is the steady-state delta size when the field is dirty; clean fields cost 0 bits (delta-elided, §14.3). All encodings are little-endian / MSB-first-within-byte (§20.3).

| Type | Wire encoding | Default quantization | Approx wire size (dirty) |
|---|---|---|---|
| bool | 1 bit, packed into the changed-member bitfield word | none | 1 bit |
| enum | ranged int over the declared member count | ceil(log2(count)) bits | 2–8 bits typical |
| byte / int8 | fixed 8 bits or ranged | optional ranged | 8 bits |
| int16 | fixed 16 bits or ranged | optional ranged | 16 bits |
| int (int32) | varint (1–5 B, zig-zag for signed) by default; or fixed 32; or ranged (declared [min,max] ⇒ ceil(log2(max-min+1)) bits) | ranged when bounds known (e.g. 0..1023 ⇒ 10 bits) | 1–5 B, or N bits if ranged |
| long (int64) | varint (1–9 B, zig-zag for signed) by default; or fixed 64; or ranged | ranged when bounds known | 1–9 B, or N bits if ranged |
| float | full IEEE-754 32-bit; or compressed float quantized to a configurable [min,max] range + precision ⇒ integer of ceil(log2((max-min)/precision)) bits | compressed when bounds/precision declared | 32 bits full; ~10–24 bits compressed |
| double | full IEEE-754 64-bit; or quantized to configurable range+precision (same scheme as float, wider integer) | full unless quantized | 64 bits full; configurable if quantized |
| Vector2 | 2× compressed float (per-component range/precision) | per-component compressed | 2 × (10–24) bits |
| Vector3 | 3× compressed float (per-component range/precision) | per-component compressed | 3 × (10–24) bits |
| Vector4 | 4× compressed float (per-component range/precision) | per-component compressed | 4 × (10–24) bits |
| Quaternion | smallest-three (2-bit largest-index + 3× N-bit components, reconstruct dropped component from unit length) | N=10 bits/component default | ~32 bits |
| string | length-prefix (varint) + UTF-8 bytes | none (avoid in hot snapshots; prefer ids) | varies |
| fixed-string<N> | N bytes, optionally length-prefixed within N | none | ≤ N bytes |
| byte[] / blob | length-prefix (varint) + raw bytes | none | varies |

Notes:

  • Ranged ints are the user's requested "an int known to be 0..1023 sent in 10 bits"; they are declared via bounds in the declarative path, or via WriteRangedInt(value, min, max) in the manual path (§21.4).
  • Compressed floats quantize to a configurable range+precision; vectors apply this per component, so each axis can have independent bounds (e.g. a Y axis with a tighter range).
  • bool is special: it is not written as a standalone bit in the body. It is folded into the per-behaviour changed-member bitfield, so a dirty bool costs only its presence bit.
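
The default int encoding, a zig-zag varint, can be sketched as below. The 7-bits-per-byte layout with a continuation flag in the top bit is an assumption that fits the stated 1–5 byte range for int32; the sketch writes whole bytes to a vector rather than into the bitstream.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Zig-zag maps small magnitudes of either sign to small unsigned values:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
inline uint32_t zigzag_encode(int32_t v) {
    const uint32_t sign_mask = v < 0 ? 0xFFFFFFFFu : 0u;
    return (static_cast<uint32_t>(v) << 1) ^ sign_mask;
}

inline int32_t zigzag_decode(uint32_t u) {
    return static_cast<int32_t>((u >> 1) ^ (0u - (u & 1u)));
}

// 7 payload bits per byte, low bits first; the top bit means "more bytes follow".
inline void write_varint(std::vector<uint8_t>& out, uint32_t u) {
    while (u >= 0x80u) {
        out.push_back(static_cast<uint8_t>(u | 0x80u));
        u >>= 7;
    }
    out.push_back(static_cast<uint8_t>(u));
}

// Reads at most 5 bytes starting at `pos` and advances `pos` past them.
inline uint32_t read_varint(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint32_t u = 0;
    for (int shift = 0; shift <= 28 && pos < in.size(); shift += 7) {
        const uint8_t b = in[pos++];
        u |= static_cast<uint32_t>(b & 0x7Fu) << shift;
        if ((b & 0x80u) == 0) break;
    }
    return u;
}
```

Values near zero cost one byte and the extremes of the int32 range cost five, which is why a ranged encoding is preferable whenever bounds are known.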

21.2 Custom value structs (INetworkStruct)

A custom value struct is a user-defined, POD-like aggregate composed only of the primitives in §21.1 (and other value structs, nested). It is the analogue of Fusion's INetworkStruct.

  • Value semantics. Structs are copied by value into and out of a NetworkObject's state. They have no network id, no owner, and no independent lifetime — they exist only as the value of a [Networked] member (or as an RPC argument).
  • Inline, bit-packed serialization. A struct serializes inline as part of its enclosing object's snapshot, using the same bitstream writer; nested structs recurse. Delta and quantization apply per leaf primitive exactly as for top-level members.
  • Dirty tracking — whole-struct by default, per-member optional. A [Networked] struct member occupies one bit in the enclosing behaviour's changed-member bitfield: writing any field of the struct marks the whole struct dirty and re-sends all its members on the next delta. This is cheap and simple and is the default. For large structs, a struct may opt into per-member delta (LATTICE_NETSTRUCT_DELTA): it then carries its own internal changed-member bitfield so only changed leaf fields are sent — at the cost of that bitfield's bits. Guidance: whole-struct for small/cohesive structs (a transform); per-member for large structs with independently-changing fields (an inventory record).
  • Alignment / endianness across the C ABI. On the wire a struct is bit-packed with no padding (alignment is a host-memory concern, not a wire concern). In host memory across the C ABI a struct is a fixed-layout POD; the binding marshals it as a blob via lattice_set_prop/lattice_get_prop (§19) using the codegen-emitted packer, so host alignment never reaches the wire. Endianness is normalized at the (un)pack boundary (§20.3).

21.3 Custom reference types (NetworkObject + NetworkBehaviour) vs value structs

A custom class / reference type maps to a NetworkObject carrying one or more NetworkBehaviours with [Networked] members (§14.1). Unlike a value struct it has:

  • Reference semantics & identity — a stable network_id; other objects reference it by id, not by value (an id replicates as a long-style handle, resolving to null if the target is out of AoI / despawned).
  • Ownership & authority — an owner and an AuthorityMode (§17); it can be predicted, transfer authority, and be the subject of RPCs.
  • Independent lifetime — spawned/despawned, enters/leaves AoI (§15).

| Use a value struct when… | Use a reference type (NetworkObject) when… |
|---|---|
| The data is a value of something else (a transform, a stat block, an RPC argument). | The thing is an entity with identity, ownership, and lifetime (a player, a projectile, a door). |
| No independent ownership/authority/lifetime is needed. | It must be predicted, own authority, transfer authority, or receive RPCs. |
| You want it copied and packed inline with its parent. | You want it referenced by id and managed by spawn/despawn + AoI. |
| Many identical small records change together. | It needs per-object relevancy/priority. |

21.4 Two authoring paths

Both paths produce the identical wire format for the same fields; they differ only in ergonomics. A type may even mix them (declarative members + a hand-written serializer for one complex member).

(a) Declarative / attribute-driven — the default (§20). Members are declared with [Networked] (managed) / LATTICE_NETWORKED(...) (C++); codegen emits the bit-packed (de)serializer, the dirty mask, and the lattice_type_desc. Best for the common case; least code; codegen guarantees both sides agree.

(b) Explicit function-based API — a manual serializer the user writes by hand, for full control (conditional fields, custom compression, third-party types). The user implements one interface:

  • C++ (value struct or behaviour): INetworkSerializable with void Serialize(lattice::BitWriter& w) const; and void Deserialize(lattice::BitReader& r);
  • Managed (Unity/C# and other bindings): INetworkSerializable with void Serialize(ref BitBuffer buf); (a single buffer that reads in read-mode and writes in write-mode), mirrored by the binding over the same C ABI helpers (05-engine-integration.md).

The typed helper surface (the "functions to allow for custom variables"). BitWriter / BitReader (C++) and the managed BitBuffer expose the same named methods:

| Write method | Read method | Wire behavior |
| --- | --- | --- |
| WriteBool(bool) | ReadBool() -> bool | 1 bit |
| WriteEnum<E>(E) | ReadEnum<E>() -> E | ranged over enum member count |
| WriteByte(uint8) | ReadByte() -> uint8 | 8 bits |
| WriteInt(int32) | ReadInt() -> int32 | varint (zig-zag), default for int |
| WriteRangedInt(int32 v, int32 min, int32 max) | ReadRangedInt(int32 min, int32 max) -> int32 | ceil(log2(max-min+1)) bits (e.g. 0..1023 ⇒ 10 bits) |
| WriteLong(int64) | ReadLong() -> int64 | varint (zig-zag), default for long |
| WriteRangedLong(int64 v, int64 min, int64 max) | ReadRangedLong(int64 min, int64 max) -> int64 | ranged bits |
| WriteFloat(float) | ReadFloat() -> float | full 32-bit IEEE-754 |
| WriteCompressedFloat(float v, float min, float max, float precision) | ReadCompressedFloat(float min, float max, float precision) -> float | quantized integer of ceil(log2((max-min)/precision)) bits |
| WriteDouble(double) | ReadDouble() -> double | full 64-bit IEEE-754 |
| WriteVector2(Vector2 v, FloatQuant q) | ReadVector2(FloatQuant q) -> Vector2 | 2× compressed float |
| WriteVector3(Vector3 v, FloatQuant q) | ReadVector3(FloatQuant q) -> Vector3 | 3× compressed float |
| WriteVector4(Vector4 v, FloatQuant q) | ReadVector4(FloatQuant q) -> Vector4 | 4× compressed float |
| WriteQuaternion(Quaternion v, int bitsPerComp = 10) | ReadQuaternion(int bitsPerComp = 10) -> Quaternion | smallest-three |
| WriteString(string) | ReadString() -> string | varint length + UTF-8 |
| WriteBytes(const uint8* p, size_t n) | ReadBytes(uint8* out, size_t cap) -> size_t | varint length + raw bytes |
| WriteBits(uint32 v, int bits) | ReadBits(int bits) -> uint32 | raw N-bit escape hatch |

FloatQuant is a small { min, max, precision } descriptor (per component allowed for vectors). All methods advance the shared bit cursor; the writer never pads between fields.
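
The shared bit cursor and the "never pads between fields" rule can be illustrated with a toy writer/reader pair. This is a minimal sketch, not the lattice::BitWriter / BitReader implementation: MiniBitWriter and MiniBitReader are hypothetical stand-ins, and the LSB-first bit order within each byte is an assumption the text above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for lattice::BitWriter (illustration only).
class MiniBitWriter {
public:
    // Appends the low `bits` bits of v, LSB first (assumed order), with no padding.
    void WriteBits(uint32_t v, int bits) {
        assert(bits >= 0 && bits <= 32);
        for (int i = 0; i < bits; ++i) {
            if (bit_count_ % 8 == 0) bytes_.push_back(0);  // open a new byte only when needed
            if ((v >> i) & 1u) bytes_.back() |= static_cast<uint8_t>(1u << (bit_count_ % 8));
            ++bit_count_;
        }
    }
    void WriteBool(bool b) { WriteBits(b ? 1u : 0u, 1); }
    void WriteRangedInt(int32_t v, int32_t min, int32_t max) {
        assert(min <= v && v <= max);
        WriteBits(static_cast<uint32_t>(static_cast<int64_t>(v) - min), BitsFor(min, max));
    }
    // ceil(log2(max - min + 1)), computed as the bit length of (max - min).
    static int BitsFor(int32_t min, int32_t max) {
        uint64_t span = static_cast<uint64_t>(static_cast<int64_t>(max) - min);
        int bits = 0;
        while (span != 0) { ++bits; span >>= 1; }
        return bits;
    }
    const std::vector<uint8_t>& bytes() const { return bytes_; }
    size_t bit_count() const { return bit_count_; }
private:
    std::vector<uint8_t> bytes_;
    size_t bit_count_ = 0;
};

// Hypothetical stand-in for lattice::BitReader; mirrors the writer bit for bit.
class MiniBitReader {
public:
    explicit MiniBitReader(const std::vector<uint8_t>& bytes) : bytes_(bytes) {}
    uint32_t ReadBits(int bits) {
        assert(bits >= 0 && bits <= 32);
        uint32_t v = 0;
        for (int i = 0; i < bits; ++i) {
            assert(cursor_ / 8 < bytes_.size());
            const uint32_t bit = (bytes_[cursor_ / 8] >> (cursor_ % 8)) & 1u;
            v |= bit << i;
            ++cursor_;
        }
        return v;
    }
    bool ReadBool() { return ReadBits(1) != 0; }
    int32_t ReadRangedInt(int32_t min, int32_t max) {
        const uint32_t raw = ReadBits(MiniBitWriter::BitsFor(min, max));
        return static_cast<int32_t>(static_cast<int64_t>(raw) + min);
    }
private:
    const std::vector<uint8_t>& bytes_;
    size_t cursor_ = 0;
};
```

In this sketch, writing a ranged 0..1023 value followed by a bool consumes 11 bits, so the stream only rounds up to 2 bytes at its very end rather than after each field.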

C++ example — same struct via both paths:

// ----- Path (a): declarative ------------------------------------------------
struct ProjectileState {                       // a custom INetworkStruct (value type)
    LATTICE_NETSTRUCT(ProjectileState,
        (origin,   vec3,   BOUNDS(-4096, 4096, /*precision*/ 0.01f)),
        (velocity, vec3,   BOUNDS(-512,  512,  /*precision*/ 0.01f)),
        (charge,   int32,  RANGED(0, 1023)),          // 0..1023 -> 10 bits
        (piercing, bool),                              // 1 bit (folded into bitfield)
        (kind,     ProjKind /*enum*/))                 // ranged over enum count
};

// ----- Path (b): explicit function-based, identical wire format -------------
struct ProjectileStateManual : lattice::INetworkSerializable {
    Vector3   origin, velocity;
    int32_t   charge;
    bool      piercing;
    ProjKind  kind;

    void Serialize(lattice::BitWriter& w) const override {
        w.WriteVector3(origin,   {.min=-4096, .max=4096, .precision=0.01f});
        w.WriteVector3(velocity, {.min=-512,  .max=512,  .precision=0.01f});
        w.WriteRangedInt(charge, 0, 1023);             // 10 bits
        w.WriteBool(piercing);
        w.WriteEnum(kind);
    }
    void Deserialize(lattice::BitReader& r) override {
        origin   = r.ReadVector3({.min=-4096, .max=4096, .precision=0.01f});
        velocity = r.ReadVector3({.min=-512,  .max=512,  .precision=0.01f});
        charge   = r.ReadRangedInt(0, 1023);
        piercing = r.ReadBool();
        kind     = r.ReadEnum<ProjKind>();
    }
};
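
To make the wire cost of the struct above concrete, the following sketch quantizes a float against a { min, max, precision } descriptor and adds up the bits for a full ProjectileState. It is an illustration under stated assumptions, not the codegen output: the helper names are hypothetical, ProjKind is assumed to have four members, and the bit count includes both range endpoints, so it can come out one bit wider than the table's formula when the step count is an exact power of two.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Mirrors the { min, max, precision } descriptor; field names follow the example above.
struct FloatQuant { float min; float max; float precision; };

// Bits needed to index every integer in [0, max_index].
inline int bits_for_index(uint64_t max_index) {
    int bits = 0;
    while (max_index != 0) { ++bits; max_index >>= 1; }
    return bits;
}

// Largest quantized index for the range (assumption: endpoints are both representable).
inline uint64_t max_step(const FloatQuant& q) {
    assert(q.max > q.min && q.precision > 0.0f);
    const double span = static_cast<double>(q.max) - static_cast<double>(q.min);
    return static_cast<uint64_t>(std::ceil(span / static_cast<double>(q.precision)));
}

inline int quant_bits(const FloatQuant& q) { return bits_for_index(max_step(q)); }

// Clamp into range, then round to the nearest step.
inline uint64_t quantize(float v, const FloatQuant& q) {
    const double lo = q.min, hi = q.max, step = q.precision;
    double x = static_cast<double>(v);
    x = x < lo ? lo : (x > hi ? hi : x);
    return static_cast<uint64_t>(std::llround((x - lo) / step));
}

inline float dequantize(uint64_t n, const FloatQuant& q) {
    return static_cast<float>(static_cast<double>(q.min) +
                              static_cast<double>(n) * static_cast<double>(q.precision));
}

// Full-state size of the example struct, given a hypothetical enum member count.
inline int projectile_state_bits(int enum_member_count) {
    assert(enum_member_count >= 1);
    const FloatQuant origin{-4096.0f, 4096.0f, 0.01f};
    const FloatQuant velocity{-512.0f, 512.0f, 0.01f};
    return 3 * quant_bits(origin)        // origin: three compressed floats
         + 3 * quant_bits(velocity)      // velocity: three compressed floats
         + bits_for_index(1023)          // charge: ranged 0..1023
         + 1                             // piercing: one bit
         + bits_for_index(static_cast<uint64_t>(enum_member_count - 1));  // kind
}
```

Under these assumptions the struct costs 3×20 + 3×17 + 10 + 1 + 2 = 124 bits, and a dequantized value differs from the original by at most about one precision step.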

21.5 (De)serialization flow

flowchart TB
    START["Build delta for object O at target_tick (§14.3)"]
    MASK["Write changed-member bitfield (bool members folded in here)"]
    LOOP["For each dirty member m"]
    DISP{"Member kind?"}
    PRIM["Primitive (§21.1): writer.Write* with quantizer"]
    VS["Value struct (INetworkStruct): recurse -> Serialize(BitWriter)"]
    REF["Reference member: write network_id as ranged/var long"]
    MAN["Custom type with manual serializer: call its Serialize(BitWriter&)"]
    OUT["Append bit-packed bytes to coalesced datagram payload (§3.2)"]

    START --> MASK --> LOOP --> DISP
    DISP -->|"bool/int/long/float/double/VectorN/Quaternion/string/bytes/enum"| PRIM --> OUT
    DISP -->|"value struct"| VS --> OUT
    DISP -->|"NetworkObject reference"| REF --> OUT
    DISP -->|"INetworkSerializable"| MAN --> OUT
    OUT --> LOOP

The read path is the mirror image: read the changed-member bitfield, then for each set bit dispatch to the matching Read* / Deserialize(BitReader&). Because the bitfield drives both sides and the schema is content-hash-gated (§21.6, registration below), reader and writer always agree on field presence, type, and quantization.
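
The mask-driven symmetry between writer and reader can be sketched as follows. This is an illustration only: DoorState and its members are hypothetical, and a vector of 32-bit words stands in for the bit-packed stream so the example stays short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical three-member state; one mask bit per member, in declaration order.
struct DoorState { uint32_t angle_q; uint32_t health; uint32_t owner_id; };
enum : uint32_t { kAngle = 1u << 0, kHealth = 1u << 1, kOwner = 1u << 2 };

// Stand-in for the bit stream: one word per field instead of packed bits.
using Stream = std::vector<uint32_t>;

inline uint32_t diff_mask(const DoorState& base, const DoorState& cur) {
    uint32_t m = 0;
    if (base.angle_q  != cur.angle_q)  m |= kAngle;
    if (base.health   != cur.health)   m |= kHealth;
    if (base.owner_id != cur.owner_id) m |= kOwner;
    return m;
}

// Writer: mask first, then only the dirty members, in declaration order.
inline void write_delta(Stream& out, const DoorState& base, const DoorState& cur) {
    const uint32_t mask = diff_mask(base, cur);
    out.push_back(mask);
    if (mask & kAngle)  out.push_back(cur.angle_q);
    if (mask & kHealth) out.push_back(cur.health);
    if (mask & kOwner)  out.push_back(cur.owner_id);
}

// Reader: same mask, same order; untouched members keep their baseline value.
// Returns the stream position after the delta.
inline size_t read_delta(const Stream& in, size_t pos, DoorState& state) {
    assert(pos < in.size());
    const uint32_t mask = in.at(pos++);
    if (mask & kAngle)  state.angle_q  = in.at(pos++);
    if (mask & kHealth) state.health   = in.at(pos++);
    if (mask & kOwner)  state.owner_id = in.at(pos++);
    return pos;
}
```

Because both functions test the same bits in the same order, a delta that changes only health carries the mask plus one field and nothing else.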

21.6 Registration, schema id & versioning

Custom types register exactly like NetworkObject prefab types (§14.1, §19):

  • Each custom type (value struct or reference type) is registered via lattice_register_type(world, &type_desc), receiving a stable type_id (assigned by the project's type table; deterministic across builds so server and client agree).
  • Each registered type contributes a content hash computed over its ordered field list (name, wire-type, bounds, precision, bits). The world's overall schema hash is the combined hash of all registered types, negotiated at connect time (§20.2, §3.1).
  • Mismatch handling: if a connecting client's schema hash differs from the authority's, the connection is rejected as soon as the hashes are compared at connect time, with a SchemaMismatch disconnect reason, and never falls back to a silent partial decode. Any type evolution (adding, removing, or retyping a [Networked] member, or changing a quantization range or bit width) is therefore a breaking change that requires both sides to ship the same schema, the same rule that governs prefab/type registration.
  • Schema evolution: versioned releases bump protocol_id and/or the schema hash; the control plane (lattice-director) matchmakes compatible builds together. Within a build, field order in the declaration is the wire order, so reordering members is a breaking change — additive evolution appends members at the end and ships as a new version.
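
The content-hash rule in the bullets above can be sketched as an order-sensitive hash over the field list. The hash function is not specified in this section, so FNV-1a (64-bit) is used purely as a placeholder; FieldDesc, type_content_hash, and schema_compatible are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical field descriptor covering the hashed attributes listed above.
struct FieldDesc {
    std::string name;
    std::string wire_type;
    double min;
    double max;
    double precision;
    int bits;
};

constexpr uint64_t kFnvOffset = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
constexpr uint64_t kFnvPrime  = 1099511628211ull;         // FNV-1a 64-bit prime

// Feed a 64-bit value byte by byte in little-endian order (host-endian independent).
inline void mix_u64(uint64_t& h, uint64_t v) {
    for (int i = 0; i < 8; ++i) { h ^= (v >> (8 * i)) & 0xffu; h *= kFnvPrime; }
}
// Length-prefix strings so adjacent fields cannot run together.
inline void mix_str(uint64_t& h, const std::string& s) {
    mix_u64(h, s.size());
    for (unsigned char c : s) { h ^= c; h *= kFnvPrime; }
}
inline void mix_f64(uint64_t& h, double d) {
    uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);
    mix_u64(h, bits);
}

// Hash the fields in declaration order, so reordering changes the result.
inline uint64_t type_content_hash(const std::vector<FieldDesc>& fields) {
    uint64_t h = kFnvOffset;
    for (const FieldDesc& f : fields) {
        mix_str(h, f.name);
        mix_str(h, f.wire_type);
        mix_f64(h, f.min);
        mix_f64(h, f.max);
        mix_f64(h, f.precision);
        mix_u64(h, static_cast<uint64_t>(f.bits));
    }
    return h;
}

// A mismatch would map to the SchemaMismatch disconnect reason.
inline bool schema_compatible(uint64_t client_hash, uint64_t authority_hash) {
    return client_hash == authority_hash;
}
```

With a hash like this, swapping two members or widening one field from 10 to 11 bits yields a different value, which is what makes both changes breaking.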

21.7 Custom types in the snapshot / delta / interpolation / rollback pipeline

Custom types are first-class throughout the existing pipeline — they are not a side channel:

  • Snapshot / delta (§14). Custom struct members participate in baseline+delta: whole-struct-dirty members re-send all leaves on change; per-member-delta structs send only changed leaves. Reference members replicate as ids and follow AoI spawn/despawn.
  • Quantization. Compressed-float vectors are stored quantized in the snapshot ring buffer, so history and deltas operate on the quantized values (deterministic, §18.2).
  • Interpolation / extrapolation (§12). A VectorN member is interpolated per component between bracketing snapshots; when a packet is late it is extrapolated per component via the same dead-reckoning rule ("estimated physics for destination") if the schema marks it as a positional/velocity field. Quaternion members use slerp (interp) and angular-velocity integration (extrap). Scalar numerics interpolate linearly; bools/enums/ids snap (no meaningful blend). A value struct interpolates field-by-field using each field's rule.
  • Prediction / rollback (§11). A custom struct on a predicted (locally-owned) object lives in the predicted-state ring buffer and is compared in approx_equal; on misprediction it is overwritten with the authoritative struct value and re-simulated by the same game-sim/ code, then eased by the smoother (§12.4). approx_equal for floats uses the field's quantization step as its tolerance so quantization noise never trips a needless rollback.
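
Two of the rules above, per-component interpolation and the quantization-step tolerance in approx_equal, can be sketched in a few lines. This is a simplified illustration rather than the pipeline code: Vec3, interp_alpha, lerp_f, and lerp_vec3 are hypothetical names, and only the approx_equal name comes from the text.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Blend factor for render_time between two bracketing snapshot times, clamped to [0, 1].
inline float interp_alpha(double t0, double t1, double render_time) {
    assert(t1 > t0);
    const double a = (render_time - t0) / (t1 - t0);
    return static_cast<float>(a < 0.0 ? 0.0 : (a > 1.0 ? 1.0 : a));
}

inline float lerp_f(float a, float b, float t) { return a + (b - a) * t; }

// A VectorN member interpolates component by component.
inline Vec3 lerp_vec3(const Vec3& a, const Vec3& b, float t) {
    return Vec3{lerp_f(a.x, b.x, t), lerp_f(a.y, b.y, t), lerp_f(a.z, b.z, t)};
}

// Predicted vs authoritative comparison: differences within one quantization
// step are treated as equal, so quantization noise does not trigger a rollback.
inline bool approx_equal(float predicted, float authoritative, float quant_step) {
    return std::fabs(predicted - authoritative) <= quant_step;
}

inline bool approx_equal(const Vec3& p, const Vec3& a, float quant_step) {
    return approx_equal(p.x, a.x, quant_step) &&
           approx_equal(p.y, a.y, quant_step) &&
           approx_equal(p.z, a.z, quant_step);
}
```

For a field quantized at 0.01, a predicted value of 1.004 against an authoritative 1.000 compares equal, while 1.02 does not and would cause a rollback.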

22. Defaults & tuning summary

All values are defaults exposed via lattice_world_config across the C ABI (configurable).

| Parameter | Default | Section |
| --- | --- | --- |
| Simulation tick rate | 60 Hz | §10.1 |
| Snapshot send rate | 20–30 Hz | §14 |
| Interpolation delay (INTERP_DELAY) | ~100 ms | §12.1 |
| Extrapolation cap (EXTRAP_MAX) | ~250 ms | §12.2 |
| Safe MTU payload | ~1200 B | §3.2, §7 |
| Reliability channels | Unreliable / Unreliable-Seq / Reliable-Unordered / Reliable-Ordered | §5 |
| ACK bitfield width | 32 (covers 33 packets/ACK) | §3.4 |
| RTO bounds | 50 ms – 1 s | §5.1 |
| Heartbeat / timeout | 1 s / 10 s | §4.3 |
| Replay window | 1024 packets | §8.3 |
| Key rotation | ~15 min / ~1 GiB | §8.4 |
| Crypto | X25519 + ChaCha20-Poly1305; Ed25519 tokens | §8 |
| Prediction window / history ring | ~64 ticks / ~1 s | §11.1, §14.4 |
| Lag-comp max rewind | ~250 ms | §13 |
| Ports | relay UDP 7777, game UDP 27015+, STUN 3478 | §9 |
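
Several of these defaults are related through the tick rate. The sketch below checks that arithmetic at compile time; the constant and function names are illustrative only and are not fields of lattice_world_config.

```cpp
// Defaults copied from the table above; names are hypothetical.
constexpr int kTickRateHz      = 60;
constexpr int kInterpDelayMs   = 100;
constexpr int kLagCompRewindMs = 250;
constexpr int kHistoryTicks    = 64;
constexpr int kAckBitfieldBits = 32;

// Milliseconds to whole ticks, rounded up.
constexpr int ms_to_ticks(int ms) { return (ms * kTickRateHz + 999) / 1000; }
// Whole ticks to milliseconds, truncated.
constexpr int ticks_to_ms(int ticks) { return ticks * 1000 / kTickRateHz; }

static_assert(ms_to_ticks(kInterpDelayMs) == 6,    "100 ms of interpolation delay is 6 ticks at 60 Hz");
static_assert(ms_to_ticks(kLagCompRewindMs) == 15, "250 ms of rewind is 15 ticks at 60 Hz");
static_assert(ticks_to_ms(kHistoryTicks) == 1066,  "64 ticks at 60 Hz is about 1.07 s");
static_assert(kAckBitfieldBits + 1 == 33,          "the acked sequence plus 32 bitfield entries");
```

Changing the tick rate changes the tick counts but not the millisecond budgets, which is one reason to tune the time-based values and derive ticks from them.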

23. Cross-Platform Support (Linux + Windows)

The core and the attachable shared game-sim module (§18) build and run identically on Linux and Windows. The reference library (reference/) is the proof: the same sources compile to liblattice.so on Linux and lattice.dll on Windows, and the 130-check conformance harness passes on both with byte-identical results. That includes the deterministic PRNG sequences (§5), which are seed-driven and never touch the platform RNG, so loss/reorder/dup decisions and RTT estimates are identical across OSes.

23.1 Platform abstraction layer

All OS-specific concerns are funnelled through reference/src/platform/, behind #ifdef _WIN32 / #else, so the rest of the library is portable C++20 with no scattered platform code:

  • platform.h — platform detection (LATTICE_OS_WINDOWS / _LINUX / _MACOS); the canonical little-endian load/store helpers (load_u32_le / store_u32_le, etc.) that encode the wire byte-order rule (§20.3) using only shifts and masks (no host-endian assumption — correct on a big-endian host unchanged); monotonic time (monotonic_nanos, a std::chrono::steady_clock wrapper — the one owner of the "monotonic, never wall-clock" rule, so no gettimeofday / QueryPerformanceCounter leaks into core); threading aliases (platform::Mutex / LockGuard, currently std::mutex / std::lock_guard); and the NetworkSubsystem RAII type.
  • socket.h / socket.cpp — a minimal non-blocking UDP socket abstraction (UdpSocket, Endpoint) unifying BSD sockets (<sys/socket.h>, int fd, errno) and Winsock2 (<winsock2.h> / <ws2tcpip.h>, SOCKET, WSAGetLastError, closesocket). It is the seam the real localhost UDP transport (Task C12, §3) drops into; the abstraction exists now so C12 builds on it rather than introducing the platform #ifdef late. The current transport is in-process loopback and uses no real sockets, but the wrapper compiles on both toolchains today.
  • NetworkSubsystem — a no-op on POSIX; on Windows it pairs WSAStartup / WSACleanup (OS-refcounted). The loopback transport already holds one for its lifetime, so the Windows init contract is identical between loopback and the future UDP transport.
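
The shift-and-mask helpers and the monotonic clock wrapper described above can be sketched as follows. The function names come from the text; the exact signatures are assumptions, and the real platform.h may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Store a 32-bit value least-significant byte first, using only shifts and masks,
// so the result is the same on little-endian and big-endian hosts.
inline void store_u32_le(uint8_t* dst, uint32_t v) {
    assert(dst != nullptr);
    dst[0] = static_cast<uint8_t>(v & 0xffu);
    dst[1] = static_cast<uint8_t>((v >> 8) & 0xffu);
    dst[2] = static_cast<uint8_t>((v >> 16) & 0xffu);
    dst[3] = static_cast<uint8_t>((v >> 24) & 0xffu);
}

// Inverse of store_u32_le; never reinterprets host memory as an integer.
inline uint32_t load_u32_le(const uint8_t* src) {
    assert(src != nullptr);
    return  static_cast<uint32_t>(src[0])
         | (static_cast<uint32_t>(src[1]) << 8)
         | (static_cast<uint32_t>(src[2]) << 16)
         | (static_cast<uint32_t>(src[3]) << 24);
}

// Monotonic nanoseconds from std::chrono::steady_clock (never wall-clock time).
inline uint64_t monotonic_nanos() {
    using namespace std::chrono;
    return static_cast<uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}
```

Because the helpers never read an integer directly out of memory, they also sidestep alignment concerns when the source pointer points into a packed datagram.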

A required Windows-portability detail lives in platform.h: it defines _WIN32_WINNT = 0x0600 (Vista) before any system/STL include, because MinGW's <chrono>/<mutex> indirectly pull in <windows.h>, which otherwise defaults to an older target under which the modern Winsock APIs (inet_pton, getaddrinfo) are not declared.
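
The include-order requirement and the NetworkSubsystem pairing can be sketched together. This is an assumption-laden illustration, not the contents of platform.h: the guard against a prior definition and the ok() accessor are hypothetical additions, and on POSIX the class compiles to a no-op.

```cpp
// Must precede every system or STL include on Windows (0x0600 = Vista).
#if defined(_WIN32) && !defined(_WIN32_WINNT)
#  define _WIN32_WINNT 0x0600
#endif

#include <cassert>
#ifdef _WIN32
#  include <winsock2.h>
#  include <ws2tcpip.h>
#endif

// RAII pairing of WSAStartup / WSACleanup on Windows; does nothing on POSIX.
class NetworkSubsystem {
public:
    NetworkSubsystem() {
#ifdef _WIN32
        WSADATA data;
        ok_ = (WSAStartup(MAKEWORD(2, 2), &data) == 0);  // request Winsock 2.2
#endif
    }
    ~NetworkSubsystem() {
#ifdef _WIN32
        if (ok_) WSACleanup();  // balance only a successful startup
#endif
    }
    NetworkSubsystem(const NetworkSubsystem&) = delete;
    NetworkSubsystem& operator=(const NetworkSubsystem&) = delete;

    bool ok() const { return ok_; }  // hypothetical accessor for this sketch

private:
    bool ok_ = true;
};
```

Since Winsock reference-counts startup calls, two live instances (for example one held by the loopback transport and one by a future UDP transport) can coexist without tearing down each other's initialization.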

23.2 Windows DLL / export model

The export macro is in the public header (include/lattice/lattice.h): LATTICE_API expands to __declspec(dllexport) when building the library on _WIN32 (LATTICE_BUILD defined), __declspec(dllimport) for consumers, and __attribute__((visibility("default"))) on POSIX. Combined with -fvisibility=hidden (non-MSVC; gated off MSVC in CMake), the exported surface is exactly the extern "C" lattice_* ABI (§19) on both OSes — verified with nm -D (Linux) showing only lattice_* exports, and objdump -p (Windows) showing the DLL depends only on core Windows DLLs (KERNEL32, WS2_32, msvcrt).
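
The three-way expansion described above can be written as the following sketch. LATTICE_API and LATTICE_BUILD are the names from the text; lattice_example_abi_version is a hypothetical function that exists only to exercise the macro and is not part of the real ABI.

```cpp
#include <cassert>

// Normally set by the build system when compiling the library itself;
// defined here only so the sketch is self-contained.
#ifndef LATTICE_BUILD
#  define LATTICE_BUILD 1
#endif

#if defined(_WIN32)
#  if defined(LATTICE_BUILD)
#    define LATTICE_API __declspec(dllexport)   // building the DLL
#  else
#    define LATTICE_API __declspec(dllimport)   // consuming the DLL
#  endif
#else
#  define LATTICE_API __attribute__((visibility("default")))  // pairs with -fvisibility=hidden
#endif

extern "C" {
// Hypothetical entry point (illustration only).
LATTICE_API int lattice_example_abi_version(void);
}

int lattice_example_abi_version(void) { return 1; }
```

With -fvisibility=hidden on GCC/Clang, only symbols marked this way stay visible in the shared object, which is what keeps the exported surface limited to the extern "C" functions.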

23.3 Build matrix

| Target | Toolchain | Recipe |
| --- | --- | --- |
| Linux x86_64 | g++ 11 / C++20 | build.sh (quick, no CMake) or cmake |
| Windows x86_64 | MinGW-w64 cross (x86_64-w64-mingw32-g++-posix) | build-windows.sh or cmake -DCMAKE_TOOLCHAIN_FILE=cmake/mingw-w64.cmake |
| Windows x86_64 | MSVC | CMake (export macro + GNU-flag gating make it MSVC-ready) |

CMakeLists.txt gates GNU/Clang-only flags (-Wall -Wextra, -fvisibility=hidden) to non-MSVC, links ws2_32 on Windows, and uses -static -static-libgcc -static-libstdc++ under MinGW so the .dll/.exe carry no dependency on the MinGW runtime DLLs (libgcc, libstdc++, winpthread); only the core Windows DLLs listed in §23.2 remain.

MinGW thread-model note: the cross-build deliberately selects the -posix compiler variant. The default Ubuntu MinGW uses the win32 thread model, whose libstdc++ does not provide std::mutex / std::thread / std::lock_guard (the loopback transport uses them); the -posix variant provides the full C++ threading library, and the static-link flags fold winpthread into the binaries.

23.4 Verified cross-build (Linux host → Windows, run under wine)

$ ./build-windows.sh            # x86_64-w64-mingw32-g++-posix (GCC) 10-posix
  -> build-win/lattice.dll  build-win/lattice_conformance.exe   (clean, exit 0)
$ wine build-win/lattice_conformance.exe
  === SUMMARY ===
  PASS: 130   FAIL: 0   TOTAL: 130
  RESULT: ALL CHECKS PASSED                                     (exit 0)

The same result is reproduced via the CMake MinGW toolchain file. This is the cross-platform foundation established before crypto (Task C5, OpenSSL) and real UDP (Task C12) land — those features extend the platform shim (NetworkSubsystem, UdpSocket) rather than re-introducing platform #ifdefs elsewhere.


Assumptions other authors must honor

  • Binding author (05-engine-integration.md): the public surface is extern "C" only — opaque handles, POD structs, little-endian layout, no STL. The pump order is fixed (recv → tick → send); tick is called once per fixed sim step. [Networked] members and RPCs are codegen-backed (§20) and registered via lattice_register_type. The RoleContext (§18.1) is the only runtime difference between server/host/client.
  • Roadmap author (06-implementation-roadmap.md): the natural build order mirrors the layer stack — Socket+Crypto+Connection (handshake) → Reliability/Channels+Fragmentation → Serialization/codegen → Replication (snapshots, AoI, delta) → Simulation+Prediction/Rollback → Lag compensation → NAT/Relay → shared-authority transfer → QUIC/WebTransport backend. Determinism tooling (fixed-point option, ordered iteration) is a prerequisite for shared-authority and for rollback correctness.
  • Auth author (03-auth-service.md): connect tokens and relay tokens are Ed25519-signed, short-lived, audience-scoped, and single-use (token_id tracked in Redis); the handshake (§4.2) and relay Allocate (§9.3) verify them before allocating state.
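
The pump contract in the binding-author bullet (recv → tick → send, with tick called once per fixed sim step) can be sketched with a time accumulator. The real C ABI entry points are defined in §19 and are not reproduced here; PumpCallbacks and FixedStepPump are hypothetical, and running the whole recv → tick → send sequence once per step (rather than once per frame) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical hooks standing in for the real recv / tick / send entry points.
struct PumpCallbacks {
    std::function<void()> recv;
    std::function<void()> tick;
    std::function<void()> send;
};

class FixedStepPump {
public:
    FixedStepPump(uint64_t step_nanos, PumpCallbacks cb)
        : step_(step_nanos), cb_(std::move(cb)) {
        assert(step_ > 0);
    }

    // Adds elapsed time and runs one recv -> tick -> send pass per whole step.
    // Leftover time stays in the accumulator for the next call.
    int advance(uint64_t elapsed_nanos) {
        accumulator_ += elapsed_nanos;
        int steps = 0;
        while (accumulator_ >= step_) {
            cb_.recv();
            cb_.tick();
            cb_.send();
            accumulator_ -= step_;
            ++steps;
        }
        return steps;
    }

private:
    uint64_t step_;
    uint64_t accumulator_ = 0;
    PumpCallbacks cb_;
};
```

An engine binding would call advance once per rendered frame with the measured frame time; a slow frame then produces several sim steps and a fast frame may produce none, while the order within each step never changes.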