
03 — Authentication Service (lattice-auth)

Part of the Lattice networking suite design set. Read 00-overview.md and 01-high-level-design.md first for vocabulary and the data-plane/control-plane split. This document specifies lattice-auth, the control-plane service that proves who a player is and mints the tokens every other component trusts. The on-the-wire token validation at connection time is the bridge to 02-netcode-lld.md; identity is consumed by 04-social-library.md; delivery sequencing lives in 06-implementation-roadmap.md.


1. Goals & Responsibilities

lattice-auth is the single logical source of identity for the suite. It is implemented as N ≥ 2 stateless, load-balanced nodes that behave as one auth source (see §5 — this is the core requirement driving the whole design).

In scope

  • Authentication — verify a caller's claim to an identity (guest, email+password, or an external platform ticket such as Steam/Epic/Google/Apple).
  • Identity & account management — one durable Account per human; multiple linked Identity/Credential records (account linking and upgrade, e.g. guest → email → Steam-linked).
  • Session & token issuance — mint short-lived, stateless, Ed25519-signed access tokens (verifiable offline by game servers) and long-lived, rotating, server-tracked refresh tokens.
  • Public-key distribution — publish the current signing public keys as a JWKS-style key set at /.well-known/jwks.json so any verifier (other auth nodes, lattice-director, lattice-gameserver) can validate tokens without contacting auth.
  • Security & anti-abuse — rate limiting, brute-force / credential-stuffing defence, account lockout, audit logging, key rotation.
  • Data-subject operations — account deletion / export (GDPR / CCPA).

Explicitly NOT in scope

| Concern | Owner | Note |
| --- | --- | --- |
| Matchmaking, session directory, fleet orchestration, server allocation | lattice-director (8444) | Director consumes an access token to authorize a player and reads sub/roles/region claims. |
| Friends, presence, parties, invites, messaging | lattice-social (9443) | Social consumes auth identity (sub) as the stable user key; it never mints tokens. |
| Gameplay authority, replication, prediction | lattice-gameserver / lattice-core | Game server only validates the access token at handshake (§7). |
| Payments / entitlements / inventory | Out of suite | A title's commerce backend can read sub; not provided here. |

Design principle: auth is on the control plane and is off the hot path. A player authenticates rarely (login, refresh) and then carries a self-describing token. The data plane (game servers) must never make a synchronous call to auth to admit a player; it validates the token's signature locally. This keeps connection setup fast and keeps auth availability decoupled from match availability.


2. Identity Model

Lattice separates the durable Account (the person / save anchor) from the Identities that can authenticate into it. This is what makes account linking and the guest→full upgrade clean.

Identity kinds

| Kind | provider | How it authenticates | Notes |
| --- | --- | --- | --- |
| Guest / anonymous | guest | Device-bound secret issued at first launch; no PII | Upgradable; rate-limited & abuse-scored (§10). |
| Email + password | email | Email + Argon2id-hashed password | Email verification, password reset, lockout. |
| Steam | steam | Steam auth session ticket | Verified via Steamworks Web API (ISteamUserAuth/AuthenticateUserTicket). |
| Epic / EOS | epic | EOS ID token (OIDC) | Verified via Epic JWKS. |
| Google | google | Google OIDC ID token | Verified via Google JWKS, aud check. |
| Apple | apple | Sign in with Apple identity token | Verified via Apple JWKS, nonce check. |
| Console (future) | psn / xbl / nintendo | Platform attestation token | Same exchange pattern as §6.3. |

A single Account may own many Identity rows (one per provider), enabling cross-platform play under one save. Account linking attaches a new verified identity to an existing account; the inverse unlink is supported with a guard that an account must always retain at least one usable credential.
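
The unlink guard described above can be sketched as follows. This is a minimal illustration, not the service's implementation: the `Account` and `Identity` classes are trimmed stand-ins for the data model in §2.1, `unlink_identity` and `LastCredentialError` are hypothetical names, and treating "usable" as "verified" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    provider: str            # "guest", "email", "steam", ...
    provider_user_id: str
    verified: bool = True


@dataclass
class Account:
    id: str
    identities: list = field(default_factory=list)


class LastCredentialError(Exception):
    """Raised when an unlink would leave the account with no way to log in."""


def unlink_identity(account: Account, provider: str) -> None:
    """Detach the identity for `provider`, keeping at least one usable identity."""
    remaining = [i for i in account.identities if i.provider != provider]
    if len(remaining) == len(account.identities):
        raise KeyError(f"no {provider!r} identity linked to this account")
    # Assumption: "usable" means verified; unverified rows cannot authenticate.
    if not any(i.verified for i in remaining):
        raise LastCredentialError("account must retain at least one usable credential")
    account.identities = remaining
```

Checking the guard before mutating the account keeps a rejected unlink free of side effects.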

2.1 Data Model

classDiagram
    class Account {
        +uuid id "PK — the stable sub claim"
        +string display_name
        +enum status "active | locked | banned | deleted"
        +string[] roles "player, moderator, admin, server"
        +string home_region "eu, na, ap..."
        +bool is_guest "true until upgraded"
        +timestamptz created_at
        +timestamptz updated_at
        +timestamptz deleted_at "soft delete for GDPR grace"
    }

    class Identity {
        +uuid id "PK"
        +uuid account_id "FK -> Account"
        +enum provider "guest|email|steam|epic|google|apple"
        +string provider_user_id "external subject id"
        +bool verified
        +timestamptz linked_at
        +timestamptz last_login_at
    }

    class Credential {
        +uuid id "PK"
        +uuid identity_id "FK -> Identity (email/guest only)"
        +enum type "password | guest_secret"
        +string secret_hash "Argon2id (never plaintext)"
        +int failed_attempts
        +timestamptz locked_until
        +timestamptz rotated_at
    }

    class Session {
        +uuid id "PK — the sid claim"
        +uuid account_id "FK -> Account"
        +uuid device_id "FK -> Device"
        +enum auth_method "guest|email|steam|epic|google|apple"
        +string ip_cidr
        +string region
        +timestamptz created_at
        +timestamptz expires_at
        +timestamptz revoked_at
    }

    class RefreshToken {
        +uuid id "PK — opaque token id (jti)"
        +uuid session_id "FK -> Session"
        +string token_hash "SHA-256 of the opaque secret"
        +uuid prev_token_id "rotation chain (reuse detection)"
        +bool used
        +timestamptz issued_at
        +timestamptz expires_at
    }

    class Device {
        +uuid id "PK"
        +uuid account_id "FK -> Account (nullable for guest pre-link)"
        +string platform "ios|android|win|mac|linux|console|web"
        +string fingerprint_hash
        +string push_handle "nullable"
        +timestamptz first_seen_at
        +timestamptz last_seen_at
    }

    class AuditLog {
        +uuid id "PK"
        +uuid account_id "FK (nullable)"
        +enum event "login|logout|refresh|link|unlink|lockout|delete|key_rotation"
        +string ip
        +string node_id "which auth node served it"
        +jsonb detail
        +timestamptz at
    }

    Account "1" o-- "many" Identity : owns
    Identity "1" o-- "0..1" Credential : secured by
    Account "1" o-- "many" Session : has
    Account "1" o-- "many" Device : registers
    Session "1" o-- "many" RefreshToken : rotates
    Session "1" --> "1" Device : bound to
    Account "1" o-- "many" AuditLog : records

Storage placement. Account, Identity, Credential, Device, AuditLog live in PostgreSQL (durable, transactional, the source of truth). Session and RefreshToken live in Redis (fast, TTL-driven) with an optional async write-through to Postgres for long-term audit/forensics. Rate-limit counters and lockout state are Redis-only. See §9 for HA.


3. Token Design

Two token types with deliberately different lifetimes and trust models.

3.1 Access token — short-lived, stateless, offline-verifiable

  • Format: PASETO v4.public (default) — an Ed25519-signed, versioned, hard-to-misuse token. JWT with EdDSA (Ed25519) is the interoperable alternative when third-party tooling demands JWT; both carry the same claim set below. (Crypto baseline from the shared brief: Ed25519 for signing; X25519 + ChaCha20-Poly1305 is the separate transport crypto in lattice-core.)
  • Lifetime: 5–15 minutes (default 10 min). Short on purpose: revocation is mostly handled by expiry, so verifiers can stay fully offline.
  • Verification: signature checked against the public key for the token's kid, fetched from /.well-known/jwks.json and cached (refreshed periodically; see §5.3 and §7). No per-token round-trip to auth is required, which is what lets game servers admit players at line rate (§7).
  • Stateless: the access token is never stored server-side. Any node can issue it; any verifier can validate it.

Access-token claims

| Claim | Type | Example | Meaning / who reads it |
| --- | --- | --- | --- |
| sub | uuid string | "7c9e..." (Account.id) | Stable user identity. Read by game server, director, social. |
| sid | uuid string | Session.id | Session this token belongs to; key for optional revocation check & refresh linkage. |
| platform | string | "steam" | Auth method / platform the player came in on. |
| roles | string[] | ["player"] | Authorization roles (player, moderator, admin, server). |
| region | string | "eu" | Home/affinity region; director uses it for placement, game server for logging. |
| iss | string | "lattice-auth" | Logical issuer (not per-node); all nodes share it. |
| aud | string | "lattice" | Intended audience; verifiers reject mismatches. |
| iat | unix ts | 1718900000 | Issued-at. |
| exp | unix ts | 1718900600 | Expiry (iat + ~10 min). Primary revocation mechanism. |
| jti | uuid string | per-token | Unique token id; enables targeted deny-listing if ever needed. |
| kid | string (JWT header; footer in PASETO) | "2026-06-key-a" | Signing key id; selects the public key from JWKS (§5.3). |

Cross-doc contract (load-bearing). lattice-gameserver and the 02-netcode-lld.md handshake rely on exactly these claims: they verify the Ed25519 signature for kid, then check exp/iat, iss/aud, and read sub, sid, roles, region. The handshake auth field carries the raw PASETO/JWT string. Adding or renaming a claim is a coordinated change across auth, director, social, and the netcode handshake.
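
The claim checks in this contract can be illustrated with a short sketch. It assumes the Ed25519 signature has already been verified and the payload decoded into a dict; `validate_claims`, `ClaimError`, and the 30-second clock-skew tolerance are assumptions made for illustration and are not specified by this document.

```python
import time
from typing import Optional

EXPECTED_ISS = "lattice-auth"
EXPECTED_AUD = "lattice"
REQUIRED = ("sub", "sid", "roles", "region", "iss", "aud", "iat", "exp")
CLOCK_SKEW_S = 30  # assumption: tolerated clock drift, not defined by the design


class ClaimError(Exception):
    """Raised when a signature-verified token still fails its claim checks."""


def validate_claims(claims: dict, now: Optional[float] = None) -> dict:
    """Check iss/aud/exp/iat and return the identity the connection binds to."""
    now = time.time() if now is None else now
    missing = [name for name in REQUIRED if name not in claims]
    if missing:
        raise ClaimError(f"missing claims: {missing}")
    if claims["iss"] != EXPECTED_ISS:
        raise ClaimError("issuer mismatch")
    if claims["aud"] != EXPECTED_AUD:
        raise ClaimError("audience mismatch")
    if now > claims["exp"] + CLOCK_SKEW_S:
        raise ClaimError("token expired")
    if claims["iat"] > now + CLOCK_SKEW_S:
        raise ClaimError("token issued in the future")
    # Only the claims the handshake actually consumes are passed on.
    return {name: claims[name] for name in ("sub", "sid", "roles", "region")}
```

Passing `now` explicitly keeps the check deterministic in tests; a verifier would normally rely on the default wall-clock time.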

3.2 Refresh token — long-lived, rotating, server-tracked

  • Format: an opaque high-entropy secret (≥ 256 bits, e.g. base64url(32 bytes)). It is not a JWT — it carries no claims and is meaningless without the server record.
  • Lifetime: days to weeks (default 30 days, sliding), bounded by the parent Session.expires_at.
  • Storage: server-side in Redis (hash of the secret, never the secret itself), keyed by RefreshToken.id, with the Session/Device linkage. This is the stateful half of the system and is the lever for true logout/revocation.
  • Rotation: single-use, rotating (RFC 6819 §5.2.2.3). Every successful refresh issues a new refresh token and marks the old one used. Presenting an already-used token is treated as token theft: the entire Session (and its token chain) is revoked and an AuditLog refresh/lockout event is written. This makes refresh-token replay detectable.
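
As a rough sketch of the format and storage rules above: the client receives a 256-bit base64url secret while the server keeps only its SHA-256 digest. The function names are hypothetical, and the constant-time comparison is an extra precaution assumed here rather than something this document mandates.

```python
import hashlib
import hmac
import secrets


def hash_refresh_token(secret: str) -> str:
    """SHA-256 hex digest; this is what the server stores, never the secret."""
    return hashlib.sha256(secret.encode("ascii")).hexdigest()


def new_refresh_token() -> tuple:
    """Return (secret_for_client, digest_for_storage)."""
    secret = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits, base64url
    return secret, hash_refresh_token(secret)


def matches(presented: str, stored_hash: str) -> bool:
    """Compare digests in constant time."""
    return hmac.compare_digest(hash_refresh_token(presented), stored_hash)
```

A plain SHA-256 is adequate here because the input is already high-entropy; a slow password hash such as Argon2id is only needed for human-chosen secrets.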

4. Why two token types?

| Property | Access token | Refresh token |
| --- | --- | --- |
| Verifiable offline by game servers | ✅ (signature) | ❌ (server lookup) |
| Stored server-side | ❌ stateless | ✅ Redis |
| Lifetime | minutes | days/weeks |
| Carries claims | ✅ | ❌ opaque |
| Revocable instantly | only via short expiry / optional deny-list | ✅ delete the record |
| Used at game-server handshake | ✅ | ❌ never leaves the client↔auth path |

The access token optimizes the hot path (cheap, local, fast). The refresh token optimizes control (revocable, theft-detecting). Together: players stay logged in for weeks while game servers never call auth.


5. The Dual-Node "Act As One" Design (core requirement)

User requirement: "an auth server of sorts before the main game servers, all with balanced capabilities, so 2 auth servers but both act as one auth source."

Lattice realizes this as N ≥ 2 stateless auth nodes behind a load balancer, all sharing the same backing stores and the same signing key. To a client, the director, or a game server there is one auth source; internally, requests fan out across interchangeable nodes.

5.1 The three properties that make N nodes "one source"

  1. Stateless nodes. A node holds no per-request session in local memory. Everything durable goes to the shared stores. Any node can serve any request for any user; nodes are cattle, not pets. Adding/removing a node changes throughput, never correctness.
  2. Shared state stores. All nodes read/write the same PostgreSQL (accounts/identities — the source of truth) and the same Redis (sessions, refresh tokens, rate-limit counters). So a refresh issued on Node A is immediately visible to Node B; a lockout counter incremented on Node B is enforced by Node A.
  3. Shared signing key. All nodes sign access tokens with the same Ed25519 private key (selected by kid). Therefore a token minted by Node A is byte-for-byte valid to a verifier using the shared public key — including the other auth nodes and every game server. There is no "Node A's tokens" vs "Node B's tokens"; there is one issuer (iss: lattice-auth) with one key set.

Statelessness lets any node serve; the shared store lets any node see the same data; the shared signing key lets any node's tokens be trusted everywhere. That triad is precisely "balanced capabilities, both act as one auth source."

5.2 Deployment

flowchart TB
    subgraph clients["Clients / Callers"]
        C["Game client"]
        D["lattice-director (8444)"]
        S["lattice-social (9443)"]
    end

    LB["Load Balancer (L4/L7)\nTLS terminate, health checks\nVIP :8443"]

    subgraph authtier["lattice-auth tier (stateless, N>=2)"]
        A1["auth-node-1"]
        A2["auth-node-2"]
        A3["auth-node-N (scale out)"]
    end

    subgraph signing["Signing / Secrets"]
        KMS["Secret Manager / KMS\n(private signing key, by kid)"]
        SS["(optional) dedicated signing service\nholds private key, returns signatures"]
    end

    subgraph stores["Shared state (HA)"]
        PG[("PostgreSQL primary\n+ replicas — accounts, identities")]
        RDS[("Redis cluster\nsessions, refresh, rate-limit")]
        NATS[("optional NATS\nkey-rotation / revocation pub-sub")]
    end

    GS["lattice-gameserver (UDP 27015+)\nverifies tokens OFFLINE via JWKS"]

    C -->|"HTTPS :8443"| LB
    D -->|"HTTPS :8443"| LB
    S -->|"HTTPS :8443"| LB
    LB --> A1
    LB --> A2
    LB --> A3

    A1 -. "fetch/cache private key" .-> KMS
    A2 -. "fetch/cache private key" .-> KMS
    A3 -. "fetch/cache private key" .-> KMS
    A1 -. "or sign via" .-> SS
    A2 -. "or sign via" .-> SS

    A1 --> PG
    A2 --> PG
    A3 --> PG
    A1 --> RDS
    A2 --> RDS
    A3 --> RDS
    A1 -. "rotation events" .- NATS
    A2 -. "rotation events" .- NATS

    GS -->|"GET /.well-known/jwks.json (cached)"| LB

5.3 Signing key distribution & rotation (JWKS + key IDs)

The private signing key is never baked into an image or committed. Two equivalent strategies (the brief lists both):

  • Distributed private key (default). The Ed25519 private key is stored in a secret manager / KMS (e.g. cloud KMS, Vault). Each node fetches it at boot (and on rotation), caches it in memory, and signs locally — fast, no extra hop.
  • Dedicated signing service. A single small service holds the private key and exposes a "sign these bytes" RPC; auth nodes never see the private key. Stronger blast-radius control, one extra network hop. Choose this for high-compliance titles.

Public keys are openly published as a JWKS-style key set at GET /.well-known/jwks.json, each entry tagged with a kid. Verifiers (other auth nodes, director, every game server) fetch and cache the set; the token header's kid selects the right public key.

Rotation is overlap-based and zero-downtime:

flowchart LR
    A["Generate key B in KMS\n(kid = 2026-09-key-b)"] --> B["Publish B's PUBLIC key in JWKS\nalongside A (both keys served)"]
    B --> C["Wait for propagation window\n>= verifier JWKS cache TTL\n(e.g. 10-15 min)"]
    C --> D["Flip nodes to SIGN with B\n(A still verifiable)"]
    D --> E["Wait > max access-token lifetime\n(all A-signed tokens expired)"]
    E --> F["Remove A's public key from JWKS\nretire kid A"]

Because both public keys are in JWKS during the overlap, tokens signed by the old key keep validating until they expire, and tokens signed by the new key validate as soon as the JWKS cache refreshes. Rotation never invalidates a live session. An emergency rotation (suspected key compromise) shortens the windows and additionally pushes a key_rotation event over NATS (or relies on a short JWKS TTL) so verifiers refresh immediately.
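
The ordering constraints of the overlap rotation can be sketched as a small state tracker. This is an illustration under stated assumptions: `KeySet`, `RotationError`, and the two 900-second defaults are hypothetical, and no real key material or KMS call is involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class RotationError(Exception):
    """Raised when a rotation step is attempted before its wait window passed."""


@dataclass
class KeySet:
    jwks_cache_ttl_s: float = 900       # assumed verifier cache TTL (15 min)
    max_token_lifetime_s: float = 900   # upper end of the 5-15 min token range
    published: Dict[str, float] = field(default_factory=dict)  # kid -> publish time
    signing_kid: Optional[str] = None
    signing_since: float = 0.0

    def publish(self, kid: str, now: float) -> None:
        """Serve the public key in JWKS; the very first key also starts signing."""
        self.published[kid] = now
        if self.signing_kid is None:
            self.signing_kid, self.signing_since = kid, now

    def flip(self, kid: str, now: float) -> None:
        """Start signing with `kid` once verifiers have had time to fetch it."""
        if kid not in self.published:
            raise RotationError("publish the public key before signing with it")
        if now - self.published[kid] < self.jwks_cache_ttl_s:
            raise RotationError("verifier caches may not have the new key yet")
        self.signing_kid, self.signing_since = kid, now

    def retire(self, kid: str, now: float) -> None:
        """Drop an old public key once every token it signed has expired."""
        if kid == self.signing_kid:
            raise RotationError("cannot retire the active signing key")
        if now - self.signing_since <= self.max_token_lifetime_s:
            raise RotationError("tokens signed by the old key may still be live")
        del self.published[kid]
```

An emergency rotation would shorten both windows rather than skip the checks.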


6. Login Flows

All flows hit the LB VIP on :8443; the LB picks any healthy node. The "auth node" lane below is whichever node was chosen — it does not matter which.

6.1 Guest / anonymous login

sequenceDiagram
    autonumber
    participant Cli as Client
    participant LB as Load Balancer (:8443)
    participant N as auth-node (any)
    participant PG as PostgreSQL
    participant R as Redis

    Cli->>LB: POST /guest { device_fingerprint }
    LB->>N: forward (any healthy node)
    N->>R: check guest-create rate limit (per IP/device)
    alt limit exceeded
        N-->>Cli: 429 Too Many Requests
    else allowed
        N->>PG: upsert Device; create Account(is_guest=true) + Identity(provider=guest) + guest_secret Credential
        N->>R: create Session + RefreshToken (hashed)
        N->>N: sign access token (Ed25519, current kid)
        N-->>Cli: 200 { access_token, refresh_token, expires_in, account_id, guest_secret }
    end

The returned guest_secret (delivered once) lets the same device re-authenticate later; the guest can be upgraded by linking an email/platform identity to the same Account (§6.3 linking variant).

6.2 Email + password login

sequenceDiagram
    autonumber
    participant Cli as Client
    participant LB as Load Balancer (:8443)
    participant N as auth-node (any)
    participant PG as PostgreSQL
    participant R as Redis

    Cli->>LB: POST /login { email, password, device }
    LB->>N: forward
    N->>R: check login rate limit + lockout (email + IP)
    alt locked or rate-limited
        N-->>Cli: 429 / 423 Locked
    else allowed
        N->>PG: load Identity(email) + Credential
        N->>N: Argon2id verify(password, secret_hash)
        alt password wrong
            N->>R: increment failed_attempts; maybe set locked_until
            N->>PG: write AuditLog(login, failure)
            N-->>Cli: 401 Unauthorized
        else password ok
            N->>R: reset failed_attempts; create Session + RefreshToken
            N->>PG: write AuditLog(login, success); update last_login_at
            N->>N: sign access token
            N-->>Cli: 200 { access_token, refresh_token, expires_in }
        end
    end
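
The failed-attempt handling in this flow can be sketched as follows. The threshold, base duration, and cap are assumed values chosen for illustration, and `LoginState` is a hypothetical stand-in for the counters (`failed_attempts`, `locked_until`) that the flow updates.

```python
from dataclasses import dataclass

LOCK_THRESHOLD = 5   # assumption: failures allowed before lockout begins
BASE_LOCK_S = 30     # assumption: duration of the first lockout
MAX_LOCK_S = 3600    # assumption: upper bound on any lockout


@dataclass
class LoginState:
    failed_attempts: int = 0
    locked_until: float = 0.0


def is_locked(state: LoginState, now: float) -> bool:
    return now < state.locked_until


def record_failure(state: LoginState, now: float) -> None:
    """Count a bad password; from the threshold on, double the lockout each time."""
    state.failed_attempts += 1
    over = state.failed_attempts - LOCK_THRESHOLD
    if over >= 0:
        state.locked_until = now + min(BASE_LOCK_S * 2 ** over, MAX_LOCK_S)


def record_success(state: LoginState) -> None:
    """A correct password clears the counter and any lockout."""
    state.failed_attempts = 0
    state.locked_until = 0.0
```

Because the counters live in the shared store, a failure recorded by one node would be enforced by every other node.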

6.3 Platform-token exchange (e.g. Steam ticket → Lattice token)

sequenceDiagram
    autonumber
    participant Cli as Client
    participant LB as Load Balancer (:8443)
    participant N as auth-node (any)
    participant P as Platform API (Steam/Epic/Google/Apple)
    participant PG as PostgreSQL
    participant R as Redis

    Cli->>LB: POST /platform { provider:"steam", ticket, device }
    LB->>N: forward
    N->>R: rate-limit check (per IP/provider)
    N->>P: verify ticket (AuthenticateUserTicket / OIDC + JWKS)
    alt ticket invalid
        N-->>Cli: 401 Unauthorized
    else valid -> provider_user_id
        N->>PG: find Identity(provider, provider_user_id)
        alt identity exists
            PG-->>N: existing Account
        else first time
            N->>PG: create Account + Identity (verified=true)
        end
        opt account linking (attach to existing logged-in account)
            N->>PG: link new Identity to current Account (guard >=1 credential)
            N->>PG: AuditLog(link)
        end
        N->>R: create Session + RefreshToken
        N->>N: sign access token (platform claim = "steam")
        N-->>Cli: 200 { access_token, refresh_token, expires_in, account_id }
    end

This is the canonical "auth server before the game servers" step: the client trades a platform-native credential for a Lattice access token that the rest of the suite understands.

6.4 Token refresh / rotation

sequenceDiagram
    autonumber
    participant Cli as Client
    participant LB as Load Balancer (:8443)
    participant N as auth-node (any)
    participant R as Redis

    Note over Cli: access token near expiry (or expired)
    Cli->>LB: POST /refresh { refresh_token }
    LB->>N: forward (possibly a DIFFERENT node than issued it)
    N->>R: look up RefreshToken by hash
    alt not found / expired / session revoked
        N-->>Cli: 401 — must re-login
    else used == true (REPLAY!)
        N->>R: revoke entire Session + token chain
        N-->>Cli: 401 — session terminated (theft suspected)
    else valid & unused
        N->>R: mark old token used; create new RefreshToken (prev_token_id = old)
        N->>N: sign fresh access token (current kid)
        N-->>Cli: 200 { access_token, refresh_token, expires_in }
    end

Because Redis is shared, the node serving the refresh need not be the node that issued the original token — proof of the "act as one" property end to end.
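
The rotation and reuse-detection branches of this flow can be sketched with an in-memory stand-in for the shared store. `RefreshStore`, `RefreshRecord`, and `SessionRevoked` are hypothetical names, expiry handling is omitted for brevity, and against a real shared store the check-and-mark step would need to be atomic.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefreshRecord:
    session_id: str
    prev_hash: Optional[str]   # rotation chain link
    used: bool = False


class SessionRevoked(Exception):
    """Raised when a used token is replayed and the session is terminated."""


class RefreshStore:
    """In-memory stand-in for the shared Redis store (illustrative only)."""

    def __init__(self) -> None:
        self.tokens = {}               # token hash -> RefreshRecord
        self.revoked_sessions = set()

    @staticmethod
    def _hash(secret: str) -> str:
        return hashlib.sha256(secret.encode("ascii")).hexdigest()

    def issue(self, session_id: str, prev_hash: Optional[str] = None) -> str:
        secret = secrets.token_urlsafe(32)
        self.tokens[self._hash(secret)] = RefreshRecord(session_id, prev_hash)
        return secret

    def rotate(self, presented: str) -> str:
        digest = self._hash(presented)
        record = self.tokens.get(digest)
        if record is None or record.session_id in self.revoked_sessions:
            raise PermissionError("must re-login")
        if record.used:
            # Replay of a spent token: revoke the session and its whole chain.
            sid = record.session_id
            self.revoked_sessions.add(sid)
            self.tokens = {h: r for h, r in self.tokens.items() if r.session_id != sid}
            raise SessionRevoked("session terminated (theft suspected)")
        record.used = True
        return self.issue(record.session_id, prev_hash=digest)
```

Any node holding a reference to the same store reaches the same verdict, which is the property the flow relies on.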


7. Game-Server Token Validation (handshake bridge to 02-netcode-lld)

When a client connects to a lattice-gameserver (UDP 27015+), it presents its access token inside the 02-netcode-lld.md connection handshake (the encrypted handshake establishes X25519 + ChaCha20-Poly1305 transport keys; the access token rides in the authenticated handshake payload). The game server validates it offline:

sequenceDiagram
    autonumber
    participant Cli as Client
    participant GS as lattice-gameserver (UDP 27015+)
    participant JWKS as Auth JWKS (cached, :8443)
    participant R as Redis (optional)

    Note over GS,JWKS: at boot/periodically GS caches JWKS public keys (by kid)
    Cli->>GS: connection handshake (X25519) + access_token (PASETO/JWT)
    GS->>GS: select public key by token.kid (from cache)
    GS->>GS: verify Ed25519 signature
    GS->>GS: check exp/iat, iss="lattice-auth", aud="lattice"
    alt signature/claims invalid or expired
        GS-->>Cli: reject handshake (auth failed)
    else valid
        opt high-security titles only
            GS->>R: SISMEMBER revoked sid/jti ?
            alt revoked
                GS-->>Cli: reject handshake (revoked)
            end
        end
        GS->>GS: bind connection to sub, roles, region
        GS-->>Cli: accept — proceed to sim join (see 02-netcode-lld)
    end

Key points:

  • The default path is a pure local cryptographic check — no network call to auth. This is essential: thousands of players can connect without ever loading the auth tier, and a full auth outage does not stop matches that are already placed.
  • Optional revocation check. High-security or competitive titles can add a single Redis lookup against a small deny-set of revoked sid/jti values (auth writes to this set on logout/ban). This trades a touch of latency and a Redis dependency for near-instant revocation, instead of waiting out the ≤10-minute token expiry.
  • The game server trusts the token because it was signed by the shared key it already has the public half of — it does not care which auth node minted it.
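
The cached key lookup behind the offline check might look like the sketch below. `JwksCache`, its `fetch` callable, and the TTL values are assumptions for illustration; the refetch on an unknown kid (rate-limited so that junk kids cannot force constant fetches) is one possible way to pick up a rotated key early, not something this document prescribes.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    """Caches kid -> public key bytes; refetches when stale or on an unknown kid."""

    def __init__(self, fetch: Callable[[], Dict[str, bytes]],
                 ttl_s: float = 600, min_refetch_s: float = 30,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch                  # hypothetical: returns the parsed key set
        self._ttl_s = ttl_s
        self._min_refetch_s = min_refetch_s
        self._clock = clock
        self._keys: Dict[str, bytes] = {}
        self._fetched_at: Optional[float] = None

    def key_for(self, kid: str) -> Optional[bytes]:
        """Return the public key for `kid`, or None if it is not published."""
        now = self._clock()
        age = None if self._fetched_at is None else now - self._fetched_at
        stale = age is None or age >= self._ttl_s
        unknown = kid not in self._keys
        if stale or (unknown and age >= self._min_refetch_s):
            self._keys = dict(self._fetch())
            self._fetched_at = now
        return self._keys.get(kid)
```

A `None` result means the handshake is rejected; the signature check itself only runs once a key has been found.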

8. Node Failover

Because nodes are stateless and the access/refresh model is store-backed, losing a node is a throughput event, not a correctness event.

sequenceDiagram
    autonumber
    participant Cli as Client
    participant LB as Load Balancer
    participant A1 as auth-node-1
    participant A2 as auth-node-2
    participant R as Redis

    LB->>A1: GET /healthz (every few seconds)
    A1-->>LB: 200 OK
    Note over A1: node-1 crashes / fails health check
    LB->>A1: GET /healthz
    A1--xLB: timeout / 5xx
    LB->>LB: mark node-1 UNHEALTHY, drain from pool
    Cli->>LB: POST /refresh { refresh_token }
    LB->>A2: route to healthy node-2
    A2->>R: read same session/refresh state (shared store)
    A2-->>Cli: 200 { new tokens }
    Note over Cli,A2: no re-login, no session loss

flowchart LR
    F["auth-node-1 fails"] --> H["LB health check fails"]
    H --> P["LB removes node from pool (connection draining)"]
    P --> Rt["In-flight idempotent requests retried on another node"]
    Rt --> OK["Survivors serve all traffic\n(stateless + shared store + shared key)"]
    OK --> SC["Auto-scaler / orchestrator replaces node\n-> rejoins pool, no special bootstrap"]

  • Health checks: LB probes GET /healthz (liveness) and GET /readyz (DB + Redis + key-cache reachable). A node that is not ready is kept out of the pool until /readyz passes.
  • In-flight requests: all write endpoints are designed to be idempotent or safe to retry (refresh rotation tolerates retry via the chain; account creation upserts on provider id). The client SDK retries on 502/503/timeouts.
  • No session loss: sessions/refresh tokens are in shared Redis and access tokens are self-contained, so a surviving node continues seamlessly. Tokens already issued by the dead node keep validating everywhere — they were signed with the shared key.
  • Replacement: a new node needs only DB/Redis credentials and KMS access; it fetches the signing key and JWKS at boot and joins the pool. No state migration.
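
The client-side retry behaviour mentioned above could be sketched like this. `call_with_retry`, the attempt count, and the jittered backoff are assumptions about how an SDK might behave; `send` is a hypothetical callable that performs one request and returns `(status, body)`.

```python
import random
import time

RETRYABLE_STATUS = {502, 503}


def call_with_retry(send, max_attempts=4, base_delay_s=0.2, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    Retries on 502/503 and TimeoutError with exponential backoff plus jitter.
    Only suitable for requests that are idempotent or safe to retry.
    """
    last = None
    for attempt in range(1, max_attempts + 1):
        try:
            last = send()
            if last[0] not in RETRYABLE_STATUS:
                return last
        except TimeoutError:
            if attempt == max_attempts:
                raise
        if attempt < max_attempts:
            # Full jitter: wait a random time up to the exponential ceiling.
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    return last
```

Since the load balancer has already drained the failed node, a retry normally lands on a healthy one without the client needing to know which.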

9. Scaling & High Availability

| Layer | Strategy |
| --- | --- |
| Auth nodes | Horizontal: add stateless replicas behind the LB. CPU-bound work is Argon2id hashing and Ed25519 signing; scale on CPU. N is independent per region. |
| Load balancer | Redundant L4/L7 LB (cloud LB or HAProxy/Envoy pair) with health checks; terminates TLS, or passes through to nodes for mTLS-internal setups. |
| PostgreSQL | Primary + streaming read replicas; reads (identity lookups) can hit replicas, writes go to primary. Automated failover (Patroni / managed Postgres). Account data is the source of truth, protected by PITR backups. |
| Redis | Redis Cluster (or primary/replica + Sentinel) for sessions/refresh/rate-limit. Tolerates node loss; AOF persistence for durability of refresh tokens. |
| NATS (optional) | Clustered; carries key-rotation and revocation fan-out. Non-critical: JWKS TTL is the fallback. |
| Regions | Deploy an auth tier per region (eu/na/ap). Postgres can be globally replicated (writes home-region or a global primary); Redis is regional (sessions are region-local). The signing key set is global, so a token minted in EU validates on an NA game server, which matters for cross-region/social. The region claim records affinity for director placement. |

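Capacity intuition: refreshes are cheap (one Redis op plus one Ed25519 signature), so a single node can serve tens of thousands per second. Logins are dominated by the deliberately expensive Argon2id hash, so login throughput is bounded by roughly cores ÷ per-hash time and depends directly on the chosen cost parameters; the thousands-of-logins/sec figure assumes a many-core node and should be confirmed by benchmark. Sizing follows login spikes, not concurrent players (those are on the data plane).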


10. Security

| Area | Control |
| --- | --- |
| Transport | TLS 1.3 only on :8443; HSTS; modern cipher suites. Optional mTLS for director/social→auth internal calls. |
| Password storage | Argon2id (tuned memory/time cost), unique salt per credential; never plaintext, never reversible. Optional pepper held in KMS. |
| Rate limiting | Per-IP, per-account, per-endpoint sliding-window counters in shared Redis, so limits hold across all nodes (a key benefit of "act as one"). Stricter limits on /login, /guest, /platform. |
| Brute-force / credential stuffing | Failed-attempt counters with exponential backoff; account lockout (Credential.locked_until) after a threshold; IP reputation / velocity checks; optional CAPTCHA / proof-of-work challenge on suspicious bursts; breach-password screening (k-anonymity HIBP-style) at set/reset. |
| Audit logging | Append-only AuditLog (login, logout, refresh, link/unlink, lockout, delete, key_rotation) with node_id, IP, and detail, for forensics and compliance. |
| Token theft | Rotating single-use refresh tokens with reuse detection → whole-session revocation (§3.2, §6.4). Short access-token TTL caps stolen-access-token value. |
| Key management | Private signing key only in KMS / signing service; overlap-based rotation with JWKS + kid (§5.3); emergency rotation path; keys never in source/images/logs. |
| Guest abuse | Guests are device-bound and abuse-scored: rate-limited creation per IP/device, lower default trust, restricted from sensitive features until upgraded, and prunable (idle guest accounts garbage-collected). Prevents farming throwaway identities. |
| Authorization | roles claim drives RBAC across the suite; the server role authorizes server-to-server tokens (e.g. director/gameserver service identities). |
| GDPR / data deletion | DELETE /account triggers soft-delete (status=deleted, grace window) then hard purge of PII (Identity, Credential, Device, PII in AuditLog); data export endpoint returns the subject's data. sub (UUID) is pseudonymous and can be retained where lawful for integrity/ban enforcement after PII removal. Deletion fans out a notice to director/social to drop derived identity data. |
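
The sliding-window rate limiting listed above can be illustrated with a single-process sketch. `SlidingWindowLimiter` and the example limit are hypothetical; in the deployed design the counters live in shared Redis so that every node enforces the same window, which this in-memory version does not attempt to show.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most `limit` hits per key within any `window_s`-second window."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic) -> None:
        self._limit = limit
        self._window_s = window_s
        self._clock = clock
        self._hits = defaultdict(deque)   # key -> timestamps of accepted hits

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self._window_s:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

Keys would combine the dimensions named in the table, for example an endpoint plus an IP (`"login:203.0.113.7"`), so that stricter limits can apply to /login, /guest, and /platform.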

11. API Surface

All endpoints are served on the LB VIP https://auth.<env>.lattice:8443. JSON over HTTPS. Access token passed as Authorization: Bearer <token> where required.

| Method & Path | Auth | Purpose | Returns |
| --- | --- | --- | --- |
| POST /guest | none | Create/restore a guest identity (device-bound). | access_token, refresh_token, expires_in, account_id |
| POST /login | none | Email + password authentication. | tokens + account_id |
| POST /platform | none (or Bearer to link) | Exchange a platform ticket (Steam/Epic/Google/Apple) for Lattice tokens; with Bearer, links to the current account. | tokens + account_id |
| POST /refresh | refresh token | Rotate refresh token, mint new access token. | new tokens |
| POST /logout | Bearer / refresh | Revoke the current session (and add sid/jti to deny-set). | 204 |
| GET /.well-known/jwks.json | none (public) | Current signing public keys by kid for offline verification. | JWKS document |
| GET /account | Bearer | Fetch the caller's account + linked identities + devices. | account profile |
| PATCH /account | Bearer | Update profile (display name, etc.). | updated profile |
| POST /account/link | Bearer | Link a new verified identity to the account. | updated identities |
| POST /account/unlink | Bearer | Unlink an identity (guard: ≥1 credential remains). | updated identities |
| POST /account/export | Bearer | GDPR data export. | export job / payload |
| DELETE /account | Bearer | GDPR deletion (soft-delete → purge). | 202 |
| GET /healthz | none (internal) | Liveness for LB. | 200 |
| GET /readyz | none (internal) | Readiness (DB/Redis/key-cache). | 200 / 503 |

12. Cross-Doc Assumptions & Contracts

  • Access-token claim set is a shared contract (§3.1). lattice-gameserver (the 02-netcode-lld.md handshake), lattice-director, and lattice-social all depend on sub, sid, platform, roles, region, iss="lattice-auth", aud="lattice", exp, iat, jti, and the header kid. Changing these is a coordinated suite-wide change.
  • Offline verification is the default (§7): game servers validate via cached JWKS, never a synchronous auth call. Only high-security titles add the optional Redis revocation lookup.
  • sub is the universal user key consumed by lattice-social (04-social-library.md) and lattice-director. It is a stable, pseudonymous UUID that survives platform linking.
  • Crypto split: Ed25519 here is for token signing; X25519 + ChaCha20-Poly1305 in 02-netcode-lld.md is the transport layer — independent key material.
  • Delivery: node count N, KMS choice (distributed key vs signing service), and the optional NATS revocation bus are sequenced in 06-implementation-roadmap.md; the minimum shippable config is N=2 nodes + Postgres + Redis + distributed KMS key.