Compositor and Recording Architecture

The compositor is the system that records live event rooms. It captures video from a LiveKit room, streams it through Mux for HLS playback, and generates per-participant clips afterward. The architecture spans three Durable Objects, a headless browser page rendered by LiveKit Egress, and two external services (Mux and LiveKit).

How Recording Works (End to End)

When a moderator clicks "Start Recording," here is the full chain of events:

sequenceDiagram
    participant Mod as Moderator
    participant API as Workers API
    participant Coord as Coordinator DO
    participant WF as Workflow DO
    participant Mux as Mux API
    participant LK as LiveKit Egress
    participant Page as Composite Page
    participant Event as Event DO
    participant Clients as Connected Clients

    Mod->>API: POST /cloudcaster/start
    API->>Coord: acquireLock(eventId, roomId)
    Coord-->>API: { success, sessionId }
    API->>WF: start(params)

    WF->>WF: Step 1: Create DB session (status: starting)
    WF->>Event: Broadcast snapshot (starting)
    Event->>Clients: Push update

    WF->>Mux: Create live stream (low_latency)
    Mux-->>WF: { streamId, streamKey, playbackId }

    WF->>Mux: Poll until status = "idle"
    Note over WF,Mux: Exponential backoff: 30 attempts

    WF->>LK: startWebEgress(compositeUrl, mux://streamKey)
    LK->>Page: Headless browser loads composite page
    Page->>Page: Join LiveKit room as "compositor"
    Page->>Page: Render 1920x1080 canvas
    Page->>Page: console.log("START_RECORDING")
    LK->>Mux: RTMP stream begins

    WF->>Coord: activateLock(muxStreamId, egressId)
    WF->>Event: Broadcast snapshot (recording active)
    Event->>Clients: Push update (playbackId available)

    WF->>WF: Poll HLS manifest + Mux API
    Note over WF: Detect when stream is live for viewers

    WF->>Event: Query stage participants
    WF->>WF: Log initial entered_live events

The entire start() method runs inside blockConcurrencyWhile(), which guarantees that no other request can interrupt the workflow while it executes. This is critical because the workflow creates resources across three services (DB, Mux, LiveKit) and needs to either complete fully or clean up on failure.
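
A minimal sketch of that pattern as a method fragment; the step helpers (createDbSession, createMuxStream, and so on) are illustrative names, not the real implementation:

// Hypothetical sketch of start() inside blockConcurrencyWhile()
async start(params: StartParams) {
  // Other requests to this DO are held until the callback settles, so the
  // multi-service start sequence cannot be interleaved with another start or stop.
  return this.ctx.blockConcurrencyWhile(async () => {
    try {
      await this.createDbSession(params);     // Step 1: D1 record (status: starting)
      await this.createMuxStream();           // Step 2: Mux live stream
      await this.waitForMuxReady();           // Step 3: poll until status = "idle"
      await this.startLiveKitEgress(params);  // Step 4: web egress, RTMP to Mux
      return { success: true };
    } catch (err) {
      await this.cleanupFailedWorkflow(err);  // undo whatever was already created
      throw err;
    }
  });
}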

The Three Recording Durable Objects

CloudcasterCoordinator — Lock Manager

The Coordinator prevents duplicate recordings. Only one recording session can be active per event room at any time. It also monitors the health of running sessions.

Lock lifecycle:

acquireLock() → status: "acquiring"
    ↓
activateLock() → status: "active"  (after Mux + LiveKit are confirmed)
    ↓
releaseLock() → lock deleted       (after stop completes)

Lock data structure (apps/workers/src/durable-objects/cloudcaster/cloudcaster-coordinator.ts):

type SessionLock = {
  eventId: string;
  roomId: string;
  sessionId: string;          // UUID for this recording session
  acquiredAt: Date;
  lastActivity?: Date;
  status: "acquiring" | "active" | "releasing";
  muxStreamId?: string;       // Added in Step 2
  lkEgressId?: string;        // Added in Step 4
  key: string;                // "{eventId}:{roomId}"
};

What happens when you try to start recording while one is already running:

  1. Lock key is ${eventId}:${roomId} — one lock per room
  2. If an existing lock has status "acquiring", the request is denied (another start is in progress)
  3. If an existing lock has status "active", the Coordinator checks the stream health:
    • If healthy (Mux receiving data, egress running) — request denied
    • If unhealthy (stream dead, egress failed) — old lock is released, new one granted
  4. If an existing lock has status "releasing", the request is denied (a stop is in progress)
  5. If the session has run longer than 12 hours (LiveKit egress maximum), the Coordinator forces a stop

Rate limiting: The Coordinator enforces a 10-second minimum between lock acquisition attempts for the same room. This prevents rapid start/stop cycles that could leave orphan resources.
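
A sketch of the acquireLock() decision tree combining the checks above, written in the same method-fragment style as the checkStreamHealth() snippet below; the storage keys and helper names are assumptions:

// Hypothetical sketch of acquireLock(); SessionLock is the type shown above
// (the 12-hour session-cap check is omitted for brevity)
async acquireLock(eventId: string, roomId: string) {
  return this.ctx.blockConcurrencyWhile(async () => {
    const key = `${eventId}:${roomId}`;
    const now = Date.now();

    // Rate limit: at most one acquisition attempt per room every 10 seconds
    const lastAttempt = await this.ctx.storage.get<number>(`attempt:${key}`);
    if (lastAttempt && now - lastAttempt < 10_000) {
      return { success: false, reason: "rate_limited" };
    }
    await this.ctx.storage.put(`attempt:${key}`, now);

    const existing = await this.ctx.storage.get<SessionLock>(`lock:${key}`);
    if (existing?.status === "acquiring" || existing?.status === "releasing") {
      return { success: false, reason: "start_or_stop_in_progress" };
    }
    if (existing?.status === "active") {
      // Only take over an active lock if its session is actually dead
      if (await this.checkStreamHealth(existing)) {
        return { success: false, reason: "already_recording" };
      }
      await this.releaseLock(key); // clean up the orphaned session first
    }

    const lock: SessionLock = {
      eventId, roomId, key,
      sessionId: crypto.randomUUID(),
      acquiredAt: new Date(),
      status: "acquiring",
    };
    await this.ctx.storage.put(`lock:${key}`, lock);
    return { success: true, sessionId: lock.sessionId };
  });
}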

Health checks run every 30 seconds via a Durable Object alarm:

// Checks both services are alive
async checkStreamHealth(lock: SessionLock): Promise<boolean> {
  const muxHealthy = await this.checkMuxStream(lock.muxStreamId);
  const lkHealthy = await this.checkEgressStatus(lock.lkEgressId);
  return muxHealthy && lkHealthy;
}

If a health check fails, the Coordinator updates the session status and can trigger cleanup.
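
A sketch of how the alarm loop might drive that check, assuming locks are kept in Durable Object storage under a "lock:" prefix (the prefix and the handleUnhealthySession helper are illustrative):

// Hypothetical sketch of the 30-second alarm loop that drives checkStreamHealth()
async alarm() {
  const locks = await this.ctx.storage.list<SessionLock>({ prefix: "lock:" });

  for (const [, lock] of locks) {
    if (lock.status !== "active") continue;
    if (!(await this.checkStreamHealth(lock))) {
      await this.handleUnhealthySession(lock); // illustrative: mark failed, trigger cleanup
    }
  }

  // Re-arm the alarm only while there is something left to watch
  if (locks.size > 0) {
    await this.ctx.storage.setAlarm(Date.now() + 30_000);
  }
}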

Architect's Note: The Coordinator uses blockConcurrencyWhile() for lock operations, so two simultaneous start requests can't both acquire a lock. This is the Durable Object equivalent of a database row lock, but without the overhead of a database query. The single-threaded nature of DOs is the actual concurrency guarantee.

CloudcasterWorkflow — Orchestrator

The Workflow is the brain of the recording pipeline. It coordinates seven sequential steps to start a recording and eight steps to stop one.

Start flow (apps/workers/src/durable-objects/cloudcaster/cloudcaster-workflow.ts):

Step  Action                              External service     Failure handling
1     Create session record in DB         Workers D1           Abort
2     Create Mux live stream              Mux API              Cleanup DB record
3     Wait for Mux stream ready           Mux API (polling)    Cleanup Mux stream + DB
4     Start LiveKit web egress            LiveKit API          Cleanup Mux + DB
5     (skipped — delegated to Coordinator alarm)
6     Detect stream active for viewers    Mux API + HLS        Non-blocking (background poll)
7     Log initial stage participants      Event DO             Non-critical

Mux stream creation (Step 2):

const stream = await muxClient.video.liveStreams.create({
  low_latency: true,               // ~3-5s latency for viewers
  playback_policy: ["public"],     // No token needed to watch
  new_asset_settings: {
    playback_policy: ["public"],
    master_access: "temporary",    // Allow clips from the recording
  },
  reconnect_window: 60,            // 60s reconnect grace period
  use_slate_for_standard_latency: true,
  reconnect_slate_url: env.MUX_SLATE_URL,  // Shown during reconnect
});

The reconnect_window: 60 setting is important — if the RTMP stream drops for up to 60 seconds, Mux keeps the stream alive and shows a slate image. This prevents brief network interruptions from ending the entire recording.

Polling for Mux readiness (Step 3):

Mux streams aren't immediately ready after creation. The workflow polls with exponential backoff until the stream status changes from "created" to "idle" (meaning ready to receive RTMP input):

  • 30 maximum attempts
  • 1-second base delay, 1.5x multiplier, 10-second cap
  • Total maximum wait: roughly 4 minutes if all 30 attempts are exhausted (the capped delays sum to about 250 seconds)
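
A sketch of that backoff loop, assuming the Mux Node SDK's liveStreams.retrieve() call; the loop shape and error text are illustrative:

import Mux from "@mux/mux-node";

// Hypothetical sketch of the Step 3 readiness poll
async function waitForMuxReady(mux: Mux, streamId: string): Promise<void> {
  const maxAttempts = 30;
  let delayMs = 1_000; // 1-second base delay

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const stream = await mux.video.liveStreams.retrieve(streamId);
    if (stream.status === "idle") return; // ready to accept RTMP input

    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs = Math.min(delayMs * 1.5, 10_000); // 1.5x multiplier, 10-second cap
  }

  throw new Error(`Mux stream ${streamId} not ready after ${maxAttempts} attempts`);
}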

Starting LiveKit Egress (Step 4):

This is the core bridge between LiveKit (WebRTC) and Mux (RTMP):

const streamOutput = new StreamOutput({
  urls: [`mux://${state.muxStream.stream_key}`],
});

const egressInfo = await egressClient.startWebEgress(
  state.params.compositeUrl,        // The composite page URL
  { stream: streamOutput },
  { encodingOptions: EncodingOptionsPreset.H264_1080P_30 },
);

Key design decisions:

  1. Web Egress, not Room Composite Egress — LiveKit offers two egress types. Room Composite Egress uses LiveKit's built-in layout engine. Web Egress launches a headless Chromium browser that loads a URL. Eventuall uses Web Egress because the composite page is a custom React component with layouts that match the live event UI.

  2. mux:// protocol — LiveKit has native Mux integration. The mux:// shorthand resolves to rtmps://global-live.mux.com:443/app/{streamKey} internally, handling TLS and the correct RTMP endpoint.

  3. H264 1080p 30fps — The encoding preset balances quality and bandwidth. 1080p matches the composite page's fixed canvas size.

Detecting stream active for viewers (Step 6):

After the RTMP stream starts flowing to Mux, there's a delay before viewers can watch. The workflow uses a dual-check race to detect this:

  1. HLS manifest check (every 1.5 seconds): HEAD https://stream.mux.com/{playbackId}.m3u8 — returns 200 when the stream is viewable
  2. Mux API check (every 5 seconds): retrieves stream status, checks for "active" state

Whichever succeeds first triggers the playback status update. Maximum wait: 5 minutes. This runs as a background task (ctx.waitUntil) so it doesn't block the workflow response.
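
A sketch of that dual check under the same assumptions, using Promise.any so the first successful check wins; the intervals match the prose above, the rest is illustrative:

import Mux from "@mux/mux-node";

// Hypothetical sketch of the Step 6 dual check: the first successful check wins
async function waitForViewable(mux: Mux, playbackId: string, streamId: string) {
  const deadline = Date.now() + 5 * 60_000; // 5-minute maximum wait
  const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

  const hlsCheck = (async () => {
    while (Date.now() < deadline) {
      const res = await fetch(`https://stream.mux.com/${playbackId}.m3u8`, { method: "HEAD" });
      if (res.ok) return "hls_manifest"; // 200 means the stream is viewable
      await sleep(1_500);
    }
    throw new Error("HLS manifest never became available");
  })();

  const apiCheck = (async () => {
    while (Date.now() < deadline) {
      const stream = await mux.video.liveStreams.retrieve(streamId);
      if (stream.status === "active") return "mux_api";
      await sleep(5_000);
    }
    throw new Error("Mux never reported the stream as active");
  })();

  // Resolves with whichever check succeeds first; rejects only if both time out
  return Promise.any([hlsCheck, apiCheck]);
}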

Pitfall: The workflow runs inside blockConcurrencyWhile() for the start sequence. If the workflow takes too long (e.g., Mux is slow to respond), the Durable Object will hold off all other requests. The polling timeouts are tuned to prevent this from exceeding a few minutes, but under extreme API latency, the DO could become unresponsive. The Coordinator's health check alarm provides a safety net.

Stop Flow

Stopping is simpler but has its own complexity around parallel service shutdown:

sequenceDiagram
    participant Mod as Moderator
    participant WF as Workflow DO
    participant LK as LiveKit
    participant Mux as Mux
    participant DB as Workers D1
    participant Event as Event DO

    Mod->>WF: stop()
    WF->>DB: Update status: "stopping"
    WF->>Event: Broadcast snapshot (stopping)

    par Stop services in parallel
        WF->>LK: stopEgress(egressId)
        WF->>Mux: disableStream(streamId)
    end

    alt Graceful stop failed
        WF->>LK: Force stop egress
        WF->>Mux: Force disable stream
    end

    WF->>WF: Verify both services stopped
    WF->>Mux: Retrieve final asset data
    WF->>DB: Update: status "stopped", asset IDs

    WF->>WF: Generate participant clips
    WF->>WF: Release coordinator lock
    WF->>Event: Broadcast final snapshot

Services are stopped with Promise.allSettled() — if one fails, the other still proceeds. A second pass force-stops any service that didn't respond to the graceful stop. The workflow then collects the final Mux asset data (the recorded video is automatically saved as a Mux asset when the stream ends) and generates participant clips.
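
A sketch of the parallel shutdown under those assumptions (the LiveKit EgressClient and Mux SDK calls are real; the retry-as-force-stop shape is illustrative):

import Mux from "@mux/mux-node";
import { EgressClient } from "livekit-server-sdk";

// Hypothetical sketch of the parallel shutdown with a force-stop second pass
async function stopServices(
  egressClient: EgressClient,
  muxClient: Mux,
  session: { lkEgressId: string; muxStreamId: string },
) {
  const [lkResult, muxResult] = await Promise.allSettled([
    egressClient.stopEgress(session.lkEgressId),               // LiveKit graceful stop
    muxClient.video.liveStreams.disable(session.muxStreamId),  // Mux stops accepting RTMP
  ]);

  // Second pass: retry whichever side ignored the graceful request, swallowing
  // errors so the remaining stop steps (asset collection, clips) still run
  if (lkResult.status === "rejected") {
    await egressClient.stopEgress(session.lkEgressId).catch(() => {});
  }
  if (muxResult.status === "rejected") {
    await muxClient.video.liveStreams.disable(session.muxStreamId).catch(() => {});
  }
}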

Error Handling and Cleanup

If any step in the start flow fails, the workflow calls cleanupFailedWorkflow():

  1. If the Mux stream was created, disables it
  2. If the LiveKit egress was started, stops it
  3. If a DB session exists with no external resource IDs, deletes it
  4. If a DB session has resource IDs, marks it as "failed" (for debugging)
  5. Releases the Coordinator lock
  6. Notifies the Event DO with the failure status

The Coordinator's alarm-based health check provides an additional safety net. If the workflow crashes entirely (DO runtime failure), the Coordinator detects the orphaned lock and cleans it up within 30 seconds.
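
A sketch of how that ordering might look inside cleanupFailedWorkflow(), written as a method fragment with illustrative state and helper names:

// Hypothetical sketch of cleanupFailedWorkflow(): best-effort, external resources first
async cleanupFailedWorkflow(error: unknown) {
  const s = this.workflowState; // whatever partial state the failed start left behind

  // 1-2. Stop external resources so nothing keeps streaming after a failure
  if (s.muxStreamId) {
    await this.muxClient.video.liveStreams.disable(s.muxStreamId).catch(() => {});
  }
  if (s.lkEgressId) {
    await this.egressClient.stopEgress(s.lkEgressId).catch(() => {});
  }

  // 3-4. DB record: delete it if it never got external IDs, otherwise keep it as "failed"
  if (s.sessionId) {
    if (!s.muxStreamId && !s.lkEgressId) {
      await this.deleteSession(s.sessionId);
    } else {
      await this.markSessionFailed(s.sessionId, error);
    }
  }

  // 5-6. Free the room and tell connected clients what happened
  await this.coordinator.releaseLock(s.eventId, s.roomId);
  await this.eventDO.broadcastRecordingFailed(s.sessionId, error);
}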

The Composite Page

LiveKit Web Egress loads a URL in a headless browser. That URL is the composite page — a Next.js route that renders the video layout as a 1920x1080 canvas.

Route: apps/webapp/src/app/event/[eventId]/room/[roomId]/composite/page.tsx

URL pattern: /event/{eventId}/room/{roomId}/composite?layout=one-up

How It Works

graph LR
    subgraph "LiveKit Egress (Headless Browser)"
        Browser["Chromium"]
        Browser --> Page["Composite Page"]
    end

    subgraph "Next.js Server"
        Page --> Server["Server Component"]
        Server --> Token["Generate compositor token"]
    end

    subgraph "Client-Side React"
        Page --> Client["CompositeClient"]
        Client --> LKRoom["LiveKitRoom component"]
        LKRoom --> Template["Template renderer"]
        Template --> Layout["Layout component"]
    end

    subgraph "LiveKit Cloud"
        LKRoom -.->|WebRTC| LKServer["LiveKit Server"]
    end

    Layout --> Canvas["1920x1080 Canvas"]
    Canvas --> RTMP["RTMP Output → Mux"]

Server Component (apps/webapp/src/components/livekit/composite/Composite.tsx):

  • Fetches a LiveKit access token via tRPC with joinAs: "compositor"
  • The compositor identity is special — it joins the room as a subscriber only, not a publisher
  • Passes the token, LiveKit host URL, and layout name to the client component

Client Component (apps/webapp/src/components/livekit/composite/CompositeClient.tsx):

  • Creates a <LiveKitRoom> connection using the compositor token
  • If already inside an existing LiveKit room context (embedded preview), reuses that connection
  • Uses useTransformChildToParentRectViaScale to scale the 1920x1080 canvas to fit any container size

Template (apps/webapp/src/components/livekit/composite/Template.tsx):

  • Renders a fixed 1920x1080 pixel canvas (matching the egress encoding resolution)
  • Selects a layout based on the ?layout= URL parameter or automatic detection via useDetermineOptimalLayout()
  • Can display a background image from the event's backgroundImageId
  • On mount, emits console.log("START_RECORDING") — this is a signal that LiveKit Egress watches for to know the page is ready to capture
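
A sketch of that readiness signal and the fixed canvas, assuming a simplified client component (the real Template also handles layout selection and background images):

// Hypothetical sketch: the fixed canvas plus the egress readiness signal
"use client";

import { useEffect, type ReactNode } from "react";

export function TemplateSketch({ children }: { children: ReactNode }) {
  useEffect(() => {
    // LiveKit Web Egress watches the page's console output for this exact
    // string before it starts capturing, so emit it once the layout has mounted
    console.log("START_RECORDING");
  }, []);

  // Fixed 1920x1080 canvas, matching the H264_1080P_30 egress preset
  return (
    <div style={{ width: 1920, height: 1080, position: "relative" }}>
      {children}
    </div>
  );
}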

Layout System

The composite page supports multiple layouts, each rendering participants in a different arrangement:

Layout        When used         Arrangement
one-up        1 participant     Single speaker, full frame
two-up        2 participants    Side by side
three-up      3 participants    Various configurations
four-up       4 participants    2x2 grid
pin-two-up    1 pinned + 1      Large pinned speaker + small thumbnail
pin-three-up  1 pinned + 2      Large pinned speaker + 2 thumbnails
pin-four-up   1 pinned + 3      Large pinned speaker + 3 thumbnails

Layout files live in apps/webapp/src/components/livekit/composite/layouts/.

Pin layout (pin.tsx) is the most complex:

  • Main pinned track: 1373x816 pixels (close to 16:9, positioned on the left)
  • Non-pinned tracks: 307x307 pixels (stacked vertically on the right)
  • CSS Grid with explicit regions for flexible spacing around the content
  • Uses useLiveParticipantTracks() to get track references from the LiveKit room

Automatic layout selection (useDetermineOptimalLayout):

  • Counts the number of active participants with video tracks
  • If any track is pinned, selects the appropriate pin-* variant
  • Otherwise, maps participant count to the basic layout (1 → one-up, 2 → two-up, etc.)
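
A sketch of that mapping, treating the hook as a pure function over hypothetical inputs (videoTrackCount, hasPinnedTrack); the real hook reads these from LiveKit room state:

// Hypothetical sketch of the selection logic behind useDetermineOptimalLayout()
type LayoutName =
  | "one-up" | "two-up" | "three-up" | "four-up"
  | "pin-two-up" | "pin-three-up" | "pin-four-up";

function determineOptimalLayout(videoTrackCount: number, hasPinnedTrack: boolean): LayoutName {
  const count = Math.max(1, Math.min(videoTrackCount, 4)); // clamp to the supported range

  if (hasPinnedTrack && count >= 2) {
    // One large pinned speaker plus (count - 1) thumbnails
    if (count === 2) return "pin-two-up";
    if (count === 3) return "pin-three-up";
    return "pin-four-up";
  }

  if (count === 1) return "one-up";
  if (count === 2) return "two-up";
  if (count === 3) return "three-up";
  return "four-up";
}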

Pitfall: The composite page runs inside LiveKit's headless Chromium browser, not in a normal browser window. This means no user interaction is possible — no clicks, no scrolls. The page must render the correct layout purely from the LiveKit room state. If the layout logic depends on user interaction (like a manual pin toggle), that state must come from the Event DO's snapshot, not from local UI.

Participant Clips

After recording stops, the workflow generates individual video clips for each participant based on when they were on the live stage.

Stage Event Tracking

During a recording, every time a participant enters or exits the live stage, a stage event is written to the workers database:

-- Stage events table (apps/workers/src/drizzle/schema.ts)
CREATE TABLE stage_events (
  id TEXT PRIMARY KEY,
  session_id TEXT REFERENCES cloudcaster_sessions(id),
  event_id TEXT NOT NULL,
  room_id TEXT NOT NULL,
  participant_id TEXT NOT NULL,
  user_id TEXT,
  event_type TEXT NOT NULL,  -- "entered_live" | "exited_live"
  timestamp TEXT NOT NULL,
  time_offset_seconds REAL NOT NULL,  -- seconds since recording started
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

The time_offset_seconds field is the key metric — it records how many seconds after the recording started this event occurred. This offset maps directly to a position in the Mux recording.

When recording starts, the workflow calls logInitialStageParticipants() to snapshot everyone currently on stage with time_offset_seconds: 0.
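
A sketch of how a stage event row might be derived from the recording start time (field names mirror the schema above; the helper itself is illustrative):

// Hypothetical sketch: derive a stage_events row from the recording start time
function buildStageEvent(params: {
  sessionId: string;
  eventId: string;
  roomId: string;
  participantId: string;
  userId?: string;
  eventType: "entered_live" | "exited_live";
  recordingStartedAt: Date; // when RTMP capture began
  occurredAt?: Date;        // defaults to "now"
}) {
  const occurredAt = params.occurredAt ?? new Date();

  // Seconds since the recording started; 0 for participants already on stage
  // when logInitialStageParticipants() runs
  const timeOffsetSeconds = Math.max(
    0,
    (occurredAt.getTime() - params.recordingStartedAt.getTime()) / 1000,
  );

  return {
    id: crypto.randomUUID(),
    session_id: params.sessionId,
    event_id: params.eventId,
    room_id: params.roomId,
    participant_id: params.participantId,
    user_id: params.userId ?? null,
    event_type: params.eventType,
    timestamp: occurredAt.toISOString(),
    time_offset_seconds: timeOffsetSeconds,
  };
}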

Clip Generation Flow

After the recording stops and the Mux asset is created:

graph TD
    A[Fetch all stage events for session] --> B[Group by participantId]
    B --> C[For each participant: match enter/exit pairs]
    C --> D{Valid time range?}
    D -->|Yes| E[Create Mux clip asset]
    D -->|No| F[Skip - too short or invalid]
    E --> G[Insert participantClips record]
    G --> H[Poll Mux until clip ready]

Event pairing (matchEventPairs in cloudcaster-workflow.ts):

The algorithm iterates through a participant's events chronologically:

  • An entered_live event starts a range
  • The next exited_live event ends it
  • Edge case — duplicate enters: If two entered_live events appear without an exit between them, the first is implicitly closed at the second's timestamp
  • Edge case — orphaned exit: An exited_live without a preceding enter is skipped
  • Edge case — still on stage: If the recording ends while a participant is still on stage, the session's stop time is used as the exit timestamp
  • All offsets are clamped to [0, maxDuration] to stay within the recording's bounds
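
A sketch of that pairing logic with the edge cases handled inline (types and the clamping helper are illustrative; the real matchEventPairs may differ in detail):

// Hypothetical sketch of enter/exit pairing for one participant's events
type StageEvent = { eventType: "entered_live" | "exited_live"; timeOffsetSeconds: number };
type ClipRange = { startOffset: number; endOffset: number };

function matchEventPairs(events: StageEvent[], maxDuration: number): ClipRange[] {
  const clamp = (t: number) => Math.min(Math.max(t, 0), maxDuration);
  const sorted = [...events].sort((a, b) => a.timeOffsetSeconds - b.timeOffsetSeconds);

  const ranges: ClipRange[] = [];
  let openStart: number | null = null;

  for (const ev of sorted) {
    if (ev.eventType === "entered_live") {
      if (openStart !== null) {
        // Duplicate enter: implicitly close the previous range at this timestamp
        ranges.push({ startOffset: clamp(openStart), endOffset: clamp(ev.timeOffsetSeconds) });
      }
      openStart = ev.timeOffsetSeconds;
    } else {
      if (openStart === null) continue; // orphaned exit: skip
      ranges.push({ startOffset: clamp(openStart), endOffset: clamp(ev.timeOffsetSeconds) });
      openStart = null;
    }
  }

  // Still on stage when the recording ended: close at the recording's stop time
  if (openStart !== null) {
    ranges.push({ startOffset: clamp(openStart), endOffset: maxDuration });
  }

  return ranges.filter((r) => r.endOffset > r.startOffset); // drop empty or invalid ranges
}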

Mux clip creation:

const clipAsset = await mux.video.assets.create({
  inputs: [{
    url: `mux://assets/${muxAssetId}`,      // Source: the full recording
    start_time: range.startOffset,           // Seconds from start
    end_time: range.endOffset,               // Seconds from start
  }],
  playback_policy: ["public"],
  passthrough: JSON.stringify({
    sessionId, participantId, userId, clipId, timestamp,
  }),
});

Mux creates a new asset by extracting a time range from the parent recording. The passthrough field lets you match the Mux asset back to the database record.

Clip database record:

-- Participant clips table (apps/workers/src/drizzle/schema.ts)
CREATE TABLE participant_clips (
  id TEXT PRIMARY KEY,
  session_id TEXT REFERENCES cloudcaster_sessions(id),
  participant_id TEXT NOT NULL,
  user_id TEXT,
  start_event_id TEXT REFERENCES stage_events(id),
  end_event_id TEXT REFERENCES stage_events(id),
  start_time_offset REAL NOT NULL,
  end_time_offset REAL NOT NULL,
  duration_seconds REAL NOT NULL,
  mux_clip_asset_id TEXT,
  mux_clip_playback_id TEXT,
  clip_status TEXT DEFAULT 'pending',  -- pending | processing | ready | failed
  error_message TEXT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

Pitfall: Clip generation is asynchronous. After creating the Mux clip asset, it takes time for Mux to process the source recording and extract the clip. The workflow polls for clip readiness with backoff. If the parent recording is still being processed by Mux when clip creation is requested, the clip stays in "pending" status until the parent is ready. Long recordings (2+ hours) can take significant time to process.
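
A sketch of that readiness poll, assuming the Mux SDK's assets.retrieve() call; the backoff values and error handling are illustrative:

import Mux from "@mux/mux-node";

// Hypothetical sketch: poll the clip asset until Mux finishes processing it
async function waitForClipReady(mux: Mux, clipAssetId: string): Promise<string> {
  let delayMs = 5_000;

  for (let attempt = 0; attempt < 40; attempt++) {
    const asset = await mux.video.assets.retrieve(clipAssetId);

    if (asset.status === "ready") {
      // First public playback ID, used to fill mux_clip_playback_id
      return asset.playback_ids?.[0]?.id ?? "";
    }
    if (asset.status === "errored") {
      throw new Error(`Clip ${clipAssetId} failed to process in Mux`);
    }

    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs = Math.min(delayMs * 1.5, 60_000); // back off, cap at one minute
  }

  throw new Error(`Clip ${clipAssetId} not ready in time`); // DB row stays pending/processing
}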

Session Database Schema

The recording session itself is tracked in the workers D1 database:

-- Cloudcaster sessions table (apps/workers/src/drizzle/schema.ts)
CREATE TABLE cloudcaster_sessions (
  id TEXT PRIMARY KEY,            -- Same as workflow instance ID
  event_id TEXT NOT NULL,
  room_id TEXT NOT NULL,
  recording_status TEXT,          -- starting | started | stopping | stopped | failed
  playback_status TEXT,           -- starting | started | stopping | stopped | failed
  mux_stream_id TEXT,
  mux_stream_key TEXT,
  mux_playback_id TEXT,
  mux_asset_id TEXT,              -- Set after recording stops
  mux_asset_playback_id TEXT,     -- Playback ID for the recorded asset
  lk_egress_id TEXT,
  egress_duration_seconds REAL,   -- Total recording duration
  error_message TEXT,
  started_at TEXT,
  stopped_at TEXT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

Status transitions:

recording: starting → started → stopping → stopped
               │          │          │
               └──────────┴──────────┴──→ failed

The playbackStatus follows a similar pattern but is independent — the Mux stream might be active (RTMP receiving data) before it's playable (HLS manifest available).

How the Event DO Uses Recording State

The Event DO doesn't manage recording directly — it delegates to the Coordinator and Workflow. But it includes recording state in its snapshots so all connected clients know what's happening.

The Event DO:

  1. Receives broadcast updates from the Workflow at each step (starting, Mux stream created, egress started, recording active, stopping, stopped, failed)
  2. Includes recording state in snapshots — when connected clients receive a new snapshot, they see the current recordingStatus, playbackStatus, and muxPlaybackId
  3. Queries the Coordinator when clients reconnect — to get the current lock state and session data
  4. Provides stage participant data — the Workflow queries the Event DO's getStageParticipants() method to know who is currently on stage for initial stage event logging

This separation means the Event DO stays focused on presence and state broadcasting. The recording orchestration complexity lives entirely in the Workflow and Coordinator.
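
A sketch of the recording slice such a snapshot might carry (field names follow the prose above; the actual snapshot type may differ):

// Hypothetical sketch of the recording-related fields in an Event DO snapshot
type RecordingSnapshotSlice = {
  sessionId?: string;
  recordingStatus?: "starting" | "started" | "stopping" | "stopped" | "failed";
  playbackStatus?: "starting" | "started" | "stopping" | "stopped" | "failed";
  muxPlaybackId?: string; // present once the Mux stream exists
};

// Client side: once playback is available, viewers can build the HLS URL directly
const hlsUrl = (slice: RecordingSnapshotSlice) =>
  slice.muxPlaybackId ? `https://stream.mux.com/${slice.muxPlaybackId}.m3u8` : null;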

Architect's Note: The three-DO architecture (Event, Coordinator, Workflow) was chosen over a monolithic approach because recording failures should never crash the Event DO. If the Workflow throws an unhandled exception, the Event DO keeps running — presence tracking, chat, and stage management continue unaffected. The Coordinator detects the orphaned Workflow via its health check alarm and cleans up.