AGENTS.md - Litestream AI Agent Documentation

This document provides comprehensive guidance for AI agents working with the Litestream codebase. Read this document carefully before making any modifications.

Overview

Litestream is a disaster recovery tool for SQLite that runs as a background process and safely replicates changes incrementally to various storage backends. It monitors SQLite's Write-Ahead Log (WAL), converts changes to an immutable LTX format, and replicates these to configured destinations.

Current Architecture Highlights:

  • LTX Format: Page-level replication format replaces direct WAL mirroring
  • Multi-level Compaction: Hierarchical compaction keeps storage efficient (30s → 5m → 1h → snapshots)
  • Single Replica Constraint: Each database is replicated to exactly one remote destination
  • Pure Go Build: Uses modernc.org/sqlite, so no CGO dependency for the main binary
  • Optional NATS JetStream Support: Additional replica backend alongside S3/GCS/ABS/OSS/File/SFTP
  • Snapshot Compatibility: Only LTX-based backups are supported—keep legacy v0.3.x binaries to restore old WAL snapshots

Key Design Principles:

  • Non-invasive: Uses only SQLite API, never directly manipulates database files
  • Incremental: Replicates only changes, not full databases
  • Single-destination: Exactly one replica destination per database
  • Eventually Consistent: Handles storage backends with eventual consistency
  • Safe: Maintains long-running read transactions for consistency

Fundamental Concepts

CRITICAL: Understanding SQLite internals and the LTX format is essential for working with Litestream.

Required Reading

  1. SQLite Internals - Understand WAL, pages, transactions, and the 1GB lock page
  2. LTX Format - Learn the custom replication format Litestream uses

Key SQLite Concepts

  • WAL (Write-Ahead Log): Append-only file where SQLite records committed changes before they are checkpointed into the main database
  • Pages: Fixed-size blocks (typically 4KB) that make up the database
  • Lock Page at 1GB: Special page at 0x40000000 that MUST be skipped
  • Checkpoints: Process of merging WAL back into main database
  • Transaction Isolation: Long-running read transaction for consistency
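
As a quick orientation, here is a standalone sketch (not Litestream code) of these pieces from the application side, using the pure-Go modernc.org/sqlite driver that Litestream's default build relies on. The driver name "sqlite" and the app.db path are assumptions for illustration.

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "modernc.org/sqlite" // registers the "sqlite" driver (pure Go, no CGO)
)

func main() {
    db, err := sql.Open("sqlite", "app.db")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // The page size determines which page is the reserved lock page (see below).
    var pageSize int
    if err := db.QueryRow(`PRAGMA page_size`).Scan(&pageSize); err != nil {
        log.Fatal(err)
    }

    // A passive checkpoint merges WAL frames back into the main database
    // without blocking readers or writers.
    var busy, walFrames, checkpointed int
    if err := db.QueryRow(`PRAGMA wal_checkpoint(PASSIVE)`).Scan(&busy, &walFrames, &checkpointed); err != nil {
        log.Fatal(err)
    }
    fmt.Println("page size:", pageSize, "checkpointed frames:", checkpointed)
}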

Key LTX Concepts

  • Immutable Files: Once written, LTX files are never modified
  • TXID Ranges: Each file covers a range of transaction IDs
  • Page Index: Binary search tree for efficient page lookup
  • Compaction Levels: Time-based merging to reduce storage (30s → 5min → 1hr)
  • Checksums: CRC-64 integrity verification at multiple levels
  • CLI Command: Use litestream ltx (not wal) for LTX operations

The Replication Flow

graph LR
    App[Application] -->|SQL| SQLite
    SQLite -->|Writes| WAL[WAL File]
    WAL -->|Monitor| Litestream
    Litestream -->|Convert| LTX[LTX Format]
    LTX -->|Upload| Storage[Cloud Storage]
    Storage -->|Restore| Database[New Database]

Core Architecture

graph TB
    subgraph "SQLite Layer"
        SQLite[SQLite Database]
        WAL[WAL File]
        SQLite -->|Writes| WAL
    end

    subgraph "Litestream Core"
        DB[DB Component<br/>db.go]
        Replica[Replica Manager<br/>replica.go]
        Store[Store<br/>store.go]

        DB -->|Manages| Replica
        Store -->|Coordinates| DB
    end

    subgraph "Storage Layer"
        RC[ReplicaClient Interface<br/>replica_client.go]
        S3[S3 Client]
        GCS[GCS Client]
        File[File Client]
        SFTP[SFTP Client]

        Replica -->|Uses| RC
        RC -->|Implements| S3
        RC -->|Implements| GCS
        RC -->|Implements| File
        RC -->|Implements| SFTP
    end

    WAL -->|Monitor Changes| DB
    DB -->|Checkpoint| SQLite

Data Flow Sequence

sequenceDiagram
    participant App
    participant SQLite
    participant WAL
    participant DB
    participant Replica
    participant Storage

    App->>SQLite: Write Transaction
    SQLite->>WAL: Append Changes

    loop Monitor (1s interval)
        DB->>WAL: Check Size/Changes
        WAL-->>DB: Current State

        alt WAL Has Changes
            DB->>WAL: Read Pages
            DB->>DB: Convert to LTX Format
            DB->>Replica: Queue LTX File

            loop Sync (configurable interval)
                Replica->>Storage: WriteLTXFile()
                Storage-->>Replica: FileInfo
                Replica->>Replica: Update Position
            end
        end
    end

    alt Checkpoint Needed
        DB->>SQLite: PRAGMA wal_checkpoint
        SQLite->>WAL: Merge to Main DB
    end

Critical Concepts

1. SQLite Lock Page at 1GB Boundary ⚠️

CRITICAL: SQLite reserves a special lock page at exactly 1GB (0x40000000 bytes).

// db.go:951-953 - Must skip lock page during replication
lockPgno := ltx.LockPgno(pageSize)  // Page number varies by page size
if pgno == lockPgno {
    continue // Skip this page - it's reserved by SQLite
}

Lock Page Numbers by Page Size:

  • 4KB pages: 262145 (most common)
  • 8KB pages: 131073
  • 16KB pages: 65537
  • 32KB pages: 32769
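
These values follow directly from SQLite's pending-byte lock, which sits at byte offset 0x40000000: the lock page is simply the page containing that byte. A minimal illustrative sketch (Litestream already gets this value from ltx.LockPgno, so this is for understanding only):

// lockPgno returns the 1-based number of the page containing SQLite's
// pending-byte lock at offset 0x40000000.
// For 4KB pages: 0x40000000/4096 + 1 = 262145.
func lockPgno(pageSize uint32) uint32 {
    const pendingByte = 0x40000000
    return pendingByte/pageSize + 1
}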

Testing Requirement: Any changes affecting page iteration MUST be tested with >1GB databases.

2. LTX File Format

LTX (Log Transaction) files are immutable, append-only files containing:

  • Header with transaction IDs (MinTXID, MaxTXID)
  • Page data with checksums
  • Page index for efficient seeking
  • Trailer with metadata

Important: LTX files are NOT SQLite WAL files - they're a custom format for efficient replication.
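
A conceptual sketch of what an LTX file carries (field names and layout here are illustrative only and do NOT match the on-disk encoding; see docs/LTX_FORMAT.md for the authoritative specification):

// Conceptual model only - not the real wire format.
type ltxFileConcept struct {
    Header struct {
        MinTXID, MaxTXID uint64 // transaction ID range covered by this file
        PageSize         uint32
    }
    Pages []struct {
        Pgno     uint32 // page number within the database
        Data     []byte // page contents
        Checksum uint64 // per-page integrity check
    }
    PageIndex []byte // supports efficient seeking to individual pages
    Trailer   struct {
        Checksum uint64 // CRC-64 over the file
    }
}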

3. Compaction Process

Compaction merges multiple LTX files to reduce storage overhead:

flowchart LR
    subgraph "Level 0 (Raw)"
        L0A[0000000001-0000000100.ltx]
        L0B[0000000101-0000000200.ltx]
        L0C[0000000201-0000000300.ltx]
    end

    subgraph "Level 1 (30 seconds)"
        L1[0000000001-0000000300.ltx]
    end

    subgraph "Level 2 (5 minutes)"
        L2[0000000001-0000001000.ltx]
    end

    subgraph "Level 3 (1 hour)"
        L3[0000000001-0000002000.ltx]
    end

    subgraph "Snapshot (24h)"
        Snap[snapshot.ltx]
    end

    L0A -->|Merge| L1
    L0B -->|Merge| L1
    L0C -->|Merge| L1
    L1 -->|30s window| L2
    L2 -->|5min window| L3
    L3 -->|Hourly| Snap

Critical Compaction Rule: When compacting with eventually consistent storage:

// db.go:1280-1294 - ALWAYS read from local disk when available
f, err := os.Open(db.LTXPath(info.Level, info.MinTXID, info.MaxTXID))
if err == nil {
    // Use local file - it's complete and consistent
    return f, nil
}
// Only fall back to remote if local doesn't exist
return replica.Client.OpenLTXFile(...)

4. Eventual Consistency Handling

Many storage backends (S3, R2, etc.) are eventually consistent. This means:

  • A file you just wrote might not be immediately readable
  • A file might be listed but only partially available
  • Reads might return stale or incomplete data

Solution: Always prefer local files during compaction.

Architectural Boundaries and Patterns

CRITICAL: Understanding proper architectural boundaries is essential for successful contributions.

Layer Responsibilities

graph TB
    subgraph "DB Layer (db.go)"
        DBInit[DB.init&#40;&#41;]
        DBPos[DB position tracking]
        DBRestore[Database state validation]
        DBSnapshot[Snapshot triggering via verify&#40;&#41;]
    end

    subgraph "Replica Layer (replica.go)"
        ReplicaStart[Replica.Start&#40;&#41;]
        ReplicaSync[Sync operations]
        ReplicaPos[Replica position tracking]
        ReplicaClient[Storage interaction]
    end

    subgraph "Storage Layer"
        S3[S3/GCS/Azure]
        LTXFiles[LTX Files]
    end

    DBInit -->|Initialize| ReplicaStart
    DBInit -->|Check positions| DBPos
    DBInit -->|Validate state| DBRestore
    ReplicaStart -->|Focus on replication only| ReplicaSync
    ReplicaSync -->|Upload/Download| ReplicaClient
    ReplicaClient -->|Read/Write| S3
    S3 -->|Store| LTXFiles

DO: Handle database state in DB layer

Principle: Database restoration logic belongs in the DB layer, not the Replica layer.

Pattern: When the database is behind the replica (local TXID < remote TXID):

  1. Clear local L0 cache: Remove the entire L0 directory and recreate it

    • Use os.RemoveAll() on the L0 directory path
    • Recreate with proper permissions using internal.MkdirAll()
  2. Fetch latest L0 file from replica: Download the most recent L0 LTX file

    • Call replica.Client.OpenLTXFile() with the remote min/max TXID
    • Stream the file contents (don't load into memory)
  3. Write using atomic file operations: Prevent partial/corrupted files

    • Write to temporary file with .tmp suffix
    • Call Sync() to ensure data is on disk
    • Atomically rename temp file to final path

Why this matters: If the database state is not synchronized before replication starts, the system will attempt to apply WAL segments that are ahead of the database's current position, leading to restore failures.

Reference Implementation: See DB.checkDatabaseBehindReplica() in db.go:670-737
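
A condensed sketch of those three steps (illustrative only - helper names such as db.l0Dir() are hypothetical, and the authoritative logic is DB.checkDatabaseBehindReplica() in db.go):

// syncFromReplica clears the local L0 cache, streams the latest L0 LTX file
// from the replica, and writes it atomically.
func (db *DB) syncFromReplica(ctx context.Context, minTXID, maxTXID ltx.TXID) error {
    // 1. Clear the local L0 cache and recreate the directory.
    dir := db.l0Dir() // hypothetical helper for the L0 directory path
    if err := os.RemoveAll(dir); err != nil {
        return fmt.Errorf("remove l0 dir: %w", err)
    }
    if err := os.MkdirAll(dir, 0o755); err != nil { // repo code uses internal.MkdirAll
        return fmt.Errorf("recreate l0 dir: %w", err)
    }

    // 2. Stream the latest L0 LTX file from the replica (never buffer it in memory).
    rc, err := db.replica.Client.OpenLTXFile(ctx, 0, minTXID, maxTXID, 0, 0)
    if err != nil {
        return fmt.Errorf("open remote ltx: %w", err)
    }
    defer rc.Close()

    // 3. Write atomically: temp file, Sync, then rename.
    finalPath := db.LTXPath(0, minTXID, maxTXID)
    tmpPath := finalPath + ".tmp"
    f, err := os.Create(tmpPath)
    if err != nil {
        return fmt.Errorf("create temp: %w", err)
    }
    defer os.Remove(tmpPath) // no-op once the rename succeeds

    if _, err := io.Copy(f, rc); err != nil {
        f.Close()
        return fmt.Errorf("copy ltx: %w", err)
    }
    if err := f.Sync(); err != nil {
        f.Close()
        return fmt.Errorf("sync: %w", err)
    }
    if err := f.Close(); err != nil {
        return fmt.Errorf("close: %w", err)
    }
    return os.Rename(tmpPath, finalPath)
}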

DON'T: Put database state logic in Replica layer

// WRONG - Replica should only handle replication concerns
func (r *Replica) Start() error {
    // DON'T check database state here
    if needsRestore() {  // ❌ Wrong layer!
        restoreDatabase()  // ❌ Wrong layer!
    }
    // Replica should focus only on replication mechanics
}

Atomic File Operations Pattern

CRITICAL: Always use atomic writes to prevent partial/corrupted files.

DO: Write to temp file, then rename

// CORRECT - Atomic file write pattern
func writeFileAtomic(path string, data []byte) error {
    // Create temp file in same directory (for atomic rename)
    dir := filepath.Dir(path)
    tmpFile, err := os.CreateTemp(dir, ".tmp-*")
    if err != nil {
        return fmt.Errorf("create temp file: %w", err)
    }
    tmpPath := tmpFile.Name()

    // Clean up temp file on error
    defer func() {
        if tmpFile != nil {
            tmpFile.Close()
            os.Remove(tmpPath)
        }
    }()

    // Write data to temp file
    if _, err := tmpFile.Write(data); err != nil {
        return fmt.Errorf("write temp file: %w", err)
    }

    // Sync to ensure data is on disk
    if err := tmpFile.Sync(); err != nil {
        return fmt.Errorf("sync temp file: %w", err)
    }

    // Close before rename
    if err := tmpFile.Close(); err != nil {
        return fmt.Errorf("close temp file: %w", err)
    }
    tmpFile = nil // Prevent defer cleanup

    // Atomic rename (on same filesystem)
    if err := os.Rename(tmpPath, path); err != nil {
        os.Remove(tmpPath)
        return fmt.Errorf("rename to final path: %w", err)
    }

    return nil
}

DON'T: Write directly to final location

// WRONG - Can leave partial files on failure
func writeFileDirect(path string, data []byte) error {
    return os.WriteFile(path, data, 0644)  // ❌ Not atomic!
}

Error Handling Patterns

DO: Return errors immediately

// CORRECT - Return error for caller to handle
func (db *DB) validatePosition() error {
    dpos, err := db.Pos()
    if err != nil {
        return err
    }
    rpos := db.replica.Pos()
    if dpos.TXID < rpos.TXID {
        return fmt.Errorf("database position (%v) behind replica (%v)", dpos, rpos)
    }
    return nil
}

DON'T: Continue on critical errors

// WRONG - Silently continuing can cause data corruption
func (db *DB) validatePosition() {
    if dpos, _ := db.Pos(); dpos.TXID < replica.Pos().TXID {
        log.Printf("warning: position mismatch")  // ❌ Don't just log!
        // Continuing here is dangerous
    }
}

Leveraging Existing Mechanisms

DO: Use verify() for snapshot triggering

// CORRECT - Leverage existing snapshot mechanism
func (db *DB) ensureSnapshot() error {
    // Use existing verify() which already handles snapshot logic
    if err := db.verify(); err != nil {
        return fmt.Errorf("verify for snapshot: %w", err)
    }
    // verify() will trigger snapshot if needed
    return nil
}

DON'T: Reimplement existing functionality

// WRONG - Don't recreate what already exists
func (db *DB) customSnapshot() error {
    // ❌ Don't write custom snapshot logic
    // when verify() already does this correctly
}

Common Pitfalls

DON'T: Mix architectural concerns

// WRONG - Database state logic in Replica layer
func (r *Replica) Start() error {
    if db.needsRestore() {  // ❌ Wrong layer for DB state!
        r.restoreDatabase()  // ❌ Replica shouldn't manage DB state!
    }
    return r.sync()
}

DO: Keep concerns in proper layers

// CORRECT - Each layer handles its own concerns
func (db *DB) init() error {
    // DB layer handles database state
    if db.needsRestore() {
        if err := db.restore(); err != nil {
            return err
        }
    }
    // Then start replica for replication only
    return db.replica.Start()
}

func (r *Replica) Start() error {
    // Replica focuses only on replication
    return r.startSync()
}

DON'T: Read from remote during compaction

// WRONG - Can get partial/corrupt data
f, err := client.OpenLTXFile(ctx, level, minTXID, maxTXID, 0, 0)

DO: Read from local when available

// CORRECT - Check local first
if f, err := os.Open(localPath); err == nil {
    defer f.Close()
    // Use local file
} else {
    // Fall back to remote only if necessary
}

DON'T: Use RLock for write operations

// WRONG - Race condition in replica.go:217
r.mu.RLock()  // Should be Lock() for writes
defer r.mu.RUnlock()
r.pos = pos   // Writing with RLock!

DO: Use proper lock types

// CORRECT
r.mu.Lock()
defer r.mu.Unlock()
r.pos = pos

DON'T: Ignore CreatedAt preservation

// WRONG - Loses timestamp granularity
info := &ltx.FileInfo{
    CreatedAt: time.Now(), // Don't use current time
}

DO: Preserve earliest timestamp

// CORRECT - Preserve temporal information
info, err := replica.Client.WriteLTXFile(ctx, level, minTXID, maxTXID, r)
if err != nil {
    return fmt.Errorf("write ltx: %w", err)
}
info.CreatedAt = oldestSourceFile.CreatedAt

DON'T: Write files without atomic operations

// WRONG - Can leave partial files on failure
func saveLTXFile(path string, data []byte) error {
    return os.WriteFile(path, data, 0644)  // ❌ Not atomic!
}

DO: Use atomic write pattern

// CORRECT - Write to temp, then rename
func saveLTXFileAtomic(path string, data []byte) error {
    tmpPath := path + ".tmp"
    if err := os.WriteFile(tmpPath, data, 0644); err != nil {
        return err
    }
    return os.Rename(tmpPath, path)  // Atomic on same filesystem
}

DON'T: Ignore errors and continue

// WRONG - Continuing after error can corrupt state
func (db *DB) processFiles() {
    for _, file := range files {
        if err := processFile(file); err != nil {
            log.Printf("error: %v", err)  // ❌ Just logging!
            // Continuing to next file is dangerous
        }
    }
}

DO: Return errors for proper handling

// CORRECT - Let caller decide how to handle errors
func (db *DB) processFiles() error {
    for _, file := range files {
        if err := processFile(file); err != nil {
            return fmt.Errorf("process file %s: %w", file, err)
        }
    }
    return nil
}

DON'T: Recreate existing functionality

// WRONG - Don't reimplement what already exists
func customSnapshotTrigger() {
    // Complex custom logic to trigger snapshots
    // when db.verify() already does this!
}

DO: Leverage existing mechanisms

// CORRECT - Use what's already there
func triggerSnapshot() error {
    return db.verify()  // Already handles snapshot logic correctly
}

Component Guide

DB Component (db.go)

Responsibilities:

  • Manages SQLite database connection (via modernc.org/sqlite - no CGO)
  • Monitors WAL for changes
  • Performs checkpoints
  • Maintains long-running read transaction
  • Converts WAL pages to LTX format

Key Fields:

type DB struct {
    path     string      // Database file path
    db       *sql.DB     // SQLite connection
    rtx      *sql.Tx     // Long-running read transaction
    pageSize int         // Database page size (critical for lock page)
    notify   chan struct{} // Notifies on WAL changes
}

Initialization Sequence:

  1. Open database connection
  2. Read page size from database
  3. Initialize long-running read transaction
  4. Start monitor goroutine
  5. Initialize replicas
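
A condensed sketch of that sequence (illustrative only; the real logic lives in db.go, the "sqlite" driver name comes from modernc.org/sqlite, and db.monitor/db.replica are assumed names):

func (db *DB) openAndInit(ctx context.Context) error {
    // 1. Open the database connection (pure-Go driver, no CGO).
    sqldb, err := sql.Open("sqlite", db.path)
    if err != nil {
        return fmt.Errorf("open db: %w", err)
    }
    db.db = sqldb

    // 2. Read the page size; it determines the lock page number.
    if err := db.db.QueryRow(`PRAGMA page_size`).Scan(&db.pageSize); err != nil {
        return fmt.Errorf("read page size: %w", err)
    }

    // 3. Hold a long-running read transaction for snapshot consistency.
    if db.rtx, err = db.db.Begin(); err != nil {
        return fmt.Errorf("begin read tx: %w", err)
    }

    // 4. Start the monitor goroutine that watches the WAL for changes.
    go db.monitor(ctx) // hypothetical method name

    // 5. Initialize the replica last, once database state is settled.
    return db.replica.Start()
}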

Replica Component (replica.go)

Responsibilities:

  • Manages replication to a single destination (one replica per DB)
  • Tracks replication position (ltx.Pos)
  • Handles sync intervals
  • Manages encryption (if configured)

Key Operations:

  • Sync(): Synchronizes pending changes
  • SetPos(): Updates replication position (must use Lock, not RLock!)
  • Snapshot(): Creates full database snapshot

ReplicaClient Interface (replica_client.go)

Required Methods:

type ReplicaClient interface {
    Type() string  // Client type identifier

    // File operations
    LTXFiles(ctx context.Context, level int, seek ltx.TXID, useMetadata bool) (ltx.FileIterator, error)
    OpenLTXFile(ctx context.Context, level int, minTXID, maxTXID ltx.TXID, offset, size int64) (io.ReadCloser, error)
    WriteLTXFile(ctx context.Context, level int, minTXID, maxTXID ltx.TXID, r io.Reader) (*ltx.FileInfo, error)
    DeleteLTXFiles(ctx context.Context, files []*ltx.FileInfo) error
    DeleteAll(ctx context.Context) error
}

LTXFiles useMetadata Parameter:

  • useMetadata=true: Fetch accurate timestamps from backend metadata (required for point-in-time restores)
    • Slower but provides correct CreatedAt timestamps
    • Use when restoring to specific timestamp
  • useMetadata=false: Use fast timestamps (LastModified/ModTime) for normal operations
    • Faster enumeration, suitable for synchronization
    • Use during replication monitoring
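
For example (sketch; client is any ReplicaClient implementation and lastSeenTXID is a hypothetical ltx.TXID value):

// Point-in-time restore: accurate CreatedAt timestamps are required.
itr, err := client.LTXFiles(ctx, 0, 0, true)

// Routine replication monitoring: fast listing, seeking past files already processed.
itr, err = client.LTXFiles(ctx, 0, lastSeenTXID, false)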

Implementation Requirements:

  • Handle partial reads gracefully
  • Implement proper error types (os.ErrNotExist)
  • Support seek/offset for efficient page fetching
  • Preserve file timestamps when useMetadata=true
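
A skeleton for a new backend might look like the following (sketch only: the package name, the ltx import path, and the error messages are assumptions; see docs/REPLICA_CLIENT_GUIDE.md and the existing clients for the real conventions):

package mybackend

import (
    "context"
    "fmt"
    "io"
    "os"

    "github.com/superfly/ltx" // import path assumed
)

// ReplicaClientType identifies this backend in configuration.
const ReplicaClientType = "mybackend"

// ReplicaClient is a stub implementation of the interface above.
type ReplicaClient struct {
    // connection/config fields go here
}

func (c *ReplicaClient) Type() string { return ReplicaClientType }

func (c *ReplicaClient) LTXFiles(ctx context.Context, level int, seek ltx.TXID, useMetadata bool) (ltx.FileIterator, error) {
    // List files for the level, starting from seek. When useMetadata is true,
    // fetch accurate CreatedAt timestamps from backend metadata.
    return nil, fmt.Errorf("ltx files: not implemented")
}

func (c *ReplicaClient) OpenLTXFile(ctx context.Context, level int, minTXID, maxTXID ltx.TXID, offset, size int64) (io.ReadCloser, error) {
    // Honor offset/size so callers can fetch just the trailing page index.
    // Return an error wrapping os.ErrNotExist when the file is missing.
    return nil, fmt.Errorf("open ltx file: %w", os.ErrNotExist)
}

func (c *ReplicaClient) WriteLTXFile(ctx context.Context, level int, minTXID, maxTXID ltx.TXID, r io.Reader) (*ltx.FileInfo, error) {
    return nil, fmt.Errorf("write ltx file: not implemented")
}

func (c *ReplicaClient) DeleteLTXFiles(ctx context.Context, files []*ltx.FileInfo) error {
    return fmt.Errorf("delete ltx files: not implemented")
}

func (c *ReplicaClient) DeleteAll(ctx context.Context) error {
    return fmt.Errorf("delete all: not implemented")
}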

Store Component (store.go)

Responsibilities:

  • Coordinates multiple databases
  • Manages compaction schedules
  • Controls resource usage
  • Handles retention policies

Default Compaction Levels:

var defaultLevels = CompactionLevels{
    {Level: 0, Interval: 0},        // Raw LTX files (no compaction)
    {Level: 1, Interval: 30 * time.Second},
    {Level: 2, Interval: 5 * time.Minute},
    {Level: 3, Interval: 1 * time.Hour},
    // Snapshots created daily (24h retention)
}

Performance Considerations

O(n) Operations to Watch

  1. Page Iteration: Linear scan through all pages

    • Cache page index when possible
    • Use binary search on sorted page lists
  2. File Listing: Directory scans can be expensive

    • Cache file listings when unchanged
    • Use seek parameter to skip old files
  3. Compaction: Reads all input files

    • Limit concurrent compactions
    • Use appropriate level intervals

Caching Strategy

// Page index caching example
const DefaultEstimatedPageIndexSize = 32 * 1024 // 32KB

// Fetch end of file first for page index
offset := info.Size - DefaultEstimatedPageIndexSize
if offset < 0 {
    offset = 0
}
// Read page index once, cache for duration of operation

Batch Operations

  • Group small writes into larger LTX files
  • Batch delete operations for old files
  • Use prepared statements for repeated queries
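
For example, retention cleanup can collect obsolete files and issue a single DeleteLTXFiles call instead of one request per file (sketch only; in the real code the Store component drives retention):

// deleteBatch removes every file whose TXID range ends before keepAfter in
// one backend call.
func deleteBatch(ctx context.Context, client ReplicaClient, infos []*ltx.FileInfo, keepAfter ltx.TXID) error {
    var obsolete []*ltx.FileInfo
    for _, info := range infos {
        if info.MaxTXID < keepAfter {
            obsolete = append(obsolete, info)
        }
    }
    if len(obsolete) == 0 {
        return nil
    }
    return client.DeleteLTXFiles(ctx, obsolete)
}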

Testing Requirements

For Any DB Changes

# Test with various page sizes
./bin/litestream-test populate -db test.db -page-size 4096 -target-size 2GB
./bin/litestream-test populate -db test.db -page-size 8192 -target-size 2GB

# Test lock page handling
./bin/litestream-test validate -source-db test.db -replica-url file:///tmp/replica

For Replica Client Changes

# Test eventual consistency
go test -v ./replica_client_test.go -integration [s3|gcs|abs|oss|sftp]

# Test partial reads
# (Example) add targeted partial-read tests in your backend package
go test -v -run TestReplicaClient_PartialRead ./...

For Compaction Changes

# Test with store compaction
go test -v -run TestStore_CompactDB ./...

# Test with eventual consistency mock
go test -v -run TestStore_CompactDB_RemotePartialRead ./...

Race Condition Testing

# Always run with race detector
go test -race -v ./...

# Specific race-prone areas
go test -race -v -run TestReplica_Sync ./...
go test -race -v -run TestDB_Sync ./...
go test -race -v -run TestStore_CompactDB ./...

Quick Reference

File Paths

  • Database: /path/to/database.db
  • Metadata: /path/to/database.db-litestream/
  • LTX Files: /path/to/database.db-litestream/ltx/LEVEL/MIN-MAX.ltx
  • Snapshots: /path/to/database.db-litestream/snapshots/TIMESTAMP.ltx

Key Configuration

l0-retention: 5m                 # Minimum time to keep compacted L0 files
l0-retention-check-interval: 15s # Frequency for enforcing L0 retention
dbs:
  - path: /path/to/db.sqlite
    replica:
      type: s3
      bucket: my-bucket
      path: db-backup
      sync-interval: 10s  # How often to sync

# Compaction configuration (default)
levels:
  - level: 1
    interval: 30s    # 30-second windows
  - level: 2
    interval: 5m     # 5-minute windows
  - level: 3
    interval: 1h     # 1-hour windows

Important Constants

DefaultMonitorInterval    = 1 * time.Second   // WAL check frequency
DefaultCheckpointInterval = 1 * time.Minute   // Time-based passive checkpoint frequency
DefaultMinCheckpointPageN = 1000              // Min pages before passive checkpoint
DefaultTruncatePageN      = 121359            // ~500MB truncate threshold
// Note: DefaultMaxCheckpointPageN was removed; RESTART checkpoint mode was permanently removed (see #724).

Getting Help

For complex architectural questions, consult:

  1. docs/SQLITE_INTERNALS.md - SQLite fundamentals, WAL format, lock page details
  2. docs/LTX_FORMAT.md - LTX file format specification and operations
  3. docs/ARCHITECTURE.md - Deep technical details of Litestream components
  4. docs/REPLICA_CLIENT_GUIDE.md - Storage backend implementation guide
  5. docs/TESTING_GUIDE.md - Comprehensive testing strategies
  6. Review recent PRs for current patterns and best practices

Future Roadmap

Planned Features:

  • Litestream VFS: Virtual File System for read replicas
    • Instantly spin up database copies
    • Background hydration from S3
    • Enables scaling read operations without full database downloads
  • Enhanced read replica support: Direct reads from remote storage

Important Constraints

  1. Single Replica Authority: Each database is replicated to exactly one remote target—configure redundancy at the storage layer if needed.
  2. Legacy Backups: Pre-LTX (v0.3.x) WAL snapshots cannot be restored with current binaries; keep an old binary around to hydrate those backups before re-replicating.
  3. CLI Changes: Use litestream ltx for LTX inspection; litestream wal is deprecated.
  4. Pure Go Build: The default build is CGO-free via modernc.org/sqlite; enable CGO only for optional VFS tooling.
  5. Page-Level Compaction: Expect compaction to merge files across 30s/5m/1h windows plus daily snapshots.

Final Checklist Before Making Changes

  • Read this entire document
  • Read docs/SQLITE_INTERNALS.md for SQLite fundamentals
  • Read docs/LTX_FORMAT.md for replication format details
  • Understand current constraints (single replica authority, LTX-only restores)
  • Understand the component you're modifying
  • Understand architectural boundaries (DB vs Replica responsibilities)
  • Check for eventual consistency implications
  • Consider >1GB database edge cases (lock page at 0x40000000)
  • Use atomic file operations (temp file + rename)
  • Return errors properly (don't just log and continue)
  • Leverage existing mechanisms (e.g., verify() for snapshots)
  • Plan appropriate tests
  • Review recent similar PRs for patterns
  • Use proper locking (Lock vs RLock)
  • Preserve timestamps where applicable
  • Test with race detector enabled

Agent-Specific Instructions

This document serves as the universal source of truth for all AI coding assistants. Different agents may access it through various paths:

  • Claude: Reads AGENTS.md directly (also loads CLAUDE.md if present)
  • GitHub Copilot: Via .github/copilot-instructions.md symlink
  • Cursor: Via .cursorrules symlink
  • Gemini: Reads AGENTS.md and respects .aiexclude patterns
  • Other agents: Check for AGENTS.md or llms.txt in repository root

GitHub Copilot / OpenAI Codex

Context Window: 64k tokens (upgrading to 1M with GPT-4.1)

Best Practices:

  • Use /explain command for SQLite internals
  • Reference patterns in Common Pitfalls section
  • Switch to GPT-5-Codex model for complex refactoring
  • Focus on architectural boundaries and anti-patterns
  • Leverage workspace indexing for multi-file operations

Model Selection:

  • Use GPT-4o for quick completions
  • Switch to GPT-5 or Claude Opus 4.1 for complex tasks

Cursor

Context Window: Configurable based on model selection

Best Practices:

  • Enable "codebase indexing" for full repository context
  • Use Claude 3.5 Sonnet for architectural questions
  • Use GPT-4o for quick inline completions
  • Split complex rules into .cursor/rules/*.mdc files if needed
  • Leverage workspace search before asking questions

Model Recommendations:

  • Architecture changes: Claude 3.5 Sonnet
  • Quick fixes: GPT-4o or cursor-small
  • Test generation: Any model with codebase context

Claude / Claude Code

Context Window: 200k tokens standard (1M in beta)

Best Practices:

  • Full documentation can be loaded (5k lines fits easily)
  • Reference docs/ subdirectory for deep technical details
  • Use structured note-taking for complex multi-step tasks
  • Leverage MCP tools when available
  • Check CLAUDE.md for project-specific configuration

Strengths:

  • Deep architectural reasoning
  • Complex system analysis
  • Large context window utilization

Google Gemini / Gemini Code Assist

Context Window: Varies by tier

Best Practices:

  • Check .aiexclude for files to ignore
  • Enable local codebase awareness
  • Excellent for test generation and documentation
  • Use for code review and security scanning
  • Leverage code customization features

Configuration:

  • Respects .aiexclude patterns (like .gitignore)
  • Can use custom AI rules files

General Multi-Agent Guidelines

  1. Always start with this document (AGENTS.md) for project understanding
  2. Check llms.txt for quick navigation to other documentation
  3. Respect architectural boundaries (DB layer vs Replica layer)
  4. Follow the patterns in Common Pitfalls section
  5. Test with race detector for any concurrent code changes
  6. Preserve backward compatibility with current constraints

Documentation Hierarchy

Tier 1 (Always read):
- AGENTS.md (this file)
- llms.txt (if you need navigation)

Tier 2 (Read when relevant):
- docs/SQLITE_INTERNALS.md (for WAL/page work)
- docs/LTX_FORMAT.md (for replication work)
- docs/ARCHITECTURE.md (for major changes)

Tier 3 (Reference only):
- docs/TESTING_GUIDE.md (for test scenarios)
- docs/REPLICA_CLIENT_GUIDE.md (for new backends)