🏗️ DeepSeek Reasonix — Architecture Analysis

~350 source files · 4 architectural pillars · 99.82% real-world cache hit · 1,052 lines in loop.ts · 180 lines in the code prompt · ~250 tests

What is Reasonix?

A DeepSeek-native, cache-first coding agent for the terminal. Unlike a general-purpose agent, Reasonix justifies every architectural decision by a DeepSeek-specific behavior or economic property. The product north star: "a coding agent that stays cheap enough to leave on."

Key differentiator: cache stability is an invariant the loop is designed around, not a feature you toggle on. The entire codebase is tuned to DeepSeek's automatic prefix caching, which rewards a byte-stable request prefix. The reported real-world result is a 99.82% cache hit rate over 435M input tokens (~$12 instead of ~$61).

npm: reasonix · Node ≥22 · MIT License · TypeScript 5.6+ · Commander.js + Ink 5 TUI

flowchart TB
  subgraph "User Interface"
    CLI["CLI Entry\nsrc/cli/index.ts"]
    TUI["Ink TUI\nsrc/cli/ui/App.tsx\n(~1984 LOC)"]
    DASH["Dashboard SPA\nReact + Express\nport :3100"]
  end

  subgraph "Core Engine"
    LOOP["CacheFirstLoop\nsrc/loop.ts\n(1052 LOC)"]
    CM["ContextManager\nsrc/context-manager.ts\n(345 LOC)"]
    REPAIR["ToolCallRepair\nsrc/repair/"]
    PROMPT["Prompt Builder\nsrc/code/prompt.ts\n+ prompt-fragments.ts"]
  end

  subgraph "Tool System"
    TOOLS["ToolRegistry\nsrc/tools.ts"]
    FS["Filesystem\nread/write/edit/search"]
    SHELL["Shell\nrun_command + jobs"]
    SUB["Subagents\nspawn_subagent"]
    MCP_BRIDGE["MCP Bridge\nstdio + SSE"]
    WEB["Web\nsearch + fetch"]
  end

  subgraph "Infrastructure"
    CLIENT["DeepSeek Client\nsrc/client.ts\nfetch + SSE streaming"]
    SESSION["Session Store\nJSONL persistence"]
    MEMORY["Memory Store\nUser / Project / Runtime"]
    TOKENIZER["DeepSeek V3 Tokenizer\nPorted from Python"]
  end

  CLI --> TUI
  TUI --> LOOP
  LOOP --> CM
  LOOP --> REPAIR
  LOOP --> PROMPT
  LOOP --> TOOLS
  TOOLS --> FS
  TOOLS --> SHELL
  TOOLS --> SUB
  TOOLS --> MCP_BRIDGE
  TOOLS --> WEB
  LOOP --> CLIENT
  LOOP --> SESSION
  LOOP --> MEMORY
  LOOP --> TOKENIZER
  DASH --> LOOP

📦 Technology Stack & Module Layout

Core Stack

Language: TypeScript 5.6+, ES2022, ESM ("type": "module")
CLI Framework: Commander.js + Ink 5 (React 18 for terminal UI)
Testing: Vitest 2.x, ~250 test files
Lint/Format: Biome 1.9 (2-space, double quotes, always semicolons, 100 width)
Build: tsup (bundler), tsx (dev runner)
Desktop: Tauri (Rust shell, macOS/Windows/Linux)
Dashboard: React + Vite + Express (port 3100)

Module Map

Directory	Purpose	Key Files
`src/cli/`	CLI Entry + TUI	index.ts, commands/code.tsx, commands/chat.tsx, ui/App.tsx
`src/tools/`	Tool Definitions	filesystem.ts, shell.ts, subagent.ts, plan.ts, web.ts, memory.ts
`src/loop/`	Agent Loop Engine	dispatch.ts, messages.ts, streaming.ts, thinking.ts, healing.ts
`src/repair/`	Tool-Call Repair	flatten.ts, scavenge.ts, storm.ts, truncation.ts
`src/mcp/`	MCP Protocol	client.ts, stdio.ts, sse.ts, registry.ts, spec.ts
`src/code/`	Edit Engine	edit-blocks.ts, prompt.ts, setup.ts, diff-preview.ts
`src/memory/`	Persistence	session.ts, runtime.ts (3-region model), project.ts, user.ts
`src/core/`	Kernel	events.ts, reducers.ts, eventize.ts, inflight.ts, pause-gate.ts
`src/ports/`	Interfaces	model-client.ts, tool-host.ts, event-sink.ts, memory-store.ts
`src/transcript/`	Logging	log.ts, diff.ts, replay.ts

🏛️ The Four Architectural Pillars

Pillar 1: Cache-First Loop src/loop.ts

DeepSeek bills cached input at ~10% of the miss rate. The loop partitions context into three regions to maximize prefix-cache stability:

flowchart TB
  subgraph I["IMMUTABLE PREFIX: Fixed for session"]
    SYS["System Prompt\n(codeSystemBase)"]
    TOOLS["Tool Specifications\n(ToolRegistry.specs)"]
    FEW["Few-Shot Examples"]
  end

  subgraph L["APPEND-ONLY LOG: Grows monotonically"]
    direction TB
    A1["assistant_1"]
    T1["tool_result_1"]
    A2["assistant_2"]
    T2["tool_result_2"]
    A1 --> T1 --> A2 --> T2
  end

  subgraph S["VOLATILE SCRATCH: Reset each turn"]
    R1["R1 Reasoning\n(reasoning_content)"]
    PLAN["Transient Plan State"]
    THOUGHTS["Working Thoughts"]
  end

  I --> L --> S

Invariants: (1) Prefix computed once per session, hashed, and pinned. (2) Log entries serialized in append order — no rewrites. (3) Scratch distilled via Pillar 2 before folding into log.
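The first two invariants can be illustrated with a minimal sketch. The names `Message`, `pinPrefix`, `AppendOnlyLog`, and `buildRequest` are hypothetical and not taken from the Reasonix source; only the three-region split, the hash-once rule, and the append-only rule come from the text above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical message shape, for illustration only.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface PinnedPrefix {
  readonly messages: readonly Message[];
  readonly hash: string;
}

// Invariant 1: serialize the prefix once and keep its hash for the session.
function pinPrefix(messages: Message[]): PinnedPrefix {
  const hash = createHash("sha256").update(JSON.stringify(messages)).digest("hex");
  return { messages: Object.freeze([...messages]), hash };
}

// Invariant 2: the log only grows; earlier entries are never rewritten.
class AppendOnlyLog {
  private readonly entries: Message[] = [];

  append(msg: Message): void {
    this.entries.push(msg);
  }

  snapshot(): readonly Message[] {
    return [...this.entries];
  }
}

// Request order is prefix, then log, then scratch. Because scratch comes last,
// the prefix + log portion of each request extends the previous one.
function buildRequest(
  prefix: PinnedPrefix,
  log: AppendOnlyLog,
  scratch: Message[] = [],
): Message[] {
  return [...prefix.messages, ...log.snapshot(), ...scratch];
}
```

With this layout, the request built after appending to the log starts with exactly the messages of the request built before, which is the property a prefix cache needs.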

Metric: prompt_cache_hit_tokens / (prompt_cache_hit_tokens + prompt_cache_miss_tokens), exposed per turn and aggregated per session. Visible in the TUI top-bar cache cell.
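The ratio can be computed directly from the `usage` object of a DeepSeek chat completion, which reports `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens`. The sketch below is illustrative: the function names are hypothetical, and the token-weighted session aggregate is an assumption about how "aggregated per-session" is meant.

```typescript
// Cache-related usage fields reported by the DeepSeek API.
interface CacheUsage {
  prompt_cache_hit_tokens: number;
  prompt_cache_miss_tokens: number;
}

// Per-turn hit ratio; returns 0 when no prompt tokens were billed.
function cacheHitRatio(u: CacheUsage): number {
  const total = u.prompt_cache_hit_tokens + u.prompt_cache_miss_tokens;
  return total === 0 ? 0 : u.prompt_cache_hit_tokens / total;
}

// Session aggregate (assumption): sum the tokens first, then divide,
// so large turns weigh more than small ones.
function sessionHitRatio(turns: CacheUsage[]): number {
  const sum = turns.reduce(
    (acc, u) => ({
      prompt_cache_hit_tokens: acc.prompt_cache_hit_tokens + u.prompt_cache_hit_tokens,
      prompt_cache_miss_tokens: acc.prompt_cache_miss_tokens + u.prompt_cache_miss_tokens,
    }),
    { prompt_cache_hit_tokens: 0, prompt_cache_miss_tokens: 0 },
  );
  return cacheHitRatio(sum);
}
```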

Pillar 2: Tool-Call Repair src/repair/

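Four-stage pipeline addressing DeepSeek-specific failure modes. Flatten is applied to tool schemas at registration time; scavenge, truncation repair, and storm breaking run on each model response: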

flowchart LR
  INPUT["Model Response\n(tool_calls + reasoning_content)"]
  
  subgraph "Pass 1: Flatten"
    FLAT["Schema Flatten\n>10 params or depth>2\n→ dot-notation"]
  end
  
  subgraph "Pass 2: Scavenge"
    SCAV["Scavenge\nRegex + JSON parser\nSweeps reasoning_content\nfor forgotten tool calls"]
  end
  
  subgraph "Pass 3: Truncation Repair"
    TRUNC["Truncation Repair\nDetect unbalanced JSON\nClose braces or\nrequest continuation"]
  end
  
  subgraph "Pass 4: Storm"
    STORM["Storm Breaker\nIdentical (tool, args)\nwithin sliding window\n→ suppress + reflect"]
  end

  INPUT --> FLAT --> SCAV --> TRUNC --> STORM
  STORM --> DISPATCH["dispatchToolCallsChunked()"]

Pass	File	Problem Solved
1. Flatten	`flatten.ts`	DeepSeek drops args when schema has >10 leaf params or depth >2 — auto-flatten to dot-notation, re-nest at dispatch
2. Scavenge	`scavenge.ts`	Tool-call JSON emitted inside `<think>`, missing from final message
3. Truncation	`truncation.ts`	Truncated JSON due to max_tokens hit mid-structure
4. Storm	`storm.ts`	Same tool called repeatedly with identical args (call-storm)
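The flatten pass in the table amounts to a reversible key transform. The following is a minimal sketch of that idea, not the code in `flatten.ts`: `flattenArgs` and `nestArgs` are hypothetical names, arrays are treated as leaf values, and keys are assumed not to contain dots.

```typescript
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };
type JsonObject = { [k: string]: Json };

function isPlainObject(v: Json | undefined): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Collapse nested objects into dot-notation keys, e.g. { a: { b: 1 } } -> { "a.b": 1 }.
function flattenArgs(obj: JsonObject, prefix = ""): JsonObject {
  const out: JsonObject = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (isPlainObject(value) && Object.keys(value).length > 0) {
      Object.assign(out, flattenArgs(value, path));
    } else {
      out[path] = value; // leaf: primitive, array, or empty object
    }
  }
  return out;
}

// Rebuild the nested structure from dot-notation keys before the tool runs.
function nestArgs(flat: JsonObject): JsonObject {
  const root: JsonObject = {};
  for (const [path, value] of Object.entries(flat)) {
    const parts = path.split(".");
    let node = root;
    for (const part of parts.slice(0, -1)) {
      if (!isPlainObject(node[part])) node[part] = {};
      node = node[part] as JsonObject;
    }
    node[parts[parts.length - 1]] = value;
  }
  return root;
}
```

The model sees only the flat schema, so it has fewer nesting levels to get wrong; the tool implementation still receives the nested shape it was written for.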

Pillar 3: Cost Control v0.6

Tiered Model Presets

Preset	Model	Effort	Relative Cost
`flash`	v4-flash	max	1×
`auto` (default)	v4-flash → v4-pro on hard turns	max	1–3×
`pro`	v4-pro	max	~12×

Cost Mechanisms

Turn-end auto-compaction: Tool results exceeding 3000 tokens are shrunk when the turn ends
Proactive 40% threshold: A context ratio above 40% triggers a pre-emptive shrink, well before the 80% emergency threshold
Model self-report escalation (<<<NEEDS_PRO>>>): The model emits the marker when it judges that the task exceeds the current tier
Auxiliary calls hard-code flash: Summarization, subagent spawns, and truncation repair all use v4-flash regardless of the user preset
Parallel tool dispatch: Read-only tools run concurrently (up to 3 by default) to reduce turn latency

Budget Cap: Soft USD cap — warns at 80%, refuses next turn at 100%. Per-turn cost colored green (<$0.05), yellow ($0.05–0.20), red (≥$0.20).
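The budget thresholds and cost colors above reduce to two small pure functions. This is a sketch under stated assumptions: the names `budgetState` and `turnCostColor` are hypothetical, and treating a non-positive cap as "no cap" is an assumption, not documented behavior.

```typescript
type BudgetState = "ok" | "warn" | "refuse";

// Soft cap: warn from 80% of the cap, refuse the next turn from 100%.
function budgetState(spentUsd: number, capUsd: number): BudgetState {
  if (capUsd <= 0) return "ok"; // assumption: no cap configured
  const ratio = spentUsd / capUsd;
  if (ratio >= 1) return "refuse";
  if (ratio >= 0.8) return "warn";
  return "ok";
}

type CostColor = "green" | "yellow" | "red";

// Per-turn cost badge: green below $0.05, yellow up to $0.20, red from $0.20.
function turnCostColor(usd: number): CostColor {
  if (usd < 0.05) return "green";
  if (usd < 0.2) return "yellow";
  return "red";
}
```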

Pillar 4: MCP Protocol Integration src/mcp/

Model Context Protocol support over stdio, SSE, and Streamable HTTP transports. Tools are registered via the registry at startup; third-party MCP tools default to parallelSafe: false.

Transports: stdio (subprocess), SSE (HTTP streaming), Streamable HTTP
Registry: Marketplace overlay with categorized server list
Lifecycle: Connect → initialize → list_tools → ready. Reconnect on crash
Truncation: Results capped at DEFAULT_MAX_RESULT_TOKENS with save-to-disk option
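On the wire, the connect → initialize → list_tools lifecycle is a short JSON-RPC 2.0 exchange. The sketch below builds the client-side messages using the method names from the MCP specification (`initialize`, `notifications/initialized`, `tools/list`). The function names are hypothetical, and the protocol revision string is one published MCP revision chosen for illustration, not necessarily the one Reasonix negotiates.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string;
}

type ClientMessage = JsonRpcRequest | JsonRpcNotification;

// Client-side messages of the handshake, in send order. A real client waits
// for the initialize response before sending the notification.
function handshakeMessages(clientName: string, clientVersion: string): ClientMessage[] {
  return [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2024-11-05", // illustrative revision (assumption)
        capabilities: {},
        clientInfo: { name: clientName, version: clientVersion },
      },
    },
    { jsonrpc: "2.0", method: "notifications/initialized" },
    { jsonrpc: "2.0", id: 2, method: "tools/list" },
  ];
}

// stdio transport framing: one JSON message per line, no embedded newlines.
function frameForStdio(msg: ClientMessage): string {
  return `${JSON.stringify(msg)}\n`;
}
```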

📝 Prompt Construction Lineage

flowchart TB
  subgraph "Layer 1: Shared Fragments"
    TUI["TUI_FORMATTING_RULES\n• Table formatting\n• Code blocks\n• No ASCII art"]
    NEG["NEGATIVE_CLAIM_RULE\n• Cite or shut up\n• search_content FIRST"]
    ESC["escalationContract(modelId)\n• ESCALATION_CONTRACT\n• <<<NEEDS_PRO>>> marker"]
  end

  subgraph "Layer 2: Identity + Rules"
    ID["Identity Block\n• You are Reasonix Code\n• Don't infer from workspace"]
    CITE["Citation Rules\n• File paths with line ranges\n• Broken paths = red strikethrough"]
    AUDIT["Audit Rails (6 rules)\n• Auto-preview limits\n• Flag→consumer trace\n• No fabricated %"]
  end

  subgraph "Layer 3: Tool Guidance"
    PICK["Tool Selection\n• submit_plan vs ask_choice\n• todo_write"]
    PLAN_MODE["Plan Mode\n• Bounced writes\n• submit_plan required"]
    SUBAGENT["Subagent Delegation\n• Default: don't delegate\n• Skill index with tags"]
  end

  subgraph "Layer 4: Edit Protocol"
    EDIT["SEARCH/REPLACE Format\n• Read before edit enforced\n• Exact whitespace match\n• Edit gate routes review/auto"]
    WHEN["When to Edit\n• Only on explicit change ask\n• Analyze → tools + prose"]
  end

  subgraph "Layer 5: Memory Stack"
    UM["User Memory\n~/.reasonix/memory/"]
    PM["Project Memory\nREASONIX.md"]
    SM["Skill Memos\n@reasonix/core-utils"]
  end

  subgraph "Output"
    FINAL["CODE_SYSTEM_PROMPT\nBuilt per-session by\ncodeSystemBase(modelId)\nwith real model name"]
  end

  TUI --> ID
  NEG --> CITE
  ESC --> PICK
  ID --> PICK
  CITE --> AUDIT
  AUDIT --> PLAN_MODE
  PICK --> SUBAGENT
  PLAN_MODE --> EDIT
  SUBAGENT --> WHEN
  EDIT --> FINAL
  WHEN --> FINAL
  UM --> FINAL
  PM --> FINAL
  SM --> FINAL

Prompt Assembly Code Path

codeSystemBase(modelId)
  └─ CODE_SYSTEM_TEMPLATE (180 lines, src/code/prompt.ts)
       ├─ TUI_FORMATTING_RULES (from prompt-fragments.ts)
       ├─ NEGATIVE_CLAIM_RULE (from prompt-fragments.ts)
       ├─ escalationContract(modelId) (from prompt-fragments.ts)
       ├─ Identity block (lines 13-17)
       ├─ Citation rules (lines 19-21)
       ├─ Audit rails (lines 23-31)
       ├─ Tool selection guidance (lines 34-38)
       ├─ Plan mode constraints (lines 40-42)
       ├─ Subagent delegation policy (lines 44-48)
       ├─ Edit protocol + SEARCH/REPLACE format (lines 50-80+)
       └─ __ESCALATION_CONTRACT__ (replaced at runtime with model-specific contract)

applyMemoryStack() overlays:
  ├─ User memory files (HIGH PRIORITY constraints block)
  ├─ Project memory (REASONIX.md or CLAUDE.md)
  └─ Skill memos (@reasonix/core-utils compaction module)

🔄 Agent Loop — CacheFirstLoop

flowchart TB
  START(["User submits prompt"])
  
  subgraph "Turn Setup"
    RESET["resetStorm()\nFresh intent = clean slate"]
    LOAD["loadSessionMessages()\nRestore from JSONL"]
    HEAL["healLoadedMessages()\nFix tool call pairing"]
    CHECK["Budget check\nRefuse if over cap"]
  end
  
  subgraph "Iteration Loop"
    BUILD["Build messages:\nprefix + log + scratch"]
    STREAM["streamModelResponse()\nSSE from DeepSeek API"]
    THINK["Strip/retain reasoning\nthinking mode handling"]
    REPAIR["ToolCallRepair.process()\nscavenge→truncation→storm"]
  end
  
  subgraph "Dispatch Phase"
    DISPATCH["dispatchToolCallsChunked()\nParallel-safe chunking"]
    RUN["runOne(call)\nToolRegistry.dispatch()"]
    APPEND["appendAndPersist(msg)\nLog + session JSONL"]
  end
  
  subgraph "Post-Turn"
    COMPACT["decideAfterUsage()\nFold if ratio > 75%"]
    BUDGET["updateBudget()\nWarn at 80%, stop at 100%"]
  end

  START --> RESET --> LOAD --> HEAL --> CHECK
  CHECK --> BUILD --> STREAM --> THINK --> REPAIR
  REPAIR -->|has tool calls| DISPATCH
  REPAIR -->|no tool calls| COMPACT
  DISPATCH --> RUN --> APPEND
  APPEND -->|more iter cycles| BUILD
  APPEND -->|turn done| COMPACT
  COMPACT -->|fold| HEAL
  COMPACT -->|carry on| DONE(["Await next user input"])
  COMPACT -->|exit summary| FORCE(["forceSummaryAfterIterLimit()"])

Key Loop States

State	File	Description
`ImmutablePrefix`	`memory/runtime.ts`	System prompt + tool specs + few-shots — pinned for session
`AppendOnlyLog`	`memory/runtime.ts`	Monotonically growing conversation — preserves cache prefix
`VolatileScratch`	`memory/runtime.ts`	R1 reasoning + transient state — reset each turn
`ReadTracker`	`tools/read-tracker.ts`	Files model has read — gates edit_file. Cleared on fold
`SessionStats`	`telemetry/stats.ts`	Per-session token usage + cost + cache hit ratio
`InflightSet`	`core/inflight.ts`	Tracking in-progress tool calls for TUI spinner display

Mid-Turn Steering

Users can inject guidance during a running turn. The injected text is wrapped in a constant that tells the model to treat it as guidance for the current task rather than as a new task:

MID_TURN_STEER_WRAPPER = "[Mid-turn steer queued by the user. 
Do not treat this as a new task; use it only as additional 
guidance for the current task after completing the current step.]"

⚡ Parallel Tool Dispatch

flowchart TB
  CALLS["Repaired Tool Calls\n(repairedCalls[])"]
  
  subgraph "Chunking Logic (dispatchToolCallsChunked)"
    CHECK{"REASONIX_TOOL_DISPATCH\n=== 'serial'?"}
    SERIAL["Serial Mode\nOne call per chunk"]
    GROUP["Group parallel-safe calls\nup to REASONIX_PARALLEL_MAX (default 3)\nStop at first non-parallel-safe"]
    BARRIER["Serial Barrier\nNon-parallel-safe call\nruns alone"]
  end
  
  subgraph "Execution"
    RACE["Promise.allSettled()\nRace chunk in parallel"]
    ORDER["Yield results\nin DECLARED ORDER\n(not completion order)"]
    APPEND["Append tool messages\nto AppendOnlyLog"]
  end

  CALLS --> CHECK
  CHECK -->|yes| SERIAL
  CHECK -->|no| GROUP
  GROUP --> BARRIER
  SERIAL --> RACE
  BARRIER --> RACE
  RACE --> ORDER --> APPEND
  APPEND -->|more calls| CHECK
  APPEND -->|done| NEXT(["Return to agent loop"])

Parallel-Safe Tools (Built-In)

Tool	Why Safe
`read_file`, `list_directory`, `directory_tree`	Read-only filesystem ops
`search_files`, `search_content`, `get_file_info`	Read-only search ops
`web_search`, `web_fetch`	External, no filesystem side effects
`recall_memory`, `semantic_search`	Read-only memory access
`run_skill`, `spawn_subagent`	Isolated child loops
`job_output`, `list_jobs`	In-memory job queries

Not parallel-safe: edit_file, write_file, run_command — these mutate state and must execute serially for read-after-write ordering.

Configuration: REASONIX_PARALLEL_MAX (default 3, hard cap 16) · REASONIX_TOOL_DISPATCH=serial (escape hatch)
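The chunking rule can be sketched as follows. This is not the body of `dispatchToolCallsChunked()`; `chunkCalls` and `runChunk` are hypothetical names, and the caller is assumed to supply the parallel-safety predicate. The default of 3 and the hard cap of 16 come from the configuration line above.

```typescript
interface ToolCall {
  id: string;
  name: string;
}

// Split calls into chunks: runs of parallel-safe calls (up to the limit) share
// a chunk; every non-parallel-safe call forms a one-element serial barrier.
function chunkCalls(
  calls: ToolCall[],
  isParallelSafe: (name: string) => boolean,
  maxParallel = 3,
  serial = false,
): ToolCall[][] {
  const limit = Math.min(Math.max(1, Math.floor(maxParallel)), 16);
  const chunks: ToolCall[][] = [];
  let current: ToolCall[] = [];
  const flush = (): void => {
    if (current.length > 0) {
      chunks.push(current);
      current = [];
    }
  };
  for (const call of calls) {
    if (serial || !isParallelSafe(call.name)) {
      flush();
      chunks.push([call]);
      continue;
    }
    current.push(call);
    if (current.length >= limit) flush();
  }
  flush();
  return chunks;
}

// Promise.allSettled returns results in input order, so results can be
// appended in declared order regardless of which call finishes first.
async function runChunk<T>(
  chunk: ToolCall[],
  run: (call: ToolCall) => Promise<T>,
): Promise<PromiseSettledResult<T>[]> {
  return Promise.allSettled(chunk.map((call) => run(call)));
}
```

Keeping declared order matters for the cache as well as for correctness: the tool messages appended to the log are then deterministic for a given set of calls.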

🔧 Tool-Call Repair Pipeline

Pass order: scavenge → truncation → storm (flatten runs earlier, at tool registration)

flowchart TB
  RAW["Raw Model Response\n• tool_calls[]\n• reasoning_content\n• content"]

  subgraph "Phase 0: Registration Time"
    ANALYZE["analyzeSchema()\nCheck depth > 2\nor leaf params > 10"]
    FLAT_STORE["Store flatSchema\non InternalTool"]
  end

  subgraph "Phase 1: Scavenge"
    COMBINE["Combine channels\nreasoning + content"]
    SCAN["Regex + JSON parse\nFind tool calls in\n<think> + DSML markup"]
    DEDUP["Deduplicate\n(by name+args signature)\nMax 4 scavenged calls"]
  end

  subgraph "Phase 2: Truncation Repair"
    CHECK_JSON["Check each call's\nargument JSON"]
    FIX["Close braces/strings\nFallback: leave original\n(rejects with 'invalid JSON')"]
  end

  subgraph "Phase 3: Storm Breaker"
    WINDOW["Sliding window\n(default 6 calls)"]
    DETECT["Detect identical\n(tool_name, args) tuples"]
    ACT["3+ repeats in window\n→ suppress call\n→ inject reflection"]
  end

  RAW --> ANALYZE
  RAW --> COMBINE --> SCAN --> DEDUP
  DEDUP --> CHECK_JSON --> FIX
  FIX --> WINDOW --> DETECT --> ACT
  ACT --> REPAIRED(["Repaired calls → dispatchToolCallsChunked()"])
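Phase 2 of the diagram above (close braces and strings, otherwise leave the original) can be sketched with a small string-aware scanner. This is an illustration, not the code in `truncation.ts`: `closeTruncatedJson` is a hypothetical name, and returning `null` stands in for the "leave original" fallback.

```typescript
// Try to complete JSON that was cut off mid-structure. Returns the repaired
// text, or null when closing open strings and brackets is not enough.
function closeTruncatedJson(raw: string): string | null {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;

  for (const ch of raw) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  // A dangling backslash would escape the quote we are about to add.
  let candidate = escaped ? raw.slice(0, -1) : raw;
  if (inString) candidate += '"';
  candidate += closers.reverse().join("");

  try {
    JSON.parse(candidate);
    return candidate;
  } catch {
    return null; // e.g. cut off right after a key or a comma
  }
}
```

A cut inside a value is recoverable this way; a cut right after `"key":` is not, which is the case where requesting a continuation or rejecting the call is the only option.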

Scavenge Deep-Dive src/repair/scavenge.ts

Searches BOTH channels:

reasoning_content: Where R1 models leak JSON tool calls inside <think> blocks
content: Where DSML markup appears in regular text turns

Channels are joined with newline and scanned independently. Dedup prefers first-seen (declared calls take priority over scavenged). Only novel (name, args) signatures are added. Default max: 4 scavenged calls.
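A rough sketch of the scavenge idea follows. It assumes leaked calls look like `{"name": "...", "arguments": {...}}`; the real leaked format, the DSML markup handling, and the function names here are all assumptions rather than the contents of `scavenge.ts`. The dedup rule (declared calls first, only novel signatures added) and the cap of 4 come from the text above.

```typescript
interface ScavengedCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Signature used for dedup. Note: sensitive to key order in the arguments.
const callSignature = (c: ScavengedCall): string => `${c.name}:${JSON.stringify(c.arguments)}`;

// Return the balanced {...} substring starting at `start`, or null if it never closes.
function balancedObject(text: string, start: number): string | null {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return text.slice(start, i + 1);
  }
  return null;
}

function scavenge(
  declared: ScavengedCall[],
  reasoning: string,
  content: string,
  max = 4,
): ScavengedCall[] {
  const seen = new Set(declared.map(callSignature)); // declared calls take priority
  const found: ScavengedCall[] = [];
  const text = `${reasoning}\n${content}`;
  for (const m of text.matchAll(/\{\s*"name"\s*:/g)) {
    if (found.length >= max) break;
    const candidate = balancedObject(text, m.index ?? 0);
    if (candidate === null) continue;
    try {
      const parsed = JSON.parse(candidate);
      const args = parsed.arguments;
      if (typeof parsed.name !== "string") continue;
      if (typeof args !== "object" || args === null || Array.isArray(args)) continue;
      const call: ScavengedCall = { name: parsed.name, arguments: args };
      if (seen.has(callSignature(call))) continue;
      seen.add(callSignature(call));
      found.push(call);
    } catch {
      // Looked like a call but is not valid JSON; skip it.
    }
  }
  return found;
}
```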

Storm Breaker src/repair/storm.ts

Prevents infinite loops where the model calls the same tool with identical args repeatedly. Configurable:

Window size: 6 calls (default)
Threshold: 3 identical calls in window → suppress
Exemptions: stormExempt: true for cheap state-inspection tools
Reset: Mutating calls (writes, edits) clear the window, so a verification read after an edit is not flagged as a repeat
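The four rules above fit in a small class. This is a minimal sketch, not the code in `storm.ts`: the class shape, the `mutating` flag, and the choice to count a suppressed call toward the window are assumptions.

```typescript
interface ObservedCall {
  name: string;
  args: string; // serialized arguments
  mutating?: boolean; // hypothetical flag for writes and edits
  stormExempt?: boolean;
}

class StormBreaker {
  private recent: string[] = [];
  private readonly windowSize: number;
  private readonly threshold: number;

  constructor(windowSize = 6, threshold = 3) {
    this.windowSize = windowSize;
    this.threshold = threshold;
  }

  reset(): void {
    this.recent = [];
  }

  // Returns true when the call should be suppressed.
  shouldSuppress(call: ObservedCall): boolean {
    if (call.stormExempt) return false;
    if (call.mutating) {
      this.reset(); // state changed, so earlier reads no longer count
      return false;
    }
    const sig = `${call.name}:${call.args}`;
    this.recent.push(sig);
    if (this.recent.length > this.windowSize) this.recent.shift();
    const repeats = this.recent.filter((s) => s === sig).length;
    return repeats >= this.threshold;
  }
}
```

With the defaults, the third identical read inside six calls is suppressed, while read, edit, read, read is not, because the edit clears the window.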

📐 Context Manager — Auto-Compaction

flowchart TB
  USAGE["After turn response\nUsage.promptTokens"]

  subgraph "Threshold Decision (decideAfterUsage)"
    FST{"> 80%?\nFORCE_SUMMARY_THRESHOLD"}
    AGGR{"> 78%?\nHISTORY_FOLD_AGGRESSIVE"}
    NORM{"> 75%?\nHISTORY_FOLD_THRESHOLD"}
    PRE{"> 90% at turn start?\nTURN_START_FOLD_THRESHOLD"}
  end

  subgraph "Fold Execution"
    PINS["Extract pinned skills\n(collectPinnedSkills)"]
    SUMMARIZE["Summarize head messages\nv4-flash + effort=high\n15s timeout"]
    REWRITE["rewriteSession()\nReplace head with summary\n+ pinned skill bodies\n+ recent tail"]
  end

  USAGE --> FST
  FST -->|yes| EXIT(["Exit with summary\nforceSummaryAfterIterLimit()"])
  FST -->|no| AGGR
  AGGR -->|yes| AGG_FOLD(["Aggressive fold\ntailBudget = 10% ctxMax"])
  AGGR -->|no| NORM
  NORM -->|yes| NORM_FOLD(["Normal fold\ntailBudget = 20% ctxMax"])
  NORM -->|no| CARRY(["Carry on"])
  PRE -->|yes| PRE_FOLD(["Pre-iter fold\nbefore turn starts"])

  AGG_FOLD --> PINS
  NORM_FOLD --> PINS
  PRE_FOLD --> PINS
  PINS --> SUMMARIZE --> REWRITE

Compaction Thresholds

Constant	Value	Trigger
`FORCE_SUMMARY_THRESHOLD`	0.80	Exit turn with summary — defense in depth
`HISTORY_FOLD_AGGRESSIVE_THRESHOLD`	0.78	Fold harder — tail budget 10%
`HISTORY_FOLD_THRESHOLD`	0.75	Normal fold — tail budget 20%
`TURN_START_FOLD_THRESHOLD`	0.90	Pre-turn fold — covers session restore, huge paste
`HISTORY_FOLD_MIN_SAVINGS_FRACTION`	0.30	Skip fold if head wouldn't shrink by ≥30%
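The post-turn thresholds in the table form a simple cascade, checked from the highest down. The sketch below reuses the constant names and values from the table; the function names, the `Decision` shape, and the argument list are hypothetical and should not be read as the real `decideAfterUsage` signature.

```typescript
const FORCE_SUMMARY_THRESHOLD = 0.8;
const HISTORY_FOLD_AGGRESSIVE_THRESHOLD = 0.78;
const HISTORY_FOLD_THRESHOLD = 0.75;
const HISTORY_FOLD_MIN_SAVINGS_FRACTION = 0.3;

type Decision =
  | { kind: "exit-with-summary" }
  | { kind: "fold"; aggressive: boolean; tailBudgetTokens: number }
  | { kind: "carry-on" };

// Post-turn decision from the prompt-token ratio of the context window.
function decideAfterUsageSketch(promptTokens: number, ctxMax: number): Decision {
  const ratio = promptTokens / ctxMax;
  if (ratio > FORCE_SUMMARY_THRESHOLD) return { kind: "exit-with-summary" };
  if (ratio > HISTORY_FOLD_AGGRESSIVE_THRESHOLD) {
    return { kind: "fold", aggressive: true, tailBudgetTokens: Math.round(ctxMax * 0.1) };
  }
  if (ratio > HISTORY_FOLD_THRESHOLD) {
    return { kind: "fold", aggressive: false, tailBudgetTokens: Math.round(ctxMax * 0.2) };
  }
  return { kind: "carry-on" };
}

// Skip the fold when the head would not shrink by at least the minimum fraction.
function worthFolding(headTokens: number, estimatedSummaryTokens: number): boolean {
  if (headTokens <= 0) return false;
  const savings = (headTokens - estimatedSummaryTokens) / headTokens;
  return savings >= HISTORY_FOLD_MIN_SAVINGS_FRACTION;
}
```

For a 100,000-token window, 76,000 prompt tokens would trigger a normal fold with a 20,000-token tail, and 79,000 an aggressive fold with a 10,000-token tail.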

Skill Pin Preservation: Active skill memos wrapped in <skill-pin name="...">...</skill-pin> are lifted from the head before summarization and re-appended verbatim after. This ensures skill procedures survive context folds.
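Lifting the pins and re-appending them verbatim can be sketched with one regular expression. The names `collectPins` and `rebuildHead` are hypothetical (the source names its own helper `collectPinnedSkills`), and letting a later pin with the same name replace an earlier one is an assumption.

```typescript
interface SkillPin {
  name: string;
  body: string;
}

const SKILL_PIN_RE = /<skill-pin name="([^"]*)">([\s\S]*?)<\/skill-pin>/g;

// Collect pinned skill bodies from the messages that are about to be summarized.
function collectPins(messages: string[]): SkillPin[] {
  const pins = new Map<string, string>();
  for (const text of messages) {
    for (const m of text.matchAll(SKILL_PIN_RE)) {
      pins.set(m[1], m[2]); // assumption: the latest pin for a skill wins
    }
  }
  return [...pins].map(([name, body]) => ({ name, body }));
}

// New head after a fold: the summary followed by the pins, byte-for-byte.
function rebuildHead(summary: string, pins: SkillPin[]): string {
  const blocks = pins.map((p) => `<skill-pin name="${p.name}">${p.body}</skill-pin>`);
  return [summary, ...blocks].join("\n\n");
}
```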

🔌 MCP Protocol Architecture

flowchart TB
  subgraph "Transport Layer"
    STDIO["StdioTransport\nSubprocess spawn\nJSON-RPC over stdin/stdout"]
    SSE["SseTransport\nHTTP GET /sse event stream\nPOST for client→server"]
    STREAM_HTTP["StreamableHttpTransport\nStreaming HTTP with\nupgrade path"]
  end

  subgraph "Client"
    CONNECT["McpClient.connect()\nHandshake + initialize"]
    TOOLS["listTools()\n→ ToolSpec[] registered\nin ToolRegistry"]
    CALL["callTool(name, args)\n→ result string\ncapped at maxResultTokens"]
  end

  subgraph "Registry"
    CATALOG["catalog.ts\nMCP server directory"]
    MARKETPLACE["marketplace-overlay/\nCurated server list\nwith i18n (zh-CN.json)"]
    SPEC["spec.ts\nMCP schema validation"]
  end

  STDIO --> CONNECT
  SSE --> CONNECT
  STREAM_HTTP --> CONNECT
  CONNECT --> TOOLS --> CALL
  CATALOG --> MARKETPLACE
  SPEC --> CONNECT

📈 Design Evolution

Version	Milestone
v0.0.x	Pillar 1 end-to-end, repair pipeline complete, Ink TUI scaffold
v0.1	τ-bench numbers published, streaming polish, transcript replay
v0.3	MCP client (stdio + SSE), session persistence
v0.4.x	`reasonix code` with SEARCH/REPLACE edits, review/auto gate, background jobs, hooks
v0.5.x	V4 model support, skills, memory, subagents, actionable error messages
v0.6	Cost control (flash-first, auto-compaction, /pro one-shot, self-report escalation, cost badges). Shared prompt fragments. UI refactor (App.tsx split into hooks, slash.ts split into 13 modules)
v0.31 (current)	Branch + harvest removed. Leaner surface, fewer slash commands

Explicit Non-Goals

Multi-agent orchestration (subagents are cost-reduction, not coordination)
RAG / vector retrieval
Non-DeepSeek backends (OpenAI shim possible via --model but untested)
Web UI / SaaS
Silent cost escalation (every pro call is user-visible)

DeepSeek Reasonix Architecture Analysis · Generated by Hermes Agent · github.com/esengine/deepseek-reasonix