Daily Newsletter

#089 Claude Code
Memory tool is just-in-time retrieval — not an upfront context dump at startup
The API memory tool is designed for on-demand retrieval: store what's learned, pull back only what's relevant for the current task. Agents that load all memory at boot are using it wrong.
"This is the key primitive for just-in-time context retrieval: rather than loading all relevant information upfront, agents store what they learn in memory and pull it back on demand."
↗ Source
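The store-then-retrieve loop can be sketched as a small client-side handler: the model emits memory tool calls, and your harness executes them against local files and returns the result. Command names here (`view`, `create`) and the `/memories` path prefix follow Anthropic's documented file-based command set, but treat the exact input shapes as assumptions to verify against your SDK version — this is a sketch, not the implementation.

```python
from pathlib import Path

class FileMemory:
    """File-backed store for client-executed memory tool calls.

    Handles the two commands needed for store/retrieve; a real harness
    would also cover str_replace, insert, delete, and rename.
    """

    def __init__(self, root: str):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, path: str) -> Path:
        # Strip the /memories prefix the model uses, and keep everything
        # under our root so a crafted path can't escape it.
        rel = path.removeprefix("/memories").lstrip("/")
        full = (self.root / rel).resolve()
        if not str(full).startswith(str(self.root)):
            raise ValueError(f"path escapes memory root: {path}")
        return full

    def execute(self, command: dict) -> str:
        """Dispatch one memory tool_use input; return the tool result string."""
        if command["command"] == "view":
            target = self._resolve(command["path"])
            if target.is_dir():
                return "\n".join(sorted(p.name for p in target.iterdir()))
            return target.read_text()
        if command["command"] == "create":
            target = self._resolve(command["path"])
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(command["file_text"])
            return f"created {command['path']}"
        raise ValueError(f"unsupported command: {command['command']}")
```

On each agent turn, route any memory tool_use block to `execute()` and send the returned string back as the tool result — the model then pulls back only the files relevant to the current task, which is the just-in-time part.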
#090 Claude Code
Bootstrap memory before work begins — ad hoc accumulation creates recovery failures
Run a dedicated initializer session that creates structured memory: progress log, feature checklist, startup script reference. Sessions without this spend turns re-discovering already-known facts.
"Initializer session: The first session sets up the memory artifacts before any substantive work begins. This includes a progress log, a feature checklist, and a reference to any startup or initialization script."
↗ Source
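The initializer pattern is easy to script so every project starts from the same structured layout. A sketch — the file names and templates below are illustrative conventions, not anything the tool prescribes:

```python
from pathlib import Path

# Memory artifacts an initializer session would create before any
# substantive work; names and contents are a convention, not an API.
ARTIFACTS = {
    "progress.md": "# Progress log\n\n(append one dated entry per session)\n",
    "features.md": "# Feature checklist\n\n- [ ] (populate during planning)\n",
    "startup.md": "# Startup\n\n(reference your init/startup script here)\n",
}

def bootstrap_memory(root: str) -> list[str]:
    """Create missing memory artifacts; never overwrite notes from
    earlier sessions. Returns the names of files actually created."""
    base = Path(root)
    base.mkdir(parents=True, exist_ok=True)
    created = []
    for name, template in ARTIFACTS.items():
        path = base / name
        if not path.exists():
            path.write_text(template)
            created.append(name)
    return created
```

Because it skips existing files, the same bootstrap can run at the top of every session: the first run creates the artifacts, later runs are no-ops, and no session starts without the recovery scaffolding in place.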
#091 Claude Code
Memory tool + compaction = theoretically unbounded agent sessions
When context fills, Claude writes key state to memory files. After compaction, it retrieves from memory on demand. This enables agents that run indefinitely across sessions without losing task state.
"compaction keeps the active context manageable without client-side bookkeeping, and memory persists important information across compaction boundaries so that nothing critical is lost."
↗ Source
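The control flow behind "theoretically unbounded" is a loop: watch context usage, flush durable state to memory before compacting, then keep stepping. A runnable sketch with a fake agent runtime standing in for the real SDK (all the methods and thresholds here are stand-ins, not real API surface):

```python
class FakeAgent:
    """Minimal stand-in for an agent runtime: each turn costs tokens,
    compaction replaces old turns with a cheap summary, and memory
    is a dict keyed by file path."""

    def __init__(self, context_limit=100, turns_needed=12):
        self.context_limit = context_limit
        self.turns_needed = turns_needed
        self.tokens = 0
        self.turns = 0
        self.memory = {}
        self.compactions = 0

    def done(self):
        return self.turns >= self.turns_needed

    def step(self):
        self.turns += 1
        self.tokens += 20   # pretend each turn costs 20 tokens

    def compact(self):
        self.compactions += 1
        self.tokens = 10    # summary replaces the old turns


def run_with_compaction(agent):
    """The pattern: before compacting, write key state to memory so it
    survives the compaction boundary; after, retrieve on demand."""
    while not agent.done():
        if agent.tokens > agent.context_limit * 0.8:
            agent.memory["/memories/state.md"] = f"completed {agent.turns} turns"
            agent.compact()
        agent.step()
    return agent
```

The point of the ordering is the quote's guarantee: the memory write happens before compaction, so even though compaction discards old turns, nothing critical is lost — and the loop itself never has to terminate for context reasons.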
#092 Claude Code
Run /context to see what's consuming your context window before optimizing
/context shows a breakdown by source: conversation, files, CLAUDE.md, MCP server tool definitions, skills. MCP servers can silently consume thousands of tokens in tool schema definitions.
"Run /context to see what's using space. MCP servers add tool definitions to every request, so a few servers can consume significant context before you start working. Run /mcp to check per-server costs."
↗ Source
#093 Claude Code
Memory tool is client-side โ€” Anthropic stores nothing; ZDR means zero post-response retention
The memory tool stores data in your infrastructure, not on Anthropic's servers. With Zero Data Retention arrangements, data isn't retained after the API response returns.
"The memory tool operates client-side: you control where and how the data is stored through your own infrastructure. This feature is eligible for Zero Data Retention (ZDR)."
↗ Source
#094 Claude Code
resumeSessionAt: messageId forks a session from any historical checkpoint
The SDK supports forking from any message ID in a session's history — not just the end. Try alternative approaches from a known-good point without re-running all prior work.
"Resume from a checkpoint in the conversation: resumeSessionAt: messageId // message.message.id from SDKAssistantMessage"
↗ Source
#095 Codex
stream_idle_timeout_ms and retry counts are independently configurable per provider
In [model_providers.openai], set stream_idle_timeout_ms (default 300,000ms = 5 min) independently from stream_max_retries. Complex code generation regularly hits stream idle timeouts.
"[model_providers.openai] request_max_retries = 4 stream_max_retries = 10 stream_idle_timeout_ms = 300000"
↗ Source
#096 Codex
Set model_context_window explicitly for proxies and custom deployments
With custom providers or LLM proxies, Codex may auto-detect the wrong context window and truncate unnecessarily. Set model_context_window = 128000 in the provider config block to override.
"model_context_window = 128000 # Context window size"
↗ Source
#097 Codex
model_reasoning_summary = "none" suppresses thinking output from reasoning models
Set this to suppress thinking summaries or "low" to shorten them. Pair with model_verbosity = "low" for shorter output. Only applies to Responses API providers — Chat Completions providers ignore it.
"model_reasoning_summary = 'none' # Disable summaries model_verbosity = 'low' # Shorten responses"
↗ Source
#098 Both
LLM proxy routing gives teams centralized cost tracking, logging, and budget enforcement
Claude Code: set ANTHROPIC_VERTEX_BASE_URL. Codex: configure a [model_providers.proxy] block. A central proxy gives the whole team unified logging, per-project cost attribution, and budget alerts.
Codex: "[model_providers.proxy] name = 'OpenAI using LLM proxy' base_url = 'http://proxy.example.com' env_key = 'OPENAI_API_KEY'"
↗ Source
#099 Both
Session transcripts are local JSONL files — parse them for cost analysis and custom tooling
Claude Code stores sessions at ~/.local/share/claude/sessions/. Codex stores runs at ~/.codex/sessions/. Both are machine-readable JSONL. Parse externally to build cost dashboards or custom memory tools.
"Check ~/.codex/log/codex-tui.log (or the most recent session-*.jsonl file if you enabled session logging) after a session if you need to audit which instruction files Codex loaded."
↗ Source
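Because the transcripts are one JSON object per line, a cost summary is a short script. A sketch that assumes token counts, when present, sit under a `usage` object with `input_tokens` / `output_tokens` keys — field names vary by tool and version, so inspect a few lines of your own files and adjust the keys before trusting the totals:

```python
import json
from pathlib import Path

def summarize_session(path: str) -> dict:
    """Tally events and token usage from one session transcript.

    Assumes one JSON object per line and an optional `usage` dict with
    input_tokens / output_tokens; adapt the keys to your files.
    """
    totals = {"events": 0, "input_tokens": 0, "output_tokens": 0}
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        totals["events"] += 1
        usage = event.get("usage") or {}
        totals["input_tokens"] += usage.get("input_tokens", 0)
        totals["output_tokens"] += usage.get("output_tokens", 0)
    return totals
```

Point it at each `*.jsonl` under the session directories and multiply by your per-token rates to get a per-session cost dashboard — no vendor API required.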
#100 Both
Both platforms expose programmatic analytics APIs for org-wide usage reporting
Claude Code has the Claude Code Analytics API (in Anthropic admin docs) for programmatic org-level session usage. Codex exposes usage through the OpenAI admin API. Build cost allocation dashboards programmatically.
"Admin API overview ยท Data residency ยท Workspaces ยท Usage and Cost API ยท Claude Code Analytics API ยท Zero Data Retention"
↗ Source