Daily Newsletter

#101 Claude Code
'think' < 'think hard' < 'think harder' < 'ultrathink' — mapped to real thinking budgets
These specific phrases map directly to increasing levels of thinking budget in Claude Code. ultrathink is not a metaphor — it allocates progressively more compute for planning. Use it for architecture and hard debugging.
"We recommend using the word think to trigger extended thinking mode… 'think' < 'think hard' < 'think harder' < 'ultrathink'. Each level allocates progressively more thinking budget."
↗ Source
#102 Both
Ask Claude/Codex to 'interview you' before planning — turns fuzzy ideas into concrete specs
If you have a rough idea but aren't sure how to describe it well, ask the agent to question you first and challenge your assumptions. The interview output becomes the spec. Much better than a bad prompt.
"Ask Codex to interview you: If you have a rough idea of what you want but aren't sure how to describe it well, ask Codex to question you first. Tell it to challenge your assumptions and turn the fuzzy idea into something concrete before writing code."
↗ Source
#103 Claude Code
Include four things in every complex prompt: context, task, constraints, definition of done
A good default structure: which files/folders matter, what to do, what not to do, and what 'done' looks like. This four-part structure consistently outperforms prose descriptions.
"A good default is to include four things in your prompt: Context (which files, folders, docs matter), Task (what to do), Constraints (boundaries), and Verification (what 'good' looks like). This provides the clearest path to a useful result."
↗ Source
#104 Both
Tell the agent what 'done' looks like — it can only verify against what you define
Without a definition of done in the prompt or AGENTS.md/CLAUDE.md, the agent guesses and stops at the wrong point. Define done as: tests pass, no new deps added, no unrelated files modified.
"Don't stop at asking Codex to make a change. Ask it to create tests when needed, run the relevant checks, confirm the result, and review the work before you accept it. Codex can do this loop for you, but only if it knows what 'good' looks like."
↗ Source
#105 Both
Use Plan mode (Codex) / the 'don't code yet' pattern (Claude) before implementation
Plan mode lets the agent gather context, ask clarifying questions, and build a stronger plan before implementation. Explicitly asking the agent not to code yet prevents premature solutions and improves architectural quality.
"Use Plan mode: For most users, this is the easiest and most effective option. Plan mode lets Codex gather context, ask clarifying questions, and build a stronger plan before implementation."
↗ Source
#106 Codex
Batch all file reads together — never read files one by one unless logically unavoidable
Before any tool call, decide which files you will need and read them in one parallel batch. Sequential reads when parallel reads are possible are a common performance anti-pattern in agentic workflows.
"Batch everything. If you need multiple files (even from different places), read them together. Only make sequential calls if you truly cannot know the next file without seeing a result first."
↗ Source
#107 Claude Code
Challenge Claude to prove its solution works before accepting it
After a fix, say 'prove to me this works' and have Claude diff between main and your branch. Or: 'grill me on these changes and don't make a PR until I pass your test.' This forces real verification, not assumed success.
"challenge Claude — 'grill me on these changes and don't make a PR until I pass your test.' or 'prove to me this works' and have Claude diff between main and your branch."
↗ Source
#108 Claude Code
When Claude goes off-track, use Esc Esc or /rewind — don't try to fix it in the same context
Pressing Esc twice or using /rewind undoes the last turn. Trying to correct a wrong direction in the same context often makes it worse. Rewind to the last good point and re-approach.
"use Esc Esc or /rewind to undo when Claude goes off-track instead of trying to fix it in the same context."
↗ Source
#109 Claude Code
At 70% context usage Claude loses precision; at 85% hallucinations increase; 90%+ is erratic
Empirical thresholds from practitioner research: 0-50% = work freely; 50-70% = start thinking about /compact; 70-90% = run /compact; 90%+ = /clear is mandatory. These thresholds apply across model sizes.
"At 70% context, Claude starts losing precision. At 85%, hallucinations increase. At 90%+, responses become erratic. Strategy: 0-50% (work freely). 50-70% (attention). 70-90% (/compact). 90%+ (/clear mandatory)."
↗ Source
#110 Claude Code
Run /compact manually at 50% to avoid the agent's 'dumb zone'
Don't wait for auto-compaction — proactively compact at 50% to maintain quality. Vanilla Claude Code with properly managed context beats any elaborate workflow built on smaller tasks.
"avoid agent dumb zone, do manual /compact at max 50%. Use /clear to reset context mid-session if switching to a new task."
↗ Source
#111 Both
Start with a minimal spec and expand — don't front-load all context
Ask the agent to explore your codebase, then build the spec iteratively. Front-loading everything consumes context budget before work starts. Context is infrastructure — manage it like a resource.
"Context is infrastructure. Claude Code automatically pulls context from your environment. Rather than treating Claude as a chatbot, the core insight is: Claude Code works best when treated like a junior engineer with tools, memory, and iteration."
↗ Source
#112 Codex
Codex ran 25 hours uninterrupted on a long-horizon task using 13M tokens — context compaction is the key unlock
Codex ran a 25-hour continuous session generating 30k lines of code by using context compaction throughout. Long-horizon reliability is not just about model capability — it's about systematic context management.
"Codex ran for about 25 hours uninterrupted, used about 13M tokens, and generated about 30k lines of code. This performed well on the parts that matter for long-horizon work: following the spec, staying on task, running verification, and repairing failures."
↗ Source
#113 Claude Code
Writer/Reviewer pattern: one Claude writes code, a fresh Claude reviews it
A fresh context improves code review because Claude won't be biased toward code it just wrote. Use one session for implementation, then open a second session with only the diff as context for review.
"A fresh context improves code review since Claude won't be biased toward code it just wrote. For example, use a Writer/Reviewer pattern. You can do something similar with tests: have one Claude write tests, then another write code to pass them."
↗ Source
#114 Claude Code
TDD with agents: have one Claude write failing tests, another write code to pass them
This is a powerful quality pattern. The test-writer has no knowledge of the implementation it's about to test, so it writes genuinely independent tests. The implementer's only goal is to pass them.
"have one Claude write tests, then another write code to pass them. Loop through tasks calling claude -p for each. Use --allowedTools to scope permissions for batch operations."
↗ Source
#115 Claude Code
Large migrations: have Claude list all files, then loop a claude -p invocation per file
For large-scale migrations, have Claude list all files needing changes, then spawn a separate Claude invocation per file. Faster and more reliable than one long sequential session losing context.
"Have Claude list all files that need migrating. for file in $(cat files.txt); do claude -p "Migrate $file from React to Vue. Return OK or FAIL." --allowedTools "Edit,Bash(git commit *)"; done"
↗ Source
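The loop in the quote above, made runnable: a minimal sketch with the claude binary stubbed out so the control flow can be checked anywhere. The file names are examples; delete the stub to run the real migration.

```shell
# Stub standing in for the real claude CLI so this sketch runs anywhere;
# delete this function to invoke the actual binary.
claude() { echo "OK"; }

# Step 1: have Claude produce the file list (hard-coded example here).
printf '%s\n' src/App.jsx src/Nav.jsx > files.txt

# Step 2: one scoped headless invocation per file.
while IFS= read -r file; do
  result=$(claude -p "Migrate $file from React to Vue. Return OK or FAIL." \
    --allowedTools "Edit" "Bash(git commit *)")
  echo "$file: $result"
done < files.txt
```

Note the double quotes around the prompt: with single quotes, $file would never expand and every invocation would get the literal string.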
#116 Claude Code
Use the -p flag for headless Claude invocations in shell scripts and loops
The -p flag runs Claude in non-interactive mode, taking the prompt as an argument and printing the result to stdout. This is the foundation of all scripted and batch agentic workflows.
"Loop through tasks calling claude -p for each. Use --allowedTools to scope permissions for batch operations."
↗ Source
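Because -p reads a prompt argument and writes to stdout, it composes with ordinary pipes. A minimal sketch; the claude stub here just consumes stdin and prints a fixed verdict, so replace it with the real CLI:

```shell
# Stub: consume piped input, emit a fixed verdict. Remove for real use.
claude() { cat > /dev/null; echo "LGTM"; }

# Pipe any text (here, a fake two-line diff) into a headless review.
printf '%s\n' "- old line" "+ new line" \
  | claude -p "Review this diff. Reply LGTM or CHANGES." > verdict.txt
cat verdict.txt
```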
#117 Claude Code
Say 'use subagents' to throw more compute at a problem without changing context
Adding the phrase 'use subagents' to your prompt offloads tasks to parallel workers and keeps your main context clean and focused. Subagents run in parallel — one instruction triggers multiple parallel work streams.
"say 'use subagents' to throw more compute at a problem — offload tasks to keep your main context clean and focused."
↗ Source
#118 Claude Code
Define security-reviewer, performance-analyzer, and doc-writer as persistent subagents
Create specialized subagents in .claude/agents/ for recurring review types. Delegate with a single instruction: 'use a subagent to review this code for security issues.' Each runs in an isolated context.
"Define specialized assistants in .claude/agents/ that Claude can delegate to.
---
name: security-reviewer
description: Reviews code for security vulnerabilities
tools: Read, Grep, Glob, Bash
model: opus
---"
↗ Source
#119 Both
Multi-instance agents with tmux + git worktrees: the pattern for parallel development
Open multiple Claude Code sessions, each in its own git worktree. Each works on a separate feature independently. Tmux keeps all sessions visible. Merge each worktree when done. The foundation of AI-native team workflows.
"agent teams with tmux and git worktrees for parallel development… use test time compute — separate context windows make results better; one agent can cause bugs and another (same model) can find them."
↗ Source
#120 Codex
Codex parallel task execution is the 'killer feature' — assign 5 tasks, review all 5 when done
Assign 5 different tasks, each runs in its own isolated container, and you review all 5 when they're done. The entire UI, worktree management, and review queue in the Codex app is built for this delegation model.
"Parallel task execution. This is Codex's killer feature. Assign 5 different tasks, each runs in its own isolated container, and you review all 5 when they're done."
↗ Source
#121 Claude Code
Turn any repeated inner-loop workflow into a slash command
If you do something more than once a day, make it a /command in .claude/commands/. Commands are markdown files checked into git and available to the entire team. Build /techdebt, /pr-ready, /explain-diff.
"use slash commands for every 'inner loop' workflow you do many times a day — saves repeated prompting; commands live in .claude/commands/ and are checked into git."
↗ Source
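A slash command is just a markdown file whose name becomes the command. A sketch of a hypothetical /techdebt command (the file name and prompt body are examples):

```shell
mkdir -p .claude/commands
# The file body is the prompt the command expands to.
cat > .claude/commands/techdebt.md <<'EOF'
Scan the staged files for TODO/FIXME comments, dead code, and copy-pasted
logic. List each finding with file and line, ranked by risk. Do not edit
anything.
EOF
# Check the file into git so the whole team gets the command.
```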
#122 Both
Build a Gotchas section in every skill file — the highest-signal content across sessions
Add a ## Gotchas section to every SKILL.md documenting the model's failure points you've discovered. This is the highest-signal content in a skill — it pushes Claude/Codex out of default failure modes specific to your codebase.
"build a Gotchas section in every skill — highest-signal content, add Claude's failure points over time."
↗ Source
#123 Both
Write skill descriptions as trigger conditions, not summaries — 'when should I fire?'
The skill description field is read by the model to decide when to auto-invoke the skill. Write it from the model's perspective: 'Use this skill when the user asks to analyze database query performance', not a summary of what the skill does.
"skill description field is a trigger, not a summary — write it for the model ('when should I fire?'). don't state the obvious in skills — focus on what pushes Claude out of its default behavior."
↗ Source
#124 Both
Give goals and constraints in skills, not step-by-step instructions
Don't railroad the agent in skills — prescriptive step-by-step instructions reduce quality. Provide the goal, the constraints, and the definition of done. Let the agent figure out the steps.
"don't railroad Claude in skills — give goals and constraints, not prescriptive step-by-step instructions."
↗ Source
#125 Both
Embed !`command` in SKILL.md to inject live shell output into the prompt
Claude/Codex runs the backtick command on skill invocation and injects the result into the prompt. Use it to inject the current git status, a database schema, an API health check, or any dynamic context the skill needs.
"embed !`command` in SKILL.md to inject dynamic shell output into the prompt — Claude runs it on invocation and the model only sees the result."
↗ Source
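A sketch of what this looks like in a skill file. The skill name and contents are hypothetical; the point is that the !`...` line is replaced with the command's output at invocation time:

```shell
mkdir -p .claude/skills/db-triage
cat > .claude/skills/db-triage/SKILL.md <<'EOF'
---
name: db-triage
description: Use this skill when the user asks to debug failing database queries.
---
Current working-tree state (injected at invocation time):
!`git status --short --branch`

Triage the failing query against the schema in references/schema.sql.
EOF
```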
#126 Codex
Ask Codex to plan before coding on any complex or ambiguous task
For complex or hard-to-describe tasks, ask Codex to plan before it starts coding. Toggle with /plan or Shift+Tab. You can comment on the plan inline before implementation begins.
"If the task is complex, ambiguous, or hard to describe well, ask Codex to plan before it starts coding."
↗ Source
#127 Codex
Use speech dictation inside the Codex app to provide context faster
For complex tasks with lots of context to provide, speech dictation inside the Codex app is faster than typing. The model receives the same quality of context with dramatically less friction.
"To provide context faster, try using speech dictation inside the Codex app to dictate what you want Codex to do rather than typing it."
↗ Source
#128 Both
Scope skills to one job — start with 2–3 concrete use cases, expand later
Don't try to cover every edge case in a skill up front. Start with one representative task, get it working well, then add edge cases. Monolithic skills that try to do everything degrade quality for each specific task.
"Keep each skill scoped to one job. Start with 2 to 3 concrete use cases, define clear inputs and outputs… Don't try to cover every edge case up front."
↗ Source
#129 Claude Code
/rename important sessions and /resume them by name later
Label each instance when running multiple Claude sessions simultaneously. Named sessions are resumable and identifiable across terminal sessions. Good for long-running projects spanning multiple days.
"/rename important sessions (e.g. [TODO - refactor task]) and /resume them later — label each instance when running multiple Claudes simultaneously."
↗ Source
#130 Claude Code
Use /model to switch to Opus for planning, Sonnet for implementation in the same session
Use /model to select the right model per phase: Opus for plan-mode reasoning on complex architecture decisions, Sonnet for bulk code generation. Switch mid-session without losing context.
"use /model to select model and reasoning… use Opus for plan mode and Sonnet for code to get the best of both."
↗ Source
#131 Both
Paste the bug, say 'fix' — don't micromanage how
Claude Code fixes most bugs on its own. Paste the error or symptom, tell the agent to fix it, and let it trace the issue. Micromanaging the implementation approach often produces worse results than open delegation.
"Claude fixes most bugs by itself — paste the bug, say 'fix', don't micromanage how."
↗ Source
#132 Claude Code
Keep codebases clean and finish migrations — partially migrated frameworks confuse agents
Partially migrated frameworks (mixing old and new patterns) cause agents to pick the wrong pattern for new code. A consistent codebase produces dramatically more consistent agent output.
"keep codebases clean and finish migrations — partially migrated frameworks confuse models that might pick the wrong pattern."
↗ Source
#133 Both
Use browser automation MCPs (Claude in Chrome, Playwright) to let agents inspect console logs
Give the agent access to a browser tool so it can observe the actual runtime behavior of your frontend — console errors, network requests, visual state. Agents with perceptual feedback produce dramatically better UI fixes.
"use browser automation MCPs (Claude in Chrome, Playwright, Chrome DevTools) for Claude to inspect console logs."
↗ Source
#134 Both
Create a closed feedback loop: agent writes code, runs tests, checks output, iterates
Ask the agent to write code AND run the relevant tests AND confirm the result AND review before you accept it. This closed feedback loop, in which the agent evaluates its own output, catches most errors before you see them.
"This creates a closed feedback loop where the agent evaluates its own output."
↗ Source
#135 Claude Code
After a mediocre fix, say 'knowing everything you know now, scrap this and implement the elegant solution'
When Claude produces a working but ugly fix, this prompt makes it use everything it learned during the failed attempt to produce a cleaner solution in a fresh try. Dramatically better than iterating on a bad first attempt.
"after a mediocre fix — 'knowing everything you know now, scrap this and implement the elegant solution'."
↗ Source
#136 Both
Add tools only when they unlock a real manual loop — don't wire in everything at once
Start with one or two tools that clearly remove a manual workflow you already do often. Every tool you add consumes context budget in every session. The wrong tools actively hurt quality.
"Add tools only when they unlock a real workflow. Do not start by wiring in every tool you use. Start with one or two tools that clearly remove a manual loop you already do often."
↗ Source
#137 Both
MCP turns Claude/Codex into a tool-orchestrating agent rather than a single-model assistant
The Model Context Protocol connects Claude/Codex to your actual infrastructure: databases, monitoring tools (Sentry), issue trackers, deployment systems. This is the foundation of AI-native developer environments.
"The Model Context Protocol (MCP) turns Claude into a tool-orchestrating agent rather than a single-model assistant. Query monitoring tools (e.g. Sentry). This is a foundational step toward AI-native developer environments."
↗ Source
#138 Both
Run /plugin to browse the marketplace for pre-built skills, tools, and integrations
The plugin marketplace provides skills, tools, and integrations without manual configuration. Browse with /plugin and install with $skill-installer. Includes Linear, GitHub, Sentry, and more.
"Run /plugin to browse the marketplace. Plugins add skills, tools, and integrations without configuration."
↗ Source
#139 Codex
Use --output-schema with codex exec to get structured JSON output for downstream scripts
Pass a JSON Schema file to --output-schema and Codex will constrain its final response to match that schema. Enables stable downstream scripting — no regex parsing of prose output.
"Use --output-schema to request a final response that conforms to a JSON Schema. This is useful for automated workflows that need stable fields (for example, job summaries, risk reports, or release metadata)."
↗ Source
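A minimal sketch of the pipeline. The schema and "risk report" shape are invented for illustration, and codex is stubbed with a schema-conforming reply so the flow runs without the CLI; delete the stub to call the real binary.

```shell
# A JSON Schema for a hypothetical risk report.
cat > risk_report.schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "risk": { "type": "string", "enum": ["low", "medium", "high"] },
    "summary": { "type": "string" }
  },
  "required": ["risk", "summary"],
  "additionalProperties": false
}
EOF

# Stub emitting a schema-conforming reply; remove for real use.
codex() { echo '{"risk":"low","summary":"No schema changes in this diff."}'; }

codex exec "Assess the risk of this diff." \
  --output-schema risk_report.schema.json > report.json
cat report.json
```

Downstream scripts can then read report.json with a JSON parser instead of scraping prose.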
#140 Codex
Use codex exec --json to stream JSONL events for CI pipeline integration
With --json, stdout becomes a JSONL stream of every event Codex emits: thread.started, turn.started, item.*, turn.completed. Parse with jq. No terminal-output scraping needed in CI.
"When you enable --json, stdout becomes a JSON Lines (JSONL) stream so you can capture every event Codex emits while it's running. Event types include thread.started, turn.started, turn.completed, item.*, and error."
↗ Source
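A sketch of consuming that stream in CI. The event stream is simulated here with a captured file (payload fields beyond "type" are illustrative); in a real pipeline you would pipe codex exec --json directly into the filter.

```shell
# Simulated capture of a --json run; field shapes beyond "type" are examples.
cat > events.jsonl <<'EOF'
{"type":"thread.started","thread_id":"t_123"}
{"type":"turn.started"}
{"type":"item.completed","item":{"type":"agent_message","text":"Done."}}
{"type":"turn.completed","usage":{"input_tokens":4200}}
EOF

# Gate the CI step on the turn actually completing.
grep -q '"type":"turn.completed"' events.jsonl && echo "turn completed"
```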
#141 Both
24 CVEs have been found in the Claude Code ecosystem — audit MCP servers before trusting them
MCP servers can read/write your codebase and make network calls. A malicious skill could exfiltrate code or introduce backdoors. Treat skill installation like installing software: audit sources, check for unusual network calls.
"24 CVEs identified in Claude Code ecosystem. 655 malicious skills in supply chain. MCP servers can read/write your codebase. Strategy: Systematic audit (5-min checklist)."
↗ Source
#142 Claude Code
Skills that fetch from external URLs are high-risk — fetched content may contain injected instructions
Even trustworthy skills can be compromised if their external dependencies change. A skill fetching from a CDN or API can receive new instructions at any time. Audit all external fetches in skill files.
"Skills that fetch data from external URLs pose particular risk, as fetched content may contain malicious instructions. Even trustworthy Skills can be compromised if their external dependencies change over time."
↗ Source
#143 Both
Unexpected changes during a session: STOP IMMEDIATELY and ask the user how to proceed
If the agent notices unexpected changes it didn't make — a file modified by another process, a changed git state — it should stop and verify. Continuing may corrupt the working state. Build this rule into your AGENTS.md.
"While you are working, you might notice unexpected changes that you didn't make. If this happens, STOP IMMEDIATELY and ask the user how they would like to proceed."
↗ Source
#144 Both
Never use git reset --hard or git checkout -- unless specifically requested
These destructive commands should be prohibited in AGENTS.md/CLAUDE.md. An agent running a migration that accidentally triggers a hard reset can destroy hours of work with no undo.
"NEVER use destructive commands like git reset --hard or git checkout -- unless specifically requested or approved by the user."
↗ Source
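This guardrail and the stop-immediately rule above it can be written straight into the instructions file. A sketch of the AGENTS.md (or CLAUDE.md) wording:

```shell
# Append a safety-rules section to the repo's AGENTS.md.
cat >> AGENTS.md <<'EOF'
## Safety rules
- If you notice unexpected changes you did not make, STOP IMMEDIATELY and
  ask the user how to proceed.
- NEVER use destructive commands like git reset --hard or git checkout --
  unless the user specifically requests or approves them.
EOF
```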
#145 Both
Use the right model for the task — gpt-5.4-mini and Claude Haiku for subagents and light tasks
For most subagent work, a lighter model is sufficient. Use gpt-5.4-mini for Codex subagents and focused tasks, and Claude Haiku for Claude Code subagents. Reserve frontier models for reasoning-heavy planning.
"Use gpt-5.4-mini when you want a faster, lower-cost option for lighter coding tasks or subagents."
↗ Source
#146 Codex
GPT-5.3-Codex-Spark is available for near-instant real-time coding iteration (Pro subscribers)
The Spark model is optimized for near-instant coding iteration — effectively real-time feedback on code changes. Available in research preview for ChatGPT Pro subscribers. Switch with /model.
"The gpt-5.3-codex-spark model is available in research preview for ChatGPT Pro subscribers and is optimized for near-instant, real-time coding iteration."
↗ Source
#147 Both
Start with tight permissions, loosen only when the need is clear
If you're new to coding agents, start with the default permissions. Keep approval and sandboxing tight by default. Loosen only for trusted repos or specific workflows once you understand the exact requirement.
"If you're new to coding agents, start with the default permissions. Keep approval and sandboxing tight by default, then loosen permissions only for trusted repos or specific workflows once the need is clear."
↗ Source
#148 Claude Code
Use /usage to check plan limits and /extra-usage to configure overflow billing
Monitor your consumption in real time with /usage. If you're hitting plan limits during critical work, /extra-usage lets you configure overflow billing without stopping mid-task.
"use /usage to check plan limits, /extra-usage to configure overflow billing, /config to configure settings."
↗ Source
#149 Both
Long-horizon work requires: following the spec, staying on task, running verification, repairing failures
These are the four measurable dimensions of long-horizon task quality identified in practitioner tests. Verification and failure repair are the two that most agents fail on. Build explicit verification steps into your specs.
"This performed well on the parts that matter for long-horizon work: following the spec, staying on task, running verification, and repairing failures as it went."
↗ Source
#150 Codex
Use Extra High reasoning for long-horizon tasks — quality scales with reasoning effort
For complex multi-hour tasks, set reasoning effort to Extra High. Higher reasoning effort means the model spends more tokens exploring alternatives before committing to an approach. Quality scales non-linearly with effort.
"I gave Codex a blank repo, full access, and one job: build a design tool from scratch. Then I let it run with GPT-5.3-Codex at 'Extra High' reasoning."
↗ Source
#151 Both
Spec-driven development: research → spec → plan → implement → review — every session
All major agentic workflows converge on this pattern. The spec phase is the highest-leverage point: a well-written spec reduces total token cost and produces better output than a longer implementation session.
"All major workflows converge on the same architectural pattern: Research → Plan → Execute → Review → Ship."
↗ Source
#152 Claude Code
Agent Teams: automated coordination of multiple sessions with a team lead and shared tasks
Beyond parallelizing work, Agent Teams enable quality-focused workflows where different agents with fresh context review each other's work. The team lead delegates and coordinates without losing track of overall progress.
"Agent teams: Automated coordination of multiple sessions with shared tasks, messaging, and a team lead. Beyond parallelizing work, multiple sessions enable quality-focused workflows."
↗ Source
#153 Claude Code
Use git worktrees to run multiple parallel Claude sessions on the same repo without conflicts
Git worktrees let each parallel Claude session work on its own branch in its own directory. No conflicts, no stale index, no need to stash. The cleanest mechanism for multi-agent parallel development.
"Git worktrees make this practical by enabling parallel, isolated agent sessions on the same repo."
↗ Source
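The setup is a few git commands: one worktree (and branch) per agent session, so sessions never share a working directory. Repo, paths, and branch names below are examples.

```shell
# Fresh demo repo (the -c user settings just make the commit work anywhere).
git init -q demo && cd demo
git -c user.email=agent@example.com -c user.name=agent \
  commit -q --allow-empty -m "init"

# One worktree + branch per parallel session.
git worktree add ../demo-auth -b feature/auth
git worktree add ../demo-search -b feature/search
git worktree list

# Start one Claude session in ../demo-auth and another in ../demo-search;
# merge each feature branch back when its task lands.
```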
#154 Claude Code
context: fork in skill frontmatter runs the skill in an isolated subagent
The main context only sees the final result, not intermediate tool calls. Use this for read-heavy exploration skills or any skill that generates verbose intermediate output that shouldn't pollute the main session.
"use context: fork to run a skill in an isolated subagent — main context only sees the final result, not intermediate tool calls. The agent field lets you set the subagent type."
↗ Source
#155 Both
Skills are folders, not files — use references/, scripts/, examples/ subdirectories
A skill directory can contain supporting scripts, example inputs/outputs, reference schemas, and any other files the skill needs. This folder structure enables progressive disclosure at scale.
"skills are folders, not files — use references/, scripts/, examples/ subdirectories for progressive disclosure."
↗ Source
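A sketch of the layout; the skill name and file names are invented for illustration.

```shell
# One folder per skill, with supporting material in subdirectories.
mkdir -p .claude/skills/report-gen/references
mkdir -p .claude/skills/report-gen/scripts
mkdir -p .claude/skills/report-gen/examples
touch .claude/skills/report-gen/SKILL.md               # entry point
touch .claude/skills/report-gen/references/schema.sql  # read only when needed
touch .claude/skills/report-gen/scripts/render.py      # reused, not regenerated
touch .claude/skills/report-gen/examples/sample.md     # worked example
find .claude/skills/report-gen -type f | sort
```

Only SKILL.md is loaded up front; the subdirectories are pulled in on demand, which is what makes the disclosure progressive.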
#156 Both
Include scripts and libraries in skills so Claude composes rather than reconstructs boilerplate
Bundle the actual scripts the skill needs alongside the SKILL.md. When the skill runs, Claude composes using the existing script rather than writing the same boilerplate from scratch every time.
"include scripts and libraries in skills so Claude composes rather than reconstructs boilerplate."
↗ Source
#157 Codex
Once a workflow is repeatable, stop using long prompts — encode it as a skill
Once a workflow becomes repeatable, stop relying on long prompts or repeated back-and-forth. A skill packages the instructions, context, and supporting logic into a durable, reusable artifact.
"Once a workflow becomes repeatable, stop relying on long prompts or repeated back-and-forth. Use a Skill to package the instructions in a SKILL.md file, context, and supporting logic Codex should apply consistently."
↗ Source
#158 Both
Configure Codex/Claude for your real environment early — many quality issues are actually setup issues
Wrong working directory, missing write access, wrong model defaults, or missing tools cause quality problems that look like model failures. Configure the environment correctly before blaming the model.
"Configure Codex for your real environment early. Many quality issues are really setup issues, like the wrong working directory, missing write access, wrong model defaults, or missing tools and connectors."
↗ Source
#159 Claude Code
Use /insights commands and verify patterns through tests — agents generate 1.75× more logic errors
Claude Code can generate 1.75× more logic errors than human-written code (ACM 2025 research). Every output must be verified. Use tests as the ground truth, not visual inspection or agent assertions.
"Claude Code can generate 1.75x more logic errors than human-written code (ACM 2025). Every output must be verified. Use /insights commands and verify patterns through tests."
↗ Source
#160 Codex
Codex built OpenAI DevDay 2025 — from keynote demos to arcade machines to SDK polishing
OpenAI used Codex to build everything for DevDay 2025: programmatic camera/stage-light control for the keynote, arcade machines in the community hall, and polishing the Guardrails SDKs for Python and TypeScript. Production proof of agentic reliability.
"OpenAI used Codex to build everything for DevDay 2025 — from Romain Huet's keynote demo… to the arcade machines… Even the Guardrails SDKs for Python and TypeScript were polished using Codex."
↗ Source
#161 Codex
Codex scheduled automations: daily issue triage, weekly test coverage scans, Friday release notes
Codex can run recurring tasks on a schedule without you prompting it. This is the feature that separates it from interactive-only coding agents. Teams drowning in operational toil can offload entire categories of work.
"Scheduled automations. Codex can run recurring tasks on a schedule without you prompting it. Daily issue triage. Weekly test coverage scans. Friday release notes."
↗ Source
#162 Both
Metaprompting: at the end of a bad turn, ask the agent how to improve its own instructions
If a turn didn't perform up to expectations, ask the model directly how to improve the instructions that produced the bad output. This self-metaprompting technique is documented in the Codex prompting guide as a fix for overthinking and loggy behavior.
"It's possible to ask the model at the end of a turn that didn't perform up to expectations how to improve its own instructions. The following prompt was used to produce some of the solutions to overthinking problems."
↗ Source
#163 Both
Use thinking mode with the Explanatory output style to see Claude's decision reasoning
Enable thinking mode (true) and set Output Style to Explanatory to see detailed output with Insight boxes. This lets you understand why Claude made a decision, not just what it produced.
"always use thinking mode true (to see reasoning) and Output Style Explanatory (to see detailed output with ★ Insight boxes) in /config for better understanding of Claude's decisions."
↗ Source
#164 Both
Common failure modes in agents: overthinking, loggy updates, awkward preamble tics
Documented failure patterns: taking too long before the first useful action; unnatural status updates instead of pair-programmer collaboration; repetitive tics like 'Good catch', 'Aha', 'Got it'. All are fixable via metaprompting.
"Common failure modes… Overthinking / long time before first useful action. Loggy / unnatural status updates instead of pair programmer collaboration. Awkward preamble phrasing and repetitive tics ('Good catch', 'Aha', 'Got it')."
↗ Source
#165 Both
Hand off to a fresh agent rather than one long session for quality-critical reviews
A fresh context improves quality for any evaluative task: the reviewing agent hasn't been anchored by the implementation process. This is the key insight behind the Writer/Reviewer, Implementer/Tester, and Builder/Validator patterns.
"use test time compute — separate context windows make results better; one agent can cause bugs and another (same model) can find them."
↗ Source
#166 Codex
Use Plan mode (/plan) toggle to review plans inline in the Codex app with diff panel
Toggle the diff panel in the Codex app to directly review changes locally. Plan mode integrates the plan review with the diff view β€” you can see both what the agent plans to do and what it has already done.
"Toggle the diff panel in the Codex app to directly review changes locally."
β†— Source
#167 Claude Code
Start with the minimal working example pattern: spec β†’ minimal working impl β†’ expand
Ask Claude to build the smallest possible working version first, then expand incrementally. This produces better architecture than specifying the full system up front because it forces decisions to be made in the right order.
"Start with a minimal spec or prompt and ask Claude to interview you using AskUserQuestion tool, then make a new session to execute the spec."
↗ Source
#168 Both
Multi-instance research pattern: break into parallelizable sub-goals, run child processes, aggregate
For deep research tasks, decompose objectives into parallelizable sub-goals and run child agent processes via codex exec or Claude subagents. Aggregate results into polished reports. The pattern scales to arbitrary research depth.
"Multi-instance (multi-agent) orchestration workflow for deep research tasks. Breaks down research objectives into parallelizable sub-goals, runs child processes via codex exec, and aggregates results into polished reports."
↗ Source
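The fan-out/fan-in loop can be sketched in a few lines of Python. Here `run_agent` is a stand-in for shelling out to `codex exec` (or spawning a Claude subagent), and the objective and sub-goal strings are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(subgoal: str) -> str:
    # In practice: subprocess.run(["codex", "exec", subgoal], capture_output=True)
    # Stubbed here so the sketch is self-contained and runnable.
    return f"## {subgoal}\n(findings for {subgoal})"

def research(objective: str, subgoals: list[str]) -> str:
    # Fan out: each sub-goal runs in its own child process / context window.
    with ThreadPoolExecutor(max_workers=len(subgoals)) as pool:
        results = list(pool.map(run_agent, subgoals))
    # Fan in: aggregate the child reports into one polished document.
    return f"# {objective}\n\n" + "\n\n".join(results)

report = research("Compare vector DBs", ["pricing", "latency", "ecosystem"])
print(report.splitlines()[0])
```

Each child gets a fresh context window, so depth scales with the number of sub-goals rather than with one session's context budget.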
#169 Codex
Spec-kit / KIRO workflow: constitution-based spec-driven development from idea to tasks
Trigger with 'kiro' or references to .kiro/specs/. Creates EARS-format requirements, design documents, and implementation task lists from a single idea. Keeps the full decision trail in version control.
"Interactive feature development workflow from idea to implementation. Creates requirements (EARS format), design documents, and implementation task lists. Triggered by: 'kiro' or references to .kiro/specs/ directory."
↗ Source
#170 Both
Command → Agent → Skill: the three-layer orchestration architecture for complex workflows
A slash command invokes an agent; the agent delegates to one or more skills; skills provide focused, progressive-disclosure context. This three-layer pattern scales from simple one-off tasks to complex multi-day feature development.
"Two skill patterns: agent skills (preloaded via skills: field) vs skills (invoked via Skill tool). See orchestration-workflow for implementation details of Command → Agent → Skill pattern."
↗ Source
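A hedged sketch of the three layers as Claude Code files. The file names and agent body are illustrative; the `skills:` frontmatter field for preloading is the one the quote describes:

```markdown
<!-- .claude/commands/ship-feature.md — layer 1: the slash command -->
Delegate $ARGUMENTS to the feature-builder agent and report the result.

<!-- .claude/agents/feature-builder.md — layer 2: the agent -->
---
name: feature-builder
description: Implements a feature end to end, then hands off for review.
skills: [api-conventions, test-writing]   # layer 3: preloaded skills
---
You implement features, using the preloaded skills for focused context.
```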
#171 Both
Verify your session started with the right instruction files using audit logs
Check ~/.codex/log/codex-tui.log or Claude Code session JSONL to audit which instruction files were loaded. Stale or wrong AGENTS.md/CLAUDE.md loaded at session start is the root cause of many mysterious quality issues.
"Check ~/.codex/log/codex-tui.log (or the most recent session-*.jsonl file if you enabled session logging) after a session if you need to audit which instruction files Codex loaded."
↗ Source
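The audit step can be scripted. A sketch in Python against an inline sample line (the JSONL event and field names here are illustrative; match them to whatever your actual session log emits):

```python
import json

# One line of a hypothetical session-*.jsonl; real logs carry more fields.
sample = '{"event": "instructions_loaded", "files": ["/repo/AGENTS.md", "/repo/.codex/AGENTS.md"]}'

def loaded_instruction_files(jsonl_lines):
    """Collect every instruction file the session reported loading."""
    files = []
    for line in jsonl_lines:
        event = json.loads(line)
        if event.get("event") == "instructions_loaded":
            files.extend(event.get("files", []))
    return files

print(loaded_instruction_files([sample]))
```

Run it over the most recent log after a suspicious session; an empty or stale file list is the smoking gun.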
#172 Both
Real companies using coding agents: Duolingo pilots for dev workflows, Virgin Atlantic for data analysis
Cisco is exploring Codex to accelerate engineering, Virgin Atlantic has deployed AI agents internally for data analysis and customer engagement, and Duolingo is piloting Codex for development workflows — coding agents running inside real companies, not just demos.
"Cisco — exploring Codex for engineering teams to accelerate feature development. Virgin Atlantic — deployed AI agents internally for data analysis and customer engagement. Duolingo — piloting Codex for development workflows."
↗ Source
#173 Both
Parallel agents: Fountain 50% faster, CRED 2x speed with multi-agent coordination
Validated production metrics from real companies using multi-agent Claude Code coordination. Parallelism doesn't just save time — it enables tasks (like autonomous C compilation) that would be impossible in a single context window.
"Production metrics from real companies: Fountain: 50% faster, CRED: 2x speed. 5 validated workflows (multi-layer review, parallel debugging, large-scale refactoring)."
↗ Source
#174 Codex
Use the Slack integration to delegate tasks to Codex directly from Slack channels
Tag @Codex or @Claude with bug reports or feature requests in Slack channels for team workflows. The agent picks up the task from Slack, works in its cloud environment, and reports results back to the channel.
"Delegate tasks to Claude Code directly from Slack channels. Tag @Claude with bug reports or feature requests for team workflows."
↗ Source
#175 Both
AI-native team pattern: every PR has an AI reviewer, every sprint has AI-generated tickets
The AI-native development pattern: AI reviews every PR before humans see it, AI triages every incoming issue, AI generates Jira tickets from product specs. Humans focus on decisions and architecture — not mechanical work.
"Build AI Teams: Every major coding platform is converging on the same pattern — AI teammates that handle the mechanical work so engineers focus on the hard decisions."
↗ Source
#176 Both
Use /review command in Codex CLI for targeted pre-commit code review
The /review command in Codex CLI opens review presets: review against base branch, review uncommitted changes, or review a specific commit. Each run appears as its own transcript turn — compare feedback across iterations.
"Use /review in the CLI to open Codex's review presets. The CLI launches a dedicated reviewer that reads the diff you select and reports prioritized, actionable findings without touching your working tree."
↗ Source
#177 Claude Code
Feature-specific subagents with skills outperform generic QA or backend engineer agents
Don't create generic subagents. Create feature-specific subagents with relevant skills already loaded (progressive disclosure) and focused system prompts. Specificity dramatically improves quality for each task type.
"have feature specific sub-agents (extra context) with skills (progressive disclosure) instead of general qa, backend engineer."
↗ Source
#178 Both
The 'Beads' pattern: chain agents sequentially, each reads only the previous agent's output
Each agent in the chain receives only the previous agent's structured summary — not the full accumulated context. This prevents context rot in long chains and keeps each agent focused on its specific contribution.
"Decision framework: Teams vs Multi-Instance vs Dual-Instance vs Beads."
↗ Source
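The chain is just a fold over structured summaries: each stage sees only the previous summary, never the whole transcript. A sketch with stand-in stage functions (real stages would be separate agent sessions):

```python
def planner(summary: str) -> str:
    # A real planner agent would read the task summary and emit a plan.
    return "plan: add retry logic to the HTTP client"

def implementer(summary: str) -> str:
    return f"implemented [{summary}]"

def reviewer(summary: str) -> str:
    return f"reviewed [{summary}]"

def run_chain(stages, initial: str) -> str:
    summary = initial
    for stage in stages:
        # Each agent receives only the previous structured summary,
        # not the accumulated context of every earlier stage.
        summary = stage(summary)
    return summary

final = run_chain([planner, implementer, reviewer], "task: flaky network calls")
print(final)
```

Because each hop carries only a compact summary, a ten-stage chain costs roughly the same per stage as a two-stage one.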
#179 Both
Use @ mentions in the prompt to precisely scope agent attention to specific files
Both tools support @file syntax to include specific files as context inline in the prompt. This is more precise than letting the agent discover files — it ensures attention is focused on the right files from the start.
"Type @ in the composer to open a fuzzy file search over the workspace root; press Tab or Enter to drop the highlighted path into your message."
↗ Source
#180 Codex
Press Enter mid-run to inject instructions into the current turn; Tab to queue for next
In the Codex TUI, pressing Enter while Codex is running injects new instructions into the current turn. Tab queues a follow-up prompt for the next turn. This enables real-time course correction without interrupting the agent.
"Press Enter while Codex is running to inject new instructions into the current turn, or press Tab to queue a follow-up prompt for the next turn."
↗ Source
#181 Claude Code
Use the VS Code plan review panel to comment on plans before implementation begins
In the VS Code extension, the plan preview panel updates live as Claude refines its plan, and commenting is enabled once the plan is ready for review. Commenting before implementation is dramatically cheaper than fixing after.
"VS Code plan preview: auto-updates as Claude iterates, enables commenting only when the plan is ready for review, and keeps the preview open when rejecting so Claude can revise."
↗ Source
#182 Both
Structured agent handoffs: each role writes to its own scoped folder, not shared files
In multi-agent workflows, each role (designer, backend, frontend, tester) writes its deliverables to its own folder (/design/, /backend/, etc.). No shared file conflicts, clear ownership, easy to audit which agent produced what.
"Deliverables (write to /tests): TEST_PLAN.md – bullet list… Each role writes scoped artifacts in its own folder before handing control back to the project manager."
↗ Source
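The folder convention is simple enough to enforce in code. A sketch in Python (the role and file names are illustrative, loosely echoing the quoted /tests deliverable):

```python
from pathlib import Path
import tempfile

# Each role owns exactly one folder and one deliverable; hypothetical names.
DELIVERABLES = {
    "design": "DESIGN.md",
    "backend": "API_NOTES.md",
    "tests": "TEST_PLAN.md",
}

def write_deliverable(root: Path, role: str, content: str) -> Path:
    # No two roles share a file, so there are no write conflicts to merge.
    folder = root / role
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / DELIVERABLES[role]
    path.write_text(content)
    return path

root = Path(tempfile.mkdtemp())
p = write_deliverable(root, "tests", "- cover retry logic\n")
print(p.relative_to(root))
```

Auditing "which agent produced what" then reduces to listing one folder per role.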
#183 Both
The 'dual-instance' pattern: run two agents simultaneously, one builds, one continuously tests
Run two Claude/Codex sessions in parallel: one builds the feature incrementally, one continuously runs the test suite and reports failures back. The builder fixes immediately rather than accumulating test debt.
"Dual-Instance… use test time compute — separate context windows make results better; one agent can cause bugs and another (same model) can find them."
↗ Source
#184 Claude Code
Use the built-in /claude-api skill when building Anthropic SDK integrations
Claude Code ships a built-in /claude-api skill that provides specialized context for building with the Claude API and Anthropic SDK. Invoke it when building API integrations to get accurate, up-to-date patterns.
"Added the /claude-api skill for building applications with the Claude API and Anthropic SDK."
↗ Source
#185 Codex
Use --add-dir to coordinate changes across multiple repos in a single session
Launch with codex --cd apps/frontend --add-dir ../backend --add-dir ../shared to work across multiple project directories simultaneously. Codex can make coordinated changes across all of them.
"Expose more writable roots with --add-dir (for example, codex --cd apps/frontend --add-dir ../backend --add-dir ../shared) when you need to coordinate changes across more than one project."
↗ Source
#186 Both
Permanent worktrees for long-lived agent environments — not auto-deleted, fully reusable
Create a permanent worktree from the Codex app sidebar three-dot menu. Unlike temp worktrees, permanent worktrees aren't automatically deleted and support multiple threads. Ideal for long-running agent projects.
"If you want a long-lived environment, create a permanent worktree from the three-dot menu on a project in the sidebar. Permanent worktrees are not automatically deleted, and you can start multiple threads from the same worktree."
↗ Source
#187 Both
Handoff between local and worktree mid-session using the Handoff feature
The Codex app lets you move a thread between Local and Worktree mid-session using the Handoff feature. Start exploring locally, then move to a worktree for isolated implementation when scope is clear.
"You can also start threads on a worktree manually, and use Handoff to move a thread between Local and Worktree."
↗ Source
#188 Claude Code
Use /stats to see usage patterns, token counts, and session streaks
The /stats command opens a usage dashboard: favorite models, token consumption patterns, session counts, and streaks. Press Ctrl+S to copy for sharing. Use it to understand your actual cost distribution.
"Stats dashboard: Run /stats to see your usage patterns, favorite models, token counts, and streaks. Press Ctrl+S to copy for sharing."
↗ Source
#189 Both
Agents inherit your shell — your tool choices directly shape model behavior
Claude Code and Codex inherit your shell environment. If you have ripgrep, fd, or custom scripts in your PATH, the agent can use them. The better your shell tooling, the better your agent's ability to explore and act.
"Claude Code inherits your shell, which means your tooling choices directly shape model behavior."
↗ Source
#190 Codex
Use /permissions inside an interactive session to switch sandbox mode on-the-fly
Don't restart Codex to change permission levels. Use /permissions inside an active session to switch between auto, read-only, and full access modes as your comfort level changes mid-task.
"Use /permissions inside an interactive session to switch modes as your comfort level changes."
↗ Source
#191 Both
The 'Fountain' production metric: 50% faster development with multi-agent coordination
Fountain achieved 50% faster development using multi-agent Claude Code coordination in production. The gains come from context multiplication and parallel execution, not model capability improvements.
"Production metrics from real companies (autonomous C compiler, 500K hours saved) 5 validated workflows… Fountain: 50% faster."
↗ Source
#192 Both
Verify MCP server permissions regularly — servers can read/write your entire codebase
Review what each connected MCP server can actually access. Servers with broad filesystem permissions can read proprietary code, API keys in config files, or database credentials. Audit MCP permissions as rigorously as npm packages.
"MCP servers can read/write your codebase. Strategy: Systematic audit (5-min checklist). Community-vetted MCP Safe List. Vetting workflow documented."
↗ Source
#193 Claude Code
Use /mcp in VS Code to manage MCP servers without switching to terminal
In the VS Code extension, /mcp opens a native MCP server management dialog — enable/disable servers, reconnect, and manage OAuth authentication directly in the chat panel without editing config files.
"Added native MCP server management dialog — use /mcp in the chat panel to enable/disable servers, reconnect, and manage OAuth authentication without switching to the terminal."
↗ Source
#194 Both
The skill description is a routing signal for the model — write it for automatic invocation
Write skill descriptions as precise trigger conditions: 'Use when the user asks to analyze SQL query performance in PostgreSQL.' The model reads descriptions to decide when to auto-invoke. Vague descriptions cause wrong-skill invocations.
"Because implicit matching depends on description, write descriptions with clear scope and boundaries."
↗ Source
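Skill metadata lives in the skill file's frontmatter. A sketch of a description written as a precise trigger condition with explicit boundaries (the skill name and wording are illustrative):

```yaml
---
name: pg-query-tuning
description: >
  Use when the user asks to analyze or improve SQL query performance in
  PostgreSQL (EXPLAIN plans, indexes, slow queries). Not for schema design
  or for databases other than PostgreSQL.
---
```

The negative clause matters as much as the positive one: it is what keeps the model from routing adjacent-but-wrong requests to this skill.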
#195 Both
Track context usage live with status bars or monitoring hooks — don't guess when to compact
Use PreCompact hooks that display context percentage in a status line. Or use /context regularly. Guessing when to compact wastes quality (compact too late) or tokens (compact too early).
"[!] 25.0% free (50.0K/200K) -> .claude/backups/3-backup-26th-Jan-2026-5-45pm.md … the statusline shows the backup path."
↗ Source
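The arithmetic behind a status line like the one quoted is trivial to script. A minimal helper (the warning threshold and message format are illustrative):

```python
def context_status(used_tokens: int, window_tokens: int,
                   warn_below_free: float = 30.0) -> str:
    # Percentage of the context window still free.
    free_pct = 100.0 * (window_tokens - used_tokens) / window_tokens
    marker = "[!]" if free_pct < warn_below_free else "[ok]"
    return f"{marker} {free_pct:.1f}% free ({used_tokens / 1000:.1f}K/{window_tokens // 1000}K)"

print(context_status(150_000, 200_000))
```

Wire something like this into a status line or a PreCompact hook so the decision to compact is driven by a number, not a hunch.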
#196 Both
Use reasoning with 'high' effort for architecture decisions — medium or low for routine code gen
Different tasks warrant different reasoning effort levels. Architecture decisions and complex debugging benefit from high reasoning. Formatting, simple refactors, and test generation work fine with medium or low effort at lower cost.
"Choose a reasoning level based on how hard the task is and test what works best for your workflow. Different users and tasks work best with different settings."
↗ Source
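In Codex this maps onto the `model_reasoning_effort` setting in `~/.codex/config.toml`; a sketch using per-task profiles (the profile names are illustrative):

```toml
# ~/.codex/config.toml — hypothetical profiles for different task types
[profiles.architecture]
model_reasoning_effort = "high"   # design decisions, hard debugging

[profiles.routine]
model_reasoning_effort = "low"    # formatting, simple refactors, test gen
```

Then pick the effort level at launch, e.g. `codex --profile architecture`, instead of paying high-effort prices for routine work.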
#197 Claude Code
Use the VS Code spark icon to list all Claude Code sessions and open them as full editors
A spark icon in the VS Code activity bar lists all Claude Code sessions, and sessions open as full editors. Plans get a full markdown document view, with support for inline comments to give feedback on the plan.
"Added spark icon in VS Code activity bar that lists all Claude Code sessions, with sessions opening as full editors. Added full markdown document view for plans."
↗ Source
#198 Both
Treat coding agents as junior engineers with tools and memory — not magic code generators
The fundamental mental model shift: agents work best when given the right context, clear goals, and iteration cycles. Not when given a magic prompt and expected to produce perfect output in one shot.
"Rather than treating Claude as a chatbot, the core insight is this: Claude Code works best when treated like a junior engineer with tools, memory, and iteration — not a magic code generator."
↗ Source
#199 Both
The developers who will thrive: those who orchestrate AI, not just code alongside it
90% of traditional programming skills are becoming commoditized. The remaining 10% — architecture decisions, system design, knowing when to delegate and when to intervene — becomes worth 1000x more. Agents change what skills matter.
"The developers and teams who understand this shift — who learn to orchestrate AI rather than just code alongside it — will thrive in this new landscape."
↗ Source
#200 Both
Codex shipped a userpromptsubmit hook — prompts can be blocked or augmented before history
A recent Codex changelog addition: the userpromptsubmit hook fires before prompts enter history, enabling blocking or augmentation before execution. Claude Code has had this as UserPromptSubmit — now Codex has the equivalent.
"Added a userpromptsubmit hook so prompts can be blocked or augmented before execution and before they enter history."
↗ Source
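For comparison, the Claude Code side is configured in `.claude/settings.json`. A sketch (the guard script path is hypothetical; in Claude Code's hook semantics, a hook that exits with code 2 blocks the prompt, while stdout from a successful run is added to the context):

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python .claude/hooks/guard_prompt.py"
          }
        ]
      }
    ]
  }
}
```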