Workflow Optimization

Parallel Agents & Token Reduction

Running Agents in Parallel

Sequential execution is safe but slow. For independent tasks, parallel execution dramatically reduces total latency.

Pattern: Map-Reduce

  1. Map: Spawn N agents, each handling a subset of the work (e.g., processing 10 files each).
  2. Wait: Aggregator waits for all promises to resolve.
  3. Reduce: Combine the results into a final report.

// Map + Wait: fan out one agent per file; Promise.all resolves when every agent finishes
const results = await Promise.all(
  files.map(file => runAgent(file))
);
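
The fragment above covers the map and wait steps; a fuller, self-contained sketch that adds the reduce step might look like this (runAgent is stubbed out here as a stand-in for a real agent call):

// Stand-in for a real agent invocation; replace with your actual runner.
async function runAgent(file) {
  return `summary of ${file}`;
}

async function mapReduce(files) {
  // Map + Wait: one agent per file, all running concurrently
  const results = await Promise.all(files.map(file => runAgent(file)));

  // Reduce: combine the per-file results into a single report
  return results.join("\n");
}

mapReduce(["a.js", "b.js", "c.js"]).then(report => console.log(report));

Promise.allSettled is a drop-in alternative when one failing agent should not abort the whole batch.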

Reducing Token Consumption

Large contexts = higher cost + slower inference.

1. Context Pruning

Don't dump the entire codebase into the prompt. Instead:

  • Use grep or cscope to find only the relevant lines (see the sketch below).
  • Use SKILL.md (summaries) instead of full implementation files during the planning phase.
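
As a rough sketch (assuming Node with ES modules and grep on the PATH; the symbol and directory names are placeholders), a pruning helper might shell out to grep and feed only the matching lines into the prompt:

import { execSync } from "node:child_process";

// Collect only the lines that mention `symbol` (plus 2 lines of context)
// instead of pasting whole files into the prompt.
function prunedContext(symbol, dir = "src") {
  try {
    return execSync(`grep -rn -C 2 "${symbol}" ${dir}`, { encoding: "utf8" });
  } catch {
    return ""; // grep exits non-zero when there are no matches
  }
}

const context = prunedContext("runAgent");
const prompt = `Relevant code:\n${context}\nTask: describe how runAgent is called.`;
console.log(prompt);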

2. Incremental Summarization

If a conversation gets too long:

  1. Ask the model to summarize the progress.
  2. Flush the history.
  3. Inject the summary as the new "system prompt" context (sketched below).
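
A sketch of that compaction step, with callModel as a hypothetical stand-in for whatever chat API you actually use:

// Hypothetical wrapper around your chat API; here it just returns a canned reply.
async function callModel(messages) {
  return "- parser refactored\n- tests passing";
}

async function compactHistory(history) {
  // 1. Ask the model to summarize the progress so far
  const summary = await callModel([
    ...history,
    { role: "user", content: "Summarize our progress so far in a few short bullets." },
  ]);

  // 2. Flush the old history and 3. inject the summary as the new system context
  return [{ role: "system", content: `Progress so far:\n${summary}` }];
}

compactHistory([{ role: "user", content: "Refactor the parser." }])
  .then(history => console.log(history));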

3. Specific Output Formats

Ask for JSON or concise bullet points. Verbose prose wastes output tokens.
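
For instance, a prompt might pin the reply to a small JSON shape (the field names here are purely illustrative):

// Constrain the reply to a fixed, minimal JSON shape instead of free-form prose.
const prompt = `Review the diff and reply with ONLY this JSON, no extra text:
{ "verdict": "approve" | "request_changes", "issues": ["<short bullet>", ...] }`;

// The reply can then be parsed directly:
// const { verdict, issues } = JSON.parse(reply);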

Caching

Cache tool outputs. If an agent asks for git status twice in 5 seconds, serve the cached result to save time and API calls.
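
A minimal sketch of such a cache, assuming Node's built-in child_process and a 5-second TTL to match the example above:

import { execSync } from "node:child_process";

const cache = new Map(); // command -> { output, expires }
const TTL_MS = 5_000;

// Run a shell command, serving the cached output if it is under 5 seconds old.
function cachedExec(command) {
  const hit = cache.get(command);
  if (hit && hit.expires > Date.now()) return hit.output;

  const output = execSync(command, { encoding: "utf8" });
  cache.set(command, { output, expires: Date.now() + TTL_MS });
  return output;
}

console.log(cachedExec("git status")); // runs the command
console.log(cachedExec("git status")); // served from cache, no second process

Keying on the exact command string keeps the sketch simple; anything that mutates state (e.g., git commit) should bypass the cache.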