# Workflow Optimization

## Running Agents in Parallel
Sequential execution is safe but slow. For independent tasks, parallel execution dramatically reduces total latency.
### Pattern: Map-Reduce
- Map: Spawn $N$ agents, each handling a subset of the work (e.g., processing 10 files each).
- Wait: Aggregator waits for all promises to resolve.
- Reduce: Combine the results into a final report.
```ts
const results = await Promise.all(
  files.map(file => runAgent(file))
);
```
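A fuller sketch of the pattern, assuming a `runAgent` helper that wraps a single agent invocation (stubbed out here) and adding the reduce step that combines the per-file results:

```typescript
// Hypothetical stand-in for a real agent call; in practice this would
// invoke a model with the file's contents and return its output.
async function runAgent(file: string): Promise<string> {
  return `summary of ${file}`;
}

async function mapReduce(files: string[]): Promise<string> {
  // Map: one agent per file, all started concurrently.
  const results = await Promise.all(files.map(file => runAgent(file)));
  // Reduce: combine the per-file results into a final report.
  return results.join("\n");
}
```

Because `Promise.all` starts every call before awaiting any of them, total latency is roughly that of the slowest agent, not the sum of all of them.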
## Reducing Token Consumption
Large contexts mean higher cost and slower inference.
### 1. Context Pruning
Don't dump the entire codebase into the prompt.
- Use `grep` or `cscope` to find relevant lines.
- Use `SKILL.md` (summaries) instead of full implementation files during the planning phase.
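To make the pruning idea concrete, here is a minimal in-process sketch of grep-style filtering: keep only the lines that mention the symbols the task actually touches, plus a small window of surrounding context. The `pruneContext` name and `window` parameter are illustrative, not from any particular tool.

```typescript
// Illustrative grep-style pruning: keep lines matching any keyword,
// plus `window` lines of context on each side.
function pruneContext(source: string, keywords: string[], window = 1): string {
  const lines = source.split("\n");
  const keep = new Set<number>();
  lines.forEach((line, i) => {
    if (keywords.some(k => line.includes(k))) {
      const lo = Math.max(0, i - window);
      const hi = Math.min(lines.length - 1, i + window);
      for (let j = lo; j <= hi; j++) keep.add(j);
    }
  });
  return lines.filter((_, i) => keep.has(i)).join("\n");
}
```

Feeding the model only the pruned text instead of whole files can cut prompt size by an order of magnitude on large codebases.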
### 2. Incremental Summarization
If a conversation gets too long:
- Ask the model to summarize the progress.
- Flush the history.
- Inject the summary as the new "system prompt" context.
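The three steps above can be sketched as a single compaction function. The `summarize` helper is a hypothetical stand-in for a real model call, and the `MAX_MESSAGES` threshold is an arbitrary illustrative value:

```typescript
interface Message { role: string; content: string }

// Hypothetical summarizer: in practice this would ask the model
// to condense the conversation so far.
async function summarize(history: Message[]): Promise<string> {
  return `Progress so far: ${history.length} messages exchanged.`;
}

const MAX_MESSAGES = 40; // illustrative threshold

async function compact(history: Message[]): Promise<Message[]> {
  if (history.length <= MAX_MESSAGES) return history;
  const summary = await summarize(history); // 1. summarize progress
  // 2. flush the history, 3. re-inject the summary as system context
  return [{ role: "system", content: summary }];
}
```

Run `compact` between turns; short conversations pass through untouched, long ones collapse to one system message.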
### 3. Specific Output Formats
Ask for JSON or concise bullet points. Verbose prose wastes output tokens.
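One practical wrinkle when asking for JSON: models sometimes wrap the reply in a markdown code fence anyway. A small defensive parser (the `parseJsonReply` name is illustrative) can strip the fence before parsing:

```typescript
// Strip an optional leading ```json fence and trailing ``` fence,
// then parse the remainder as JSON.
function parseJsonReply(reply: string): unknown {
  const stripped = reply
    .replace(/^```(?:json)?\s*/m, "")
    .replace(/```\s*$/m, "");
  return JSON.parse(stripped);
}
```

Replies that arrive as plain JSON pass through the two replacements unchanged.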
## Caching
Cache tool outputs. If an agent asks for `git status` twice in 5 seconds, serve the cached result to save time and API calls.
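A minimal sketch of such a cache, keyed by the tool invocation and expiring after a time-to-live so stale state (like `git status`) is eventually re-fetched. The `ToolCache` name and TTL value are illustrative:

```typescript
// TTL cache for tool outputs: a hit within ttlMs returns the stored
// value; a miss (or expired entry) runs the tool and stores the result.
class ToolCache {
  private entries = new Map<string, { value: string; expires: number }>();
  constructor(private ttlMs: number) {}

  async get(key: string, run: () => Promise<string>): Promise<string> {
    const hit = this.entries.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // cache hit
    const value = await run(); // cache miss: actually run the tool
    this.entries.set(key, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}
```

Usage might look like `cache.get("git status", () => runTool("git", ["status"]))`, where `runTool` is whatever executes shell commands in your agent runtime.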