Reduces token consumption by compressing web content through local LLMs before feeding it to Claude Code.
cargo install ai-summary
How It Works
Pipeline:
Web Search → Fetch Pages → Readability Extract → LLM Summary → Compressed output (60–98% smaller)
Instead of sending raw 50K+ page content to Claude, ai-summary returns a focused 1–4K summary.
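As a rough sketch of what the pipeline replaces, the two commands below hit the same page: the first hands the raw content straight to the caller, the second runs the documented fetch-extract-summarize path (the URL and prompt are placeholders):

```sh
# Raw path: the full page body (often tens of thousands of tokens) goes to Claude
curl -s https://example.com/long-article

# ai-summary path: fetch → readability extract → LLM summary, returning a compact digest
ai-summary fetch https://example.com/long-article -p "key points only"
```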
Features
- Search + Summarize — Gemini (Google Search grounding), DuckDuckGo, or Brave
- Fetch + Summarize — Fetch any URL, extract article content, summarize with LLM
- Stdin Summarize — Pipe any text through for compression
- Fast Compress — No-LLM text extraction for instant compression
- JS-heavy Pages — agent-browser and Cloudflare Browser Rendering
- Pipe-friendly — `cat urls.txt | ai-summary fetch`, `--json` output, standard exit codes (see the sketch after this list)
- Claude Code Hooks — PostToolUse hooks auto-compress WebFetch, WebSearch, and test output
- Rich Statistics — Time-period breakdown, ROI tracking
- Multiple LLM Backends — opencode (free), MLX (local), OpenAI, Groq, DeepSeek
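A minimal sketch of the pipe-friendly usage; the subcommands and flags are taken from this README, but the exact way they combine is an assumption (`urls.txt` is a placeholder file with one URL per line):

```sh
# Batch-fetch and summarize a list of URLs, emitting machine-readable JSON
cat urls.txt | ai-summary fetch --json -p "one-paragraph summary of each page"

# Standard exit codes make it safe to chain in scripts
cat urls.txt | ai-summary fetch --json > summaries.json || echo "fetch failed" >&2
```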
Quick Start
ai-summary "what is the latest Rust version"
ai-summary fetch https://example.com/article -p "key points"
echo "large text" | ai-summary compress -m 4000
ai-summary stats
Subcommands
- `ai-summary <query>` — Search + summarize
- `ai-summary fetch <urls> -p ...` — Fetch + summarize
- `ai-summary sum <prompt>` — Summarize stdin via LLM
- `ai-summary compress -m <chars>` — Fast compression (no LLM)
- `ai-summary crawl <url>` — Crawl via CF Browser Rendering
- `ai-summary stats` — Token savings stats
- `ai-summary config` — Show/create config
Flags: `--deep`, `--raw`, `--json`, `--browser`, `--cf`, `--api-url`, `--api-key`, `--model`
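A hedged example of combining these flags; the flag names come from the list above, but flag placement and which subcommands accept which flags are assumptions rather than documented behavior:

```sh
# Deep search with JSON output for scripting
ai-summary --deep --json "latest Rust release notes"

# Render a JS-heavy page in a browser backend before extracting
ai-summary fetch https://example.com/spa-page --browser -p "main findings"

# Point at a custom OpenAI-compatible endpoint (URL, key, and model are placeholders)
ai-summary fetch https://example.com/article \
  --api-url http://localhost:8080/v1 --api-key sk-local --model my-model -p "summary"
```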
Claude Code Integration
Hooks
- `postwebfetch.sh` — Compress WebFetch responses (skip if <10% savings)
- `postwebsearch.sh` — Compress WebSearch results
- `postbash.sh` — Summarize test output with structured totals
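The shipped scripts implement Claude Code's hook protocol; the sketch below only illustrates the underlying idea, and the character limit is a placeholder rather than the project's actual logic:

```bash
#!/usr/bin/env bash
# Sketch of a PostToolUse-style hook: read the tool output from stdin,
# squeeze it with ai-summary, and print the smaller version back for Claude.
input=$(cat)
printf '%s' "$input" | ai-summary compress -m 4000
```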
Benchmarks (v1.2.1)
Real-world evaluation run on 2026-03-14. All tests use default config with free LLM backends.
| Scenario | Input | Output | Compression | Time |
|---|---|---|---|---|
| Search (Gemini CLI) | ~12K tokens (raw pages) | ~700 tokens (summary) | 94% | 28s |
| Fetch (docs.rs/tokio) | ~4.1K tokens | ~445 tokens | 89% | 10s |
| Fetch (Wikipedia) | ~1K tokens | ~445 tokens | 57% | 88s |
| Compress 5K chars | 5,085 chars | 2,001 chars | 61% | 0.02s |
| Compress 20K chars | 20,585 chars | 2,001 chars | 90% | 0.02s |
| Deep Search (5 pages) | ~15K tokens | ~670 tokens | 96% | 31s |
Cumulative Stats (41 queries)
| Metric | Value |
|---|---|
| Total tokens saved | 67,929 |
| Overall compression | 77.6% |
| Estimated Claude cost saved | $0.20 (at $3/M input tokens) |
| LLM cost | $0.02 (mostly free backends) |
| ROI | 10x return |
By Mode
| Mode | Queries | Tokens Saved | Avg Compression |
|---|---|---|---|
| gemini-cli (search) | 14 | 39,385 | 88% |
| fetch (single URL) | 13 | 10,576 | 71% |
| compress (no LLM) | 5 | 10,039 | 90% |
| gemini API | 1 | 3,265 | 93% |
| stdin pipe | 5 | 1,672 | 71% |
| hook-bash (auto) | 2 | 867 | 30% |
Compress mode uses no LLM — pure text extraction at near-instant speed. Search/fetch use free LLMs (opencode, Gemini CLI).
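In practice the trade-off looks like this, using only subcommands documented above (`build.log` is a placeholder input):

```sh
# No-LLM path: near-instant text extraction, no API calls
ai-summary compress -m 2000 < build.log

# LLM path: slower, but produces a real summary through the configured backend
ai-summary sum "summarize the test failures" < build.log
```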
Installation
From crates.io
cargo install ai-summary
From source
git clone https://github.com/sunoj/ai-summary
cd ai-summary
cargo install --path .
Releases: GitHub Releases