ai-summary

Web search & summarization CLI for AI coding agents

Reduces token consumption by compressing web content through local LLMs before feeding it to Claude Code.

cargo install ai-summary

How It Works

Pipeline:

Web Search → Fetch Pages → Readability Extract → LLM Summary → Compressed output (60–98% smaller)

Instead of sending a raw 50K+ token page to Claude, ai-summary returns a focused 1–4K-token summary.
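As a quick sanity check on those numbers, the reduction for a 50K-token page summarized down to between 1K and 4K tokens works out to:

```shell
# Percent reduction when a 50K-token page becomes a 4K or a 1K summary
awk 'BEGIN { printf "%.0f%% to %.0f%%\n", (1 - 4000/50000) * 100, (1 - 1000/50000) * 100 }'
# → 92% to 98%
```

That sits inside the 60–98% range quoted above; smaller pages (see the Wikipedia row in the benchmarks) compress less, large raw pages more.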

Features

Quick Start

ai-summary "what is the latest Rust version"
ai-summary fetch https://example.com/article -p "key points"
echo "large text" | ai-summary compress -m 4000
ai-summary stats

Subcommands

search (default)   ai-summary "<query>" searches the web and summarizes the results
fetch              summarize a single URL, optionally guided by -p "<prompt>"
compress           compress stdin text without an LLM
stats              show cumulative token savings

Flags: --deep, --raw, --json, --browser, --cf, --api-url, --api-key, --model

Claude Code Integration

Hooks
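A minimal sketch of wiring ai-summary into Claude Code's hook system via ~/.claude/settings.json. The PostToolUse event, the WebFetch matcher, and whether a hook can rewrite the tool's output are assumptions to check against your Claude Code version; the command itself is the compress mode shown in Quick Start:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "WebFetch",
        "hooks": [
          { "type": "command", "command": "ai-summary compress -m 4000" }
        ]
      }
    ]
  }
}
```

The "hook-bash (auto)" row in the By Mode stats comes from this kind of automatic invocation.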

Benchmarks (v1.2.1)

Real-world evaluation run on 2026-03-14. All tests use default config with free LLM backends.

Scenario               Input                     Output                  Compression  Time
Search (Gemini CLI)    ~12K tokens (raw pages)   ~700 tokens (summary)   94%          28s
Fetch (docs.rs/tokio)  ~4.1K tokens              ~445 tokens             89%          10s
Fetch (Wikipedia)      ~1K tokens                ~445 tokens             57%          88s
Compress 5K chars      5,085 chars               2,001 chars             61%          0.02s
Compress 20K chars     20,585 chars              2,001 chars             90%          0.02s
Deep Search (5 pages)  ~15K tokens               ~670 tokens             96%          31s

Cumulative Stats (41 queries)

Metric                       Value
Total tokens saved           67,929
Overall compression          77.6%
Estimated Claude cost saved  $0.20 (at $3/M input tokens)
LLM cost                     $0.02 (mostly free backends)
ROI                          10x return

By Mode

Mode                 Queries  Tokens Saved  Avg Compression
gemini-cli (search)  14       39,385        88%
fetch (single URL)   13       10,576        71%
compress (no LLM)    5        10,039        90%
gemini API           1        3,265         93%
stdin pipe           5        1,672         71%
hook-bash (auto)     2        867           30%

Compress mode uses no LLM: it is pure text extraction at near-instant speed. Search and fetch use free LLM backends (opencode, Gemini CLI).

Installation

From crates.io

cargo install ai-summary

From source

git clone https://github.com/sunoj/ai-summary
cd ai-summary
cargo install --path .

Releases: GitHub Releases