
I Built a CLAUDE.md Linter in One Session. Here's What I Found in 773 Sessions of Context Files.

Every AI coding tool reads .md files for context. I built a Rust linter to grade them. The finding: most of what we write in CLAUDE.md never changes Claude's behavior. Here's the data.

Tags: Claude Code, AI Tools, Rust, Developer Tools, CLAUDE.md

The Morning: An Idea That Wouldn't Leave

My day started with a Twitter engagement routine — replying to threads about AI agents, Claude Code, and vibe coding. Then I had an idea.

Every AI coding tool — Claude Code, Cursor, Copilot, Codex, Windsurf, Devin — reads plain Markdown files for context. CLAUDE.md, AGENTS.md, .cursorrules. 60,000+ repos already have these files. And they're all just... unstructured prose. No validation. No types. No feedback.

What if I built a linter for these files?

Building the Linter (~6 Hours)

I had a Rust compiler pipeline already built for another project — lexer, parser, checker, emitter. In about 6 hours, I added a document module: 13 Rust files, ~3,700 lines of code. A complete pipeline:

  • Parser — 7 content types (rules, paragraphs, code blocks, tables, references, notes, lists)
  • Checker — 12 validation rules (D001-D012)
  • 4 output formats — JSON (for AI), HTML (for humans), Markdown (for compatibility), and TOON (Token-Oriented Object Notation)
  • Converter — latdoc convert CLAUDE.md turns any Markdown file into the structured format
  • Rule scorer — 7 heuristic risk factors per rule
  • Visualizer — HTML dashboard showing per-rule health scores

Everything runs in under 1 millisecond. Zero external dependencies.
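For a sense of scale, a single checker rule in a pipeline like this can be very small. Here is a hedged sketch — not the actual latdoc code; the `Finding` struct and the rule ID are my own invention — of a check for fenced code blocks that open without a language hint, one of the problems the linter later flagged in my own files:

```rust
// Sketch of one checker rule, in the spirit of the D001-D012 checks.
// The "D-fence" ID is a placeholder, not one of the real rule IDs.
struct Finding {
    rule: &'static str,
    line: usize,
    message: String,
}

fn check_fence_language(doc: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    let mut inside_fence = false;
    for (i, line) in doc.lines().enumerate() {
        let trimmed = line.trim_start();
        if let Some(rest) = trimmed.strip_prefix("```") {
            if inside_fence {
                // Closing fence: end of the code block.
                inside_fence = false;
            } else {
                // Opening fence: flag it if no language hint follows.
                inside_fence = true;
                if rest.trim().is_empty() {
                    findings.push(Finding {
                        rule: "D-fence",
                        line: i + 1,
                        message: "code block has no language hint".to_string(),
                    });
                }
            }
        }
    }
    findings
}

fn main() {
    let doc = "# Rules\n```\nlet x = 1;\n```\n```rust\nlet y = 2;\n```\n";
    for f in check_fence_language(doc) {
        println!("{} line {}: {}", f.rule, f.line, f.message);
    }
}
```

Each rule is just a pass over the lines with a little state, which is how the whole pipeline stays under a millisecond.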

The Council Said: "Validate First"

Before shipping anything, I ran an LLM Council — 5 AI advisors with different perspectives, plus peer review and a chairman synthesis.

The council was unanimous: "Your core hypothesis is unvalidated. No proof that structured context helps AI agents. Run a benchmark before any public launch."

So I did.

The Benchmark That Changed Everything

I collected 10 real CLAUDE.md files from GitHub — SwiftFormat, FL Chart, Grafana, Instructure UI, Niivue, Mapbox MCP Server, DrawnUI, and more. I tested whether AI follows rules better from structured JSON vs plain Markdown.

The result: Markdown 100%, JSON 100%, TOON 100%. All identical compliance.

The format doesn't matter. Claude follows rules from a messy paragraph just as well as from perfectly structured JSON. The council was right — the format was a solution looking for a problem.

What TOON Does Save

TOON (Token-Oriented Object Notation) — a format designed for LLM consumption — does save tokens:

  • 5-11% fewer tokens than Markdown
  • 13-25% fewer tokens than JSON
  • Same compliance. Same rules. Just cheaper per request.

Not a game-changer for a 500-token file. But at scale, it adds up.
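As a rough illustration of where the savings come from, here is the same pair of rules in JSON and in TOON's tabular style. The TOON syntax is approximated from its published examples, and the field names are mine:

```text
JSON:
{"rules": [
  {"id": "R1", "severity": "always", "text": "Run tests before committing"},
  {"id": "R2", "severity": "never", "text": "Commit directly to main"}
]}

TOON:
rules[2]{id,severity,text}:
  R1,always,Run tests before committing
  R2,never,Commit directly to main
```

The tabular header declares the keys once, so each additional row repeats only values — that is where the per-request savings accumulate.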

The Real Discovery: Nobody Has Structured Rules

The linter extracts rules from any CLAUDE.md file — finding ### Always, ### Never, ### Prefer sections and scoring each rule.
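A minimal version of that extraction pass might look like this — a sketch under my own assumptions, not latdoc's actual implementation (real CLAUDE.md files also use `*` and `•` bullets, which this toy version ignores):

```rust
// Sketch: collect bullet items under "### Always" / "### Never" /
// "### Prefer" headings as (severity, rule) pairs.
fn extract_rules(doc: &str) -> Vec<(String, String)> {
    let mut rules = Vec::new();
    let mut current: Option<String> = None;
    for line in doc.lines() {
        let t = line.trim();
        if let Some(h) = t.strip_prefix("### ") {
            // Only the three rule headings start a rule section.
            current = match h {
                "Always" | "Never" | "Prefer" => Some(h.to_string()),
                _ => None,
            };
        } else if t.starts_with('#') {
            // Any other heading ends the current rule section.
            current = None;
        } else if let (Some(sev), Some(item)) = (&current, t.strip_prefix("- ")) {
            rules.push((sev.clone(), item.to_string()));
        }
    }
    rules
}

fn main() {
    let doc = "### Always\n- Run cargo fmt before commit\n### Notes\n- not a rule\n### Never\n- Push directly to main\n";
    for (sev, rule) in extract_rules(doc) {
        println!("[{sev}] {rule}");
    }
}
```

Anything that doesn't sit under one of those three headings never becomes a scored rule — which is exactly why the next finding was possible.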

I ran it against 30 real files — 10 from GitHub, 20 from my own Claude Code setup (project files, skills, agents, memory, rules, plugins).

100% of files — all 30 — have zero formally structured rules.

Every rule in every file is:

  • Buried in prose paragraphs
  • Hidden in bold text within long sections
  • Listed as bullets without severity
  • Scattered across tables and comments

And the most damning finding? My compliance section — a critical requirement for a project I'm building — was completely empty. Just a heading with nothing under it. Nobody noticed for weeks.

773 Sessions, Zero Feedback

I analyzed my Claude Code usage data (subagent sessions excluded):

  • Total sessions: 773
  • Conversation data: 1.7 GB
  • Context loaded per session: ~23,000 tokens

Most of what I wrote in CLAUDE.md never changed Claude's behavior in any measurable way. And across 773 sessions, not once did the system report which rules it actually followed.

The missing piece isn't better formatting — it's a feedback loop.

The Linter Found Things I Didn't Expect

Running the linter against my own files revealed:

  • Rules that contradicted each other
  • Instructions so vague they could mean anything
  • An entire section Claude provably ignored
  • Code blocks without language hints (the AI can't syntax-check what it can't identify)

The self-audit was the most useful output the linter produced.

What I'm Building Now

Three layers, in order:

Layer 1 (Active now): At the end of every Claude Code session, the AI reports which context it used, where it lacked guidance, and where instructions contradicted each other. Costs about 50 tokens. Already running.

Layer 2 (Next): Rule provenance — each rule gets a scope, owner, and expiry date.

Layer 3 (Later): Behavioral diffing — test each rule against real tasks to see if it actually changes AI behavior.
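For Layer 1, the end-of-session report can be as small as a single JSON object. This is a hypothetical shape — every field name here is my guess at what such a report could contain, not the format the tool actually emits:

```json
{
  "session": "2026-01-30-example",
  "rules_followed": ["always/run-tests", "never/push-main"],
  "rules_ignored": ["prefer/small-commits"],
  "missing_guidance": ["no rule covers database migrations"],
  "contradictions": [
    ["Always run the full test suite", "Keep feedback under 2 minutes"]
  ]
}
```

An object this size is roughly the 50-token budget, which is why the per-session cost stays negligible.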

The Linter Is Open Source

I extracted the linter into a standalone tool — zero dependencies, MIT licensed:

github.com/xlreon/latdoc-linter

```bash
# Install
cargo install --path .

# Lint your CLAUDE.md
latdoc lint CLAUDE.md

# See rule health dashboard
latdoc visualize CLAUDE.md

# Analyze token efficiency
latdoc analyze CLAUDE.md

# Convert to structured format
latdoc convert CLAUDE.md
```

If you have a CLAUDE.md with more than 20 rules, run it through. Tell me what breaks.

The Takeaway

The question isn't "how do we format these files better?" It's "how do we know which parts actually matter?"

Format is irrelevant — Claude reads prose just fine. What's missing is the feedback loop between what you write and what the AI does. That's the gap I'm closing.


Built with Claude Code (Opus 4.6). Benchmarked against 10 real CLAUDE.md files. The linter, scorer, and visualizer are real, working tools.

Follow the journey: @satapathy9 on X

Sidharth Satapathy

AI Engineer & Builder. 8+ years shipping at scale. Building AI-native tools with Claude Code, MCP servers, and agentic workflows.
