Harvest
AI Summarized Content

LLM-Wiki: A Pattern for Building Personal Knowledge Bases Using LLMs

Andrej Karpathy proposes a new approach to personal knowledge management: instead of relying on RAG (Retrieval-Augmented Generation) that rediscovers knowledge from scratch every time, have an LLM incrementally build and maintain a persistent, interlinked wiki of markdown files. The human curates sources and asks questions; the LLM handles all the tedious bookkeeping, cross-referencing, and maintenance that usually causes people to abandon their wikis.


1. The Core Idea: From RAG to a Persistent Wiki

Most people's experience with LLMs and documents follows the RAG pattern: you upload files, the LLM retrieves relevant chunks at query time, and generates an answer. Tools like NotebookLM, ChatGPT file uploads, and most RAG systems work this way. It functions, but there's a fundamental problem — there's no accumulation. The LLM is rediscovering knowledge from scratch on every question. Ask something subtle that requires synthesizing five documents, and the LLM has to find and piece together fragments every single time. Nothing is built up.

The idea proposed here is fundamentally different. Instead of just retrieving from raw documents, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting contradictions, and strengthening or challenging the evolving synthesis.

"The wiki is a persistent, compounding artifact." The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

The crucial point is that you never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. In practice, Karpathy describes having the LLM agent open on one side and Obsidian open on the other, watching the wiki evolve in real time:

"Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase."


2. Use Cases: Where This Pattern Shines

This pattern can apply to a surprisingly wide range of contexts 🌍:

  • Personal: Tracking goals, health, psychology, and self-improvement — filing journal entries, articles, and podcast notes to build a structured picture of yourself over time.
  • Research: Going deep on a topic over weeks or months — reading papers, articles, and reports while incrementally building a comprehensive wiki with an evolving thesis.
  • Reading a book: Filing each chapter as you go, building pages for characters, themes, and plot threads. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages built by volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing.
  • Business/team: An internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, and customer calls — possibly with humans reviewing updates.
  • And more: Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

3. Architecture: Three Layers

The system is built on three clean layers:

📁 Raw Sources — Your curated collection of source documents: articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

📝 The Wiki — A directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, overviews, and syntheses. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

⚙️ The Schema — A configuration document (e.g., CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what conventions to follow, and what workflows to use when ingesting sources, answering questions, or maintaining the wiki. This is what transforms the LLM from a generic chatbot into a disciplined wiki maintainer. You and the LLM co-evolve this document over time as you figure out what works for your domain.
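To make the schema layer concrete, here is a hypothetical excerpt of what such a configuration document might contain. The directory names and conventions below are illustrative, not prescribed — the pattern leaves them entirely up to you and your LLM:

```markdown
# Wiki Schema (example excerpt)

## Layout
- raw/          — immutable source documents; read-only
- wiki/         — LLM-maintained markdown pages
- wiki/index.md — catalog of every page, organized by category
- wiki/log.md   — append-only activity log

## Conventions
- One page per entity or concept; cross-link with [[wikilinks]].
- On ingest: summarize the source, update index.md, touch every
  related entity/concept page, append a log entry.
- Flag contradictions between sources in a "Conflicts" section
  on the affected page rather than silently overwriting claims.
```

A schema like this is what the article means by co-evolving the document over time: each time a convention proves awkward, you revise this file and the LLM's behavior changes accordingly.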


4. Operations: Ingest, Query, and Lint

Ingest 📥

You drop a new source into the raw collection and tell the LLM to process it. The typical flow: the LLM reads the source, discusses key takeaways with you, writes a summary page, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10–15 wiki pages.

Karpathy personally prefers ingesting sources one at a time while staying involved — reading summaries, checking updates, and guiding the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. The workflow is yours to develop and document in the schema.

Query 🔍

You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms — a markdown page, a comparison table, a slide deck, a chart, or a canvas. The key move is what happens next:

"Good answers can be filed back into the wiki as new pages." A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint 🔧

Periodically, you ask the LLM to health-check the wiki. It looks for contradictions between pages, stale claims superseded by newer sources, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, and data gaps that could be filled with a web search. The LLM is also good at suggesting new questions to investigate and new sources to look for, keeping the wiki healthy as it grows.
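Some lint checks are mechanical enough to script outside the LLM. As a sketch, an orphan-page detector in Python, assuming Obsidian-style [[wikilinks]] (the link convention and the special-casing of index/log are assumptions for illustration, not part of the pattern):

```python
import re
from pathlib import Path

# Matches [[Page]], [[Page|alias]], and [[Page#heading]], capturing "Page"
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(wiki_dir: str) -> list[str]:
    """Return names of wiki pages that no other page links to."""
    pages = {p.stem: p for p in Path(wiki_dir).glob("*.md")}
    linked: set[str] = set()
    for page in pages.values():
        for target in WIKILINK.findall(page.read_text(encoding="utf-8")):
            linked.add(target.strip())
    # index and log are navigational, so don't report them as orphans
    return sorted(name for name in pages
                  if name not in linked and name not in {"index", "log"})
```

Running this during a lint pass gives the LLM (or you) a concrete worklist: each orphan either needs inbound links added or a reason to exist.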


5. Indexing and Logging

Two special files help both the LLM and you navigate the wiki as it grows:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count, organized by category (entities, concepts, sources, etc.). When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This approach works surprisingly well at moderate scale (~100 sources, hundreds of pages) and avoids the need for embedding-based RAG infrastructure.
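A hypothetical index.md fragment, using the book use case from earlier (the categories and per-page metadata are illustrative):

```markdown
# Index

## Entities
- [[frodo-baggins]] — Ring-bearer; protagonist. (7 sources)
- [[gandalf]] — Wizard; mentor figure. (5 sources)

## Concepts
- [[the-one-ring]] — Central artifact; corruption theme. (9 sources)

## Sources
- [[src-fellowship-ch1]] — Chapter 1 summary, ingested 2026-03-14.
```

Because every page appears here with a one-line gloss, the LLM can decide which pages to open from the index alone — the same role a table of contents plays for a human reader.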

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix like ## [2026-04-02] ingest | Article Title, the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log provides a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
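The same parse works without grep. A minimal Python sketch, assuming the entry-prefix convention suggested above:

```python
import re

# One match per entry header: "## [YYYY-MM-DD] operation | Title"
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$", re.MULTILINE)

def recent_entries(log_text: str, n: int = 5) -> list[tuple[str, str, str]]:
    """Return the last n (date, operation, title) tuples from log.md."""
    return ENTRY.findall(log_text)[-n:]
```

Anything — a shell one-liner, a script, or the LLM itself — can recover the timeline this way, which is the payoff of keeping the prefix format strict.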


6. Optional CLI Tools and Tips

CLI Tools 🛠️

As the wiki grows, you may want small tools to help the LLM operate more efficiently. A search engine over wiki pages is the most obvious need. At small scale the index file suffices, but at larger scale you want proper search. qmd is recommended — it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It offers both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you code a naive search script as the need arises.
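As a sketch of what such a naive script might look like — plain term-frequency scoring, no BM25 or vectors, so not a substitute for qmd's hybrid search:

```python
import re
from pathlib import Path

def search(wiki_dir: str, query: str, top_k: int = 5) -> list[tuple[str, int]]:
    """Rank wiki pages by how often the query terms appear (case-insensitive)."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scores: list[tuple[str, int]] = []
    for page in Path(wiki_dir).glob("*.md"):
        words = re.findall(r"\w+", page.read_text(encoding="utf-8").lower())
        score = sum(words.count(t) for t in terms)
        if score:
            scores.append((page.name, score))
    return sorted(scores, key=lambda s: -s[1])[:top_k]
```

Exposed as a CLI, even something this crude lets the LLM shell out to find candidate pages instead of reading the whole index on every query.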

Tips and Tricks 💡

  • Obsidian Web Clipper — A browser extension that converts web articles to markdown. Great for quickly getting sources into your raw collection.
  • Download images locally — In Obsidian's settings, set the attachment folder path to a fixed directory (e.g., raw/assets/), then bind a hotkey for "Download attachments for current file." This lets the LLM view and reference images directly instead of relying on URLs that may break. Note: LLMs can't natively read markdown with inline images in one pass — the workaround is having the LLM read the text first, then view referenced images separately.
  • Obsidian's graph view — The best way to see the shape of your wiki: what's connected, which pages are hubs, and which are orphans.
  • Marp — A markdown-based slide deck format with an Obsidian plugin, useful for generating presentations directly from wiki content.
  • Dataview — An Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
  • Git — The wiki is just a git repo of markdown files, so you get version history, branching, and collaboration for free.
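For the Dataview tip above, a page with YAML frontmatter might look like this (the field names are illustrative, not a required format):

```markdown
---
tags: [entity, character]
created: 2026-04-02
sources: 7
---
# Frodo Baggins
Ring-bearer; protagonist of the main plot thread.
```

A Dataview query along the lines of `TABLE sources FROM #entity SORT sources DESC` could then render a live table of every entity page ranked by how many sources mention it — a dashboard the LLM maintains as a side effect of normal ingestion.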

7. Why This Works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value.

"LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero."

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

Karpathy connects this idea to Vannevar Bush's Memex (1945) — a vision of a personal, curated knowledge store with associative trails between documents. Bush's vision was actually closer to this pattern than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part Bush couldn't solve was who does the maintenance. The LLM handles that. 🧩


8. A Note on Flexibility

The document is intentionally abstract. It describes the pattern, not a specific implementation. The exact directory structure, schema conventions, page formats, and tooling will all depend on your domain, preferences, and LLM of choice. Everything mentioned is optional and modular — pick what's useful, ignore what isn't.

"The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest."

Your sources might be text-only (no image handling needed). Your wiki might be small enough that the index file alone suffices. You might not care about slide decks and just want markdown pages. The pattern is a starting point — your LLM collaborator can help you build the specifics that fit your workflow. 🚀

Summary completed: 5/12/2026, 7:47:13 PM
