LLM Knowledge Bases
An expanded description of the LLM wiki pattern. Source material — articles, papers, repositories — is ingested into a raw directory, and an LLM then "compiles" it into a wiki of interlinked markdown files, with Obsidian serving as the IDE-style frontend. Q&A runs against the wiki at scale (~100 articles, ~400K words) without RAG: the LLM auto-maintains index files and summaries instead of relying on retrieval. Outputs are rendered as markdown, slides (via Marp), or matplotlib images, and filed back into the wiki. The LLM also runs "health checks" for inconsistencies, missing data, and candidate new articles. Additional tooling includes a naive search engine over the wiki, usable by the LLM via CLI. A future direction is synthetic data generation and finetuning, to embed the wiki's knowledge directly into model weights. Builds on his earlier gist describing the same pattern.
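The RAG-free Q&A step could be sketched roughly as follows: instead of retrieving chunks, the auto-maintained index files and the articles themselves are concatenated into one long prompt context. This is a minimal sketch under assumptions not in the source — the `index.md` file name, the directory layout, and the character cap are all illustrative guesses.

```python
from pathlib import Path

def build_context(wiki_root: str, max_chars: int = 1_600_000) -> str:
    """Assemble a single long-context prompt from the wiki, no RAG.

    Assumes index files are named 'index.md' (a guess, not the source's
    convention). Index files come first so the model sees the wiki's
    structure before the articles.
    """
    root = Path(wiki_root)
    parts = []
    # Auto-maintained index/summary files first.
    for index in sorted(root.rglob("index.md")):
        parts.append(f"## {index.relative_to(root)}\n{index.read_text(encoding='utf-8')}")
    # Then every article in full; at ~400K words this can still fit a
    # long-context model, which is the point of skipping retrieval.
    for page in sorted(root.rglob("*.md")):
        if page.name == "index.md":
            continue
        parts.append(f"## {page.relative_to(root)}\n{page.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)[:max_chars]
```

The cap is a crude guard against overflowing the context window; a real version would count tokens rather than characters.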
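The "naive search engine usable via CLI" could look something like the sketch below: plain term-frequency scoring over the markdown files, exposed as a command-line script an LLM tool-call can invoke. The scoring and interface are assumptions for illustration, not the original tool.

```python
#!/usr/bin/env python3
"""Naive keyword search over a markdown wiki, callable from the CLI.

Deliberately simple: score = total occurrences of the query terms in
each page. No index, no embeddings — a hypothetical stand-in for the
search tool the note describes.
"""
import re
import sys
from pathlib import Path

def search(wiki_root: str, query: str, top_k: int = 5) -> list[tuple[str, int]]:
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scored = []
    for page in Path(wiki_root).rglob("*.md"):
        text = page.read_text(encoding="utf-8", errors="ignore").lower()
        score = sum(text.count(t) for t in terms)
        if score:
            scored.append((str(page), score))
    # Highest-scoring pages first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

if __name__ == "__main__":
    # Usage: search.py <wiki_root> <query terms...>
    for path, score in search(sys.argv[1], " ".join(sys.argv[2:])):
        print(f"{score:4d}  {path}")
```

Keeping it a plain stdout-printing CLI matters here: the LLM can drive it through ordinary shell tool-use without any bespoke integration.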