The Oracle Corpus

The Oracle corpus is the knowledge base behind Rancher Oracle. It indexes 48,000+ real issue resolutions from 13 open-source repositories across the Rancher and Kubernetes ecosystem.

Indexed repositories

Repository	Domain
`rancher/rancher`	Rancher Manager
`k3s-io/k3s`	K3s lightweight Kubernetes
`rancher/rke2`	RKE2 Kubernetes
`harvester/harvester`	Harvester HCI
`longhorn/longhorn`	Longhorn distributed storage
`neuvector/neuvector`	NeuVector container security
`rancher/fleet`	Fleet GitOps
`rancher/system-upgrade-controller`	Automated upgrades
`rancher/local-path-provisioner`	Local path storage
`rancher/webhook`	Rancher admission webhooks
`rancher/charts`	Helm charts
`rancher/kontainer-driver-metadata`	Kubernetes version metadata
`rancher/norman`	Rancher API framework

What’s indexed

The corpus includes:

Issue resolutions — closed issues with confirmed fixes
Pull requests — merged PRs with linked issues
Discussions — community Q&A with accepted answers
Release notes — version-specific changes and known issues

How indexing works

Ingestion

Issues, PRs, discussions, and release notes are collected from each repository via the GitHub API.
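As a rough illustration of this step, the sketch below builds a request URL for GitHub's REST "list repository issues" endpoint and separates issues from pull requests in a response page. Oracle's actual collector is not described here, so the helper names (`closed_issues_url`, `split_issues_and_prs`) and the page size are assumptions. No network call is made.

```python
from urllib.parse import urlencode

GITHUB_API = "https://api.github.com"


def closed_issues_url(repo: str, page: int = 1, per_page: int = 100) -> str:
    """Build the URL for one page of closed issues in an `owner/name` repo."""
    query = urlencode({"state": "closed", "per_page": per_page, "page": page})
    return f"{GITHUB_API}/repos/{repo}/issues?{query}"


def split_issues_and_prs(items):
    """Separate issues from PRs in one decoded response page.

    GitHub's issues endpoint also returns pull requests; those entries
    carry a `pull_request` key, which plain issues do not have.
    """
    issues = [item for item in items if "pull_request" not in item]
    prs = [item for item in items if "pull_request" in item]
    return issues, prs
```

A real collector would also need authentication, pagination until an empty page, and rate-limit handling, none of which is shown.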

Chunking

Documents are split into semantically meaningful chunks — preserving context around error messages, stack traces, and configuration snippets.
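One simple way to approximate this is to split on blank lines, keep fenced blocks (where stack traces and configuration snippets usually live in GitHub issues) intact, and pack neighbouring blocks together up to a size limit. This is a minimal sketch under those assumptions, not Oracle's actual chunker. The `chunk_document` name and the `max_chars` default are hypothetical.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def chunk_document(text: str, max_chars: int = 800) -> list[str]:
    """Pack blank-line-separated blocks into chunks without splitting fenced blocks."""
    blocks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
        # A blank line ends a block only when we are outside a fence.
        if line.strip() == "" and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))

    # Greedily merge adjacent blocks so each chunk keeps surrounding context.
    # A single block longer than max_chars is kept whole rather than cut.
    chunks, buf = [], ""
    for block in blocks:
        candidate = f"{buf}\n\n{block}" if buf else block
        if buf and len(candidate) > max_chars:
            chunks.append(buf)
            buf = block
        else:
            buf = candidate
    if buf:
        chunks.append(buf)
    return chunks
```

Keeping a fenced block whole means an error message is never separated from the lines of its own stack trace, at the cost of occasional oversized chunks.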

Embedding

Each chunk is embedded using a transformer model that captures the semantic meaning of Kubernetes errors, configuration patterns, and troubleshooting procedures.

Storage

Embeddings are stored in a vector database optimized for high-recall semantic search.

Retrieval

On query, semantic search retrieves the top-k matching chunks, which are passed as context to the LLM for grounded response generation.
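The retrieval step can be sketched with a brute-force search over an in-memory list. This assumes cosine similarity as the ranking metric and a hand-written prompt template, neither of which is confirmed for Oracle. A production vector database would typically use an approximate nearest-neighbour index instead. `top_k` and `build_prompt` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=3):
    """Return the k chunk texts most similar to the query.

    `index` is a list of (chunk_text, vector) pairs.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question, chunks):
    """Assemble retrieved chunks into a grounded prompt (illustrative template)."""
    context = "\n---\n".join(chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The value of `k` trades recall against prompt length: a larger `k` surfaces more candidate resolutions but leaves less room in the LLM's context window.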

Why embeddings matter

Traditional keyword search often fails for Kubernetes troubleshooting. An error like `failed to create pod sandbox` has dozens of root causes, including CNI misconfiguration, disk pressure, and container runtime issues. Semantic embeddings capture the meaning behind error messages and stack traces, not just the keywords, so Oracle can match a user’s error against resolutions that describe the same underlying problem in different terms.

Corpus updates

The corpus is updated on a recurring schedule as new issues are resolved across the indexed repositories. Each run ingests, chunks, and embeds new resolutions, PRs, and discussions, then adds them to the vector database.
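A recurring update should avoid re-embedding content that is already stored. One common approach, shown here purely as an assumption about how such a job could work, is to derive a stable ID from each chunk's source and content hash and skip IDs that already exist. `chunk_id` and `plan_update` are hypothetical helpers.

```python
import hashlib


def chunk_id(repo: str, number: int, text: str) -> str:
    """Stable ID built from the source issue and a hash of the chunk text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return f"{repo}#{number}:{digest}"


def plan_update(existing_ids, incoming):
    """Return (id, text) pairs for chunks not yet in the store.

    `incoming` is an iterable of (repo, issue_number, chunk_text) tuples.
    """
    new_chunks = []
    for repo, number, text in incoming:
        cid = chunk_id(repo, number, text)
        if cid not in existing_ids:
            new_chunks.append((cid, text))
    return new_chunks
```

Because the ID includes a content hash, an edited issue produces new IDs for its changed chunks, while unchanged chunks are skipped on every run.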

Enterprise customers can request custom corpus additions, such as internal runbooks, private repositories, or domain-specific documentation, which are indexed alongside the public corpus.
