
The Polyrepo Graph

A per-repo index can tell you what exists inside one codebase. It answers:

  • what functions are defined in this file
  • what symbols does this module export
  • where is this class used within this repo

It is weaker at answering the questions that matter most in a multi-repo platform:

  • which frontend caller reaches this backend route
  • which repos depend on the same shared contract version
  • which event is produced in one service and consumed in two others
  • which downstream services are likely affected by a change to this type

The cross-repo blast radius problem is not a search problem. It is a graph traversal problem. The answer requires not just finding the relevant code in each repo, but knowing how those pieces connect to each other across repo boundaries.

The polyrepo code graph exists to make those cross-repo relationships first-class, queryable edges rather than implicit knowledge held in senior engineers’ heads.

The graph uses a typed node model. Every node has a kind, a deterministic ID, a repo attribution (or a marker indicating it is workspace-level), and source span information where applicable.

These node kinds represent static code structure as extracted from source files:

Kind       What it represents
File       A source file in a configured repo
Function   A function or method definition
Class      A class declaration
Type       A named type alias or interface
Module     A module-level namespace or barrel
Import     An import declaration
Decorator  A decorator application on a class, method, or parameter

These node kinds represent framework-level and runtime concepts extracted by framework-aware rules:

Kind     What it represents
Entity   A schema or model construct (e.g., a database entity or Mongoose document)
Route    An HTTP route, modeled as a virtual node shared between handler and callers
Topic    A message topic (e.g., Kafka topic), shared across producers and consumers
Queue    A message queue
Subject  A pub-sub subject (e.g., NATS subject)
Stream   A message stream
Event    A named domain event
Service  A service-level node derived from framework registration

These node kinds are the bridge points that connect separate repositories in the graph:

Kind             What it represents
SharedSymbol     A symbol exported by a shared package, keyed by package name, version, and symbol name
PayloadContract  An inferred payload shape for a topic or queue, per side (producer or consumer)

These node kinds support git-aware intelligence and ownership signals:

Kind        What it represents
Repo        A configured repository in the workspace
Convention  A detected coding or architectural convention
Commit      A git commit
PR          A pull request
Review      A review on a PR
Comment     A comment on a PR or review
Author      A code author derived from git history
Ticket      A linked issue or ticket reference
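
To make the node model concrete, here is a minimal sketch of what a typed node record could look like. It is not the tool's actual schema: the field names, the SourceSpan layout, the ID string format, and the use of None to mark a workspace-level node are all assumptions made for illustration, and only a handful of the kinds listed above are included.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NodeKind(Enum):
    # A small subset of the kinds listed in the tables above.
    FILE = "File"
    FUNCTION = "Function"
    ROUTE = "Route"
    TOPIC = "Topic"
    SHARED_SYMBOL = "SharedSymbol"


@dataclass(frozen=True)
class SourceSpan:
    # Hypothetical span layout: a file path plus 1-based start and end lines.
    path: str
    start_line: int
    end_line: int


@dataclass(frozen=True)
class Node:
    id: str                             # deterministic ID (format here is illustrative)
    kind: NodeKind
    name: str
    repo: Optional[str] = None          # assumption: None marks a workspace-level node
    span: Optional[SourceSpan] = None   # only present where a source location applies

    @property
    def is_workspace_level(self) -> bool:
        return self.repo is None


# A repo-attributed function node with a source span (names are hypothetical).
handler = Node(
    id="fn:repo_beta:handleOrderCreated",
    kind=NodeKind.FUNCTION,
    name="handleOrderCreated",
    repo="repo_beta",
    span=SourceSpan("src/handlers/order.handler.ts", 12, 40),
)

# A virtual route node: no owning repo and no physical span.
route = Node(id="__route__POST__/orders", kind=NodeKind.ROUTE, name="POST /orders")
```

Freezing the dataclasses keeps nodes hashable, so they can be used as dictionary keys or set members during analysis.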

Edges are also typed. The graph stores edge kind, source node, target node, resolver name, and confidence on every edge.

Kind           What it connects
Defines        File or class defines a function, type, or member
Calls          Function calls another function
Imports        Module imports another module or symbol
Exports        Module exports a symbol
Extends        Class extends a base class
Implements     Class implements an interface
References     Node references another node without a direct call or import
DependsOn      Manifest-level package dependency
UsesDecorator  A class or method applies a decorator

Kind        What it connects
Publishes   A function or class publishes to a topic, queue, or event
Consumes    A function or class consumes from a topic, queue, or event
Triggers    An event triggers a handler
Serves      A handler serves a route
PersistsTo  A handler or service persists to a storage entity
UsesShared  A node uses a shared symbol from a cross-repo package

Kind              What it connects
BreaksIfChanged   A change to this node would likely break the target
CoChangesWith     Two nodes are frequently modified together (git-derived)
OwnedBy           A file or node is attributed to an author
CrossRepoDepends  A cross-repo dependency edge beyond the shared-symbol mechanism
PropagatesEvent   An event propagates from one node to a downstream consumer
DriftsFrom        A payload contract field drifts from its counterpart on the other side
ContractOn        A payload contract is attached to a specific topic or queue
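
The five fields stored on every edge can be sketched as a small record type. This is an illustration under stated assumptions, not the tool's storage format: the text does not say how confidence is encoded, so the 0.0 to 1.0 float range, the resolver names, and the node ID strings below are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeKind(Enum):
    # A small subset of the edge kinds listed in the tables above.
    CALLS = "Calls"
    SERVES = "Serves"
    PUBLISHES = "Publishes"
    CONSUMES = "Consumes"
    USES_SHARED = "UsesShared"


@dataclass(frozen=True)
class Edge:
    kind: EdgeKind
    source: str        # ID of the source node
    target: str        # ID of the target node
    resolver: str      # name of the rule that produced the edge
    confidence: float  # assumption: a score between 0.0 and 1.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def at_least(edges: list, threshold: float) -> list:
    """Keep only the edges whose confidence meets the threshold."""
    return [edge for edge in edges if edge.confidence >= threshold]


edges = [
    Edge(EdgeKind.SERVES, "fn:create_order", "__route__POST__/orders",
         resolver="decorator-rule", confidence=0.95),
    Edge(EdgeKind.CONSUMES, "fn:submit_cart", "__route__POST__/orders",
         resolver="string-literal-rule", confidence=0.6),
]

trusted = at_least(edges, 0.8)
```

Keeping the resolver name and confidence on the edge itself lets a consumer filter out heuristic matches without re-running extraction.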

Node IDs are not random UUIDs. They are computed deterministically from the node’s semantic identity: its repo, file path, kind, and canonical name.

This matters for three reasons:

  1. Idempotent re-indexing. Re-running gather-step index on an unchanged workspace produces an identical graph. Each node resolves to the same ID, and therefore to the same table slot, so no orphan nodes accumulate from repeated runs.

  2. Stable cross-repo attachment. A virtual node for a route or topic has the same ID every time it is computed, regardless of which repo triggers its creation. When repo A defines a handler and repo B defines a caller, both attach to the same route node because both compute the same stitch key.

  3. Compact downstream analysis. Analysis functions can cache results keyed by node ID without worrying that a re-index has changed which entity a given ID refers to.
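
The idempotence property in the first point can be illustrated with a short sketch. The hashing scheme here is an assumption: the text names the inputs (repo, file path, kind, canonical name) but not the hash function, the field separator, or the ID length, so SHA-256, the NUL separator, and the 16-character truncation are stand-ins. The node_id and index helpers and the sample definitions are likewise hypothetical.

```python
import hashlib


def node_id(repo: str, path: str, kind: str, name: str) -> str:
    """Derive a stable ID from a node's semantic identity (hypothetical scheme)."""
    # A NUL separator keeps adjacent fields from running together ambiguously.
    identity = "\x00".join((repo, path, kind, name))
    return hashlib.sha256(identity.encode("utf-8")).hexdigest()[:16]


def index(store: dict, definitions: list) -> dict:
    """Upsert every definition into the store, keyed by its deterministic ID."""
    for repo, path, kind, name in definitions:
        store[node_id(repo, path, kind, name)] = (repo, path, kind, name)
    return store


definitions = [
    ("repo_beta", "src/handlers/order.handler.ts", "Function", "handleOrderCreated"),
    ("repo_beta", "src/handlers/order.handler.ts", "File", "order.handler.ts"),
]

store = index({}, definitions)
snapshot = dict(store)

# Re-indexing the unchanged workspace overwrites the same slots in place.
index(store, definitions)
```

With random UUIDs the second run would add two new entries. With IDs derived from identity, the store after the second run is equal to the snapshot taken after the first.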

Cross-repo relationships are normalized through virtual nodes. A virtual node is a graph node whose identity is derived from a canonical external name — a route, topic, queue, or shared symbol — rather than from a physical file location.

The stitch key is the canonical qualified name used to compute the virtual node’s ID. Different repos producing or consuming the same external surface compute the same stitch key, find the same virtual node (creating it if it does not yet exist), and attach their local nodes to it.

Implemented stitch key formats:

Surface                Stitch key format                  Example
HTTP route             __route__METHOD__/path             __route__POST__/orders
Kafka / message topic  __topic__protocol__name            __topic__kafka__order.created
Message queue          __queue__protocol__name            __queue__rabbitmq__invoicing
Shared symbol          __shared__package@version__symbol  __shared__contracts@2.1.0__OrderDto

Route matching also normalizes HTTP method aliases (for example, FETCH is normalized to GET) and strips trailing slashes before computing the stitch key, so equivalent route definitions from different codebases converge to the same node.
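
The key formats and the two normalization rules can be sketched as a few string-building helpers. The function names are hypothetical, and a few details are assumptions the text does not state: that methods are upper-cased before aliasing, that FETCH is the only alias (it is the only one named above), and that a bare root path is kept as "/" rather than stripped to an empty string.

```python
# Only the FETCH alias is named in the text; any others would be guesses.
METHOD_ALIASES = {"FETCH": "GET"}


def route_key(method: str, path: str) -> str:
    """Build a route stitch key after normalizing the method and path."""
    verb = method.strip().upper()          # assumption: methods compare case-insensitively
    verb = METHOD_ALIASES.get(verb, verb)
    trimmed = path.rstrip("/") or "/"      # assumption: the root path stays "/"
    return f"__route__{verb}__{trimmed}"


def topic_key(protocol: str, name: str) -> str:
    return f"__topic__{protocol}__{name}"


def queue_key(protocol: str, name: str) -> str:
    return f"__queue__{protocol}__{name}"


def shared_symbol_key(package: str, version: str, symbol: str) -> str:
    return f"__shared__{package}@{version}__{symbol}"


# A handler in one repo and a caller in another, written slightly differently,
# converge on the same key and therefore the same virtual node.
handler_side = route_key("POST", "/orders")
caller_side = route_key("post", "/orders/")
```

Because both sides compute the key from the external name alone, neither repo needs to know the other exists at indexing time.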

Once virtual nodes are in place, cross-repo graph questions become single traversals:

  • “Which frontends call /orders?” — find the Route virtual node for __route__POST__/orders, walk incoming Consumes edges, collect the Function nodes on the caller side.
  • “Which services consume order.created?” — find the Topic virtual node for __topic__kafka__order.created, walk incoming Consumes edges, collect the handler functions and their owning repos.
  • “What is the blast radius of changing OrderDto in the contracts package?” — find the SharedSymbol virtual node, walk UsesShared edges, collect all File and Function nodes that depend on it, group by repo.

None of those queries require knowledge of which repo each side lives in. The virtual node handles the stitching.
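
The three traversals above can be sketched over a tiny in-memory edge list. Everything in this example is illustrative: the repo and function names are invented, local node IDs use a hypothetical "repo:name" form, and stitch keys stand in directly for virtual node IDs to keep the output readable.

```python
from collections import defaultdict

ROUTE = "__route__POST__/orders"
TOPIC = "__topic__kafka__order.created"
ORDER_DTO = "__shared__contracts@2.1.0__OrderDto"

# Each edge is (kind, source node ID, target node ID).
edges = [
    ("Consumes", "repo_alpha:submitCheckout", ROUTE),
    ("Serves", "repo_beta:createOrder", ROUTE),
    ("Publishes", "repo_beta:createOrder", TOPIC),
    ("Consumes", "repo_gamma:onOrderCreated", TOPIC),
    ("Consumes", "repo_delta:sendReceipt", TOPIC),
    ("UsesShared", "repo_alpha:submitCheckout", ORDER_DTO),
    ("UsesShared", "repo_beta:createOrder", ORDER_DTO),
]


def incoming(target: str, kind: str) -> list:
    """Return the sources of every edge of the given kind that points at target."""
    return [source for k, source, t in edges if k == kind and t == target]


def group_by_repo(node_ids: list) -> dict:
    """Group 'repo:name' node IDs by their repo prefix."""
    grouped = defaultdict(list)
    for node_id in node_ids:
        repo, _, name = node_id.partition(":")
        grouped[repo].append(name)
    return dict(grouped)


route_callers = incoming(ROUTE, "Consumes")
topic_consumers = group_by_repo(incoming(TOPIC, "Consumes"))
dto_blast_radius = group_by_repo(incoming(ORDER_DTO, "UsesShared"))
```

Each query is a single hop from the virtual node, and the repo grouping falls out of node attribution rather than out of any per-repo lookup.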

When an AI assistant asks “what would break if I change this event’s payload?”, it needs a precomputed graph answer, not another scan of raw source files.

The polyrepo code graph provides:

  • Stable IDs. An assistant can reference a node ID across sessions without it changing on re-index.
  • Repo-aware search. Results are attributed to the repo they came from, so the assistant can tell the engineer “this consumer is in repo_beta, file src/handlers/order.handler.ts.”
  • Route and event topology. The graph already knows which functions produce or consume which topics. No reasoning from text is needed.
  • Bounded task packs. Instead of returning a raw multi-hop neighborhood, the context pack system slices the graph into a byte-budgeted, mode-specific bundle shaped for the specific task at hand.

The retrieval step has already happened. The assistant can focus on synthesis.