Skip to content

gather-step configuration reference

The gather-step.config.yaml file declares which repositories belong to the workspace and controls how they are indexed. The file must exist before running index, watch, or any analysis command. Run gather-step init to generate a starting config by auto-discovery.

Given --workspace /path/to/workspace, the tool resolves these paths automatically. Override individual paths with the corresponding command flags.

PurposeDefault path
Config/path/to/workspace/gather-step.config.yaml
Registry/path/to/workspace/.gather-step/registry.json
Storage root/path/to/workspace/.gather-step/storage
Graph store/path/to/workspace/.gather-step/storage/graph.redb
Metadata store/path/to/workspace/.gather-step/storage/metadata.sqlite

The example below uses all optional fields. In practice, most workspaces need only the repos block and optionally indexing.workspace_concurrency.

gather-step.config.yaml
repos:
- name: backend_standard
path: apps/backend_standard
depth: full
- name: frontend_standard
path: apps/frontend_standard
depth: level2
- name: shared_contracts
path: packages/shared_contracts
allow_listed_repos:
- shared_contracts
github:
owner: example-org
api_base_url: https://api.github.com
token_env: GITHUB_TOKEN
jira:
project_key: ENG
base_url: https://example.atlassian.net
token_env: JIRA_API_TOKEN
indexing:
exclude:
- node_modules
- dist
- "*.min.js"
- "*.map"
- "*.lock"
- "*.d.ts"
language_excludes:
- language: typescript
patterns:
- "*.generated.ts"
include_languages:
- typescript
- javascript
include_dotfiles: false
min_file_size: 100B
max_file_size: 1MB
workspace_concurrency: 4

Required. A list of repository entries. At least one must be present. The config is rejected if repos is empty.

Each entry in the list has the following structure:

FieldTypeRequiredDescription
namestringyesLogical repo name used in all output, registry entries, and tool responses. Must be non-empty. Must not contain /, \, newlines, or the special segments . and ... Must be unique within the config.
pathstringyesRepo root path relative to the config file’s directory. Must be a relative path. Must not escape the config root via .. segments. Must not overlap with another repo’s path.
depthenumnoPer-repo depth override. Accepts level1, level2, level3, or full. When omitted, the value from --depth (if passed on the CLI) is used; otherwise defaults to full.

Repo paths that resolve through symlinks are rejected at index time.


Optional. Default: [].

A list of repo names from repos that are granted elevated trust within the workspace. Currently stored in the config model and validated against the repos list. Every name in allow_listed_repos must appear in repos; duplicate names are rejected.


Optional. GitHub integration settings. When provided, future phases will use this block for GitHub-sourced metadata such as pull request history.

FieldTypeRequiredDescription
ownerstringyesGitHub organization or user name.
api_base_urlstringnoGitHub API base URL. Defaults to https://api.github.com when absent.
token_envstringnoName of the environment variable that holds the GitHub personal access token.

Optional. Jira integration settings. Stored for future use in linking commits to issue tracking.

FieldTypeRequiredDescription
project_keystringyesJira project key, e.g. ENG.
base_urlstringnoJira instance base URL, e.g. https://example.atlassian.net.
token_envstringnoName of the environment variable that holds the Jira API token.

Optional. Controls how source files are selected and processed during indexing. All sub-fields are optional; the defaults are designed to work correctly for most TypeScript and JavaScript workspaces without configuration.

Type: string[]
Default: ["node_modules", "dist", "*.min.js", "*.map", "*.lock", "*.d.ts"]

Glob patterns for files and directories to exclude from indexing. Patterns are matched against relative file paths within each repo. The default list excludes common build artifacts and package directories.

Type: list of {language: string, patterns: string[]}
Default: []

Per-language exclusion rules. Each entry applies patterns only when the file’s detected language matches language. Useful for excluding generated files in specific languages without excluding them globally.

Example:

language_excludes:
- language: typescript
patterns:
- "*.generated.ts"
- "src/graphql/**"

Type: string[]
Default: [] (all languages indexed)

When non-empty, only files whose detected language is in this list are indexed. Languages are lowercase strings such as typescript, javascript, python, rust. An empty list means all supported languages are eligible.

Type: bool
Default: false

When true, files and directories beginning with a dot (e.g., .config/) are included in indexing. When false (the default), dotfiles and dot-directories are skipped.

Type: string (human-readable size)
Default: null (no minimum)

Files smaller than this size are excluded from indexing. Accepts human-readable strings such as 100B, 1KB, 1MB. When omitted, no minimum size is applied.

Type: string (human-readable size)
Default: "1MB"

Files larger than this size are excluded from indexing. Accepts human-readable strings such as 512KB, 2MB. Prevents very large generated or binary files from degrading parse performance.

Type: usize (optional)
Default: null (determined by the indexer)

Controls the number of repos indexed concurrently at the workspace level. When omitted, the indexer chooses a default based on available resources. Setting this to 1 forces sequential per-repo indexing, which is useful for debugging.

The config parser uses #[serde(deny_unknown_fields)] on every struct. Unknown YAML keys are hard errors, not warnings.

RuleDetail
repos must not be emptyAt least one repo entry is required.
Repo names must be non-emptyBlank names are rejected.
Repo names must not contain path separators/, \, newlines, carriage returns, and the segments . and .. are rejected.
Repo names must not contain null bytesNUL bytes in names or paths are rejected.
Repo paths must be relativeAbsolute paths are rejected.
Repo paths must not escape the config rootPaths containing .. components are rejected.
Repo names must be uniqueDuplicate names within the config are rejected.
Repo paths must not overlapA repo path that is a prefix of another repo’s path is rejected.
allow_listed_repos must reference known reposAny name not present in repos is rejected.
allow_listed_repos names must be uniqueDuplicate entries are rejected.
Repo root symlinks are rejected at index timeA repo whose path resolves through a symlink is rejected during index or watch.

The depth field on each repo entry (and the --depth CLI flag) controls what the indexer extracts from the repo.

ValueDescription
level1Lightest pass. File nodes and top-level module structure only. Suitable for large dependency repos where symbol-level detail is not needed.
level2File nodes plus top-level declarations (functions, classes, types). No caller/callee edges.
level3Full declarations plus call edges within files. No cross-file resolution pass.
fullComplete indexing including cross-file and cross-repo reference resolution, virtual node construction, payload contract inference, and git history analytics. This is the default and produces the richest graph.

Depth values are accepted as lowercase strings in YAML. The numeric aliases 1, 2, and 3 are accepted for level1, level2, and level3 respectively.

The files under .gather-step/ are generated and should not be committed to version control. The .gather-step/ directory itself is created automatically during index and managed by the clean command.

For workspace setup instructions, see the workspace setup guide.