gather-step configuration reference
The gather-step.config.yaml file declares which repositories belong to the workspace and controls how they are indexed. The file must exist before running index, watch, or any analysis command. Run gather-step init to generate a starting config by auto-discovery.
Default paths
Given --workspace /path/to/workspace, the tool resolves these paths automatically. Override individual paths with the corresponding command flags.
| Purpose | Default path |
|---|---|
| Config | /path/to/workspace/gather-step.config.yaml |
| Registry | /path/to/workspace/.gather-step/registry.json |
| Storage root | /path/to/workspace/.gather-step/storage |
| Graph store | /path/to/workspace/.gather-step/storage/graph.redb |
| Metadata store | /path/to/workspace/.gather-step/storage/metadata.sqlite |
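Every default in the table comes from joining a fixed name onto the workspace root. The sketch below reproduces that mapping for reference.

default_paths is a hypothetical helper, not part of gather-step. It also ignores the per-command override flags.

```python
from pathlib import PurePosixPath


def default_paths(workspace: str) -> dict[str, PurePosixPath]:
    """Hypothetical helper: derive the default paths listed above from --workspace."""
    root = PurePosixPath(workspace)
    state_dir = root / ".gather-step"  # all generated state lives here
    storage_root = state_dir / "storage"
    return {
        "config": root / "gather-step.config.yaml",
        "registry": state_dir / "registry.json",
        "storage_root": storage_root,
        "graph_store": storage_root / "graph.redb",
        "metadata_store": storage_root / "metadata.sqlite",
    }


for purpose, path in default_paths("/path/to/workspace").items():
    print(f"{purpose:<15} {path}")
```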
Canonical example
The example below uses all optional fields. In practice, most workspaces need only the repos block and optionally indexing.workspace_concurrency.
```yaml
repos:
  - name: backend_standard
    path: apps/backend_standard
    depth: full
  - name: frontend_standard
    path: apps/frontend_standard
    depth: level2
  - name: shared_contracts
    path: packages/shared_contracts

allow_listed_repos:
  - shared_contracts

github:
  owner: example-org
  api_base_url: https://api.github.com
  token_env: GITHUB_TOKEN

jira:
  project_key: ENG
  base_url: https://example.atlassian.net
  token_env: JIRA_API_TOKEN

indexing:
  exclude:
    - node_modules
    - dist
    - "*.min.js"
    - "*.map"
    - "*.lock"
    - "*.d.ts"
  language_excludes:
    - language: typescript
      patterns:
        - "*.generated.ts"
  include_languages:
    - typescript
    - javascript
  include_dotfiles: false
  min_file_size: 100B
  max_file_size: 1MB
  workspace_concurrency: 4
```

Top-level fields
repos
Required. A list of repository entries. At least one entry must be present; the config is rejected if repos is empty.
Each entry in the list has the following structure:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Logical repo name used in all output, registry entries, and tool responses. Must be non-empty and unique within the config. Must not contain /, \, newlines, or carriage returns, and must not be one of the special segments "." or "..". |
| path | string | yes | Repo root path, relative to the directory that contains the config file. Must be a relative path, must not escape the config root through ".." segments, and must not overlap with another repo’s path. |
| depth | enum | no | Per-repo depth override: level1, level2, level3, or full. When omitted, the value of --depth is used if it was passed on the CLI; otherwise the depth defaults to full. |
Repo paths that resolve through symlinks are rejected at index time.
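The depth precedence described for the depth field can be sketched as a small resolver. effective_depth is a hypothetical name, and the sketch only models the documented order: the per-repo value, then --depth, then full.

```python
from typing import Optional

VALID_DEPTHS = ("level1", "level2", "level3", "full")


def effective_depth(repo_depth: Optional[str], cli_depth: Optional[str]) -> str:
    """Hypothetical helper: per-repo depth wins, then --depth, then "full"."""
    for candidate in (repo_depth, cli_depth):
        if candidate is not None:
            if candidate not in VALID_DEPTHS:
                raise ValueError(f"invalid depth: {candidate!r}")
            return candidate
    return "full"


# Repos from the canonical example, as if run with --depth level1.
repos = {"backend_standard": "full", "frontend_standard": "level2", "shared_contracts": None}
for name, depth in repos.items():
    print(name, effective_depth(depth, "level1"))
```

Under this reading, shared_contracts sets no depth of its own. It is therefore indexed at level1 when --depth level1 is passed, and at full otherwise.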
allow_listed_repos
Optional. Default: [].
A list of repo names from repos that are granted elevated trust within the workspace. The list is currently stored in the config model and validated against repos: every name in allow_listed_repos must appear in repos, and duplicate names are rejected.
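The two allow-list checks can be expressed as a short validator that collects errors instead of stopping at the first one.

allow_list_errors is hypothetical, and its error wording is illustrative rather than gather-step's actual messages.

```python
def allow_list_errors(repo_names: list[str], allow_listed: list[str]) -> list[str]:
    """Hypothetical helper: report allow_listed_repos entries that break the rules above."""
    known = set(repo_names)
    seen: set[str] = set()
    errors: list[str] = []
    for name in allow_listed:
        if name not in known:
            errors.append(f"unknown repo {name!r}")  # not declared under repos
        elif name in seen:
            errors.append(f"duplicate entry {name!r}")  # listed more than once
        seen.add(name)
    return errors


names = ["backend_standard", "frontend_standard", "shared_contracts"]
print(allow_list_errors(names, ["shared_contracts"]))             # no errors
print(allow_list_errors(names, ["shared_contracts", "billing"]))  # one unknown repo
```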
github
Optional. GitHub integration settings. When provided, future phases will use this block for GitHub-sourced metadata such as pull request history.
| Field | Type | Required | Description |
|---|---|---|---|
| owner | string | yes | GitHub organization or user name. |
| api_base_url | string | no | GitHub API base URL. Defaults to https://api.github.com when absent. |
| token_env | string | no | Name of the environment variable that holds the GitHub personal access token. |
jira
Optional. Jira integration settings, stored for future use in linking commits to Jira issues.
| Field | Type | Required | Description |
|---|---|---|---|
| project_key | string | yes | Jira project key, e.g. ENG. |
| base_url | string | no | Jira instance base URL, e.g. https://example.atlassian.net. |
| token_env | string | no | Name of the environment variable that holds the Jira API token. |
indexing
Optional. Controls how source files are selected and processed during indexing. All sub-fields are optional; the defaults are designed to work correctly for most TypeScript and JavaScript workspaces without configuration.
indexing.exclude
Type: string[]
Default: ["node_modules", "dist", "*.min.js", "*.map", "*.lock", "*.d.ts"]
Glob patterns for files and directories to exclude from indexing. Patterns are matched against relative file paths within each repo. The default list excludes common build artifacts and package directories.
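This page does not spell out exactly how a pattern is compared with a path. One plausible reading, used below as an assumption, is that a pattern excludes a file when it matches any single component of the repo-relative path. Under that reading, node_modules drops the whole directory and *.min.js matches by file name.

The sketch uses Python's fnmatch, whose glob rules may differ from gather-step's.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

DEFAULT_EXCLUDE = ("node_modules", "dist", "*.min.js", "*.map", "*.lock", "*.d.ts")


def is_excluded(rel_path: str, patterns: tuple[str, ...] = DEFAULT_EXCLUDE) -> bool:
    """Assumed semantics: exclude if any path component matches any pattern."""
    parts = PurePosixPath(rel_path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)


for path in ("src/index.ts", "node_modules/react/index.js", "src/types/api.d.ts"):
    print(path, "excluded" if is_excluded(path) else "indexed")
```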
indexing.language_excludes
Type: list of {language: string, patterns: string[]}
Default: []
Per-language exclusion rules. Each entry applies patterns only when the file’s detected language matches language. Useful for excluding generated files in specific languages without excluding them globally.
Example:
```yaml
language_excludes:
  - language: typescript
    patterns:
      - "*.generated.ts"
      - "src/graphql/**"
```

indexing.include_languages
Type: string[]
Default: [] (all languages indexed)
When non-empty, only files whose detected language is in this list are indexed. Languages are lowercase strings such as typescript, javascript, python, rust. An empty list means all supported languages are eligible.
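include_languages and language_excludes combine into one per-file decision, sketched below with these assumptions:

- Language detection has already happened; the detected language is passed in as a lowercase string.
- language_excludes patterns are matched with fnmatch against the whole repo-relative path. There, * also crosses "/", which lets both "*.generated.ts" and "src/graphql/**" behave as their names suggest.

language_allows is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def language_allows(
    rel_path: str,
    language: str,
    include_languages: list[str],
    language_excludes: list[dict],
) -> bool:
    """Hypothetical helper combining include_languages and language_excludes."""
    # An empty include_languages list means every supported language is eligible.
    if include_languages and language not in include_languages:
        return False
    # A language_excludes rule applies only when its language matches the file's.
    for rule in language_excludes:
        if rule["language"] == language and any(
            fnmatchcase(rel_path, pattern) for pattern in rule["patterns"]
        ):
            return False
    return True


rules = [{"language": "typescript", "patterns": ["*.generated.ts", "src/graphql/**"]}]
print(language_allows("src/api/client.generated.ts", "typescript", [], rules))  # False
print(language_allows("src/graphql/resolvers.js", "javascript", [], rules))     # True
```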
indexing.include_dotfiles
Type: bool
Default: false
When true, files and directories beginning with a dot (e.g., .config/) are included in indexing. When false (the default), dotfiles and dot-directories are skipped.
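Because skipping a dot-directory also skips everything inside it, the check has to look at every component of the path, not just the file name.

passes_dotfile_rule is a hypothetical helper that sketches this rule.

```python
from pathlib import PurePosixPath


def passes_dotfile_rule(rel_path: str, include_dotfiles: bool = False) -> bool:
    """Hypothetical helper: skip paths with any dot-prefixed component unless enabled."""
    if include_dotfiles:
        return True
    return not any(part.startswith(".") for part in PurePosixPath(rel_path).parts)


print(passes_dotfile_rule(".config/settings.json"))        # False: inside a dot-directory
print(passes_dotfile_rule(".config/settings.json", True))  # True: include_dotfiles enabled
print(passes_dotfile_rule("src/index.ts"))                 # True
```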
indexing.min_file_size
Type: string (human-readable size)
Default: null (no minimum)
Files smaller than this size are excluded from indexing. Accepts human-readable strings such as 100B, 1KB, 1MB. When omitted, no minimum size is applied.
indexing.max_file_size
Type: string (human-readable size)
Default: "1MB"
Files larger than this size are excluded from indexing. Accepts human-readable strings such as 512KB, 2MB. Prevents very large generated or binary files from degrading parse performance.
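min_file_size and max_file_size together form a simple size window. The sketch below parses the documented size strings and applies both bounds. Boundary values are kept: only files strictly smaller than the minimum or strictly larger than the maximum are excluded.

The page does not say whether KB and MB are binary (1024) or decimal (1000) multiples. The sketch assumes binary, and it only handles the B, KB, and MB suffixes shown here. parse_size and within_size_limits are hypothetical names.

```python
import re
from typing import Optional

# Assumption: binary multiples (1KB = 1024 bytes); only B, KB, and MB are handled.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 * 1024}


def parse_size(text: str) -> int:
    """Parse a human-readable size such as "100B", "512KB", or "1MB" into bytes."""
    match = re.fullmatch(r"(\d+)(B|KB|MB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def within_size_limits(
    size_bytes: int, min_file_size: Optional[str] = None, max_file_size: str = "1MB"
) -> bool:
    """Apply min_file_size (no minimum by default) and max_file_size (default "1MB")."""
    if min_file_size is not None and size_bytes < parse_size(min_file_size):
        return False  # smaller than the minimum
    return size_bytes <= parse_size(max_file_size)  # larger than the maximum is excluded


print(within_size_limits(40, min_file_size="100B"))  # False
print(within_size_limits(3 * 1024 * 1024))           # False
print(within_size_limits(20 * 1024))                 # True
```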
indexing.workspace_concurrency
Type: usize (optional)
Default: null (determined by the indexer)
Controls the number of repos indexed concurrently at the workspace level. When omitted, the indexer chooses a default based on available resources. Setting this to 1 forces sequential per-repo indexing, which is useful for debugging.
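The effect of workspace_concurrency can be pictured as the size of a worker pool that processes repos. With a pool of one, repos are handled strictly one after another.

The fallback used when the field is omitted (one worker per CPU, capped by the repo count) is an assumption; the page says only that the indexer chooses a default. Treating 0 as invalid is also an assumption. resolve_concurrency, index_repo, and index_workspace are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def resolve_concurrency(workspace_concurrency: Optional[int], repo_count: int) -> int:
    """An explicit setting wins; otherwise use an assumed CPU-based default."""
    if workspace_concurrency is not None:
        if workspace_concurrency < 1:
            # Assumption: this sketch treats 0 as invalid; gather-step may differ.
            raise ValueError("workspace_concurrency must be at least 1")
        return workspace_concurrency
    # Assumed fallback: one worker per CPU, capped by the number of repos.
    return max(1, min(os.cpu_count() or 1, repo_count))


def index_repo(name: str) -> str:
    """Hypothetical stand-in for indexing a single repo."""
    return f"indexed {name}"


def index_workspace(repo_names: list[str], workspace_concurrency: Optional[int] = None) -> list[str]:
    workers = resolve_concurrency(workspace_concurrency, len(repo_names))
    # With workers == 1, repos are processed one at a time, as with workspace_concurrency: 1.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(index_repo, repo_names))


print(index_workspace(["backend_standard", "frontend_standard", "shared_contracts"], 1))
```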
Validation rules
The config parser uses #[serde(deny_unknown_fields)] on every struct. Unknown YAML keys are hard errors, not warnings.
| Rule | Detail |
|---|---|
| repos must not be empty | At least one repo entry is required. |
| Repo names must be non-empty | Blank names are rejected. |
| Repo names must not contain path separators | /, \, newlines, carriage returns, and the segments . and .. are rejected. |
| Repo names and paths must not contain NUL bytes | NUL bytes in names or paths are rejected. |
| Repo paths must be relative | Absolute paths are rejected. |
| Repo paths must not escape the config root | Paths containing .. components are rejected. |
| Repo names must be unique | Duplicate names within the config are rejected. |
| Repo paths must not overlap | A repo path that is a prefix of another repo’s path is rejected. |
| allow_listed_repos must reference known repos | Any name not present in repos is rejected. |
| allow_listed_repos names must be unique | Duplicate entries are rejected. |
| Repo root symlinks are rejected at index time | A repo whose path resolves through a symlink is rejected during index or watch. |
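Most of the repo rules in the table can be checked with pure path arithmetic before anything touches the disk. The sketch below does this, with these assumptions:

- A "blank" name is empty or whitespace-only.
- Paths are POSIX-style.
- "Overlap" means one path's components are a prefix of the other's, so apps/backend and apps/backend_standard do not overlap.

It leaves out the symlink rule, which needs the filesystem at index time, and the unknown-key rule, which the deserializer enforces. config_errors is a hypothetical helper.

```python
from pathlib import PurePosixPath

FORBIDDEN_NAME_CHARS = ("/", "\\", "\n", "\r", "\0")


def config_errors(repos: list[dict[str, str]]) -> list[str]:
    """Hypothetical checker for the repo rules in the table above."""
    if not repos:
        return ["repos must not be empty"]
    errors: list[str] = []
    seen_names: set[str] = set()
    parsed: list[tuple[str, tuple[str, ...]]] = []
    for repo in repos:
        name, path = repo["name"], repo["path"]
        if not name.strip():
            errors.append("repo name must be non-empty")
        elif any(ch in name for ch in FORBIDDEN_NAME_CHARS) or name in (".", ".."):
            errors.append(f"invalid repo name {name!r}")
        if name in seen_names:
            errors.append(f"duplicate repo name {name!r}")
        seen_names.add(name)
        pure = PurePosixPath(path)
        if "\0" in path:
            errors.append(f"NUL byte in path of {name!r}")
        elif pure.is_absolute():
            errors.append(f"absolute path for {name!r}")
        elif ".." in pure.parts:
            errors.append(f"path of {name!r} escapes the config root")
        parsed.append((name, pure.parts))
    # Assumed overlap test: one path's components are a prefix of the other's.
    for i, (name_a, parts_a) in enumerate(parsed):
        for name_b, parts_b in parsed[i + 1:]:
            n = min(len(parts_a), len(parts_b))
            if parts_a[:n] == parts_b[:n]:
                errors.append(f"paths of {name_a!r} and {name_b!r} overlap")
    return errors


print(config_errors([
    {"name": "backend_standard", "path": "apps/backend_standard"},
    {"name": "shared_contracts", "path": "packages/shared_contracts"},
]))  # no errors
```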
Depth values
The depth field on each repo entry (and the --depth CLI flag) controls what the indexer extracts from the repo.
| Value | Description |
|---|---|
| level1 | Lightest pass. File nodes and top-level module structure only. Suitable for large dependency repos where symbol-level detail is not needed. |
| level2 | File nodes plus top-level declarations (functions, classes, types). No caller/callee edges. |
| level3 | Full declarations plus call edges within files. No cross-file resolution pass. |
| full | Complete indexing including cross-file and cross-repo reference resolution, virtual node construction, payload contract inference, and git history analytics. This is the default and produces the richest graph. |
Depth values are accepted as lowercase strings in YAML. The numeric aliases 1, 2, and 3 are accepted for level1, level2, and level3 respectively.
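A YAML loader hands depth: 2 to the application as an integer and depth: level2 as a string, so a normalizer has to accept both forms. The sketch below assumes the numeric aliases arrive as YAML integers; the page does not say whether a quoted "2" is also accepted.

parse_depth is a hypothetical helper.

```python
_DEPTHS = ("level1", "level2", "level3", "full")
_NUMERIC_ALIASES = {1: "level1", 2: "level2", 3: "level3"}


def parse_depth(value: object) -> str:
    """Hypothetical normalizer for a YAML depth value (lowercase string or integer alias)."""
    # bool is a subclass of int in Python, so reject YAML true/false explicitly.
    if isinstance(value, bool):
        raise ValueError(f"invalid depth: {value!r}")
    if isinstance(value, int) and value in _NUMERIC_ALIASES:
        return _NUMERIC_ALIASES[value]
    if isinstance(value, str) and value in _DEPTHS:
        return value
    raise ValueError(f"invalid depth: {value!r}")


print(parse_depth(2))       # level2
print(parse_depth("full"))  # full
```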
Generated state paths
The files under .gather-step/ are generated and should not be committed to version control. The .gather-step/ directory itself is created automatically during index and managed by the clean command.
For workspace setup instructions, see the workspace setup guide.