build_tools.syllable_walk_web.run_discovery

Run directory discovery for the syllable-walk web pipeline history.

History discovery is manifest-driven: a run is discoverable only when manifest.json exists and is parseable. This keeps the run directory itself as the single source of truth and avoids legacy text-file parsing heuristics.
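The manifest rule can be illustrated with a small check. This is a minimal sketch, not the module's code: the helper name `has_parseable_manifest` is hypothetical, and treating only a top-level JSON object as "parseable" is an assumption.

```python
import json
from pathlib import Path


def has_parseable_manifest(run_dir: Path) -> bool:
    """Hypothetical helper: True if run_dir holds a manifest.json that parses.

    Assumption: a valid manifest is a JSON object (dict) at the top level.
    """
    manifest_path = run_dir / "manifest.json"
    if not manifest_path.is_file():
        return False  # no manifest: the directory is not a discoverable run
    try:
        data = json.loads(manifest_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False  # unreadable or malformed manifest: also not discoverable
    return isinstance(data, dict)
```

The check only reads `manifest.json`. It never looks at other text files in the run directory, which matches the stated goal of avoiding legacy parsing heuristics.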

Classes

RunInfo

Metadata about one manifest-backed pipeline run directory.

Functions

`discover_runs`([base_path])	Discover all pipeline run directories.
`get_selection_data`(selection_path)	Load selection data from a JSON file.
`get_run_by_id`(run_id[, base_path])	Get a specific run by its directory name.

Module Contents

class build_tools.syllable_walk_web.run_discovery.RunInfo

Metadata about one manifest-backed pipeline run directory.

path: Absolute path to the run directory

run_id: Canonical run identifier (matches directory name)

extractor_type: Type of extractor (“nltk” or “pyphen”)

timestamp: Run timestamp in YYYYMMDD_HHMMSS format

display_name: Human-readable display name

corpus_db_path: Path to corpus.db artifact if present and exists

annotated_json_path: Path to annotated JSON artifact if present and exists

syllable_count: Number of unique syllables from manifest metrics

selections: Dict mapping name class to selection file path

path: pathlib.Path

run_id: str

extractor_type: str

timestamp: str

display_name: str

corpus_db_path: pathlib.Path | None

annotated_json_path: pathlib.Path | None

syllable_count: int

source_path: str | None = None

files_processed: int | None = None

processing_time: str | None = None

output_tree_lines: list[str] = []

selections: dict[str, pathlib.Path]

status: str = 'unknown'

created_at_utc: str | None = None

completed_at_utc: str | None = None

stage_statuses: dict[str, str]

ipc_input_hash: str | None = None

ipc_output_hash: str | None = None

to_dict()

Convert to dictionary for JSON serialization.

Returns: Dictionary with all run metadata
Return type: dict

build_tools.syllable_walk_web.run_discovery.discover_runs(base_path=None)

Discover all pipeline run directories.

Scans _working/output/ (or the given base_path) for directories whose names match the pattern YYYYMMDD_HHMMSS_{extractor}. Returns metadata for every valid run found, meaning one with a parseable manifest.json, sorted by timestamp (newest first).

Parameters: base_path (pathlib.Path | None) – Directory to scan. Default: _working/output/
Returns: List of RunInfo objects, sorted by timestamp (newest first)
Return type: list[RunInfo]

Examples

>>> runs = discover_runs()
>>> len(runs)
2
>>> runs[0].extractor_type
'nltk'
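The scan described above can be sketched as follows. This is an illustration under assumptions, not the module's implementation: `discover_run_dirs` and `RUN_DIR_PATTERN` are hypothetical names, the extractor part of the name is assumed to be alphanumeric, and the sketch returns `(timestamp, extractor, path)` tuples instead of building full `RunInfo` objects.

```python
import json
import re
from pathlib import Path

# Assumed directory-name pattern: YYYYMMDD_HHMMSS_{extractor}
RUN_DIR_PATTERN = re.compile(r"^(\d{8}_\d{6})_([A-Za-z0-9]+)$")


def discover_run_dirs(base_path: Path) -> list[tuple[str, str, Path]]:
    """Hypothetical: return (timestamp, extractor, path) for manifest-backed runs, newest first."""
    found: list[tuple[str, str, Path]] = []
    if not base_path.is_dir():
        return found  # nothing to scan
    for entry in base_path.iterdir():
        match = RUN_DIR_PATTERN.match(entry.name)
        if match is None or not entry.is_dir():
            continue  # not named like a run directory
        try:
            json.loads((entry / "manifest.json").read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # missing or unparseable manifest: skip this directory
        timestamp, extractor = match.groups()
        found.append((timestamp, extractor, entry))
    # Zero-padded YYYYMMDD_HHMMSS strings sort lexicographically in time order.
    found.sort(key=lambda item: item[0], reverse=True)
    return found
```

Sorting on the timestamp string avoids parsing dates, because the fixed-width, zero-padded format already orders chronologically.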

build_tools.syllable_walk_web.run_discovery.get_selection_data(selection_path)

Load selection data from a JSON file.

Parameters:

selection_path (pathlib.Path) – Path to selection JSON file

Returns:

Dictionary with metadata and selections list

Raises:

FileNotFoundError – If file doesn’t exist
json.JSONDecodeError – If file is not valid JSON

Return type:

dict
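The documented exceptions are exactly what the standard library raises on its own, so a loader with this contract can be very small. The sketch below uses a hypothetical name, `load_selection`. The `"metadata"` and `"selections"` keys in the usage example are assumptions drawn from the return description, not confirmed key names.

```python
import json
from pathlib import Path


def load_selection(selection_path: Path) -> dict:
    """Hypothetical loader: read and parse one selection JSON file."""
    # open() raises FileNotFoundError if the file is missing;
    # json.load() raises json.JSONDecodeError if the content is not valid JSON.
    with selection_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Letting both exceptions propagate, rather than returning an empty dict, means callers can tell a missing selection apart from a corrupt one.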

build_tools.syllable_walk_web.run_discovery.get_run_by_id(run_id, base_path=None)

Get a specific run by its directory name.

Parameters:

run_id (str) – Run directory name (e.g., “20260121_084017_nltk”)
base_path (pathlib.Path | None) – Base path to search. Default: _working/output/

Returns:

RunInfo for the run, or None if not found

Return type:

RunInfo | None