build_tools.syllable_walk_web.run_discovery

Run directory discovery for the syllable-walk web pipeline history.

History discovery is manifest-driven: a run is discoverable only when manifest.json exists and is parseable. This keeps the run directory itself as the single source of truth and avoids legacy text-file parsing heuristics.
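The manifest rule above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the helper name `load_manifest` is hypothetical, and the requirement that the manifest be a JSON object is an assumption. Only the file name manifest.json comes from the description.

```python
import json
from pathlib import Path
from typing import Optional


def load_manifest(run_dir: Path) -> Optional[dict]:
    """Return the parsed manifest for run_dir, or None if the run is not discoverable.

    Hypothetical helper: the real module may structure this differently.
    """
    manifest_path = run_dir / "manifest.json"
    try:
        with manifest_path.open(encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        # OSError covers a missing or unreadable file; ValueError covers
        # json.JSONDecodeError and text-decoding failures.
        return None
    # Assumption: a usable manifest is a JSON object, not a list or scalar.
    return data if isinstance(data, dict) else None
```

Under this sketch, a directory without a parseable manifest is skipped rather than reported as an error, which matches the "discoverable only when" wording.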

Classes

RunInfo

Metadata about one manifest-backed pipeline run directory.

Functions

discover_runs([base_path])

Discover all pipeline run directories.

get_selection_data(selection_path)

Load selection data from a JSON file.

get_run_by_id(run_id[, base_path])

Get a specific run by its directory name.

Module Contents

class build_tools.syllable_walk_web.run_discovery.RunInfo

Metadata about one manifest-backed pipeline run directory.

path

Absolute path to the run directory

run_id

Canonical run identifier (matches directory name)

extractor_type

Type of extractor (“nltk” or “pyphen”)

timestamp

Run timestamp in YYYYMMDD_HHMMSS format

display_name

Human-readable display name

corpus_db_path

Path to the corpus.db artifact if it exists, otherwise None

annotated_json_path

Path to the annotated JSON artifact if it exists, otherwise None

syllable_count

Number of unique syllables from manifest metrics

selections

Dict mapping name class to selection file path

path: pathlib.Path
run_id: str
extractor_type: str
timestamp: str
display_name: str
corpus_db_path: pathlib.Path | None
annotated_json_path: pathlib.Path | None
syllable_count: int
source_path: str | None = None
files_processed: int | None = None
processing_time: str | None = None
output_tree_lines: list[str] = []
selections: dict[str, pathlib.Path]
status: str = 'unknown'
created_at_utc: str | None = None
completed_at_utc: str | None = None
stage_statuses: dict[str, str]
ipc_input_hash: str | None = None
ipc_output_hash: str | None = None

to_dict()

Convert to dictionary for JSON serialization.

Returns:

Dictionary with all run metadata

Return type:

dict
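Because several RunInfo fields are pathlib.Path objects, which the json module cannot serialize directly, a conversion along the following lines is one plausible shape for to_dict(). This is a sketch under assumptions: `MiniRunInfo` is a hypothetical, cut-down stand-in with only a few of the documented fields, and the string conversion of paths is an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass, field, fields
from pathlib import Path
from typing import Optional


@dataclass
class MiniRunInfo:
    """Hypothetical stand-in for RunInfo with a subset of its fields."""

    path: Path
    run_id: str
    corpus_db_path: Optional[Path] = None
    selections: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        def plain(value):
            # Assumption: paths are emitted as strings so json.dumps accepts them.
            if isinstance(value, Path):
                return str(value)
            if isinstance(value, dict):
                return {key: plain(item) for key, item in value.items()}
            if isinstance(value, list):
                return [plain(item) for item in value]
            return value

        return {f.name: plain(getattr(self, f.name)) for f in fields(self)}


info = MiniRunInfo(
    path=Path("_working/output/20260121_084017_nltk"),
    run_id="20260121_084017_nltk",
    selections={"first_name": Path("selections/first_name.json")},
)
payload = json.dumps(info.to_dict())  # succeeds because no Path objects remain
```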

build_tools.syllable_walk_web.run_discovery.discover_runs(base_path=None)

Discover all pipeline run directories.

Scans _working/output/ (or the specified base path) for directories whose names match the pattern YYYYMMDD_HHMMSS_{extractor}. Returns metadata for every valid run found, sorted by timestamp (newest first). A run counts as valid only when its manifest.json exists and is parseable.

Parameters:

base_path (pathlib.Path | None) – Directory to scan. Default: _working/output/

Returns:

List of RunInfo objects, sorted by timestamp (newest first)

Return type:

list[RunInfo]

Examples

>>> runs = discover_runs()
>>> len(runs)
2
>>> runs[0].extractor_type
'nltk'
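The directory-name pattern and the newest-first ordering could be implemented roughly as below. This is a sketch, not the module's code: the regular expression, `parse_run_dir_name`, and `newest_first` are all hypothetical, and the extractor suffix is assumed to be a single word such as nltk or pyphen. Because the timestamp is fixed-width, sorting the strings in reverse also sorts the runs chronologically, newest first.

```python
import re
from typing import List, Optional, Tuple

# Assumed shape: 8-digit date, 6-digit time, then an extractor suffix.
RUN_DIR_PATTERN = re.compile(r"^(?P<timestamp>\d{8}_\d{6})_(?P<extractor>\w+)$")


def parse_run_dir_name(name: str) -> Optional[Tuple[str, str]]:
    """Split a run directory name into (timestamp, extractor), or return None."""
    match = RUN_DIR_PATTERN.match(name)
    if match is None:
        return None
    return match.group("timestamp"), match.group("extractor")


def newest_first(names: List[str]) -> List[str]:
    """Keep only names matching the pattern, ordered newest first."""
    keyed = []
    for name in names:
        parsed = parse_run_dir_name(name)
        if parsed is not None:
            keyed.append((parsed[0], name))
    return [name for _, name in sorted(keyed, reverse=True)]
```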

build_tools.syllable_walk_web.run_discovery.get_selection_data(selection_path)

Load selection data from a JSON file.

Parameters:

selection_path (pathlib.Path) – Path to selection JSON file

Returns:

Dictionary with metadata and selections list

Raises:

Return type:

dict
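A loader of this kind can be as small as the sketch below. The function name `load_selection` is hypothetical, and the key names "metadata" and "selections" in the usage lines are assumed from the return description. The exceptions the real function raises are not listed here; the ones mentioned in the comment are simply what the standard library calls raise.

```python
import json
from pathlib import Path


def load_selection(selection_path: Path) -> dict:
    """Read a selection JSON file and return its parsed contents.

    Standard-library behavior: open() raises FileNotFoundError for a missing
    file, and json.load() raises json.JSONDecodeError for malformed JSON.
    """
    with selection_path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

A caller would then read the entries with something like `load_selection(path)["selections"]`, assuming that key name.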

build_tools.syllable_walk_web.run_discovery.get_run_by_id(run_id, base_path=None)

Get a specific run by its directory name.

Parameters:
  • run_id (str) – Run directory name (e.g., “20260121_084017_nltk”)

  • base_path (pathlib.Path | None) – Base path to search. Default: _working/output/

Returns:

RunInfo for the run, or None if not found

Return type:

RunInfo | None
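Since the run ID matches the directory name, a lookup can go straight to the expected directory instead of scanning everything. The sketch below illustrates that idea only: `find_run_dir` is hypothetical, it returns a plain path rather than a RunInfo, and the real function may instead filter the result of discover_runs().

```python
from pathlib import Path
from typing import Optional


def find_run_dir(run_id: str, base_path: Path) -> Optional[Path]:
    """Return the run directory for run_id, or None if it is not a usable run."""
    candidate = base_path / run_id
    # A run only counts when its directory and manifest.json both exist.
    if candidate.is_dir() and (candidate / "manifest.json").is_file():
        return candidate
    return None
```

Callers should handle the None case explicitly, for example by reporting that the requested run was not found.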