build_tools.syllable_walk_web.run_discovery
===========================================

.. py:module:: build_tools.syllable_walk_web.run_discovery

.. autoapi-nested-parse::

   Run directory discovery for the syllable-walk web pipeline history.

   History discovery is manifest-driven: a run is discoverable only when
   ``manifest.json`` exists and is parseable. This keeps the run directory itself
   as the single source of truth and avoids legacy text-file parsing heuristics.


Classes
-------

.. autoapisummary::

   build_tools.syllable_walk_web.run_discovery.RunInfo


Functions
---------

.. autoapisummary::

   build_tools.syllable_walk_web.run_discovery.discover_runs
   build_tools.syllable_walk_web.run_discovery.get_selection_data
   build_tools.syllable_walk_web.run_discovery.get_run_by_id


Module Contents
---------------

.. py:class:: RunInfo

   Metadata about one manifest-backed pipeline run directory.

   .. attribute:: path

      Absolute path to the run directory

   .. attribute:: run_id

      Canonical run identifier (matches directory name)

   .. attribute:: extractor_type

      Type of extractor ("nltk" or "pyphen")

   .. attribute:: timestamp

      Run timestamp in YYYYMMDD_HHMMSS format

   .. attribute:: display_name

      Human-readable display name

   .. attribute:: corpus_db_path

      Path to corpus.db artifact if present and exists

   .. attribute:: annotated_json_path

      Path to annotated JSON artifact if present and exists

   .. attribute:: syllable_count

      Number of unique syllables from manifest metrics

   .. attribute:: selections

      Dict mapping name class to selection file path


   .. py:attribute:: path
      :type:  pathlib.Path


   .. py:attribute:: run_id
      :type:  str


   .. py:attribute:: extractor_type
      :type:  str


   .. py:attribute:: timestamp
      :type:  str


   .. py:attribute:: display_name
      :type:  str


   .. py:attribute:: corpus_db_path
      :type:  pathlib.Path | None


   .. py:attribute:: annotated_json_path
      :type:  pathlib.Path | None


   .. py:attribute:: syllable_count
      :type:  int


   .. py:attribute:: source_path
      :type:  str | None
      :value: None


   .. py:attribute:: files_processed
      :type:  int | None
      :value: None


   .. py:attribute:: processing_time
      :type:  str | None
      :value: None


   .. py:attribute:: output_tree_lines
      :type:  list[str]
      :value: []


   .. py:attribute:: selections
      :type:  dict[str, pathlib.Path]


   .. py:attribute:: status
      :type:  str
      :value: 'unknown'


   .. py:attribute:: created_at_utc
      :type:  str | None
      :value: None


   .. py:attribute:: completed_at_utc
      :type:  str | None
      :value: None


   .. py:attribute:: stage_statuses
      :type:  dict[str, str]


   .. py:attribute:: ipc_input_hash
      :type:  str | None
      :value: None


   .. py:attribute:: ipc_output_hash
      :type:  str | None
      :value: None


   .. py:method:: to_dict()

      Convert to dictionary for JSON serialization.

      :returns: Dictionary with all run metadata



.. py:function:: discover_runs(base_path = None)

   Discover all pipeline run directories.

   Scans _working/output/ (or specified base path) for directories matching
   the pattern YYYYMMDD_HHMMSS_{extractor}. Returns metadata for all valid
   runs found, sorted by timestamp (newest first).

   :param base_path: Directory to scan. Default: _working/output/

   :returns: List of RunInfo objects, sorted by timestamp (newest first)

   .. admonition:: Examples

      >>> runs = discover_runs()
      >>> len(runs)
      2
      >>> runs[0].extractor_type
      'nltk'


.. py:function:: get_selection_data(selection_path)

   Load selection data from a JSON file.

   :param selection_path: Path to selection JSON file

   :returns: Dictionary with metadata and selections list

   :raises FileNotFoundError: If file doesn't exist
   :raises json.JSONDecodeError: If file is not valid JSON


.. py:function:: get_run_by_id(run_id, base_path = None)

   Get a specific run by its directory name.

   :param run_id: Run directory name (e.g., "20260121_084017_nltk")
   :param base_path: Base path to search. Default: _working/output/

   :returns: RunInfo for the run, or None if not found