build_tools.syllable_walk_web.services.pipeline_manifest
========================================================

.. py:module:: build_tools.syllable_walk_web.services.pipeline_manifest

.. autoapi-nested-parse::

   Manifest helpers for syllable-walk web pipeline runs.

   This module centralises ``manifest.json`` creation and updates for pipeline
   runs under ``_working/output//manifest.json``.

   Design goals:

   - deterministic output ordering (stable JSON, sorted artifact lists)
   - an additive schema that tolerates partial, failed, and cancelled runs
   - explicit stage timing records suitable for History-tab rendering
   - minimal coupling, so that pipeline runner orchestration remains readable


Attributes
----------

.. autoapisummary::

   build_tools.syllable_walk_web.services.pipeline_manifest.IPC_SCHEMA_VERSION
   build_tools.syllable_walk_web.services.pipeline_manifest.IPC_LIBRARY_NAME
   build_tools.syllable_walk_web.services.pipeline_manifest.IPC_LIBRARY_REF


Classes
-------

.. autoapisummary::

   build_tools.syllable_walk_web.services.pipeline_manifest.ManifestIPCVerificationResult


Functions
---------

.. autoapisummary::

   build_tools.syllable_walk_web.services.pipeline_manifest.utc_now_iso
   build_tools.syllable_walk_web.services.pipeline_manifest.create_manifest
   build_tools.syllable_walk_web.services.pipeline_manifest.upsert_stage
   build_tools.syllable_walk_web.services.pipeline_manifest.set_terminal_status
   build_tools.syllable_walk_web.services.pipeline_manifest.refresh_metrics_and_artifacts
   build_tools.syllable_walk_web.services.pipeline_manifest.refresh_ipc
   build_tools.syllable_walk_web.services.pipeline_manifest.verify_manifest_ipc
   build_tools.syllable_walk_web.services.pipeline_manifest.verify_manifest_ipc_file
   build_tools.syllable_walk_web.services.pipeline_manifest.write_manifest


Module Contents
---------------

.. py:data:: IPC_SCHEMA_VERSION
   :value: 1

.. py:data:: IPC_LIBRARY_NAME
   :value: 'pipeworks-ipc'

.. py:data:: IPC_LIBRARY_REF

.. py:class:: ManifestIPCVerificationResult

   Outcome of validating one manifest's IPC hash integrity.

   .. py:attribute:: status
      :type: str

   .. py:attribute:: reason
      :type: str

   .. py:attribute:: input_hash
      :type: str | None
      :value: None

   .. py:attribute:: output_hash
      :type: str | None
      :value: None

.. py:function:: utc_now_iso()

   Return a UTC timestamp in ISO-8601 ``YYYY-MM-DDTHH:MM:SSZ`` format.

   The manifest schema uses second precision because millisecond precision is
   unnecessary for stage telemetry and makes snapshot diffs noisier.

.. py:function:: create_manifest(*, run_id, extractor, language, source_path, file_pattern, min_syllable_length, max_syllable_length, run_normalize, run_annotate, created_at_utc)

   Create a new in-memory manifest document.

   :param run_id: Run directory name (e.g. ``20260222_093033_pyphen``).
   :param extractor: Extractor type (``pyphen`` or ``nltk``).
   :param language: Language selector used by the pipeline.
   :param source_path: Input path to a source file or directory.
   :param file_pattern: Glob pattern used for directory-mode extraction.
   :param min_syllable_length: Minimum syllable length filter.
   :param max_syllable_length: Maximum syllable length filter.
   :param run_normalize: Whether the normalize stage was requested.
   :param run_annotate: Whether the annotate stage was requested.
   :param created_at_utc: Run start timestamp in UTC ISO-8601 format.
   :returns: Manifest dictionary matching schema v1.

.. py:function:: upsert_stage(manifest, *, name, status, started_at_utc = None, ended_at_utc = None)

   Insert or update a single stage record in place.

   :param manifest: Mutable manifest document.
   :param name: Stage name (``extract``, ``normalize``, ``annotate``, or ``database``).
   :param status: Stage status (``running``, ``completed``, ``failed``, ``cancelled``, or ``skipped``).
   :param started_at_utc: Optional stage start timestamp.
   :param ended_at_utc: Optional stage end timestamp.

.. py:function:: set_terminal_status(manifest, *, status, completed_at_utc, error_message = None)

   Set the final run status and an optional error message in place.

.. py:function:: refresh_metrics_and_artifacts(manifest, *, run_directory, source_path, file_pattern)

   Populate the manifest's ``metrics`` and ``artifacts`` fields from the run outputs.

   This helper is idempotent and deterministic:

   - the artifact list is always sorted by relative path
   - file counts use a stable path/glob rule set
   - the syllable count prefers ``data/corpus.db`` when it is present

.. py:function:: refresh_ipc(manifest)

   Refresh the deterministic IPC fields from the current manifest state.

   The input hash is computed from canonical run configuration fields. The
   output hash is computed from a canonical serialized payload containing:

   - artifact summaries (path/type/size), already deterministically sorted
   - selected metrics (``syllable_count_unique`` and ``files_processed``)

.. py:function:: verify_manifest_ipc(manifest)

   Verify that the stored manifest IPC hashes match deterministic payload hashes.

   Reports ``verified`` when both hashes are present and match the hashes
   recomputed canonically from the manifest content.

.. py:function:: verify_manifest_ipc_file(run_directory)

   Read ``manifest.json`` from ``run_directory`` and verify its IPC hash integrity.

.. py:function:: write_manifest(run_directory, manifest)

   Write the manifest to ``run_directory/manifest.json`` atomically.

   Atomic write semantics:

   1. Write the JSON to ``manifest.json.tmp``.
   2. Replace the target path with ``Path.replace``.

   This prevents a partially written ``manifest.json`` if a run is interrupted
   mid-write.
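
The two-step atomic write documented for ``write_manifest`` can be sketched as
follows. This is a minimal illustration under stated assumptions, not the
module's implementation: the name ``write_manifest_sketch`` is hypothetical, and
the JSON options (``indent=2``, ``sort_keys=True``) are assumed here only to
reflect the stated goal of stable JSON output.

.. code-block:: python

   import json
   import tempfile
   from pathlib import Path


   def write_manifest_sketch(run_directory: Path, manifest: dict) -> Path:
       """Hypothetical sketch: write manifest.json via a temp file plus rename."""
       target = run_directory / "manifest.json"
       tmp = run_directory / "manifest.json.tmp"
       # Assumed formatting: sorted keys and fixed indentation keep diffs stable.
       payload = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
       tmp.write_text(payload, encoding="utf-8")
       # Path.replace renames over the target in one step; when both paths are
       # on the same filesystem, readers see the old file or the new one, never
       # a half-written manifest.
       tmp.replace(target)
       return target


   with tempfile.TemporaryDirectory() as d:
       run_dir = Path(d)
       path = write_manifest_sketch(run_dir, {"status": "completed", "run_id": "demo"})
       print(json.loads(path.read_text(encoding="utf-8"))["status"])  # completed
       print((run_dir / "manifest.json.tmp").exists())  # False

Writing to a sibling temporary file, rather than to a path in another
directory, keeps the rename on the same filesystem, which is what makes the
replace step atomic on POSIX systems.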
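
The canonical hashing that ``refresh_ipc`` performs and ``verify_manifest_ipc``
checks can be illustrated with a small sketch. The module documents only which
fields feed each hash; everything else here is an assumption: SHA-256 as the
hash, sorted-key compact JSON as the canonical form, the manifest keys
``config``, ``artifacts``, ``metrics`` and ``ipc``, the ``missing`` and
``mismatch`` statuses, and all helper names are hypothetical. It does not use
the ``pipeworks-ipc`` library.

.. code-block:: python

   import hashlib
   import json
   from typing import Any


   def canonical_hash(payload: Any) -> str:
       """Hash a JSON-serialisable payload independently of dict key order."""
       # Sorted keys and compact separators make the serialized bytes deterministic.
       blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
       return hashlib.sha256(blob.encode("utf-8")).hexdigest()


   def output_payload(manifest: dict) -> dict:
       """Hypothetical output payload: artifact summaries plus selected metrics."""
       artifacts = [
           {"path": a["path"], "type": a["type"], "size": a["size"]}
           for a in manifest.get("artifacts", [])  # assumed pre-sorted by path
       ]
       metrics = manifest.get("metrics", {})
       return {
           "artifacts": artifacts,
           "metrics": {
               "syllable_count_unique": metrics.get("syllable_count_unique"),
               "files_processed": metrics.get("files_processed"),
           },
       }


   def refresh_ipc_sketch(manifest: dict) -> None:
       """Store both hashes under a hypothetical ``ipc`` key, in place."""
       manifest["ipc"] = {
           "input_hash": canonical_hash(manifest.get("config", {})),
           "output_hash": canonical_hash(output_payload(manifest)),
       }


   def verify_ipc_sketch(manifest: dict) -> str:
       """Return 'verified', 'missing' or 'mismatch' (hypothetical statuses)."""
       ipc = manifest.get("ipc") or {}
       stored_in, stored_out = ipc.get("input_hash"), ipc.get("output_hash")
       if not stored_in or not stored_out:
           return "missing"
       if (stored_in == canonical_hash(manifest.get("config", {}))
               and stored_out == canonical_hash(output_payload(manifest))):
           return "verified"
       return "mismatch"


   manifest = {
       "config": {"extractor": "pyphen", "language": "en_US"},
       "artifacts": [{"path": "data/corpus.db", "type": "sqlite", "size": 4096}],
       "metrics": {"syllable_count_unique": 1200, "files_processed": 3},
   }
   refresh_ipc_sketch(manifest)
   print(verify_ipc_sketch(manifest))  # verified
   manifest["metrics"]["files_processed"] = 4
   print(verify_ipc_sketch(manifest))  # mismatch

Because both hashes are recomputed from the manifest itself, any later edit to
a hashed field (here, ``files_processed``) is detected without access to the
original run outputs.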
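
The second-precision timestamps produced by ``utc_now_iso`` and the
insert-or-update behaviour of ``upsert_stage`` can be sketched together. This
is an illustration, not the module's code: the ``stages`` list key, the record
field names, and the merge rule (timestamps are overwritten only when the
caller supplies them) are assumptions, and both function names are
hypothetical.

.. code-block:: python

   from datetime import datetime, timezone


   def utc_now_iso_sketch() -> str:
       """Second-precision UTC timestamp such as '2026-02-22T09:30:33Z'."""
       return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


   def upsert_stage_sketch(manifest, *, name, status, started_at_utc=None, ended_at_utc=None):
       """Hypothetical in-place insert-or-update keyed on the stage name."""
       stages = manifest.setdefault("stages", [])
       record = next((s for s in stages if s["name"] == name), None)
       if record is None:
           record = {"name": name, "status": status,
                     "started_at_utc": None, "ended_at_utc": None}
           stages.append(record)
       record["status"] = status
       # Keep previously recorded timestamps unless new ones are supplied.
       if started_at_utc is not None:
           record["started_at_utc"] = started_at_utc
       if ended_at_utc is not None:
           record["ended_at_utc"] = ended_at_utc


   manifest = {}
   upsert_stage_sketch(manifest, name="extract", status="running",
                       started_at_utc=utc_now_iso_sketch())
   upsert_stage_sketch(manifest, name="extract", status="completed",
                       ended_at_utc=utc_now_iso_sketch())
   print(len(manifest["stages"]), manifest["stages"][0]["status"])  # 1 completed

Keying records on the stage name means a stage that is started and later
finished occupies one record, which is the shape a per-stage timing view
would need.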