build_tools.syllable_walk_web.services.pipeline_manifest

Manifest helpers for syllable-walk web pipeline runs.

This module centralises creation and updates of manifest.json for pipeline runs. Each manifest is stored at _working/output/<run_id>/manifest.json.

Design goals:

  • deterministic output ordering (stable JSON, sorted artifact lists)

  • additive schema that tolerates partial/failed/cancelled runs

  • explicit stage timing records suitable for History-tab rendering

  • minimal coupling so pipeline runner orchestration remains readable

Attributes

IPC_SCHEMA_VERSION

IPC_LIBRARY_NAME

IPC_LIBRARY_REF

Classes

ManifestIPCVerificationResult

Outcome of validating one manifest's IPC hash integrity.

Functions

utc_now_iso()

Return a UTC timestamp in ISO-8601 YYYY-MM-DDTHH:MM:SSZ format.

create_manifest(*, run_id, extractor, language, ...)

Create a new in-memory manifest document.

upsert_stage(manifest, *, name, status[, ...])

Insert or update one stage record in-place.

set_terminal_status(manifest, *, status, completed_at_utc[, ...])

Set final run status and optional error in-place.

refresh_metrics_and_artifacts(manifest, *, ...)

Populate manifest metrics and artifacts from run outputs.

refresh_ipc(manifest)

Refresh deterministic IPC fields from current manifest state.

verify_manifest_ipc(manifest)

Verify that stored manifest IPC hashes match deterministic payload hashes.

verify_manifest_ipc_file(run_directory)

Read manifest.json and verify its IPC hash integrity.

write_manifest(run_directory, manifest)

Write manifest to run_directory/manifest.json atomically.

Module Contents

build_tools.syllable_walk_web.services.pipeline_manifest.IPC_SCHEMA_VERSION = 1
build_tools.syllable_walk_web.services.pipeline_manifest.IPC_LIBRARY_NAME = 'pipeworks-ipc'
build_tools.syllable_walk_web.services.pipeline_manifest.IPC_LIBRARY_REF
class build_tools.syllable_walk_web.services.pipeline_manifest.ManifestIPCVerificationResult

Outcome of validating one manifest’s IPC hash integrity.

status: str
reason: str
input_hash: str | None = None
output_hash: str | None = None
build_tools.syllable_walk_web.services.pipeline_manifest.utc_now_iso()

Return a UTC timestamp in ISO-8601 YYYY-MM-DDTHH:MM:SSZ format.

The manifest schema uses second precision because millisecond precision is unnecessary for stage telemetry and makes snapshot diffs noisier.
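A minimal sketch of how a second-precision UTC timestamp in this format can be produced with the standard library. The name utc_now_iso_sketch is hypothetical and the module's actual implementation may differ.

```python
from datetime import datetime, timezone


def utc_now_iso_sketch() -> str:
    """Return the current UTC time as YYYY-MM-DDTHH:MM:SSZ (second precision)."""
    # The format string has no fractional-second field, so microseconds are
    # dropped and two snapshots taken within the same second are identical.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


stamp = utc_now_iso_sketch()
```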

build_tools.syllable_walk_web.services.pipeline_manifest.create_manifest(*, run_id, extractor, language, source_path, file_pattern, min_syllable_length, max_syllable_length, run_normalize, run_annotate, created_at_utc)

Create a new in-memory manifest document.

Parameters:
  • run_id (str) – Run directory name (e.g. 20260222_093033_pyphen).

  • extractor (str) – Extractor type (pyphen or nltk).

  • language (str) – Language selector used by the pipeline.

  • source_path (str | None) – Source file or directory input path.

  • file_pattern (str) – Glob pattern for directory mode extraction.

  • min_syllable_length (int) – Minimum syllable length filter.

  • max_syllable_length (int) – Maximum syllable length filter.

  • run_normalize (bool) – Whether normalize stage was requested.

  • run_annotate (bool) – Whether annotate stage was requested.

  • created_at_utc (str) – Run start timestamp in UTC ISO format.

Returns:

Manifest dictionary matching schema v1.

Return type:

dict[str, Any]

build_tools.syllable_walk_web.services.pipeline_manifest.upsert_stage(manifest, *, name, status, started_at_utc=None, ended_at_utc=None)

Insert or update one stage record in-place.

Parameters:
  • manifest (dict[str, Any]) – Mutable manifest document.

  • name (str) – Stage name (extract, normalize, annotate, database).

  • status (str) – Stage status (running/completed/failed/cancelled/skipped).

  • started_at_utc (str | None) – Optional stage start timestamp.

  • ended_at_utc (str | None) – Optional stage end timestamp.
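The insert-or-update behaviour can be sketched as below. This is an illustration under assumptions, not the module's code: the "stages" key, the record field names, and the rule that an omitted timestamp leaves the stored one untouched are all hypothetical.

```python
from __future__ import annotations

from typing import Any


def upsert_stage_sketch(
    manifest: dict[str, Any],
    *,
    name: str,
    status: str,
    started_at_utc: str | None = None,
    ended_at_utc: str | None = None,
) -> None:
    """Insert or update the stage record called `name`, mutating `manifest`."""
    stages = manifest.setdefault("stages", [])  # assumed key, not confirmed by the docs
    record = next((s for s in stages if s.get("name") == name), None)
    if record is None:
        record = {"name": name}
        stages.append(record)
    record["status"] = status
    # Assumed rule: only overwrite timestamps that were actually supplied, so
    # a later "completed" update keeps the start time set by "running".
    if started_at_utc is not None:
        record["started_at_utc"] = started_at_utc
    if ended_at_utc is not None:
        record["ended_at_utc"] = ended_at_utc


manifest: dict[str, Any] = {}
upsert_stage_sketch(manifest, name="extract", status="running",
                    started_at_utc="2026-02-22T09:30:33Z")
upsert_stage_sketch(manifest, name="extract", status="completed",
                    ended_at_utc="2026-02-22T09:31:10Z")
```

After both calls the manifest holds a single extract record carrying the final status and both timestamps, which is the shape a History-tab renderer would need.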

build_tools.syllable_walk_web.services.pipeline_manifest.set_terminal_status(manifest, *, status, completed_at_utc, error_message=None)

Set final run status and optional error in-place.

build_tools.syllable_walk_web.services.pipeline_manifest.refresh_metrics_and_artifacts(manifest, *, run_directory, source_path, file_pattern)

Populate manifest metrics and artifacts from run outputs.

This helper is idempotent and deterministic:

  • artifact list is always sorted by relative path

  • file counts use a stable path/glob rule set

  • syllable count prefers data/corpus.db when present
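The first property above, a sorted artifact list, can be illustrated with a short sketch. The helper name list_artifacts_sketch and the record fields are assumptions made for the example. The real helper may filter files or record additional fields.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Any


def list_artifacts_sketch(run_directory: Path) -> list[dict[str, Any]]:
    """Return one summary per file under run_directory, sorted by relative path."""
    artifacts = []
    for path in run_directory.rglob("*"):
        if path.is_file():
            artifacts.append({
                # POSIX-style relative paths keep the ordering platform-independent.
                "path": path.relative_to(run_directory).as_posix(),
                "size": path.stat().st_size,
            })
    # Directory traversal order depends on the filesystem, so sort explicitly.
    return sorted(artifacts, key=lambda a: a["path"])


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "data").mkdir()
    (run_dir / "data" / "corpus.db").write_bytes(b"abc")
    (run_dir / "b.txt").write_text("hello", encoding="utf-8")
    first = list_artifacts_sketch(run_dir)
    second = list_artifacts_sketch(run_dir)
```

Calling the sketch twice on an unchanged directory yields equal lists, which is what makes the refresh safe to repeat.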

build_tools.syllable_walk_web.services.pipeline_manifest.refresh_ipc(manifest)

Refresh deterministic IPC fields from current manifest state.

Input hash is computed from canonical run configuration fields. Output hash is computed from a canonical serialized payload containing:

  • artifact summaries (path/type/size), already deterministically sorted

  • selected metrics (syllable_count_unique/files_processed)
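To show why canonical serialization matters, here is a hedged sketch of hashing a payload built from the two ingredients listed above. The actual hashing is presumably delegated to the pipeworks-ipc library. SHA-256, the JSON encoding options, the payload layout, and the example values are all assumptions.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def canonical_hash_sketch(payload: dict[str, Any]) -> str:
    """Hash a payload via a canonical JSON encoding (sorted keys, no whitespace)."""
    encoded = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Illustrative values only; they do not come from a real run.
output_payload = {
    "artifacts": [{"path": "data/corpus.db", "type": "database", "size": 4096}],
    "metrics": {"syllable_count_unique": 1200, "files_processed": 3},
}
# Same content with keys inserted in a different order.
reordered = {
    "metrics": {"files_processed": 3, "syllable_count_unique": 1200},
    "artifacts": [{"size": 4096, "type": "database", "path": "data/corpus.db"}],
}
same_hash = canonical_hash_sketch(output_payload) == canonical_hash_sketch(reordered)
```

Key order does not affect the digest because keys are sorted during encoding. List order does, which is why the artifact list must already be deterministically sorted before hashing.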

build_tools.syllable_walk_web.services.pipeline_manifest.verify_manifest_ipc(manifest)

Verify that stored manifest IPC hashes match deterministic payload hashes.

The result status is verified when both hashes are present and match the values recomputed canonically from the manifest content.
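The recompute-and-compare idea can be sketched as follows. Only the verified status and the four result fields come from this page. The "ipc", "config", and "artifacts" keys, the missing and mismatch status strings, and the use of SHA-256 are hypothetical placeholders.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationResultSketch:
    status: str
    reason: str
    input_hash: str | None = None
    output_hash: str | None = None


def _digest(payload: object) -> str:
    # Sorted keys make the encoding, and therefore the hash, deterministic.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def verify_sketch(manifest: dict) -> VerificationResultSketch:
    ipc = manifest.get("ipc", {})  # assumed location of the stored hashes
    stored_in, stored_out = ipc.get("input_hash"), ipc.get("output_hash")
    if not stored_in or not stored_out:
        return VerificationResultSketch("missing", "one or both hashes absent")
    # Assumed payload fields; the real canonical payloads are defined by the module.
    expected_in = _digest(manifest.get("config", {}))
    expected_out = _digest(manifest.get("artifacts", []))
    if (stored_in, stored_out) != (expected_in, expected_out):
        return VerificationResultSketch(
            "mismatch", "stored hashes differ from recomputation", stored_in, stored_out
        )
    return VerificationResultSketch("verified", "hashes match", stored_in, stored_out)


good = {"config": {"extractor": "pyphen"}, "artifacts": []}
good["ipc"] = {
    "input_hash": _digest(good["config"]),
    "output_hash": _digest(good["artifacts"]),
}
# Changing the configuration without refreshing the hashes should fail verification.
tampered = {**good, "config": {"extractor": "nltk"}}
```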

build_tools.syllable_walk_web.services.pipeline_manifest.verify_manifest_ipc_file(run_directory)

Read manifest.json and verify its IPC hash integrity.

build_tools.syllable_walk_web.services.pipeline_manifest.write_manifest(run_directory, manifest)

Write manifest to run_directory/manifest.json atomically.

Atomic write semantics:

  1. Write JSON to manifest.json.tmp.

  2. Move the temporary file over the target path with Path.replace.

This prevents partially-written files if a run is interrupted mid-write.
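The two-step write described above can be sketched with pathlib. The function name write_manifest_sketch and the JSON formatting options (indentation, sorted keys, trailing newline) are assumptions. Only the manifest.json.tmp file name and the use of Path.replace come from this page.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def write_manifest_sketch(run_directory: Path, manifest: dict[str, Any]) -> Path:
    """Write manifest.json via a temporary sibling file and a final rename."""
    target = run_directory / "manifest.json"
    tmp_path = run_directory / "manifest.json.tmp"
    # Step 1: write the complete document to the temporary file.
    tmp_path.write_text(
        json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8"
    )
    # Step 2: swap it into place. On POSIX the rename is atomic when both
    # paths live on the same filesystem, which holds for sibling files.
    tmp_path.replace(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    written = write_manifest_sketch(run_dir, {"run_id": "20260222_093033_pyphen"})
    loaded = json.loads(written.read_text(encoding="utf-8"))
    leftover_tmp = (run_dir / "manifest.json.tmp").exists()
```

A reader of manifest.json therefore sees either the previous complete document or the new one. An interruption during step 1 leaves at most a stale manifest.json.tmp behind.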