Syllable Walker Web

Overview

Pipe-Works Build Tools — Web Application

Combined web interface for the Pipeline and Walker build tools, providing a browser-based alternative to pipeline_tui and syllable_walk_tui.

This is a build-time tool only — not used during runtime name generation.

Features:

Pipeline tool: extraction, normalization, annotation with live monitoring
Walker tool: dual-patch syllable walking, name combiner, name selector
Corpus analysis with terrain visualization and profile reach deep-dives
Name rendering and package export (ZIP with manifest + disk metadata persistence)
Dark/light theme support
24 API endpoints across Pipeline, Walker, Browse, Settings, and Version groups

Architecture:

api/: Request handlers (browse, pipeline, walker)
services/: Business logic (corpus_loader, combiner_runner, selector_runner, walk_generator, metrics, packager, pipeline_runner)
state.py: Dataclasses (PatchState, PipelineJobState, ServerState)
server.py: stdlib http.server with routing and static file serving

Usage:

Launch the web server from the command line:

python -m build_tools.syllable_walk_web
python -m build_tools.syllable_walk_web --port 9000
python -m build_tools.syllable_walk_web --output-base /path/to/output

Or programmatically:

>>> from build_tools.syllable_walk_web import run_server
>>> run_server(port=8000)

Syllable Walk Web — dual-patch Walker interface

Command-Line Interface

Launch the Pipe-Works Build Tools web application. Combines Pipeline (extraction/normalization/annotation) and Walker (dual-patch syllable walking, name generation) tools in a browser-based interface.

usage: python -m build_tools.syllable_walk_web [-h] [--port PORT] [--quiet]
                                               [--output-base OUTPUT_BASE]
                                               [--sessions-dir SESSIONS_DIR]
                                               [--config CONFIG]

Named Arguments

--port

Port to serve on. If not specified, automatically finds an available port (checks 8000-8099 first, then 8100-8999). Default: auto-detect

--quiet

Suppress HTTP request logging. Default: False

--output-base

Base directory for pipeline run discovery. Default: _working/output

--sessions-dir

Optional directory for saved walker sessions. Default: <output_base>/sessions

--config

Path to INI config file. Reads the [build_tools] section for output_base, sessions_dir, corpus_dir_a, corpus_dir_b, port, and verbose. CLI arguments override INI values. Default: server.ini

Examples:

# Launch on auto-detected port (default)
python -m build_tools.syllable_walk_web

# Launch on a specific port
python -m build_tools.syllable_walk_web --port 9000

# Launch in quiet mode (suppress HTTP request logs)
python -m build_tools.syllable_walk_web --quiet

# Use a custom config file
python -m build_tools.syllable_walk_web --config server.ini

Output Format

The web interface is an interactive browser-based tool with in-memory working state (pipeline job status, patch data, walks, candidates, selections).

It produces file outputs in two places:

Pipeline runs in <output_base>/<timestamp>_<extractor>/ (extract/normalize/annotate/db outputs)
Package builds from the Walker tab:
- Browser download: <name>-<version>.zip (HTTP response from /api/walker/package)
- Disk persistence (best-effort): <output_base>/packages/<name>-<version>_<timestamp>.zip plus <name>-<version>_<timestamp>_metadata.json

Interface Components:

Pipeline tab — Run the full extraction pipeline from the browser:
- Filesystem browser for source directory/file selection
- Extractor selection (Pyphen or NLTK), pyphen language selection
- Live monitor for stage progress and subprocess logs
- Run history view backed by manifest-discovered run directories (refreshes on tab entry and after run completion)
Walker tab — Dual-patch corpus exploration and name generation:
- Load corpora into Patch A and Patch B for side-by-side comparison
- Generate syllable walks with named profiles or custom walk parameters
- Combine syllables into candidates in flat-sampling or walk-based mode
- Select names by policy (first_name, last_name, place_name, etc.)
- Reach deep-dive per profile (top reachable syllables with export)
- Export selected names as text or build ZIP packages with manifest

Integration Guide

The web interface can run the full pipeline internally, so you can start from raw text without running CLI tools first.

Quickest path — start from scratch:

# Launch the web interface
python -m build_tools.syllable_walk_web

# In the browser:
# 1. Pipeline tab → browse to your source text → Start Pipeline
# 2. Walker tab → load the completed run into a patch → Walk / Combine / Select

Starting from existing pipeline output:

# If you already have pipeline runs in _working/output/
python -m build_tools.syllable_walk_web

# The Walker tab discovers runs automatically and lists them for loading

Custom output directory:

python -m build_tools.syllable_walk_web --output-base /path/to/corpus/output

INI configuration (--config):

The CLI reads [build_tools] settings from an INI file (default: server.ini). CLI arguments override INI values.

[build_tools]
output_base = _working/output
corpus_dir_a = /path/to/patch_a/runs
corpus_dir_b = /path/to/patch_b/runs
port = 8000
verbose = true
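
The precedence rule (CLI over INI over built-in default) can be sketched with the standard library's configparser. This is a minimal illustration, not the tool's actual loader: resolve_settings and the DEFAULTS table are hypothetical names, and the verbose default of True is an assumption inferred from --quiet defaulting to False.

```python
import configparser

# Hypothetical defaults; output_base comes from the --output-base description,
# and verbose=True is assumed because --quiet defaults to False.
DEFAULTS = {"output_base": "_working/output", "port": None, "verbose": True}


def resolve_settings(ini_text, cli_args):
    """Merge defaults, [build_tools] INI values, and CLI overrides (highest priority)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    settings = dict(DEFAULTS)
    if parser.has_section("build_tools"):
        section = parser["build_tools"]
        if "output_base" in section:
            settings["output_base"] = section["output_base"]
        if "port" in section:
            settings["port"] = section.getint("port")
        if "verbose" in section:
            settings["verbose"] = section.getboolean("verbose")
    # A CLI value of None means "flag not given", so the INI value survives.
    for key, value in cli_args.items():
        if value is not None:
            settings[key] = value
    return settings


ini = "[build_tools]\noutput_base = _working/output\nport = 8000\nverbose = true\n"
print(resolve_settings(ini, {"port": 9000, "output_base": None}))
```

Here the INI file supplies port 8000, but the explicit CLI value 9000 wins, while output_base falls through to the INI value because the flag was not given.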

When to use this tool:

To run the full extraction pipeline without memorizing CLI arguments
To compare two corpora side-by-side (dual-patch mode)
To interactively explore syllable walks through a browser
To generate, filter, and export names in a single session
To build ZIP packages with manifest metadata for downstream consumption

Advanced Topics

Architecture

The module is organised into backend API, backend services, frontend modules, discovery/state, and server wiring:

Backend API handlers (api/):

browse.py — Filesystem directory listing
pipeline.py — Pipeline start/status/cancel endpoints
walker.py — Thin compatibility wrapper layer (route-level entrypoints)
walker_common.py — Shared validation/normalization helpers
walker_lock.py — Active session lock enforcement helpers
walker_session.py — Session save/list/load and run-state restore handlers
walker_cache_lock.py — Reach-cache rebuild + lock heartbeat/release handlers
walker_ops.py — Walk/combine/reach/select/export/package/analysis handlers
walker_types.py — TypedDict response contracts for extracted walker handler modules

Backend service modules (services/):

corpus_loader.py — Delegates to syllable_walk.db.load_syllables
combiner_runner.py — Delegates to name_combiner.combiner
selector_runner.py — Policy caching and delegation to name_selector
walk_generator.py — Walk generation with profile routing and seed offsets
metrics.py — Corpus shape metrics with length bucketing and terrain scores
packager.py — ZIP archive building with manifest and disk persistence
pipeline_runner.py — Background subprocess execution with cancellation
pipeline_manifest.py — Manifest IPC verification helpers
profile_reaches_cache.py — Reach profile cache read/write/verify helpers
walker_run_state_store.py — Authoritative run-local IPC sidecars for patch outputs
walker_session_store.py — Session artifact save/list/load/verify with lineage metadata
walker_session_lock.py — Cooperative single-user multi-tab lock leases (UX integrity)
session_paths.py — Runtime resolution of sessions base and session file paths

Frontend modules (static/js/walker/):

corpus.js — Orchestrator for Walk tab corpus/session behavior
corpus-api.js — Fetch wrappers for walker/session endpoints
corpus-state.js — In-memory UI state model
corpus-render.js — Hash/verification/rebuild/compare visual rendering
corpus-tooltips.js — Integrity/lock badge helpers and modal content
corpus-actions-session.js — Save/load/repair/takeover/release session actions
corpus-actions-cache.js — Rebuild reach-cache action wiring
corpus-contracts.js — Shared JSDoc typedef contracts for frontend payloads
controls.js / reach.js / operations.js — Walk, reach, combine/select/package controls and endpoint operations

Discovery and state:

run_discovery.py — Manifest-driven run discovery, selection discovery, and History payload shaping (status, timings, stage state, IPC hashes)
state.py — PatchState, PipelineJobState, and ServerState

Server (server.py):

stdlib http.server.ThreadingHTTPServer for concurrent XHR
Static file serving with directory-traversal guard
Route dispatch into API modules
Lazy API imports to avoid circular dependencies
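
A directory-traversal guard for static file serving usually comes down to resolving the requested path and checking that it still lies under the static root. The sketch below shows that idea with pathlib; resolve_static_path is a hypothetical helper, and the real check in server.py may differ.

```python
from pathlib import Path


def resolve_static_path(static_root, url_path):
    """Map a URL path to a file under static_root, or return None if it escapes."""
    root = Path(static_root).resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    # Reject anything that resolves outside the static root (e.g. via "..").
    if candidate != root and root not in candidate.parents:
        return None
    return candidate


print(resolve_static_path("/srv/static", "/js/walker/corpus.js"))
print(resolve_static_path("/srv/static", "/../etc/passwd"))
```

The second call returns None because the resolved path leaves the static root.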

Run Discovery

The server scans a base directory for run folders matching: YYYYMMDD_HHMMSS_{extractor}.

GET /api/pipeline/runs uses output_base by default.
GET /api/pipeline/runs?patch=a and ?patch=b use corpus_dir_a / corpus_dir_b when configured.

Discovery is strict and manifest-first:

Run folders must contain manifest.json.
manifest.json must include required keys and run_id must match folder name.
Missing/corrupt/non-conformant manifests are skipped (no legacy fallback parsing).
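
The strict, manifest-first rules above might look roughly like this in code. It is a sketch under stated assumptions: discover_runs and RUN_DIR_PATTERN are hypothetical names, the extractor part of the folder name is assumed to be alphanumeric, and the required keys are the ones listed in the History Manifest Contract section.

```python
import json
import re
from pathlib import Path

# Folder names look like 20240131_235959_pyphen (extractor charset is assumed).
RUN_DIR_PATTERN = re.compile(r"^\d{8}_\d{6}_(?P<extractor>[A-Za-z0-9]+)$")
# Required keys as listed in the History Manifest Contract section.
REQUIRED_KEYS = {"manifest_version", "run_id", "status", "extractor",
                 "config", "metrics", "stages", "artifacts"}


def discover_runs(base_dir):
    """Return manifests for conformant run folders; skip everything else."""
    runs = []
    for run_dir in sorted(Path(base_dir).iterdir()):
        if not run_dir.is_dir() or not RUN_DIR_PATTERN.match(run_dir.name):
            continue
        try:
            text = (run_dir / "manifest.json").read_text(encoding="utf-8")
            manifest = json.loads(text)
        except (OSError, ValueError):
            continue  # missing or corrupt manifest
        if not isinstance(manifest, dict) or not REQUIRED_KEYS.issubset(manifest):
            continue  # non-conformant manifest
        if manifest["run_id"] != run_dir.name:
            continue  # run_id must match the folder name
        runs.append(manifest)
    return runs
```

Note that there is no fallback branch: a folder that fails any check is simply absent from the result.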

For each valid run, discovery reports:

folder/run id and extractor type
status and run timestamps
stage status map (extract/normalize/annotate/database)
manifest-derived metrics (including syllable count and processed-file count)
artifact paths (including corpus_db_path / annotated JSON when present)
IPC hashes (input/output) from manifest
selection file map by name class

Pipeline Execution Model

Pipeline execution runs in a background thread via services/pipeline_runner.py. Stages are subprocess-backed and logged line-by-line to job state:

extract (always)
normalize (if run_normalize=True)
annotate (if run_annotate=True and normalize ran)
database (runs after annotate; executes build_tools.corpus_sqlite_builder --force)
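
The stage gating can be summarized as a small planning function. This is only an illustration: planned_stages is a hypothetical name, and it assumes the database stage runs only when annotate runs, which is one reading of "runs after annotate".

```python
def planned_stages(run_normalize=True, run_annotate=True):
    """Return the ordered stage names a run would execute for the given toggles."""
    stages = ["extract"]  # extract always runs
    if run_normalize:
        stages.append("normalize")
        if run_annotate:
            # Assumption: database is gated on annotate having run.
            stages.extend(["annotate", "database"])
    return stages


print(planned_stages())
print(planned_stages(run_normalize=False, run_annotate=True))
```

With normalize disabled, the annotate toggle has no effect, matching the rule that annotate requires normalize.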

Status is polled through GET /api/pipeline/status and includes: status, current_stage, progress_percent, output_path, and structured log lines.

Corpus Loading and Walker Readiness

POST /api/walker/load-corpus performs two phases:

Synchronous data load: uses services/corpus_loader.load_corpus, which delegates to build_tools.syllable_walk.db.load_syllables (SQLite preferred, JSON fallback).
Background walker init: builds SyllableWalker and resolves profile reaches via run-local IPC cache.

Profile reach caching is run-directory local:

Cache path: <run_dir>/ipc/walker_profile_reaches.v1.json
Cache schema: build_tools/syllable_walk_web/schemas/walker_profile_reaches.v1.schema.json
Cache key material:
- manifest IPC output hash (from <run_dir>/manifest.json)
- walker graph settings (neighbor distance, inertia, feature costs)
- reach settings (threshold + named profile parameters)
On cache hit, precomputed reaches are loaded.
On miss/invalid cache, reaches are recomputed and cache is rewritten.
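
One common way to turn key material like this into a single comparable value is to hash a canonical JSON serialization. The sketch below assumes that approach; reach_cache_key, the SHA-256 choice, and the field names are illustrative and not taken from profile_reaches_cache.py.

```python
import hashlib
import json


def reach_cache_key(manifest_output_hash, graph_settings, reach_settings):
    """Derive one stable key from the three kinds of key material."""
    material = {
        "manifest_output_hash": manifest_output_hash,
        "graph_settings": graph_settings,
        "reach_settings": reach_settings,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key = reach_cache_key(
    "abc123",
    {"neighbor_distance": 3, "inertia": 0.5},
    {"threshold": 0.001},
)
print(key)
```

Any change to the corpus hash or to a setting yields a different key, which is what turns a stale cache into a miss rather than a silent hit.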

The frontend polls GET /api/walker/stats until walker_ready=true. During load, loading_stage reports phase progress (e.g., building neighbor graph). The stats payload also includes reach_cache_status per patch (hit | miss | invalid | error | none) to make cache behavior explicit in diagnostics.

Important readiness guarantees:

Reach precomputation completes before walker_ready is set true.
loader_status and loading_error expose terminal failure states explicitly.
Load concurrency is guarded by per-patch generation tokens, so stale background threads cannot overwrite newer corpus loads.
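
A non-browser client can follow the same polling contract. The sketch below is an assumption-laden example: wait_for_walker and fetch_stats are hypothetical helpers, the base URL is a placeholder, and the poll interval and limit are arbitrary. Only the endpoint and the patch_a / loader_status / loading_error / walker_ready fields come from this document.

```python
import json
import time
import urllib.request


def fetch_stats(base_url="http://localhost:8000"):
    """GET /api/walker/stats and decode the JSON payload."""
    with urllib.request.urlopen(f"{base_url}/api/walker/stats") as response:
        return json.load(response)


def wait_for_walker(patch_key="patch_a", fetch=fetch_stats, interval=0.5, max_polls=120):
    """Poll until the patch reports walker_ready, raising on terminal failure."""
    for _ in range(max_polls):
        patch = fetch()[patch_key]
        if patch.get("loader_status") == "error":
            raise RuntimeError(patch.get("loading_error") or "corpus load failed")
        if patch.get("walker_ready"):
            return patch
        time.sleep(interval)
    raise TimeoutError(f"{patch_key} not ready after {max_polls} polls")
```

Passing fetch as a parameter keeps the loop testable without a running server.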

Candidate Generation Modes

POST /api/walker/combine supports two modes:

Flat sampling (default; profile absent or "flat"): delegates to name_combiner.combine_syllables with frequency_weight.
Walk-based sampling (named profile or "custom"): generates walks first, then aggregates features from walked syllables.

The response includes generated, unique, and duplicates counts.
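
As a rough illustration of how the three counts relate, the sketch below assumes duplicates is simply generated minus unique; the handler's actual definition is not stated here, and combine_summary is a hypothetical helper.

```python
def combine_summary(candidate_names):
    """Summarize a candidate batch as generated / unique / duplicates counts."""
    generated = len(candidate_names)
    unique = len(set(candidate_names))
    # Assumption: every repeat of an already-seen name counts as one duplicate.
    return {"generated": generated, "unique": unique, "duplicates": generated - unique}


print(combine_summary(["kalo", "mine", "kalo"]))
```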

Dual-Patch Comparison

The Walker tab supports loading two independent corpora into Patch A and Patch B. Each patch maintains its own:

Annotated syllable data and frequency map
Walker instance (with pre-computed neighbor graph)
Generated walks, candidates, and selections

This enables side-by-side comparison of different extractors, languages, or source texts.

API Authority

The web frontend is presentation and UX only. The API is the behavioral authority for validation and execution semantics.

Frontend checks (for example min_length <= max_length) are UX helpers.
API handlers enforce the same constraints for all clients (UI and non-UI).
Requests that fail contract validation return JSON {"error": ...} with HTTP 400.
Backend response contracts for extracted walker handlers are declared in api/walker_types.py (TypedDict models).
Frontend request/response contract aliases are centralized in static/js/walker/corpus-contracts.js (JSDoc typedefs) and reused by corpus/session modules.

Examples of API-authoritative behavior:

POST /api/walker/walk validates numeric constraints including neighbor_limit, min_length, and max_length.
POST /api/pipeline/start validates min_syllable_length / max_syllable_length ranges server-side.
GET /api/walker/name-classes is the source of truth for selector class options (UI options are populated from this endpoint).
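
Server-side validation of a walk request could be structured along these lines. The numeric rules mirror the ones listed for POST /api/walker/walk in the contract table; validate_walk_request, the error wording, and the {"ok": True} success placeholder are hypothetical.

```python
def validate_walk_request(body):
    """Return (http_status, payload) for a walk request body."""
    def fail(message):
        return 400, {"error": message}

    if body.get("patch") not in ("a", "b"):
        return fail("patch must be 'a' or 'b'")
    minimums = {"count": 1, "steps": 0, "max_flips": 1,
                "neighbor_limit": 1, "min_length": 1, "max_length": 1}
    for field, minimum in minimums.items():
        value = body.get(field)
        if value is None:
            continue  # absent optional fields are left to server defaults
        if isinstance(value, bool) or not isinstance(value, int) or value < minimum:
            return fail(f"{field} must be an integer >= {minimum}")
    lo, hi = body.get("min_length"), body.get("max_length")
    if lo is not None and hi is not None and lo > hi:
        return fail("min_length must be <= max_length")
    temperature = body.get("temperature")
    if temperature is not None:
        if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
            return fail("temperature must be a number")
        if temperature <= 0:
            return fail("temperature must be > 0")
    seed = body.get("seed")
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        return fail("seed must be an integer or null")
    return 200, {"ok": True}  # placeholder success payload


print(validate_walk_request({"patch": "a", "min_length": 4, "max_length": 2}))
```

Because the check lives in the handler, a script calling the API directly gets the same HTTP 400 as the browser UI would.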

Walker State Model

GET /api/walker/stats returns independent status for patch_a and patch_b. Each patch reports loader_status plus readiness/error metadata.

`loader_status`	Meaning
`idle`	No active load thread. Patch may be empty, or may have prior corpus metadata without a currently running initialization.
`loading`	Corpus load generation is in progress. `loading_stage` reports the current phase (for example `"Building neighbour graph"`).
`ready`	Walker and pre-computed reaches are available; `walker_ready=true`.
`error`	Current load generation failed. `loading_error` contains terminal error text.

Response fields per patch include:

corpus (active run_id)
corpus_type (nltk or pyphen)
syllable_count
walker_ready, loading_stage, loading_error, loader_status
has_walks, has_candidates, has_selections
reaches (when available; includes reach count and computation timing)

Patch Isolation and Race Safety

Patch A and Patch B are fully isolated in server state.

Loading a corpus resets only the target patch state.
Walks/candidates/selections from one patch never overwrite the other patch.
Loader concurrency is generation-token guarded:
- each load-corpus increments load_generation;
- background init writes are applied only if generation is still current;
- stale loader threads exit without mutating patch state.

This prevents rapid corpus switches from producing stale overwrite races.
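
The generation-token pattern can be shown in a few lines. PatchLoader and its method names are hypothetical, and only load_generation is a name from the description above, but the mechanism is the same: each load takes a token, and a background write is applied only if its token is still the latest.

```python
import threading


class PatchLoader:
    """Minimal generation-token guard for one patch."""

    def __init__(self):
        self._lock = threading.Lock()
        self.load_generation = 0
        self.corpus = None

    def begin_load(self):
        """Called by each load-corpus request; returns that load's token."""
        with self._lock:
            self.load_generation += 1
            return self.load_generation

    def finish_load(self, generation, corpus):
        """Called by the background thread; applies only if still current."""
        with self._lock:
            if generation != self.load_generation:
                return False  # stale loader: exit without mutating state
            self.corpus = corpus
            return True


patch = PatchLoader()
first = patch.begin_load()    # user picks run A
second = patch.begin_load()   # user switches to run B before A finishes
patch.finish_load(second, "run_b")
patch.finish_load(first, "run_a")  # arrives late and is ignored
print(patch.corpus)
```

Even though the older load finishes last, the patch keeps the corpus from the most recent request.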

Determinism and Seed Behavior

Walk generation is deterministic for fixed request parameters and seed.
Batched walks use seed + i per walk to keep outputs deterministic while still varying entries within one request.
Flat combiner and selector paths accept explicit seed values for deterministic output ordering/sampling.
Without a seed, behavior remains valid but non-deterministic between runs.
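
The seed + i scheme can be illustrated with a toy generator. This is not the real walker: generate_walks here picks syllables uniformly at random instead of following the neighbor graph, and it assumes a walk of N steps contains N + 1 syllables. Only the per-walk seed offset reflects the behavior described above.

```python
import random


def generate_walks(syllables, count, steps, seed=None):
    """Toy batch generator: one independent, reproducible RNG per walk."""
    walks = []
    for i in range(count):
        # seed + i gives each walk its own reproducible random stream;
        # with no seed, Random(None) seeds itself from OS entropy.
        rng = random.Random(None if seed is None else seed + i)
        walks.append([rng.choice(syllables) for _ in range(steps + 1)])
    return walks


pool = ["ka", "lo", "mi", "ne", "ru"]
print(generate_walks(pool, count=3, steps=4, seed=7))
```

Repeating the call with the same arguments reproduces the same batch, while each walk within the batch draws from a different stream.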

API Endpoints

Endpoint	Method	Description
`/api/pipeline/runs`	GET	List discovered runs; supports `?patch=a\|b` for per-patch run roots
`/api/pipeline/status`	GET	Get pipeline job status, progress, and log lines
`/api/pipeline/start`	POST	Start extraction pipeline (source path, extractor, and optional stage/constraint fields)
`/api/pipeline/cancel`	POST	Cancel a running pipeline job
`/api/browse-directory`	POST	Browse a filesystem directory (for source/output selection)
`/api/walker/stats`	GET	Get dual-patch state (loaded corpora, loader/cache status, readiness, reach metadata)
`/api/walker/analysis/{patch}`	GET	Corpus shape metrics for a patch (terrain scores, distributions)
`/api/walker/name-classes`	GET	List available name class policies from `name_classes.yml`
`/api/walker/load-corpus`	POST	Load a run’s corpus into a patch (builds walker in background)
`/api/walker/sessions`	GET	List saved dual-patch sessions with verification and lock metadata
`/api/walker/save-session`	POST	Persist current patch assignments as one immutable session revision
`/api/walker/load-session`	POST	Load one saved session, verify references, restore trusted sidecars
`/api/walker/walk`	POST	Generate syllable walks with validated constraints and optional seed
`/api/walker/combine`	POST	Generate candidates (flat mode or walk-based mode), returns deduplication stats
`/api/walker/reach-syllables`	POST	Return top reachable syllables for one profile/patch (reach deep-dive tables)
`/api/walker/select`	POST	Select names by policy (name class, mode, count)
`/api/walker/export`	POST	Export selected names as a list
`/api/walker/package`	POST	Build ZIP archive with manifest (binary response) and persist package files to disk
`/api/walker/rebuild-reach-cache`	POST	Recompute and rewrite reach-cache IPC artifact for one loaded patch
`/api/walker/session-lock/heartbeat`	POST	Refresh active session lock lease for one holder
`/api/walker/session-lock/release`	POST	Release active session lock lease for one holder
`/api/settings`	GET	Get current server settings (resolved `output_base` and `sessions_base`)
`/api/settings/output-base`	POST	Update the output base directory
`/api/version`	GET	Return package version for UI header display

The web server uses Python’s standard library http.server (no Flask dependency).

Common Request Fields

Key request bodies for current API routes:

For mutating Walker endpoints (load-corpus, walk, combine, select, package, rebuild-reach-cache), include lock_holder_id when operating against an actively locked session.

Endpoint	Important request fields
`POST /api/pipeline/start`	`source_path` (required), `output_dir` (optional), `extractor` (default `pyphen`), `language` (default `auto`), `file_pattern` (default `*.txt`), `min_syllable_length`/`max_syllable_length` (defaults `2`/`8`), `run_normalize`/`run_annotate` (default `true`/`true`)
`POST /api/walker/load-corpus`	`patch` (`a`/`b`), `run_id` (required non-empty string), optional `lock_holder_id` (required when active session is lock-guarded)
`POST /api/walker/save-session`	optional `label`, optional `session_id` (explicit id mode), optional `repair_from_session_id` (immutable revision mode), optional `lock_holder_id` (required when active session is lock-guarded)
`POST /api/walker/load-session`	`session_id` (required), optional `lock_holder_id` (recommended for lock-coordinated multi-tab flows; required when using `force_lock`), optional `force_lock` (take-over flow)
`POST /api/walker/walk`	`patch`, `count`, `steps`, `seed`, optional `profile`. Custom constraints are always accepted: `max_flips`, `temperature`, `frequency_weight`, `neighbor_limit`, `min_length`, `max_length`. API validates ranges (for example `min_length <= max_length`). Include `lock_holder_id` when session lock is active.
`POST /api/walker/combine`	`patch`, `count`, `syllables` (int or list), `seed`. Flat mode: `frequency_weight`. Walk mode: `profile` (named or `custom`); custom supports `max_flips`, `temperature`, `frequency_weight`. Include `lock_holder_id` when session lock is active.
`POST /api/walker/reach-syllables`	`patch` and `profile` (must match one of the precomputed profile keys)
`POST /api/walker/select`	`patch`, `name_class`, `count`, `mode` (`hard`/`soft`), `order` (`alphabetical`/`random`), `seed`. Include `lock_holder_id` when session lock is active.
`POST /api/walker/package`	`name`, `version`, include flags: `include_walks_a`, `include_walks_b`, `include_candidates`, `include_selections`. Include `lock_holder_id` when session lock is active.
`POST /api/walker/rebuild-reach-cache`	`patch` (required), optional `run_id` (must match loaded patch context if provided), optional `lock_holder_id` (required when session lock is active)
`POST /api/walker/session-lock/heartbeat`	`session_id` and `lock_holder_id` (both required)
`POST /api/walker/session-lock/release`	`session_id` and `lock_holder_id` (both required)
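
For scripted clients, a request body from the table above can be assembled with the standard library. This sketch only builds the request object without sending it; build_walk_request is a hypothetical helper, and sending the body as JSON with a Content-Type of application/json is an assumption.

```python
import json
import urllib.request


def build_walk_request(base_url, patch, count, steps, seed=None,
                       lock_holder_id=None, **custom):
    """Build (but do not send) a POST /api/walker/walk request."""
    body = {"patch": patch, "count": count, "steps": steps, "seed": seed, **custom}
    if lock_holder_id is not None:
        # Only needed when operating against an actively locked session.
        body["lock_holder_id"] = lock_holder_id
    return urllib.request.Request(
        f"{base_url}/api/walker/walk",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_walk_request(
    "http://localhost:8000", "a", count=5, steps=3, seed=42,
    profile="custom", temperature=0.7, max_flips=2,
)
print(request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen would submit it to a running server.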

Walker Endpoint Contract Details

Endpoint	Contract and validation rules	Success payload highlights
`GET /api/walker/stats`	No request body. Returns state for both patches, including `loader_status` and optional `reaches` map when available. Includes `patch_comparison` with `corpus_hash_relation` and policy semantics.	`patch_a` / `patch_b` objects with corpus, readiness, loading/error fields, `has_*` flags, and top-level `patch_comparison`.
`POST /api/walker/load-corpus`	Requires `patch in {"a","b"}` and non-empty `run_id`. Errors for invalid patch, missing run, or corpus load failure. If active session lock is set, requires matching `lock_holder_id`.	`patch`, `run_id`, `corpus_type`, `syllable_count`, `source`, `status="loading"`.
`GET /api/walker/sessions`	No request body. Lists saved session artifacts ordered newest-first. Includes verification, lineage, and lock metadata.	`sessions` list with `session_id`, patch run ids, verification status/reason, `root_session_id`, `parent_session_id`, `revision`, `lock_status`, `lock`.
`POST /api/walker/save-session`	Saves current patch references as session IPC artifact. `session_id` and `repair_from_session_id` are mutually exclusive. If active session lock is set, requires matching `lock_holder_id`.	`status`, `reason`, `session_id`, per-patch save status/reason, `ipc_input_hash`, `ipc_output_hash`, lineage fields.
`POST /api/walker/load-session`	Requires `session_id`. `lock_holder_id` is optional but recommended for lock-coordinated multi-tab flows; `force_lock` requires `lock_holder_id` and enables explicit take-over. Verifies session artifact, loads referenced patch runs, restores only verified run-state sidecars. Stale hash-drift session payloads may be loaded for continuity but remain explicitly integrity-signaled.	Per-patch `loaded`/`restored`/`restored_kinds` and `verification_status`/`verification_reason`, plus `session_lock` block and `recovered_from_stale_session` flag.
`POST /api/walker/walk`	Requires ready walker for target patch. Validates numeric fields: `count >= 1`, `steps >= 0`, `max_flips >= 1`, `neighbor_limit >= 1`, `min_length >= 1`, `max_length >= 1`, `min_length <= max_length`, `temperature > 0`, and integer-or-null seed. If active session lock is set, requires matching `lock_holder_id`.	`patch` and `walks` (each walk includes `formatted`, `syllables`, `steps`).
`POST /api/walker/combine`	Requires loaded corpus. `profile` controls mode: absent/`flat` uses flat combiner; named/custom profile uses walker path and requires walker readiness. If active session lock is set, requires matching `lock_holder_id`.	`generated`, `unique`, `duplicates`, `syllables`, `source`.
`POST /api/walker/reach-syllables`	Requires precomputed reaches and valid `profile` key for target patch. Errors if reach data or walker is not ready.	`profile`, `reach`, `total`, `unique_reachable`, `syllables` list.
`POST /api/walker/select`	Requires existing candidates. Validates patch and delegates policy validation to selector service (unknown name class returns error). If active session lock is set, requires matching `lock_holder_id`.	`name_class`, `mode`, `count`, `requested`, `names`.
`POST /api/walker/export`	Requires prior selection output for target patch.	`patch`, `count`, `names`.
`POST /api/walker/package`	Accepts package metadata and include flags. Builds ZIP from in-memory state. If active session lock is set, requires matching `lock_holder_id`.	Binary ZIP response with attachment filename `<name>-<version>.zip`.
`POST /api/walker/rebuild-reach-cache`	Requires loaded walker and patch context. Optional `run_id` must match loaded patch run when provided. If active session lock is set, requires matching `lock_holder_id`.	`status="rebuilt"`, `patch`, `run_id`, cache IPC hashes, verification status/reason.
`POST /api/walker/session-lock/heartbeat`	Requires `session_id` and `lock_holder_id`. Returns `held` for active lease, `missing` when lease absent, and error payload on conflicts.	`status`/`reason` and `lock` payload when available.
`POST /api/walker/session-lock/release`	Requires `session_id` and `lock_holder_id`. Release succeeds only for the current lock owner.	`status`/`reason` and released `lock` payload when available.

Pipeline Configure ↔ API Mapping

The Pipeline Configure tab maps directly to POST /api/pipeline/start:

Configure control	Request field / behavior
Source picker (directory or file)	`source_path` (required)
Output picker	`output_dir` (optional). If not selected, server default `output_base` is used.
Extractor (`pyphen` / `nltk`)	`extractor`
Language radios + custom language code	`language`. For `pyphen`, custom code overrides radio value; for `nltk`, frontend sends `"auto"`.
File pattern	`file_pattern`
Min / Max syllable length	`min_syllable_length` / `max_syllable_length` (frontend validates `min <= max` and API rejects invalid ranges/types)
Normalize toggle	`run_normalize`
Annotate toggle	`run_annotate` (frontend enforces annotate requires normalize)

Pipeline Output ↔ API Mapping

Monitor and History views consume pipeline API responses as follows:

UI output area	API field(s) used
Monitor status/progress/log	`GET /api/pipeline/status`: `status`, `current_stage`, `progress_percent`, `log_lines`
Monitor completion message	`GET /api/pipeline/status`: `output_path` (shown when available)
Monitor stage chips	`current_stage` + requested stage toggles from start payload
History run list	`GET /api/pipeline/runs`: `run_id`, `path`, `timestamp`, `extractor_type`, `syllable_count`, `status`
History run detail metadata	`source_path`, `files_processed`, `processing_time`, `created_at_utc`, `completed_at_utc` (from `manifest.json`)
History output tree	`output_tree_lines` (manifest artifact list rendered as a deterministic tree)
History database stage chip	`stage_statuses.database`
History stage chips (all stages)	`stage_statuses.extract\|normalize\|annotate\|database`
History IPC hash fields	`ipc_input_hash`, `ipc_output_hash` (compact display + full tooltip)

Walker Controls ↔ API Mapping

Walk, Combine, and Select controls map to Walker endpoints as follows.

Walker control	API field	Runtime effect
Patch selector (A/B context)	`patch`	Routes request to isolated patch state.
Walk count / steps	`count` / `steps`	Sets number of generated walks and walk length.
Walk profile cards	`profile`	Named profile uses tuned walker profile path; `custom` uses explicit slider/spinner fields.
Walk max flips / temperature / frequency	`max_flips` / `temperature` / `frequency_weight`	Controls walker transition behavior in custom mode.
Walk neighbors	`neighbor_limit`	Limits candidate neighbors evaluated per step.
Walk min/max length	`min_length` / `max_length`	Constrains syllable-length eligibility for starts/transitions.
Walk seed	`seed`	Enables deterministic walk batches (internally offset per walk).
Combine profile cards	`profile` on `/api/walker/combine`	Chooses flat combiner mode vs walk-based generation mode.
Combine count/syllables/seed	`count` / `syllables` / `seed`	Controls candidate volume, name length classes, and deterministic sampling.
Selector class dropdown	`name_class` on `/api/walker/select`	Applies selected policy from `name_classes.yml`.
Selector mode/order/count/seed	`mode` / `order` / `count` / `seed`	Controls strictness, output ordering, and deterministic random ordering.

History Manifest Contract

History discovery is strict manifest-first (no legacy fallback parsing):

Run directory must contain manifest.json.
Manifest must include required contract keys: manifest_version, run_id, status, extractor, config, metrics, stages, artifacts.
run_id must match the run directory name.
Missing/corrupt/non-conformant manifests are skipped by discovery.

This keeps the run directory as the single source of truth and avoids cross-file drift between legacy metadata files and API payloads.

Pipeline Manifest and IPC

Each pipeline run writes <run_dir>/manifest.json as the canonical run record.

High-value fields used by History and diagnostics:

status plus created_at_utc / completed_at_utc
config and metrics (including files_processed and unique syllable count)
stages (per-stage status and duration)
artifacts (deterministic run output inventory)
ipc block:
- input_hash from canonical run configuration
- output_hash from canonical artifact+metric payload
- library metadata (version/ref) for provenance

Patch A/B Session IPC, Locks, and Rebuild Semantics

Session and patch restoration use authoritative IPC artifacts:

Run-level state artifact:
- <run_dir>/ipc/walker_run_state.v1.json
Patch output sidecars (written and verified per run):
- <run_dir>/ipc/patch_a_walks.v1.json
- <run_dir>/ipc/patch_a_candidates.v1.json
- <run_dir>/ipc/patch_a_selections.v1.json
- <run_dir>/ipc/patch_a_package.v1.json
- same pattern for Patch B (patch_b_*)
Session artifact (runtime sessions base, not hardcoded to _working):
- <sessions_base>/<session_id>.json
- sessions_base resolves from explicit config override or defaults to <output_base>/sessions

Session lineage fields:

root_session_id: immutable origin session id
parent_session_id: immediate source session id for repaired revisions
revision: integer revision counter (original is 0)

Verification status semantics (API authority):

verified: all relevant IPC links/hashes are valid and trusted
mismatch: artifact exists but linkage/hash verification failed
missing: artifact or required hash fields are absent
error: parse/read/validation/internal failure

Stale session recovery vs repair:

Recovery on load-session is intentionally narrow: hash-drift mismatch can be loaded for continuity if the raw payload is readable.
Recovery does not auto-upgrade trust. The result remains integrity-signaled (stale/mismatch) until repaired.
Repair creates a new immutable revision (new session_id with lineage) and preserves prior artifact history.
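
The lineage bookkeeping behind an immutable repair can be sketched with a frozen dataclass. The three lineage field names come from the list above. Everything else is an assumption for illustration: SessionRecord, new_session, and repair are hypothetical names, the UUID-based ids are a stand-in, and an original session is assumed to be its own root.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionRecord:
    session_id: str
    root_session_id: str
    parent_session_id: Optional[str]
    revision: int


def new_session():
    """Create an original session (revision 0, assumed to be its own root)."""
    session_id = uuid.uuid4().hex
    return SessionRecord(session_id, session_id, None, 0)


def repair(source):
    """Return a new revision; the source record is never modified."""
    return SessionRecord(
        session_id=uuid.uuid4().hex,
        root_session_id=source.root_session_id,
        parent_session_id=source.session_id,
        revision=source.revision + 1,
    )


original = new_session()
repaired = repair(original)
print(repaired.revision, repaired.parent_session_id == original.session_id)
```

Because the record is frozen and repair returns a fresh object, the prior artifact stays untouched and the chain back to the root remains traceable.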

Cooperative lock model:

Endpoints:
- POST /api/walker/session-lock/heartbeat
- POST /api/walker/session-lock/release
Lock lease TTL is currently 45 seconds and refreshed by heartbeat.
load-session acquires lock with lock_holder_id and optional force_lock (take-over flow).
Mutating endpoints enforce active session lock ownership.
This is an integrity/UX coordination mechanism for single-user multi-tab use, not a security or authorization boundary.
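
A cooperative lease of this kind can be modeled in memory as follows. The 45-second TTL and the held / missing heartbeat results come from this document. SessionLock, the injected clock, and the "conflict" return value are hypothetical stand-ins (the real endpoint returns an error payload on conflicts).

```python
import time

LEASE_TTL_SECONDS = 45


class SessionLock:
    """In-memory lease for one session, refreshed by heartbeat."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.holder_id = None
        self.expires_at = 0.0

    def _active(self):
        return self.holder_id is not None and self._clock() < self.expires_at

    def acquire(self, holder_id, force=False):
        """Take the lease unless another holder has it (force = take-over)."""
        if self._active() and self.holder_id != holder_id and not force:
            return False
        self.holder_id = holder_id
        self.expires_at = self._clock() + LEASE_TTL_SECONDS
        return True

    def heartbeat(self, holder_id):
        if not self._active():
            return "missing"
        if self.holder_id != holder_id:
            return "conflict"  # stand-in for the API's error payload
        self.expires_at = self._clock() + LEASE_TTL_SECONDS
        return "held"

    def release(self, holder_id):
        """Release succeeds only for the current lock owner."""
        if self._active() and self.holder_id == holder_id:
            self.holder_id = None
            return True
        return False
```

A tab that stops sending heartbeats loses the lease once the TTL elapses, so a crashed tab cannot block the session indefinitely.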

Patch comparison and rebuild policy decisions:

GET /api/walker/stats exposes:
- patch_comparison.corpus_hash_relation: same | different | unknown
- patch_comparison.policy: currently warn | none
Current product policy keeps compare mode as warn-only/no-policy (no block mode yet).
POST /api/walker/rebuild-reach-cache is already an explicit rebuild action. We intentionally do not expose a separate force/invalidate mode at this stage.

Manual QA Checklist (Phase 5)

Use this checklist when validating Walker session IPC behavior:

Two-tab lock conflict:
- Load same session in tab A and tab B with different holders.
- Confirm tab B shows lock conflict and cannot mutate without take-over.
- Take over in tab B and confirm tab A heartbeats/release reflect loss of ownership.
Stale recovery and immutable repair:
- Create a stale-session condition (hash drift), then load session.
- Confirm load is continuity-tolerant but explicitly marked stale/mismatch.
- Run repair and verify new session_id with incremented lineage revision.
- Confirm original session artifact remains unchanged.
Rebuild reach-cache states:
- Trigger rebuild and verify transition through guidance states (for example rebuilding -> rebuilt/verified).
- Validate status handling for verified, recommended, missing, error.
- Confirm IPC hashes update after successful rebuild.
Session list and detail integrity:
- Verify GET /api/walker/sessions shows verification, lineage, and lock metadata.
- Confirm UI labels and run detail reflect backend verification outputs exactly.
Regression safety:
- Validate walk/combine/select/package flows still work when session features are unused.
- Validate pipeline tab behavior is unchanged.

Notes

Dependencies:

Uses standard library http.server for the web interface (no Flask)
Uses subprocess for pipeline stage execution
Requires NumPy for efficient feature matrix operations (build-time dependency)

Troubleshooting:

Port Already in Use:

When --port is omitted, the server auto-discovers an available port, checking 8000-8099 first and then 8100-8999. If a specific port is requested with --port and is unavailable, the server fails with an error message.

# Auto-discover (tries 8000, 8001, 8002, ...)
python -m build_tools.syllable_walk_web

# Specific port (fails if unavailable)
python -m build_tools.syllable_walk_web --port 9000
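
For readers curious how this kind of port discovery can work, here is a minimal sketch that probes ports by attempting to bind them. It mirrors the documented start/max_tries behavior of find_available_port, but the function name probe_available_port, the host parameter, and the bind-based check are assumptions rather than the tool's actual code.

```python
import socket
from typing import Optional


def probe_available_port(
    start: int = 8000, max_tries: int = 100, host: str = "127.0.0.1"
) -> Optional[int]:
    """Return the first port in [start, start + max_tries) that can be bound, else None."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is in use (or otherwise unavailable); try the next one.
                continue
            return port
    return None
```

A probe like this is inherently racy: another process can claim the port between the check and the real server bind, so the caller should still handle a bind failure at startup.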

No Runs Found:

If no runs are discovered in the Walker tab, ensure you have pipeline output directories in the configured output base, or use the Pipeline tab to run an extraction first. If patch-specific run roots are configured (corpus_dir_a / corpus_dir_b), verify those paths contain timestamped run directories with valid manifest.json files.

# Check for existing runs
ls _working/output/

# Or run the pipeline from the web UI's Pipeline tab
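
The same check can be scripted. The sketch below lists subdirectories of an output base that contain a parseable manifest.json. The helper name is hypothetical, and "valid" here means only "parses as JSON"; the Walker's own discovery rules are likely stricter.

```python
import json
from pathlib import Path
from typing import List


def list_candidate_runs(output_base: Path) -> List[Path]:
    """Return run directories under output_base whose manifest.json parses as JSON."""
    runs: List[Path] = []
    if not output_base.is_dir():
        return runs
    for candidate in sorted(output_base.iterdir()):
        manifest = candidate / "manifest.json"
        if not candidate.is_dir() or not manifest.is_file():
            continue
        try:
            json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable or malformed manifest: skip this directory.
            continue
        runs.append(candidate)
    return runs
```

Calling list_candidate_runs(Path("_working/output")) and getting an empty list points to a missing or misconfigured output base rather than a UI problem.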

Walker Load Fails or Stalls:

Use GET /api/walker/stats as the source of truth:

If loader_status="loading", inspect loading_stage for current phase.
If loader_status="error", inspect loading_error and retry load.
walker_ready=true means walks/reaches are ready for that patch.
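
The checks above can be wrapped in a small polling loop. This is a sketch, not part of the tool: wait_for_walker and its fetch_patch_stats callback are hypothetical, and the callback is assumed to return the stats fields for a single patch as a dict (how the real response nests per-patch data is not specified here).

```python
import time
from typing import Any, Callable, Dict


def wait_for_walker(
    fetch_patch_stats: Callable[[], Dict[str, Any]],
    timeout_s: float = 60.0,
    interval_s: float = 0.5,
) -> Dict[str, Any]:
    """Poll until walker_ready is true; raise on loader error or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        stats = fetch_patch_stats()
        if stats.get("loader_status") == "error":
            raise RuntimeError(f"load failed: {stats.get('loading_error')}")
        if stats.get("walker_ready"):
            return stats
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still loading (stage: {stats.get('loading_stage')})")
        time.sleep(interval_s)
```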

Common causes:

Run directory is missing required artifacts (the manifest lists files that are not present)
Corrupt/unreadable SQLite/JSON artifacts
Incompatible or malformed run directory copied into output roots

Rapid Corpus Switching (Race-Safe Behavior):

Loading a new run while a previous load is in progress is supported. The server uses per-patch load generations and accepts writes only from the current generation. Older background loads are ignored, preventing stale state from overwriting the newly selected corpus.

If you switch repeatedly:

trust the latest selected run in the UI;
use /api/walker/stats to confirm final corpus and loader_status.
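
To make the generation idea concrete, here is a minimal sketch of a generation-guarded slot. The load_generation field name matches PatchState, but the PatchLoadSlot class, its methods, and the corpus field are hypothetical simplifications of whatever the server actually does.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatchLoadSlot:
    """Hypothetical stand-in for one patch's load state."""

    load_generation: int = 0
    corpus: Optional[str] = None
    _guard: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def begin_load(self) -> int:
        """Start a new load and return its generation token."""
        with self._guard:
            self.load_generation += 1
            return self.load_generation

    def commit(self, generation: int, corpus: str) -> bool:
        """Accept the result only if no newer load has started since."""
        with self._guard:
            if generation != self.load_generation:
                return False  # stale background load: drop its result
            self.corpus = corpus
            return True
```

Each background loader keeps the token it received from begin_load and presents it at commit time, so a slow load of an earlier run cannot overwrite the run selected afterwards.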

Name Class Dropdown Empty or Unexpected:

Selector classes come from GET /api/walker/name-classes. If the dropdown is empty or stale:

verify API route availability and server health;
verify data/name_classes.yml exists and is valid YAML;
reload the page after fixing policy file issues.

Package Persistence Warnings:

The package endpoint always returns a ZIP download when package generation succeeds. Disk persistence to <output_base>/packages/ is best-effort; permission/path issues are logged as warnings on the server side and do not block the download.
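
The best-effort pattern described above might look like the following. This is a sketch only: the function name, the logger, and the file naming are assumptions, and just the packages/ subdirectory comes from the text.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def persist_then_return(zip_bytes: bytes, output_base: Path, filename: str) -> bytes:
    """Try to save the package under <output_base>/packages/, then return it regardless."""
    try:
        packages_dir = output_base / "packages"
        packages_dir.mkdir(parents=True, exist_ok=True)
        (packages_dir / filename).write_bytes(zip_bytes)
    except OSError as exc:
        # Permission or path problems are logged and never block the download.
        logger.warning("could not persist package under %s: %s", output_base, exc)
    return zip_bytes
```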

Build-time tool:

This is a build-time analysis tool only; it is not used during runtime name generation.

Related Documentation:

Syllable Walker - Core syllable walker algorithm and CLI
Syllable Walker TUI - Interactive TUI for exploring phonetic space
Pipeline TUI - Interactive TUI for running extraction pipelines
Syllable Feature Annotator - Generates input data with phonetic features
Corpus SQLite Builder - Builds SQLite database for fast loading
Name Combiner - Generates name candidates
Name Selector - Selects names by policy

API Reference

class build_tools.syllable_walk_web.CorpusBuilderHandler(request, client_address, server)[source]

Bases: BaseHTTPRequestHandler

HTTP request handler for the Corpus Builder web app.

Serves static files from the static/ directory and routes /api/* requests to the appropriate handlers.
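
For orientation, the routing split described above can be sketched with the standard library as follows. SketchHandler is a hypothetical, much-reduced stand-in: it answers /api/* with JSON and omits static file serving entirely, so it should not be read as the actual handler.

```python
import json
from http.server import BaseHTTPRequestHandler


class SketchHandler(BaseHTTPRequestHandler):
    """Illustrative handler: JSON for /api/*, 404 for everything else."""

    verbose = False

    def do_GET(self) -> None:
        if self.path.startswith("/api/"):
            self._send_json(200, {"path": self.path})
        else:
            # The real handler serves files from static/ here.
            self._send_json(404, {"error": "static serving omitted in this sketch"})

    def log_message(self, format: str, *args) -> None:
        # Suppress request logging unless verbose is set.
        if self.verbose:
            super().log_message(format, *args)

    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```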

do_GET()[source]

Handle GET requests.

Return type: None

do_POST()[source]

Handle POST requests.

Return type: None

log_message(format, *args)[source]

Override to respect verbose flag.

Return type: None

server_version = 'PipeWorksCorpusBuilder/0.1'

service_log_label: str = 'syllable-walk-web'

state: ServerState = ServerState(
    patch_a=PatchState(
        run_id=None, corpus_type=None, corpus_dir=None, syllable_count=0,
        walker=None, walker_ready=False, loading_stage=None,
        load_generation=0, active_load_generation=None, loading_error=None,
        manifest_ipc_input_hash=None, manifest_ipc_output_hash=None,
        manifest_ipc_verification_status=None, manifest_ipc_verification_reason=None,
        reach_cache_status=None,
        reach_cache_ipc_input_hash=None, reach_cache_ipc_output_hash=None,
        reach_cache_ipc_verification_status=None, reach_cache_ipc_verification_reason=None,
        profile_reaches=None, annotated_data=None, frequencies=None, walks=[],
        candidates=None, candidates_path=None, selections_path=None, selected_names=[]
    ),
    patch_b=PatchState(same defaults as patch_a),
    pipeline_job=PipelineJobState(
        job_id=None, status='idle', config=None, current_stage=None,
        progress_percent=0, log_lines=[], output_path=None,
        error_message=None, process=None
    ),
    output_base=PosixPath('_working/output'), sessions_base=None,
    corpus_dir_a=None, corpus_dir_b=None,
    walker_session_locks={}, walker_session_locks_guard=<unlocked _thread.lock object>,
    active_session_id=None, active_session_lock_holder_id=None
)

verbose: bool = True

build_tools.syllable_walk_web.find_available_port(start=8000, max_tries=100)[source]

Find an available port starting from start.

Tries ports start through start + max_tries - 1. Returns the first available port, or None if none found.

Return type: int | None

build_tools.syllable_walk_web.run_server(port=None, verbose=True, output_base=None, sessions_dir=None, corpus_dir_a=None, corpus_dir_b=None)[source]

Start the HTTP server.

Parameters:

port (int | None) – Port to listen on. If None, checks 8000-8099 first, then 8100-8999.
verbose (bool) – If True, log HTTP requests to stderr.
output_base (Path | None) – Base path for pipeline run discovery. Defaults to _working/output.
sessions_dir (Path | None) – Optional explicit directory for saved walker sessions. Defaults to None (callers derive output_base/sessions).
corpus_dir_a (str | None) – Run discovery directory for Patch A.
corpus_dir_b (str | None) – Run discovery directory for Patch B.

Returns:

Exit code: 0 for clean shutdown, 1 for error.