Syllable Walker Web

Overview

Pipe-Works Build Tools — Web Application

Combined web interface for the Pipeline and Walker build tools, providing a browser-based alternative to pipeline_tui and syllable_walk_tui.

This is a build-time tool only — not used during runtime name generation.

Features:
  • Pipeline tool: extraction, normalization, annotation with live monitoring

  • Walker tool: dual-patch syllable walking, name combiner, name selector

  • Corpus analysis with terrain visualization and profile reach deep-dives

  • Name rendering and package export (ZIP with manifest + disk metadata persistence)

  • Dark/light theme support

  • 24 API endpoints across Pipeline, Walker, Browse, Settings, and Version groups

Architecture:
  • api/: Request handlers (browse, pipeline, walker)

  • services/: Business logic (corpus_loader, combiner_runner, selector_runner, walk_generator, metrics, packager, pipeline_runner)

  • state.py: Dataclasses (PatchState, PipelineJobState, ServerState)

  • server.py: stdlib http.server with routing and static file serving

Usage:

Launch the web server from the command line:

python -m build_tools.syllable_walk_web
python -m build_tools.syllable_walk_web --port 9000
python -m build_tools.syllable_walk_web --output-base /path/to/output

Or programmatically:

>>> from build_tools.syllable_walk_web import run_server
>>> run_server(port=8000)
Syllable Walk Web — dual-patch Walker interface

Command-Line Interface

Launch the Pipe-Works Build Tools web application. Combines Pipeline (extraction/normalization/annotation) and Walker (dual-patch syllable walking, name generation) tools in a browser-based interface.

usage: python -m build_tools.syllable_walk_web [-h] [--port PORT] [--quiet]
                                               [--output-base OUTPUT_BASE]
                                               [--sessions-dir SESSIONS_DIR]
                                               [--config CONFIG]

Named Arguments

--port

Port to serve on. If not specified, automatically finds an available port (checks 8000-8099 first, then 8100-8999). Default: auto-detect

--quiet

Suppress HTTP request logging. Default: False

--output-base

Base directory for pipeline run discovery. Default: _working/output

--sessions-dir

Optional directory for saved walker sessions. Default: <output_base>/sessions

--config

Path to INI config file. Reads the [build_tools] section for output_base, sessions_dir, corpus_dir_a, corpus_dir_b, port, and verbose. CLI arguments override INI values. Default: server.ini

Examples:

# Launch on auto-detected port (default)
python -m build_tools.syllable_walk_web

# Launch on a specific port
python -m build_tools.syllable_walk_web --port 9000

# Launch in quiet mode (suppress HTTP request logs)
python -m build_tools.syllable_walk_web --quiet

# Use a custom config file
python -m build_tools.syllable_walk_web --config server.ini

Output Format

The web interface is an interactive browser-based tool with in-memory working state (pipeline job status, patch data, walks, candidates, selections).

It produces file outputs in two places:

  • Pipeline runs in <output_base>/<timestamp>_<extractor>/ (extract/normalize/annotate/db outputs)

  • Package builds from the Walker tab:

    • Browser download: <name>-<version>.zip (HTTP response from /api/walker/package)

    • Disk persistence (best-effort): <output_base>/packages/<name>-<version>_<timestamp>.zip plus <name>-<version>_<timestamp>_metadata.json
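The naming scheme above can be sketched as follows. The helper name package_paths and the YYYYMMDD_HHMMSS timestamp format are assumptions made for illustration; the text above only specifies the <name>-<version>_<timestamp> pattern.

```python
from datetime import datetime
from pathlib import Path


def package_paths(output_base: str, name: str, version: str, now: datetime) -> dict:
    """Return the download name and the two on-disk paths for one package build."""
    # Assumed timestamp format; the documentation only says <timestamp>.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    stem = f"{name}-{version}"
    packages_dir = Path(output_base) / "packages"
    return {
        "download_name": f"{stem}.zip",
        "zip_path": packages_dir / f"{stem}_{stamp}.zip",
        "metadata_path": packages_dir / f"{stem}_{stamp}_metadata.json",
    }


paths = package_paths("_working/output", "elven", "1.0.0", datetime(2025, 1, 2, 3, 4, 5))
print(paths["download_name"])
print(paths["zip_path"])
print(paths["metadata_path"])
```

Because disk persistence is best-effort, a caller should probably treat the browser download as the primary result and the on-disk copies as optional.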

Interface Components:

  1. Pipeline tab — Run the full extraction pipeline from the browser:

    • Filesystem browser for source directory/file selection

    • Extractor selection (Pyphen or NLTK), pyphen language selection

    • Live monitor for stage progress and subprocess logs

    • Run history view backed by manifest-discovered run directories (refreshes on tab entry and after run completion)

  2. Walker tab — Dual-patch corpus exploration and name generation:

    • Load corpora into Patch A and Patch B for side-by-side comparison

    • Generate syllable walks with named profiles or custom walk parameters

    • Combine syllables into candidates in flat-sampling or walk-based mode

    • Select names by policy (first_name, last_name, place_name, etc.)

    • Reach deep-dive per profile (top reachable syllables with export)

    • Export selected names as text or build ZIP packages with manifest

Integration Guide

The web interface can run the full pipeline internally, so you can start from raw text without running CLI tools first.

Quickest path — start from scratch:

# Launch the web interface
python -m build_tools.syllable_walk_web

# In the browser:
# 1. Pipeline tab → browse to your source text → Start Pipeline
# 2. Walker tab → load the completed run into a patch → Walk / Combine / Select

Starting from existing pipeline output:

# If you already have pipeline runs in _working/output/
python -m build_tools.syllable_walk_web

# The Walker tab discovers runs automatically and lists them for loading

Custom output directory:

python -m build_tools.syllable_walk_web --output-base /path/to/corpus/output

INI configuration (--config):

The CLI reads [build_tools] settings from an INI file (default: server.ini). CLI arguments override INI values.

[build_tools]
output_base = _working/output
corpus_dir_a = /path/to/patch_a/runs
corpus_dir_b = /path/to/patch_b/runs
port = 8000
verbose = true
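The override rule (CLI arguments win over INI values) might look roughly like the sketch below. resolve_settings is a hypothetical helper, not the tool's actual function, and the fallback values used when a key is absent are assumptions.

```python
import configparser

INI_TEXT = """
[build_tools]
output_base = _working/output
port = 8000
verbose = true
"""


def resolve_settings(ini_text: str, cli_args: dict) -> dict:
    """Merge [build_tools] INI values with CLI arguments (CLI wins)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["build_tools"] if parser.has_section("build_tools") else {}
    settings = {
        # Fallbacks below are assumptions for this sketch.
        "output_base": section.get("output_base", "_working/output"),
        "port": int(section["port"]) if "port" in section else None,
        "verbose": parser.getboolean("build_tools", "verbose", fallback=True),
    }
    # Only CLI values that were actually supplied (not None) override the INI.
    settings.update({k: v for k, v in cli_args.items() if v is not None})
    return settings


print(resolve_settings(INI_TEXT, {"port": 9000, "output_base": None}))
```

Treating None as "not supplied" mirrors how argparse reports options that were left off the command line.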

When to use this tool:

  • To run the full extraction pipeline without memorizing CLI arguments

  • To compare two corpora side-by-side (dual-patch mode)

  • To interactively explore syllable walks through a browser

  • To generate, filter, and export names in a single session

  • To build ZIP packages with manifest metadata for downstream consumption

Advanced Topics

Architecture

The module is organised into backend API, backend services, frontend modules, discovery/state, and server wiring:

Backend API handlers (api/):

  • browse.py — Filesystem directory listing

  • pipeline.py — Pipeline start/status/cancel endpoints

  • walker.py — Thin compatibility wrapper layer (route-level entrypoints)

  • walker_common.py — Shared validation/normalization helpers

  • walker_lock.py — Active session lock enforcement helpers

  • walker_session.py — Session save/list/load and run-state restore handlers

  • walker_cache_lock.py — Reach-cache rebuild + lock heartbeat/release handlers

  • walker_ops.py — Walk/combine/reach/select/export/package/analysis handlers

  • walker_types.py — TypedDict response contracts for extracted walker handler modules

Backend service modules (services/):

  • corpus_loader.py — Delegates to syllable_walk.db.load_syllables

  • combiner_runner.py — Delegates to name_combiner.combiner

  • selector_runner.py — Policy caching and delegation to name_selector

  • walk_generator.py — Walk generation with profile routing and seed offsets

  • metrics.py — Corpus shape metrics with length bucketing and terrain scores

  • packager.py — ZIP archive building with manifest and disk persistence

  • pipeline_runner.py — Background subprocess execution with cancellation

  • pipeline_manifest.py — Manifest IPC verification helpers

  • profile_reaches_cache.py — Reach profile cache read/write/verify helpers

  • walker_run_state_store.py — Authoritative run-local IPC sidecars for patch outputs

  • walker_session_store.py — Session artifact save/list/load/verify with lineage metadata

  • walker_session_lock.py — Cooperative single-user multi-tab lock leases (UX integrity)

  • session_paths.py — Runtime resolution of sessions base and session file paths

Frontend modules (static/js/walker/):

  • corpus.js — Orchestrator for Walk tab corpus/session behavior

  • corpus-api.js — Fetch wrappers for walker/session endpoints

  • corpus-state.js — In-memory UI state model

  • corpus-render.js — Hash/verification/rebuild/compare visual rendering

  • corpus-tooltips.js — Integrity/lock badge helpers and modal content

  • corpus-actions-session.js — Save/load/repair/takeover/release session actions

  • corpus-actions-cache.js — Rebuild reach-cache action wiring

  • corpus-contracts.js — Shared JSDoc typedef contracts for frontend payloads

  • controls.js / reach.js / operations.js — Walk, reach, combine/select/package controls and endpoint operations

Discovery and state:

  • run_discovery.py — Manifest-driven run discovery, selection discovery, and History payload shaping (status, timings, stage state, IPC hashes)

  • state.py: PatchState, PipelineJobState, and ServerState dataclasses

Server (server.py):

  • stdlib http.server.ThreadingHTTPServer for concurrent XHR

  • Static file serving with directory-traversal guard

  • Route dispatch into API modules

  • Lazy API imports to avoid circular dependencies
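To make the traversal guard and route dispatch concrete, here is a minimal sketch. ROUTES, resolve_static, and dispatch are hypothetical names, the version value is a placeholder, and the 404 payload is an assumption; the real server.py may be structured differently.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical route table: (method, path) -> handler returning a JSON-able dict.
ROUTES: dict[tuple[str, str], Callable[[], dict]] = {
    ("GET", "/api/version"): lambda: {"version": "0.0.0"},  # placeholder value
}


def resolve_static(static_root: Path, url_path: str) -> Optional[Path]:
    """Map a URL path to a file under static_root, or None if it escapes the root."""
    root = static_root.resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    # Reject anything that resolves outside the static root (e.g. "../" tricks).
    if candidate != root and root not in candidate.parents:
        return None
    return candidate


def dispatch(method: str, path: str) -> tuple[int, dict]:
    """Look up an API handler; unknown routes get an (assumed) 404 payload."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, {"error": "not found"}
    return 200, handler()


print(resolve_static(Path("static"), "/../server.py"))  # escapes the root
print(dispatch("GET", "/api/version"))
```

Resolving both the root and the candidate before comparing them is what defeats "../" sequences and symlinked segments.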

Run Discovery

The server scans a base directory for run folders matching: YYYYMMDD_HHMMSS_{extractor}.

  • GET /api/pipeline/runs uses output_base by default.

  • GET /api/pipeline/runs?patch=a and ?patch=b use corpus_dir_a / corpus_dir_b when configured.

Discovery is strict and manifest-first:

  • Run folders must contain manifest.json.

  • manifest.json must include required keys and run_id must match folder name.

  • Missing/corrupt/non-conformant manifests are skipped (no legacy fallback parsing).

For each valid run, discovery reports:

  • folder/run id and extractor type

  • status and run timestamps

  • stage status map (extract/normalize/annotate/database)

  • manifest-derived metrics (including syllable count and processed-file count)

  • artifact paths (including corpus_db_path / annotated JSON when present)

  • IPC hashes (input/output) from manifest

  • selection file map by name class
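The folder-name rule YYYYMMDD_HHMMSS_{extractor} can be checked with a small parser like the one below. The regular expression, the allowed extractor characters, and the name parse_run_folder are assumptions for illustration only.

```python
import re
from datetime import datetime
from typing import Optional

# YYYYMMDD_HHMMSS_{extractor}; the extractor character set is an assumption.
RUN_DIR_RE = re.compile(r"^(\d{8}_\d{6})_([A-Za-z0-9]+)$")


def parse_run_folder(name: str) -> Optional[tuple[datetime, str]]:
    """Return (timestamp, extractor) for a run folder name, or None if it does not match."""
    match = RUN_DIR_RE.match(name)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    except ValueError:
        return None  # right shape, but not a real date/time
    return stamp, match.group(2)


print(parse_run_folder("20250102_030405_pyphen"))
print(parse_run_folder("packages"))
```

A name that passes this check is only a candidate: discovery still requires a conformant manifest.json inside the folder.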

Pipeline Execution Model

Pipeline execution runs in a background thread via services/pipeline_runner.py. Stages are subprocess-backed and logged line-by-line to job state:

  1. extract (always)

  2. normalize (if run_normalize=True)

  3. annotate (if run_annotate=True and normalize ran)

  4. database (runs after annotate; executes build_tools.corpus_sqlite_builder --force)

Status is polled through GET /api/pipeline/status and includes: status, current_stage, progress_percent, output_path, and structured log lines.
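The stage ordering above reduces to a small planning function. plan_stages is a hypothetical name, and the sketch assumes the database stage is skipped whenever annotate does not run, which the list implies but does not state outright.

```python
def plan_stages(run_normalize: bool = True, run_annotate: bool = True) -> list[str]:
    """Return the ordered stage names for one pipeline job."""
    stages = ["extract"]  # extract always runs
    if run_normalize:
        stages.append("normalize")
        if run_annotate:  # annotate requires that normalize ran
            stages.append("annotate")
            # Assumption: database only runs after a completed annotate stage.
            stages.append("database")
    return stages


print(plan_stages())
print(plan_stages(run_normalize=True, run_annotate=False))
print(plan_stages(run_normalize=False, run_annotate=True))
```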

Corpus Loading and Walker Readiness

POST /api/walker/load-corpus performs two phases:

  1. Synchronous data load: uses services/corpus_loader.load_corpus, which delegates to build_tools.syllable_walk.db.load_syllables (SQLite preferred, JSON fallback).

  2. Background walker init: builds SyllableWalker and resolves profile reaches via run-local IPC cache.

Profile reach caching is run-directory local:

  • Cache path: <run_dir>/ipc/walker_profile_reaches.v1.json

  • Cache schema: build_tools/syllable_walk_web/schemas/walker_profile_reaches.v1.schema.json

  • Cache key material:

    • manifest IPC output hash (from <run_dir>/manifest.json)

    • walker graph settings (neighbor distance, inertia, feature costs)

    • reach settings (threshold + named profile parameters)

  • On cache hit, precomputed reaches are loaded.

  • On miss/invalid cache, reaches are recomputed and cache is rewritten.
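One plausible way to turn the cache key material into a single comparable key is to hash a canonical JSON encoding of it. The function name, the field names, and the choice of SHA-256 are assumptions; the actual cache format is defined by the schema file named above.

```python
import hashlib
import json


def reach_cache_key(manifest_output_hash: str, graph_settings: dict, reach_settings: dict) -> str:
    """Derive one hex digest from the three kinds of key material."""
    material = {
        "manifest_ipc_output_hash": manifest_output_hash,
        "graph": graph_settings,
        "reach": reach_settings,
    }
    # sort_keys plus fixed separators give the same bytes for equal content.
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key = reach_cache_key(
    "abc123",  # placeholder hash value
    {"neighbor_distance": 3, "inertia": 0.5},
    {"threshold": 0.001},
)
print(key)
```

Any change to the corpus hash or to a walker or reach setting yields a different key, which is what forces the miss-and-recompute path.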

The frontend polls GET /api/walker/stats until walker_ready=true. During load, loading_stage reports phase progress (e.g., building neighbor graph). The stats payload also includes reach_cache_status per patch (hit | miss | invalid | error | none) to make cache behavior explicit in diagnostics.

Important readiness guarantees:

  • Reach precomputation completes before walker_ready is set true.

  • loader_status and loading_error expose terminal failure states explicitly.

  • Load concurrency is guarded by per-patch generation tokens, so stale background threads cannot overwrite newer corpus loads.
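A client-side polling loop for the readiness contract might look like this. wait_until_ready is a hypothetical helper; fetch_stats stands in for whatever performs GET /api/walker/stats and returns the decoded JSON, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def wait_until_ready(fetch_stats: Callable[[], dict], patch: str = "a",
                     interval: float = 0.5, timeout: float = 60.0) -> dict:
    """Poll until the patch reports walker_ready; raise on terminal error or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        patch_state = fetch_stats()[f"patch_{patch}"]
        if patch_state.get("loader_status") == "error":
            # Terminal failure: surface loading_error instead of polling forever.
            raise RuntimeError(patch_state.get("loading_error") or "corpus load failed")
        if patch_state.get("walker_ready"):
            return patch_state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"patch {patch} not ready after {timeout} seconds")
        time.sleep(interval)


# Simulated server responses, used here instead of a live HTTP call.
responses = iter([
    {"patch_a": {"loader_status": "loading", "walker_ready": False}},
    {"patch_a": {"loader_status": "ready", "walker_ready": True}},
])
print(wait_until_ready(lambda: next(responses), "a", interval=0))
```

Checking loader_status before walker_ready means a failed load stops the loop immediately rather than running into the timeout.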

Candidate Generation Modes

POST /api/walker/combine supports two modes:

  • Flat sampling (default; profile absent or "flat"): delegates to name_combiner.combine_syllables with frequency_weight.

  • Walk-based sampling (named profile or "custom"): generates walks first, then aggregates features from walked syllables.

The response includes generated, unique, and duplicates counts.
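The mode switch and the response counts can be summarized in a few lines. Both function names are hypothetical, and the sketch assumes duplicates equals generated minus unique, which the text above does not spell out.

```python
from typing import Optional


def combine_mode(profile: Optional[str]) -> str:
    """Absent profile or "flat" selects flat sampling; anything else is walk-based."""
    return "flat" if profile in (None, "flat") else "walk"


def dedup_stats(candidates: list[str]) -> dict:
    """Count generated, unique, and duplicate candidate names."""
    unique = len(set(candidates))
    return {
        "generated": len(candidates),
        "unique": unique,
        # Assumption: duplicates is the number of repeated entries.
        "duplicates": len(candidates) - unique,
    }


print(combine_mode(None), combine_mode("custom"))
print(dedup_stats(["kalo", "mira", "kalo"]))
```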

Dual-Patch Comparison

The Walker tab supports loading two independent corpora into Patch A and Patch B. Each patch maintains its own:

  • Annotated syllable data and frequency map

  • Walker instance (with pre-computed neighbor graph)

  • Generated walks, candidates, and selections

This enables side-by-side comparison of different extractors, languages, or source texts.

API Authority

The web frontend is presentation and UX only. The API is the behavioral authority for validation and execution semantics.

  • Frontend checks (for example min_length <= max_length) are UX helpers.

  • API handlers enforce the same constraints for all clients (UI and non-UI).

  • Requests that fail contract validation return JSON {"error": ...} with HTTP 400.

  • Backend response contracts for extracted walker handlers are declared in api/walker_types.py (TypedDict models).

  • Frontend request/response contract aliases are centralized in static/js/walker/corpus-contracts.js (JSDoc typedefs) and reused by corpus/session modules.

Examples of API-authoritative behavior:

  • POST /api/walker/walk validates numeric constraints including neighbor_limit, min_length, and max_length.

  • POST /api/pipeline/start validates min_syllable_length / max_syllable_length ranges server-side.

  • GET /api/walker/name-classes is the source of truth for selector class options (UI options are populated from this endpoint).
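Server-side validation for POST /api/walker/walk could be sketched as below, using the numeric rules listed under Walker Endpoint Contract Details. The function name, the error message wording, and the choice to treat every field as optional are assumptions.

```python
from typing import Optional


def validate_walk_request(body: dict) -> Optional[dict]:
    """Return None when valid, else a {"error": ...} payload to send with HTTP 400."""
    def is_int(value) -> bool:
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    minimums = {"count": 1, "steps": 0, "max_flips": 1,
                "neighbor_limit": 1, "min_length": 1, "max_length": 1}
    for field, minimum in minimums.items():
        if field in body and (not is_int(body[field]) or body[field] < minimum):
            return {"error": f"{field} must be an integer >= {minimum}"}
    if "min_length" in body and "max_length" in body and body["min_length"] > body["max_length"]:
        return {"error": "min_length must be <= max_length"}
    if "temperature" in body:
        temp = body["temperature"]
        if isinstance(temp, bool) or not isinstance(temp, (int, float)) or temp <= 0:
            return {"error": "temperature must be > 0"}
    seed = body.get("seed")
    if seed is not None and not is_int(seed):
        return {"error": "seed must be an integer or null"}
    return None


print(validate_walk_request({"count": 5, "steps": 4, "seed": 42}))
print(validate_walk_request({"min_length": 6, "max_length": 3}))
```

Because the check lives in the handler, a script calling the API directly gets the same rejection as the browser UI.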

Walker State Model

GET /api/walker/stats returns independent status for patch_a and patch_b. Each patch reports loader_status plus readiness/error metadata.

loader_status | Meaning
idle | No active load thread. Patch may be empty, or may have prior corpus metadata without a currently running initialization.
loading | Corpus load generation is in progress. loading_stage reports the current phase (for example "Building neighbour graph").
ready | Walker and pre-computed reaches are available; walker_ready=true.
error | Current load generation failed. loading_error contains terminal error text.

Response fields per patch include:

  • corpus (active run_id)

  • corpus_type (nltk or pyphen)

  • syllable_count

  • walker_ready, loading_stage, loading_error, loader_status

  • has_walks, has_candidates, has_selections

  • reaches (when available; includes reach count and computation timing)

Patch Isolation and Race Safety

Patch A and Patch B are fully isolated in server state.

  • Loading a corpus resets only the target patch state.

  • Walks/candidates/selections from one patch never overwrite the other patch.

  • Loader concurrency is generation-token guarded:

    • each load-corpus increments load_generation;

    • background init writes are applied only if generation is still current;

    • stale loader threads exit without mutating patch state.

This prevents rapid corpus switches from producing stale overwrite races.
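The generation-token guard can be illustrated with a stripped-down model. PatchSlot, begin_load, and finish_load are hypothetical stand-ins for the real PatchState logic, and the walker is represented by a plain string.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatchSlot:
    load_generation: int = 0
    walker: Optional[str] = None  # stand-in for a real walker instance
    lock: threading.Lock = field(default_factory=threading.Lock)


def begin_load(slot: PatchSlot) -> int:
    """Start a new load: bump the generation and reset only this patch."""
    with slot.lock:
        slot.load_generation += 1
        slot.walker = None
        return slot.load_generation


def finish_load(slot: PatchSlot, generation: int, walker: str) -> bool:
    """Apply a background result only if its generation is still current."""
    with slot.lock:
        if generation != slot.load_generation:
            return False  # stale loader: exit without mutating patch state
        slot.walker = walker
        return True


slot = PatchSlot()
first = begin_load(slot)
second = begin_load(slot)  # the user switches corpus before the first load ends
print(finish_load(slot, first, "walker-for-old-corpus"))
print(finish_load(slot, second, "walker-for-new-corpus"))
print(slot.walker)
```

Comparing the token inside the same lock that guards the write is what closes the window between the check and the update.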

Determinism and Seed Behavior

  • Walk generation is deterministic for fixed request parameters and seed.

  • Batched walks use seed + i per walk to keep outputs deterministic while still varying entries within one request.

  • Flat combiner and selector paths accept explicit seed values for deterministic output ordering/sampling.

  • Without a seed, behavior remains valid but non-deterministic between runs.
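The seed + i scheme is easy to demonstrate with a toy generator. The uniform random.choice below is only a stand-in for the real walker, and producing steps + 1 syllables per walk is an assumption of this sketch.

```python
import random
from typing import Optional


def generate_walks(syllables: list[str], count: int, steps: int,
                   seed: Optional[int]) -> list[list[str]]:
    """Generate `count` toy walks; walk i draws from its own RNG seeded with seed + i."""
    walks = []
    for i in range(count):
        # Without a seed, each walk gets a fresh, OS-seeded generator.
        rng = random.Random(seed + i) if seed is not None else random.Random()
        walks.append([rng.choice(syllables) for _ in range(steps + 1)])
    return walks


pool = ["ka", "lo", "mi", "ra"]
batch = generate_walks(pool, count=3, steps=4, seed=42)
print(batch == generate_walks(pool, count=3, steps=4, seed=42))
```

Giving each walk its own offset seed keeps a batch reproducible while avoiding identical walks within one request.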

API Endpoints

Endpoint | Method | Description
/api/pipeline/runs | GET | List discovered runs; supports ?patch=a|b for per-patch run roots
/api/pipeline/status | GET | Get pipeline job status, progress, and log lines
/api/pipeline/start | POST | Start extraction pipeline (source path, extractor, and optional stage/constraint fields)
/api/pipeline/cancel | POST | Cancel a running pipeline job
/api/browse-directory | POST | Browse a filesystem directory (for source/output selection)
/api/walker/stats | GET | Get dual-patch state (loaded corpora, loader/cache status, readiness, reach metadata)
/api/walker/analysis/{patch} | GET | Corpus shape metrics for a patch (terrain scores, distributions)
/api/walker/name-classes | GET | List available name class policies from name_classes.yml
/api/walker/load-corpus | POST | Load a run’s corpus into a patch (builds walker in background)
/api/walker/sessions | GET | List saved dual-patch sessions with verification and lock metadata
/api/walker/save-session | POST | Persist current patch assignments as one immutable session revision
/api/walker/load-session | POST | Load one saved session, verify references, restore trusted sidecars
/api/walker/walk | POST | Generate syllable walks with validated constraints and optional seed
/api/walker/combine | POST | Generate candidates (flat mode or walk-based mode), returns deduplication stats
/api/walker/reach-syllables | POST | Return top reachable syllables for one profile/patch (reach deep-dive tables)
/api/walker/select | POST | Select names by policy (name class, mode, count)
/api/walker/export | POST | Export selected names as a list
/api/walker/package | POST | Build ZIP archive with manifest (binary response) and persist package files to disk
/api/walker/rebuild-reach-cache | POST | Recompute and rewrite reach-cache IPC artifact for one loaded patch
/api/walker/session-lock/heartbeat | POST | Refresh active session lock lease for one holder
/api/walker/session-lock/release | POST | Release active session lock lease for one holder
/api/settings | GET | Get current server settings (resolved output_base and sessions_base)
/api/settings/output-base | POST | Update the output base directory
/api/version | GET | Return package version for UI header display

The web server uses Python’s standard library http.server (no Flask dependency).

Common Request Fields

Key request bodies for current API routes:

  • For mutating Walker endpoints (load-corpus, walk, combine, select, package, rebuild-reach-cache), include lock_holder_id when operating against an actively locked session.

Endpoint | Important request fields
POST /api/pipeline/start | source_path (required), output_dir (optional), extractor (default pyphen), language (default auto), file_pattern (default *.txt), min_syllable_length/max_syllable_length (defaults 2/8), run_normalize/run_annotate (default true/true)
POST /api/walker/load-corpus | patch (a/b), run_id (required non-empty string), optional lock_holder_id (required when active session is lock-guarded)
POST /api/walker/save-session | optional label, optional session_id (explicit id mode), optional repair_from_session_id (immutable revision mode), optional lock_holder_id (required when active session is lock-guarded)
POST /api/walker/load-session | session_id (required), optional lock_holder_id (recommended for lock-coordinated multi-tab flows; required when using force_lock), optional force_lock (take-over flow)
POST /api/walker/walk | patch, count, steps, seed, optional profile. Custom constraints are always accepted: max_flips, temperature, frequency_weight, neighbor_limit, min_length, max_length. API validates ranges (for example min_length <= max_length). Include lock_holder_id when session lock is active.
POST /api/walker/combine | patch, count, syllables (int or list), seed. Flat mode: frequency_weight. Walk mode: profile (named or custom); custom supports max_flips, temperature, frequency_weight. Include lock_holder_id when session lock is active.
POST /api/walker/reach-syllables | patch and profile (must match one of the precomputed profile keys)
POST /api/walker/select | patch, name_class, count, mode (hard/soft), order (alphabetical/random), seed. Include lock_holder_id when session lock is active.
POST /api/walker/package | name, version, include flags: include_walks_a, include_walks_b, include_candidates, include_selections. Include lock_holder_id when session lock is active.
POST /api/walker/rebuild-reach-cache | patch (required), optional run_id (must match loaded patch context if provided), optional lock_holder_id (required when session lock is active)
POST /api/walker/session-lock/heartbeat | session_id and lock_holder_id (both required)
POST /api/walker/session-lock/release | session_id and lock_holder_id (both required)
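As an illustration of the request fields listed above, the sketch below assembles a POST /api/pipeline/start request with the standard library. It assumes the API accepts a JSON body with a Content-Type: application/json header; build_start_request, the base URL, and the source path are hypothetical.

```python
import json
import urllib.request


def build_start_request(base_url: str, source_path: str, **overrides) -> urllib.request.Request:
    """Build (but do not send) a pipeline start request with the documented defaults."""
    body = {
        "source_path": source_path,  # the only required field
        "extractor": "pyphen",
        "language": "auto",
        "file_pattern": "*.txt",
        "min_syllable_length": 2,
        "max_syllable_length": 8,
        "run_normalize": True,
        "run_annotate": True,
    }
    body.update(overrides)
    return urllib.request.Request(
        f"{base_url}/api/pipeline/start",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


req = build_start_request("http://localhost:8000", "/path/to/texts", extractor="nltk")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running server.
```

Spelling out the defaults in the client is optional, since the server applies them, but it makes the payload self-describing in logs.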

Walker Endpoint Contract Details

Endpoint | Contract and validation rules | Success payload highlights
GET /api/walker/stats | No request body. Returns state for both patches, including loader_status and optional reaches map when available. Includes patch_comparison with corpus_hash_relation and policy semantics. | patch_a / patch_b objects with corpus, readiness, loading/error fields, has_* flags, and top-level patch_comparison.
POST /api/walker/load-corpus | Requires patch in {"a","b"} and non-empty run_id. Errors for invalid patch, missing run, or corpus load failure. If active session lock is set, requires matching lock_holder_id. | patch, run_id, corpus_type, syllable_count, source, status="loading".
GET /api/walker/sessions | No request body. Lists saved session artifacts ordered newest-first. Includes verification, lineage, and lock metadata. | sessions list with session_id, patch run ids, verification status/reason, root_session_id, parent_session_id, revision, lock_status, lock.
POST /api/walker/save-session | Saves current patch references as session IPC artifact. session_id and repair_from_session_id are mutually exclusive. If active session lock is set, requires matching lock_holder_id. | status, reason, session_id, per-patch save status/reason, ipc_input_hash, ipc_output_hash, lineage fields.
POST /api/walker/load-session | Requires session_id. lock_holder_id is optional but recommended for lock-coordinated multi-tab flows; force_lock requires lock_holder_id and enables explicit take-over. Verifies session artifact, loads referenced patch runs, restores only verified run-state sidecars. Stale hash-drift session payloads may be loaded for continuity but remain explicitly integrity-signaled. | Per-patch loaded/restored/restored_kinds and verification_status/verification_reason, plus session_lock block and recovered_from_stale_session flag.
POST /api/walker/walk | Requires ready walker for target patch. Validates numeric fields: count >= 1, steps >= 0, max_flips >= 1, neighbor_limit >= 1, min_length >= 1, max_length >= 1, min_length <= max_length, temperature > 0, and integer-or-null seed. If active session lock is set, requires matching lock_holder_id. | patch and walks (each walk includes formatted, syllables, steps).
POST /api/walker/combine | Requires loaded corpus. profile controls mode: absent/flat uses flat combiner; named/custom profile uses walker path and requires walker readiness. If active session lock is set, requires matching lock_holder_id. | generated, unique, duplicates, syllables, source.
POST /api/walker/reach-syllables | Requires precomputed reaches and valid profile key for target patch. Errors if reach data or walker is not ready. | profile, reach, total, unique_reachable, syllables list.
POST /api/walker/select | Requires existing candidates. Validates patch and delegates policy validation to selector service (unknown name class returns error). If active session lock is set, requires matching lock_holder_id. | name_class, mode, count, requested, names.
POST /api/walker/export | Requires prior selection output for target patch. | patch, count, names.
POST /api/walker/package | Accepts package metadata and include flags. Builds ZIP from in-memory state. If active session lock is set, requires matching lock_holder_id. | Binary ZIP response with attachment filename <name>-<version>.zip.
POST /api/walker/rebuild-reach-cache | Requires loaded walker and patch context. Optional run_id must match loaded patch run when provided. If active session lock is set, requires matching lock_holder_id. | status="rebuilt", patch, run_id, cache IPC hashes, verification status/reason.
POST /api/walker/session-lock/heartbeat | Requires session_id and lock_holder_id. Returns held for active lease, missing when lease absent, and error payload on conflicts. | status/reason and lock payload when available.
POST /api/walker/session-lock/release | Requires session_id and lock_holder_id. Release succeeds only for the current lock owner. | status/reason and released lock payload when available.

Pipeline Configure ↔ API Mapping

The Pipeline Configure tab now maps directly to POST /api/pipeline/start:

Configure control | Request field / behavior
Source picker (directory or file) | source_path (required)
Output picker | output_dir (optional). If not selected, server default output_base is used.
Extractor (pyphen / nltk) | extractor
Language radios + custom language code | language. For pyphen, custom code overrides radio value; for nltk, frontend sends "auto".
File pattern | file_pattern
Min / Max syllable length | min_syllable_length / max_syllable_length (frontend validates min <= max and API rejects invalid ranges/types)
Normalize toggle | run_normalize
Annotate toggle | run_annotate (frontend enforces annotate requires normalize)

Pipeline Output ↔ API Mapping

Monitor and History views consume pipeline API responses as follows:

UI output area | API field(s) used
Monitor status/progress/log | GET /api/pipeline/status: status, current_stage, progress_percent, log_lines
Monitor completion message | GET /api/pipeline/status: output_path (shown when available)
Monitor stage chips | current_stage + requested stage toggles from start payload
History run list | GET /api/pipeline/runs: run_id, path, timestamp, extractor_type, syllable_count, status
History run detail metadata | source_path, files_processed, processing_time, created_at_utc, completed_at_utc (from manifest.json)
History output tree | output_tree_lines (manifest artifact list rendered as a deterministic tree)
History database stage chip | stage_statuses.database
History stage chips (all stages) | stage_statuses.extract|normalize|annotate|database
History IPC hash fields | ipc_input_hash, ipc_output_hash (compact display + full tooltip)

Walker Controls ↔ API Mapping

Walk, Combine, and Select controls map to Walker endpoints as follows.

Walker control | API field | Runtime effect
Patch selector (A/B context) | patch | Routes request to isolated patch state.
Walk count / steps | count / steps | Sets number of generated walks and walk length.
Walk profile cards | profile | Named profile uses tuned walker profile path; custom uses explicit slider/spinner fields.
Walk max flips / temperature / frequency | max_flips / temperature / frequency_weight | Controls walker transition behavior in custom mode.
Walk neighbors | neighbor_limit | Limits candidate neighbors evaluated per step.
Walk min/max length | min_length / max_length | Constrains syllable-length eligibility for starts/transitions.
Walk seed | seed | Enables deterministic walk batches (internally offset per walk).
Combine profile cards | profile on /api/walker/combine | Chooses flat combiner mode vs walk-based generation mode.
Combine count/syllables/seed | count / syllables / seed | Controls candidate volume, name length classes, and deterministic sampling.
Selector class dropdown | name_class on /api/walker/select | Applies selected policy from name_classes.yml.
Selector mode/order/count/seed | mode / order / count / seed | Controls strictness, output ordering, and deterministic random ordering.

History Manifest Contract

History discovery is strict manifest-first (no legacy fallback parsing):

  • Run directory must contain manifest.json.

  • Manifest must include required contract keys: manifest_version, run_id, status, extractor, config, metrics, stages, artifacts.

  • run_id must match the run directory name.

  • Missing/corrupt/non-conformant manifests are skipped by discovery.

This keeps the run directory as the single source of truth and avoids cross-file drift between legacy metadata files and API payloads.
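The manifest contract above translates into a short acceptance check. parse_manifest is a hypothetical helper that takes the manifest text and the run folder name; the real discovery code may perform further validation.

```python
import json
from typing import Optional

REQUIRED_KEYS = ("manifest_version", "run_id", "status", "extractor",
                 "config", "metrics", "stages", "artifacts")


def parse_manifest(text: str, folder_name: str) -> Optional[dict]:
    """Return the manifest dict if it satisfies the contract, else None (run is skipped)."""
    try:
        manifest = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return None
    if not isinstance(manifest, dict):
        return None
    if any(key not in manifest for key in REQUIRED_KEYS):
        return None
    if manifest["run_id"] != folder_name:
        return None  # run_id must match the run directory name
    return manifest


good = {key: {} for key in REQUIRED_KEYS}
good["run_id"] = "20250102_030405_pyphen"
print(parse_manifest(json.dumps(good), "20250102_030405_pyphen") is not None)
print(parse_manifest("{not json", "20250102_030405_pyphen"))
```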

Pipeline Manifest and IPC

Each pipeline run writes <run_dir>/manifest.json as the canonical run record.

High-value fields used by History and diagnostics:

  • status plus created_at_utc / completed_at_utc

  • config and metrics (including files_processed and unique syllable count)

  • stages (per-stage status and duration)

  • artifacts (deterministic run output inventory)

  • ipc block:

    • input_hash from canonical run configuration

    • output_hash from canonical artifact+metric payload

    • library metadata (version/ref) for provenance

Patch A/B Session IPC, Locks, and Rebuild Semantics

Session and patch restoration now use authoritative IPC artifacts:

  • Run-level state artifact:

    • <run_dir>/ipc/walker_run_state.v1.json

  • Patch output sidecars (written and verified per run):

    • <run_dir>/ipc/patch_a_walks.v1.json

    • <run_dir>/ipc/patch_a_candidates.v1.json

    • <run_dir>/ipc/patch_a_selections.v1.json

    • <run_dir>/ipc/patch_a_package.v1.json

    • same pattern for Patch B (patch_b_*)

  • Session artifact (runtime sessions base, not hardcoded to _working):

    • <sessions_base>/<session_id>.json

    • sessions_base resolves from explicit config override or defaults to <output_base>/sessions

Session lineage fields:

  • root_session_id: immutable origin session id

  • parent_session_id: immediate source session id for repaired revisions

  • revision: integer revision counter (original is 0)
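The lineage fields can be modeled with an immutable record. This sketch assumes that an original session is its own root, that a repair increments revision by one, and that ids are random hex strings; SessionLineage, new_session, and repair are hypothetical names.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionLineage:
    session_id: str
    root_session_id: str
    parent_session_id: Optional[str]
    revision: int


def new_session(session_id: Optional[str] = None) -> SessionLineage:
    """Create an original session: it is its own root and has revision 0."""
    sid = session_id or uuid.uuid4().hex  # assumed id format
    return SessionLineage(sid, root_session_id=sid, parent_session_id=None, revision=0)


def repair(source: SessionLineage) -> SessionLineage:
    """Create a repaired revision with a new id; the source record is left untouched."""
    return SessionLineage(
        session_id=uuid.uuid4().hex,
        root_session_id=source.root_session_id,
        parent_session_id=source.session_id,
        revision=source.revision + 1,
    )


original = new_session("session-0001")
fixed = repair(original)
print(fixed.root_session_id, fixed.parent_session_id, fixed.revision)
```

Freezing the dataclass mirrors the immutability of saved session artifacts: a repair never edits its source in place.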

Verification status semantics (API authority):

  • verified: all relevant IPC links/hashes are valid and trusted

  • mismatch: artifact exists but linkage/hash verification failed

  • missing: artifact or required hash fields are absent

  • error: parse/read/validation/internal failure

Stale session recovery vs repair:

  • Recovery on load-session is intentionally narrow: hash-drift mismatch can be loaded for continuity if the raw payload is readable.

  • Recovery does not auto-upgrade trust. The result remains integrity-signaled (stale/mismatch) until repaired.

  • Repair creates a new immutable revision (new session_id with lineage) and preserves prior artifact history.

Cooperative lock model:

  • Endpoints:

    • POST /api/walker/session-lock/heartbeat

    • POST /api/walker/session-lock/release

  • Lock lease TTL is currently 45 seconds and refreshed by heartbeat.

  • load-session acquires lock with lock_holder_id and optional force_lock (take-over flow).

  • Mutating endpoints enforce active session lock ownership.

  • This is an integrity/UX coordination mechanism for single-user multi-tab use, not a security or authorization boundary.
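A cooperative lease with a 45-second TTL, heartbeat refresh, and take-over could be modeled in memory as follows. SessionLock and its methods are hypothetical; only the TTL value and the held/missing/error heartbeat outcomes come from this document.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

LOCK_TTL_SECONDS = 45.0


@dataclass
class Lease:
    holder_id: str
    expires_at: float


class SessionLock:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lease: Optional[Lease] = None

    def _active(self) -> Optional[Lease]:
        """Return the current lease only while it has not expired."""
        if self._lease and self._lease.expires_at > self._clock():
            return self._lease
        return None

    def acquire(self, holder_id: str, force: bool = False) -> bool:
        lease = self._active()
        if lease and lease.holder_id != holder_id and not force:
            return False  # another holder owns a live lease
        self._lease = Lease(holder_id, self._clock() + LOCK_TTL_SECONDS)
        return True

    def heartbeat(self, holder_id: str) -> str:
        lease = self._active()
        if lease is None:
            return "missing"
        if lease.holder_id != holder_id:
            return "error"  # conflict: someone else holds the lease
        lease.expires_at = self._clock() + LOCK_TTL_SECONDS
        return "held"

    def release(self, holder_id: str) -> bool:
        lease = self._active()
        if lease is None or lease.holder_id != holder_id:
            return False  # only the current owner may release
        self._lease = None
        return True


lock = SessionLock()
print(lock.acquire("tab-a"), lock.acquire("tab-b"), lock.acquire("tab-b", force=True))
print(lock.heartbeat("tab-a"))
```

Injecting the clock keeps expiry testable without sleeping, and the TTL means a closed tab releases its lease implicitly once heartbeats stop.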

Patch comparison and rebuild policy decisions:

  • GET /api/walker/stats exposes:

    • patch_comparison.corpus_hash_relation: same | different | unknown

    • patch_comparison.policy: currently warn | none

  • Current product policy keeps compare mode as warn-only/no-policy (no block mode yet).

  • POST /api/walker/rebuild-reach-cache is already an explicit rebuild action. We intentionally do not expose a separate force/invalidate mode at this stage.

Manual QA Checklist (Phase 5)

Use this checklist when validating Walker session IPC behavior:

  1. Two-tab lock conflict:

    • Load same session in tab A and tab B with different holders.

    • Confirm tab B shows lock conflict and cannot mutate without take-over.

    • Take over in tab B and confirm tab A heartbeats/release reflect loss of ownership.

  2. Stale recovery and immutable repair:

    • Create a stale-session condition (hash drift), then load session.

    • Confirm load is continuity-tolerant but explicitly marked stale/mismatch.

    • Run repair and verify new session_id with incremented lineage revision.

    • Confirm original session artifact remains unchanged.

  3. Rebuild reach-cache states:

    • Trigger rebuild and verify transition through guidance states (for example rebuilding -> rebuilt/verified).

    • Validate status handling for verified, recommended, missing, error.

    • Confirm IPC hashes update after successful rebuild.

  4. Session list and detail integrity:

    • Verify GET /api/walker/sessions shows verification, lineage, and lock metadata.

    • Confirm UI labels and run detail reflect backend verification outputs exactly.

  5. Regression safety:

    • Validate walk/combine/select/package flows still work when session features are unused.

    • Validate pipeline tab behavior is unchanged.

Notes

Dependencies:

  • Uses standard library http.server for the web interface (no Flask)

  • Uses subprocess for pipeline stage execution

  • Requires NumPy for efficient feature matrix operations (build-time dependency)

Troubleshooting:

Port Already in Use:

When --port is omitted, the server auto-discovers an available port, checking 8000-8099 first and then 8100-8999. If a specific port is requested with --port and is unavailable, the server fails with an error message.

# Auto-discover (tries 8000, 8001, 8002, ...)
python -m build_tools.syllable_walk_web

# Specific port (fails if unavailable)
python -m build_tools.syllable_walk_web --port 9000

No Runs Found:

If no runs are discovered in the Walker tab, ensure you have pipeline output directories in the configured output base, or use the Pipeline tab to run an extraction first. If patch-specific run roots are configured (corpus_dir_a / corpus_dir_b), verify those paths contain timestamped run directories with valid manifest.json files.

# Check for existing runs
ls _working/output/

# Or run the pipeline from the web UI's Pipeline tab

Walker Load Fails or Stalls:

Use GET /api/walker/stats as the source of truth:

  • If loader_status="loading", inspect loading_stage for current phase.

  • If loader_status="error", inspect loading_error and retry load.

  • walker_ready=true means walks/reaches are ready for that patch.
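The checks above can be folded into a small helper that turns one patch's status fields into a one-line diagnosis. This sketch assumes the four fields arrive together in a per-patch mapping. The exact response layout of /api/walker/stats is not shown here, and the stage name in the sample is made up.

```python
from typing import Any, Mapping


def describe_patch_status(patch: Mapping[str, Any]) -> str:
    """Summarise one patch's loader fields (hypothetical helper)."""
    status = patch.get("loader_status")
    if status == "loading":
        return f"loading: {patch.get('loading_stage') or 'stage not reported'}"
    if status == "error":
        detail = patch.get("loading_error") or "no detail reported"
        return f"error: {detail}; retry the load"
    if patch.get("walker_ready"):
        return "ready"
    return "not loaded"


# Sample payload; "building walker" is an illustrative stage name.
sample = {"loader_status": "loading", "loading_stage": "building walker", "walker_ready": False}
summary = describe_patch_status(sample)
```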

Common causes:

  • Run directory missing required artifacts (the manifest references files that are not present)

  • Corrupt/unreadable SQLite/JSON artifacts

  • Incompatible or malformed run directory copied into output roots

Rapid Corpus Switching (Race-Safe Behavior):

Loading a new run while a previous load is in progress is supported. The server uses per-patch load generations and accepts writes only from the current generation. Older background loads are ignored, preventing stale state from overwriting the newly selected corpus.

If you switch repeatedly:

  • trust the latest selected run in the UI;

  • use /api/walker/stats to confirm final corpus and loader_status.
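The load-generation guard described above can be sketched as a counter that each load captures when it starts and presents again when it commits. `PatchSlot` and its methods are hypothetical. The real PatchState has more fields and may coordinate differently.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatchSlot:
    """Hypothetical stand-in for one patch's loader state."""

    load_generation: int = 0
    run_id: Optional[str] = None
    _guard: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def begin_load(self) -> int:
        """Start a new load and return the generation it must commit with."""
        with self._guard:
            self.load_generation += 1
            return self.load_generation

    def commit(self, generation: int, run_id: str) -> bool:
        """Accept the result only if no newer load has started since."""
        with self._guard:
            if generation != self.load_generation:
                return False  # stale background load: ignore its result
            self.run_id = run_id
            return True


slot = PatchSlot()
first = slot.begin_load()    # user selects run-old
second = slot.begin_load()   # user switches to run-new before run-old finishes
stale_accepted = slot.commit(first, "run-old")
fresh_accepted = slot.commit(second, "run-new")
```

The older load still runs to completion in this design. Its result is simply discarded at commit time, which avoids having to cancel background work.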

Name Class Dropdown Empty or Unexpected:

Selector classes come from GET /api/walker/name-classes. If the dropdown is empty or stale:

  • verify API route availability and server health;

  • verify data/name_classes.yml exists and is valid YAML;

  • reload the page after fixing policy file issues.

Package Persistence Warnings:

The package endpoint always returns a ZIP download when package generation succeeds. Disk persistence to <output_base>/packages/ is best-effort; permission/path issues are logged as warnings on the server side and do not block the download.
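A best-effort write of this kind is commonly structured as below: the disk copy is attempted, failures are logged as warnings, and the ZIP bytes are returned either way. The function names and logger name are illustrative and are not the packager's actual API.

```python
import logging
from pathlib import Path

logger = logging.getLogger("package-sketch")  # illustrative logger name


def persist_best_effort(zip_bytes: bytes, packages_dir: Path, filename: str) -> bool:
    """Try to save a copy on disk; report failure without raising."""
    try:
        packages_dir.mkdir(parents=True, exist_ok=True)
        (packages_dir / filename).write_bytes(zip_bytes)
        return True
    except OSError as exc:  # permission problems, bad paths, full disk, ...
        logger.warning("Could not persist package to %s: %s", packages_dir, exc)
        return False


def build_download(zip_bytes: bytes, packages_dir: Path, filename: str) -> bytes:
    """Return the download body whether or not the disk copy succeeded."""
    persist_best_effort(zip_bytes, packages_dir, filename)
    return zip_bytes
```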

Build-time tool:

This is a build-time analysis tool only; it is not used during runtime name generation.

Related Documentation:

API Reference

class build_tools.syllable_walk_web.CorpusBuilderHandler(request, client_address, server)[source]

Bases: BaseHTTPRequestHandler

HTTP request handler for the Corpus Builder web app.

Serves static files from the static/ directory and routes /api/* requests to the appropriate handlers.

do_GET()[source]

Handle GET requests.

Return type:

None

do_POST()[source]

Handle POST requests.

Return type:

None

log_message(format, *args)[source]

Override to respect verbose flag.

Return type:

None

server_version = 'PipeWorksCorpusBuilder/0.1'
service_log_label: str = 'syllable-walk-web'
state: ServerState = ServerState(
    patch_a=PatchState(
        run_id=None, corpus_type=None, corpus_dir=None, syllable_count=0,
        walker=None, walker_ready=False, loading_stage=None,
        load_generation=0, active_load_generation=None, loading_error=None,
        manifest_ipc_input_hash=None, manifest_ipc_output_hash=None,
        manifest_ipc_verification_status=None, manifest_ipc_verification_reason=None,
        reach_cache_status=None,
        reach_cache_ipc_input_hash=None, reach_cache_ipc_output_hash=None,
        reach_cache_ipc_verification_status=None, reach_cache_ipc_verification_reason=None,
        profile_reaches=None, annotated_data=None, frequencies=None,
        walks=[], candidates=None, candidates_path=None, selections_path=None, selected_names=[]),
    patch_b=PatchState(
        run_id=None, corpus_type=None, corpus_dir=None, syllable_count=0,
        walker=None, walker_ready=False, loading_stage=None,
        load_generation=0, active_load_generation=None, loading_error=None,
        manifest_ipc_input_hash=None, manifest_ipc_output_hash=None,
        manifest_ipc_verification_status=None, manifest_ipc_verification_reason=None,
        reach_cache_status=None,
        reach_cache_ipc_input_hash=None, reach_cache_ipc_output_hash=None,
        reach_cache_ipc_verification_status=None, reach_cache_ipc_verification_reason=None,
        profile_reaches=None, annotated_data=None, frequencies=None,
        walks=[], candidates=None, candidates_path=None, selections_path=None, selected_names=[]),
    pipeline_job=PipelineJobState(
        job_id=None, status='idle', config=None, current_stage=None, progress_percent=0,
        log_lines=[], output_path=None, error_message=None, process=None),
    output_base=PosixPath('_working/output'), sessions_base=None,
    corpus_dir_a=None, corpus_dir_b=None,
    walker_session_locks={}, walker_session_locks_guard=<unlocked _thread.lock object>,
    active_session_id=None, active_session_lock_holder_id=None)
verbose: bool = True
build_tools.syllable_walk_web.find_available_port(start=8000, max_tries=100)[source]

Find an available port starting from start.

Tries ports start through start + max_tries - 1. Returns the first available port, or None if none found.

Return type:

int | None
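A port probe with this contract is often written by attempting to bind each candidate in turn. The sketch below is an assumption about the general technique, not the module's actual implementation. `probe_available_port` and its `host` parameter are hypothetical.

```python
import socket
from typing import Optional


def probe_available_port(
    start: int = 8000, max_tries: int = 100, host: str = "127.0.0.1"
) -> Optional[int]:
    """Return the first port in [start, start + max_tries) that can be bound."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy or not permitted; try the next one
            return port
    return None
```

The probe socket is closed before the function returns, so another process could still claim the port before the server binds it. Callers should be prepared for the later bind to fail.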

build_tools.syllable_walk_web.run_server(port=None, verbose=True, output_base=None, sessions_dir=None, corpus_dir_a=None, corpus_dir_b=None)[source]

Start the HTTP server.

Parameters:
  • port (int | None) – Port to listen on. If None, checks 8000-8099 first, then 8100-8999.

  • verbose (bool) – If True, log HTTP requests to stderr.

  • output_base (Path | None) – Base path for pipeline run discovery. Defaults to _working/output.

  • sessions_dir (Path | None) – Optional explicit directory for saved walker sessions. Defaults to None (callers derive output_base/sessions).

  • corpus_dir_a (str | None) – Run discovery directory for Patch A.

  • corpus_dir_b (str | None) – Run discovery directory for Patch B.

Returns:

Exit code: 0 for clean shutdown, 1 for error.