Syllable Walker

Overview

Syllable Walker - Phonetic Feature Space Exploration

The syllable walker is a phonetic exploration tool that generates sequences of syllables by “walking” through phonetic feature space using cost-based random selection. It enables corpus analysis, pattern discovery, and exploration of phonetic relationships. This is a build-time analysis tool only - not used during runtime name generation.

The walker explores syllable datasets by moving probabilistically from one syllable to phonetically similar syllables. Each step considers:

  • Phonetic distance - How many features change (Hamming distance)

  • Frequency bias - Preference for common vs rare syllables

  • Temperature - Amount of randomness in selection

  • Inertia - Tendency to stay at current syllable

Key Features:

  • Four pre-configured profiles (clerical, dialect, goblin, ritual)

  • Custom parameter control for fine-tuned exploration

  • Deterministic walks (same seed = same walk, reproducible)

  • Batch processing to generate thousands of walks for analysis

  • Fast operation (<10ms per walk after initialization)

  • Large corpus support (efficiently handles 500k+ syllables)

Main Components:

  • SyllableWalker: Core walking algorithm with efficient neighbor graph

  • WalkProfile: Configuration preset for different walking behaviors

  • WALK_PROFILES: Predefined profiles (clerical, dialect, goblin, ritual)

Usage:
>>> from build_tools.syllable_walk import SyllableWalker
>>>
>>> # Load annotated syllables
>>> walker = SyllableWalker("data/annotated/syllables_annotated.json")
>>>
>>> # Walk using a profile
>>> walk = walker.walk_from_profile(
...     start="ka",
...     profile="dialect",
...     steps=5,
...     seed=42
... )
>>>
>>> # Display walk sequence
>>> print(" → ".join(s["syllable"] for s in walk))
ka → ki → ti → ta → da → de

CLI Usage:

# Walk with a profile
python -m build_tools.syllable_walk data.json --start ka --profile dialect --steps 5

# Batch walks for analysis
python -m build_tools.syllable_walk data.json --batch 100 --profile ritual

# For web interface, use the separate syllable_walk_web module:
python -m build_tools.syllable_walk_web

Core Concepts

Phonetic Distance

Each syllable has 12 binary phonetic features (from syllable_feature_annotator). The distance between two syllables is the number of features that differ (Hamming distance). The max_flips parameter limits how many features can change in a single step.
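The distance calculation can be illustrated with a minimal sketch. The helper name hamming_distance and the two feature vectors are invented for illustration; they are not taken from the package or from real annotator output.

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length binary feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 12-feature vectors (values are made up, not real annotations).
ka = (0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1)
ki = (0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1)

distance = hamming_distance(ka, ki)
print(distance)        # two features differ
print(distance <= 2)   # such a step would be allowed with max_flips=2
```

With max_flips=1 the same pair would not be a valid single step, because two features change at once.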

Neighbor Graph

During initialization, the walker pre-computes which syllables are “neighbors” (within the specified Hamming distance). This enables fast walk generation:

  • Distance 1: ~30 sec initialization, conservative walks

  • Distance 2: ~1 min initialization, moderate walks

  • Distance 3: ~3 min initialization, maximum flexibility

For 500k+ syllable datasets, distance 3 is recommended.
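As a rough illustration of what the pre-computation produces, the sketch below builds a neighbor graph by brute force over a toy corpus. It is not the walker's implementation (which is described as using NumPy and efficient operations); the function name build_neighbor_graph and the 4-feature vectors are assumptions made for brevity.

```python
from itertools import combinations


def build_neighbor_graph(features, max_distance):
    """Map each index to the indices within max_distance feature flips.

    Brute force over all pairs (O(N^2)); fine for a toy corpus only.
    """
    graph = {i: [] for i in range(len(features))}
    for i, j in combinations(range(len(features)), 2):
        distance = sum(x != y for x, y in zip(features[i], features[j]))
        if distance <= max_distance:
            graph[i].append(j)
            graph[j].append(i)
    return graph


# Toy vectors with 4 features instead of 12, invented for illustration.
features = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 1)]

print(build_neighbor_graph(features, max_distance=1))
print(build_neighbor_graph(features, max_distance=2))
```

Raising the distance from 1 to 2 adds edges to every node in this toy corpus, which is the same trade-off the initialization times above reflect: more neighbors per syllable, more work up front.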

Determinism

The same seed always produces the same walk. This is essential for reproducible experiments, testing, and debugging. Each walk uses an isolated RNG instance to avoid global state contamination.
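The isolation property can be demonstrated with the standard library alone. The function below is a stand-in, not the walker itself: it only shows that a local random.Random(seed) instance gives repeatable draws without touching the module-level generator.

```python
import random


def seeded_draws(seed, count=5):
    """Draw numbers from a private RNG; the global random state is untouched."""
    rng = random.Random(seed)
    return [rng.randrange(100) for _ in range(count)]


random.seed(0)
state_before = random.getstate()

first = seeded_draws(42)
second = seeded_draws(42)

print(first == second)                      # same seed, same sequence
print(random.getstate() == state_before)    # global generator was not advanced
```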

Walk Structure

Invariant: A syllable walk always produces one more syllable than the number of steps, as each step represents a transition (edge) between syllables (vertices).

Steps   Syllables Produced   Example
0       1                    Starting syllable only (no transitions)
1       2                    Start → one neighbor
5       6                    Start → 5 transitions
10      11                   Start → 10 transitions

This follows from graph theory: a path with n edges connects n+1 vertices.

Walk Profiles

The walker includes four pre-configured profiles:

Profile    Description                    Steps  Max Flips  Temperature  Freq Weight  Use Case
clerical   Conservative, minimal change   5      1          0.3          1.0          Formal names
dialect    Balanced exploration           5      2          0.7          0.0          General use
goblin     Chaotic, high variation        5      2          1.5          -0.5         Exotic names
ritual     Maximum exploration            5      3          2.5          -1.0         Extreme variation

Frequency Weight controls syllable selection:

  • Positive values (e.g. 1.0) favor common syllables

  • Zero (0.0) is neutral

  • Negative values (e.g. -1.0) favor rare syllables

Temperature controls randomness:

  • Low (0.3) = more deterministic, prefer lowest-cost moves

  • High (2.5) = more random, explore high-cost moves
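The effect of temperature can be seen by applying the softmax weighting exp(-cost / temperature) to a fixed set of costs. This is a sketch only; the function name and the three cost values are invented for illustration.

```python
import math


def softmax_probabilities(costs, temperature):
    """Turn costs into selection probabilities: lower cost, higher probability."""
    weights = [math.exp(-cost / temperature) for cost in costs]
    total = sum(weights)
    return [weight / total for weight in weights]


costs = [0.5, 1.0, 2.0]   # hypothetical step costs

cold = softmax_probabilities(costs, temperature=0.3)
hot = softmax_probabilities(costs, temperature=2.5)

print([round(p, 3) for p in cold])   # mass concentrates on the cheapest move
print([round(p, 3) for p in hot])    # distribution is much flatter
```

At the low temperature the cheapest move takes most of the probability mass, while at the high temperature the most expensive move remains a realistic choice.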

Command-Line Interface

Explore syllable feature space via cost-based random walks

usage: python -m build_tools.syllable_walk [-h] [--start SYLLABLE]
                                           [--profile NAME] [--steps N]
                                           [--seed SEED] [--max-flips N]
                                           [--temperature T]
                                           [--frequency-weight W]
                                           [--compare-profiles] [--batch N]
                                           [--search QUERY] [--output FILE]
                                           [--quiet] [--verbose]
                                           [--max-neighbor-distance N]
                                           data_file

Positional Arguments

data_file

Path to syllables_annotated.json file (output of syllable_feature_annotator). This file contains syllables with phonetic features and frequency information. Example: data/annotated/syllables_annotated.json

walk parameters

Parameters controlling syllable walk behavior. These work with any mode except --search.

--start

Starting syllable for the walk. If not specified, a random syllable will be chosen. Must be a syllable present in the data file. Use --search to find valid syllables. Examples: ‘ka’, ‘bak’, ‘the’. Default: random syllable

--profile

Possible choices: clerical, dialect, goblin, ritual

Walk profile preset defining behavior characteristics. Available profiles: clerical (conservative, favors common syllables), dialect (balanced exploration, neutral frequency), goblin (chaotic, favors rare syllables), ritual (maximum exploration, very rare syllables). Each profile has predefined max_flips, temperature, and frequency_weight values. Can be overridden with custom parameters. Default: dialect

Default: 'dialect'

--steps

Number of steps to take in the walk. Each step visits one syllable. Output length will be steps + 1 (includes starting syllable). Valid range: 0-1000. Examples: 5 (quick walk), 20 (longer exploration). Default: 5

Default: 5

--seed

Random seed for reproducible walks. Same seed with same parameters always produces identical walks. This is useful for testing, debugging, or generating consistent examples. If not specified, uses system randomness (non-reproducible). Examples: 42, 12345. Default: None (random)

custom parameters

Advanced parameters that override profile settings. Use these to fine-tune walk behavior beyond predefined profiles.

--max-flips

Possible choices: 1, 2, 3

Maximum number of phonetic features that can change per step. This controls the Hamming distance constraint between consecutive syllables. Higher values allow more dramatic phonetic changes. Valid values: 1 (very conservative), 2 (moderate), 3 (maximum). Must be <= max-neighbor-distance. Overrides profile setting. Examples: 1 for minimal change, 3 for maximum variation. Default: determined by profile

--temperature

Exploration temperature controlling randomness (0.1-5.0). Higher values increase randomness and exploration, making the walk more likely to choose high-cost transitions. Lower values make walks more deterministic, strongly preferring low-cost moves. Overrides profile setting. Typical values: 0.3 (conservative), 0.7 (balanced), 1.5 (exploratory), 2.5 (chaotic). Default: determined by profile

--frequency-weight

Frequency bias weight (-2.0 to 2.0). Controls whether the walk favors common or rare syllables. Positive values: Favor common syllables (e.g., 1.0 strongly favors common). Zero: Neutral, no frequency bias. Negative values: Favor rare syllables (e.g., -1.0 strongly favors rare). Overrides profile setting. Examples: 1.0 (prefer common), 0.0 (neutral), -1.0 (prefer rare). Default: determined by profile

operation modes

Different modes of operation. These modes are mutually exclusive. If no mode is specified, performs a single walk.

--compare-profiles

Compare all four walk profiles from the same starting syllable. Generates one walk for each profile (clerical, dialect, goblin, ritual) using the same seed (if specified), allowing direct comparison of different behaviors. The --profile argument is ignored in this mode. Output shows walks side-by-side with profile descriptions. Useful for understanding profile differences.

Default: False

--batch

Generate N walks in batch mode. Each walk starts from a random syllable (unless --start is specified, then all walks start from the same syllable). Useful for statistical analysis, corpus exploration, or generating large datasets. Combine with --output to save results to JSON file. Progress is shown during generation. Examples: --batch 100 for analysis, --batch 1000 for corpus stats. Valid range: 1-10000

--search

Search for syllables matching the query string. Performs case-insensitive substring match against all syllables in the dataset. Shows up to 20 matches with frequency information. Useful for finding valid starting syllables or exploring corpus contents. Does not perform walk generation. Examples: --search ‘th’ finds ‘the’, ‘thi’, ‘tha’, etc. --search ‘ka’ finds ‘ka’, ‘kan’, ‘kaf’, etc.

output options

Control output format, destination, and verbosity.

--output

Save results to JSON file instead of printing to console. Parent directories will be created if they don’t exist. Output format depends on mode: single walk saves walk details with profile and seed info; batch mode saves array of walks with metadata. File can be used for further analysis or visualization. Examples: --output results/walks.json, --output batch_data.json

--quiet

Suppress progress messages and verbose output. Only prints final results or errors. Useful for scripting, piping output, or when running in automated environments. Cannot be combined with --verbose. Progress bars and initialization messages are hidden in quiet mode.

Default: False

--verbose

Enable verbose output showing initialization progress, neighbor graph construction details, and detailed walk information. Shows memory usage, processing time, and intermediate steps. Useful for understanding performance, debugging, or learning how the walker works. Cannot be combined with --quiet. Significantly increases output volume.

Default: False

walker configuration

Advanced configuration for the walker engine. These settings affect initialization time and memory usage.

--max-neighbor-distance

Possible choices: 1, 2, 3

Maximum Hamming distance for pre-computing neighbor graph (1-3). During initialization, the walker computes which syllables are ‘neighbors’ (similar in phonetic features). Higher values allow larger --max-flips but significantly increase initialization time and memory usage. Should be >= largest --max-flips you plan to use. Initialization time (500k syllables): ~30 sec (1), ~1 min (2), ~3 min (3). Memory impact: ~50MB (1), ~150MB (2), ~300MB (3). Default: 3 (recommended for maximum flexibility)

Default: 3

# Generate a single walk with default profile (dialect)
python -m build_tools.syllable_walk data.json --start ka

# Use specific profile
python -m build_tools.syllable_walk data.json --start bak --profile goblin --steps 10

# Compare all profiles from same starting point
python -m build_tools.syllable_walk data.json --start ka --compare-profiles

# Generate batch of 50 walks and save to JSON
python -m build_tools.syllable_walk data.json --batch 50 --profile ritual --output walks.json

# Search for syllables containing "th"
python -m build_tools.syllable_walk data.json --search "th"

# Custom walk parameters (overrides profile)
python -m build_tools.syllable_walk data.json --start ka --steps 10 \
    --max-flips 2 --temperature 1.5 --frequency-weight -0.8 --seed 42

For interactive web interface, use the separate module:

python -m build_tools.syllable_walk_web
python -m build_tools.syllable_walk_web --port 9000

For detailed documentation, see: claude/build_tools/syllable_walk.md

Integration Guide

The syllable walker uses output from the feature annotator and/or the corpus database builder. It automatically discovers pipeline run directories from _working/output/.

Recommended Workflow:

# Step 1: Extract and normalize syllables
python -m build_tools.pyphen_syllable_extractor --file wordlist.txt
python -m build_tools.pyphen_syllable_normaliser \
  --run-dir _working/output/20260110_115453_pyphen/

# Step 2: Annotate with phonetic features
python -m build_tools.syllable_feature_annotator \
  --syllables _working/output/20260110_115453_pyphen/pyphen_syllables_unique.txt \
  --frequencies _working/output/20260110_115453_pyphen/pyphen_syllables_frequencies.json

# Step 3: (Optional) Build SQLite database for faster loading
python -m build_tools.corpus_sqlite_builder \
  --run-dir _working/output/20260110_115453_pyphen/

# Step 4: Explore syllable walks (choose one interface)

# CLI-based exploration
python -m build_tools.syllable_walk \
  _working/output/20260110_115453_pyphen/data/pyphen_syllables_annotated.json \
  --start ka --profile dialect --steps 10

# Web interface (separate module)
python -m build_tools.syllable_walk_web
# Auto-discovers port starting at 8000
# Shows all available run directories with selection counts

When to use this tool:

  • To explore phonetic connectivity in your syllable corpus

  • To compare different extractors (pyphen vs NLTK) and their phonetic behaviors

  • To test if desired phonetic transitions exist before creating patterns

  • To discover interesting phonetic progressions for name generation

  • To batch-generate walks for analysis

For browsing name selections and interactive web-based exploration, see Syllable Walker Web.

Advanced Topics

Algorithm Details

Cost Function:

Each potential step has a cost based on:

  1. Hamming distance - Number of features that change

  2. Feature-specific costs - Some features cost more to change

  3. Frequency weight - Bias toward common or rare syllables

  4. Inertia - Tendency to stay at current syllable

The walker uses softmax selection with temperature to probabilistically choose the next syllable:

For each neighbor n:
  hamming_cost = sum(feature_costs[i] for i where features differ)
  freq_cost = -frequency_weight × log(frequency[n])
  total_cost = hamming_cost + freq_cost + inertia_cost

Probability of selecting n:
  P(n) = exp(-cost(n) / temperature) / sum(exp(-cost(k) / temperature))

Higher temperature = more random selection (flattens probability distribution)

Lower temperature = more deterministic (strongly favors lowest cost)
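The cost function and softmax selection can be put together in a small end-to-end sketch. Everything here is illustrative: the toy corpus, the per-feature costs, and the function names are invented, and the sign of the frequency term is an assumption chosen so that a positive frequency weight lowers the cost of common syllables, matching the documented behavior. It is not the package's implementation.

```python
import math
import random

# Toy corpus, invented for illustration (real vectors have 12 features).
FEATURES = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
FREQUENCIES = [100, 10, 1]
GRAPH = {0: [1], 1: [0, 2], 2: [1]}   # neighbors within one feature flip
FEATURE_COSTS = (1.0, 1.0, 2.0)       # hypothetical per-feature flip costs


def step_costs(current, frequency_weight, inertia_cost=0.5):
    """Return (candidates, costs) for one step; the last candidate is 'stay put'."""
    candidates, costs = [], []
    for n in GRAPH[current]:
        flip_cost = sum(
            cost
            for cost, a, b in zip(FEATURE_COSTS, FEATURES[current], FEATURES[n])
            if a != b
        )
        # Assumed sign: a positive weight lowers the cost of common syllables.
        freq_cost = -frequency_weight * math.log(FREQUENCIES[n])
        candidates.append(n)
        costs.append(flip_cost + freq_cost)
    candidates.append(current)      # inertia option
    costs.append(inertia_cost)
    return candidates, costs


def toy_walk(start, steps, temperature, frequency_weight, seed):
    """Softmax-sample a path of syllable indices; returns steps + 1 entries."""
    rng = random.Random(seed)       # isolated RNG, as described under Determinism
    path = [start]
    for _ in range(steps):
        candidates, costs = step_costs(path[-1], frequency_weight)
        weights = [math.exp(-cost / temperature) for cost in costs]
        path.append(rng.choices(candidates, weights=weights, k=1)[0])
    return path


path = toy_walk(start=0, steps=5, temperature=0.7, frequency_weight=0.0, seed=42)
print(path, len(path))
```

Because staying put is one of the candidates, consecutive entries in the path may repeat; the path length is still steps + 1.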

Performance

Walk Generation:

  • After initialization: <10ms per walk (instant)

  • Deterministic: Same seed always produces same walk

  • Scalable: Speed independent of corpus size

Initialization:

The neighbor graph must be built on startup, which takes time depending on max_neighbor_distance:

  • Distance 1: ~30 sec initialization

  • Distance 2: ~1 min initialization

  • Distance 3: ~3 min initialization (recommended for large corpora)

Notes

Dependencies:

  • Requires NumPy for efficient feature matrix operations (build-time dependency)

Troubleshooting:

Invalid Start Syllable:

If you get an error about an unknown syllable, use --search to find valid syllables:

# Search for syllables containing "th"
python -m build_tools.syllable_walk data.json --search "th"

Build-time tool:

This is a build-time analysis tool only - not used during runtime name generation.

Related Documentation:

For detailed usage guide, see: claude/build_tools/syllable_walk.md

API Reference

class build_tools.syllable_walk.ReachResult(profile_name, reach, total, threshold, max_flips, temperature, frequency_weight, computation_ms, unique_reachable=0, reachable_indices=())

Bases: object

Result of a thermodynamic reach computation for a single profile.

Encapsulates both the reach count and the full context of how it was computed, including the profile parameters and timing metadata.

profile_name

Name of the profile (e.g., “clerical”, “dialect”).

reach

Mean number of syllables reachable per starting node (rounded). This is the primary micro signal — the average effective vocabulary size at each step of a walk under this profile’s constraints.

total

Total syllables in the corpus (the “field” size).

threshold

Probability threshold used for the reachability test. A syllable is counted if p > threshold from the starting node.

max_flips

Profile’s max_flips parameter (edge existence constraint).

temperature

Profile’s temperature parameter (probability shape).

frequency_weight

Profile’s frequency_weight parameter (rarity bias).

computation_ms

Wall-clock time for this profile’s computation in milliseconds. Captured as metadata to monitor performance across different systems and corpus sizes.

unique_reachable

Total unique syllables reachable from at least one starting node (union across all nodes). This is supplementary context — the mean per-node count (reach) is the primary metric displayed in the UI.

reachable_indices

Tuple of (syllable_index, reachability_count) pairs for all syllables in the union reachable set, sorted by reachability count descending (most commonly reachable first). The count is how many starting nodes can reach that syllable. Maps to syllable text via walker.syllables[idx]. Omitted from to_dict() to keep API responses lean.

Example

>>> result = ReachResult(
...     profile_name="dialect",
...     reach=32,
...     total=2088,
...     threshold=0.001,
...     max_flips=2,
...     temperature=0.7,
...     frequency_weight=0.0,
...     computation_ms=42.5,
...     unique_reachable=1850,
... )
>>> result.reach
32
computation_ms: float
frequency_weight: float
max_flips: int
profile_name: str
reach: int
reachable_indices: tuple[tuple[int, int], ...] = ()
temperature: float
threshold: float
to_dict()

Serialise to a plain dictionary for API responses.

Return type:

dict[str, Any]

Returns:

Dictionary with all fields, suitable for JSON serialisation.

Example

>>> result.to_dict()
{'profile_name': 'dialect', 'reach': 32, 'total': 2088, ...}
total: int
unique_reachable: int = 0
class build_tools.syllable_walk.SyllableWalker(data_path, max_neighbor_distance=3, feature_costs=None, inertia_cost=0.5, verbose=False)

Bases: object

Navigate syllable feature space via cost-based random walks.

This class efficiently handles large syllable datasets (500k+) by pre-computing neighbor relationships and using vectorized operations where possible.

The walker performs a one-time expensive computation during initialization to build a neighbor graph, mapping each syllable to nearby syllables within a maximum Hamming distance. After initialization, walk generation is extremely fast (<10ms per walk).

syllables

List of all syllable strings

frequencies

NumPy array of syllable frequencies (uint32)

feature_matrix

NumPy array of binary feature vectors (N x 12, uint8)

syllable_to_idx

Dict mapping syllable text to index

neighbor_graph

Dict mapping syllable index to list of neighbor indices

max_neighbor_distance

Maximum Hamming distance for neighbors

feature_costs

Dict of costs for each feature flip

inertia_cost

Cost of staying at current syllable

Example

>>> walker = SyllableWalker("syllables_annotated.json", verbose=True)
>>> walk = walker.walk_from_profile(
...     start="ka", profile="dialect", steps=5, seed=42
... )
>>> print(walker.format_walk(walk))
ka → ki → ti → ta → da → de

Notes

  • Initialization time: ~2-3 minutes for 500k syllables

  • Walk generation: <10ms per walk after initialization

  • Memory usage: ~200-300 MB for 500k syllables

  • Thread safety: Not thread-safe (use separate instances)

__init__(data_path, max_neighbor_distance=3, feature_costs=None, inertia_cost=0.5, verbose=False)

Initialize the syllable walker with pre-computed neighbor graph.

Parameters:
  • data_path (Path | str) – Path to syllables_annotated.json file (output of syllable_feature_annotator)

  • max_neighbor_distance (int) – Maximum Hamming distance for pre-computing neighbors (1-3). Higher values = more neighbors = slower initialization + more memory, but allows larger feature flips per step. Default: 3 (recommended)

  • feature_costs (dict[str, float] | None) – Custom feature cost dictionary. If None, uses DEFAULT_FEATURE_COSTS. Keys must match FEATURE_KEYS.

  • inertia_cost (float) – Cost of staying at current syllable. Higher values discourage staying put. Default: 0.5

  • verbose (bool) – If True, print progress during initialization (neighbor graph construction can take 2-3 minutes for 500k syllables)

Raises:

Notes

  • Initialization performs expensive one-time computation

  • Use verbose=True for long-running initializations

  • Consider caching the neighbor graph (future optimization)

format_walk(walk, arrow=' → ')

Format a walk as a string with arrows.

Parameters:
  • walk (list[dict]) – Walk result from walk() or walk_from_profile()

  • arrow (str) – Separator between syllables (default: “ → ”)

Return type:

str

Returns:

Formatted walk string

Example

>>> walk = walker.walk_from_profile("ka", "dialect", steps=5, seed=42)
>>> walker.format_walk(walk)
'ka → ki → ti → ta → da → de'
>>> walker.format_walk(walk, arrow=" -> ")
'ka -> ki -> ti -> ta -> da -> de'

classmethod from_data(data, max_neighbor_distance=3, feature_costs=None, inertia_cost=0.5, verbose=False, progress_callback=None)

Create a SyllableWalker from in-memory data.

This is useful when syllable data is loaded from a source other than a JSON file (e.g., SQLite database).

Parameters:
  • data (list[dict]) – List of syllable records, each with keys: ‘syllable’, ‘frequency’, ‘features’ (dict of bool values)

  • max_neighbor_distance (int) – Maximum Hamming distance for neighbors (1-3)

  • feature_costs (dict[str, float] | None) – Custom feature cost dictionary

  • inertia_cost (float) – Cost of staying at current syllable

  • verbose (bool) – If True, print progress during initialization

  • progress_callback (Callable[[str], None] | None) – Optional callable invoked with a progress message string during neighbor graph construction. Used by the web UI to show live loading progress to the user.

Return type:

SyllableWalker

Returns:

Initialized SyllableWalker instance

Example

>>> data = [
...     {"syllable": "ka", "frequency": 100,
...      "features": {"starts_with_vowel": False, ...}}
... ]
>>> walker = SyllableWalker.from_data(data, verbose=True)

get_available_profiles()

Get all available walk profiles.

Return type:

dict[str, WalkProfile]

Returns:

Dictionary mapping profile names to WalkProfile objects

Example

>>> profiles = walker.get_available_profiles()
>>> for name in profiles:
...     print(name)
clerical
dialect
goblin
ritual

get_random_syllable(seed=None, min_length=None, max_length=None)

Get a random syllable from the dataset.

Parameters:
  • seed (int | None) – Random seed for reproducibility (default: None)

  • min_length (int | None) – Optional minimum syllable length filter.

  • max_length (int | None) – Optional maximum syllable length filter.

Return type:

str

Returns:

Random syllable text

Raises:

ValueError – If length constraints are invalid or no syllables match.

Example

>>> walker.get_random_syllable(seed=42)
'ka'
>>> walker.get_random_syllable(seed=42)
'ka'  # Same seed = same result

get_syllable_info(syllable)

Get information about a specific syllable.

Parameters:

syllable (str) – Syllable text to look up

Returns:

Syllable dictionary with keys syllable, frequency, features. Returns None if syllable not found.

Return type:

dict | None

Example

>>> info = walker.get_syllable_info("ka")
>>> if info:
...     print(f"Frequency: {info['frequency']}")
Frequency: 1234

walk(start, steps, max_flips, temperature=1.0, frequency_weight=0.0, neighbor_limit=None, min_length=None, max_length=None, seed=None)

Execute a syllable walk through feature space.

Starting from a syllable, takes the requested number of steps through feature space, choosing each next syllable probabilistically based on:

  • Feature flip cost (weighted Hamming distance)

  • Frequency cost (rarity penalty/bonus)

  • Temperature (exploration vs exploitation)

  • Inertia (tendency to stay put)

The walk uses softmax selection over candidate neighbors:

  1. Find all neighbors within max_flips distance

  2. Compute cost for each neighbor (flip cost + rarity cost)

  3. Add inertia option (staying at current syllable)

  4. Apply softmax with temperature: weight_i = exp(-cost_i / T)

  5. Sample next syllable proportional to weights

Parameters:
  • start (int | str) – Starting syllable (syllable text or index)

  • steps (int) – Number of steps to take (each step visits one syllable)

  • max_flips (int) – Maximum feature flips allowed per step (1-3). Must be <= max_neighbor_distance from __init__.

  • temperature (float) – Exploration temperature (0.1-5.0). Higher values increase randomness. Typical values: 0.3 (conservative, minimal exploration), 0.7 (balanced), 1.5 (high exploration), 2.5 (maximum randomness).

  • frequency_weight (float) – Frequency bias (-2.0 to 2.0). Positive values favor common syllables, zero is neutral, and negative values favor rare syllables. Typical values: -1.0, 0.0, 1.0

  • neighbor_limit (int | None) – Optional cap on neighbor candidates considered at each step. None means use all neighbors.

  • min_length (int | None) – Optional minimum syllable character length allowed during traversal.

  • max_length (int | None) – Optional maximum syllable character length allowed during traversal.

  • seed (int | None) – Random seed for reproducibility. Same seed = same walk. If None, uses system randomness (non-reproducible).

Returns:

List of syllable dictionaries, each with keys:

  • “syllable”: Syllable text (str)

  • “frequency”: Corpus frequency (int)

  • “features”: Binary feature vector (list of 12 ints)

Length = steps + 1 (includes starting syllable)

Return type:

list[dict]

Raises:

Example

>>> walker = SyllableWalker("data.json")
>>> walk = walker.walk(
...     start="ka",
...     steps=5,
...     max_flips=2,
...     temperature=0.7,
...     frequency_weight=0.0,
...     seed=42
... )
>>> len(walk)
6  # start + 5 steps
>>> walk[0]["syllable"]
'ka'

Notes

  • Deterministic: Same seed always produces same walk

  • Uses local Random instance (doesn’t affect global random state)

  • Inertia option allows walk to stay at current syllable

walk_from_profile(start, profile, steps=5, neighbor_limit=None, min_length=None, max_length=None, seed=None)

Execute a walk using a named profile.

Convenience method that uses predefined WalkProfile parameters. See WALK_PROFILES for available profiles.

Parameters:
  • start (int | str) – Starting syllable (text or index)

  • profile (str | WalkProfile) – Profile name (“clerical”, “dialect”, “goblin”, “ritual”) or WalkProfile object

  • steps (int) – Number of steps to take (default: 5)

  • neighbor_limit (int | None) – Optional cap on neighbors considered per step.

  • min_length (int | None) – Optional minimum syllable length allowed.

  • max_length (int | None) – Optional maximum syllable length allowed.

  • seed (int | None) – Random seed for reproducibility (default: None)

Return type:

list[dict]

Returns:

List of syllable dictionaries (same as walk())

Raises:

ValueError – If profile name not found

Example

>>> walker = SyllableWalker("data.json")
>>> walk = walker.walk_from_profile("ka", "goblin", steps=10, seed=42)
>>> print(walker.format_walk(walk))
ka → kha → gha → ghe → ge → gwe → ...

class build_tools.syllable_walk.WalkProfile(name, description, max_flips, temperature, frequency_weight)

Bases: object

Configuration profile for a syllable walk.

A profile encapsulates all parameters needed for a walk, providing named presets for different behaviors.

name

Human-readable profile name (e.g., “Dialect Walk”)

description

Brief description of profile behavior

max_flips

Maximum feature flips allowed per step (1-3)

temperature

Exploration temperature (0.1-5.0)

frequency_weight

Frequency bias (-2.0 to 2.0)

Example

>>> profile = WalkProfile(
...     name="Custom Walk",
...     description="High temperature, neutral frequency",
...     max_flips=2,
...     temperature=2.0,
...     frequency_weight=0.0
... )
>>> print(profile)
Custom Walk: High temperature, neutral frequency

__str__()

String representation showing name and description.

Return type:

str

description: str
frequency_weight: float
max_flips: int
name: str
temperature: float
build_tools.syllable_walk.compute_all_reaches(walker, threshold=0.001, progress_callback=None)

Compute mean effective vocabulary for all four named walk profiles.

Iterates over the predefined profiles (clerical, dialect, goblin, ritual) and computes the mean per-node thermodynamic reach for each. Returns a dictionary mapping profile names to their ReachResult.

This is intended to be called once after the walker finishes initialising, typically in the background thread that builds the neighbor graph. The results are cached in PatchState and served via the stats endpoint.

Parameters:
  • walker (SyllableWalker) – Initialised SyllableWalker with pre-computed neighbor graph.

  • threshold (float) – Minimum transition probability for reachability. Default: 0.001. See DEFAULT_REACH_THRESHOLD for rationale.

  • progress_callback (Callable[[str], None] | None) – Optional callable invoked with a progress message after each profile is computed. Used by the web UI to show incremental reach results like "Computing reaches: clerical ~4, dialect ~32...".

Return type:

dict[str, ReachResult]

Returns:

Dictionary mapping profile name to ReachResult. Keys: "clerical", "dialect", "goblin", "ritual".

Example

>>> reaches = compute_all_reaches(walker)
>>> for name, r in reaches.items():
...     print(f"{name}: reach={r.reach}, time={r.computation_ms}ms")
clerical: reach=4, time=12.3ms
dialect: reach=32, time=15.1ms
goblin: reach=58, time=14.8ms
ritual: reach=147, time=18.2ms

Note

Custom profile reach is not computed here. See the TODO note in api/walker.py regarding on-demand computation for custom profiles.
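The progress_callback contract described above (a callable that receives one message string per completed profile) can be illustrated without a real walker. In this sketch `compute_all_stub` is a hypothetical stand-in, not the library function, and the reach values are copied from the example output on this page rather than computed.

```python
from typing import Callable, Dict, List, Optional

# Reach values copied from the example output above; nothing is computed here.
EXAMPLE_REACHES = {"clerical": 4, "dialect": 32, "goblin": 58, "ritual": 147}


def compute_all_stub(
    progress_callback: Optional[Callable[[str], None]] = None,
) -> Dict[str, int]:
    """Stand-in that only demonstrates when and how the callback is invoked."""
    results: Dict[str, int] = {}
    for name, reach in EXAMPLE_REACHES.items():
        results[name] = reach
        if progress_callback is not None:
            # One message per finished profile, listing everything so far.
            summary = ", ".join(f"{n} ~{r}" for n, r in results.items())
            progress_callback(f"Computing reaches: {summary}...")
    return results


messages: List[str] = []
results = compute_all_stub(progress_callback=messages.append)
```

Passing `messages.append` collects the messages in a list; a web UI would more likely pass a function that pushes each message to the client.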

build_tools.syllable_walk.compute_reach(walker, profile_name, max_flips, temperature, frequency_weight, threshold=0.001)

Compute mean effective vocabulary for a single profile.

For each syllable in the corpus, computes the softmax transition probability distribution over all neighbors within max_flips distance, using the profile’s temperature and frequency_weight. Counts how many neighbors exceed the probability threshold, then returns the mean of these per-node counts as the reach value.

This uses the same math as SyllableWalker.walk() (lines 526–549 of walker.py), but applies it exhaustively to every starting node rather than sampling a single stochastic path.

The computation is:
  1. For each starting syllable s:

     a. Collect all neighbors within max_flips Hamming distance.
     b. Compute the cost per neighbor: flip_cost + rarity_cost.
     c. Add the inertia option (staying at s) for normalisation.
     d. Apply softmax: weight_i = exp(-cost_i / temperature).
     e. Normalise to probabilities.
     f. Count other syllables (not s itself) with p > threshold.

     Inertia participates in normalisation but self-transitions do not count toward reach.

  2. Return the mean per-node count (rounded to nearest integer).
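The steps above can be sketched in a few lines. This is an illustrative sketch, not the library's implementation: `node_reach` and `mean_reach` are hypothetical names, and the per-neighbor costs are assumed to be precomputed totals (flip cost plus rarity cost) supplied as plain dictionaries instead of being derived from a neighbor graph.

```python
import math
from typing import Dict


def node_reach(
    neighbor_costs: Dict[str, float],
    inertia_cost: float,
    temperature: float,
    threshold: float = 0.001,
) -> int:
    """Count neighbors whose softmax transition probability exceeds threshold."""
    # Softmax weights: a lower cost gives a higher weight.
    weights = {
        syllable: math.exp(-cost / temperature)
        for syllable, cost in neighbor_costs.items()
    }
    # Inertia (staying put) contributes to the normalising sum only.
    total = sum(weights.values()) + math.exp(-inertia_cost / temperature)
    return sum(1 for w in weights.values() if w / total > threshold)


def mean_reach(
    per_node_costs: Dict[str, Dict[str, float]],
    inertia_cost: float,
    temperature: float,
    threshold: float = 0.001,
) -> int:
    """Mean of the per-node counts, rounded to an integer."""
    counts = [
        node_reach(costs, inertia_cost, temperature, threshold)
        for costs in per_node_costs.values()
    ]
    return round(sum(counts) / len(counts))


# Toy costs (made up for illustration): two cheap neighbors, one expensive one.
toy_costs = {"ki": 1.0, "ta": 1.0, "zu": 10.0}
low_temp = node_reach(toy_costs, inertia_cost=0.5, temperature=0.7)   # 2
high_temp = node_reach(toy_costs, inertia_cost=0.5, temperature=5.0)  # 3
```

With the toy costs, the expensive neighbor falls below the threshold at temperature 0.7 but clears it at 5.0, which is the effect the temperature parameter is described as having.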

Why mean-per-node instead of union?

The union approach (counting syllables reachable from any starting node) saturates to near-total for production corpora. With N=1,757 nodes and threshold=0.001, almost every syllable is reachable from at least one starting node, making reach ≈ total for any profile with max_flips ≥ 2. The mean-per-node count captures the effective vocabulary per step, which scales correctly with corpus size.
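A toy graph makes the saturation argument concrete. The sketch below uses a hypothetical ring of 100 nodes in which each node can effectively reach only its two adjacent nodes; the graph and the `union_and_mean` helper are invented for illustration.

```python
from typing import Dict, Set, Tuple


def union_and_mean(reachable: Dict[int, Set[int]]) -> Tuple[int, float]:
    """Return (size of the union of all reachable sets, mean set size)."""
    union: Set[int] = set().union(*reachable.values())
    mean = sum(len(targets) for targets in reachable.values()) / len(reachable)
    return len(union), mean


n = 100
# Each node reaches only its left and right neighbor on the ring.
ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
union_size, mean_size = union_and_mean(ring)
print(union_size, mean_size)  # 100 2.0
```

The union covers the whole graph even though every single step has only two realistic targets, so only the mean says anything about how constrained a walk is.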

Parameters:
  • walker (SyllableWalker) – Initialised SyllableWalker with pre-computed neighbor graph. Must have neighbor_graph, _flip_cost(), _rarity_cost(), _hamming_distance(), and inertia_cost available.

  • profile_name (str) – Human-readable name for the profile (e.g., “dialect”). Stored in the result for identification.

  • max_flips (int) – Maximum feature flips per step (1–3). Determines which edges in the neighbor graph are traversable.

  • temperature (float) – Softmax temperature (0.1–5.0). Controls the shape of the probability distribution. Low temperature concentrates probability on low-cost transitions; high temperature flattens the distribution toward uniform.

  • frequency_weight (float) – Frequency bias (-2.0 to 2.0). Positive values penalise rare syllables (favour common); negative values reward rare syllables (favour uncommon).

  • threshold (float) – Minimum transition probability for a syllable to be counted as “effectively reachable.” Default: 0.001.

Return type:

ReachResult

Returns:

ReachResult with the mean per-node reach count, corpus total, unique reachable count (union), and metadata.

Raises:

ValueError – If walker has no syllables loaded.

Example

>>> result = compute_reach(
...     walker, "dialect",
...     max_flips=2, temperature=0.7, frequency_weight=0.0,
... )
>>> print(f"Dialect reach: {result.reach} / {result.total}")
Dialect reach: 32 / 2088
build_tools.syllable_walk.get_profile(name)

Get a walk profile by name.

Parameters:

name (str) – Profile name (case-insensitive)

Return type:

WalkProfile

Returns:

WalkProfile object

Raises:

ValueError – If the profile name is not found.

Example

>>> profile = get_profile("goblin")
>>> profile.temperature
1.5
>>> profile = get_profile("GOBLIN")  # Case-insensitive
>>> profile.temperature
1.5
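A case-insensitive lookup of this kind is commonly written by normalising the key before indexing a registry. The sketch below assumes that approach; `_PROFILES` and `lookup_profile` are hypothetical names, and the registry holds only the two temperatures that appear in examples on this page.

```python
from typing import Dict

# Hypothetical registry: only temperatures quoted in this page's examples.
_PROFILES: Dict[str, Dict[str, float]] = {
    "dialect": {"temperature": 0.7},
    "goblin": {"temperature": 1.5},
}


def lookup_profile(name: str) -> Dict[str, float]:
    """Return the registry entry for name, ignoring case."""
    key = name.lower()
    if key not in _PROFILES:
        valid = ", ".join(sorted(_PROFILES))
        raise ValueError(f"Unknown profile {name!r}; expected one of: {valid}")
    return _PROFILES[key]


goblin_temperature = lookup_profile("GOBLIN")["temperature"]
```

Listing the valid names in the error message is a design choice of this sketch; the wording of the real ValueError is not documented here.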
build_tools.syllable_walk.list_profiles()

Get all available walk profiles.

Return type:

dict[str, WalkProfile]

Returns:

A copy of the dictionary mapping profile names to WalkProfile objects.

Example

>>> profiles = list_profiles()
>>> for name, profile in profiles.items():
...     print(f"{name}: {profile.description}")
clerical: Conservative, favors common syllables, minimal phonetic change
dialect: Moderate exploration, neutral frequency bias
goblin: Chaotic, favors rare syllables, high phonetic variation
ritual: Maximum exploration, strongly favors rare syllables