Syllable Walker
Overview
Syllable Walker - Phonetic Feature Space Exploration
The syllable walker is a phonetic exploration tool that generates sequences of syllables by “walking” through phonetic feature space using cost-based random selection. It enables corpus analysis, pattern discovery, and exploration of phonetic relationships. This is a build-time analysis tool only - not used during runtime name generation.
The walker explores syllable datasets by moving probabilistically from one syllable to phonetically similar syllables. Each step considers:
Phonetic distance - How many features change (Hamming distance)
Frequency bias - Preference for common vs rare syllables
Temperature - Amount of randomness in selection
Inertia - Tendency to stay at current syllable
Key Features:
Four pre-configured profiles (clerical, dialect, goblin, ritual)
Custom parameter control for fine-tuned exploration
Deterministic walks (same seed = same walk, reproducible)
Batch processing to generate thousands of walks for analysis
Fast operation (<10ms per walk after initialization)
Large corpus support (efficiently handles 500k+ syllables)
Main Components:
SyllableWalker: Core walking algorithm with efficient neighbor graph
WalkProfile: Configuration preset for different walking behaviors
WALK_PROFILES: Predefined profiles (clerical, dialect, goblin, ritual)
Usage:
>>> from build_tools.syllable_walk import SyllableWalker
>>>
>>> # Load annotated syllables
>>> walker = SyllableWalker("data/annotated/syllables_annotated.json")
>>>
>>> # Walk using a profile
>>> walk = walker.walk_from_profile(
...     start="ka",
...     profile="dialect",
...     steps=5,
...     seed=42
... )
>>>
>>> # Display walk sequence
>>> print(" → ".join(s["syllable"] for s in walk))
ka → ki → ti → ta → da → de
CLI Usage:
# Walk with a profile
python -m build_tools.syllable_walk data.json --start ka --profile dialect --steps 5
# Batch walks for analysis
python -m build_tools.syllable_walk data.json --batch 100 --profile ritual
# For web interface, use the separate syllable_walk_web module:
python -m build_tools.syllable_walk_web
Core Concepts
Phonetic Distance
Each syllable has 12 binary phonetic features (from syllable_feature_annotator). The distance between
two syllables is the number of features that differ (Hamming distance). The max_flips parameter limits
how many features can change in a single step.
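As a minimal sketch of the distance check described above, the snippet below counts differing positions between two binary feature vectors and compares the result to max_flips. The function name hamming_distance and the two feature vectors are invented for illustration; they are not taken from the walker or from a real corpus.

```python
def hamming_distance(a, b):
    """Count positions where two equal-length binary feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Two hypothetical 12-feature vectors (values invented for illustration).
ka = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
ki = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]

distance = hamming_distance(ka, ki)
max_flips = 1

# A step is allowed only when the distance does not exceed max_flips.
print(distance, distance <= max_flips)
```

With these made-up vectors two features differ, so the move would be rejected under max_flips=1 and allowed under max_flips=2.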
Neighbor Graph
During initialization, the walker pre-computes which syllables are “neighbors” (within the specified Hamming distance). This enables fast walk generation:
Distance 1: ~30 sec initialization, conservative walks
Distance 2: ~1 min initialization, moderate walks
Distance 3: ~3 min initialization, maximum flexibility
For 500k+ syllable datasets, distance 3 is recommended.
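The idea of pre-computing neighbors can be sketched with a brute-force pairwise comparison. This is an illustration only: the function build_neighbor_graph, the 4-bit toy vectors, and the choice to treat distance-0 pairs as neighbors are all assumptions, and a quadratic loop like this would be far too slow for a 500k-syllable corpus, so the walker presumably uses a more efficient construction.

```python
from itertools import combinations


def build_neighbor_graph(feature_vectors, max_distance):
    """Map each index to the indices within max_distance feature flips."""
    graph = {i: [] for i in range(len(feature_vectors))}
    for i, j in combinations(range(len(feature_vectors)), 2):
        d = sum(x != y for x, y in zip(feature_vectors[i], feature_vectors[j]))
        # Assumption: pairs at distance 0 also count as neighbors.
        if d <= max_distance:
            graph[i].append(j)
            graph[j].append(i)
    return graph


# Toy 4-bit vectors (the real annotator produces 12 features per syllable).
vectors = [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1], [1, 1, 1, 1]]

for max_distance in (1, 2):
    print(max_distance, build_neighbor_graph(vectors, max_distance))
```

Raising the distance adds edges, which is why a larger max_neighbor_distance costs more time and memory but permits larger max_flips values later.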
Determinism
The same seed always produces the same walk. This is essential for reproducible experiments, testing, and debugging. Each walk uses an isolated RNG instance to avoid global state contamination.
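The isolated-RNG pattern can be shown with the standard library alone. The helper sample_path below is hypothetical and only draws random choices; it is not the walker's selection logic. It demonstrates that a per-call random.Random instance gives repeatable results without touching the module-level generator.

```python
import random


def sample_path(choices, steps, seed=None):
    """Draw `steps` items using a private RNG seeded per call."""
    rng = random.Random(seed)  # isolated instance, not the global generator
    return [rng.choice(choices) for _ in range(steps)]


syllables = ["ka", "ki", "ti", "ta"]

random.seed(123)
state_before = random.getstate()

first = sample_path(syllables, 5, seed=42)
second = sample_path(syllables, 5, seed=42)

print(first == second)                     # same seed, same sequence
print(random.getstate() == state_before)   # global state is unchanged
```

Both comparisons hold, so code elsewhere that relies on the global random module is unaffected by walk generation.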
Walk Structure
Invariant: A syllable walk always produces one more syllable than the number of steps, as each step represents a transition (edge) between syllables (vertices).
| Steps | Syllables Produced | Example |
|---|---|---|
| 0 | 1 | Starting syllable only (no transitions) |
| 1 | 2 | Start → one neighbor |
| 5 | 6 | Start → 5 transitions |
| 10 | 11 | Start → 10 transitions |
This follows from graph theory: a path with n edges connects n+1 vertices.
Walk Profiles
The walker includes four pre-configured profiles:
| Profile | Description | Steps | Max Flips | Temperature | Freq Weight | Use Case |
|---|---|---|---|---|---|---|
| clerical | Conservative, minimal change | 5 | 1 | 0.3 | 1.0 | Formal names |
| dialect | Balanced exploration | 5 | 2 | 0.7 | 0.0 | General use |
| goblin | Chaotic, high variation | 5 | 2 | 1.5 | -0.5 | Exotic names |
| ritual | Maximum exploration | 5 | 3 | 2.5 | -1.0 | Extreme variation |
Frequency Weight controls syllable selection:
Positive values (e.g. 1.0) favor common syllables
Zero (0.0) is neutral
Negative values (e.g. -1.0) favor rare syllables
Temperature controls randomness:
Low (0.3) = more deterministic, prefer lowest-cost moves
High (2.5) = more random, explore high-cost moves
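To make the effect of temperature concrete, the sketch below turns a small list of step costs into selection probabilities at a low and a high temperature. The helper softmax_probabilities and the three cost values are assumptions chosen for illustration; only the exp(-cost / temperature) weighting comes from this documentation.

```python
import math


def softmax_probabilities(costs, temperature):
    """Convert step costs to probabilities proportional to exp(-cost / T)."""
    weights = [math.exp(-c / temperature) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical costs for three candidate moves, cheapest first.
costs = [0.5, 1.0, 2.0]

for temperature in (0.3, 2.5):  # the clerical and ritual temperatures
    probs = softmax_probabilities(costs, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At 0.3 most of the probability mass sits on the cheapest move, while at 2.5 the three moves are much closer to equally likely.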
Command-Line Interface
Explore syllable feature space via cost-based random walks
usage: python -m build_tools.syllable_walk [-h] [--start SYLLABLE]
[--profile NAME] [--steps N]
[--seed SEED] [--max-flips N]
[--temperature T]
[--frequency-weight W]
[--compare-profiles] [--batch N]
[--search QUERY] [--output FILE]
[--quiet] [--verbose]
[--max-neighbor-distance N]
data_file
Positional Arguments
- data_file
Path to syllables_annotated.json file (output of syllable_feature_annotator). This file contains syllables with phonetic features and frequency information. Example: data/annotated/syllables_annotated.json
walk parameters
Parameters controlling syllable walk behavior. These work with any mode except --search.
- --start
Starting syllable for the walk. If not specified, a random syllable will be chosen. Must be a syllable present in the data file. Use --search to find valid syllables. Examples: ‘ka’, ‘bak’, ‘the’. Default: random syllable
- --profile
Possible choices: clerical, dialect, goblin, ritual
Walk profile preset defining behavior characteristics. Available profiles: clerical (conservative, favors common syllables), dialect (balanced exploration, neutral frequency), goblin (chaotic, favors rare syllables), ritual (maximum exploration, very rare syllables). Each profile has predefined max_flips, temperature, and frequency_weight values. Can be overridden with custom parameters. Default: dialect
Default: 'dialect'
- --steps
Number of steps to take in the walk. Each step visits one syllable. Output length will be steps + 1 (includes starting syllable). Valid range: 0-1000. Examples: 5 (quick walk), 20 (longer exploration). Default: 5
Default: 5
- --seed
Random seed for reproducible walks. Same seed with same parameters always produces identical walks. This is useful for testing, debugging, or generating consistent examples. If not specified, uses system randomness (non-reproducible). Examples: 42, 12345. Default: None (random)
custom parameters
Advanced parameters that override profile settings. Use these to fine-tune walk behavior beyond predefined profiles.
- --max-flips
Possible choices: 1, 2, 3
Maximum number of phonetic features that can change per step. This controls the Hamming distance constraint between consecutive syllables. Higher values allow more dramatic phonetic changes. Valid values: 1 (very conservative), 2 (moderate), 3 (maximum). Must be <= max-neighbor-distance. Overrides profile setting. Examples: 1 for minimal change, 3 for maximum variation. Default: determined by profile
- --temperature
Exploration temperature controlling randomness (0.1-5.0). Higher values increase randomness and exploration, making the walk more likely to choose high-cost transitions. Lower values make walks more deterministic, strongly preferring low-cost moves. Overrides profile setting. Typical values: 0.3 (conservative), 0.7 (balanced), 1.5 (exploratory), 2.5 (chaotic). Default: determined by profile
- --frequency-weight
Frequency bias weight (-2.0 to 2.0). Controls whether the walk favors common or rare syllables. Positive values: Favor common syllables (e.g., 1.0 strongly favors common). Zero: Neutral, no frequency bias. Negative values: Favor rare syllables (e.g., -1.0 strongly favors rare). Overrides profile setting. Examples: 1.0 (prefer common), 0.0 (neutral), -1.0 (prefer rare). Default: determined by profile
operation modes
Different modes of operation. These modes are mutually exclusive. If no mode is specified, performs a single walk.
- --compare-profiles
Compare all four walk profiles from the same starting syllable. Generates one walk for each profile (clerical, dialect, goblin, ritual) using the same seed (if specified), allowing direct comparison of different behaviors. The --profile argument is ignored in this mode. Output shows walks side-by-side with profile descriptions. Useful for understanding profile differences.
Default: False
- --batch
Generate N walks in batch mode. Each walk starts from a random syllable (unless --start is specified, then all walks start from the same syllable). Useful for statistical analysis, corpus exploration, or generating large datasets. Combine with --output to save results to JSON file. Progress is shown during generation. Examples: --batch 100 for analysis, --batch 1000 for corpus stats. Valid range: 1-10000
- --search
Search for syllables matching the query string. Performs case-insensitive substring match against all syllables in the dataset. Shows up to 20 matches with frequency information. Useful for finding valid starting syllables or exploring corpus contents. Does not perform walk generation. Examples: --search ‘th’ finds ‘the’, ‘thi’, ‘tha’, etc. --search ‘ka’ finds ‘ka’, ‘kan’, ‘kaf’, etc.
output options
Control output format, destination, and verbosity.
- --output
Save results to JSON file instead of printing to console. Parent directories will be created if they don’t exist. Output format depends on mode: single walk saves walk details with profile and seed info; batch mode saves array of walks with metadata. File can be used for further analysis or visualization. Examples: --output results/walks.json, --output batch_data.json
- --quiet
Suppress progress messages and verbose output. Only prints final results or errors. Useful for scripting, piping output, or when running in automated environments. Cannot be combined with --verbose. Progress bars and initialization messages are hidden in quiet mode.
Default: False
- --verbose
Enable verbose output showing initialization progress, neighbor graph construction details, and detailed walk information. Shows memory usage, processing time, and intermediate steps. Useful for understanding performance, debugging, or learning how the walker works. Cannot be combined with --quiet. Significantly increases output volume.
Default: False
walker configuration
Advanced configuration for the walker engine. These settings affect initialization time and memory usage.
- --max-neighbor-distance
Possible choices: 1, 2, 3
Maximum Hamming distance for pre-computing neighbor graph (1-3). During initialization, the walker computes which syllables are ‘neighbors’ (similar in phonetic features). Higher values allow larger --max-flips but significantly increase initialization time and memory usage. Should be >= largest --max-flips you plan to use. Initialization time (500k syllables): ~30 sec (1), ~1 min (2), ~3 min (3). Memory impact: ~50MB (1), ~150MB (2), ~300MB (3). Default: 3 (recommended for maximum flexibility)
Default: 3
# Generate a single walk with default profile (dialect)
python -m build_tools.syllable_walk data.json --start ka
# Use specific profile
python -m build_tools.syllable_walk data.json --start bak --profile goblin --steps 10
# Compare all profiles from same starting point
python -m build_tools.syllable_walk data.json --start ka --compare-profiles
# Generate batch of 50 walks and save to JSON
python -m build_tools.syllable_walk data.json --batch 50 --profile ritual --output walks.json
# Search for syllables containing "th"
python -m build_tools.syllable_walk data.json --search "th"
# Custom walk parameters (overrides profile)
python -m build_tools.syllable_walk data.json --start ka --steps 10 \
--max-flips 2 --temperature 1.5 --frequency-weight -0.8 --seed 42
For interactive web interface, use the separate module:
python -m build_tools.syllable_walk_web
python -m build_tools.syllable_walk_web --port 9000
For detailed documentation, see: claude/build_tools/syllable_walk.md
Integration Guide
The syllable walker uses output from the feature annotator and/or the corpus database builder.
It automatically discovers pipeline run directories from _working/output/.
Recommended Workflow:
# Step 1: Extract and normalize syllables
python -m build_tools.pyphen_syllable_extractor --file wordlist.txt
python -m build_tools.pyphen_syllable_normaliser \
--run-dir _working/output/20260110_115453_pyphen/
# Step 2: Annotate with phonetic features
python -m build_tools.syllable_feature_annotator \
--syllables _working/output/20260110_115453_pyphen/pyphen_syllables_unique.txt \
--frequencies _working/output/20260110_115453_pyphen/pyphen_syllables_frequencies.json
# Step 3: (Optional) Build SQLite database for faster loading
python -m build_tools.corpus_sqlite_builder \
--run-dir _working/output/20260110_115453_pyphen/
# Step 4: Explore syllable walks (choose one interface)
# CLI-based exploration
python -m build_tools.syllable_walk \
_working/output/20260110_115453_pyphen/data/pyphen_syllables_annotated.json \
--start ka --profile dialect --steps 10
# Web interface (separate module)
python -m build_tools.syllable_walk_web
# Auto-discovers port starting at 8000
# Shows all available run directories with selection counts
When to use this tool:
To explore phonetic connectivity in your syllable corpus
To compare different extractors (pyphen vs NLTK) and their phonetic behaviors
To test if desired phonetic transitions exist before creating patterns
To discover interesting phonetic progressions for name generation
To batch-generate walks for analysis
For browsing name selections and interactive web-based exploration, see Syllable Walker Web.
Advanced Topics
Algorithm Details
Cost Function:
Each potential step has a cost based on:
Hamming distance - Number of features that change
Feature-specific costs - Some features cost more to change
Frequency weight - Bias toward common or rare syllables
Inertia - Tendency to stay at current syllable
The walker uses softmax selection with temperature to probabilistically choose the next syllable:
For each neighbor n:
hamming_cost = sum(feature_costs[i] for i where features differ)
freq_cost = -frequency_weight × log(frequency[n])
total_cost = hamming_cost + freq_cost + inertia_cost
Probability of selecting n:
P(n) = exp(-cost(n) / temperature) / sum(exp(-cost(k) / temperature))
Higher temperature = more random selection (flattens probability distribution)
Lower temperature = more deterministic (strongly favors lowest cost)
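The selection loop can be sketched end to end on a toy graph. Everything below is illustrative: choose_next, toy_walk, the four-syllable graph, and the flat step cost are assumptions, and the real walker derives its costs from feature flips and frequencies rather than a constant. The sketch keeps the documented ingredients: an inertia option, exp(-cost / temperature) weights, and a seeded local RNG.

```python
import math
import random


def choose_next(current, neighbors, step_cost, inertia_cost, temperature, rng):
    """Sample the next syllable; staying at `current` is always a candidate."""
    candidates = list(neighbors) + [current]
    costs = [step_cost(current, n) for n in neighbors] + [inertia_cost]
    weights = [math.exp(-c / temperature) for c in costs]
    # Random.choices normalises the weights internally.
    return rng.choices(candidates, weights=weights, k=1)[0]


def toy_walk(graph, start, steps, step_cost, inertia_cost=0.5,
             temperature=0.7, seed=None):
    """Return a path of steps + 1 syllables, starting at `start`."""
    rng = random.Random(seed)  # local RNG, global random state is untouched
    path = [start]
    for _ in range(steps):
        here = path[-1]
        path.append(choose_next(here, graph[here], step_cost,
                                inertia_cost, temperature, rng))
    return path


def flat_cost(a, b):
    """Hypothetical cost: every move costs the same."""
    return 1.0


# Hypothetical neighbor graph over four syllables.
graph = {"ka": ["ki", "ta"], "ki": ["ka", "ti"],
         "ta": ["ka", "ti"], "ti": ["ki", "ta"]}

path = toy_walk(graph, "ka", steps=5, step_cost=flat_cost, seed=42)
print(" → ".join(path))
```

The returned path has six entries for five steps, and consecutive entries are either identical (an inertia step) or neighbors in the graph.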
Performance
Walk Generation:
After initialization: <10ms per walk (instant)
Deterministic: Same seed always produces same walk
Scalable: Speed independent of corpus size
Initialization:
The neighbor graph must be built on startup, which takes time depending on
max_neighbor_distance:
Distance 1: ~30 sec initialization
Distance 2: ~1 min initialization
Distance 3: ~3 min initialization (recommended for large corpora)
Notes
Dependencies:
Requires NumPy for efficient feature matrix operations (build-time dependency)
Troubleshooting:
Invalid Start Syllable:
If you get an error about an unknown syllable, use --search to find valid syllables:
# Search for syllables containing "th"
python -m build_tools.syllable_walk data.json --search "th"
Build-time tool:
This is a build-time analysis tool only - not used during runtime name generation.
Related Documentation:
Syllable Walker Web - Browser-based Pipeline + Walker interface with run discovery and dual patches
Syllable Walker TUI - Interactive TUI for exploring phonetic space
Syllable Feature Annotator - Generates input data with phonetic features
Corpus SQLite Builder - Builds SQLite database for fast loading
Name Combiner - Generates name candidates
Name Selector - Selects names by policy
For detailed usage guide, see: claude/build_tools/syllable_walk.md
API Reference
- class build_tools.syllable_walk.ReachResult(profile_name, reach, total, threshold, max_flips, temperature, frequency_weight, computation_ms, unique_reachable=0, reachable_indices=())[source]
Bases: object
Result of a thermodynamic reach computation for a single profile.
Encapsulates both the reach count and the full context of how it was computed, including the profile parameters and timing metadata.
- profile_name
Name of the profile (e.g., “clerical”, “dialect”).
- reach
Mean number of syllables reachable per starting node (rounded). This is the primary micro signal — the average effective vocabulary size at each step of a walk under this profile’s constraints.
- total
Total syllables in the corpus (the “field” size).
- threshold
Probability threshold used for the reachability test. A syllable is counted if p > threshold from the starting node.
- max_flips
Profile’s max_flips parameter (edge existence constraint).
- temperature
Profile’s temperature parameter (probability shape).
- frequency_weight
Profile’s frequency_weight parameter (rarity bias).
- computation_ms
Wall-clock time for this profile’s computation in milliseconds. Captured as metadata to monitor performance across different systems and corpus sizes.
- unique_reachable
Total unique syllables reachable from at least one starting node (union across all nodes). This is supplementary context; the mean per-node count (reach) is the primary metric displayed in the UI.
- reachable_indices
Tuple of (syllable_index, reachability_count) pairs for all syllables in the union reachable set, sorted by reachability count descending (most commonly reachable first). The count is how many starting nodes can reach that syllable. Maps to syllable text via walker.syllables[idx]. Omitted from to_dict() to keep API responses lean.
Example
>>> result = ReachResult(
...     profile_name="dialect",
...     reach=32,
...     total=2088,
...     threshold=0.001,
...     max_flips=2,
...     temperature=0.7,
...     frequency_weight=0.0,
...     computation_ms=42.5,
...     unique_reachable=1850,
... )
>>> result.reach
32
- class build_tools.syllable_walk.SyllableWalker(data_path, max_neighbor_distance=3, feature_costs=None, inertia_cost=0.5, verbose=False)[source]
Bases: object
Navigate syllable feature space via cost-based random walks.
This class efficiently handles large syllable datasets (500k+) by pre-computing neighbor relationships and using vectorized operations where possible.
The walker performs a one-time expensive computation during initialization to build a neighbor graph, mapping each syllable to nearby syllables within a maximum Hamming distance. After initialization, walk generation is extremely fast (<10ms per walk).
- syllables
List of all syllable strings
- frequencies
NumPy array of syllable frequencies (uint32)
- feature_matrix
NumPy array of binary feature vectors (N x 12, uint8)
- syllable_to_idx
Dict mapping syllable text to index
- neighbor_graph
Dict mapping syllable index to list of neighbor indices
- max_neighbor_distance
Maximum Hamming distance for neighbors
- feature_costs
Dict of costs for each feature flip
- inertia_cost
Cost of staying at current syllable
Example
>>> walker = SyllableWalker("syllables_annotated.json", verbose=True)
>>> walk = walker.walk_from_profile(
...     start="ka", profile="dialect", steps=5, seed=42
... )
>>> print(walker.format_walk(walk))
ka → ki → ti → ta → da → de
Notes
Initialization time: ~2-3 minutes for 500k syllables
Walk generation: <10ms per walk after initialization
Memory usage: ~200-300 MB for 500k syllables
Thread safety: Not thread-safe (use separate instances)
- __init__(data_path, max_neighbor_distance=3, feature_costs=None, inertia_cost=0.5, verbose=False)[source]
Initialize the syllable walker with pre-computed neighbor graph.
- Parameters:
data_path (Path | str) – Path to syllables_annotated.json file (output of syllable_feature_annotator)
max_neighbor_distance (int) – Maximum Hamming distance for pre-computing neighbors (1-3). Higher values = more neighbors = slower initialization + more memory, but allows larger feature flips per step. Default: 3 (recommended)
feature_costs (dict[str, float] | None) – Custom feature cost dictionary. If None, uses DEFAULT_FEATURE_COSTS. Keys must match FEATURE_KEYS.
inertia_cost (float) – Cost of staying at current syllable. Higher values discourage staying put. Default: 0.5
verbose (bool) – If True, print progress during initialization (neighbor graph construction can take 2-3 minutes for 500k syllables)
- Raises:
FileNotFoundError – If data_path does not exist
ValueError – If data_path is not valid JSON
ValueError – If feature_costs keys don’t match FEATURE_KEYS
ValueError – If max_neighbor_distance < 1 or > len(FEATURE_KEYS)
Notes
Initialization performs expensive one-time computation
Use verbose=True for long-running initializations
Consider caching the neighbor graph (future optimization)
- format_walk(walk, arrow=' → ')[source]
Format a walk as a string with arrows.
- Parameters:
- Return type:
- Returns:
Formatted walk string
Example
>>> walk = walker.walk_from_profile("ka", "dialect", steps=5, seed=42)
>>> walker.format_walk(walk)
'ka → ki → ti → ta → da → de'
>>> walker.format_walk(walk, arrow=" -> ")
'ka -> ki -> ti -> ta -> da -> de'
- classmethod from_data(data, max_neighbor_distance=3, feature_costs=None, inertia_cost=0.5, verbose=False, progress_callback=None)[source]
Create a SyllableWalker from in-memory data.
This is useful when syllable data is loaded from a source other than a JSON file (e.g., SQLite database).
- Parameters:
data (list[dict]) – List of syllable records, each with keys: ‘syllable’, ‘frequency’, ‘features’ (dict of bool values)
max_neighbor_distance (int) – Maximum Hamming distance for neighbors (1-3)
feature_costs (dict[str, float] | None) – Custom feature cost dictionary
inertia_cost (float) – Cost of staying at current syllable
verbose (bool) – If True, print progress during initialization
progress_callback (Callable[[str], None] | None) – Optional callable invoked with a progress message string during neighbor graph construction. Used by the web UI to show live loading progress to the user.
- Return type:
- Returns:
Initialized SyllableWalker instance
Example
>>> data = [
...     {"syllable": "ka", "frequency": 100,
...      "features": {"starts_with_vowel": False, ...}}
... ]
>>> walker = SyllableWalker.from_data(data, verbose=True)
- get_available_profiles()[source]
Get all available walk profiles.
- Return type:
- Returns:
Dictionary mapping profile names to WalkProfile objects
Example
>>> profiles = walker.get_available_profiles()
>>> for name in profiles:
...     print(name)
clerical
dialect
goblin
ritual
- get_random_syllable(seed=None, min_length=None, max_length=None)[source]
Get a random syllable from the dataset.
- Parameters:
- Return type:
- Returns:
Random syllable text
- Raises:
ValueError – If length constraints are invalid or no syllables match.
Example
>>> walker.get_random_syllable(seed=42)
'ka'
>>> walker.get_random_syllable(seed=42)
'ka'  # Same seed = same result
- get_syllable_info(syllable)[source]
Get information about a specific syllable.
- Parameters:
syllable (str) – Syllable text to look up
- Returns:
Syllable dictionary with keys syllable, frequency, features. Returns None if syllable not found.
Example
>>> info = walker.get_syllable_info("ka")
>>> if info:
...     print(f"Frequency: {info['frequency']}")
Frequency: 1234
- walk(start, steps, max_flips, temperature=1.0, frequency_weight=0.0, neighbor_limit=None, min_length=None, max_length=None, seed=None)[source]
Execute a syllable walk through feature space.
Starting from a syllable, takes the requested number of steps through feature space, choosing each next syllable probabilistically based on:
- Feature flip cost (weighted Hamming distance)
- Frequency cost (rarity penalty/bonus)
- Temperature (exploration vs exploitation)
- Inertia (tendency to stay put)
The walk uses softmax selection over candidate neighbors:
1. Find all neighbors within max_flips distance
2. Compute cost for each neighbor (flip cost + rarity cost)
3. Add inertia option (staying at current syllable)
4. Apply softmax with temperature: weight_i = exp(-cost_i / T)
5. Sample next syllable proportional to weights
- Parameters:
start (int | str) – Starting syllable (syllable text or index)
steps (int) – Number of steps to take (each step visits one syllable)
max_flips (int) – Maximum feature flips allowed per step (1-3). Must be <= max_neighbor_distance from __init__.
temperature (float) – Exploration temperature (0.1-5.0). Higher values increase randomness. Typical values:
  - 0.3: Conservative, minimal exploration
  - 0.7: Balanced
  - 1.5: High exploration
  - 2.5: Maximum randomness
frequency_weight (float) – Frequency bias (-2.0 to 2.0). Typical values: -1.0, 0.0, 1.0
  - Positive: Favor common syllables
  - Zero: Neutral
  - Negative: Favor rare syllables
neighbor_limit (int | None) – Optional cap on neighbor candidates considered at each step. None means use all neighbors.
min_length (int | None) – Optional minimum syllable character length allowed during traversal.
max_length (int | None) – Optional maximum syllable character length allowed during traversal.
seed (int | None) – Random seed for reproducibility. Same seed = same walk. If None, uses system randomness (non-reproducible).
- Returns:
List of syllable dictionaries with keys:
"syllable": Syllable text (str)
"frequency": Corpus frequency (int)
"features": Binary feature vector (list of 12 ints)
Length = steps + 1 (includes starting syllable)
- Raises:
ValueError – If start syllable not found in dataset
ValueError – If max_flips > max_neighbor_distance
ValueError – If steps < 0
Example
>>> walker = SyllableWalker("data.json")
>>> walk = walker.walk(
...     start="ka",
...     steps=5,
...     max_flips=2,
...     temperature=0.7,
...     frequency_weight=0.0,
...     seed=42
... )
>>> len(walk)
6  # start + 5 steps
>>> walk[0]["syllable"]
'ka'
Notes
Deterministic: Same seed always produces same walk
Uses local Random instance (doesn’t affect global random state)
Inertia option allows walk to stay at current syllable
- walk_from_profile(start, profile, steps=5, neighbor_limit=None, min_length=None, max_length=None, seed=None)[source]
Execute a walk using a named profile.
Convenience method that uses predefined WalkProfile parameters. See WALK_PROFILES for available profiles.
- Parameters:
profile (str | WalkProfile) – Profile name (“clerical”, “dialect”, “goblin”, “ritual”) or WalkProfile object
steps (int) – Number of steps to take (default: 5)
neighbor_limit (int | None) – Optional cap on neighbors considered per step.
min_length (int | None) – Optional minimum syllable length allowed.
max_length (int | None) – Optional maximum syllable length allowed.
seed (int | None) – Random seed for reproducibility (default: None)
- Return type:
- Returns:
List of syllable dictionaries (same as walk())
- Raises:
ValueError – If profile name not found
Example
>>> walker = SyllableWalker("data.json")
>>> walk = walker.walk_from_profile("ka", "goblin", steps=10, seed=42)
>>> print(walker.format_walk(walk))
ka → kha → gha → ghe → ge → gwe → ...
- class build_tools.syllable_walk.WalkProfile(name, description, max_flips, temperature, frequency_weight)[source]
Bases: object
Configuration profile for a syllable walk.
A profile encapsulates all parameters needed for a walk, providing named presets for different behaviors.
- name
Human-readable profile name (e.g., “Dialect Walk”)
- description
Brief description of profile behavior
- max_flips
Maximum feature flips allowed per step (1-3)
- temperature
Exploration temperature (0.1-5.0)
- frequency_weight
Frequency bias (-2.0 to 2.0)
Example
>>> profile = WalkProfile(
...     name="Custom Walk",
...     description="High temperature, neutral frequency",
...     max_flips=2,
...     temperature=2.0,
...     frequency_weight=0.0
... )
>>> print(profile)
Custom Walk: High temperature, neutral frequency
- build_tools.syllable_walk.compute_all_reaches(walker, threshold=0.001, progress_callback=None)[source]
Compute mean effective vocabulary for all four named walk profiles.
Iterates over the predefined profiles (clerical, dialect, goblin, ritual) and computes the mean per-node thermodynamic reach for each. Returns a dictionary mapping profile names to their ReachResult.
This is intended to be called once after the walker finishes initialising, typically in the background thread that builds the neighbor graph. The results are cached in PatchState and served via the stats endpoint.
- Parameters:
walker (
SyllableWalker) – Initialised SyllableWalker with pre-computed neighbor graph.threshold (
float) – Minimum transition probability for reachability. Default: 0.001. SeeDEFAULT_REACH_THRESHOLDfor rationale.progress_callback (
Callable[[str],None] |None) – Optional callable invoked with a progress message after each profile is computed. Used by the web UI to show incremental reach results like"Computing reaches: clerical ~4, dialect ~32...".
- Return type:
- Returns:
Dictionary mapping profile name to ReachResult. Keys: "clerical", "dialect", "goblin", "ritual".
Example
>>> reaches = compute_all_reaches(walker)
>>> for name, r in reaches.items():
...     print(f"{name}: reach={r.reach}, time={r.computation_ms}ms")
clerical: reach=4, time=12.3ms
dialect: reach=32, time=15.1ms
goblin: reach=58, time=14.8ms
ritual: reach=147, time=18.2ms
Note
Custom profile reach is not computed here. See the TODO note in api/walker.py regarding on-demand computation for custom profiles.
- build_tools.syllable_walk.compute_reach(walker, profile_name, max_flips, temperature, frequency_weight, threshold=0.001)[source]
Compute mean effective vocabulary for a single profile.
For each syllable in the corpus, computes the softmax transition probability distribution over all neighbors within max_flips distance, using the profile’s temperature and frequency_weight. Counts how many neighbors exceed the probability threshold, then returns the mean of these per-node counts as the reach value.
This replicates the same math as SyllableWalker.walk() (lines 526–549 of walker.py), but exhaustively over all starting nodes rather than sampling a single stochastic path.
The computation is:
1. For each starting syllable s:
   a. Collect all neighbors within max_flips Hamming distance
   b. Compute cost per neighbor: flip_cost + rarity_cost
   c. Add inertia option (staying at s) for normalisation
   d. Apply softmax: weight_i = exp(-cost_i / temperature)
   e. Normalise to probabilities
   f. Count other syllables (not s itself) with p > threshold. Inertia participates in normalisation but self-transitions do not count toward reach.
2. Return the mean per-node count (rounded to nearest integer)
- Why mean-per-node instead of union?
The union approach (counting syllables reachable from any starting node) saturates to near-total for production corpora. With N=1,757 nodes and threshold=0.001, almost every syllable is reachable from at least one starting node, making reach ≈ total for any profile with max_flips ≥ 2. The mean-per-node count captures the effective vocabulary per step, which scales correctly with corpus size.
- Parameters:
walker (
SyllableWalker) – Initialised SyllableWalker with pre-computed neighbor graph. Must haveneighbor_graph,_flip_cost(),_rarity_cost(),_hamming_distance(), andinertia_costavailable.profile_name (
str) – Human-readable name for the profile (e.g., “dialect”). Stored in the result for identification.max_flips (
int) – Maximum feature flips per step (1–3). Determines which edges in the neighbor graph are traversable.temperature (
float) – Softmax temperature (0.1–5.0). Controls the shape of the probability distribution. Low temperature concentrates probability on low-cost transitions; high temperature flattens the distribution toward uniform.frequency_weight (
float) – Frequency bias (-2.0 to 2.0). Positive values penalise rare syllables (favour common); negative values reward rare syllables (favour uncommon).threshold (
float) – Minimum transition probability for a syllable to be counted as “effectively reachable.” Default: 0.001.
- Return type:
- Returns:
ReachResult with the mean per-node reach count, corpus total, unique reachable count (union), and metadata.
- Raises:
ValueError – If walker has no syllables loaded.
Example
>>> result = compute_reach(
...     walker, "dialect",
...     max_flips=2, temperature=0.7, frequency_weight=0.0,
... )
>>> print(f"Dialect reach: {result.reach} / {result.total}")
Dialect reach: 32 / 2088
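The per-node counting described for compute_reach can be sketched on a toy graph. This is a simplified illustration under stated assumptions: reach_counts, the three-node graph, and the cost table are hypothetical, and the real function obtains costs from the walker's own flip and rarity helpers rather than a lookup dictionary.

```python
import math


def reach_counts(graph, cost, inertia_cost, temperature, threshold=0.001):
    """Per node, count neighbors whose transition probability exceeds threshold."""
    counts = []
    for s, neighbors in graph.items():
        weights = [math.exp(-cost[(s, n)] / temperature) for n in neighbors]
        # Inertia takes part in normalisation but is never counted as reach.
        total = sum(weights) + math.exp(-inertia_cost / temperature)
        counts.append(sum(1 for w in weights if w / total > threshold))
    return counts


# Hypothetical graph: node 0 has a cheap edge to 1 and an expensive edge to 2.
graph = {0: [1, 2], 1: [0], 2: [0]}
cost = {(0, 1): 1.0, (1, 0): 1.0, (0, 2): 10.0, (2, 0): 10.0}

for temperature in (0.5, 5.0):
    counts = reach_counts(graph, cost, inertia_cost=0.5, temperature=temperature)
    mean_reach = round(sum(counts) / len(counts))  # Python rounds halves to even
    print(temperature, counts, mean_reach)
```

At the lower temperature the expensive edge falls below the threshold and stops counting, which is how a profile's temperature shapes its reach even when max_flips stays the same.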
- build_tools.syllable_walk.get_profile(name)[source]
Get a walk profile by name.
- Parameters:
name (str) – Profile name (case-insensitive)
- Return type:
- Returns:
WalkProfile object
- Raises:
ValueError – If profile name not found
Example
>>> profile = get_profile("goblin")
>>> profile.temperature
1.5
>>> profile = get_profile("GOBLIN")  # Case-insensitive
>>> profile.temperature
1.5
- build_tools.syllable_walk.list_profiles()[source]
Get all available walk profiles.
- Return type:
- Returns:
Dictionary mapping profile names to WalkProfile objects (copy)
Example
>>> profiles = list_profiles()
>>> for name, profile in profiles.items():
...     print(f"{name}: {profile.description}")
clerical: Conservative, favors common syllables, minimal phonetic change
dialect: Moderate exploration, neutral frequency bias
goblin: Chaotic, favors rare syllables, high phonetic variation
ritual: Maximum exploration, strongly favors rare syllables