Syllable Walker
Overview
Syllable Walker - Phonetic Feature Space Exploration
The syllable walker is a phonetic exploration tool that generates sequences of syllables by “walking” through phonetic feature space using cost-based random selection. It enables corpus analysis, pattern discovery, and exploration of phonetic relationships. This is a build-time analysis tool only - not used during runtime name generation.
The walker explores syllable datasets by moving probabilistically from one syllable to phonetically similar syllables. Each step considers:
Phonetic distance - How many features change (Hamming distance)
Frequency bias - Preference for common vs rare syllables
Temperature - Amount of randomness in selection
Inertia - Tendency to stay at current syllable
Key Features:
Four pre-configured profiles (clerical, dialect, goblin, ritual)
Custom parameter control for fine-tuned exploration
Deterministic walks (same seed = same walk, reproducible)
Interactive web interface for browser-based exploration
Batch processing to generate thousands of walks for analysis
Fast operation (<10ms per walk after initialization)
Large corpus support (efficiently handles 500k+ syllables)
Main Components:
SyllableWalker: Core walking algorithm with efficient neighbor graph
WalkProfile: Configuration preset for different walking behaviors
WALK_PROFILES: Predefined profiles (clerical, dialect, goblin, ritual)
- Usage:
>>> from build_tools.syllable_walk import SyllableWalker
>>>
>>> # Load annotated syllables
>>> walker = SyllableWalker("data/annotated/syllables_annotated.json")
>>>
>>> # Walk using a profile
>>> walk = walker.walk_from_profile(
...     start="ka",
...     profile="dialect",
...     steps=5,
...     seed=42
... )
>>>
>>> # Display walk sequence
>>> print(" → ".join(s["syllable"] for s in walk))
ka → ki → ti → ta → da → de
CLI Usage:
# Walk with a profile
python -m build_tools.syllable_walk data.json --start ka --profile dialect --steps 5
# Launch interactive web interface
python -m build_tools.syllable_walk data.json --web --port 8080
# Batch walks for analysis
python -m build_tools.syllable_walk data.json --batch 100 --profile ritual
Core Concepts
Phonetic Distance
Each syllable has 12 binary phonetic features (from syllable_feature_annotator). The distance between
two syllables is the number of features that differ (Hamming distance). The max_flips parameter limits
how many features can change in a single step.
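As a minimal sketch of the distance calculation, the snippet below counts the positions at which two 12-element feature vectors differ. The vectors are the ones listed for "ka" and "pai" in the Output Format section; the function name hamming_distance is illustrative and is not part of the walker's API.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length feature vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


ka = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]
pai = [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0]

distance = hamming_distance(ka, pai)
print(distance)  # 2: positions 7 and 8 differ

# A step from "ka" to "pai" fits within max_flips=2 but not max_flips=1.
max_flips = 2
print(distance <= max_flips)  # True
```

With max_flips set to 1, this pair would not be a valid single step.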
Neighbor Graph
During initialization, the walker pre-computes which syllables are “neighbors” (within the specified Hamming distance). This enables fast walk generation:
Distance 1: ~30 sec initialization, conservative walks
Distance 2: ~1 min initialization, moderate walks
Distance 3: ~3 min initialization, maximum flexibility
For 500k+ syllable datasets, distance 3 is recommended.
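The idea of pre-computing neighbors can be sketched on toy data as follows. This is a brute-force, pure-Python illustration that compares every pair of vectors, so it is only suitable for tiny inputs; it is not the walker's actual implementation, and the name build_neighbor_graph and the 4-feature vectors are assumptions made for the example.

```python
from itertools import combinations


def build_neighbor_graph(features, max_distance):
    """Map each syllable index to the indices within max_distance feature flips."""
    graph = {i: [] for i in range(len(features))}
    for i, j in combinations(range(len(features)), 2):
        distance = sum(a != b for a, b in zip(features[i], features[j]))
        if distance <= max_distance:
            # Neighbor relationships are symmetric, so record both directions.
            graph[i].append(j)
            graph[j].append(i)
    return graph


# Toy 4-feature vectors (the real data uses 12 features per syllable).
features = [
    [0, 0, 0, 1],  # index 0
    [0, 0, 1, 1],  # index 1: 1 flip from index 0
    [1, 1, 1, 1],  # index 2: 3 flips from index 0, 2 flips from index 1
    [0, 0, 0, 0],  # index 3: 1 flip from index 0
]

print(build_neighbor_graph(features, max_distance=1))
# {0: [1, 3], 1: [0], 2: [], 3: [0]}
print(build_neighbor_graph(features, max_distance=2))
# {0: [1, 3], 1: [0, 2, 3], 2: [1], 3: [0, 1]}
```

Raising the distance from 1 to 2 gives index 2 its first neighbor, which illustrates why larger neighbor distances allow more flexible walks at the price of a bigger graph.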
Determinism
The same seed always produces the same walk. This is essential for reproducible experiments, testing, and debugging. Each walk uses an isolated RNG instance to avoid global state contamination.
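The isolated-RNG idea can be sketched with the standard library alone. The helper pick_sequence below is hypothetical and only stands in for a walk; the point is that a per-call random.Random(seed) instance gives reproducible results without touching the module-level generator.

```python
import random


def pick_sequence(options, length, seed=None):
    """Draw `length` items using an RNG instance that is local to this call."""
    rng = random.Random(seed)  # isolated: never touches the module-level generator
    return [rng.choice(options) for _ in range(length)]


syllables = ["ka", "ki", "ti", "ta", "da", "de"]

first = pick_sequence(syllables, 5, seed=42)
second = pick_sequence(syllables, 5, seed=42)
print(first == second)  # True: same seed, same sequence

# The global generator is unaffected by a seeded call made in between.
random.seed(7)
expected = random.random()
random.seed(7)
pick_sequence(syllables, 5, seed=42)
print(random.random() == expected)  # True
```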
Walk Profiles
The walker includes four pre-configured profiles:
| Profile | Description | Steps | Max Flips | Temperature | Freq Weight | Use Case |
|---|---|---|---|---|---|---|
| clerical | Conservative, minimal change | 5 | 1 | 0.3 | 1.0 | Formal names |
| dialect | Balanced exploration | 5 | 2 | 0.7 | 0.0 | General use |
| goblin | Chaotic, high variation | 5 | 2 | 1.5 | -0.5 | Exotic names |
| ritual | Maximum exploration | 5 | 3 | 2.5 | -1.0 | Extreme variation |
Frequency Weight controls syllable selection:
Positive values (e.g. 1.0) favor common syllables
Zero (0.0) is neutral
Negative values (e.g. -1.0) favor rare syllables
Temperature controls randomness:
Low (0.3) = more deterministic, prefer lowest-cost moves
High (2.5) = more random, explore high-cost moves
Command-Line Interface
Explore syllable feature space via cost-based random walks
usage: python -m build_tools.syllable_walk [-h] [--start SYLLABLE]
[--profile NAME] [--steps N]
[--seed SEED] [--max-flips N]
[--temperature T]
[--frequency-weight W]
[--compare-profiles] [--batch N]
[--search QUERY] [--web]
[--output FILE] [--quiet]
[--verbose]
[--max-neighbor-distance N]
[--port PORT]
data_file
Positional Arguments
- data_file
Path to syllables_annotated.json file (output of syllable_feature_annotator). This file contains syllables with phonetic features and frequency information. Example: data/annotated/syllables_annotated.json
walk parameters
Parameters controlling syllable walk behavior. These work with any mode except --search.
- --start
Starting syllable for the walk. If not specified, a random syllable will be chosen. Must be a syllable present in the data file. Use --search to find valid syllables. Examples: ‘ka’, ‘bak’, ‘the’. Default: random syllable
- --profile
Possible choices: clerical, dialect, goblin, ritual
Walk profile preset defining behavior characteristics. Available profiles: clerical (conservative, favors common syllables), dialect (balanced exploration, neutral frequency), goblin (chaotic, favors rare syllables), ritual (maximum exploration, very rare syllables). Each profile has predefined max_flips, temperature, and frequency_weight values. Can be overridden with custom parameters. Default: dialect
Default:
'dialect'
- --steps
Number of steps to take in the walk. Each step visits one syllable. Output length will be steps + 1 (includes starting syllable). Valid range: 0-1000. Examples: 5 (quick walk), 20 (longer exploration). Default: 5
Default:
5
- --seed
Random seed for reproducible walks. Same seed with same parameters always produces identical walks. This is useful for testing, debugging, or generating consistent examples. If not specified, uses system randomness (non-reproducible). Examples: 42, 12345. Default: None (random)
custom parameters
Advanced parameters that override profile settings. Use these to fine-tune walk behavior beyond predefined profiles.
- --max-flips
Possible choices: 1, 2, 3
Maximum number of phonetic features that can change per step. This controls the Hamming distance constraint between consecutive syllables. Higher values allow more dramatic phonetic changes. Valid values: 1 (very conservative), 2 (moderate), 3 (maximum). Must be <= max-neighbor-distance. Overrides profile setting. Examples: 1 for minimal change, 3 for maximum variation. Default: determined by profile
- --temperature
Exploration temperature controlling randomness (0.1-5.0). Higher values increase randomness and exploration, making the walk more likely to choose high-cost transitions. Lower values make walks more deterministic, strongly preferring low-cost moves. Overrides profile setting. Typical values: 0.3 (conservative), 0.7 (balanced), 1.5 (exploratory), 2.5 (chaotic). Default: determined by profile
- --frequency-weight
Frequency bias weight (-2.0 to 2.0). Controls whether the walk favors common or rare syllables. Positive values: Favor common syllables (e.g., 1.0 strongly favors common). Zero: Neutral, no frequency bias. Negative values: Favor rare syllables (e.g., -1.0 strongly favors rare). Overrides profile setting. Examples: 1.0 (prefer common), 0.0 (neutral), -1.0 (prefer rare). Default: determined by profile
operation modes
Different modes of operation. These modes are mutually exclusive. If no mode is specified, performs a single walk.
- --compare-profiles
Compare all four walk profiles from the same starting syllable. Generates one walk for each profile (clerical, dialect, goblin, ritual) using the same seed (if specified), allowing direct comparison of different behaviors. The --profile argument is ignored in this mode. Output shows walks side-by-side with profile descriptions. Useful for understanding profile differences.
Default:
False
- --batch
Generate N walks in batch mode. Each walk starts from a random syllable (unless --start is specified, then all walks start from the same syllable). Useful for statistical analysis, corpus exploration, or generating large datasets. Combine with --output to save results to JSON file. Progress is shown during generation. Examples: --batch 100 for analysis, --batch 1000 for corpus stats. Valid range: 1-10000
- --search
Search for syllables matching the query string. Performs case-insensitive substring match against all syllables in the dataset. Shows up to 20 matches with frequency information. Useful for finding valid starting syllables or exploring corpus contents. Does not perform walk generation. Examples: --search ‘th’ finds ‘the’, ‘thi’, ‘tha’, etc. --search ‘ka’ finds ‘ka’, ‘kan’, ‘kaf’, etc.
- --web
Start interactive web interface instead of command-line mode. Launches a web server with a browser-based interface for exploring walks interactively. All walk parameters can be adjusted in the browser. The server runs until stopped with Ctrl+C. Use --port to specify custom port (default: 5000). Other CLI arguments are ignored in web mode. Access at http://localhost:5000 after starting.
Default:
False
output options
Control output format, destination, and verbosity.
- --output
Save results to JSON file instead of printing to console. Parent directories will be created if they don’t exist. Output format depends on mode: single walk saves walk details with profile and seed info; batch mode saves array of walks with metadata. File can be used for further analysis or visualization. Examples: --output results/walks.json, --output batch_data.json
- --quiet
Suppress progress messages and verbose output. Only prints final results or errors. Useful for scripting, piping output, or when running in automated environments. Cannot be combined with --verbose. Progress bars and initialization messages are hidden in quiet mode.
Default:
False
- --verbose
Enable verbose output showing initialization progress, neighbor graph construction details, and detailed walk information. Shows memory usage, processing time, and intermediate steps. Useful for understanding performance, debugging, or learning how the walker works. Cannot be combined with --quiet. Significantly increases output volume.
Default:
False
walker configuration
Advanced configuration for the walker engine. These settings affect initialization time and memory usage.
- --max-neighbor-distance
Possible choices: 1, 2, 3
Maximum Hamming distance for pre-computing neighbor graph (1-3). During initialization, the walker computes which syllables are ‘neighbors’ (similar in phonetic features). Higher values allow larger --max-flips but significantly increase initialization time and memory usage. Should be >= largest --max-flips you plan to use. Initialization time (500k syllables): ~30 sec (1), ~1 min (2), ~3 min (3). Memory impact: ~50MB (1), ~150MB (2), ~300MB (3). Default: 3 (recommended for maximum flexibility)
Default:
3
- --port
Port number for web server when using --web mode. Only applies when --web flag is specified, otherwise ignored. Choose a port that is not already in use by another service. Common alternatives: 8000, 8080, 3000. If the port is in use, the server will fail to start with an error message. Valid range: 1024-65535 (ports below 1024 require root/admin). Default: 5000
Default:
5000
# Generate a single walk with default profile (dialect)
python -m build_tools.syllable_walk data.json --start ka
# Use specific profile
python -m build_tools.syllable_walk data.json --start bak --profile goblin --steps 10
# Compare all profiles from same starting point
python -m build_tools.syllable_walk data.json --start ka --compare-profiles
# Generate batch of 50 walks and save to JSON
python -m build_tools.syllable_walk data.json --batch 50 --profile ritual --output walks.json
# Search for syllables containing "th"
python -m build_tools.syllable_walk data.json --search "th"
# Custom walk parameters (overrides profile)
python -m build_tools.syllable_walk data.json --start ka --steps 10 \
--max-flips 2 --temperature 1.5 --frequency-weight -0.8 --seed 42
# Start interactive web interface (opens on http://localhost:5000)
python -m build_tools.syllable_walk data.json --web
# Start web interface on custom port
python -m build_tools.syllable_walk data.json --web --port 8000
For detailed documentation, see: claude/build_tools/syllable_walk.md
Output Format
Single Walk
{
"walk": [
{
"syllable": "ka",
"frequency": 20,
"features": [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]
},
{
"syllable": "pai",
"frequency": 9,
"features": [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0]
}
],
"profile": "dialect",
"start": "ka",
"seed": 42
}
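A short sketch of consuming this structure: the snippet parses a single-walk result and reports the path along with the number of feature flips per step. The JSON is inlined from the example above so the snippet is self-contained; for a file written with --output, json.load on the opened file would take the place of json.loads.

```python
import json

# Same shape as the single-walk example, inlined so the sketch is self-contained.
raw = """
{
  "walk": [
    {"syllable": "ka", "frequency": 20, "features": [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]},
    {"syllable": "pai", "frequency": 9, "features": [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0]}
  ],
  "profile": "dialect",
  "start": "ka",
  "seed": 42
}
"""

result = json.loads(raw)
walk = result["walk"]

path = " → ".join(step["syllable"] for step in walk)

# Number of features flipped between each consecutive pair of syllables.
flips = [
    sum(a != b for a, b in zip(prev["features"], curr["features"]))
    for prev, curr in zip(walk, walk[1:])
]

print(path)   # ka → pai
print(flips)  # [2]
```

A flip count of 2 is consistent with the dialect profile, whose max_flips is 2.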
Batch Output
{
"walks": [
{
"walk": [
{"syllable": "ka", "frequency": 20, "features": [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]},
{"syllable": "ki", "frequency": 15, "features": [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]}
],
"start": "ka",
"seed": 42
},
{
"walk": [
{"syllable": "bak", "frequency": 8, "features": [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]},
{"syllable": "pak", "frequency": 5, "features": [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]}
],
"start": "bak",
"seed": 43
}
],
"profile": "dialect",
"parameters": {
"steps": 5,
"max_flips": 2,
"temperature": 0.7,
"frequency_weight": 0.0
}
}
Integration Guide
The syllable walker uses output from the feature annotator:
# Step 1: Extract syllables from dictionary
python -m build_tools.pyphen_syllable_extractor --file wordlist.txt --auto
# Step 2: Normalize syllables
python -m build_tools.pyphen_syllable_normaliser \
--source data/corpus/ \
--output data/normalized/
# Step 3: Annotate with phonetic features
python -m build_tools.syllable_feature_annotator \
--syllables data/normalized/syllables_unique.txt \
--frequencies data/normalized/syllables_frequencies.json \
--output data/annotated/syllables_annotated.json
# Step 4: Explore with syllable walker
python -m build_tools.syllable_walk data/annotated/syllables_annotated.json --web
When to use this tool:
To explore phonetic connectivity in your syllable corpus
To test if desired phonetic transitions exist before creating patterns
To discover interesting phonetic progressions for name generation
To analyze corpus structure and syllable relationships
To generate datasets for statistical analysis of phonetic patterns
Common Use Cases:
Understanding Corpus Structure:
Generate many walks to see how syllables connect:
# Generate 100 walks for corpus analysis
python -m build_tools.syllable_walk data.json --batch 100 --output corpus_walks.json
Analyze the JSON output to understand syllable connectivity, central hubs, and phonetic pathways.
Testing Pattern Viability:
Explore if desired phonetic transitions exist before creating new patterns:
# Test phonetic transitions with ritual profile
python -m build_tools.syllable_walk data.json --start the --profile ritual
Finding Interesting Sequences:
Discover unusual but valid phonetic progressions:
# Explore unusual sequences with goblin profile
python -m build_tools.syllable_walk data.json --profile goblin --steps 10
Statistical Analysis:
Generate large datasets for analysis:
# Generate 1000 walks with dialect profile
python -m build_tools.syllable_walk data.json --batch 1000 \
--profile dialect --output dialect_walks.json
# Generate 1000 walks with goblin profile
python -m build_tools.syllable_walk data.json --batch 1000 \
--profile goblin --output goblin_walks.json
Then analyze frequency distributions, transition patterns, etc.
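One possible starting point for that analysis is sketched below: it counts how often each syllable is visited and how often each transition occurs across a batch. The inlined data is a made-up toy batch that follows the batch output shape (the "features" arrays are omitted because the code only reads "syllable"); it is not real walker output.

```python
import json
from collections import Counter

# Toy batch in the documented shape; "features" omitted for brevity.
raw = """
{
  "walks": [
    {"walk": [{"syllable": "ka", "frequency": 20},
              {"syllable": "ki", "frequency": 15},
              {"syllable": "ti", "frequency": 12}],
     "start": "ka", "seed": 42},
    {"walk": [{"syllable": "ka", "frequency": 20},
              {"syllable": "ki", "frequency": 15},
              {"syllable": "ka", "frequency": 20}],
     "start": "ka", "seed": 43}
  ],
  "profile": "dialect"
}
"""

batch = json.loads(raw)

visits = Counter()       # how often each syllable appears in any walk
transitions = Counter()  # how often each (from, to) pair occurs

for entry in batch["walks"]:
    names = [step["syllable"] for step in entry["walk"]]
    visits.update(names)
    transitions.update(zip(names, names[1:]))

print(visits.most_common(2))      # [('ka', 3), ('ki', 2)]
print(transitions[("ka", "ki")])  # 2
```

Syllables with high visit counts are candidates for the "central hubs" mentioned earlier, and transition pairs that never appear may indicate missing phonetic pathways in the corpus.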
Advanced Topics
Web Interface
The web interface provides an intuitive way to explore syllable walks without command-line complexity.
Starting the Server:
# Default port (5000)
python -m build_tools.syllable_walk data/annotated/syllables_annotated.json --web
# Custom port
python -m build_tools.syllable_walk data/annotated/syllables_annotated.json \
--web --port 8000
# Quiet mode (suppress initialization messages)
python -m build_tools.syllable_walk data/annotated/syllables_annotated.json \
--web --quiet
Features:
Profile selection - Choose from four profiles or use custom parameters
Starting syllable - Specify start or use random
Real-time generation - Instant walk generation with visual feedback
Walk display - See full path and syllable details with frequencies
Statistics tracking - Total syllables and walks generated
Reproducible - Optional seed for deterministic walks
The web server uses Python’s standard library http.server (no Flask dependency).
Algorithm Details
Cost Function:
Each potential step has a cost based on:
Hamming distance - Number of features that change
Feature-specific costs - Some features cost more to change
Frequency weight - Bias toward common or rare syllables
Inertia - Tendency to stay at current syllable
The walker uses softmax selection with temperature to probabilistically choose the next syllable:
For each neighbor n:
hamming_cost = sum(feature_costs[i] for i where features differ)
freq_cost = frequency_weight × log(frequency[n])
total_cost = hamming_cost + freq_cost + inertia_cost
Probability of selecting n:
P(n) = exp(-cost(n) / temperature) / sum(exp(-cost(k) / temperature))
Higher temperature = more random selection (flattens probability distribution)
Lower temperature = more deterministic (strongly favors lowest cost)
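The effect of temperature on the selection probabilities can be sketched in a few lines. The function below implements the softmax formula given above for an arbitrary list of costs; the cost values are hypothetical, and subtracting the minimum cost is a numerical-stability choice made for this sketch rather than something the source describes.

```python
import math


def softmax_probabilities(costs, temperature):
    """Convert costs to selection probabilities: lower cost, higher probability."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtracting the minimum cost keeps exp() in a safe range without
    # changing the resulting probabilities.
    lowest = min(costs)
    weights = [math.exp(-(c - lowest) / temperature) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


costs = [0.5, 1.0, 2.0]  # hypothetical total costs for three candidate moves

# Temperatures taken from the clerical, dialect, and ritual profiles.
for t in (0.3, 0.7, 2.5):
    probs = softmax_probabilities(costs, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.3 most of the probability mass lands on the cheapest move, while at 2.5 the three probabilities move much closer together.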
Performance
Initialization Time (500k syllables):
| Max Neighbor Distance | Time | Memory | Max Flips Supported |
|---|---|---|---|
| 1 | ~30 seconds | ~50 MB | 1 |
| 2 | ~1 minute | ~150 MB | 1-2 |
| 3 | ~3 minutes | ~300 MB | 1-3 |
Walk Generation:
After initialization: <10ms per walk (instant)
Deterministic: Same seed always produces same walk
Scalable: Speed independent of corpus size
Notes
Dependencies:
Requires NumPy for efficient feature matrix operations (build-time dependency)
Uses standard library http.server for web interface (no Flask)
Performance Characteristics:
Initialization is one-time cost (30 sec - 3 min depending on distance)
Walk generation is instant after initialization (<10ms per walk)
Designed for large corpus analysis (500k+ syllables)
Determinism guaranteed via isolated RNG instances
Troubleshooting:
Initialization Takes Too Long:
Reduce --max-neighbor-distance (default 3 → try 2)
Use smaller corpus for testing
Initialization is one-time cost, walk generation is instant
Getting Stuck at One Syllable:
Increase --max-flips (allow bigger phonetic jumps)
Increase --temperature (more randomness)
Check if starting syllable is isolated with --search
Walks Too Random:
Decrease --temperature (less randomness)
Adjust --frequency-weight (try -0.5 for rare syllables)
Try different profiles (clerical for conservative)
Port Already in Use (Web Mode):
# Use a different port if 5000 is occupied
python -m build_tools.syllable_walk data.json --web --port 8000
Build-time tool:
This is a build-time analysis tool only - not used during runtime name generation.
Related Documentation:
Syllable Feature Annotator - Generates input data with phonetic features
Pyphen Syllable Normaliser - Prepares pyphen syllable corpus before annotation
NLTK Syllable Normaliser - Prepares NLTK syllable corpus before annotation
Pyphen Syllable Extractor - Extracts raw syllables using pyphen
NLTK Syllable Extractor - Extracts raw syllables using NLTK
Analysis Tools - Additional analysis tools for syllable data
For detailed usage guide, see: claude/build_tools/syllable_walk.md
API Reference
- class build_tools.syllable_walk.SyllableWalker(data_path, max_neighbor_distance=3, feature_costs=None, inertia_cost=0.5, verbose=False)[source]
Bases: object
Navigate syllable feature space via cost-based random walks.
This class efficiently handles large syllable datasets (500k+) by pre-computing neighbor relationships and using vectorized operations where possible.
The walker performs a one-time expensive computation during initialization to build a neighbor graph, mapping each syllable to nearby syllables within a maximum Hamming distance. After initialization, walk generation is extremely fast (<10ms per walk).
- syllables
List of all syllable strings
- frequencies
NumPy array of syllable frequencies (uint32)
- feature_matrix
NumPy array of binary feature vectors (N x 12, uint8)
- syllable_to_idx
Dict mapping syllable text to index
- neighbor_graph
Dict mapping syllable index to list of neighbor indices
- max_neighbor_distance
Maximum Hamming distance for neighbors
- feature_costs
Dict of costs for each feature flip
- inertia_cost
Cost of staying at current syllable
Example
>>> walker = SyllableWalker("syllables_annotated.json", verbose=True)
>>> walk = walker.walk_from_profile(
...     start="ka", profile="dialect", steps=5, seed=42
... )
>>> print(walker.format_walk(walk))
ka → ki → ti → ta → da → de
Notes
Initialization time: ~2-3 minutes for 500k syllables
Walk generation: <10ms per walk after initialization
Memory usage: ~200-300 MB for 500k syllables
Thread safety: Not thread-safe (use separate instances)
- __init__(data_path, max_neighbor_distance=3, feature_costs=None, inertia_cost=0.5, verbose=False)[source]
Initialize the syllable walker with pre-computed neighbor graph.
- Parameters:
data_path (Path | str) – Path to syllables_annotated.json file (output of syllable_feature_annotator)
max_neighbor_distance (int) – Maximum Hamming distance for pre-computing neighbors (1-3). Higher values = more neighbors = slower initialization + more memory, but allows larger feature flips per step. Default: 3 (recommended)
feature_costs (Optional[Dict[str, float]]) – Custom feature cost dictionary. If None, uses DEFAULT_FEATURE_COSTS. Keys must match FEATURE_KEYS.
inertia_cost (float) – Cost of staying at current syllable. Higher values discourage staying put. Default: 0.5
verbose (bool) – If True, print progress during initialization (neighbor graph construction can take 2-3 minutes for 500k syllables)
- Raises:
FileNotFoundError – If data_path does not exist
ValueError – If data_path is not valid JSON
ValueError – If feature_costs keys don’t match FEATURE_KEYS
ValueError – If max_neighbor_distance < 1 or > len(FEATURE_KEYS)
Notes
Initialization performs expensive one-time computation
Use verbose=True for long-running initializations
Consider caching the neighbor graph (future optimization)
- format_walk(walk, arrow=' → ')[source]
Format a walk as a string with arrows.
- Parameters:
- Return type:
- Returns:
Formatted walk string
Example
>>> walk = walker.walk_from_profile("ka", "dialect", steps=5, seed=42)
>>> walker.format_walk(walk)
'ka → ki → ti → ta → da → de'
>>> walker.format_walk(walk, arrow=" -> ")
'ka -> ki -> ti -> ta -> da -> de'
- get_available_profiles()[source]
Get all available walk profiles.
- Return type:
- Returns:
Dictionary mapping profile names to WalkProfile objects
Example
>>> profiles = walker.get_available_profiles()
>>> for name in profiles:
...     print(name)
clerical
dialect
goblin
ritual
- get_random_syllable(seed=None)[source]
Get a random syllable from the dataset.
- Parameters:
seed (Optional[int]) – Random seed for reproducibility (default: None)
- Return type:
- Returns:
Random syllable text
Example
>>> walker.get_random_syllable(seed=42)
'ka'
>>> walker.get_random_syllable(seed=42)
'ka'  # Same seed = same result
- get_syllable_info(syllable)[source]
Get information about a specific syllable.
- Parameters:
syllable (str) – Syllable text to look up
- Returns:
Syllable dictionary with keys: syllable, frequency, features. Returns None if syllable not found.
Example
>>> info = walker.get_syllable_info("ka")
>>> if info:
...     print(f"Frequency: {info['frequency']}")
Frequency: 1234
- walk(start, steps, max_flips, temperature=1.0, frequency_weight=0.0, seed=None)[source]
Execute a syllable walk through feature space.
Starting from a syllable, takes the given number of steps (steps) through feature space, choosing each next syllable probabilistically based on:
- Feature flip cost (weighted Hamming distance)
- Frequency cost (rarity penalty/bonus)
- Temperature (exploration vs exploitation)
- Inertia (tendency to stay put)
The walk uses softmax selection over candidate neighbors:
1. Find all neighbors within max_flips distance
2. Compute cost for each neighbor (flip cost + rarity cost)
3. Add inertia option (staying at current syllable)
4. Apply softmax with temperature: weight_i = exp(-cost_i / T)
5. Sample next syllable proportional to weights
- Parameters:
start (int | str) – Starting syllable (syllable text or index)
steps (int) – Number of steps to take (each step visits one syllable)
max_flips (int) – Maximum feature flips allowed per step (1-3). Must be <= max_neighbor_distance from __init__.
temperature (float) – Exploration temperature (0.1-5.0). Higher values increase randomness. Typical values: 0.3 (conservative, minimal exploration), 0.7 (balanced), 1.5 (high exploration), 2.5 (maximum randomness)
frequency_weight (float) – Frequency bias (-2.0 to 2.0). Positive: favor common syllables. Zero: neutral. Negative: favor rare syllables. Typical values: -1.0, 0.0, 1.0
seed (Optional[int]) – Random seed for reproducibility. Same seed = same walk. If None, uses system randomness (non-reproducible).
- Returns:
List of syllable dictionaries with keys:
"syllable": Syllable text (str)
"frequency": Corpus frequency (int)
"features": Binary feature vector (list of 12 ints)
Length = steps + 1 (includes starting syllable)
- Raises:
ValueError – If start syllable not found in dataset
ValueError – If max_flips > max_neighbor_distance
ValueError – If steps < 0
Example
>>> walker = SyllableWalker("data.json")
>>> walk = walker.walk(
...     start="ka",
...     steps=5,
...     max_flips=2,
...     temperature=0.7,
...     frequency_weight=0.0,
...     seed=42
... )
>>> len(walk)
6  # start + 5 steps
>>> walk[0]["syllable"]
'ka'
Notes
Deterministic: Same seed always produces same walk
Uses local Random instance (doesn’t affect global random state)
Inertia option allows walk to stay at current syllable
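The five selection steps described for walk() can be sketched end to end on toy data. This is an illustration under stated assumptions, not the method's source: it uses 3-feature vectors, charges a flat 1.0 per flipped feature, leaves out the frequency term (as if frequency_weight were 0.0), and takes 0.5 as the inertia cost because that is the constructor default. The names FEATURES, flips, and toy_walk are hypothetical.

```python
import math
import random

# Toy data: 3-feature vectors instead of the real 12.
FEATURES = {
    "ka": (0, 0, 1),
    "ki": (0, 1, 1),
    "ti": (1, 1, 1),
    "ta": (1, 0, 1),
}
INERTIA_COST = 0.5  # matches the default inertia_cost in the constructor signature


def flips(a, b):
    """Number of features that differ between syllables a and b."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))


def toy_walk(start, steps, max_flips, temperature=1.0, seed=None):
    rng = random.Random(seed)  # local RNG, global random state untouched
    path = [start]
    current = start
    for _ in range(steps):
        # 1. Neighbors within max_flips of the current syllable.
        candidates = [
            s for s in FEATURES if s != current and flips(current, s) <= max_flips
        ]
        # 2. Cost per neighbor (assumption: 1.0 per flip, no frequency term).
        costs = [float(flips(current, s)) for s in candidates]
        # 3. Staying put is always an option, priced at the inertia cost.
        candidates.append(current)
        costs.append(INERTIA_COST)
        # 4. Softmax weights with temperature.
        weights = [math.exp(-c / temperature) for c in costs]
        # 5. Sample the next syllable proportionally to its weight.
        current = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(current)
    return path


walk = toy_walk("ka", steps=5, max_flips=1, temperature=0.7, seed=42)
print(" → ".join(walk))
print(len(walk))  # 6: the start plus 5 steps
```

Because the inertia option is always in the candidate list, the sampling step never runs out of choices, even for a syllable with no neighbors.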
- walk_from_profile(start, profile, steps=5, seed=None)[source]
Execute a walk using a named profile.
Convenience method that uses predefined WalkProfile parameters. See WALK_PROFILES for available profiles.
- Parameters:
- Return type:
- Returns:
List of syllable dictionaries (same as walk())
- Raises:
ValueError – If profile name not found
Example
>>> walker = SyllableWalker("data.json")
>>> walk = walker.walk_from_profile("ka", "goblin", steps=10, seed=42)
>>> print(walker.format_walk(walk))
ka → kha → gha → ghe → ge → gwe → ...
- class build_tools.syllable_walk.WalkProfile(name, description, max_flips, temperature, frequency_weight)[source]
Bases: object
Configuration profile for a syllable walk.
A profile encapsulates all parameters needed for a walk, providing named presets for different behaviors.
- name
Human-readable profile name (e.g., “Dialect Walk”)
- description
Brief description of profile behavior
- max_flips
Maximum feature flips allowed per step (1-3)
- temperature
Exploration temperature (0.1-5.0)
- frequency_weight
Frequency bias (-2.0 to 2.0)
Example
>>> profile = WalkProfile(
...     name="Custom Walk",
...     description="High temperature, neutral frequency",
...     max_flips=2,
...     temperature=2.0,
...     frequency_weight=0.0
... )
>>> print(profile)
Custom Walk: High temperature, neutral frequency
- build_tools.syllable_walk.get_profile(name)[source]
Get a walk profile by name.
- Parameters:
name (str) – Profile name (case-insensitive)
- Return type:
- Returns:
WalkProfile object
- Raises:
ValueError – If profile name not found
Example
>>> profile = get_profile("goblin")
>>> profile.temperature
1.5
>>> profile = get_profile("GOBLIN")  # Case-insensitive
>>> profile.temperature
1.5
- build_tools.syllable_walk.list_profiles()[source]
Get all available walk profiles.
- Return type:
- Returns:
Dictionary mapping profile names to WalkProfile objects (copy)
Example
>>> profiles = list_profiles()
>>> for name, profile in profiles.items():
...     print(f"{name}: {profile.description}")
clerical: Conservative, favors common syllables, minimal phonetic change
dialect: Moderate exploration, neutral frequency bias
goblin: Chaotic, favors rare syllables, high phonetic variation
ritual: Maximum exploration, strongly favors rare syllables
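For readers who want to see how a profile registry with case-insensitive lookup might be put together, here is a small stand-alone sketch. The Profile dataclass, the PROFILES dict, and lookup_profile are hypothetical stand-ins for WalkProfile, WALK_PROFILES, and get_profile; the parameter values are copied from the Walk Profiles table, and nothing here is taken from the module's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:  # hypothetical stand-in for WalkProfile
    name: str
    description: str
    max_flips: int
    temperature: float
    frequency_weight: float


# Parameter values copied from the Walk Profiles table.
PROFILES = {
    "clerical": Profile("clerical", "Conservative, minimal change", 1, 0.3, 1.0),
    "dialect": Profile("dialect", "Balanced exploration", 2, 0.7, 0.0),
    "goblin": Profile("goblin", "Chaotic, high variation", 2, 1.5, -0.5),
    "ritual": Profile("ritual", "Maximum exploration", 3, 2.5, -1.0),
}


def lookup_profile(name: str) -> Profile:
    """Case-insensitive lookup that raises ValueError for unknown names."""
    try:
        return PROFILES[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown profile {name!r}; choose from {', '.join(PROFILES)}"
        ) from None


print(lookup_profile("GOBLIN").temperature)  # 1.5
print(lookup_profile("ritual").max_flips)    # 3
```

Freezing the dataclass keeps the presets immutable, so a caller cannot alter a shared profile by accident.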