build_tools.syllable_walk.reach
Thermodynamic reach calculator for syllable walker profiles.
Computes the mean effective vocabulary of each walk profile — the average number of syllables with non-negligible transition probability from any given starting position, under the profile’s complete parameter set (max_flips, temperature, frequency_weight).
This is a deterministic, seed-independent metric that reflects the thermodynamic structure of the profile’s constraint regime, not stochastic walk behaviour.
The algorithm replicates the walker’s softmax transition math (see
SyllableWalker.walk()), but exhaustively over all starting nodes rather than
sampling a single path. For each starting syllable, it computes the full
probability distribution over candidate neighbors, then counts how many
syllables exceed a probability threshold. The final reach value is the
mean of these per-node counts across all starting positions.
- Why mean-per-node instead of union?
An earlier implementation used the union of all reachable syllables across all starting nodes. This produced poor discrimination at production scale: with N=1,757 starting nodes and threshold=0.001, almost every syllable was reachable from some starting node, making reach ≈ total for any profile with max_flips ≥ 2. The mean-per-node approach captures the effective vocabulary per step of a walk, which scales correctly with corpus size and discriminates between profiles that differ only in temperature or frequency_weight.
- Design reference:
_working/syllable_walker_profile_field_micro_signal.md
- Key properties:
Deterministic: same corpus + profile always produces the same reach
Seed-independent: no random sampling involved
Captures all three profile parameters (max_flips, temperature, frequency_weight)
Produces genuinely different values for all four named profiles
Scales correctly with corpus size (no saturation)
Computed once per corpus load, not per walk
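The saturation argument above can be illustrated with a small sketch. The toy "ring" reach sets and the helper names (ring_reach_sets, union_and_mean) are hypothetical and are not part of this module; they only show that the union can equal the corpus size for very different profiles while the mean per-node count still separates them.

```python
def ring_reach_sets(total, k):
    """Hypothetical toy data: node i reaches the next k nodes on a ring."""
    return [{(i + j) % total for j in range(1, k + 1)} for i in range(total)]


def union_and_mean(reach_sets):
    """Return (union size, rounded mean per-node count) for per-node reach sets."""
    union = set().union(*reach_sets)
    mean = round(sum(len(s) for s in reach_sets) / len(reach_sets))
    return len(union), mean


narrow = union_and_mean(ring_reach_sets(200, 5))   # a tightly constrained profile
wide = union_and_mean(ring_reach_sets(200, 40))    # a loosely constrained profile
print(narrow)  # (200, 5)
print(wide)    # (200, 40)
```

Both toy profiles have a union of 200 (every syllable is reachable from some node), yet their mean per-node counts differ by a factor of eight.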
Example
>>> from build_tools.syllable_walk.reach import compute_all_reaches
>>> from build_tools.syllable_walk import SyllableWalker
>>> walker = SyllableWalker("data/annotated/syllables_annotated.json")
>>> reaches = compute_all_reaches(walker)
>>> for name, result in reaches.items():
...     print(f"{name}: reach {result.reach} / {result.total}")
clerical: reach 4 / 2088
dialect: reach 32 / 2088
goblin: reach 58 / 2088
ritual: reach 147 / 2088
Attributes
Classes
ReachResult | Result of a thermodynamic reach computation for a single profile.
Functions
compute_reach | Compute mean effective vocabulary for a single profile.
compute_all_reaches | Compute mean effective vocabulary for all four named walk profiles.
Module Contents
- class build_tools.syllable_walk.reach.ReachResult
Result of a thermodynamic reach computation for a single profile.
Encapsulates both the reach count and the full context of how it was computed, including the profile parameters and timing metadata.
- profile_name
Name of the profile (e.g., “clerical”, “dialect”).
- reach
Mean number of syllables reachable per starting node (rounded). This is the primary micro signal — the average effective vocabulary size at each step of a walk under this profile’s constraints.
- total
Total syllables in the corpus (the “field” size).
- threshold
Probability threshold used for the reachability test. A syllable is counted if p > threshold from the starting node.
- max_flips
Profile’s max_flips parameter (edge existence constraint).
- temperature
Profile’s temperature parameter (probability shape).
- frequency_weight
Profile’s frequency_weight parameter (rarity bias).
- computation_ms
Wall-clock time for this profile’s computation in milliseconds. Captured as metadata to monitor performance across different systems and corpus sizes.
- unique_reachable
Total unique syllables reachable from at least one starting node (union across all nodes). This is supplementary context; the mean per-node count (reach) is the primary metric displayed in the UI.
- reachable_indices
Tuple of (syllable_index, reachability_count) pairs for all syllables in the union reachable set, sorted by reachability count descending (most commonly reachable first). The count is how many starting nodes can reach that syllable. Maps to syllable text via walker.syllables[idx]. Omitted from to_dict() to keep API responses lean.
Example
>>> result = ReachResult(
...     profile_name="dialect",
...     reach=32,
...     total=2088,
...     threshold=0.001,
...     max_flips=2,
...     temperature=0.7,
...     frequency_weight=0.0,
...     computation_ms=42.5,
...     unique_reachable=1850,
... )
>>> result.reach
32
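As a rough illustration of the reachable_indices layout described above, the following sketch builds (syllable_index, reachability_count) pairs from per-node reachable sets and maps them back to text. The helper build_reachable_indices, the toy syllable list, and the tie-break by index are assumptions made for the example; the real module may build and order ties differently.

```python
from collections import Counter


def build_reachable_indices(per_node_reachable):
    """Count how many starting nodes reach each syllable index.

    Returns (index, count) pairs sorted by count descending. Ties are
    broken by index here, which is an assumption of this sketch.
    """
    counts = Counter()
    for reachable in per_node_reachable:
        counts.update(reachable)
    return tuple(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


syllables = ["ka", "ri", "to", "mu"]           # stand-in for walker.syllables
per_node = [{1, 2}, {2}, {1, 2, 3}, set()]     # hypothetical per-node reach sets
pairs = build_reachable_indices(per_node)
print(pairs)                                   # ((2, 3), (1, 2), (3, 1))
print([syllables[idx] for idx, _ in pairs])    # ['to', 'ri', 'mu']
```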
- build_tools.syllable_walk.reach.compute_reach(walker, profile_name, max_flips, temperature, frequency_weight, threshold=DEFAULT_REACH_THRESHOLD)
Compute mean effective vocabulary for a single profile.
For each syllable in the corpus, computes the softmax transition probability distribution over all neighbors within max_flips distance, using the profile’s temperature and frequency_weight. Counts how many neighbors exceed the probability threshold, then returns the mean of these per-node counts as the reach value.
This replicates the same math as SyllableWalker.walk() (lines 526–549 of walker.py), but exhaustively over all starting nodes rather than sampling a single stochastic path.
- The computation is:
1. For each starting syllable s:
   a. Collect all neighbors within max_flips Hamming distance
   b. Compute cost per neighbor: flip_cost + rarity_cost
   c. Add inertia option (staying at s) for normalisation
   d. Apply softmax: weight_i = exp(-cost_i / temperature)
   e. Normalise to probabilities
   f. Count other syllables (not s itself) with p > threshold. Inertia participates in normalisation but self-transitions do not count toward reach.
2. Return the mean per-node count (rounded to nearest integer)
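The steps above can be sketched without a SyllableWalker as follows. This is a minimal, self-contained approximation under stated assumptions, not the module's implementation: the function name mean_reach, the bit-tuple features, the cost formulas (flip cost equal to Hamming distance, rarity cost equal to frequency_weight * (1 - frequency)), and the default inertia_cost of 0.5 are all hypothetical stand-ins for the walker's _flip_cost(), _rarity_cost(), and inertia_cost.

```python
import math


def hamming(a, b):
    """Number of differing positions between two equal-length feature tuples."""
    return sum(x != y for x, y in zip(a, b))


def mean_reach(features, frequencies, max_flips, temperature,
               frequency_weight, threshold=0.001, inertia_cost=0.5):
    """Mean count of neighbors with transition probability above threshold."""
    n = len(features)
    if n == 0:
        raise ValueError("no syllables loaded")
    counts = []
    for s in range(n):
        costs = []
        for t in range(n):
            # a. neighbors within max_flips Hamming distance (never s itself)
            if t != s and hamming(features[s], features[t]) <= max_flips:
                flip_cost = float(hamming(features[s], features[t]))       # assumed
                rarity_cost = frequency_weight * (1.0 - frequencies[t])    # assumed
                costs.append(flip_cost + rarity_cost)                      # b.
        weights = [math.exp(-c / temperature) for c in costs]              # d.
        # c. inertia joins the normaliser but is never counted as reach
        z = sum(weights) + math.exp(-inertia_cost / temperature)
        counts.append(sum(1 for w in weights if w / z > threshold))        # e., f.
    return round(sum(counts) / n)


features = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]   # toy feature vectors
frequencies = [1.0, 0.5, 0.5, 0.1]                         # toy normalised frequencies
hot = mean_reach(features, frequencies, max_flips=3, temperature=1.0,
                 frequency_weight=0.0)
cold = mean_reach(features, frequencies, max_flips=3, temperature=0.1,
                  frequency_weight=0.0)
print(hot)  # 3
```

With these toy inputs every other syllable clears the threshold at temperature 1.0, while at temperature 0.1 only the one-flip neighbors do, so the cold value is smaller than the hot one.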
- Why mean-per-node instead of union?
The union approach (counting syllables reachable from any starting node) saturates to near-total for production corpora. With N=1,757 nodes and threshold=0.001, almost every syllable is reachable from at least one starting node, making reach ≈ total for any profile with max_flips ≥ 2. The mean-per-node count captures the effective vocabulary per step, which scales correctly with corpus size.
- Parameters:
walker (build_tools.syllable_walk.walker.SyllableWalker) – Initialised SyllableWalker with pre-computed neighbor graph. Must have neighbor_graph, _flip_cost(), _rarity_cost(), _hamming_distance(), and inertia_cost available.
profile_name (str) – Human-readable name for the profile (e.g., “dialect”). Stored in the result for identification.
max_flips (int) – Maximum feature flips per step (1–3). Determines which edges in the neighbor graph are traversable.
temperature (float) – Softmax temperature (0.1–5.0). Controls the shape of the probability distribution. Low temperature concentrates probability on low-cost transitions; high temperature flattens the distribution toward uniform.
frequency_weight (float) – Frequency bias (-2.0 to 2.0). Positive values penalise rare syllables (favour common); negative values reward rare syllables (favour uncommon).
threshold (float) – Minimum transition probability for a syllable to be counted as “effectively reachable.” Default: 0.001.
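To make the interaction between temperature and threshold concrete, here is a small sketch that applies a softmax to a fixed set of costs at the two ends of the documented temperature range. The costs and helper names (transition_probs, count_above) are hypothetical, and the inertia option is left out for brevity, so the numbers are illustrative only.

```python
import math


def transition_probs(costs, temperature):
    """Softmax over negative cost: p_i is proportional to exp(-cost_i / temperature)."""
    low = min(costs)  # shift by the minimum cost for numerical stability
    weights = [math.exp(-(c - low) / temperature) for c in costs]
    z = sum(weights)
    return [w / z for w in weights]


def count_above(probs, threshold=0.001):
    """Number of candidates whose probability exceeds the threshold."""
    return sum(1 for p in probs if p > threshold)


costs = [1.0, 2.0, 3.0]  # hypothetical per-neighbor costs
cold = transition_probs(costs, temperature=0.1)
hot = transition_probs(costs, temperature=5.0)
print(count_above(cold), count_above(hot))  # 1 3
```

At temperature 0.1 nearly all probability mass sits on the cheapest transition, so only one candidate clears the 0.001 threshold; at 5.0 the distribution is close to uniform and all three do.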
- Returns:
ReachResult with the mean per-node reach count, corpus total, unique reachable count (union), and metadata.
- Raises:
ValueError – If walker has no syllables loaded.
- Return type:
ReachResult
Example
>>> result = compute_reach(
...     walker, "dialect",
...     max_flips=2, temperature=0.7, frequency_weight=0.0,
... )
>>> print(f"Dialect reach: {result.reach} / {result.total}")
Dialect reach: 32 / 2088
- build_tools.syllable_walk.reach.compute_all_reaches(walker, threshold=DEFAULT_REACH_THRESHOLD, progress_callback=None)
Compute mean effective vocabulary for all four named walk profiles.
Iterates over the predefined profiles (clerical, dialect, goblin, ritual) and computes the mean per-node thermodynamic reach for each. Returns a dictionary mapping profile names to their ReachResult.
This is intended to be called once after the walker finishes initialising, typically in the background thread that builds the neighbor graph. The results are cached in PatchState and served via the stats endpoint.
- Parameters:
walker (build_tools.syllable_walk.walker.SyllableWalker) – Initialised SyllableWalker with pre-computed neighbor graph.
threshold (float) – Minimum transition probability for reachability. Default: 0.001. See DEFAULT_REACH_THRESHOLD for rationale.
progress_callback (Callable[[str], None] | None) – Optional callable invoked with a progress message after each profile is computed. Used by the web UI to show incremental reach results like "Computing reaches: clerical ~4, dialect ~32...".
- Returns:
Dictionary mapping profile name to ReachResult. Keys: "clerical", "dialect", "goblin", "ritual".
- Return type:
dict[str, ReachResult]
Example
>>> reaches = compute_all_reaches(walker)
>>> for name, r in reaches.items():
...     print(f"{name}: reach={r.reach}, time={r.computation_ms}ms")
clerical: reach=4, time=12.3ms
dialect: reach=32, time=15.1ms
goblin: reach=58, time=14.8ms
ritual: reach=147, time=18.2ms
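The progress_callback contract described in the parameters can be sketched with a stand-in driver. The function run_with_progress below is hypothetical and does not call the real compute_all_reaches; it only mimics the documented behaviour of invoking the callback with a cumulative message after each profile, using the reach values from the example above.

```python
from typing import Callable, Dict, List, Optional


def run_with_progress(
    profile_reaches: Dict[str, int],
    progress_callback: Optional[Callable[[str], None]] = None,
) -> Dict[str, int]:
    """Hypothetical stand-in: report cumulative progress after each profile."""
    done: List[str] = []
    for name, reach in profile_reaches.items():
        done.append(f"{name} ~{reach}")
        if progress_callback is not None:
            progress_callback("Computing reaches: " + ", ".join(done) + "...")
    return profile_reaches


messages: List[str] = []
run_with_progress({"clerical": 4, "dialect": 32}, messages.append)
print(messages[-1])  # Computing reaches: clerical ~4, dialect ~32...
```

Passing a list's append method is a simple way to capture messages in a test; a UI would presumably forward each string to its own status channel instead.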
Note
Custom profile reach is not computed here. See the TODO note in api/walker.py regarding on-demand computation for custom profiles.