build_tools.syllable_walk.reach

Thermodynamic reach calculator for syllable walker profiles.

Computes the mean effective vocabulary of each walk profile — the average number of syllables with non-negligible transition probability from any given starting position, under the profile’s complete parameter set (max_flips, temperature, frequency_weight).

This is a deterministic, seed-independent metric that reflects the thermodynamic structure of the profile’s constraint regime, not stochastic walk behaviour.

The algorithm replicates the walker’s softmax transition math (see SyllableWalker.walk()), but exhaustively over all starting nodes rather than sampling a single path. For each starting syllable, it computes the full probability distribution over candidate neighbors, then counts how many syllables exceed a probability threshold. The final reach value is the mean of these per-node counts across all starting positions.

Why mean-per-node instead of union?

An earlier implementation used the union of all reachable syllables across all starting nodes. This produced poor discrimination at production scale: with N=1,757 starting nodes and threshold=0.001, almost every syllable was reachable from some starting node, making reach ≈ total for any profile with max_flips ≥ 2. The mean-per-node approach captures the effective vocabulary per step of a walk, which scales correctly with corpus size and discriminates between profiles that differ only in temperature or frequency_weight.
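The saturation effect can be seen on a toy example. The sketch below uses made-up reachable sets rather than the real corpus, and the names `reachable_from`, `union_reach`, and `mean_reach` are illustrative only. It shows the union covering the whole toy corpus while the per-node mean stays at the per-step vocabulary size.

```python
from statistics import mean

# Hypothetical toy corpus: 6 syllables, each node reaches only 2 others.
reachable_from = {
    0: {1, 2},
    1: {2, 3},
    2: {3, 4},
    3: {4, 5},
    4: {5, 0},
    5: {0, 1},
}
total = len(reachable_from)

# Union: every syllable reachable from at least one starting node.
union_reach = len(set().union(*reachable_from.values()))

# Mean per node: average size of the per-node reachable set.
mean_reach = round(mean(len(targets) for targets in reachable_from.values()))

print(f"union: {union_reach} / {total}")
print(f"mean:  {mean_reach} / {total}")
```

Here every syllable is reachable from some node, so the union equals the corpus size even though each step of a walk only ever has two candidates.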

Design reference:

_working/syllable_walker_profile_field_micro_signal.md

Key properties:
  • Deterministic: same corpus + profile always produces the same reach

  • Seed-independent: no random sampling involved

  • Captures all three profile parameters (max_flips, temperature, frequency_weight)

  • Produces genuinely different values for all four named profiles

  • Scales correctly with corpus size (no saturation)

  • Computed once per corpus load, not per walk

Example

>>> from build_tools.syllable_walk.reach import compute_all_reaches
>>> from build_tools.syllable_walk import SyllableWalker
>>> walker = SyllableWalker("data/annotated/syllables_annotated.json")
>>> reaches = compute_all_reaches(walker)
>>> for name, result in reaches.items():
...     print(f"{name}: reach {result.reach} / {result.total}")
clerical: reach 4 / 2088
dialect: reach 32 / 2088
goblin: reach 58 / 2088
ritual: reach 147 / 2088

Attributes

DEFAULT_REACH_THRESHOLD

Classes

ReachResult

Result of a thermodynamic reach computation for a single profile.

Functions

compute_reach(walker, profile_name, max_flips, ...[, ...])

Compute mean effective vocabulary for a single profile.

compute_all_reaches(walker[, threshold, progress_callback])

Compute mean effective vocabulary for all four named walk profiles.

Module Contents

build_tools.syllable_walk.reach.DEFAULT_REACH_THRESHOLD: float = 0.001
class build_tools.syllable_walk.reach.ReachResult

Result of a thermodynamic reach computation for a single profile.

Encapsulates both the reach count and the full context of how it was computed, including the profile parameters and timing metadata.

profile_name

Name of the profile (e.g., “clerical”, “dialect”).

reach

Mean number of syllables reachable per starting node (rounded). This is the primary micro signal — the average effective vocabulary size at each step of a walk under this profile’s constraints.

total

Total syllables in the corpus (the “field” size).

threshold

Probability threshold used for the reachability test. A syllable is counted if its transition probability p from the starting node satisfies p > threshold.

max_flips

Profile’s max_flips parameter (edge existence constraint).

temperature

Profile’s temperature parameter (probability shape).

frequency_weight

Profile’s frequency_weight parameter (rarity bias).

computation_ms

Wall-clock time for this profile’s computation in milliseconds. Captured as metadata to monitor performance across different systems and corpus sizes.

unique_reachable

Total unique syllables reachable from at least one starting node (union across all nodes). This is supplementary context — the mean per-node count (reach) is the primary metric displayed in the UI.

reachable_indices

Tuple of (syllable_index, reachability_count) pairs for all syllables in the union reachable set, sorted by reachability count descending (most commonly reachable first). The count is how many starting nodes can reach that syllable. Maps to syllable text via walker.syllables[idx]. Omitted from to_dict() to keep API responses lean.
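As a rough illustration of the shape of reachable_indices, the sketch below builds the (syllable_index, reachability_count) pairs from made-up per-node reachable sets. The input data, the `syllables` list standing in for walker.syllables, and the tie-break by index are all assumptions for the example, not the module's confirmed behaviour.

```python
from collections import Counter

# Hypothetical per-node reachable sets (syllable indices), not real data.
reachable_from = {
    0: {1, 2},
    1: {2, 3},
    2: {1, 3},
    3: {2},
}

# Count how many starting nodes can reach each syllable.
counts = Counter(idx for targets in reachable_from.values() for idx in targets)

# Sort by count descending; ties broken by index (an assumption) for a stable order.
reachable_indices = tuple(
    sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
)

syllables = ["ka", "ri", "to", "mu"]  # stand-in for walker.syllables
for idx, count in reachable_indices:
    print(f"{syllables[idx]}: reachable from {count} node(s)")
```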

Example

>>> result = ReachResult(
...     profile_name="dialect",
...     reach=32,
...     total=2088,
...     threshold=0.001,
...     max_flips=2,
...     temperature=0.7,
...     frequency_weight=0.0,
...     computation_ms=42.5,
...     unique_reachable=1850,
... )
>>> result.reach
32
profile_name: str
reach: int
total: int
threshold: float
max_flips: int
temperature: float
frequency_weight: float
computation_ms: float
unique_reachable: int = 0
reachable_indices: tuple[tuple[int, int], Ellipsis] = ()
to_dict()

Serialise to a plain dictionary for API responses.

Returns:

Dictionary with all fields except reachable_indices, suitable for JSON serialisation.

Return type:

dict[str, Any]

Example

>>> result.to_dict()
{'profile_name': 'dialect', 'reach': 32, 'total': 2088, ...}
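One way to get this behaviour is sketched below. It assumes the result type is a frozen dataclass, which the documentation does not confirm; `ReachResultSketch` is a hypothetical stand-in that mirrors the documented fields and drops reachable_indices during serialisation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class ReachResultSketch:
    """Hypothetical stand-in mirroring the documented ReachResult fields."""

    profile_name: str
    reach: int
    total: int
    threshold: float
    max_flips: int
    temperature: float
    frequency_weight: float
    computation_ms: float
    unique_reachable: int = 0
    reachable_indices: tuple[tuple[int, int], ...] = ()

    def to_dict(self) -> dict[str, Any]:
        data = asdict(self)
        # Drop the potentially large index list to keep responses lean.
        data.pop("reachable_indices")
        return data


result = ReachResultSketch(
    profile_name="dialect", reach=32, total=2088, threshold=0.001,
    max_flips=2, temperature=0.7, frequency_weight=0.0,
    computation_ms=42.5, unique_reachable=1850,
    reachable_indices=((7, 12), (3, 9)),
)
payload = result.to_dict()
print(json.dumps(payload))
```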
build_tools.syllable_walk.reach.compute_reach(walker, profile_name, max_flips, temperature, frequency_weight, threshold=DEFAULT_REACH_THRESHOLD)

Compute mean effective vocabulary for a single profile.

For each syllable in the corpus, computes the softmax transition probability distribution over all neighbors within max_flips distance, using the profile’s temperature and frequency_weight. Counts how many neighbors exceed the probability threshold, then returns the mean of these per-node counts as the reach value.

This replicates the math of SyllableWalker.walk() (lines 526–549 of walker.py), but evaluates it exhaustively over all starting nodes rather than sampling a single stochastic path.

The computation is:
  1. For each starting syllable s:
       a. Collect all neighbors within max_flips Hamming distance.
       b. Compute the cost per neighbor: flip_cost + rarity_cost.
       c. Add the inertia option (staying at s) for normalisation.
       d. Apply softmax: weight_i = exp(-cost_i / temperature).
       e. Normalise the weights to probabilities.
       f. Count the other syllables (not s itself) with p > threshold.

    Inertia participates in normalisation but self-transitions do not count toward reach.

  2. Return the mean per-node count (rounded to nearest integer).
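The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the module's implementation: `count_reachable` and `mean_reach` are hypothetical helpers, and the per-neighbor costs are passed in directly rather than derived from the walker's _flip_cost() and _rarity_cost().

```python
import math


def count_reachable(neighbor_costs, inertia_cost, temperature, threshold=0.001):
    """Count neighbors whose transition probability exceeds threshold.

    neighbor_costs is a hypothetical list of (flip_cost + rarity_cost)
    values, one per neighbor within max_flips of the starting syllable.
    """
    # Steps c-d: the inertia option joins the softmax but is never counted.
    all_costs = list(neighbor_costs) + [inertia_cost]
    weights = [math.exp(-cost / temperature) for cost in all_costs]
    # Step e: normalise the weights to probabilities.
    normaliser = sum(weights)
    probs = [w / normaliser for w in weights]
    # Step f: count neighbors only (skip the trailing inertia entry).
    return sum(1 for p in probs[:-1] if p > threshold)


def mean_reach(per_node_costs, inertia_cost, temperature, threshold=0.001):
    """Step 2: mean of the per-node counts, rounded to an integer."""
    counts = [
        count_reachable(costs, inertia_cost, temperature, threshold)
        for costs in per_node_costs
    ]
    return round(sum(counts) / len(counts))


# Made-up costs for three starting nodes.
per_node_costs = [
    [1.0, 1.0, 10.0],
    [1.0, 10.0, 10.0],
    [1.0, 1.0, 1.0],
]
print(mean_reach(per_node_costs, inertia_cost=0.5, temperature=1.0))
```

Because the inertia weight sits in the normaliser, a cheap "stay put" option lowers every neighbor's probability and can push marginal neighbors below the threshold.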

Why mean-per-node instead of union?

The union approach (counting syllables reachable from any starting node) saturates to near-total for production corpora. With N=1,757 nodes and threshold=0.001, almost every syllable is reachable from at least one starting node, making reach ≈ total for any profile with max_flips ≥ 2. The mean-per-node count captures the effective vocabulary per step, which scales correctly with corpus size.

Parameters:
  • walker (build_tools.syllable_walk.walker.SyllableWalker) – Initialised SyllableWalker with pre-computed neighbor graph. Must have neighbor_graph, _flip_cost(), _rarity_cost(), _hamming_distance(), and inertia_cost available.

  • profile_name (str) – Human-readable name for the profile (e.g., “dialect”). Stored in the result for identification.

  • max_flips (int) – Maximum feature flips per step (1–3). Determines which edges in the neighbor graph are traversable.

  • temperature (float) – Softmax temperature (0.1–5.0). Controls the shape of the probability distribution. Low temperature concentrates probability on low-cost transitions; high temperature flattens the distribution toward uniform.

  • frequency_weight (float) – Frequency bias (-2.0 to 2.0). Positive values penalise rare syllables (favour common); negative values reward rare syllables (favour uncommon).

  • threshold (float) – Minimum transition probability for a syllable to be counted as “effectively reachable.” Default: 0.001.
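To make the effect of temperature concrete, the sketch below applies the softmax to one made-up set of neighbor costs at a low and a high temperature. The helper `transition_probs`, the cost values, and the inertia cost are assumptions chosen for illustration only.

```python
import math


def transition_probs(neighbor_costs, inertia_cost, temperature):
    """Softmax over neighbor costs plus an inertia option.

    Returns the probabilities of the neighbors only; the inertia
    option contributes to the normaliser but is not returned.
    """
    weights = [math.exp(-cost / temperature) for cost in neighbor_costs]
    inertia_weight = math.exp(-inertia_cost / temperature)
    normaliser = sum(weights) + inertia_weight
    return [w / normaliser for w in weights]


costs = [1.0, 3.0, 5.0]  # hypothetical flip_cost + rarity_cost per neighbor
inertia_cost = 0.5       # hypothetical value

for temperature in (0.3, 3.0):
    probs = transition_probs(costs, inertia_cost, temperature)
    reachable = sum(1 for p in probs if p > 0.001)
    print(f"T={temperature}: {reachable} of {len(costs)} neighbors above threshold")
```

With these numbers, only the cheapest neighbor clears the threshold at the low temperature, while all three clear it at the high temperature.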

Returns:

ReachResult with the mean per-node reach count, corpus total, unique reachable count (union), and metadata.

Raises:

ValueError – If walker has no syllables loaded.

Return type:

ReachResult

Example

>>> result = compute_reach(
...     walker, "dialect",
...     max_flips=2, temperature=0.7, frequency_weight=0.0,
... )
>>> print(f"Dialect reach: {result.reach} / {result.total}")
Dialect reach: 32 / 2088
build_tools.syllable_walk.reach.compute_all_reaches(walker, threshold=DEFAULT_REACH_THRESHOLD, progress_callback=None)

Compute mean effective vocabulary for all four named walk profiles.

Iterates over the predefined profiles (clerical, dialect, goblin, ritual) and computes the mean per-node thermodynamic reach for each. Returns a dictionary mapping profile names to their ReachResult.

This is intended to be called once after the walker finishes initialising, typically in the background thread that builds the neighbor graph. The results are cached in PatchState and served via the stats endpoint.

Parameters:
  • walker (build_tools.syllable_walk.walker.SyllableWalker) – Initialised SyllableWalker with pre-computed neighbor graph.

  • threshold (float) – Minimum transition probability for reachability. Default: 0.001. See DEFAULT_REACH_THRESHOLD for rationale.

  • progress_callback (Callable[[str], None] | None) – Optional callable invoked with a progress message after each profile is computed. Used by the web UI to show incremental reach results like "Computing reaches: clerical ~4, dialect ~32...".
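A caller might wire up progress_callback as sketched below. Since the real function needs a loaded walker, the example uses `stand_in_compute_all_reaches`, a hypothetical stub that only mimics the documented contract (one string message after each profile); the exact message format and the hard-coded reach values are assumptions taken from the examples in this page.

```python
from typing import Callable, Optional


def stand_in_compute_all_reaches(
    progress_callback: Optional[Callable[[str], None]] = None,
) -> dict[str, int]:
    """Hypothetical stub: reports progress after each profile, like the docs describe."""
    example_values = {"clerical": 4, "dialect": 32, "goblin": 58, "ritual": 147}
    done: dict[str, int] = {}
    for name, reach in example_values.items():
        done[name] = reach
        if progress_callback is not None:
            parts = ", ".join(f"{n} ~{r}" for n, r in done.items())
            progress_callback(f"Computing reaches: {parts}...")
    return done


# Any callable taking a single string works; here messages are collected in a list.
messages: list[str] = []
stand_in_compute_all_reaches(progress_callback=messages.append)
print(messages[-1])
```

With the real function, the equivalent call would pass the same callable as compute_all_reaches(walker, progress_callback=messages.append).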

Returns:

Dictionary mapping profile name to ReachResult. Keys: "clerical", "dialect", "goblin", "ritual".

Return type:

dict[str, ReachResult]

Example

>>> reaches = compute_all_reaches(walker)
>>> for name, r in reaches.items():
...     print(f"{name}: reach={r.reach}, time={r.computation_ms}ms")
clerical: reach=4, time=12.3ms
dialect: reach=32, time=15.1ms
goblin: reach=58, time=14.8ms
ritual: reach=147, time=18.2ms

Note

Custom profile reach is not computed here. See the TODO note in api/walker.py regarding on-demand computation for custom profiles.