build_tools.syllable_walk.reach
===============================

.. py:module:: build_tools.syllable_walk.reach

.. autoapi-nested-parse::

   Thermodynamic reach calculator for syllable walker profiles.

   Computes the **mean effective vocabulary** of each walk profile: the average
   number of syllables with non-negligible transition probability from any
   given starting position, under the profile's complete parameter set
   (max_flips, temperature, frequency_weight).

   This is a deterministic, seed-independent metric that reflects the
   thermodynamic structure of the profile's constraint regime, not stochastic
   walk behaviour.

   The algorithm replicates the walker's softmax transition math (see
   ``SyllableWalker.walk()``), but exhaustively over all starting nodes rather
   than sampling a single path. For each starting syllable, it computes the
   full probability distribution over candidate neighbors, then counts how many
   syllables exceed a probability threshold. The final reach value is the
   **mean** of these per-node counts across all starting positions.

   Why mean-per-node instead of union? An earlier implementation used the union
   of all reachable syllables across all starting nodes. This produced poor
   discrimination at production scale: with N=1,757 starting nodes and
   threshold=0.001, almost every syllable was reachable from *some* starting
   node, making reach ≈ total for any profile with max_flips ≥ 2. The
   mean-per-node approach captures the effective vocabulary *per step* of a
   walk, which scales correctly with corpus size and discriminates between
   profiles that differ only in temperature or frequency_weight.
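
   The per-node count and its contrast with the union can be illustrated on a
   toy graph. The following is a minimal sketch, not this module's
   implementation: it assumes each node's total transition costs (flip cost
   plus rarity cost) are already available in a hypothetical ``costs`` list of
   ``{neighbor_index: cost}`` dicts, and it adds an ``inertia_cost`` option for
   staying put that joins the softmax normalisation but is never counted. The
   function name ``reach_counts`` is likewise hypothetical.

   .. code-block:: python

      import math


      def reach_counts(costs, inertia_cost, temperature, threshold=0.001):
          """Return (mean per-node reach, size of union) for a cost table.

          ``costs[s]`` is a hypothetical ``{neighbor_index: total_cost}`` dict
          for starting node ``s``; it is not expected to contain ``s`` itself.
          """
          if not costs:
              raise ValueError("no syllables loaded")
          per_node = []
          union = set()
          for s, neighbor_costs in enumerate(costs):
              # Candidate options: every neighbor plus the inertia (stay) option.
              options = dict(neighbor_costs)
              options[s] = inertia_cost
              # Softmax weights exp(-cost / T). Subtracting the minimum cost
              # avoids underflow and leaves the probabilities unchanged.
              c_min = min(options.values())
              weights = {
                  j: math.exp(-(c - c_min) / temperature) for j, c in options.items()
              }
              z = sum(weights.values())
              # Count only *other* syllables whose probability beats the threshold.
              reached = {j for j, w in weights.items() if j != s and w / z > threshold}
              per_node.append(len(reached))
              union |= reached
          # Mean per-node count, rounded, alongside the saturating union size.
          return round(sum(per_node) / len(per_node)), len(union)

   On a four-node ring where every option costs 1.0, i.e.
   ``ring = [{1: 1.0, 3: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0, 3: 1.0}, {2: 1.0, 0: 1.0}]``,
   ``reach_counts(ring, inertia_cost=1.0, temperature=1.0)`` returns
   ``(2, 4)``: each node reaches only its two neighbors, yet the union already
   covers every syllable. This is the saturation problem in miniature.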

   Design reference: ``_working/syllable_walker_profile_field_micro_signal.md``

   Key properties:

   - Deterministic: same corpus + profile always produces the same reach
   - Seed-independent: no random sampling involved
   - Captures all three profile parameters (max_flips, temperature,
     frequency_weight)
   - Produces genuinely different values for all four named profiles
   - Scales correctly with corpus size (no saturation)
   - Computed once per corpus load, not per walk

   .. admonition:: Example

      >>> from build_tools.syllable_walk.reach import compute_all_reaches
      >>> from build_tools.syllable_walk import SyllableWalker
      >>> walker = SyllableWalker("data/annotated/syllables_annotated.json")
      >>> reaches = compute_all_reaches(walker)
      >>> for name, result in reaches.items():
      ...     print(f"{name}: reach {result.reach} / {result.total}")
      clerical: reach 4 / 2088
      dialect: reach 32 / 2088
      goblin: reach 58 / 2088
      ritual: reach 147 / 2088



Attributes
----------

.. autoapisummary::

   build_tools.syllable_walk.reach.DEFAULT_REACH_THRESHOLD


Classes
-------

.. autoapisummary::

   build_tools.syllable_walk.reach.ReachResult


Functions
---------

.. autoapisummary::

   build_tools.syllable_walk.reach.compute_reach
   build_tools.syllable_walk.reach.compute_all_reaches


Module Contents
---------------

.. py:data:: DEFAULT_REACH_THRESHOLD
   :type: float
   :value: 0.001


.. py:class:: ReachResult

   Result of a thermodynamic reach computation for a single profile.

   Encapsulates both the reach count and the full context of how it was
   computed, including the profile parameters and timing metadata.

   .. attribute:: profile_name

      Name of the profile (e.g., "clerical", "dialect").

   .. attribute:: reach

      Mean number of syllables reachable per starting node (rounded).
      This is the primary micro signal: the average effective vocabulary
      size at each step of a walk under this profile's constraints.

   .. attribute:: total

      Total syllables in the corpus (the "field" size).

   .. attribute:: threshold

      Probability threshold used for the reachability test. A syllable is
      counted if p > threshold from the starting node.

   .. attribute:: max_flips

      Profile's max_flips parameter (edge existence constraint).

   .. attribute:: temperature

      Profile's temperature parameter (probability shape).

   .. attribute:: frequency_weight

      Profile's frequency_weight parameter (rarity bias).

   .. attribute:: computation_ms

      Wall-clock time for this profile's computation in milliseconds.
      Captured as metadata to monitor performance across different systems
      and corpus sizes.

   .. attribute:: unique_reachable

      Total unique syllables reachable from at least one starting node
      (union across all nodes). This is supplementary context; the mean
      per-node count (``reach``) is the primary metric displayed in the UI.

   .. attribute:: reachable_indices

      Tuple of ``(syllable_index, reachability_count)`` pairs for all
      syllables in the union reachable set, sorted by reachability count
      descending (most commonly reachable first). The count is how many
      starting nodes can reach that syllable. Maps to syllable text via
      ``walker.syllables[idx]``. Omitted from ``to_dict()`` to keep API
      responses lean.

   .. admonition:: Example

      >>> result = ReachResult(
      ...     profile_name="dialect",
      ...     reach=32,
      ...     total=2088,
      ...     threshold=0.001,
      ...     max_flips=2,
      ...     temperature=0.7,
      ...     frequency_weight=0.0,
      ...     computation_ms=42.5,
      ...     unique_reachable=1850,
      ... )
      >>> result.reach
      32


   .. py:attribute:: profile_name
      :type: str


   .. py:attribute:: reach
      :type: int


   .. py:attribute:: total
      :type: int


   .. py:attribute:: threshold
      :type: float


   .. py:attribute:: max_flips
      :type: int


   .. py:attribute:: temperature
      :type: float


   .. py:attribute:: frequency_weight
      :type: float


   .. py:attribute:: computation_ms
      :type: float


   .. py:attribute:: unique_reachable
      :type: int
      :value: 0


   .. py:attribute:: reachable_indices
      :type: tuple[tuple[int, int], Ellipsis]
      :value: ()


   .. py:method:: to_dict()

      Serialise to a plain dictionary for API responses.
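
      The attribute notes state that ``reachable_indices`` is left out of the
      serialised form. A minimal sketch of that behaviour, assuming
      ``ReachResult`` is a frozen dataclass (its definition is not shown on
      this page), might look like the following; ``ReachResultSketch`` is a
      hypothetical stand-in that mirrors the documented fields.

      .. code-block:: python

         from dataclasses import asdict, dataclass


         @dataclass(frozen=True)
         class ReachResultSketch:
             profile_name: str
             reach: int
             total: int
             threshold: float
             max_flips: int
             temperature: float
             frequency_weight: float
             computation_ms: float
             unique_reachable: int = 0
             reachable_indices: tuple[tuple[int, int], ...] = ()

             def to_dict(self) -> dict:
                 # asdict() copies every field; drop the potentially large
                 # index list so API payloads stay small.
                 data = asdict(self)
                 del data["reachable_indices"]
                 return data

      Deriving the dictionary from ``dataclasses.asdict`` keeps the serialised
      keys in sync with the field declarations, so only the deliberate
      omission needs explicit code.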

      :returns: Dictionary with all fields except ``reachable_indices``,
                suitable for JSON serialisation.

      .. admonition:: Example

         >>> result.to_dict()
         {'profile_name': 'dialect', 'reach': 32, 'total': 2088, ...}



.. py:function:: compute_reach(walker, profile_name, max_flips, temperature, frequency_weight, threshold = DEFAULT_REACH_THRESHOLD)

   Compute mean effective vocabulary for a single profile.

   For each syllable in the corpus, computes the softmax transition
   probability distribution over all neighbors within ``max_flips``
   distance, using the profile's ``temperature`` and ``frequency_weight``.
   Counts how many neighbors exceed the probability threshold, then returns
   the **mean** of these per-node counts as the reach value.

   This replicates the same math as ``SyllableWalker.walk()`` (lines 526–549
   of walker.py), but exhaustively over all starting nodes rather than
   sampling a single stochastic path.

   The computation is:

   1. For each starting syllable *s*:

      a. Collect all neighbors within max_flips Hamming distance
      b. Compute cost per neighbor: flip_cost + rarity_cost
      c. Add inertia option (staying at *s*) for normalisation
      d. Apply softmax: weight_i = exp(-cost_i / temperature)
      e. Normalise to probabilities
      f. Count **other** syllables (not *s* itself) with p > threshold.
         Inertia participates in normalisation, but self-transitions do
         not count toward reach.

   2. Return the **mean** per-node count (rounded to nearest integer)

   Why mean-per-node instead of union? The union approach (counting
   syllables reachable from *any* starting node) saturates to near-total for
   production corpora. With N=1,757 nodes and threshold=0.001, almost every
   syllable is reachable from at least one starting node, making
   reach ≈ total for any profile with max_flips ≥ 2. The mean-per-node count
   captures the effective vocabulary *per step*, which scales correctly with
   corpus size.

   :param walker: Initialised SyllableWalker with pre-computed neighbor
                  graph. Must have ``neighbor_graph``, ``_flip_cost()``,
                  ``_rarity_cost()``, ``_hamming_distance()``, and
                  ``inertia_cost`` available.
   :param profile_name: Human-readable name for the profile (e.g.,
                        "dialect"). Stored in the result for identification.
   :param max_flips: Maximum feature flips per step (1–3). Determines which
                     edges in the neighbor graph are traversable.
   :param temperature: Softmax temperature (0.1–5.0). Controls the shape of
                       the probability distribution. Low temperature
                       concentrates probability on low-cost transitions;
                       high temperature flattens the distribution toward
                       uniform.
   :param frequency_weight: Frequency bias (-2.0 to 2.0). Positive values
                            penalise rare syllables (favour common); negative
                            values reward rare syllables (favour uncommon).
   :param threshold: Minimum transition probability for a syllable to be
                     counted as "effectively reachable." Default: 0.001.

   :returns: ReachResult with the mean per-node reach count, corpus total,
             unique reachable count (union), and metadata.

   :raises ValueError: If walker has no syllables loaded.

   .. admonition:: Example

      >>> result = compute_reach(
      ...     walker, "dialect",
      ...     max_flips=2, temperature=0.7, frequency_weight=0.0,
      ... )
      >>> print(f"Dialect reach: {result.reach} / {result.total}")
      Dialect reach: 32 / 2088


.. py:function:: compute_all_reaches(walker, threshold = DEFAULT_REACH_THRESHOLD, progress_callback = None)

   Compute mean effective vocabulary for all four named walk profiles.

   Iterates over the predefined profiles (clerical, dialect, goblin, ritual)
   and computes the mean per-node thermodynamic reach for each. Returns a
   dictionary mapping profile names to their ReachResult.

   This is intended to be called once after the walker finishes
   initialising, typically in the background thread that builds the
   neighbor graph. The results are cached in PatchState and served via the
   stats endpoint.

   :param walker: Initialised SyllableWalker with pre-computed neighbor
                  graph.
   :param threshold: Minimum transition probability for reachability.
                     Default: 0.001. See ``DEFAULT_REACH_THRESHOLD`` for
                     rationale.
   :param progress_callback: Optional callable invoked with a progress
                             message after each profile is computed. Used by
                             the web UI to show incremental reach results
                             like ``"Computing reaches: clerical ~4, dialect ~32..."``.

   :returns: Dictionary mapping profile name to ReachResult. Keys:
             ``"clerical"``, ``"dialect"``, ``"goblin"``, ``"ritual"``.

   .. admonition:: Example

      >>> reaches = compute_all_reaches(walker)
      >>> for name, r in reaches.items():
      ...     print(f"{name}: reach={r.reach}, time={r.computation_ms}ms")
      clerical: reach=4, time=12.3ms
      dialect: reach=32, time=15.1ms
      goblin: reach=58, time=14.8ms
      ritual: reach=147, time=18.2ms

   .. note::

      Custom profile reach is not computed here. See the TODO note in
      ``api/walker.py`` regarding on-demand computation for custom profiles.
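
   The incremental progress messages described for ``progress_callback`` can
   be sketched as a loop that reports the cumulative results after each
   profile. This is an illustration under assumptions: ``compute_all_sketch``
   and ``compute_one`` are hypothetical names, ``compute_one`` stands in for a
   ``compute_reach`` call that returns just the integer reach, the profile
   parameters are passed in because they are not listed on this page, and the
   message format is modeled on the example string in the parameter notes.

   .. code-block:: python

      from typing import Any, Callable, Dict, Mapping, Optional


      def compute_all_sketch(
          profiles: Mapping[str, Mapping[str, Any]],
          compute_one: Callable[..., int],
          progress_callback: Optional[Callable[[str], None]] = None,
      ) -> Dict[str, int]:
          """Run ``compute_one`` per profile, reporting cumulative progress."""
          results: Dict[str, int] = {}
          for name, params in profiles.items():
              # Hypothetical stand-in for compute_reach(walker, name, **params).
              results[name] = compute_one(name, **params)
              if progress_callback is not None:
                  # Report every reach computed so far, e.g. "clerical ~4, dialect ~32".
                  summary = ", ".join(f"{n} ~{r}" for n, r in results.items())
                  progress_callback(f"Computing reaches: {summary}...")
          return results

   With two hypothetical profiles ``{"a": {"max_flips": 1}, "b": {"max_flips": 2}}``
   and ``compute_one = lambda name, max_flips: max_flips * 10``, the callback
   receives ``"Computing reaches: a ~10..."`` followed by
   ``"Computing reaches: a ~10, b ~20..."``.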