build_tools.syllable_walk_tui.services.metrics

Corpus shape metrics computation.

This module provides dataclasses and pure functions for computing raw, objective metrics about corpus shape. These metrics characterize the statistical structure of a syllable corpus without interpretation.

Design Philosophy:

Raw numbers only, no interpretation or judgment
Pure functions (no side effects, no I/O)
All metrics are observable facts about the corpus
Users draw their own conclusions from the data

Metric Categories:

Inventory: What exists (counts, lengths)
Frequency: Weight distribution (how syllables are distributed)
Feature Saturation: Phonetic feature coverage (per-feature counts)

Usage:

>>> from build_tools.syllable_walk_tui.services.metrics import (
...     compute_corpus_shape_metrics
... )
>>> metrics = compute_corpus_shape_metrics(syllables, frequencies, annotated_data)
>>> print(f"Total syllables: {metrics.inventory.total_count}")
>>> print(f"Hapax count: {metrics.frequency.hapax_count}")

Attributes

FEATURE_NAMES

Classes

`InventoryMetrics`	Raw inventory metrics describing what exists in the corpus.
`FrequencyMetrics`	Raw frequency distribution metrics.
`FeatureSaturation`	Saturation metrics for a single phonetic feature.
`FeatureSaturationMetrics`	Feature saturation metrics for all 12 phonetic features.
`PoleExemplars`	Exemplar syllables from each pole of a terrain axis.
`TerrainMetrics`	Phonaesthetic terrain metrics describing corpus character.
`CorpusShapeMetrics`	Complete corpus shape metrics combining all categories.

Functions

`compute_inventory_metrics`(syllables)	Compute inventory metrics from a list of syllables.
`compute_frequency_metrics`(frequencies)	Compute frequency distribution metrics.
`compute_feature_saturation_metrics`(annotated_data)	Compute feature saturation metrics from annotated syllable data.
`score_syllable_on_axis`(features, axis_weights)	Compute axis score for a single syllable from its boolean features.
`sample_pole_exemplars`(annotated_data, axis_weights, ...)	Sample exemplar syllables from each pole of an axis.
`compute_terrain_metrics`(feature_saturation[, weights, ...])	Compute phonaesthetic terrain metrics from feature saturation.
`compute_corpus_shape_metrics`(syllables, frequencies, ...)	Compute complete corpus shape metrics.

Module Contents

class build_tools.syllable_walk_tui.services.metrics.InventoryMetrics[source]

Raw inventory metrics describing what exists in the corpus.

All metrics are objective counts and statistics about syllable inventory.

total_count: Total number of unique syllables

length_min: Minimum syllable length (characters)

length_max: Maximum syllable length (characters)

length_mean: Mean syllable length

length_median: Median syllable length

length_std: Standard deviation of syllable lengths

length_distribution: Count of syllables at each length {length: count}

total_count: int

length_min: int

length_max: int

length_mean: float

length_median: float

length_std: float

length_distribution: dict[int, int]

build_tools.syllable_walk_tui.services.metrics.compute_inventory_metrics(syllables)[source]

Compute inventory metrics from a list of syllables.

Parameters: syllables (collections.abc.Sequence[str]) – List of unique syllables
Returns: InventoryMetrics with all computed values
Raises: ValueError – If syllables list is empty
Return type: InventoryMetrics
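
The inventory fields listed above can be derived from syllable lengths alone. The following is a minimal sketch of that idea, not the module's actual implementation: `sketch_inventory_metrics` is a hypothetical stand-in that returns a plain dict instead of an InventoryMetrics instance, and the choice of population standard deviation is an assumption, since the reference does not say which variant is used.

```python
import statistics
from collections import Counter
from collections.abc import Sequence


def sketch_inventory_metrics(syllables: Sequence[str]) -> dict:
    """Hypothetical stand-in for compute_inventory_metrics (returns a dict)."""
    if not syllables:
        raise ValueError("syllables must not be empty")

    # Length is measured in characters, as the field descriptions state.
    lengths = [len(s) for s in syllables]
    return {
        "total_count": len(syllables),
        "length_min": min(lengths),
        "length_max": max(lengths),
        "length_mean": statistics.mean(lengths),
        "length_median": statistics.median(lengths),
        # Assumption: population std; the real module may use the sample std.
        "length_std": statistics.pstdev(lengths),
        # {length: count}, sorted by length for readability.
        "length_distribution": dict(sorted(Counter(lengths).items())),
    }


example_inventory = sketch_inventory_metrics(["ka", "tra", "el", "strom"])
print(example_inventory["length_distribution"])
```

Because the sketch touches nothing but its argument, it stays in line with the module's stated "pure functions" design.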

class build_tools.syllable_walk_tui.services.metrics.FrequencyMetrics[source]

Raw frequency distribution metrics.

Describes how syllable occurrences are distributed across the corpus.

total_occurrences: Sum of all frequency counts

freq_min: Minimum frequency value

freq_max: Maximum frequency value

freq_mean: Mean frequency

freq_median: Median frequency

freq_std: Standard deviation of frequencies

percentile_10: 10th percentile frequency

percentile_25: 25th percentile frequency (Q1)

percentile_50: 50th percentile frequency (median)

percentile_75: 75th percentile frequency (Q3)

percentile_90: 90th percentile frequency

percentile_99: 99th percentile frequency

unique_freq_count: Number of distinct frequency values

hapax_count: Count of syllables appearing exactly once

top_10: Top 10 syllables by frequency [(syllable, freq), …]

bottom_10: Bottom 10 syllables by frequency [(syllable, freq), …]

total_occurrences: int

freq_min: int

freq_max: int

freq_mean: float

freq_median: float

freq_std: float

percentile_10: int

percentile_25: int

percentile_50: int

percentile_75: int

percentile_90: int

percentile_99: int

unique_freq_count: int

hapax_count: int

top_10: tuple[tuple[str, int], ...] = ()

bottom_10: tuple[tuple[str, int], ...] = ()

build_tools.syllable_walk_tui.services.metrics.compute_frequency_metrics(frequencies)[source]

Compute frequency distribution metrics.

Parameters: frequencies (dict[str, int]) – Dictionary mapping syllable to frequency count
Returns: FrequencyMetrics with all computed values
Raises: ValueError – If frequencies dict is empty
Return type: FrequencyMetrics
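
As a rough illustration of how several of the frequency fields relate to the input dict, here is a hedged sketch. `sketch_frequency_metrics` is a hypothetical name and returns a plain dict. The nearest-rank percentile rule is an assumption (chosen because the percentile fields are typed as int); the module may compute percentiles differently. The tie-breaking order for `top_10` and `bottom_10` is likewise assumed.

```python
import math
import statistics


def sketch_frequency_metrics(frequencies: dict[str, int]) -> dict:
    """Hypothetical stand-in for compute_frequency_metrics (returns a dict)."""
    if not frequencies:
        raise ValueError("frequencies must not be empty")

    values = sorted(frequencies.values())

    def nearest_rank(p: int) -> int:
        # Assumption: nearest-rank percentile, which always returns an
        # observed frequency value (and therefore an int).
        rank = max(1, math.ceil(p / 100 * len(values)))
        return values[rank - 1]

    # Highest frequency first; ties broken alphabetically (assumed ordering).
    ranked = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "total_occurrences": sum(values),
        "freq_min": values[0],
        "freq_max": values[-1],
        "freq_mean": statistics.mean(values),
        "freq_median": statistics.median(values),
        "percentile_50": nearest_rank(50),
        "percentile_90": nearest_rank(90),
        "unique_freq_count": len(set(values)),
        # Hapax legomena: syllables that occur exactly once.
        "hapax_count": sum(1 for v in values if v == 1),
        "top_10": tuple(ranked[:10]),
        "bottom_10": tuple(ranked[::-1][:10]),
    }


example_frequency = sketch_frequency_metrics(
    {"ka": 5, "tra": 1, "el": 3, "strom": 1}
)
print(example_frequency["hapax_count"], example_frequency["top_10"][0])
```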

build_tools.syllable_walk_tui.services.metrics.FEATURE_NAMES: tuple[str, ...] = ('starts_with_vowel', 'starts_with_cluster', 'starts_with_heavy_cluster', 'contains_plosive',...

class build_tools.syllable_walk_tui.services.metrics.FeatureSaturation[source]

Saturation metrics for a single phonetic feature.

feature_name: Name of the feature

true_count: Number of syllables with feature = True

false_count: Number of syllables with feature = False

true_percentage: Percentage of corpus with feature = True

feature_name: str

true_count: int

false_count: int

true_percentage: float

class build_tools.syllable_walk_tui.services.metrics.FeatureSaturationMetrics[source]

Feature saturation metrics for all 12 phonetic features.

total_syllables: Total syllables analyzed

features: Tuple of FeatureSaturation for each feature (in canonical order)

by_name: Dict mapping feature name to FeatureSaturation (for lookup)

total_syllables: int

features: tuple[FeatureSaturation, ...] = ()

by_name: dict[str, FeatureSaturation]

build_tools.syllable_walk_tui.services.metrics.compute_feature_saturation_metrics(annotated_data)[source]

Compute feature saturation metrics from annotated syllable data.

Parameters: annotated_data (collections.abc.Sequence[dict]) – List of dicts with ‘syllable’, ‘frequency’, ‘features’ keys
Returns: FeatureSaturationMetrics with per-feature saturation counts
Raises: ValueError – If annotated_data is empty or malformed
Return type: FeatureSaturationMetrics
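
Per-feature saturation amounts to counting how many entries have each feature set. The sketch below shows one way this could look under stated assumptions: `sketch_feature_saturation` is a hypothetical helper returning nested dicts, the feature names are passed in explicitly rather than read from FEATURE_NAMES, a missing feature is treated as False (the real function may instead raise ValueError for malformed data), and percentages are on a 0-100 scale.

```python
from collections.abc import Sequence


def sketch_feature_saturation(
    annotated_data: Sequence[dict],
    feature_names: Sequence[str],
) -> dict[str, dict]:
    """Hypothetical stand-in for compute_feature_saturation_metrics."""
    if not annotated_data:
        raise ValueError("annotated_data must not be empty")

    total = len(annotated_data)
    by_name = {}
    for name in feature_names:
        # Assumption: a feature absent from an entry counts as False.
        true_count = sum(
            1 for entry in annotated_data if entry["features"].get(name, False)
        )
        by_name[name] = {
            "feature_name": name,
            "true_count": true_count,
            "false_count": total - true_count,
            # Assumption: percentage expressed on a 0-100 scale.
            "true_percentage": 100.0 * true_count / total,
        }
    return by_name


example_annotated = [
    {"syllable": "ka", "frequency": 5,
     "features": {"starts_with_vowel": False, "contains_plosive": True}},
    {"syllable": "tra", "frequency": 1,
     "features": {"starts_with_vowel": False, "contains_plosive": True}},
    {"syllable": "el", "frequency": 3,
     "features": {"starts_with_vowel": True, "contains_plosive": False}},
    {"syllable": "po", "frequency": 2,
     "features": {"starts_with_vowel": False, "contains_plosive": True}},
]
example_saturation = sketch_feature_saturation(
    example_annotated, ("starts_with_vowel", "contains_plosive")
)
print(example_saturation["starts_with_vowel"]["true_percentage"])
```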

class build_tools.syllable_walk_tui.services.metrics.PoleExemplars[source]

Exemplar syllables from each pole of a terrain axis.

These concrete examples help users understand what syllables represent each end of the phonaesthetic spectrum.

axis_name: Name of the axis (“shape”, “craft”, or “space”)

low_pole_exemplars: Syllables from the low pole (Round/Flowing/Open)

high_pole_exemplars: Syllables from the high pole (Jagged/Worked/Dense)

axis_name: str

low_pole_exemplars: tuple[str, ...]

high_pole_exemplars: tuple[str, ...]

class build_tools.syllable_walk_tui.services.metrics.TerrainMetrics[source]

Phonaesthetic terrain metrics describing corpus character.

Three axes derived from feature saturation percentages:

- Shape: Round (0.0) ↔ Jagged (1.0) - Bouba/Kiki dimension
- Craft: Flowing (0.0) ↔ Worked (1.0) - Sung/Forged dimension
- Space: Open (0.0) ↔ Dense (1.0) - Valley/Workshop dimension

Scores are normalized to the 0.0-1.0 range, where 0.5 is neutral.

shape_score: Position on Round↔Jagged axis (0.0-1.0)

craft_score: Position on Flowing↔Worked axis (0.0-1.0)

space_score: Position on Open↔Dense axis (0.0-1.0)

shape_label: Human-readable label for shape position

craft_label: Human-readable label for craft position

space_label: Human-readable label for space position

shape_exemplars: Optional exemplar syllables for shape axis

craft_exemplars: Optional exemplar syllables for craft axis

space_exemplars: Optional exemplar syllables for space axis

shape_score: float

craft_score: float

space_score: float

shape_label: str

craft_label: str

space_label: str

shape_exemplars: PoleExemplars | None = None

craft_exemplars: PoleExemplars | None = None

space_exemplars: PoleExemplars | None = None

build_tools.syllable_walk_tui.services.metrics.score_syllable_on_axis(features, axis_weights)[source]

Compute axis score for a single syllable from its boolean features.

Unlike _compute_axis_score(), which uses corpus percentages, this function uses binary features (0 or 1) to rank individual syllables.

Parameters:

features (dict[str, bool]) – Dictionary of feature_name -> boolean
axis_weights (build_tools.syllable_walk_tui.services.terrain_weights.AxisWeights) – AxisWeights containing feature-to-weight mappings

Returns:

Raw weighted sum (not normalized). Higher = more toward high pole.

Return type:

float
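
The "raw weighted sum" described above can be sketched as follows. This is an illustration only: `sketch_axis_score` is a hypothetical name, the weights are modelled as a plain `dict[str, float]` because the structure of AxisWeights is not shown in this reference, and the weight values are invented for the example.

```python
def sketch_axis_score(features: dict[str, bool], weights: dict[str, float]) -> float:
    """Hypothetical stand-in for score_syllable_on_axis.

    Adds the weight of every feature that is True; features that are
    False or missing contribute nothing. The result is not normalized.
    """
    return float(
        sum(weight for name, weight in weights.items() if features.get(name, False))
    )


# Invented weights: positive pushes toward the high pole, negative toward the low pole.
example_axis_weights = {"contains_plosive": 1.0, "starts_with_vowel": -0.5}

score_ka = sketch_axis_score(
    {"contains_plosive": True, "starts_with_vowel": False}, example_axis_weights
)
score_el = sketch_axis_score(
    {"contains_plosive": False, "starts_with_vowel": True}, example_axis_weights
)
print(score_ka, score_el)
```

Since the score is only used to rank syllables against each other, leaving it unnormalized is sufficient for that purpose.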

build_tools.syllable_walk_tui.services.metrics.sample_pole_exemplars(annotated_data, axis_weights, axis_name, n_exemplars=3, rng=None)[source]

Sample exemplar syllables from each pole of an axis.

Scores all syllables in the corpus and samples from the low and high tails to provide concrete examples of syllables at each pole.

Parameters:

annotated_data (collections.abc.Sequence[dict]) – List of {“syllable”: str, “features”: dict} entries
axis_weights (build_tools.syllable_walk_tui.services.terrain_weights.AxisWeights) – Weights for the axis
axis_name (str) – Name of axis (“shape”, “craft”, “space”)
n_exemplars (int) – Number of exemplars per pole (default 3)
rng (random.Random | None) – Optional RNG for shuffling within tails (isolated from generation)

Returns:

PoleExemplars with syllables from low and high poles

Return type:

PoleExemplars
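
One plausible reading of "samples from the low and high tails" is: rank all syllables by axis score, take a slice from each end, shuffle each slice with the supplied RNG, and keep the first few. The sketch below follows that reading. `sketch_pole_exemplars`, its `tail_fraction` parameter, and the dict-based weights are all assumptions made for illustration; the real function's tail size and sampling rule are not documented here.

```python
from __future__ import annotations

import random
from collections.abc import Sequence


def sketch_pole_exemplars(
    annotated_data: Sequence[dict],
    weights: dict[str, float],
    n_exemplars: int = 3,
    rng: random.Random | None = None,
    tail_fraction: float = 0.25,  # hypothetical parameter, not in the real API
) -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Return (low_pole_exemplars, high_pole_exemplars) for one axis."""
    if rng is None:
        # A dedicated RNG keeps exemplar sampling separate from other random use.
        rng = random.Random()

    def score(entry: dict) -> float:
        # Raw weighted sum over the features that are True.
        feats = entry["features"]
        return sum(w for name, w in weights.items() if feats.get(name, False))

    ranked = sorted(annotated_data, key=score)  # ascending: low pole first
    tail_size = max(n_exemplars, int(len(ranked) * tail_fraction))
    low_tail = [e["syllable"] for e in ranked[:tail_size]]
    high_tail = [e["syllable"] for e in ranked[-tail_size:]]

    # Shuffle within each tail so repeated calls can surface different examples.
    rng.shuffle(low_tail)
    rng.shuffle(high_tail)
    return tuple(low_tail[:n_exemplars]), tuple(high_tail[:n_exemplars])


def _entry(syllable: str, **features: bool) -> dict:
    return {"syllable": syllable, "features": features}


example_weights = {
    "contains_plosive": 1.0,
    "starts_with_cluster": 1.0,
    "starts_with_vowel": -1.0,
}
example_corpus = [
    _entry("a", starts_with_vowel=True),
    _entry("el", starts_with_vowel=True),
    _entry("mo"),
    _entry("na"),
    _entry("ka", contains_plosive=True),
    _entry("to", contains_plosive=True),
    _entry("tra", contains_plosive=True, starts_with_cluster=True),
    _entry("kri", contains_plosive=True, starts_with_cluster=True),
]
low, high = sketch_pole_exemplars(
    example_corpus, example_weights, n_exemplars=2, rng=random.Random(0)
)
print(low, high)
```

Passing a seeded `random.Random` makes the selection reproducible. Note that with a very small corpus the two tails in this sketch can overlap.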

build_tools.syllable_walk_tui.services.metrics.compute_terrain_metrics(feature_saturation, weights=None, annotated_data=None, exemplar_rng=None, n_exemplars=3)[source]

Compute phonaesthetic terrain metrics from feature saturation.

Derives three axis scores representing the corpus’s position in phonaesthetic space. These are descriptive, not prescriptive - they characterize the acoustic terrain without imposing meaning.

Parameters:

feature_saturation (FeatureSaturationMetrics) – Computed feature saturation metrics
weights (build_tools.syllable_walk_tui.services.terrain_weights.TerrainWeights | None) – Optional TerrainWeights configuration. If None, uses DEFAULT_TERRAIN_WEIGHTS from terrain_weights module. Custom weights allow calibration for different phonaesthetic models or user preferences.
annotated_data (collections.abc.Sequence[dict] | None) – Optional list of {“syllable”: str, “features”: dict} entries. If provided, pole exemplars will be computed.
exemplar_rng (random.Random | None) – Optional RNG for shuffling exemplars. Isolated from name generation to maintain determinism.
n_exemplars (int) – Number of exemplars per pole (default 3)

Returns:

TerrainMetrics with scores and labels for all three axes

Return type:

TerrainMetrics

Example

>>> terrain = compute_terrain_metrics(feature_saturation)
>>> print(f"Shape: {terrain.shape_score:.2f} ({terrain.shape_label})")
>>> print(f"Craft: {terrain.craft_score:.2f} ({terrain.craft_label})")

# With custom weights:
>>> from build_tools.syllable_walk_tui.services.terrain_weights import (
...     TerrainWeights, AxisWeights
... )
>>> custom = TerrainWeights(shape=AxisWeights({"contains_plosive": 1.5}))
>>> terrain = compute_terrain_metrics(feature_saturation, weights=custom)

# With exemplars:
>>> terrain = compute_terrain_metrics(
...     feature_saturation, annotated_data=corpus_data
... )
>>> print(terrain.shape_exemplars.low_pole_exemplars)

class build_tools.syllable_walk_tui.services.metrics.CorpusShapeMetrics[source]

Complete corpus shape metrics combining all categories.

This is the primary interface for corpus analysis. Contains all raw metrics needed to understand corpus structure.

inventory: Inventory metrics (counts, lengths)

frequency: Frequency distribution metrics

feature_saturation: Per-feature saturation metrics

terrain: Phonaesthetic terrain metrics (derived from features)

inventory: InventoryMetrics

frequency: FrequencyMetrics

feature_saturation: FeatureSaturationMetrics

terrain: TerrainMetrics

build_tools.syllable_walk_tui.services.metrics.compute_corpus_shape_metrics(syllables, frequencies, annotated_data)[source]

Compute complete corpus shape metrics.

This is the main entry point for corpus analysis. Computes all metric categories and returns a composite result.

Parameters:

syllables (collections.abc.Sequence[str]) – List of unique syllables
frequencies (dict[str, int]) – Dictionary mapping syllable to frequency count
annotated_data (collections.abc.Sequence[dict]) – List of annotated syllable dicts

Returns:

CorpusShapeMetrics containing all computed metrics

Raises:

ValueError – If any input is empty or malformed

Return type:

CorpusShapeMetrics

Example

>>> metrics = compute_corpus_shape_metrics(syllables, frequencies, annotated_data)
>>> print(f"Corpus has {metrics.inventory.total_count} syllables")
>>> print(f"Hapax legomena: {metrics.frequency.hapax_count}")
>>> vowel_pct = metrics.feature_saturation.by_name['starts_with_vowel'].true_percentage
>>> print(f"Starts with vowel: {vowel_pct:.1f}%")
>>> print(f"Terrain: {metrics.terrain.shape_label}")