build_tools.syllable_walk_tui.services.metrics
Corpus shape metrics computation.
This module provides dataclasses and pure functions for computing raw, objective metrics about corpus shape. These metrics characterize the statistical structure of a syllable corpus without interpretation.
- Design Philosophy:
Raw numbers only, no interpretation or judgment
Pure functions (no side effects, no I/O)
All metrics are observable facts about the corpus
Users draw their own conclusions from the data
- Metric Categories:
Inventory: What exists (counts, lengths)
Frequency: Weight distribution (how syllables are distributed)
Feature Saturation: Phonetic feature coverage (per-feature counts)
- Usage:
>>> from build_tools.syllable_walk_tui.services.metrics import (
...     compute_corpus_shape_metrics
... )
>>> metrics = compute_corpus_shape_metrics(syllables, frequencies, annotated_data)
>>> print(f"Total syllables: {metrics.inventory.total_count}")
>>> print(f"Hapax count: {metrics.frequency.hapax_count}")
Attributes
FEATURE_NAMES | Names of the phonetic features (tuple of str).
Classes
InventoryMetrics | Raw inventory metrics describing what exists in the corpus.
FrequencyMetrics | Raw frequency distribution metrics.
FeatureSaturation | Saturation metrics for a single phonetic feature.
FeatureSaturationMetrics | Feature saturation metrics for all 12 phonetic features.
PoleExemplars | Exemplar syllables from each pole of a terrain axis.
TerrainMetrics | Phonaesthetic terrain metrics describing corpus character.
CorpusShapeMetrics | Complete corpus shape metrics combining all categories.
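Functions
compute_inventory_metrics(syllables) | Compute inventory metrics from a list of syllables.
compute_frequency_metrics(frequencies) | Compute frequency distribution metrics.
compute_feature_saturation_metrics(annotated_data) | Compute feature saturation metrics from annotated syllable data.
score_syllable_on_axis(features, axis_weights) | Compute axis score for a single syllable from its boolean features.
sample_pole_exemplars(annotated_data, axis_weights, axis_name, n_exemplars=3, rng=None) | Sample exemplar syllables from each pole of an axis.
compute_terrain_metrics(feature_saturation, weights=None, annotated_data=None, exemplar_rng=None, n_exemplars=3) | Compute phonaesthetic terrain metrics from feature saturation.
compute_corpus_shape_metrics(syllables, frequencies, annotated_data) | Compute complete corpus shape metrics.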
Module Contents
- class build_tools.syllable_walk_tui.services.metrics.InventoryMetrics[source]
Raw inventory metrics describing what exists in the corpus.
All metrics are objective counts and statistics about syllable inventory.
- total_count
Total number of unique syllables
- length_min
Minimum syllable length (characters)
- length_max
Maximum syllable length (characters)
- length_mean
Mean syllable length
- length_median
Median syllable length
- length_std
Standard deviation of syllable lengths
- length_distribution
Count of syllables at each length {length: count}
- build_tools.syllable_walk_tui.services.metrics.compute_inventory_metrics(syllables)[source]
Compute inventory metrics from a list of syllables.
- Parameters:
syllables (collections.abc.Sequence[str]) – List of unique syllables
- Returns:
InventoryMetrics with all computed values
- Raises:
ValueError – If syllables list is empty
- Return type:
InventoryMetrics
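The fields listed for InventoryMetrics can be reproduced with the standard library. The following is a minimal sketch, not the module's implementation: `sketch_inventory` is a hypothetical stand-in that returns a plain dict, and the use of the population standard deviation is an assumption.

```python
import statistics
from collections import Counter
from collections.abc import Sequence


def sketch_inventory(syllables: Sequence[str]) -> dict:
    """Hypothetical stand-in for compute_inventory_metrics (returns a plain dict)."""
    if not syllables:
        raise ValueError("syllables must not be empty")
    # Syllable length is measured in characters, as in the field descriptions.
    lengths = [len(s) for s in syllables]
    return {
        "total_count": len(syllables),
        "length_min": min(lengths),
        "length_max": max(lengths),
        "length_mean": statistics.mean(lengths),
        "length_median": statistics.median(lengths),
        # Assumption: population standard deviation; the module may use the sample form.
        "length_std": statistics.pstdev(lengths),
        # {length: count}
        "length_distribution": dict(Counter(lengths)),
    }


inventory = sketch_inventory(["ka", "tra", "el", "strom"])
print(inventory["length_distribution"])
```

The empty-input check mirrors the documented ValueError.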
- class build_tools.syllable_walk_tui.services.metrics.FrequencyMetrics[source]
Raw frequency distribution metrics.
Describes how syllable occurrences are distributed across the corpus.
- total_occurrences
Sum of all frequency counts
- freq_min
Minimum frequency value
- freq_max
Maximum frequency value
- freq_mean
Mean frequency
- freq_median
Median frequency
- freq_std
Standard deviation of frequencies
- percentile_10
10th percentile frequency
- percentile_25
25th percentile frequency (Q1)
- percentile_50
50th percentile frequency (median)
- percentile_75
75th percentile frequency (Q3)
- percentile_90
90th percentile frequency
- percentile_99
99th percentile frequency
- unique_freq_count
Number of distinct frequency values
- hapax_count
Count of syllables appearing exactly once
- top_10
Top 10 syllables by frequency [(syllable, freq), …]
- bottom_10
Bottom 10 syllables by frequency [(syllable, freq), …]
- build_tools.syllable_walk_tui.services.metrics.compute_frequency_metrics(frequencies)[source]
Compute frequency distribution metrics.
- Parameters:
frequencies (dict[str, int]) – Dictionary mapping syllable to frequency count
- Returns:
FrequencyMetrics with all computed values
- Raises:
ValueError – If frequencies dict is empty
- Return type:
FrequencyMetrics
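A few of the FrequencyMetrics fields can be illustrated as follows. This is a sketch under stated assumptions: `sketch_frequency` and `nearest_rank_percentile` are hypothetical names, the nearest-rank percentile rule is chosen for simplicity (the module's interpolation method is not documented here), and the ordering of `bottom_10` is a guess.

```python
import math
import statistics


def nearest_rank_percentile(sorted_values: list[int], pct: float) -> int:
    """Nearest-rank percentile over an ascending list (assumed rule, not the module's)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def sketch_frequency(frequencies: dict[str, int]) -> dict:
    """Hypothetical stand-in covering a subset of the FrequencyMetrics fields."""
    if not frequencies:
        raise ValueError("frequencies must not be empty")
    values = sorted(frequencies.values())
    by_freq_desc = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    by_freq_asc = sorted(frequencies.items(), key=lambda item: item[1])
    return {
        "total_occurrences": sum(values),
        "freq_median": statistics.median(values),
        "percentile_90": nearest_rank_percentile(values, 90),
        "unique_freq_count": len(set(values)),
        # Hapax: syllables that occur exactly once.
        "hapax_count": sum(1 for v in values if v == 1),
        "top_10": by_freq_desc[:10],
        "bottom_10": by_freq_asc[:10],
    }


freq = sketch_frequency({"ka": 5, "tra": 1, "el": 3, "strom": 1})
print(freq["hapax_count"], freq["top_10"][0])
```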
- build_tools.syllable_walk_tui.services.metrics.FEATURE_NAMES: tuple[str, Ellipsis] = ('starts_with_vowel', 'starts_with_cluster', 'starts_with_heavy_cluster', 'contains_plosive',...
- class build_tools.syllable_walk_tui.services.metrics.FeatureSaturation[source]
Saturation metrics for a single phonetic feature.
- feature_name
Name of the feature
- true_count
Number of syllables with feature = True
- false_count
Number of syllables with feature = False
- true_percentage
Percentage of corpus with feature = True
- class build_tools.syllable_walk_tui.services.metrics.FeatureSaturationMetrics[source]
Feature saturation metrics for all 12 phonetic features.
- total_syllables
Total syllables analyzed
- features
Tuple of FeatureSaturation for each feature (in canonical order)
- by_name
Dict mapping feature name to FeatureSaturation (for lookup)
- features: tuple[FeatureSaturation, Ellipsis] = ()
- by_name: dict[str, FeatureSaturation]
- build_tools.syllable_walk_tui.services.metrics.compute_feature_saturation_metrics(annotated_data)[source]
Compute feature saturation metrics from annotated syllable data.
- Parameters:
annotated_data (collections.abc.Sequence[dict]) – List of dicts with ‘syllable’, ‘frequency’, ‘features’ keys
- Returns:
FeatureSaturationMetrics with per-feature saturation counts
- Raises:
ValueError – If annotated_data is empty or malformed
- Return type:
FeatureSaturationMetrics
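Per-feature saturation amounts to counting how many syllables carry each boolean feature. The sketch below assumes the entry shape described in the parameters, counts each syllable once (not weighted by frequency), and reports the percentage on a 0-100 scale; `sketch_saturation` is a hypothetical helper and only two feature names from FEATURE_NAMES are used.

```python
from collections.abc import Sequence


def sketch_saturation(
    annotated_data: Sequence[dict], feature_names: Sequence[str]
) -> dict[str, dict]:
    """Hypothetical stand-in: per-feature true/false counts and true percentage."""
    if not annotated_data:
        raise ValueError("annotated_data must not be empty")
    total = len(annotated_data)
    result = {}
    for name in feature_names:
        # Each syllable counts once, regardless of its frequency (assumption).
        true_count = sum(1 for entry in annotated_data if entry["features"][name])
        result[name] = {
            "true_count": true_count,
            "false_count": total - true_count,
            "true_percentage": 100.0 * true_count / total,
        }
    return result


sample_data = [
    {"syllable": "ka", "frequency": 5,
     "features": {"starts_with_vowel": False, "contains_plosive": True}},
    {"syllable": "tra", "frequency": 1,
     "features": {"starts_with_vowel": False, "contains_plosive": True}},
    {"syllable": "el", "frequency": 3,
     "features": {"starts_with_vowel": True, "contains_plosive": False}},
    {"syllable": "strom", "frequency": 1,
     "features": {"starts_with_vowel": False, "contains_plosive": True}},
]
saturation = sketch_saturation(sample_data, ["starts_with_vowel", "contains_plosive"])
print(saturation["contains_plosive"]["true_percentage"])
```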
- class build_tools.syllable_walk_tui.services.metrics.PoleExemplars[source]
Exemplar syllables from each pole of a terrain axis.
These concrete examples help users understand what syllables represent each end of the phonaesthetic spectrum.
- axis_name
Name of the axis (“shape”, “craft”, or “space”)
- low_pole_exemplars
Syllables from the low pole (Round/Flowing/Open)
- high_pole_exemplars
Syllables from the high pole (Jagged/Worked/Dense)
- class build_tools.syllable_walk_tui.services.metrics.TerrainMetrics[source]
Phonaesthetic terrain metrics describing corpus character.
Three axes derived from feature saturation percentages:
- Shape: Round (0.0) ↔ Jagged (1.0), the Bouba/Kiki dimension
- Craft: Flowing (0.0) ↔ Worked (1.0), the Sung/Forged dimension
- Space: Open (0.0) ↔ Dense (1.0), the Valley/Workshop dimension
Scores are normalized to the 0.0-1.0 range, where 0.5 is neutral.
- shape_score
Position on Round↔Jagged axis (0.0-1.0)
- craft_score
Position on Flowing↔Worked axis (0.0-1.0)
- space_score
Position on Open↔Dense axis (0.0-1.0)
- shape_label
Human-readable label for shape position
- craft_label
Human-readable label for craft position
- space_label
Human-readable label for space position
- shape_exemplars
Optional exemplar syllables for shape axis
- craft_exemplars
Optional exemplar syllables for craft axis
- space_exemplars
Optional exemplar syllables for space axis
- shape_exemplars: PoleExemplars | None = None
- craft_exemplars: PoleExemplars | None = None
- space_exemplars: PoleExemplars | None = None
- build_tools.syllable_walk_tui.services.metrics.score_syllable_on_axis(features, axis_weights)[source]
Compute axis score for a single syllable from its boolean features.
Unlike _compute_axis_score(), which uses corpus-level percentages, this function uses binary feature values (0 or 1) to rank individual syllables.
- Parameters:
features (dict[str, bool]) – Dictionary of feature_name -> boolean
axis_weights (build_tools.syllable_walk_tui.services.terrain_weights.AxisWeights) – AxisWeights containing feature-to-weight mappings
- Returns:
Raw weighted sum (not normalized). Higher = more toward high pole.
- Return type:
float
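The raw weighted sum described above can be sketched with a plain dict of weights. The real function takes an AxisWeights object whose structure is not shown here, so the dict, the helper name `sketch_axis_score`, the example weights, and the choice to treat missing features as False are all assumptions.

```python
def sketch_axis_score(features: dict[str, bool], weights: dict[str, float]) -> float:
    """Hypothetical stand-in: sum the weights of the features that are True."""
    # Features absent from the syllable are treated as False (assumption).
    return sum((w for name, w in weights.items() if features.get(name, False)), 0.0)


# Illustrative weights only; positive pulls toward the high pole, negative toward the low pole.
example_weights = {"contains_plosive": 1.0, "starts_with_vowel": -0.5}

score_ka = sketch_axis_score(
    {"contains_plosive": True, "starts_with_vowel": False}, example_weights
)
score_el = sketch_axis_score(
    {"contains_plosive": False, "starts_with_vowel": True}, example_weights
)
print(score_ka, score_el)
```

Because the result is not normalized, it is only meaningful for ordering syllables against each other on the same axis.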
- build_tools.syllable_walk_tui.services.metrics.sample_pole_exemplars(annotated_data, axis_weights, axis_name, n_exemplars=3, rng=None)[source]
Sample exemplar syllables from each pole of an axis.
Scores all syllables in the corpus and samples from the low and high tails to provide concrete examples of syllables at each pole.
- Parameters:
annotated_data (collections.abc.Sequence[dict]) – List of {"syllable": str, "features": dict} entries
axis_weights (build_tools.syllable_walk_tui.services.terrain_weights.AxisWeights) – Weights for the axis
axis_name (str) – Name of axis (“shape”, “craft”, “space”)
n_exemplars (int) – Number of exemplars per pole (default 3)
rng (random.Random | None) – Optional RNG for shuffling within tails (isolated from generation)
- Returns:
PoleExemplars with syllables from low and high poles
- Return type:
PoleExemplars
- build_tools.syllable_walk_tui.services.metrics.compute_terrain_metrics(feature_saturation, weights=None, annotated_data=None, exemplar_rng=None, n_exemplars=3)[source]
Compute phonaesthetic terrain metrics from feature saturation.
Derives three axis scores representing the corpus’s position in phonaesthetic space. These scores are descriptive, not prescriptive: they characterize the acoustic terrain without imposing meaning.
- Parameters:
feature_saturation (FeatureSaturationMetrics) – Computed feature saturation metrics
weights (build_tools.syllable_walk_tui.services.terrain_weights.TerrainWeights | None) – Optional TerrainWeights configuration. If None, uses DEFAULT_TERRAIN_WEIGHTS from terrain_weights module. Custom weights allow calibration for different phonaesthetic models or user preferences.
annotated_data (collections.abc.Sequence[dict] | None) – Optional list of {"syllable": str, "features": dict} entries. If provided, pole exemplars will be computed.
exemplar_rng (random.Random | None) – Optional RNG for shuffling exemplars. Isolated from name generation to maintain determinism.
n_exemplars (int) – Number of exemplars per pole (default 3)
- Returns:
TerrainMetrics with scores and labels for all three axes
- Return type:
TerrainMetrics
Example
>>> terrain = compute_terrain_metrics(feature_saturation)
>>> print(f"Shape: {terrain.shape_score:.2f} ({terrain.shape_label})")
>>> print(f"Craft: {terrain.craft_score:.2f} ({terrain.craft_label})")

# With custom weights:
>>> from build_tools.syllable_walk_tui.services.terrain_weights import (
...     TerrainWeights, AxisWeights
... )
>>> custom = TerrainWeights(shape=AxisWeights({"contains_plosive": 1.5}))
>>> terrain = compute_terrain_metrics(feature_saturation, weights=custom)

# With exemplars:
>>> terrain = compute_terrain_metrics(
...     feature_saturation, annotated_data=corpus_data
... )
>>> print(terrain.shape_exemplars.low_pole_exemplars)
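The exact formula that maps saturation percentages to an axis score is not given on this page. Purely as an illustration of "normalized to 0.0-1.0 with 0.5 neutral", the sketch below centres each percentage at 50%, applies a weight, and clamps the result; the formula, the helper name `sketch_corpus_axis_score`, and the weights are all assumptions and may differ from what the module does.

```python
def sketch_corpus_axis_score(
    true_percentages: dict[str, float], weights: dict[str, float]
) -> float:
    """Hypothetical mapping from per-feature percentages (0-100) to a 0.0-1.0 score."""
    # A feature at exactly 50% saturation contributes nothing (assumed neutral point).
    raw = sum(
        w * (true_percentages.get(name, 50.0) / 100.0 - 0.5)
        for name, w in weights.items()
    )
    # Centre at 0.5 and clamp to the documented 0.0-1.0 range.
    return min(1.0, max(0.0, 0.5 + raw))


illustrative_weights = {"contains_plosive": 1.0, "starts_with_vowel": -0.5}
plosive_heavy = sketch_corpus_axis_score(
    {"contains_plosive": 75.0, "starts_with_vowel": 25.0}, illustrative_weights
)
balanced = sketch_corpus_axis_score(
    {"contains_plosive": 50.0, "starts_with_vowel": 50.0}, illustrative_weights
)
print(plosive_heavy, balanced)
```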
- class build_tools.syllable_walk_tui.services.metrics.CorpusShapeMetrics[source]
Complete corpus shape metrics combining all categories.
This is the primary interface for corpus analysis. Contains all raw metrics needed to understand corpus structure.
- inventory
Inventory metrics (counts, lengths)
- frequency
Frequency distribution metrics
- feature_saturation
Per-feature saturation metrics
- terrain
Phonaesthetic terrain metrics (derived from features)
- inventory: InventoryMetrics
- frequency: FrequencyMetrics
- feature_saturation: FeatureSaturationMetrics
- terrain: TerrainMetrics
- build_tools.syllable_walk_tui.services.metrics.compute_corpus_shape_metrics(syllables, frequencies, annotated_data)[source]
Compute complete corpus shape metrics.
This is the main entry point for corpus analysis. Computes all metric categories and returns a composite result.
- Parameters:
syllables (collections.abc.Sequence[str]) – List of unique syllables
frequencies (dict[str, int]) – Dictionary mapping syllable to frequency count
annotated_data (collections.abc.Sequence[dict]) – List of annotated syllable dicts
- Returns:
CorpusShapeMetrics containing all computed metrics
- Raises:
ValueError – If any input is empty or malformed
- Return type:
CorpusShapeMetrics
Example
>>> metrics = compute_corpus_shape_metrics(syllables, frequencies, annotated_data)
>>> print(f"Corpus has {metrics.inventory.total_count} syllables")
>>> print(f"Hapax legomena: {metrics.frequency.hapax_count}")
>>> vowel_pct = metrics.feature_saturation.by_name['starts_with_vowel'].true_percentage
>>> print(f"Starts with vowel: {vowel_pct:.1f}%")
>>> print(f"Terrain: {metrics.terrain.shape_label}")
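The three inputs have the shapes given in the parameter list. As a rough sketch of how they might be assembled from a raw token stream, the snippet below builds them with the standard library. The token list is made up, and the single hand-computed feature is only illustrative: real annotated entries would presumably carry every name in FEATURE_NAMES, produced by the project's own annotator.

```python
from collections import Counter

# Made-up token stream standing in for a syllabified corpus.
tokens = ["ka", "el", "ka", "tra", "ka", "el"]

# frequencies: dict mapping syllable -> occurrence count
frequencies = dict(Counter(tokens))

# syllables: the unique syllables (sorted here only for reproducibility)
syllables = sorted(frequencies)

# annotated_data: one dict per syllable with 'syllable', 'frequency', 'features' keys
annotated_data = [
    {
        "syllable": s,
        "frequency": frequencies[s],
        # Illustrative feature only; the real feature set has 12 entries.
        "features": {"starts_with_vowel": s[0] in "aeiou"},
    }
    for s in syllables
]

print(syllables, frequencies)
```

With inputs shaped like these, the call shown in the example above would receive one entry per unique syllable in each argument.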