build_tools.syllable_walk_tui.services.metrics

Corpus shape metrics computation.

This module provides dataclasses and pure functions for computing raw, objective metrics about corpus shape. These metrics characterize the statistical structure of a syllable corpus without interpretation.

Design Philosophy:
  • Raw numbers only, no interpretation or judgment

  • Pure functions (no side effects, no I/O)

  • All metrics are observable facts about the corpus

  • Users draw their own conclusions from the data

Metric Categories:
  • Inventory: What exists (counts, lengths)

  • Frequency: Weight distribution (how syllables are distributed)

  • Feature Saturation: Phonetic feature coverage (per-feature counts)

Usage:
>>> from build_tools.syllable_walk_tui.services.metrics import (
...     compute_corpus_shape_metrics
... )
>>> metrics = compute_corpus_shape_metrics(syllables, frequencies, annotated_data)
>>> print(f"Total syllables: {metrics.inventory.total_count}")
>>> print(f"Hapax count: {metrics.frequency.hapax_count}")

Attributes

FEATURE_NAMES

Classes

InventoryMetrics

Raw inventory metrics describing what exists in the corpus.

FrequencyMetrics

Raw frequency distribution metrics.

FeatureSaturation

Saturation metrics for a single phonetic feature.

FeatureSaturationMetrics

Feature saturation metrics for all 12 phonetic features.

PoleExemplars

Exemplar syllables from each pole of a terrain axis.

TerrainMetrics

Phonaesthetic terrain metrics describing corpus character.

CorpusShapeMetrics

Complete corpus shape metrics combining all categories.

Functions

compute_inventory_metrics(syllables)

Compute inventory metrics from a list of syllables.

compute_frequency_metrics(frequencies)

Compute frequency distribution metrics.

compute_feature_saturation_metrics(annotated_data)

Compute feature saturation metrics from annotated syllable data.

score_syllable_on_axis(features, axis_weights)

Compute axis score for a single syllable from its boolean features.

sample_pole_exemplars(annotated_data, axis_weights, ...)

Sample exemplar syllables from each pole of an axis.

compute_terrain_metrics(feature_saturation[, weights, ...])

Compute phonaesthetic terrain metrics from feature saturation.

compute_corpus_shape_metrics(syllables, frequencies, ...)

Compute complete corpus shape metrics.

Module Contents

class build_tools.syllable_walk_tui.services.metrics.InventoryMetrics

Raw inventory metrics describing what exists in the corpus.

All metrics are objective counts and statistics about syllable inventory.

total_count

Total number of unique syllables

length_min

Minimum syllable length (characters)

length_max

Maximum syllable length (characters)

length_mean

Mean syllable length

length_median

Median syllable length

length_std

Standard deviation of syllable lengths

length_distribution

Count of syllables at each length {length: count}

total_count: int
length_min: int
length_max: int
length_mean: float
length_median: float
length_std: float
length_distribution: dict[int, int]

build_tools.syllable_walk_tui.services.metrics.compute_inventory_metrics(syllables)

Compute inventory metrics from a list of syllables.

Parameters:

syllables (collections.abc.Sequence[str]) – List of unique syllables

Returns:

InventoryMetrics with all computed values

Raises:

ValueError – If syllables list is empty

Return type:

InventoryMetrics
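The statistics above can be reproduced with a short standard-library sketch, which may help when checking what each field measures. `InventorySketch` and `inventory_sketch` are hypothetical names that are not part of this module. Using the population standard deviation (`statistics.pstdev`) is an assumption; the module may use the sample form instead.

```python
import statistics
from collections import Counter
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class InventorySketch:
    # Hypothetical stand-in mirroring the InventoryMetrics fields.
    total_count: int
    length_min: int
    length_max: int
    length_mean: float
    length_median: float
    length_std: float
    length_distribution: dict[int, int]


def inventory_sketch(syllables: Sequence[str]) -> InventorySketch:
    """Compute length statistics over a list of unique syllables."""
    if not syllables:
        raise ValueError("syllables list is empty")
    lengths = [len(s) for s in syllables]
    return InventorySketch(
        total_count=len(syllables),  # input is assumed to be already deduplicated
        length_min=min(lengths),
        length_max=max(lengths),
        length_mean=float(statistics.mean(lengths)),
        length_median=float(statistics.median(lengths)),
        length_std=statistics.pstdev(lengths),  # population std dev (assumption)
        # {length: count}, ordered by length for readability.
        length_distribution=dict(sorted(Counter(lengths).items())),
    )
```

For example, `inventory_sketch(["ka", "ri", "ston"])` reports three syllables with lengths from 2 to 4 and a distribution of `{2: 2, 4: 1}`.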

class build_tools.syllable_walk_tui.services.metrics.FrequencyMetrics

Raw frequency distribution metrics.

Describes how syllable occurrences are distributed across the corpus.

total_occurrences

Sum of all frequency counts

freq_min

Minimum frequency value

freq_max

Maximum frequency value

freq_mean

Mean frequency

freq_median

Median frequency

freq_std

Standard deviation of frequencies

percentile_10

10th percentile frequency

percentile_25

25th percentile frequency (Q1)

percentile_50

50th percentile frequency (median)

percentile_75

75th percentile frequency (Q3)

percentile_90

90th percentile frequency

percentile_99

99th percentile frequency

unique_freq_count

Number of distinct frequency values

hapax_count

Count of syllables appearing exactly once

top_10

Top 10 syllables by frequency [(syllable, freq), …]

bottom_10

Bottom 10 syllables by frequency [(syllable, freq), …]

total_occurrences: int
freq_min: int
freq_max: int
freq_mean: float
freq_median: float
freq_std: float
percentile_10: int
percentile_25: int
percentile_50: int
percentile_75: int
percentile_90: int
percentile_99: int
unique_freq_count: int
hapax_count: int
top_10: tuple[tuple[str, int], ...] = ()
bottom_10: tuple[tuple[str, int], ...] = ()

build_tools.syllable_walk_tui.services.metrics.compute_frequency_metrics(frequencies)

Compute frequency distribution metrics.

Parameters:

frequencies (dict[str, int]) – Dictionary mapping syllable to frequency count

Returns:

FrequencyMetrics with all computed values

Raises:

ValueError – If frequencies dict is empty

Return type:

FrequencyMetrics
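The frequency fields can also be derived with a short standard-library sketch. It returns a plain dict keyed by the `FrequencyMetrics` field names rather than the dataclass itself. Two details are assumptions and not documented behaviour. First, percentiles use the nearest-rank method, which keeps them integers to match the `int` annotations. Second, ties in `top_10` and `bottom_10` are broken alphabetically. `nearest_rank` and `frequency_sketch` are hypothetical helpers.

```python
import statistics


def nearest_rank(sorted_values: list[int], p: int) -> int:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    n = len(sorted_values)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (p * n + 99) // 100)
    return sorted_values[rank - 1]


def frequency_sketch(frequencies: dict[str, int]) -> dict[str, object]:
    """Compute FrequencyMetrics-style fields as a plain dict."""
    if not frequencies:
        raise ValueError("frequencies dict is empty")
    values = sorted(frequencies.values())
    result: dict[str, object] = {
        "total_occurrences": sum(values),
        "freq_min": values[0],
        "freq_max": values[-1],
        "freq_mean": float(statistics.mean(values)),
        "freq_median": float(statistics.median(values)),
        "freq_std": statistics.pstdev(values),  # population std dev (assumption)
        "unique_freq_count": len(set(values)),
        "hapax_count": sum(1 for v in values if v == 1),  # syllables seen exactly once
    }
    for p in (10, 25, 50, 75, 90, 99):
        result[f"percentile_{p}"] = nearest_rank(values, p)
    # Most frequent first; ties broken alphabetically (assumption).
    result["top_10"] = tuple(sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))[:10])
    # Least frequent first; ties broken alphabetically (assumption).
    result["bottom_10"] = tuple(sorted(frequencies.items(), key=lambda kv: (kv[1], kv[0]))[:10])
    return result
```

With `{"ka": 5, "ri": 1, "to": 1, "na": 3}`, for instance, the sketch reports 10 total occurrences, a hapax count of 2, and `("ka", 5)` as the top entry.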

build_tools.syllable_walk_tui.services.metrics.FEATURE_NAMES: tuple[str, ...] = ('starts_with_vowel', 'starts_with_cluster', 'starts_with_heavy_cluster', 'contains_plosive',...

class build_tools.syllable_walk_tui.services.metrics.FeatureSaturation

Saturation metrics for a single phonetic feature.

feature_name

Name of the feature

true_count

Number of syllables with feature = True

false_count

Number of syllables with feature = False

true_percentage

Percentage of corpus with feature = True

feature_name: str
true_count: int
false_count: int
true_percentage: float

class build_tools.syllable_walk_tui.services.metrics.FeatureSaturationMetrics

Feature saturation metrics for all 12 phonetic features.

total_syllables

Total syllables analyzed

features

Tuple of FeatureSaturation for each feature (in canonical order)

by_name

Dict mapping feature name to FeatureSaturation (for lookup)

total_syllables: int
features: tuple[FeatureSaturation, ...] = ()
by_name: dict[str, FeatureSaturation]

build_tools.syllable_walk_tui.services.metrics.compute_feature_saturation_metrics(annotated_data)

Compute feature saturation metrics from annotated syllable data.

Parameters:

annotated_data (collections.abc.Sequence[dict]) – List of dicts with ‘syllable’, ‘frequency’, ‘features’ keys

Returns:

FeatureSaturationMetrics with per-feature saturation counts

Raises:

ValueError – If annotated_data is empty or malformed

Return type:

FeatureSaturationMetrics
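Saturation is a per-feature count over the annotated records. The sketch below shows the idea, assuming each entry's `features` value is a dict of feature name to boolean, as described above. `DEMO_FEATURES` is an illustrative three-item subset of the visible `FEATURE_NAMES` entries, not the full list. `saturation_sketch` is a hypothetical helper that returns `(true_count, false_count, true_percentage)` tuples instead of `FeatureSaturation` objects.

```python
from collections.abc import Sequence

# Illustrative subset only; the module's FEATURE_NAMES lists all 12 features.
DEMO_FEATURES = ("starts_with_vowel", "starts_with_cluster", "contains_plosive")


def saturation_sketch(
    annotated_data: Sequence[dict],
    feature_names: Sequence[str] = DEMO_FEATURES,
) -> dict[str, tuple[int, int, float]]:
    """Map each feature to (true_count, false_count, true_percentage)."""
    if not annotated_data:
        raise ValueError("annotated_data is empty")
    total = len(annotated_data)
    result: dict[str, tuple[int, int, float]] = {}
    for name in feature_names:
        try:
            true_count = sum(1 for entry in annotated_data if entry["features"][name])
        except (KeyError, TypeError) as exc:
            # A missing 'features' dict or feature flag is treated as malformed input.
            raise ValueError(f"malformed entry while reading {name!r}") from exc
        result[name] = (true_count, total - true_count, 100.0 * true_count / total)
    return result
```

Because every syllable is either True or False for a feature, `true_count + false_count` always equals the number of records analyzed.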

class build_tools.syllable_walk_tui.services.metrics.PoleExemplars

Exemplar syllables from each pole of a terrain axis.

These concrete examples help users understand what syllables represent each end of the phonaesthetic spectrum.

axis_name

Name of the axis (“shape”, “craft”, or “space”)

low_pole_exemplars

Syllables from the low pole (Round/Flowing/Open)

high_pole_exemplars

Syllables from the high pole (Jagged/Worked/Dense)

axis_name: str
low_pole_exemplars: tuple[str, ...]
high_pole_exemplars: tuple[str, ...]

class build_tools.syllable_walk_tui.services.metrics.TerrainMetrics

Phonaesthetic terrain metrics describing corpus character.

Three axes derived from feature saturation percentages:

  • Shape: Round (0.0) ↔ Jagged (1.0), the Bouba/Kiki dimension

  • Craft: Flowing (0.0) ↔ Worked (1.0), the Sung/Forged dimension

  • Space: Open (0.0) ↔ Dense (1.0), the Valley/Workshop dimension

Scores are normalized to the 0.0-1.0 range, where 0.5 is neutral.

shape_score

Position on Round↔Jagged axis (0.0-1.0)

craft_score

Position on Flowing↔Worked axis (0.0-1.0)

space_score

Position on Open↔Dense axis (0.0-1.0)

shape_label

Human-readable label for shape position

craft_label

Human-readable label for craft position

space_label

Human-readable label for space position

shape_exemplars

Optional exemplar syllables for shape axis

craft_exemplars

Optional exemplar syllables for craft axis

space_exemplars

Optional exemplar syllables for space axis

shape_score: float
craft_score: float
space_score: float
shape_label: str
craft_label: str
space_label: str
shape_exemplars: PoleExemplars | None = None
craft_exemplars: PoleExemplars | None = None
space_exemplars: PoleExemplars | None = None

build_tools.syllable_walk_tui.services.metrics.score_syllable_on_axis(features, axis_weights)

Compute axis score for a single syllable from its boolean features.

Unlike _compute_axis_score(), which uses corpus percentages, this function uses binary features (0 or 1) to rank individual syllables.

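Parameters:
  • features – Boolean feature values for a single syllable, keyed by feature name

  • axis_weights – Per-feature weights for the axis being scored
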
Returns:

Raw weighted sum (not normalized). Higher = more toward high pole.

Return type:

float
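A raw weighted sum over binary features is simple to state in code. This sketch assumes `axis_weights` behaves like a mapping from feature name to a numeric weight. The `AxisWeights({"contains_plosive": 1.5})` call in the `compute_terrain_metrics` example suggests this, but the exact type is not documented here. It also assumes that features missing from the syllable's dict count as False. `score_sketch` is a hypothetical name.

```python
from collections.abc import Mapping


def score_sketch(features: Mapping[str, bool], axis_weights: Mapping[str, float]) -> float:
    """Raw (unnormalized) weighted sum of a syllable's binary features."""
    # A True feature contributes its full weight (1 * w); a False one contributes 0.
    # Features missing from `features` are treated as False (assumption).
    return float(sum(w for name, w in axis_weights.items() if features.get(name, False)))
```

If negative weights are allowed, they pull a syllable toward the low pole, and positive weights push it toward the high pole.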

build_tools.syllable_walk_tui.services.metrics.sample_pole_exemplars(annotated_data, axis_weights, axis_name, n_exemplars=3, rng=None)

Sample exemplar syllables from each pole of an axis.

Scores all syllables in the corpus and samples from the low and high tails to provide concrete examples of syllables at each pole.

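Parameters:
  • annotated_data (collections.abc.Sequence[dict]) – Syllable records with "syllable" and "features" entries

  • axis_weights – Per-feature weights for the axis being sampled

  • axis_name (str) – Name of the axis ("shape", "craft", or "space")

  • n_exemplars (int) – Number of exemplars per pole (default 3)

  • rng (random.Random | None) – Optional RNG used when sampling exemplars
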
Returns:

PoleExemplars with syllables from low and high poles

Return type:

PoleExemplars

build_tools.syllable_walk_tui.services.metrics.compute_terrain_metrics(feature_saturation, weights=None, annotated_data=None, exemplar_rng=None, n_exemplars=3)

Compute phonaesthetic terrain metrics from feature saturation.

Derives three axis scores representing the corpus’s position in phonaesthetic space. These scores are descriptive, not prescriptive: they characterize the acoustic terrain without imposing meaning.

Parameters:
  • feature_saturation (FeatureSaturationMetrics) – Computed feature saturation metrics

  • weights (build_tools.syllable_walk_tui.services.terrain_weights.TerrainWeights | None) – Optional TerrainWeights configuration. If None, uses DEFAULT_TERRAIN_WEIGHTS from terrain_weights module. Custom weights allow calibration for different phonaesthetic models or user preferences.

  • annotated_data (collections.abc.Sequence[dict] | None) – Optional list of {"syllable": str, "features": dict} entries. If provided, pole exemplars are computed.

  • exemplar_rng (random.Random | None) – Optional RNG for shuffling exemplars. Isolated from name generation to maintain determinism.

  • n_exemplars (int) – Number of exemplars per pole (default 3)

Returns:

TerrainMetrics with scores and labels for all three axes

Return type:

TerrainMetrics

Example

>>> terrain = compute_terrain_metrics(feature_saturation)
>>> print(f"Shape: {terrain.shape_score:.2f} ({terrain.shape_label})")
>>> print(f"Craft: {terrain.craft_score:.2f} ({terrain.craft_label})")

# With custom weights:
>>> from build_tools.syllable_walk_tui.services.terrain_weights import (
...     TerrainWeights, AxisWeights
... )
>>> custom = TerrainWeights(shape=AxisWeights({"contains_plosive": 1.5}))
>>> terrain = compute_terrain_metrics(feature_saturation, weights=custom)

# With exemplars:
>>> terrain = compute_terrain_metrics(
...     feature_saturation, annotated_data=corpus_data
... )
>>> print(terrain.shape_exemplars.low_pole_exemplars)

class build_tools.syllable_walk_tui.services.metrics.CorpusShapeMetrics

Complete corpus shape metrics combining all categories.

This is the primary interface for corpus analysis. Contains all raw metrics needed to understand corpus structure.

inventory

Inventory metrics (counts, lengths)

frequency

Frequency distribution metrics

feature_saturation

Per-feature saturation metrics

terrain

Phonaesthetic terrain metrics (derived from features)

inventory: InventoryMetrics
frequency: FrequencyMetrics
feature_saturation: FeatureSaturationMetrics
terrain: TerrainMetrics

build_tools.syllable_walk_tui.services.metrics.compute_corpus_shape_metrics(syllables, frequencies, annotated_data)

Compute complete corpus shape metrics.

This is the main entry point for corpus analysis. Computes all metric categories and returns a composite result.

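Parameters:
  • syllables (collections.abc.Sequence[str]) – List of unique syllables

  • frequencies (dict[str, int]) – Dictionary mapping syllable to frequency count

  • annotated_data (collections.abc.Sequence[dict]) – List of dicts with 'syllable', 'frequency', 'features' keys
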
Returns:

CorpusShapeMetrics containing all computed metrics

Raises:

ValueError – If any input is empty or malformed

Return type:

CorpusShapeMetrics

Example

>>> metrics = compute_corpus_shape_metrics(syllables, frequencies, annotated_data)
>>> print(f"Corpus has {metrics.inventory.total_count} syllables")
>>> print(f"Hapax legomena: {metrics.frequency.hapax_count}")
>>> vowel_pct = metrics.feature_saturation.by_name['starts_with_vowel'].true_percentage
>>> print(f"Starts with vowel: {vowel_pct:.1f}%")
>>> print(f"Terrain: {metrics.terrain.shape_label}")
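The three inputs describe the same corpus from different angles. Each annotated record already carries 'syllable' and 'frequency' keys, so one way to keep them consistent is to derive `syllables` and `frequencies` from `annotated_data`. The sketch below does that. `split_annotated` is a hypothetical helper, and the real pipeline may load these inputs separately.

```python
from collections.abc import Sequence


def split_annotated(annotated_data: Sequence[dict]) -> tuple[list[str], dict[str, int]]:
    """Derive the `syllables` and `frequencies` arguments from annotated records."""
    syllables: list[str] = []
    frequencies: dict[str, int] = {}
    for entry in annotated_data:
        syllable = entry["syllable"]
        # The inventory expects unique syllables, so reject repeats early.
        if syllable in frequencies:
            raise ValueError(f"duplicate syllable {syllable!r}")
        syllables.append(syllable)
        frequencies[syllable] = int(entry["frequency"])
    return syllables, frequencies
```

The resulting pair can then be passed as `compute_corpus_shape_metrics(syllables, frequencies, annotated_data)`.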