build_tools.syllable_walk_tui.services.metrics
==============================================

.. py:module:: build_tools.syllable_walk_tui.services.metrics

.. autoapi-nested-parse::

   Corpus shape metrics computation.

   This module provides dataclasses and pure functions for computing raw,
   objective metrics about corpus shape. These metrics characterize the
   statistical structure of a syllable corpus without interpretation.

   Design Philosophy:
       - Raw numbers only, no interpretation or judgment
       - Pure functions (no side effects, no I/O)
       - All metrics are observable facts about the corpus
       - Users draw their own conclusions from the data

   Metric Categories:
       - Inventory: What exists (counts, lengths)
       - Frequency: Weight distribution (how occurrences are spread across syllables)
       - Feature Saturation: Phonetic feature coverage (per-feature counts)
       - Terrain: Phonaesthetic axis scores derived from feature saturation

   Usage:
       >>> from build_tools.syllable_walk_tui.services.metrics import (
       ...     compute_corpus_shape_metrics
       ... )
       >>> metrics = compute_corpus_shape_metrics(syllables, frequencies, annotated_data)
       >>> print(f"Total syllables: {metrics.inventory.total_count}")
       >>> print(f"Hapax count: {metrics.frequency.hapax_count}")



Attributes
----------

.. autoapisummary::

   build_tools.syllable_walk_tui.services.metrics.FEATURE_NAMES


Classes
-------

.. autoapisummary::

   build_tools.syllable_walk_tui.services.metrics.InventoryMetrics
   build_tools.syllable_walk_tui.services.metrics.FrequencyMetrics
   build_tools.syllable_walk_tui.services.metrics.FeatureSaturation
   build_tools.syllable_walk_tui.services.metrics.FeatureSaturationMetrics
   build_tools.syllable_walk_tui.services.metrics.PoleExemplars
   build_tools.syllable_walk_tui.services.metrics.TerrainMetrics
   build_tools.syllable_walk_tui.services.metrics.CorpusShapeMetrics


Functions
---------

.. autoapisummary::

   build_tools.syllable_walk_tui.services.metrics.compute_inventory_metrics
   build_tools.syllable_walk_tui.services.metrics.compute_frequency_metrics
   build_tools.syllable_walk_tui.services.metrics.compute_feature_saturation_metrics
   build_tools.syllable_walk_tui.services.metrics.score_syllable_on_axis
   build_tools.syllable_walk_tui.services.metrics.sample_pole_exemplars
   build_tools.syllable_walk_tui.services.metrics.compute_terrain_metrics
   build_tools.syllable_walk_tui.services.metrics.compute_corpus_shape_metrics


Module Contents
---------------

.. py:class:: InventoryMetrics

   Raw inventory metrics describing what exists in the corpus.

   All metrics are objective counts and statistics about the syllable inventory.

   .. attribute:: total_count

      Total number of unique syllables

   .. attribute:: length_min

      Minimum syllable length (characters)

   .. attribute:: length_max

      Maximum syllable length (characters)

   .. attribute:: length_mean

      Mean syllable length (characters)

   .. attribute:: length_median

      Median syllable length (characters)

   .. attribute:: length_std

      Standard deviation of syllable lengths

   .. attribute:: length_distribution

      Number of syllables at each length, as ``{length: count}``


   .. py:attribute:: total_count
      :type: int


   .. py:attribute:: length_min
      :type: int


   .. py:attribute:: length_max
      :type: int


   .. py:attribute:: length_mean
      :type: float


   .. py:attribute:: length_median
      :type: float


   .. py:attribute:: length_std
      :type: float


   .. py:attribute:: length_distribution
      :type: dict[int, int]


.. py:function:: compute_inventory_metrics(syllables)

   Compute inventory metrics from a list of syllables.

   :param syllables: List of unique syllables
   :returns: InventoryMetrics with all computed values
   :raises ValueError: If the syllables list is empty


.. py:class:: FrequencyMetrics

   Raw frequency distribution metrics.

   Describes how syllable occurrences are distributed across the corpus.

   .. attribute:: total_occurrences

      Sum of all frequency counts

   .. attribute:: freq_min

      Minimum frequency value

   .. attribute:: freq_max

      Maximum frequency value

   .. attribute:: freq_mean

      Mean frequency

   .. attribute:: freq_median

      Median frequency

   .. attribute:: freq_std

      Standard deviation of frequencies

   .. attribute:: percentile_10

      10th percentile frequency

   .. attribute:: percentile_25

      25th percentile frequency (Q1)

   .. attribute:: percentile_50

      50th percentile frequency (median)

   .. attribute:: percentile_75

      75th percentile frequency (Q3)

   .. attribute:: percentile_90

      90th percentile frequency

   .. attribute:: percentile_99

      99th percentile frequency

   .. attribute:: unique_freq_count

      Number of distinct frequency values

   .. attribute:: hapax_count

      Number of syllables that appear exactly once

   .. attribute:: top_10

      Top 10 syllables by frequency, as ``((syllable, freq), ...)``

   .. attribute:: bottom_10

      Bottom 10 syllables by frequency, as ``((syllable, freq), ...)``


   .. py:attribute:: total_occurrences
      :type: int


   .. py:attribute:: freq_min
      :type: int


   .. py:attribute:: freq_max
      :type: int


   .. py:attribute:: freq_mean
      :type: float


   .. py:attribute:: freq_median
      :type: float


   .. py:attribute:: freq_std
      :type: float


   .. py:attribute:: percentile_10
      :type: int


   .. py:attribute:: percentile_25
      :type: int


   .. py:attribute:: percentile_50
      :type: int


   .. py:attribute:: percentile_75
      :type: int


   .. py:attribute:: percentile_90
      :type: int


   .. py:attribute:: percentile_99
      :type: int


   .. py:attribute:: unique_freq_count
      :type: int


   .. py:attribute:: hapax_count
      :type: int


   .. py:attribute:: top_10
      :type: tuple[tuple[str, int], Ellipsis]
      :value: ()


   .. py:attribute:: bottom_10
      :type: tuple[tuple[str, int], Ellipsis]
      :value: ()


.. py:function:: compute_frequency_metrics(frequencies)

   Compute frequency distribution metrics.

   :param frequencies: Dictionary mapping each syllable to its frequency count
   :returns: FrequencyMetrics with all computed values
   :raises ValueError: If the frequencies dict is empty


.. py:data:: FEATURE_NAMES
   :type: tuple[str, Ellipsis]
   :value: ('starts_with_vowel', 'starts_with_cluster', 'starts_with_heavy_cluster', 'contains_plosive',...


.. py:class:: FeatureSaturation

   Saturation metrics for a single phonetic feature.

   .. attribute:: feature_name

      Name of the feature

   .. attribute:: true_count

      Number of syllables with feature = True

   .. attribute:: false_count

      Number of syllables with feature = False

   .. attribute:: true_percentage

      Percentage of the corpus with feature = True


   .. py:attribute:: feature_name
      :type: str


   .. py:attribute:: true_count
      :type: int


   .. py:attribute:: false_count
      :type: int


   .. py:attribute:: true_percentage
      :type: float


.. py:class:: FeatureSaturationMetrics

   Feature saturation metrics for all 12 phonetic features.

   .. attribute:: total_syllables

      Total number of syllables analyzed

   .. attribute:: features

      Tuple with one FeatureSaturation per feature (in canonical order)

   .. attribute:: by_name

      Dict mapping feature name to FeatureSaturation (for lookup)


   .. py:attribute:: total_syllables
      :type: int


   .. py:attribute:: features
      :type: tuple[FeatureSaturation, Ellipsis]
      :value: ()


   .. py:attribute:: by_name
      :type: dict[str, FeatureSaturation]


.. py:function:: compute_feature_saturation_metrics(annotated_data)

   Compute feature saturation metrics from annotated syllable data.

   :param annotated_data: List of dicts with ``'syllable'``, ``'frequency'``, and ``'features'`` keys
   :returns: FeatureSaturationMetrics with per-feature saturation counts
   :raises ValueError: If annotated_data is empty or malformed


.. py:class:: PoleExemplars

   Exemplar syllables from each pole of a terrain axis.

   These concrete examples help users see which syllables represent each
   end of the phonaesthetic spectrum.

   .. attribute:: axis_name

      Name of the axis (``"shape"``, ``"craft"``, or ``"space"``)

   .. attribute:: low_pole_exemplars

      Syllables from the low pole (Round/Flowing/Open)

   .. attribute:: high_pole_exemplars

      Syllables from the high pole (Jagged/Worked/Dense)


   .. py:attribute:: axis_name
      :type: str


   .. py:attribute:: low_pole_exemplars
      :type: tuple[str, Ellipsis]


   .. py:attribute:: high_pole_exemplars
      :type: tuple[str, Ellipsis]


.. py:class:: TerrainMetrics

   Phonaesthetic terrain metrics describing corpus character.

   Three axes are derived from feature saturation percentages:

   - Shape: Round (0.0) ↔ Jagged (1.0), the Bouba/Kiki dimension
   - Craft: Flowing (0.0) ↔ Worked (1.0), the Sung/Forged dimension
   - Space: Open (0.0) ↔ Dense (1.0), the Valley/Workshop dimension

   Scores are normalized to the 0.0 to 1.0 range, where 0.5 is neutral.

   .. attribute:: shape_score

      Position on the Round↔Jagged axis (0.0 to 1.0)

   .. attribute:: craft_score

      Position on the Flowing↔Worked axis (0.0 to 1.0)

   .. attribute:: space_score

      Position on the Open↔Dense axis (0.0 to 1.0)

   .. attribute:: shape_label

      Human-readable label for the shape position

   .. attribute:: craft_label

      Human-readable label for the craft position

   .. attribute:: space_label

      Human-readable label for the space position

   .. attribute:: shape_exemplars

      Optional exemplar syllables for the shape axis

   .. attribute:: craft_exemplars

      Optional exemplar syllables for the craft axis

   .. attribute:: space_exemplars

      Optional exemplar syllables for the space axis


   .. py:attribute:: shape_score
      :type: float


   .. py:attribute:: craft_score
      :type: float


   .. py:attribute:: space_score
      :type: float


   .. py:attribute:: shape_label
      :type: str


   .. py:attribute:: craft_label
      :type: str


   .. py:attribute:: space_label
      :type: str


   .. py:attribute:: shape_exemplars
      :type: PoleExemplars | None
      :value: None


   .. py:attribute:: craft_exemplars
      :type: PoleExemplars | None
      :value: None


   .. py:attribute:: space_exemplars
      :type: PoleExemplars | None
      :value: None


.. py:function:: score_syllable_on_axis(features, axis_weights)

   Compute the axis score for a single syllable from its boolean features.

   Unlike ``_compute_axis_score()``, which uses corpus percentages, this
   function uses binary features (0 or 1) to rank individual syllables.

   :param features: Dictionary of feature_name -> boolean
   :param axis_weights: AxisWeights containing feature-to-weight mappings
   :returns: Raw weighted sum (not normalized). Higher values lie closer to the high pole.


.. py:function:: sample_pole_exemplars(annotated_data, axis_weights, axis_name, n_exemplars = 3, rng = None)

   Sample exemplar syllables from each pole of an axis.

   Scores all syllables in the corpus and samples from the low and high
   tails to provide concrete examples of syllables at each pole.

   :param annotated_data: List of ``{"syllable": str, "features": dict}`` entries
   :param axis_weights: Weights for the axis
   :param axis_name: Name of the axis (``"shape"``, ``"craft"``, or ``"space"``)
   :param n_exemplars: Number of exemplars per pole (default 3)
   :param rng: Optional RNG for shuffling within the tails (isolated from name generation)
   :returns: PoleExemplars with syllables from the low and high poles


.. py:function:: compute_terrain_metrics(feature_saturation, weights = None, annotated_data = None, exemplar_rng = None, n_exemplars = 3)

   Compute phonaesthetic terrain metrics from feature saturation.

   Derives three axis scores representing the corpus's position in
   phonaesthetic space. These scores are descriptive, not prescriptive:
   they characterize the acoustic terrain without imposing meaning.

   :param feature_saturation: Computed feature saturation metrics
   :param weights: Optional TerrainWeights configuration. If None, uses
      DEFAULT_TERRAIN_WEIGHTS from the terrain_weights module. Custom
      weights allow calibration for different phonaesthetic models or
      user preferences.
   :param annotated_data: Optional list of ``{"syllable": str, "features": dict}``
      entries. If provided, pole exemplars are computed.
   :param exemplar_rng: Optional RNG for shuffling exemplars. It is isolated
      from name generation to maintain determinism.
   :param n_exemplars: Number of exemplars per pole (default 3)
   :returns: TerrainMetrics with scores and labels for all three axes

   .. admonition:: Example

      >>> terrain = compute_terrain_metrics(feature_saturation)
      >>> print(f"Shape: {terrain.shape_score:.2f} ({terrain.shape_label})")
      >>> print(f"Craft: {terrain.craft_score:.2f} ({terrain.craft_label})")
      >>> # With custom weights:
      >>> from build_tools.syllable_walk_tui.services.terrain_weights import (
      ...     TerrainWeights, AxisWeights
      ... )
      >>> custom = TerrainWeights(shape=AxisWeights({"contains_plosive": 1.5}))
      >>> terrain = compute_terrain_metrics(feature_saturation, weights=custom)
      >>> # With exemplars:
      >>> terrain = compute_terrain_metrics(
      ...     feature_saturation, annotated_data=corpus_data
      ... )
      >>> print(terrain.shape_exemplars.low_pole_exemplars)


.. py:class:: CorpusShapeMetrics

   Complete corpus shape metrics combining all categories.

   This is the primary interface for corpus analysis. It contains all the
   raw metrics needed to understand corpus structure.

   .. attribute:: inventory

      Inventory metrics (counts, lengths)

   .. attribute:: frequency

      Frequency distribution metrics

   .. attribute:: feature_saturation

      Per-feature saturation metrics

   .. attribute:: terrain

      Phonaesthetic terrain metrics (derived from features)


   .. py:attribute:: inventory
      :type: InventoryMetrics


   .. py:attribute:: frequency
      :type: FrequencyMetrics


   .. py:attribute:: feature_saturation
      :type: FeatureSaturationMetrics


   .. py:attribute:: terrain
      :type: TerrainMetrics


.. py:function:: compute_corpus_shape_metrics(syllables, frequencies, annotated_data)

   Compute complete corpus shape metrics.

   This is the main entry point for corpus analysis. It computes every
   metric category and returns a composite result.

   :param syllables: List of unique syllables
   :param frequencies: Dictionary mapping each syllable to its frequency count
   :param annotated_data: List of annotated syllable dicts (``'syllable'``, ``'frequency'``, and ``'features'`` keys)
   :returns: CorpusShapeMetrics containing all computed metrics
   :raises ValueError: If any input is empty or malformed

   .. admonition:: Example

      >>> metrics = compute_corpus_shape_metrics(syllables, frequencies, annotated_data)
      >>> print(f"Corpus has {metrics.inventory.total_count} syllables")
      >>> print(f"Hapax legomena: {metrics.frequency.hapax_count}")
      >>> vowel_pct = metrics.feature_saturation.by_name['starts_with_vowel'].true_percentage
      >>> print(f"Starts with vowel: {vowel_pct:.1f}%")
      >>> print(f"Terrain: {metrics.terrain.shape_label}")
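
The inventory statistics described for ``compute_inventory_metrics`` can be approximated with the Python standard library. This is a minimal sketch, not the module's implementation. The helper name ``inventory_stats`` is hypothetical, and it returns a plain dict rather than ``InventoryMetrics``. It also assumes the population standard deviation (``statistics.pstdev``); the real function may use a different estimator.

.. code-block:: python

   import statistics
   from collections import Counter


   def inventory_stats(syllables: list[str]) -> dict:
       """Hypothetical helper mirroring the InventoryMetrics fields as a dict."""
       if not syllables:
           raise ValueError("syllables list is empty")
       lengths = [len(s) for s in syllables]  # lengths in characters
       return {
           "total_count": len(syllables),
           "length_min": min(lengths),
           "length_max": max(lengths),
           "length_mean": statistics.fmean(lengths),
           "length_median": float(statistics.median(lengths)),
           # Assumption: population (not sample) standard deviation.
           "length_std": statistics.pstdev(lengths),
           # {length: count}, sorted by length for readability.
           "length_distribution": dict(sorted(Counter(lengths).items())),
       }

For example, ``inventory_stats(["ka", "ri", "tho", "a"])`` reports a minimum length of 1, a maximum of 3, and ``{1: 1, 2: 2, 3: 1}`` as the length distribution.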
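
``compute_frequency_metrics`` does not document its percentile method or how it breaks frequency ties. The sketch below assumes the nearest-rank percentile method and breaks ties alphabetically. Both choices are hypothetical, as are the helper names ``nearest_rank_percentile`` and ``frequency_stats``. The sketch shows how the hapax count, the distinct-value count, and the top and bottom 10 lists all follow from a single ``{syllable: count}`` mapping.

.. code-block:: python

   import math
   import statistics


   def nearest_rank_percentile(sorted_values: list[int], p: float) -> int:
       """Nearest-rank percentile (an assumed method) over ascending values."""
       rank = max(1, math.ceil(p / 100 * len(sorted_values)))
       return sorted_values[rank - 1]


   def frequency_stats(frequencies: dict[str, int]) -> dict:
       """Hypothetical helper mirroring the FrequencyMetrics fields as a dict."""
       if not frequencies:
           raise ValueError("frequencies dict is empty")
       values = sorted(frequencies.values())
       # Hypothetical tie-breaking: equal frequencies are ordered by syllable text.
       descending = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
       ascending = sorted(frequencies.items(), key=lambda kv: (kv[1], kv[0]))
       return {
           "total_occurrences": sum(values),
           "freq_min": values[0],
           "freq_max": values[-1],
           "freq_mean": statistics.fmean(values),
           "freq_median": float(statistics.median(values)),
           "percentiles": {
               p: nearest_rank_percentile(values, p) for p in (10, 25, 50, 75, 90, 99)
           },
           "unique_freq_count": len(set(values)),
           "hapax_count": sum(1 for v in values if v == 1),  # appear exactly once
           "top_10": tuple(descending[:10]),
           "bottom_10": tuple(ascending[:10]),
       }

Under this assumed method, ``{"ka": 5, "ri": 1, "tho": 1, "a": 3}`` yields a 50th percentile of 1 but a median of 2.0. This shows that an integer ``percentile_50`` and a float ``freq_median`` need not agree, even though both describe the middle of the distribution.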
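
For ``compute_feature_saturation_metrics``, the per-feature saturation reduces to counting ``True`` flags. The minimal sketch below handles a single feature. It assumes that each entry's ``"features"`` value maps feature names to booleans and that ``true_percentage`` is on a 0 to 100 scale, which the ``%`` formatting in the ``compute_corpus_shape_metrics`` example suggests. ``SaturationSketch`` and ``saturation_for`` are hypothetical names; the real function covers every name in ``FEATURE_NAMES``.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class SaturationSketch:
       """Hypothetical stand-in with the same fields as FeatureSaturation."""

       feature_name: str
       true_count: int
       false_count: int
       true_percentage: float


   def saturation_for(annotated_data: list[dict], feature_name: str) -> SaturationSketch:
       if not annotated_data:
           raise ValueError("annotated_data is empty")
       try:
           flags = [bool(entry["features"][feature_name]) for entry in annotated_data]
       except (KeyError, TypeError) as exc:
           # Missing keys or non-dict entries are treated as malformed input.
           raise ValueError(f"malformed annotated_data entry: {exc!r}") from exc
       true_count = sum(flags)
       total = len(flags)
       return SaturationSketch(
           feature_name=feature_name,
           true_count=true_count,
           false_count=total - true_count,
           true_percentage=100.0 * true_count / total,  # assumed 0-100 scale
       )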
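
``score_syllable_on_axis`` and ``sample_pole_exemplars`` work together. The first ranks syllables by a weighted sum of their binary features; the second draws examples from both ends of that ranking. The sketch below models ``AxisWeights`` as a plain ``dict`` mapping feature names to weights. It assumes each pole's tail holds ``max(n_exemplars, 10% of the corpus)`` syllables and that ties are broken alphabetically. The tail size, the tie-breaking rule, and the names ``score_syllable`` and ``sample_poles`` are all hypothetical. The documented behavior covers only two points: the score is an unnormalized weighted sum, and an optional RNG shuffles within the tails.

.. code-block:: python

   from __future__ import annotations

   import random


   def score_syllable(features: dict[str, bool], weights: dict[str, float]) -> float:
       """Weighted sum of binary features: each True feature adds its weight."""
       return sum(weight for name, weight in weights.items() if features.get(name, False))


   def sample_poles(
       annotated_data: list[dict],
       weights: dict[str, float],
       n_exemplars: int = 3,
       tail_fraction: float = 0.1,  # hypothetical tail size
       rng: random.Random | None = None,
   ) -> tuple[tuple[str, ...], tuple[str, ...]]:
       if rng is None:
           rng = random.Random()
       # Sort ascending by score; the syllable text breaks ties deterministically.
       ranked = sorted(
           annotated_data,
           key=lambda e: (score_syllable(e["features"], weights), e["syllable"]),
       )
       # At least 1 so that ranked[-tail:] never becomes the whole list.
       tail = max(1, n_exemplars, int(len(ranked) * tail_fraction))
       # For very small corpora the two tails can overlap.
       low = [e["syllable"] for e in ranked[:tail]]
       high = [e["syllable"] for e in ranked[-tail:]]
       # Shuffle within each tail using the caller's isolated RNG.
       rng.shuffle(low)
       rng.shuffle(high)
       return tuple(low[:n_exemplars]), tuple(high[:n_exemplars])

Passing a seeded ``random.Random`` keeps exemplar selection reproducible without consuming values from whatever RNG drives name generation. This matches the isolation described for the ``rng`` and ``exemplar_rng`` parameters.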
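
``compute_corpus_shape_metrics`` takes three inputs that presumably describe the same syllable set. One way to keep them consistent is to derive ``syllables`` and ``frequencies`` from ``annotated_data``, whose entries carry ``'syllable'`` and ``'frequency'`` keys. The helper ``split_annotated`` below is hypothetical. Its duplicate check is an assumption based on ``syllables`` being documented as a list of unique syllables.

.. code-block:: python

   def split_annotated(annotated_data: list[dict]) -> tuple[list[str], dict[str, int]]:
       """Hypothetical helper: derive the syllables list and frequency map from annotated entries."""
       syllables = [entry["syllable"] for entry in annotated_data]
       # `syllables` is documented as unique, so reject duplicates early.
       if len(set(syllables)) != len(syllables):
           raise ValueError("annotated_data contains duplicate syllables")
       frequencies = {entry["syllable"]: entry["frequency"] for entry in annotated_data}
       return syllables, frequencies

With those two values in hand, the call is ``compute_corpus_shape_metrics(syllables, frequencies, annotated_data)``.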