build_tools.syllable_walk_tui.services.metrics
==============================================

.. py:module:: build_tools.syllable_walk_tui.services.metrics

.. autoapi-nested-parse::

   Corpus shape metrics computation.

   This module provides dataclasses and pure functions for computing raw,
   objective metrics about corpus shape. These metrics characterize the
   statistical structure of a syllable corpus without interpretation.

   Design Philosophy:
       - Raw numbers only, no interpretation or judgment
       - Pure functions (no side effects, no I/O)
       - All metrics are observable facts about the corpus
       - Users draw their own conclusions from the data

   Metric Categories:
       - Inventory: What exists (counts, lengths)
       - Frequency: Weight distribution (how syllables are distributed)
       - Feature Saturation: Phonetic feature coverage (per-feature counts)

   Usage:
       >>> from build_tools.syllable_walk_tui.services.metrics import (
       ...     compute_corpus_shape_metrics
       ... )
       >>> metrics = compute_corpus_shape_metrics(syllables, frequencies, annotated_data)
       >>> print(f"Total syllables: {metrics.inventory.total_count}")
       >>> print(f"Hapax count: {metrics.frequency.hapax_count}")


Attributes
----------

.. autoapisummary::

   build_tools.syllable_walk_tui.services.metrics.FEATURE_NAMES


Classes
-------

.. autoapisummary::

   build_tools.syllable_walk_tui.services.metrics.InventoryMetrics
   build_tools.syllable_walk_tui.services.metrics.FrequencyMetrics
   build_tools.syllable_walk_tui.services.metrics.FeatureSaturation
   build_tools.syllable_walk_tui.services.metrics.FeatureSaturationMetrics
   build_tools.syllable_walk_tui.services.metrics.PoleExemplars
   build_tools.syllable_walk_tui.services.metrics.TerrainMetrics
   build_tools.syllable_walk_tui.services.metrics.CorpusShapeMetrics


Functions
---------

.. autoapisummary::

   build_tools.syllable_walk_tui.services.metrics.compute_inventory_metrics
   build_tools.syllable_walk_tui.services.metrics.compute_frequency_metrics
   build_tools.syllable_walk_tui.services.metrics.compute_feature_saturation_metrics
   build_tools.syllable_walk_tui.services.metrics.score_syllable_on_axis
   build_tools.syllable_walk_tui.services.metrics.sample_pole_exemplars
   build_tools.syllable_walk_tui.services.metrics.compute_terrain_metrics
   build_tools.syllable_walk_tui.services.metrics.compute_corpus_shape_metrics


Module Contents
---------------

.. py:class:: InventoryMetrics

   Raw inventory metrics describing what exists in the corpus.

   All metrics are objective counts and statistics about syllable inventory.

   .. attribute:: total_count

      Total number of unique syllables

   .. attribute:: length_min

      Minimum syllable length (characters)

   .. attribute:: length_max

      Maximum syllable length (characters)

   .. attribute:: length_mean

      Mean syllable length

   .. attribute:: length_median

      Median syllable length

   .. attribute:: length_std

      Standard deviation of syllable lengths

   .. attribute:: length_distribution

      Count of syllables at each length {length: count}


   .. py:attribute:: total_count
      :type:  int


   .. py:attribute:: length_min
      :type:  int


   .. py:attribute:: length_max
      :type:  int


   .. py:attribute:: length_mean
      :type:  float


   .. py:attribute:: length_median
      :type:  float


   .. py:attribute:: length_std
      :type:  float


   .. py:attribute:: length_distribution
      :type:  dict[int, int]


.. py:function:: compute_inventory_metrics(syllables)

   Compute inventory metrics from a list of syllables.

   :param syllables: List of unique syllables

   :returns: InventoryMetrics with all computed values

   :raises ValueError: If syllables list is empty
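
The inventory fields documented above can be approximated with the standard library. The following is a minimal sketch, not the module's actual implementation: ``sketch_inventory`` is a hypothetical helper, it returns a plain dict rather than an ``InventoryMetrics`` instance, and the choice of population standard deviation is an assumption.

```python
import statistics
from collections import Counter


def sketch_inventory(syllables: list[str]) -> dict:
    """Hypothetical helper: length statistics for a list of unique syllables."""
    if not syllables:
        # Mirrors the documented behaviour of raising on empty input.
        raise ValueError("syllables list is empty")
    lengths = [len(s) for s in syllables]
    return {
        "total_count": len(syllables),
        "length_min": min(lengths),
        "length_max": max(lengths),
        "length_mean": statistics.mean(lengths),
        "length_median": statistics.median(lengths),
        # Assumption: population standard deviation; the real module may differ.
        "length_std": statistics.pstdev(lengths),
        # Maps each observed length to the number of syllables of that length.
        "length_distribution": dict(Counter(lengths)),
    }


inventory = sketch_inventory(["ka", "tor", "el", "stra"])
```

Here ``inventory["length_distribution"]`` maps length 2 to two syllables and lengths 3 and 4 to one syllable each.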


.. py:class:: FrequencyMetrics

   Raw frequency distribution metrics.

   Describes how syllable occurrences are distributed across the corpus.

   .. attribute:: total_occurrences

      Sum of all frequency counts

   .. attribute:: freq_min

      Minimum frequency value

   .. attribute:: freq_max

      Maximum frequency value

   .. attribute:: freq_mean

      Mean frequency

   .. attribute:: freq_median

      Median frequency

   .. attribute:: freq_std

      Standard deviation of frequencies

   .. attribute:: percentile_10

      10th percentile frequency

   .. attribute:: percentile_25

      25th percentile frequency (Q1)

   .. attribute:: percentile_50

      50th percentile frequency (median)

   .. attribute:: percentile_75

      75th percentile frequency (Q3)

   .. attribute:: percentile_90

      90th percentile frequency

   .. attribute:: percentile_99

      99th percentile frequency

   .. attribute:: unique_freq_count

      Number of distinct frequency values

   .. attribute:: hapax_count

      Count of syllables appearing exactly once

   .. attribute:: top_10

      Top 10 syllables by frequency [(syllable, freq), ...]

   .. attribute:: bottom_10

      Bottom 10 syllables by frequency [(syllable, freq), ...]


   .. py:attribute:: total_occurrences
      :type:  int


   .. py:attribute:: freq_min
      :type:  int


   .. py:attribute:: freq_max
      :type:  int


   .. py:attribute:: freq_mean
      :type:  float


   .. py:attribute:: freq_median
      :type:  float


   .. py:attribute:: freq_std
      :type:  float


   .. py:attribute:: percentile_10
      :type:  int


   .. py:attribute:: percentile_25
      :type:  int


   .. py:attribute:: percentile_50
      :type:  int


   .. py:attribute:: percentile_75
      :type:  int


   .. py:attribute:: percentile_90
      :type:  int


   .. py:attribute:: percentile_99
      :type:  int


   .. py:attribute:: unique_freq_count
      :type:  int


   .. py:attribute:: hapax_count
      :type:  int


   .. py:attribute:: top_10
      :type:  tuple[tuple[str, int], ...]
      :value: ()


   .. py:attribute:: bottom_10
      :type:  tuple[tuple[str, int], ...]
      :value: ()


.. py:function:: compute_frequency_metrics(frequencies)

   Compute frequency distribution metrics.

   :param frequencies: Dictionary mapping syllable to frequency count

   :returns: FrequencyMetrics with all computed values

   :raises ValueError: If frequencies dict is empty
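
Several of the ``FrequencyMetrics`` fields follow directly from the frequency dictionary. The sketch below illustrates one possible way to derive them. ``sketch_frequency`` and ``nearest_rank`` are hypothetical names, the nearest-rank percentile method is an assumption (the module may interpolate instead), and the alphabetical tie-break for ``top_10`` is illustrative only.

```python
import math
import statistics


def nearest_rank(sorted_values: list[int], pct: float) -> int:
    """Nearest-rank percentile over an ascending list (assumed method)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def sketch_frequency(frequencies: dict[str, int]) -> dict:
    """Hypothetical helper: summary statistics for a syllable -> count mapping."""
    if not frequencies:
        raise ValueError("frequencies dict is empty")
    values = sorted(frequencies.values())
    # Highest frequency first; ties broken alphabetically so output is stable.
    ranked = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "total_occurrences": sum(values),
        "freq_min": values[0],
        "freq_max": values[-1],
        "freq_median": statistics.median(values),
        "percentile_50": nearest_rank(values, 50),
        "percentile_90": nearest_rank(values, 90),
        "unique_freq_count": len(set(values)),
        # Hapax: syllables that occur exactly once.
        "hapax_count": sum(1 for v in values if v == 1),
        "top_10": tuple(ranked[:10]),
    }


freq = sketch_frequency({"ka": 5, "tor": 1, "el": 1, "stra": 3})
```

With this input, two syllables occur exactly once, so ``freq["hapax_count"]`` is 2.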


.. py:data:: FEATURE_NAMES
   :type:  tuple[str, ...]
   :value: ('starts_with_vowel', 'starts_with_cluster', 'starts_with_heavy_cluster', 'contains_plosive',...


.. py:class:: FeatureSaturation

   Saturation metrics for a single phonetic feature.

   .. attribute:: feature_name

      Name of the feature

   .. attribute:: true_count

      Number of syllables with feature = True

   .. attribute:: false_count

      Number of syllables with feature = False

   .. attribute:: true_percentage

      Percentage of corpus with feature = True


   .. py:attribute:: feature_name
      :type:  str


   .. py:attribute:: true_count
      :type:  int


   .. py:attribute:: false_count
      :type:  int


   .. py:attribute:: true_percentage
      :type:  float


.. py:class:: FeatureSaturationMetrics

   Feature saturation metrics for all 12 phonetic features.

   .. attribute:: total_syllables

      Total syllables analyzed

   .. attribute:: features

      Tuple of FeatureSaturation for each feature (in canonical order)

   .. attribute:: by_name

      Dict mapping feature name to FeatureSaturation (for lookup)


   .. py:attribute:: total_syllables
      :type:  int


   .. py:attribute:: features
      :type:  tuple[FeatureSaturation, ...]
      :value: ()


   .. py:attribute:: by_name
      :type:  dict[str, FeatureSaturation]


.. py:function:: compute_feature_saturation_metrics(annotated_data)

   Compute feature saturation metrics from annotated syllable data.

   :param annotated_data: List of dicts with 'syllable', 'frequency', 'features' keys

   :returns: FeatureSaturationMetrics with per-feature saturation counts

   :raises ValueError: If annotated_data is empty or malformed
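
Per-feature saturation amounts to counting how many syllables have each feature set. The following sketch shows the idea under stated assumptions: ``sketch_saturation`` is a hypothetical helper, percentages are taken to be on a 0-100 scale, and a feature missing from an entry is counted as ``False``. The real function returns ``FeatureSaturationMetrics`` and may handle malformed entries differently.

```python
def sketch_saturation(annotated_data: list[dict], feature_names: tuple[str, ...]) -> dict:
    """Hypothetical helper: true/false counts and percentage per feature."""
    if not annotated_data:
        raise ValueError("annotated_data is empty")
    total = len(annotated_data)
    result = {}
    for name in feature_names:
        # Assumption: a missing feature key counts as False.
        true_count = sum(
            1 for entry in annotated_data if entry["features"].get(name, False)
        )
        result[name] = {
            "true_count": true_count,
            "false_count": total - true_count,
            # Assumption: percentage expressed on a 0-100 scale.
            "true_percentage": 100.0 * true_count / total,
        }
    return result


sample = [
    {"syllable": "ala", "frequency": 2, "features": {"starts_with_vowel": True, "contains_plosive": False}},
    {"syllable": "ka", "frequency": 5, "features": {"starts_with_vowel": False, "contains_plosive": True}},
    {"syllable": "tor", "frequency": 1, "features": {"starts_with_vowel": False, "contains_plosive": True}},
    {"syllable": "sla", "frequency": 1, "features": {"starts_with_vowel": False, "contains_plosive": False}},
]
saturation = sketch_saturation(sample, ("starts_with_vowel", "contains_plosive"))
```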


.. py:class:: PoleExemplars

   Exemplar syllables from each pole of a terrain axis.

   These concrete examples help users understand what syllables
   represent each end of the phonaesthetic spectrum.

   .. attribute:: axis_name

      Name of the axis ("shape", "craft", or "space")

   .. attribute:: low_pole_exemplars

      Syllables from the low pole (Round/Flowing/Open)

   .. attribute:: high_pole_exemplars

      Syllables from the high pole (Jagged/Worked/Dense)


   .. py:attribute:: axis_name
      :type:  str


   .. py:attribute:: low_pole_exemplars
      :type:  tuple[str, ...]


   .. py:attribute:: high_pole_exemplars
      :type:  tuple[str, ...]


.. py:class:: TerrainMetrics

   Phonaesthetic terrain metrics describing corpus character.

   Three axes derived from feature saturation percentages:

   - Shape: Round (0.0) ↔ Jagged (1.0) - Bouba/Kiki dimension
   - Craft: Flowing (0.0) ↔ Worked (1.0) - Sung/Forged dimension
   - Space: Open (0.0) ↔ Dense (1.0) - Valley/Workshop dimension

   Scores are normalized to 0.0-1.0 range where 0.5 is neutral.

   .. attribute:: shape_score

      Position on Round↔Jagged axis (0.0-1.0)

   .. attribute:: craft_score

      Position on Flowing↔Worked axis (0.0-1.0)

   .. attribute:: space_score

      Position on Open↔Dense axis (0.0-1.0)

   .. attribute:: shape_label

      Human-readable label for shape position

   .. attribute:: craft_label

      Human-readable label for craft position

   .. attribute:: space_label

      Human-readable label for space position

   .. attribute:: shape_exemplars

      Optional exemplar syllables for shape axis

   .. attribute:: craft_exemplars

      Optional exemplar syllables for craft axis

   .. attribute:: space_exemplars

      Optional exemplar syllables for space axis


   .. py:attribute:: shape_score
      :type:  float


   .. py:attribute:: craft_score
      :type:  float


   .. py:attribute:: space_score
      :type:  float


   .. py:attribute:: shape_label
      :type:  str


   .. py:attribute:: craft_label
      :type:  str


   .. py:attribute:: space_label
      :type:  str


   .. py:attribute:: shape_exemplars
      :type:  PoleExemplars | None
      :value: None


   .. py:attribute:: craft_exemplars
      :type:  PoleExemplars | None
      :value: None


   .. py:attribute:: space_exemplars
      :type:  PoleExemplars | None
      :value: None


.. py:function:: score_syllable_on_axis(features, axis_weights)

   Compute axis score for a single syllable from its boolean features.

   Unlike ``_compute_axis_score()``, which uses corpus percentages, this
   function uses binary features (0 or 1) to rank individual syllables.

   :param features: Dictionary of feature_name -> boolean
   :param axis_weights: AxisWeights containing feature-to-weight mappings

   :returns: Raw weighted sum (not normalized). Higher = more toward high pole.
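
The raw weighted sum described above can be illustrated with a plain mapping of feature names to weights. This is a sketch only: ``sketch_axis_score`` is a hypothetical helper, the real function takes an ``AxisWeights`` object whose internal layout is not documented here, and the weight values below are made up for illustration.

```python
def sketch_axis_score(features: dict[str, bool], weights: dict[str, float]) -> float:
    """Hypothetical helper: sum the weights of every feature that is True."""
    # Each feature contributes its full weight (1 * w) or nothing (0 * w).
    return sum(
        (w for name, w in weights.items() if features.get(name, False)),
        0.0,
    )


# Illustrative weights: a positive weight pulls toward the high pole,
# a negative weight toward the low pole.
example_weights = {"contains_plosive": 1.0, "starts_with_vowel": -0.5}

score_plosive = sketch_axis_score(
    {"contains_plosive": True, "starts_with_vowel": False}, example_weights
)
score_both = sketch_axis_score(
    {"contains_plosive": True, "starts_with_vowel": True}, example_weights
)
score_none = sketch_axis_score({}, example_weights)
```

Because the result is not normalized, it is only meaningful for ordering syllables relative to one another on the same axis.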


.. py:function:: sample_pole_exemplars(annotated_data, axis_weights, axis_name, n_exemplars = 3, rng = None)

   Sample exemplar syllables from each pole of an axis.

   Scores all syllables in the corpus and samples from the low and high
   tails to provide concrete examples of syllables at each pole.

   :param annotated_data: List of {"syllable": str, "features": dict} entries
   :param axis_weights: Weights for the axis
   :param axis_name: Name of axis ("shape", "craft", "space")
   :param n_exemplars: Number of exemplars per pole (default 3)
   :param rng: Optional RNG for shuffling within tails (isolated from generation)

   :returns: PoleExemplars with syllables from low and high poles
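
The score-then-sample procedure described above might look roughly like the sketch below. It is not the module's implementation: ``sketch_pole_exemplars`` and its ``tail_size`` parameter are hypothetical, weights are a plain dict rather than ``AxisWeights``, the RNG is assumed to be a ``random.Random`` instance, and the result is a pair of tuples rather than a ``PoleExemplars`` object.

```python
import random


def sketch_pole_exemplars(annotated_data, weights, n_exemplars=3, tail_size=10, rng=None):
    """Hypothetical helper: pick exemplars from the lowest and highest scorers."""
    if rng is None:
        rng = random.Random()

    def score(entry):
        # Raw weighted sum over the features that are True.
        return sum(w for name, w in weights.items() if entry["features"].get(name, False))

    # Ascending order: low pole first, high pole last.
    ranked = sorted(annotated_data, key=score)
    low_tail = [e["syllable"] for e in ranked[:tail_size]]
    high_tail = [e["syllable"] for e in ranked[-tail_size:]]
    # Shuffle within each tail so repeated runs can surface different examples.
    rng.shuffle(low_tail)
    rng.shuffle(high_tail)
    return tuple(low_tail[:n_exemplars]), tuple(high_tail[:n_exemplars])


corpus = [
    {"syllable": "ala", "features": {"contains_plosive": False, "starts_with_cluster": False}},
    {"syllable": "kra", "features": {"contains_plosive": True, "starts_with_cluster": True}},
    {"syllable": "elo", "features": {"contains_plosive": False, "starts_with_cluster": False}},
    {"syllable": "tri", "features": {"contains_plosive": True, "starts_with_cluster": True}},
    {"syllable": "ka", "features": {"contains_plosive": True, "starts_with_cluster": False}},
    {"syllable": "sla", "features": {"contains_plosive": False, "starts_with_cluster": True}},
]
low, high = sketch_pole_exemplars(
    corpus,
    {"contains_plosive": 1.0, "starts_with_cluster": 1.0},
    n_exemplars=2,
    tail_size=2,
    rng=random.Random(7),
)
```

On a very small corpus the two tails can overlap when ``tail_size`` exceeds half the corpus, which is why the example passes a small ``tail_size``.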


.. py:function:: compute_terrain_metrics(feature_saturation, weights = None, annotated_data = None, exemplar_rng = None, n_exemplars = 3)

   Compute phonaesthetic terrain metrics from feature saturation.

   Derives three axis scores representing the corpus's position in
   phonaesthetic space. These are descriptive, not prescriptive -
   they characterize the acoustic terrain without imposing meaning.

   :param feature_saturation: Computed feature saturation metrics
   :param weights: Optional TerrainWeights configuration. If None, uses
                   DEFAULT_TERRAIN_WEIGHTS from terrain_weights module.
                   Custom weights allow calibration for different phonaesthetic
                   models or user preferences.
   :param annotated_data: Optional list of {"syllable": str, "features": dict}
                          entries. If provided, pole exemplars will be computed.
   :param exemplar_rng: Optional RNG for shuffling exemplars. Isolated from
                        name generation to maintain determinism.
   :param n_exemplars: Number of exemplars per pole (default 3)

   :returns: TerrainMetrics with scores and labels for all three axes

   .. admonition:: Example

      >>> terrain = compute_terrain_metrics(feature_saturation)
      >>> print(f"Shape: {terrain.shape_score:.2f} ({terrain.shape_label})")
      >>> print(f"Craft: {terrain.craft_score:.2f} ({terrain.craft_label})")

      With custom weights:

      >>> from build_tools.syllable_walk_tui.services.terrain_weights import (
      ...     TerrainWeights, AxisWeights
      ... )
      >>> custom = TerrainWeights(shape=AxisWeights({"contains_plosive": 1.5}))
      >>> terrain = compute_terrain_metrics(feature_saturation, weights=custom)

      With exemplars:

      >>> terrain = compute_terrain_metrics(
      ...     feature_saturation, annotated_data=corpus_data
      ... )
      >>> print(terrain.shape_exemplars.low_pole_exemplars)
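
The ``exemplar_rng`` parameter is described as isolated from name generation. One way such isolation can work, assuming the RNG is a standard-library ``random.Random`` instance (the accepted type is not stated here), is sketched below: shuffling with a dedicated instance leaves the global generator's sequence unchanged.

```python
import random

# Record what the global generator would produce from a known seed.
random.seed(1234)
expected = [random.random() for _ in range(3)]

# Reset the global generator, then do exemplar-style shuffling with a
# dedicated instance before drawing from the global generator again.
random.seed(1234)
exemplar_rng = random.Random(42)  # assumed RNG type; the seed is illustrative
items = ["ka", "tor", "el", "stra"]
exemplar_rng.shuffle(items)  # consumes exemplar_rng state only

observed = [random.random() for _ in range(3)]
```

``observed`` equals ``expected`` because the dedicated instance keeps its own state.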


.. py:class:: CorpusShapeMetrics

   Complete corpus shape metrics combining all categories.

   This is the primary interface for corpus analysis. Contains all raw
   metrics needed to understand corpus structure.

   .. attribute:: inventory

      Inventory metrics (counts, lengths)

   .. attribute:: frequency

      Frequency distribution metrics

   .. attribute:: feature_saturation

      Per-feature saturation metrics

   .. attribute:: terrain

      Phonaesthetic terrain metrics (derived from features)


   .. py:attribute:: inventory
      :type:  InventoryMetrics


   .. py:attribute:: frequency
      :type:  FrequencyMetrics


   .. py:attribute:: feature_saturation
      :type:  FeatureSaturationMetrics


   .. py:attribute:: terrain
      :type:  TerrainMetrics


.. py:function:: compute_corpus_shape_metrics(syllables, frequencies, annotated_data)

   Compute complete corpus shape metrics.

   This is the main entry point for corpus analysis. Computes all metric
   categories and returns a composite result.

   :param syllables: List of unique syllables
   :param frequencies: Dictionary mapping syllable to frequency count
   :param annotated_data: List of annotated syllable dicts

   :returns: CorpusShapeMetrics containing all computed metrics

   :raises ValueError: If any input is empty or malformed

   .. admonition:: Example

      >>> metrics = compute_corpus_shape_metrics(syllables, frequencies, annotated_data)
      >>> print(f"Corpus has {metrics.inventory.total_count} syllables")
      >>> print(f"Hapax legomena: {metrics.frequency.hapax_count}")
      >>> vowel_pct = metrics.feature_saturation.by_name['starts_with_vowel'].true_percentage
      >>> print(f"Starts with vowel: {vowel_pct:.1f}%")
      >>> print(f"Terrain: {metrics.terrain.shape_label}")
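
The three inputs describe the same corpus from different angles. As a rough sketch, assuming each annotated entry carries ``'syllable'``, ``'frequency'``, and ``'features'`` keys as described for ``compute_feature_saturation_metrics``, the other two arguments could be derived from ``annotated_data``. The sample entries are made up, and a real pipeline may load the three inputs separately.

```python
# Illustrative annotated entries; the feature values are invented for the example.
annotated_data = [
    {"syllable": "ka", "frequency": 5, "features": {"starts_with_vowel": False}},
    {"syllable": "el", "frequency": 1, "features": {"starts_with_vowel": True}},
    {"syllable": "tor", "frequency": 3, "features": {"starts_with_vowel": False}},
]

# Unique syllables, in corpus order.
syllables = [entry["syllable"] for entry in annotated_data]

# Syllable -> frequency count.
frequencies = {entry["syllable"]: entry["frequency"] for entry in annotated_data}

# Basic consistency checks before handing the inputs to the metrics functions.
inputs_consistent = (
    len(syllables) == len(set(syllables))
    and set(syllables) == set(frequencies)
)
```

Checking consistency up front helps when a ``ValueError`` about malformed input would otherwise be hard to trace back to its source.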