build_tools.name_combiner.aggregator
====================================

.. py:module:: build_tools.name_combiner.aggregator

.. autoapi-nested-parse::

   Feature aggregation for name-level evaluation.

   This module implements the rules for aggregating syllable-level features
   into name-level features. The aggregation produces a boolean feature vector
   for each name candidate, enabling policy evaluation by the name_selector.

   Aggregation Rules
   -----------------
   **Onset Features** (first syllable only):
       - starts_with_vowel
       - starts_with_cluster
       - starts_with_heavy_cluster

       These features describe how a name begins. Only the first syllable's
       onset features are relevant - internal syllable onsets don't affect
       how the name "enters" the listener's ear.

   **Coda Features** (final syllable only):
       - ends_with_vowel
       - ends_with_nasal
       - ends_with_stop

       These features describe how a name ends. Only the final syllable's
       coda features are relevant - internal syllable codas don't affect
       how the name "lands" or closes.

   **Internal Features** (OR across all syllables):
       - contains_plosive
       - contains_fricative
       - contains_liquid
       - contains_nasal

       These features describe the texture of a name. If ANY syllable contains
       the feature, the name has it. A name like "kalira" has contains_liquid=True
       because "li" and "ra" each contain a liquid, even though "ka" does not.

   **Nucleus Features** (majority rule):
       - short_vowel
       - long_vowel

       These features describe the dominant vowel character of a name.
       We use majority rule (>50% of syllables) to determine the name-level
       value. See "Why Majority Rule for Nucleus Features" below for the
       rationale.

   Why Majority Rule for Nucleus Features
   --------------------------------------
   We use majority (>50% of syllables) rather than proportional scoring, for
   five reasons:

   1. **Preserves Architectural Consistency**: The entire feature registry is
      built on boolean features. The policy matrix uses checkmark/tilde/cross
      symbols that map cleanly to boolean logic. Introducing fractional
      features would break this elegant simplicity.

   2. **Keeps the Implementation Simple**: Majority rule means the name-level
      feature vector remains a simple boolean array, identical in structure
      to syllable-level vectors. No special cases, no type conversions.

   3. **Sufficient for Initial Policy Evaluation**: For a first implementation,
      knowing "this name is mostly short-vowel" vs. "this name is mostly
      long-vowel" is enough information to make good selection decisions.
      Precise ratios are not needed yet.

   4. **Easier to Debug and Explain**: When a name gets rejected, you can say
      "this name has short_vowel=True (2 of 3 syllables), which is discouraged
      for Place Names." That's clear and inspectable. With proportional
      scoring, the same explanation would also have to cite a ratio and the
      threshold it was compared against.

   5. **Aligns with Project Philosophy**: The system is about shape and
      suitability, not precise optimization. Majority rule captures the
      dominant character of a name, which is what matters for admissibility.

   Future Consideration
   --------------------
   If finer-grained nucleus control is needed, proportional scoring could be
   introduced as an optional mode. This would require extending the policy
   matrix to handle float thresholds (e.g., short_vowel > 0.6). For now,
   majority rule provides the right balance of simplicity and expressiveness.

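   Usage
   -----
   The feature dictionaries below are abbreviated with ``...``; a real
   syllable dictionary carries all 12 features (see ``aggregate_features``
   for a complete example).

   >>> from build_tools.name_combiner.aggregator import aggregate_features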
   >>> syllables = [
   ...     {"syllable": "ka", "features": {"starts_with_vowel": False, ...}},
   ...     {"syllable": "li", "features": {"contains_liquid": True, ...}},
   ... ]
   >>> name_features = aggregate_features(syllables)
   >>> name_features["starts_with_vowel"]  # From first syllable
   False
   >>> name_features["contains_liquid"]  # OR across all
   True
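
To make the difference between majority rule and the proportional alternative concrete, here is a minimal sketch. The helper names ``majority`` and ``proportion`` are hypothetical and are not part of this module; the sketch assumes the ">50%" threshold is applied strictly, as the text above states.

```python
from typing import Sequence


def majority(flags: Sequence[bool]) -> bool:
    """Return True only when strictly more than half of the flags are True."""
    return sum(flags) * 2 > len(flags)


def proportion(flags: Sequence[bool]) -> float:
    """Return the fraction of True flags (the proportional alternative)."""
    return sum(flags) / len(flags)


# short_vowel values for the syllables of a hypothetical three-syllable name
short_vowel_flags = [True, True, False]
is_mostly_short = majority(short_vowel_flags)  # boolean, fits the registry
short_ratio = proportion(short_vowel_flags)    # float, would need thresholds

# With an even split, a strict ">50%" test leaves the feature False.
tie_result = majority([True, False])
```

Here ``is_mostly_short`` is ``True`` (2 of 3 syllables), while ``tie_result`` is ``False`` because one of two syllables is exactly 50% and does not exceed the threshold. Under this reading, a two-syllable name with one short and one long vowel would have neither nucleus feature set.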


Attributes
----------

.. autoapisummary::

   build_tools.name_combiner.aggregator.ONSET_FEATURES
   build_tools.name_combiner.aggregator.CODA_FEATURES
   build_tools.name_combiner.aggregator.INTERNAL_FEATURES
   build_tools.name_combiner.aggregator.NUCLEUS_FEATURES
   build_tools.name_combiner.aggregator.ALL_FEATURES


Functions
---------

.. autoapisummary::

   build_tools.name_combiner.aggregator.aggregate_features


Module Contents
---------------

.. py:data:: ONSET_FEATURES

.. py:data:: CODA_FEATURES

.. py:data:: INTERNAL_FEATURES

.. py:data:: NUCLEUS_FEATURES

.. py:data:: ALL_FEATURES
   :value: ('starts_with_vowel', 'starts_with_cluster', 'starts_with_heavy_cluster', 'contains_plosive',...


.. py:function:: aggregate_features(syllables)

   Aggregate syllable-level features into a name-level feature vector.

   Takes a sequence of syllable dictionaries (each with a "features" key)
   and produces a single boolean feature vector for the combined name.

   Parameters
   ----------
   syllables : Sequence[dict]
       Sequence of syllable dictionaries, each containing:

       - "syllable": str - The syllable text
       - "features": dict[str, bool] - The 12 boolean features

   Returns
   -------
   dict[str, bool]
       Name-level feature vector with all 12 features as booleans.

   Raises
   ------
   ValueError
       If the syllables sequence is empty or a syllable dictionary is
       missing a required key.

   Examples
   --------
   >>> syllables = [
   ...     {"syllable": "ka", "features": {
   ...         "starts_with_vowel": False,
   ...         "starts_with_cluster": False,
   ...         "starts_with_heavy_cluster": False,
   ...         "contains_plosive": True,
   ...         "contains_fricative": False,
   ...         "contains_liquid": False,
   ...         "contains_nasal": False,
   ...         "short_vowel": True,
   ...         "long_vowel": False,
   ...         "ends_with_vowel": True,
   ...         "ends_with_nasal": False,
   ...         "ends_with_stop": False,
   ...     }},
   ...     {"syllable": "li", "features": {
   ...         "starts_with_vowel": False,
   ...         "starts_with_cluster": False,
   ...         "starts_with_heavy_cluster": False,
   ...         "contains_plosive": False,
   ...         "contains_fricative": False,
   ...         "contains_liquid": True,
   ...         "contains_nasal": False,
   ...         "short_vowel": True,
   ...         "long_vowel": False,
   ...         "ends_with_vowel": True,
   ...         "ends_with_nasal": False,
   ...         "ends_with_stop": False,
   ...     }},
   ... ]
   >>> features = aggregate_features(syllables)
   >>> features["starts_with_vowel"]  # From first syllable ("ka")
   False
   >>> features["ends_with_vowel"]  # From final syllable ("li")
   True
   >>> features["contains_liquid"]  # OR: True because "li" has it
   True
   >>> features["short_vowel"]  # Majority: 2/2 = 100% > 50%
   True

   Notes
   -----
   Aggregation follows these rules:

   - **Onset** (starts_with_*): First syllable only
   - **Coda** (ends_with_*): Final syllable only
   - **Internal** (contains_*): OR across all syllables
   - **Nucleus** (short_vowel, long_vowel): Majority rule (>50%)

   See module docstring for detailed rationale on majority rule.
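
The four rules in the notes above can be sketched as a small standalone function. This is an illustrative re-implementation under stated assumptions, not the module's actual code: ``aggregate_sketch``, ``make``, and the local feature tuples are hypothetical names, and the validation shown is only one plausible reading of "missing required keys".

```python
from typing import Mapping, Sequence

# Feature groups as listed in the module description. These tuples are local
# stand-ins for illustration, not the module's real constants.
ONSET = ("starts_with_vowel", "starts_with_cluster", "starts_with_heavy_cluster")
CODA = ("ends_with_vowel", "ends_with_nasal", "ends_with_stop")
INTERNAL = ("contains_plosive", "contains_fricative", "contains_liquid", "contains_nasal")
NUCLEUS = ("short_vowel", "long_vowel")
SKETCH_FEATURES = ONSET + CODA + INTERNAL + NUCLEUS


def aggregate_sketch(syllables: Sequence[Mapping]) -> dict:
    """Hypothetical re-implementation of the four aggregation rules."""
    if not syllables:
        raise ValueError("syllables must not be empty")
    for syl in syllables:
        # Assumption: "required keys" means "features" plus all 12 feature names.
        if "features" not in syl or any(f not in syl["features"] for f in SKETCH_FEATURES):
            raise ValueError("syllable is missing required keys")

    vectors = [syl["features"] for syl in syllables]
    first, last = vectors[0], vectors[-1]
    result = {}
    for name in ONSET:      # first syllable only
        result[name] = bool(first[name])
    for name in CODA:       # final syllable only
        result[name] = bool(last[name])
    for name in INTERNAL:   # OR across all syllables
        result[name] = any(v[name] for v in vectors)
    for name in NUCLEUS:    # strict majority (>50% of syllables)
        result[name] = sum(bool(v[name]) for v in vectors) * 2 > len(vectors)
    return result


def make(**true_features: bool) -> dict:
    """Build a full 12-feature dict, defaulting every feature to False."""
    base = dict.fromkeys(SKETCH_FEATURES, False)
    base.update(true_features)
    return base


ka = {"syllable": "ka", "features": make(contains_plosive=True, short_vowel=True, ends_with_vowel=True)}
li = {"syllable": "li", "features": make(contains_liquid=True, short_vowel=True, ends_with_vowel=True)}
name_features = aggregate_sketch([ka, li])
```

For the "ka" + "li" input from the example above, ``name_features`` has ``starts_with_vowel`` false (from "ka"), ``ends_with_vowel`` true (from "li"), ``contains_liquid`` and ``contains_plosive`` both true (OR), and ``short_vowel`` true (2 of 2 syllables).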