build_tools.name_combiner.combiner
==================================

.. py:module:: build_tools.name_combiner.combiner

.. autoapi-nested-parse::

   Core combination logic for name candidate generation.

   This module provides the main combination functionality that takes an
   annotated syllable corpus and produces N-syllable name candidates with
   aggregated feature vectors.

   The combiner is intentionally simple: it performs structural combination
   only and does not evaluate any policy. Policy-based filtering is the
   responsibility of the ``name_selector`` module.

   Combination Strategy
   --------------------
   The default combination strategy uses frequency-weighted random sampling:

   1. Load annotated syllables with their frequencies
   2. Build a weighted probability distribution (higher frequency = more likely)
   3. Sample N syllables using the isolated RNG instance
   4. Concatenate syllables to form a name
   5. Aggregate features using the rules in aggregator.py

   This produces candidates that reflect the natural distribution of the corpus
   while maintaining full determinism through seed control.
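
   Steps 1 to 4 above can be sketched with the standard library alone. This is
   a minimal illustration, not the module's actual code: the helper name
   ``sample_name`` is hypothetical, and sampling *with* replacement (via
   ``random.Random.choices``) is an assumption, since the text does not say
   whether a syllable may repeat within a name. Feature aggregation (step 5)
   is omitted because its rules live in aggregator.py.

   .. code-block:: python

      import random


      def sample_name(corpus, syllable_count, rng):
          """Draw syllables weighted by frequency and join them (hypothetical helper)."""
          syllables = [entry["syllable"] for entry in corpus]
          weights = [entry["frequency"] for entry in corpus]
          # choices() samples with replacement, proportionally to the weights.
          picked = rng.choices(syllables, weights=weights, k=syllable_count)
          return {"name": "".join(picked), "syllables": picked}


      corpus = [
          {"syllable": "ka", "frequency": 100},
          {"syllable": "li", "frequency": 50},
          {"syllable": "ra", "frequency": 75},
      ]

      rng = random.Random(42)  # isolated RNG; the global random module is untouched
      candidate = sample_name(corpus, syllable_count=2, rng=rng)

   Passing the RNG in as an argument keeps the helper free of hidden state, so
   a caller controls reproducibility entirely through the seed.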

   Determinism
   -----------
   **Critical**: Every combination run creates an isolated RNG instance with
   ``random.Random(seed)``. This ensures:

   - The same seed and inputs always produce identical candidates
   - No contamination of, or dependence on, the global ``random`` state
   - Reproducible builds across sessions
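
   The isolation property can be checked directly with the standard library.
   The sketch below is illustrative only; the helper ``draw`` is hypothetical
   and stands in for any code that samples from a seeded instance.

   .. code-block:: python

      import random


      def draw(seed, k=5):
          """Return k integers from a private RNG seeded with `seed`."""
          rng = random.Random(seed)  # isolated instance, independent of random.seed()
          return [rng.randrange(1000) for _ in range(k)]


      first = draw(42)

      # Disturb the global RNG between the two calls.
      random.seed(0)
      random.random()

      second = draw(42)
      same = first == second  # the global state had no effect

      # Conversely, using the private RNG leaves the global state unchanged.
      state_before = random.getstate()
      draw(7)
      global_untouched = random.getstate() == state_before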

   Usage
   -----
   >>> from build_tools.name_combiner.combiner import combine_syllables
   >>> candidates = combine_syllables(
   ...     annotated_data=corpus,
   ...     syllable_count=2,
   ...     count=100,
   ...     seed=42,
   ... )
   >>> for c in candidates[:3]:
   ...     print(f"{c['name']}: score-ready features")


Functions
---------

.. autoapisummary::

   build_tools.name_combiner.combiner.combine_syllables


Module Contents
---------------

.. py:function:: combine_syllables(annotated_data, syllable_count, count, seed = None, frequency_weight = 1.0)

   Generate name candidates by combining syllables from an annotated corpus.

   Takes an annotated syllable corpus and produces N-syllable name candidates
   with aggregated feature vectors suitable for policy evaluation.

   Parameters
   ----------
   annotated_data : Sequence[dict]
       Sequence of annotated syllable dictionaries, each containing:

       - ``"syllable"``: str - The syllable text
       - ``"frequency"``: int - Occurrence count in the source corpus
       - ``"features"``: dict[str, bool] - The 12 boolean features

   syllable_count : int
       Number of syllables per generated name (typically 2, 3, or 4).

   count : int
       Number of candidates to generate.

   seed : int | None, optional
       RNG seed for deterministic output. If None, uses system entropy.
       Default: None.

   frequency_weight : float, optional
       Weight for frequency-biased sampling. 0.0 = uniform sampling,
       1.0 = fully frequency-weighted. Values between 0 and 1 interpolate.
       Default: 1.0.

   Returns
   -------
   list[dict]
       List of candidate dictionaries, each containing:

       - ``"name"``: str - The combined name (concatenated syllables)
       - ``"syllables"``: list[str] - The constituent syllables
       - ``"features"``: dict[str, bool] - Aggregated name-level features

   Raises
   ------
   ValueError
       If annotated_data is empty or syllable_count < 1.

   Examples
   --------
   >>> corpus = [
   ...     {"syllable": "ka", "frequency": 100, "features": {...}},
   ...     {"syllable": "li", "frequency": 50, "features": {...}},
   ...     {"syllable": "ra", "frequency": 75, "features": {...}},
   ... ]
   >>> candidates = combine_syllables(corpus, syllable_count=2, count=5, seed=42)
   >>> len(candidates)
   5
   >>> candidates[0]["name"]  # Deterministic with seed=42 (value shown is illustrative)
   'kali'
   >>> candidates[0]["syllables"]
   ['ka', 'li']

   Notes
   -----
   **Determinism**: Uses ``random.Random(seed)`` for an isolated RNG. The same
   seed and inputs always produce identical output.

   **Frequency Weighting**: Higher frequency syllables are more likely to
   be sampled. This reflects the natural distribution of the source corpus
   and tends to produce more "natural-sounding" combinations.
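
   One way the ``frequency_weight`` interpolation could work is a linear blend
   between a uniform distribution and the normalized frequencies. This is an
   assumption made for illustration: the documentation states only that values
   between 0 and 1 interpolate, not which formula is used, and the helper name
   ``blended_weights`` is hypothetical.

   .. code-block:: python

      def blended_weights(frequencies, frequency_weight):
          """Blend uniform and frequency-proportional weights linearly (assumed formula)."""
          n = len(frequencies)
          total = sum(frequencies)
          return [
              # (1 - w) share of the uniform weight plus w share of the frequency weight
              (1.0 - frequency_weight) / n + frequency_weight * f / total
              for f in frequencies
          ]


      freqs = [100, 50, 75]
      uniform = blended_weights(freqs, 0.0)   # every syllable equally likely
      weighted = blended_weights(freqs, 1.0)  # proportional to frequency
      halfway = blended_weights(freqs, 0.5)   # midway between the two

   Under this formula the weights always sum to 1, so they can be passed
   directly as the ``weights`` argument of ``random.Random.choices``.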

   **No Policy Evaluation**: This function performs structural combination
   only. Policy-based filtering is done by the name_selector module.