build_tools.name_combiner.combiner
Core combination logic for name candidate generation.
This module provides the main combination functionality that takes an annotated syllable corpus and produces N-syllable name candidates with aggregated feature vectors.
The combiner is intentionally simple: it performs structural combination only, with no policy evaluation. Policy-based filtering is the responsibility of the name_selector module.
Combination Strategy
The default combination strategy uses frequency-weighted random sampling:
1. Load annotated syllables with their frequencies
2. Build a weighted probability distribution (higher frequency = more likely)
3. Sample N syllables using the isolated RNG instance
4. Concatenate syllables to form a name
5. Aggregate features using the rules in aggregator.py
This produces candidates that reflect the natural distribution of the corpus while maintaining full determinism through seed control.
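The sampling and concatenation steps above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual implementation: the helper name sketch_combine is hypothetical, sampling with replacement is an assumption, and feature aggregation is left out because its rules live in aggregator.py and are not described here.

```python
import random


def sketch_combine(annotated, syllable_count, count, seed=None):
    """Hypothetical helper: frequency-weighted sampling with an isolated RNG."""
    rng = random.Random(seed)  # isolated instance; the global RNG is not touched
    syllables = [entry["syllable"] for entry in annotated]
    weights = [entry["frequency"] for entry in annotated]

    candidates = []
    for _ in range(count):
        # Assumption: sampling is with replacement, so a syllable may repeat.
        picked = rng.choices(syllables, weights=weights, k=syllable_count)
        candidates.append({"name": "".join(picked), "syllables": picked})
    return candidates


corpus = [
    {"syllable": "ka", "frequency": 100},
    {"syllable": "li", "frequency": 50},
    {"syllable": "ra", "frequency": 75},
]

first = sketch_combine(corpus, syllable_count=2, count=5, seed=42)
second = sketch_combine(corpus, syllable_count=2, count=5, seed=42)
print(first == second)  # the same seed yields the same candidates
```

Because the weights come straight from the frequency counts, frequent syllables dominate the output, and reusing the seed reproduces the same list.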
Determinism
Critical: All combination uses random.Random(seed) to create isolated RNG instances. This ensures:
- Same seed always produces identical candidates
- No global state contamination
- Reproducible builds across sessions
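The isolation property can be checked directly with the standard random module. The short sketch below is independent of the combiner itself. It shows that two random.Random instances created with the same seed produce the same draws, and that using them leaves the module-level RNG state unchanged.

```python
import random

# Put the global RNG in a known state and remember it.
random.seed(123)
state_before = random.getstate()

# Two isolated instances with the same seed.
rng_a = random.Random(42)
rng_b = random.Random(42)
draws_a = [rng_a.random() for _ in range(5)]
draws_b = [rng_b.random() for _ in range(5)]

same_draws = draws_a == draws_b
global_untouched = random.getstate() == state_before
print(same_draws, global_untouched)
```

Calling random.seed() or random.choices() at module level would instead mutate state shared with every other caller, which is what the isolated instance avoids.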
Usage
>>> from build_tools.name_combiner.combiner import combine_syllables
>>> candidates = combine_syllables(
... annotated_data=corpus,
... syllable_count=2,
... count=100,
... seed=42,
... )
>>> for c in candidates[:3]:
... print(f"{c['name']}: score-ready features")
Functions
combine_syllables: Generate name candidates by combining syllables from an annotated corpus.
Module Contents
- build_tools.name_combiner.combiner.combine_syllables(annotated_data, syllable_count, count, seed=None, frequency_weight=1.0)
Generate name candidates by combining syllables from an annotated corpus.
Takes an annotated syllable corpus and produces N-syllable name candidates with aggregated feature vectors suitable for policy evaluation.
Parameters
- annotated_data : Sequence[dict]
List of annotated syllable dictionaries, each containing:
  - "syllable": str - The syllable text
  - "frequency": int - Occurrence count in source corpus
  - "features": dict[str, bool] - The 12 boolean features
- syllable_count : int
Number of syllables per generated name (typically 2, 3, or 4).
- count : int
Number of candidates to generate.
- seed : int | None, optional
RNG seed for deterministic output. If None, uses system entropy. Default: None.
- frequency_weight : float, optional
Weight for frequency-biased sampling. 0.0 = uniform sampling, 1.0 = fully frequency-weighted. Values between 0.0 and 1.0 interpolate between the two. Default: 1.0.
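The exact interpolation used for frequency_weight is not spelled out here. One plausible reading, shown below as an assumption rather than the module's actual formula, is a linear blend between the uniform distribution and the normalized frequency distribution. The helper name blended_weights is hypothetical.

```python
import random


def blended_weights(frequencies, frequency_weight=1.0):
    """Hypothetical helper: blend uniform and frequency-based probabilities.

    Assumes a non-empty list with a positive total frequency.
    """
    if not 0.0 <= frequency_weight <= 1.0:
        raise ValueError("frequency_weight must be between 0.0 and 1.0")
    uniform = 1.0 / len(frequencies)
    total = sum(frequencies)
    return [
        (1.0 - frequency_weight) * uniform + frequency_weight * (freq / total)
        for freq in frequencies
    ]


frequencies = [100, 50, 75]
uniform_case = blended_weights(frequencies, 0.0)   # every syllable equally likely
weighted_case = blended_weights(frequencies, 1.0)  # proportional to frequency
halfway_case = blended_weights(frequencies, 0.5)   # midpoint of the two

# The blended weights plug directly into an isolated RNG.
rng = random.Random(42)
sample = rng.choices(["ka", "li", "ra"], weights=halfway_case, k=2)
```

Under this assumption the weights always sum to 1, so intermediate values of frequency_weight flatten the distribution without changing which syllables are eligible.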
Returns
- list[dict]
List of candidate dictionaries, each containing:
  - "name": str - The combined name (concatenated syllables)
  - "syllables": list[str] - The constituent syllables
  - "features": dict[str, bool] - Aggregated name-level features
Raises
- ValueError
If annotated_data is empty or syllable_count < 1.
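The two documented error conditions amount to a pair of guard clauses. The sketch below uses a hypothetical validate_inputs helper to illustrate them. The error messages are invented for the example, and the real function may perform these checks differently.

```python
def validate_inputs(annotated_data, syllable_count):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if not annotated_data:
        raise ValueError("annotated_data must not be empty")
    if syllable_count < 1:
        raise ValueError("syllable_count must be at least 1")


def raises_value_error(annotated_data, syllable_count):
    """Return True if validation rejects the given arguments."""
    try:
        validate_inputs(annotated_data, syllable_count)
    except ValueError:
        return True
    return False


sample_corpus = [{"syllable": "ka", "frequency": 100, "features": {}}]
empty_rejected = raises_value_error([], 2)
zero_rejected = raises_value_error(sample_corpus, 0)
valid_accepted = not raises_value_error(sample_corpus, 2)
```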
Examples
>>> corpus = [
...     {"syllable": "ka", "frequency": 100, "features": {...}},
...     {"syllable": "li", "frequency": 50, "features": {...}},
...     {"syllable": "ra", "frequency": 75, "features": {...}},
... ]
>>> candidates = combine_syllables(corpus, syllable_count=2, count=5, seed=42)
>>> len(candidates)
5
>>> candidates[0]["name"]  # Deterministic with seed=42
'kali'  # Example output
>>> candidates[0]["syllables"]
['ka', 'li']
Notes
Determinism: Uses random.Random(seed) for isolated RNG. Same seed always produces identical output.
Frequency Weighting: Higher frequency syllables are more likely to be sampled. This reflects the natural distribution of the source corpus and tends to produce more “natural-sounding” combinations.
No Policy Evaluation: This function performs structural combination only. Policy-based filtering is done by the name_selector module.