build_tools.name_combiner.aggregator
Feature aggregation for name-level evaluation.
This module implements the rules for aggregating syllable-level features into name-level features. The aggregation produces a boolean feature vector for each name candidate, enabling policy evaluation by the name_selector.
Aggregation Rules
- Onset Features (first syllable only):
starts_with_vowel
starts_with_cluster
starts_with_heavy_cluster
These features describe how a name begins. Only the first syllable’s onset features are relevant - internal syllable onsets don’t affect how the name “enters” the listener’s ear.
- Coda Features (final syllable only):
ends_with_vowel
ends_with_nasal
ends_with_stop
These features describe how a name ends. Only the final syllable’s coda features are relevant - internal syllable codas don’t affect how the name “lands” or closes.
- Internal Features (OR across all syllables):
contains_plosive
contains_fricative
contains_liquid
contains_nasal
These features describe the texture of a name. If ANY syllable contains the feature, the name has it. A name like “kalira” has contains_liquid=True because “li” has a liquid, even though “ka” does not.
- Nucleus Features (majority rule):
short_vowel
long_vowel
These features describe the dominant vowel character of a name. We use majority rule (>50% of syllables) to determine the name-level value; because the threshold is strict, an exact 50/50 split yields False. See “Why Majority Rule for Nucleus Features” below for the detailed rationale.
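The four rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the local tuples `ONSET`, `CODA`, `INTERNAL`, and `NUCLEUS`, the helper `make_syllable`, and the function `aggregate_sketch` are hypothetical names, and the feature values assigned to each syllable are invented for the example.

```python
from typing import Any, Mapping, Sequence

# Feature groups copied from the lists above. These local tuples are
# assumptions for illustration; the module's own constants may differ.
ONSET = ("starts_with_vowel", "starts_with_cluster", "starts_with_heavy_cluster")
CODA = ("ends_with_vowel", "ends_with_nasal", "ends_with_stop")
INTERNAL = ("contains_plosive", "contains_fricative", "contains_liquid", "contains_nasal")
NUCLEUS = ("short_vowel", "long_vowel")


def make_syllable(text: str, *true_features: str) -> dict[str, Any]:
    """Hypothetical helper: build a syllable dict with the named features set to True."""
    features = {name: False for name in ONSET + CODA + INTERNAL + NUCLEUS}
    for name in true_features:
        features[name] = True
    return {"syllable": text, "features": features}


def aggregate_sketch(syllables: Sequence[Mapping[str, Any]]) -> dict[str, bool]:
    """Illustrative sketch of the four rules; not the module's actual code."""
    if not syllables:
        raise ValueError("syllables must not be empty")
    per_syllable = [s["features"] for s in syllables]
    first, last = per_syllable[0], per_syllable[-1]
    result: dict[str, bool] = {}
    for name in ONSET:  # first syllable only
        result[name] = bool(first[name])
    for name in CODA:  # final syllable only
        result[name] = bool(last[name])
    for name in INTERNAL:  # OR across all syllables
        result[name] = any(f[name] for f in per_syllable)
    for name in NUCLEUS:  # strict majority: more than half of the syllables
        count = sum(1 for f in per_syllable if f[name])
        result[name] = 2 * count > len(per_syllable)
    return result


# Illustrative feature values for "ka", "li", "ra".
name = [
    make_syllable("ka", "contains_plosive", "short_vowel", "ends_with_vowel"),
    make_syllable("li", "contains_liquid", "short_vowel", "ends_with_vowel"),
    make_syllable("ra", "contains_liquid", "long_vowel", "ends_with_vowel"),
]
features = aggregate_sketch(name)
```

With these illustrative inputs, `short_vowel` holds for two of three syllables and so aggregates to True, while `long_vowel` holds for only one and aggregates to False.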
Why Majority Rule for Nucleus Features
We use majority rule (>50% of syllables) rather than proportional scoring, for the following reasons:
Preserves Architectural Consistency: The entire feature registry is built on boolean features. The policy matrix uses checkmark/tilde/cross symbols that map cleanly to boolean logic. Introducing fractional features would break this elegant simplicity.
Keeps the Implementation Simple: Majority rule means the name-level feature vector remains a simple boolean array, identical in structure to syllable-level vectors. No special cases, no type conversions.
Sufficient for Initial Policy Evaluation: For a first implementation, knowing “this name is mostly short-vowel” vs. “this name is mostly long-vowel” is enough information to make good selection decisions. Precise ratios are not needed yet.
Easier to Debug and Explain: When a name gets rejected, you can say “this name has short_vowel=true (2 of 3 syllables), which is discouraged for Place Names.” That’s clear and inspectable. Proportional scoring makes debugging harder.
Aligns with Project Philosophy: The system is about shape and suitability, not precise optimization. Majority rule captures the dominant character of a name, which is what matters for admissibility.
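To illustrate the debugging point above, here is a small sketch of how a majority-rule decision could be reported in the “2 of 3 syllables” form. The helper `explain_majority` is hypothetical and not part of the module, and the syllable data is invented for the example.

```python
from typing import Any, Mapping, Sequence


def explain_majority(syllables: Sequence[Mapping[str, Any]], feature: str) -> str:
    """Hypothetical helper: describe a majority-rule decision in plain text."""
    total = len(syllables)
    count = sum(1 for s in syllables if s["features"][feature])
    value = 2 * count > total  # strictly more than half
    return f"{feature}={value} ({count} of {total} syllables)"


# Invented example data: two short-vowel syllables and one that is not.
syllables = [
    {"syllable": "ka", "features": {"short_vowel": True}},
    {"syllable": "lee", "features": {"short_vowel": False}},
    {"syllable": "ra", "features": {"short_vowel": True}},
]
message = explain_majority(syllables, "short_vowel")
print(message)  # short_vowel=True (2 of 3 syllables)
```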
Future Consideration
If finer-grained nucleus control is needed, proportional scoring could be introduced as an optional mode. This would require extending the policy matrix to handle float thresholds (e.g., short_vowel > 0.6). For now, majority rule provides the right balance of simplicity and expressiveness.
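As a rough sketch of what such an optional mode might look like, the snippet below computes the fraction of syllables carrying a nucleus feature and compares it to a float threshold. Nothing here exists in the module: `nucleus_proportion` and `exceeds_threshold` are hypothetical names, and the 0.6 threshold is taken only from the example above.

```python
from typing import Any, Mapping, Sequence


def nucleus_proportion(syllables: Sequence[Mapping[str, Any]], feature: str) -> float:
    """Hypothetical: fraction of syllables for which `feature` is True."""
    if not syllables:
        raise ValueError("syllables must not be empty")
    count = sum(1 for s in syllables if s["features"][feature])
    return count / len(syllables)


def exceeds_threshold(
    syllables: Sequence[Mapping[str, Any]], feature: str, threshold: float
) -> bool:
    """Hypothetical policy check of the form `feature > threshold`."""
    return nucleus_proportion(syllables, feature) > threshold


# Invented example data: short_vowel holds for 2 of 3 syllables.
syllables = [
    {"syllable": "ka", "features": {"short_vowel": True}},
    {"syllable": "lee", "features": {"short_vowel": False}},
    {"syllable": "ra", "features": {"short_vowel": True}},
]
passes_three = exceeds_threshold(syllables, "short_vowel", 0.6)      # 2/3 > 0.6
passes_two = exceeds_threshold(syllables[:2], "short_vowel", 0.6)    # 1/2 > 0.6
```

Here the three-syllable name passes the 0.6 threshold and the two-syllable prefix does not, a distinction that majority rule alone would express only as True versus False at the 50% line.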
Usage
>>> from build_tools.name_combiner.aggregator import aggregate_features
>>> syllables = [
... {"syllable": "ka", "features": {"starts_with_vowel": False, ...}},
... {"syllable": "li", "features": {"contains_liquid": True, ...}},
... ]
>>> name_features = aggregate_features(syllables)
>>> name_features["starts_with_vowel"] # From first syllable
False
>>> name_features["contains_liquid"] # OR across all
True
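Attributes
- ONSET_FEATURES
- CODA_FEATURES
- INTERNAL_FEATURES
- NUCLEUS_FEATURES
- ALL_FEATURES
Functions
- aggregate_features(syllables): Aggregate syllable-level features into a name-level feature vector.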
Module Contents
- build_tools.name_combiner.aggregator.ONSET_FEATURES
- build_tools.name_combiner.aggregator.CODA_FEATURES
- build_tools.name_combiner.aggregator.INTERNAL_FEATURES
- build_tools.name_combiner.aggregator.NUCLEUS_FEATURES
- build_tools.name_combiner.aggregator.ALL_FEATURES = ('starts_with_vowel', 'starts_with_cluster', 'starts_with_heavy_cluster', 'contains_plosive',...
- build_tools.name_combiner.aggregator.aggregate_features(syllables)[source]
Aggregate syllable-level features into a name-level feature vector.
Takes a sequence of syllable dictionaries (each with a “features” key) and produces a single boolean feature vector for the combined name.
Parameters
- syllables : Sequence[dict]
List of syllable dictionaries, each containing:
  - “syllable”: str - The syllable text
  - “features”: dict[str, bool] - The 12 boolean features
Returns
- dict[str, bool]
Name-level feature vector with all 12 features as booleans.
Raises
- ValueError
If the syllables list is empty or a syllable is missing required keys.
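The checks described under Raises might look roughly like the sketch below. It is an assumption about the validation order and error messages, not the function's actual code; `validate_syllables` is a hypothetical name.

```python
from typing import Any, Mapping, Sequence

REQUIRED_KEYS = ("syllable", "features")  # keys named in the Parameters section


def validate_syllables(syllables: Sequence[Mapping[str, Any]]) -> None:
    """Hypothetical validation: raise ValueError on empty input or missing keys."""
    if not syllables:
        raise ValueError("syllables must contain at least one syllable")
    for index, syllable in enumerate(syllables):
        for key in REQUIRED_KEYS:
            if key not in syllable:
                raise ValueError(
                    f"syllable at index {index} is missing required key {key!r}"
                )
```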
Examples
>>> syllables = [
...     {"syllable": "ka", "features": {
...         "starts_with_vowel": False,
...         "starts_with_cluster": False,
...         "starts_with_heavy_cluster": False,
...         "contains_plosive": True,
...         "contains_fricative": False,
...         "contains_liquid": False,
...         "contains_nasal": False,
...         "short_vowel": True,
...         "long_vowel": False,
...         "ends_with_vowel": True,
...         "ends_with_nasal": False,
...         "ends_with_stop": False,
...     }},
...     {"syllable": "li", "features": {
...         "starts_with_vowel": False,
...         "starts_with_cluster": False,
...         "starts_with_heavy_cluster": False,
...         "contains_plosive": False,
...         "contains_fricative": False,
...         "contains_liquid": True,
...         "contains_nasal": False,
...         "short_vowel": True,
...         "long_vowel": False,
...         "ends_with_vowel": True,
...         "ends_with_nasal": False,
...         "ends_with_stop": False,
...     }},
... ]
>>> features = aggregate_features(syllables)
>>> features["starts_with_vowel"]  # From first syllable ("ka")
False
>>> features["ends_with_vowel"]  # From final syllable ("li")
True
>>> features["contains_liquid"]  # OR: True because "li" has it
True
>>> features["short_vowel"]  # Majority: 2/2 = 100% > 50%
True
Notes
Aggregation follows these rules:
Onset (starts_with_*): First syllable only
Coda (ends_with_*): Final syllable only
Internal (contains_*): OR across all syllables
Nucleus (short_vowel, long_vowel): Majority rule (>50%)
See module docstring for detailed rationale on majority rule.