Name Selector
Overview
Name Selector - Policy-Based Name Filtering and Ranking
Evaluates name candidates against name class policies to produce ranked, admissible name lists. This is a build-time tool only - not used during runtime name generation.
This module is the second stage of the Selection Policy Layer. It performs policy evaluation on candidates produced by the name_combiner module.
- Architectural Boundary:
The selector is the governance layer. All admissibility decisions, scoring, and rejection logic live here. The combiner upstream is purely structural.
Features:
- Load name class policies from YAML configuration
- Evaluate candidates against 12-feature policies
- Hard mode (reject on discouraged) or soft mode (negative score)
- Ranked output by score
- Detailed evaluation metadata for debugging
Policy Logic:
- Preferred feature present: +1 score
- Tolerated feature present: 0 score
- Discouraged feature present: Reject (hard) or -10 (soft)
- Usage:
>>> from build_tools.name_selector import select_names, load_name_classes
>>> policies = load_name_classes("data/name_classes.yml")
>>> selected = select_names(candidates, policies["first_name"], count=100)
>>> for name in selected[:5]:
...     print(f"{name['name']}: score={name['score']}")
CLI:
python -m build_tools.name_selector \
--run-dir _working/output/20260110_115453_pyphen/ \
--candidates candidates/pyphen_candidates_2syl.json \
--name-class first_name \
--count 100
Command-Line Interface
Filter and rank name candidates against a name class policy. Evaluates candidates using the 12-feature policy matrix and produces ranked, admissible name lists. This is a build-time tool for the Selection Policy Layer.
usage: python -m build_tools.name_selector [-h] --run-dir RUN_DIR --candidates
CANDIDATES --name-class NAME_CLASS
[--policy-file POLICY_FILE]
[--count COUNT]
[--mode {hard,soft}]
Named Arguments
- --run-dir
Path to extraction run directory. Example: _working/output/20260110_115453_pyphen/
- --candidates
Path to candidates JSON file, relative to run-dir. If the wrong prefix is specified (e.g., nltk_ for a pyphen run), the correct file will be auto-detected. Example: candidates/pyphen_candidates_2syl.json
- --name-class
Name class identifier from name_classes.yml. Examples: first_name, last_name, place_name
- --policy-file
Path to name_classes.yml. If not specified, uses data/name_classes.yml from project root. Default: data/name_classes.yml
- --count
Maximum number of names to output. Default: 100.
- --mode
Possible choices: hard, soft
Evaluation mode. 'hard' rejects candidates with discouraged features; 'soft' applies a -10 penalty instead. Default: hard.
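The prefix auto-detection described for --candidates might work along the lines of the sketch below. It is an illustration under stated assumptions, not the tool's code: the helper name resolve_candidates_path is hypothetical, and it assumes the fallback keeps the `_candidates_{N}syl.json` suffix and accepts exactly one file with a different prefix in the same directory.

```python
import re
from pathlib import Path

# Hypothetical sketch of the documented prefix auto-detection; not the tool's code.
# Matches names such as "nltk_candidates_2syl.json" and captures the syllable count.
_CANDIDATES_NAME = re.compile(r"^.+_candidates_(?P<n>\d+)syl\.json$")


def resolve_candidates_path(run_dir: Path, candidates: str) -> Path:
    """Return the requested candidates file, or the unique file with the same
    syllable count but a different prefix in the same directory."""
    requested = run_dir / candidates
    if requested.exists():
        return requested

    match = _CANDIDATES_NAME.match(requested.name)
    if match is None:
        raise FileNotFoundError(requested)

    # e.g. fall back to pyphen_candidates_2syl.json when nltk_candidates_2syl.json is missing
    found = sorted(requested.parent.glob(f"*_candidates_{match['n']}syl.json"))
    if len(found) != 1:
        raise FileNotFoundError(f"{requested} (found {len(found)} alternatives)")
    return found[0]
```

Refusing to guess when several prefixes share a syllable count keeps the fallback from silently picking the wrong extraction run.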
Examples:
# Select first names from 2-syllable candidates
python -m build_tools.name_selector \
--run-dir _working/output/20260110_115453_pyphen/ \
--candidates candidates/pyphen_candidates_2syl.json \
--name-class first_name \
--count 100
# Select place names with soft mode (penalties instead of rejection)
python -m build_tools.name_selector \
--run-dir _working/output/20260110_115453_pyphen/ \
--candidates candidates/pyphen_candidates_3syl.json \
--name-class place_name \
--mode soft
# Use a custom policy file
python -m build_tools.name_selector \
--run-dir _working/output/20260110_115453_pyphen/ \
--candidates candidates/pyphen_candidates_2syl.json \
--name-class first_name \
--policy-file custom_policies.yml
- Output:
Creates selections/{prefix}_{name_class}_{N}syl.json in the run directory. The prefix and syllable count are extracted from the candidates filename.
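A minimal sketch of this naming rule, assuming candidates filenames always follow `{prefix}_candidates_{N}syl.json`; the function name selection_output_path is hypothetical.

```python
import re
from pathlib import Path

# Hypothetical helper mirroring the documented naming contract:
#   candidates/{prefix}_candidates_{N}syl.json -> selections/{prefix}_{name_class}_{N}syl.json
_CANDIDATES_NAME = re.compile(r"^(?P<prefix>.+)_candidates_(?P<n>\d+)syl\.json$")


def selection_output_path(run_dir: Path, candidates_file: str, name_class: str) -> Path:
    """Build the selections/ output path from the candidates filename."""
    match = _CANDIDATES_NAME.match(Path(candidates_file).name)
    if match is None:
        raise ValueError(f"unexpected candidates filename: {candidates_file!r}")
    filename = f"{match['prefix']}_{name_class}_{match['n']}syl.json"
    return run_dir / "selections" / filename
```

For the pyphen run and first_name, this yields selections/pyphen_first_name_2syl.json, matching the example directory structure.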
Output Format
Input/Output Contract
Inputs:
- <run_directory>/candidates/{prefix}_candidates_{N}syl.json: from name_combiner
- data/name_classes.yml: policy configuration (or a custom path)
Output:
<run_directory>/selections/{prefix}_{name_class}_{N}syl.json
Example directory structure after selection:
_working/output/20260110_115453_pyphen/
├── candidates/
│ └── pyphen_candidates_2syl.json ← Input
├── selections/
│ ├── pyphen_first_name_2syl.json ← Generated output
│ ├── pyphen_last_name_2syl.json
│ ├── pyphen_place_name_2syl.json
│ ├── pyphen_location_name_2syl.json
│ ├── pyphen_object_item_2syl.json
│ ├── pyphen_organisation_2syl.json
│ └── pyphen_title_epithet_2syl.json
├── data/
├── meta/
└── ...
Available Name Classes
The default policy file (data/name_classes.yml) defines these name classes:
| Name Class | Optimization | Syllables | Key Constraints |
|---|---|---|---|
| first_name | Addressability | 2-3 | Prefers vowel endings, avoids heavy clusters |
| | Durability | 2-3 | Prefers stop endings, avoids vowel endings |
| | Stability | 2-4 | Prefers clusters, vowel endings |
| | Meaning Compression | 1-3 | Prefers heavy clusters, all texture features |
| | Distinction | 1-2 | Prefers short vowels, stop endings |
| | Cadence | 2-4 | All texture features, long vowels, nasal/stop endings |
| | Authority | 1-2 | Heavy clusters, long vowels, avoids short vowels |
Output Structure
The selector produces JSON with this structure:
{
"metadata": {
"source_candidates": "pyphen_candidates_2syl.json",
"name_class": "first_name",
"policy_description": "Direct social address...",
"policy_file": "data/name_classes.yml",
"mode": "hard",
"order": "alphabetical",
"seed": 42,
"total_evaluated": 10000,
"admitted": 7420,
"rejected": 2580,
"rejection_reasons": {
"ends_with_stop": 2580
},
"score_distribution": {
"0": 5000,
"1": 2000,
"2": 420
},
"output_count": 100,
"generated_at": "2026-01-10T12:00:00Z"
},
"selections": [
{
"name": "kali",
"syllables": ["ka", "li"],
"features": {...},
"score": 2,
"rank": 1,
"evaluation": {
"preferred_hits": ["ends_with_vowel", "contains_liquid"],
"tolerated_hits": [],
"discouraged_hits": [],
"rejection_reason": null
}
}
]
}
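The example above satisfies a few invariants that are handy when spot-checking generated files: admitted plus rejected equals total_evaluated, score_distribution sums to admitted, and selections are ranked 1..N with non-increasing scores. The checker below is a hypothetical helper built from those observations; it also assumes output_count equals the number of entries in selections, which the truncated example does not show directly.

```python
from typing import Any


def check_selection_file(data: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed selections JSON document.

    The invariants are inferred from the example output, not from a published
    schema; treat them as assumptions.
    """
    meta = data["metadata"]
    problems = []
    if meta["admitted"] + meta["rejected"] != meta["total_evaluated"]:
        problems.append("admitted + rejected != total_evaluated")
    if sum(meta["score_distribution"].values()) != meta["admitted"]:
        problems.append("score_distribution does not sum to admitted")

    selections = data["selections"]
    if len(selections) != meta["output_count"]:
        problems.append("output_count does not match number of selections")
    ranks = [s["rank"] for s in selections]
    if ranks != list(range(1, len(selections) + 1)):
        problems.append("ranks are not 1..N")
    scores = [s["score"] for s in selections]
    if any(a < b for a, b in zip(scores, scores[1:])):
        problems.append("scores are not in descending order")
    return problems
```

To check a file on disk, pass the parsed JSON, e.g. `check_selection_file(json.load(f))`; an empty list means no invariant was violated.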
Policy Configuration
Policies are defined in YAML with the following structure:
version: "1.0"
name_classes:
first_name:
description: "Direct social address. Optimized for addressability."
syllable_range: [2, 3]
features:
starts_with_vowel: preferred
ends_with_vowel: preferred
ends_with_stop: discouraged
contains_liquid: preferred
# ... all 12 features
Policy values:
- preferred: +1 score when the feature is present
- tolerated: 0 score (neutral)
- discouraged: Reject (hard mode) or -10 score (soft mode)
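To show how these values might be validated at load time, here is a sketch that turns an already-parsed mapping (for example, the result of `yaml.safe_load` on the file above) into frozen dataclasses. PolicySketch and policies_from_mapping are hypothetical stand-ins for NameClassPolicy and load_name_classes, and the specific checks (known policy values, 1 <= min <= max) are assumptions about what "fail validation" covers.

```python
from dataclasses import dataclass, field
from typing import Any, Literal

PolicyValue = Literal["preferred", "tolerated", "discouraged"]
VALID_VALUES = {"preferred", "tolerated", "discouraged"}


@dataclass(frozen=True)  # blocks attribute reassignment; the features dict itself stays mutable
class PolicySketch:
    """Stand-in for NameClassPolicy; field names follow the documented attributes."""
    name: str
    description: str
    syllable_range: tuple[int, int]
    features: dict[str, PolicyValue] = field(default_factory=dict)


def policies_from_mapping(doc: dict[str, Any]) -> dict[str, PolicySketch]:
    """Validate a mapping shaped like name_classes.yml and build one policy per class."""
    classes = doc.get("name_classes")
    if not isinstance(classes, dict):
        raise ValueError("missing 'name_classes' mapping")

    policies = {}
    for name, body in classes.items():
        low, high = body["syllable_range"]
        if not (isinstance(low, int) and isinstance(high, int) and 1 <= low <= high):
            raise ValueError(f"{name}: invalid syllable_range {body['syllable_range']!r}")
        features = body.get("features", {})
        bad = {f: v for f, v in features.items() if v not in VALID_VALUES}
        if bad:
            raise ValueError(f"{name}: invalid policy values {bad}")
        policies[name] = PolicySketch(name, body.get("description", ""), (low, high), dict(features))
    return policies
```

Converting the YAML list [2, 3] into a tuple matches the documented `syllable_range: tuple[int, int]` attribute.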
Integration Guide
The name selector is the governance layer of the Selection Policy Layer. It evaluates candidates produced by the name_combiner against name class policies.
Typical workflow:
# Generate candidates first
python -m build_tools.name_combiner \
--run-dir _working/output/20260110_115453_pyphen/ \
--syllables 2 \
--count 10000
# Select for different name classes
python -m build_tools.name_selector \
--run-dir _working/output/20260110_115453_pyphen/ \
--candidates candidates/pyphen_candidates_2syl.json \
--name-class first_name \
--count 100
python -m build_tools.name_selector \
--run-dir _working/output/20260110_115453_pyphen/ \
--candidates candidates/pyphen_candidates_2syl.json \
--name-class last_name \
--count 100
# Select for other name classes as needed
python -m build_tools.name_selector \
--run-dir _working/output/20260110_115453_pyphen/ \
--candidates candidates/pyphen_candidates_2syl.json \
--name-class organisation \
--count 50
When to use this tool:
After generating candidates with name_combiner
When you need filtered, ranked name lists per class
For generating production-ready name pools
To analyze policy effectiveness via statistics output
Evaluation modes:
hard (default): Candidates with discouraged features are rejected entirely
soft: Candidates with discouraged features receive -10 penalty instead of rejection
Ordering modes:
alphabetical (default): Names with equal scores are sorted alphabetically for deterministic output
random: Names with equal scores are shuffled within score groups using a seeded RNG for variety while maintaining determinism
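The ranking and tiebreak rules can be sketched as follows. This is an assumed implementation, not the selector's: rank_admitted is a hypothetical name, and it takes already-scored, admitted candidates (dicts with "name" and "score").

```python
import random
from itertools import groupby
from typing import Any, Optional


def rank_admitted(scored: list[dict[str, Any]], count: int, order: str = "alphabetical",
                  seed: Optional[int] = None) -> list[dict[str, Any]]:
    """Sort by score (descending), break ties per `order`, keep the top `count`, assign ranks."""
    # Descending score, then name, so every equal-score group starts in a deterministic order.
    ordered = sorted(scored, key=lambda c: (-c["score"], c["name"]))
    if order == "random":
        rng = random.Random(seed)  # private, seeded RNG
        shuffled = []
        for _, group in groupby(ordered, key=lambda c: c["score"]):
            bucket = list(group)
            rng.shuffle(bucket)  # shuffle only within one equal-score group
            shuffled.extend(bucket)
        ordered = shuffled
    elif order != "alphabetical":
        raise ValueError(f"unknown order: {order!r}")
    top = ordered[:count]
    return [{**c, "rank": i} for i, c in enumerate(top, start=1)]
```

Using a dedicated `random.Random(seed)` rather than the module-level RNG keeps the shuffle reproducible for a given seed and unaffected by other code; with `seed=None` the order varies between runs.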
Notes
Scoring:
Preferred features: +1 each
Tolerated features: 0
Discouraged features: Reject (hard) or -10 (soft)
Names are ranked by total score (descending). Tiebreaking for equal scores can be:
Alphabetical (default): Deterministic ordering by name for reproducibility
Random: Shuffled within score groups using a seed for variety while maintaining determinism
Syllable count filtering:
The selector filters by syllable count from the policy’s syllable_range before
scoring. Candidates outside the range are excluded regardless of feature scores.
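A sketch of this pre-scoring filter, assuming the syllable count is `len(candidate["syllables"])`, as the check_syllable_count example in the API reference suggests. Both function names are hypothetical.

```python
from typing import Any, Iterable


def within_syllable_range(candidate: dict[str, Any], syllable_range: tuple[int, int]) -> bool:
    """True if the number of syllables lies in [min, max], inclusive."""
    low, high = syllable_range
    return low <= len(candidate["syllables"]) <= high


def filter_by_syllables(candidates: Iterable[dict[str, Any]],
                        syllable_range: tuple[int, int]) -> list[dict[str, Any]]:
    # Applied before scoring: out-of-range names never reach policy evaluation.
    return [c for c in candidates if within_syllable_range(c, syllable_range)]
```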
Statistics output:
The CLI displays rejection statistics to help tune policies:
Evaluated: 10,000
Admitted: 7,420 (74.2%)
Rejected: 2,580
Rejection reasons:
ends_with_stop: 2,580
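The counts above can be reproduced from per-candidate results. This sketch assumes each evaluation yields an (admitted, score, rejection_reason) triple, i.e. evaluate_candidate's return value with the details dict reduced to its rejection_reason; summarize and format_statistics are hypothetical names, and the real CLI layout may differ slightly.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Result = Tuple[bool, int, Optional[str]]  # (admitted, score, rejection_reason)


def summarize(results: Iterable[Result]) -> dict:
    """Aggregate results into the fields shown in the selection metadata."""
    total, reasons, scores = 0, Counter(), Counter()
    for admitted, score, reason in results:
        total += 1
        if admitted:
            scores[score] += 1  # score distribution covers admitted candidates only
        elif reason is not None:
            reasons[reason] += 1
    admitted_count = sum(scores.values())
    return {"total_evaluated": total, "admitted": admitted_count,
            "rejected": total - admitted_count,
            "rejection_reasons": dict(reasons), "score_distribution": dict(scores)}


def format_statistics(stats: dict) -> str:
    total, admitted = stats["total_evaluated"], stats["admitted"]
    share = admitted / total if total else 0.0
    lines = [f"Evaluated: {total:,}", f"Admitted: {admitted:,} ({share:.1%})",
             f"Rejected: {stats['rejected']:,}", "Rejection reasons:"]
    ordered = sorted(stats["rejection_reasons"].items(), key=lambda kv: -kv[1])
    lines += [f"  {reason}: {n:,}" for reason, n in ordered]
    return "\n".join(lines)
```

Score keys are ints here; serializing with `json.dumps` turns them into the string keys ("0", "1", ...) seen in the output file.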
Build-time tool:
This is a build-time tool only - not used during runtime name generation.
API Reference
- class build_tools.name_selector.NameClassPolicy(name, description, syllable_range, features=<factory>)
Bases: object
Policy configuration for a single name class.
Defines feature preferences for evaluating name candidates. Policies are loaded from YAML and remain immutable during evaluation.
Attributes
- name : str
Identifier for this name class (e.g., "first_name", "place_name").
- description : str
Human-readable description of the name class purpose.
- syllable_range : tuple[int, int]
Allowed syllable count range [min, max], inclusive.
- features : dict[str, PolicyValue]
Mapping of feature name to policy value ("preferred", "tolerated", "discouraged").
Examples
>>> policy = NameClassPolicy(
...     name="first_name",
...     description="Direct social address.",
...     syllable_range=(2, 3),
...     features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
... )
>>> policy.features["ends_with_vowel"]
'preferred'
- build_tools.name_selector.evaluate_candidate(candidate, policy, mode='hard')
Evaluate a name candidate against a name class policy.
Scores the candidate based on which of its TRUE features match preferred, tolerated, or discouraged designations in the policy.
Parameters
- candidate : dict
Candidate dictionary with "name", "features", and optionally "syllables". Features must be a dict[str, bool].
- policy : NameClassPolicy
The policy to evaluate against.
- mode : {"hard", "soft"}, optional
Evaluation mode. "hard" rejects on any discouraged feature. "soft" applies a -10 penalty instead. Default: "hard".
Returns
- tuple[bool, int, dict]
admitted: True if candidate passes policy, False if rejected
score: Numeric score (higher is better)
details: Evaluation details for debugging
- Details dict structure:
preferred_hits: list[str] - Preferred features that are TRUE
tolerated_hits: list[str] - Tolerated features that are TRUE
discouraged_hits: list[str] - Discouraged features that are TRUE
rejection_reason: str | None - Reason for rejection (if any)
Examples
>>> # Candidate with preferred feature
>>> candidate = {"name": "kali", "features": {"ends_with_vowel": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy)
>>> admitted, score
(True, 1)
>>> # Candidate with discouraged feature (hard mode)
>>> candidate = {"name": "kalt", "features": {"ends_with_stop": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy, mode="hard")
>>> admitted
False
>>> details["rejection_reason"]
'ends_with_stop'
Notes
Only TRUE features are evaluated. If a feature is FALSE in the candidate, it does not contribute to the score regardless of its policy designation.
In other words, “discouraged” means “discouraged when present”, not “required to be absent”.
- build_tools.name_selector.load_name_classes(yaml_path)
Load name class policies from a YAML file.
Parameters
- yaml_path : str | Path
Path to the name_classes.yml file.
Returns
- dict[str, NameClassPolicy]
Dictionary mapping name class identifiers to their policies.
Raises
- FileNotFoundError
If the YAML file does not exist.
- ValueError
If the YAML structure is invalid or policies fail validation.
Examples
>>> policies = load_name_classes("data/name_classes.yml")
>>> "first_name" in policies
True
>>> policies["first_name"].syllable_range
(2, 3)
- build_tools.name_selector.select_names(candidates, policy, count=100, mode='hard', order='alphabetical', seed=None)
Select and rank name candidates against a policy.
Evaluates all candidates, filters out rejected ones, ranks by score, and returns the top N.
Parameters
- candidates : Sequence[dict]
List of candidate dictionaries from name_combiner output. Each must have "name", "syllables", and "features" keys.
- policy : NameClassPolicy
The policy to evaluate against.
- count : int, optional
Maximum number of names to return. Default: 100.
- mode : {"hard", "soft"}, optional
Evaluation mode. "hard" rejects on discouraged features. "soft" applies penalties. Default: "hard".
- order : {"alphabetical", "random"}, optional
Ordering for names with equal scores. "alphabetical" sorts by name for deterministic output. "random" shuffles within score groups using the provided seed. Default: "alphabetical".
- seed : int, optional
RNG seed for random ordering. Only used when order="random". Required for deterministic random ordering. Default: None.
Returns
- list[dict]
List of selected candidates, sorted by score (descending). Each candidate is augmented with “score”, “rank”, and “evaluation”.
Examples
>>> selected = select_names(candidates, policy, count=50)
>>> selected[0]["rank"]
1
>>> selected[0]["score"]  # Highest score
4
>>> len(selected)
50
Notes
The returned candidates are augmented with:
- score: int - The policy score
- rank: int - 1-based rank (1 = best)
- evaluation: dict - Detailed evaluation breakdown
Name class policy data models and YAML loading.
This module defines the dataclasses for representing name class policies and provides functions to load them from YAML configuration files.
The Name Class Matrix is externalized to data/name_classes.yml, separating policy configuration from code. This enables:
- Non-programmers to tune name classes
- Version control tracking of policy evolution
- Multiple projects sharing the codebase with different policies
Policy Structure
Each name class defines:
- description: Human-readable purpose
- syllable_range: [min, max] syllables (inclusive)
- features: Dict mapping feature names to policy values
Policy values:
- "preferred": Actively sought (+1 score)
- "tolerated": Neutral (0 score)
- "discouraged": Rejected or penalized
Usage
>>> from build_tools.name_selector.name_class import load_name_classes
>>> policies = load_name_classes("data/name_classes.yml")
>>> first_name_policy = policies["first_name"]
>>> first_name_policy.description
'Direct social address. Optimized for addressability and mouth-feel.'
>>> first_name_policy.features["ends_with_vowel"]
'preferred'
- class build_tools.name_selector.name_class.NameClassPolicy(name, description, syllable_range, features=<factory>)
Bases: object
Policy configuration for a single name class.
Defines feature preferences for evaluating name candidates. Policies are loaded from YAML and remain immutable during evaluation.
Attributes
- name : str
Identifier for this name class (e.g., "first_name", "place_name").
- description : str
Human-readable description of the name class purpose.
- syllable_range : tuple[int, int]
Allowed syllable count range [min, max], inclusive.
- features : dict[str, PolicyValue]
Mapping of feature name to policy value ("preferred", "tolerated", "discouraged").
Examples
>>> policy = NameClassPolicy(
...     name="first_name",
...     description="Direct social address.",
...     syllable_range=(2, 3),
...     features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
... )
>>> policy.features["ends_with_vowel"]
'preferred'
- build_tools.name_selector.name_class.get_default_policy_path()
Get the default path to name_classes.yml.
Returns
- Path
Path to data/name_classes.yml relative to project root.
Notes
This assumes the project structure has data/name_classes.yml at the root.
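One common way to implement such a lookup is to walk up from the module file with pathlib. The sketch assumes, hypothetically, that the module sits at <root>/build_tools/name_selector/<module>.py; default_policy_path is an illustrative name, and it takes the module path as an argument so it can be tested in isolation.

```python
from pathlib import Path


def default_policy_path(module_file: str) -> Path:
    """Resolve data/name_classes.yml relative to the (assumed) project root.

    Assumed layout: <root>/build_tools/name_selector/<module>.py, so the root is
    parents[2] of the module file (name_selector -> build_tools -> root).
    """
    root = Path(module_file).resolve().parents[2]
    return root / "data" / "name_classes.yml"
```

Inside the module it would be called as `default_policy_path(__file__)`. Because the result depends on the file layout, moving the module would silently change the resolved path.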
- build_tools.name_selector.name_class.load_name_classes(yaml_path)
Load name class policies from a YAML file.
Parameters
- yaml_path : str | Path
Path to the name_classes.yml file.
Returns
- dict[str, NameClassPolicy]
Dictionary mapping name class identifiers to their policies.
Raises
- FileNotFoundError
If the YAML file does not exist.
- ValueError
If the YAML structure is invalid or policies fail validation.
Examples
>>> policies = load_name_classes("data/name_classes.yml")
>>> "first_name" in policies
True
>>> policies["first_name"].syllable_range
(2, 3)
Policy evaluation logic for name candidates.
This module contains the core evaluation function that scores a name candidate against a name class policy. It implements the ✓/~/✗ scoring model defined in the Name Class Matrix.
Scoring Model
Preferred (✓): Feature present → +1 score
Tolerated (~): Feature present → 0 score (neutral)
Discouraged (✗): Feature present → Reject (hard) or -10 (soft)
The evaluation considers only features that are TRUE in the candidate. Features that are FALSE do not contribute to the score (absence is neutral).
Evaluation Modes
- Hard Mode (default):
Any discouraged feature present causes immediate rejection. The candidate is not scored further.
- Soft Mode:
Discouraged features apply a -10 penalty instead of rejection. Useful for exploring edge cases or when flexibility is needed.
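The scoring model and both modes can be condensed into a short sketch. evaluate_sketch is hypothetical and works on plain mappings rather than NameClassPolicy. Three details are assumptions, not documented behavior: the soft-mode penalty is applied once per discouraged hit, a hard-mode rejection reports the first discouraged hit as its reason, and rejected candidates get a score of 0.

```python
from typing import Any, Dict, Mapping, Tuple

PENALTY = -10  # documented soft-mode penalty; applying it per hit is an assumption


def evaluate_sketch(features: Mapping[str, bool], policy: Mapping[str, str],
                    mode: str = "hard") -> Tuple[bool, int, Dict[str, Any]]:
    if mode not in ("hard", "soft"):
        raise ValueError(f"unknown mode: {mode!r}")
    details: Dict[str, Any] = {"preferred_hits": [], "tolerated_hits": [],
                               "discouraged_hits": [], "rejection_reason": None}
    # Only TRUE features are considered; FALSE or missing features are neutral.
    for name, present in features.items():
        if not present:
            continue
        value = policy.get(name)
        if value in ("preferred", "tolerated", "discouraged"):
            details[f"{value}_hits"].append(name)

    score = len(details["preferred_hits"])  # +1 per preferred hit; tolerated adds 0
    if details["discouraged_hits"]:
        if mode == "hard":
            details["rejection_reason"] = details["discouraged_hits"][0]
            return False, 0, details  # rejected outright, not scored further
        score += PENALTY * len(details["discouraged_hits"])
    return True, score, details
```

A discouraged feature that is FALSE in the candidate never triggers rejection, which is the "discouraged when present" reading described in the Notes of evaluate_candidate.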
Usage
>>> from build_tools.name_selector.policy import evaluate_candidate
>>> from build_tools.name_selector.name_class import NameClassPolicy
>>>
>>> policy = NameClassPolicy(
... name="first_name",
... description="Test",
... syllable_range=(2, 3),
... features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
... )
>>> candidate = {
... "name": "kali",
... "features": {"ends_with_vowel": True, "ends_with_stop": False},
... }
>>> admitted, score, details = evaluate_candidate(candidate, policy, mode="hard")
>>> admitted
True
>>> score
1
>>> details["preferred_hits"]
['ends_with_vowel']
- build_tools.name_selector.policy.check_syllable_count(candidate, policy)
Check if a candidate's syllable count is within policy range.
Parameters
- candidate : dict
Candidate dictionary with "syllables" key (list of syllable strings).
- policy : NameClassPolicy
The policy with syllable_range constraint.
Returns
- bool
True if syllable count is within range, False otherwise.
Examples
>>> policy = NameClassPolicy(name="first_name", description="Test", syllable_range=(2, 3))
>>> check_syllable_count({"syllables": ["ka", "li"]}, policy)
True
>>> check_syllable_count({"syllables": ["ka"]}, policy)
False
- build_tools.name_selector.policy.evaluate_candidate(candidate, policy, mode='hard')
Evaluate a name candidate against a name class policy.
Scores the candidate based on which of its TRUE features match preferred, tolerated, or discouraged designations in the policy.
Parameters
- candidate : dict
Candidate dictionary with "name", "features", and optionally "syllables". Features must be a dict[str, bool].
- policy : NameClassPolicy
The policy to evaluate against.
- mode : {"hard", "soft"}, optional
Evaluation mode. "hard" rejects on any discouraged feature. "soft" applies a -10 penalty instead. Default: "hard".
Returns
- tuple[bool, int, dict]
admitted: True if candidate passes policy, False if rejected
score: Numeric score (higher is better)
details: Evaluation details for debugging
- Details dict structure:
preferred_hits: list[str] - Preferred features that are TRUE
tolerated_hits: list[str] - Tolerated features that are TRUE
discouraged_hits: list[str] - Discouraged features that are TRUE
rejection_reason: str | None - Reason for rejection (if any)
Examples
>>> # Candidate with preferred feature
>>> candidate = {"name": "kali", "features": {"ends_with_vowel": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy)
>>> admitted, score
(True, 1)
>>> # Candidate with discouraged feature (hard mode)
>>> candidate = {"name": "kalt", "features": {"ends_with_stop": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy, mode="hard")
>>> admitted
False
>>> details["rejection_reason"]
'ends_with_stop'
Notes
Only TRUE features are evaluated. If a feature is FALSE in the candidate, it does not contribute to the score regardless of its policy designation.
In other words, “discouraged” means “discouraged when present”, not “required to be absent”.
Main selector orchestration logic.
This module provides the high-level selection function that coordinates loading candidates, evaluating them against a policy, and producing ranked output.
The selector is the central orchestrator of the Selection Policy Layer. It ties together:
- Candidate loading (from name_combiner output)
- Policy evaluation (from policy.py)
- Result ranking and filtering
Usage
>>> import json
>>> from build_tools.name_selector import select_names, load_name_classes
>>>
>>> # Load policies and candidates
>>> policies = load_name_classes("data/name_classes.yml")
>>> with open("candidates/pyphen_candidates_2syl.json") as f:
... candidates_data = json.load(f)
>>>
>>> # Select names
>>> selected = select_names(
... candidates=candidates_data["candidates"],
... policy=policies["first_name"],
... count=100,
... mode="hard",
... )
>>>
>>> for name in selected[:5]:
... print(f"{name['name']}: score={name['score']}, rank={name['rank']}")
- build_tools.name_selector.selector.compute_selection_statistics(candidates, policy, mode='hard')
Compute statistics about a selection operation.
Evaluates all candidates and returns aggregate statistics without building the full result list.
Parameters
- candidates : Sequence[dict]
List of candidate dictionaries.
- policy : NameClassPolicy
The policy to evaluate against.
- mode : {"hard", "soft"}, optional
Evaluation mode. Default: “hard”.
Returns
- dict
Statistics dictionary containing:
- total_evaluated: int
- admitted: int
- rejected: int
- rejection_reasons: dict[str, int]
- score_distribution: dict[int, int] (score -> count)
Examples
>>> stats = compute_selection_statistics(candidates, policy)
>>> stats["admitted"]
2341
>>> stats["rejection_reasons"]["ends_with_stop"]
1234
- build_tools.name_selector.selector.select_names(candidates, policy, count=100, mode='hard', order='alphabetical', seed=None)
Select and rank name candidates against a policy.
Evaluates all candidates, filters out rejected ones, ranks by score, and returns the top N.
Parameters
- candidates : Sequence[dict]
List of candidate dictionaries from name_combiner output. Each must have "name", "syllables", and "features" keys.
- policy : NameClassPolicy
The policy to evaluate against.
- count : int, optional
Maximum number of names to return. Default: 100.
- mode : {"hard", "soft"}, optional
Evaluation mode. "hard" rejects on discouraged features. "soft" applies penalties. Default: "hard".
- order : {"alphabetical", "random"}, optional
Ordering for names with equal scores. "alphabetical" sorts by name for deterministic output. "random" shuffles within score groups using the provided seed. Default: "alphabetical".
- seed : int, optional
RNG seed for random ordering. Only used when order="random". Required for deterministic random ordering. Default: None.
Returns
- list[dict]
List of selected candidates, sorted by score (descending). Each candidate is augmented with “score”, “rank”, and “evaluation”.
Examples
>>> selected = select_names(candidates, policy, count=50)
>>> selected[0]["rank"]
1
>>> selected[0]["score"]  # Highest score
4
>>> len(selected)
50
Notes
The returned candidates are augmented with:
- score: int - The policy score
- rank: int - 1-based rank (1 = best)
- evaluation: dict - Detailed evaluation breakdown