Name Selector

Overview

Name Selector - Policy-Based Name Filtering and Ranking

Evaluates name candidates against name class policies to produce ranked, admissible name lists. This is a build-time tool only - not used during runtime name generation.

This module is the second stage of the Selection Policy Layer. It performs policy evaluation on candidates produced by the name_combiner module.

Architectural Boundary: The selector is the governance layer. All admissibility decisions, scoring, and rejection logic live here. The combiner upstream is purely structural.

Features:

- Load name class policies from YAML configuration
- Evaluate candidates against 12-feature policies
- Hard mode (reject on discouraged) or soft mode (negative score)
- Ranked output by score
- Detailed evaluation metadata for debugging

Policy Logic:

- Preferred feature present: +1 score
- Tolerated feature present: 0 score
- Discouraged feature present: Reject (hard) or -10 (soft)

Usage:

>>> from build_tools.name_selector import select_names, load_name_classes
>>> policies = load_name_classes("data/name_classes.yml")
>>> selected = select_names(candidates, policies["first_name"], count=100)
>>> for name in selected[:5]:
...     print(f"{name['name']}: score={name['score']}")

CLI:

python -m build_tools.name_selector \
    --run-dir _working/output/20260110_115453_pyphen/ \
    --candidates candidates/pyphen_candidates_2syl.json \
    --name-class first_name \
    --count 100

Command-Line Interface

Filter and rank name candidates against a name class policy. Evaluates candidates using the 12-feature policy matrix and produces ranked, admissible name lists. This is a build-time tool for the Selection Policy Layer.

usage: python -m build_tools.name_selector [-h] --run-dir RUN_DIR --candidates
                                           CANDIDATES --name-class NAME_CLASS
                                           [--policy-file POLICY_FILE]
                                           [--count COUNT]
                                           [--mode {hard,soft}]

Named Arguments

--run-dir

Path to extraction run directory. Example: _working/output/20260110_115453_pyphen/

--candidates

Path to candidates JSON file, relative to run-dir. If the wrong prefix is specified (e.g., nltk_ for a pyphen run), the correct file will be auto-detected. Example: candidates/pyphen_candidates_2syl.json

--name-class

Name class identifier from name_classes.yml. Examples: first_name, last_name, place_name

--policy-file

Path to name_classes.yml. Default: data/name_classes.yml, resolved from the project root.

--count

Maximum number of names to output. Default: 100.


--mode

Possible choices: hard, soft

Evaluation mode. ‘hard’ rejects candidates with discouraged features. ‘soft’ applies -10 penalty instead. Default: hard.


Examples:

# Select first names from 2-syllable candidates
python -m build_tools.name_selector \
    --run-dir _working/output/20260110_115453_pyphen/ \
    --candidates candidates/pyphen_candidates_2syl.json \
    --name-class first_name \
    --count 100

# Select place names with soft mode (penalties instead of rejection)
python -m build_tools.name_selector \
    --run-dir _working/output/20260110_115453_pyphen/ \
    --candidates candidates/pyphen_candidates_3syl.json \
    --name-class place_name \
    --mode soft

# Use a custom policy file
python -m build_tools.name_selector \
    --run-dir _working/output/20260110_115453_pyphen/ \
    --candidates candidates/pyphen_candidates_2syl.json \
    --name-class first_name \
    --policy-file custom_policies.yml

Output: Creates selections/{prefix}_{name_class}_{N}syl.json in the run directory. The prefix and syllable count are extracted from the candidates filename.
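The filename derivation described above can be sketched as follows. This is a minimal illustration, not the selector's actual code: the helper name derive_output_path and the regular expression are assumptions based only on the {prefix}_candidates_{N}syl.json naming pattern shown in this document.

```python
import re
from pathlib import Path

# Assumed pattern, inferred from the documented candidates filename layout.
_CANDIDATES_RE = re.compile(r"^(?P<prefix>.+)_candidates_(?P<n>\d+)syl\.json$")


def derive_output_path(run_dir: str, candidates: str, name_class: str) -> Path:
    """Hypothetical helper: build selections/{prefix}_{name_class}_{N}syl.json."""
    match = _CANDIDATES_RE.match(Path(candidates).name)
    if match is None:
        raise ValueError(f"Unrecognised candidates filename: {candidates}")
    filename = f"{match['prefix']}_{name_class}_{match['n']}syl.json"
    return Path(run_dir) / "selections" / filename


out = derive_output_path(
    "_working/output/20260110_115453_pyphen/",
    "candidates/pyphen_candidates_2syl.json",
    "first_name",
)
print(out.name)
```

Under these assumptions the example resolves to pyphen_first_name_2syl.json inside the run directory's selections/ folder, which matches the directory listing in the Input/Output Contract.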

Output Format

Input/Output Contract

Inputs:

<run_directory>/candidates/{prefix}_candidates_{N}syl.json - From name_combiner
data/name_classes.yml - Policy configuration (or custom path)

Output:

<run_directory>/selections/{prefix}_{name_class}_{N}syl.json

Example directory structure after selection:

_working/output/20260110_115453_pyphen/
├── candidates/
│   └── pyphen_candidates_2syl.json      ← Input
├── selections/
│   ├── pyphen_first_name_2syl.json      ← Generated output
│   ├── pyphen_last_name_2syl.json
│   ├── pyphen_place_name_2syl.json
│   ├── pyphen_location_name_2syl.json
│   ├── pyphen_object_item_2syl.json
│   ├── pyphen_organisation_2syl.json
│   └── pyphen_title_epithet_2syl.json
├── data/
├── meta/
└── ...

Available Name Classes

The default policy file (data/name_classes.yml) defines these name classes:

Name Class	Optimization	Syllables	Key Constraints
`first_name`	Addressability	2-3	Prefers vowel endings, avoids heavy clusters
`last_name`	Durability	2-3	Prefers stop endings, avoids vowel endings
`place_name`	Stability	2-4	Prefers clusters, vowel endings
`location_name`	Meaning Compression	1-3	Prefers heavy clusters, all texture features
`object_item`	Distinction	1-2	Prefers short vowels, stop endings
`organisation`	Cadence	2-4	All texture features, long vowels, nasal/stop endings
`title_epithet`	Authority	1-2	Heavy clusters, long vowels, avoids short vowels

Output Structure

The selector produces JSON with this structure:

{
  "metadata": {
    "source_candidates": "pyphen_candidates_2syl.json",
    "name_class": "first_name",
    "policy_description": "Direct social address...",
    "policy_file": "data/name_classes.yml",
    "mode": "hard",
    "order": "alphabetical",
    "seed": 42,
    "total_evaluated": 10000,
    "admitted": 7420,
    "rejected": 2580,
    "rejection_reasons": {
      "ends_with_stop": 2580
    },
    "score_distribution": {
      "0": 5000,
      "1": 2000,
      "2": 420
    },
    "output_count": 100,
    "generated_at": "2026-01-10T12:00:00Z"
  },
  "selections": [
    {
      "name": "kali",
      "syllables": ["ka", "li"],
      "features": {...},
      "score": 2,
      "rank": 1,
      "evaluation": {
        "preferred_hits": ["ends_with_vowel", "contains_liquid"],
        "tolerated_hits": [],
        "discouraged_hits": [],
        "rejection_reason": null
      }
    }
  ]
}
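As a rough illustration of consuming this output, the sketch below parses a trimmed copy of the structure above and prints a one-line summary. The helper summarise_selection is hypothetical and not part of the package; in practice the JSON would be read from the selections file rather than an inline string.

```python
import json

# Trimmed copy of the documented structure; normally read from
# <run_directory>/selections/{prefix}_{name_class}_{N}syl.json.
SAMPLE = """
{
  "metadata": {
    "name_class": "first_name",
    "mode": "hard",
    "total_evaluated": 10000,
    "admitted": 7420,
    "rejected": 2580,
    "rejection_reasons": {"ends_with_stop": 2580},
    "score_distribution": {"0": 5000, "1": 2000, "2": 420}
  },
  "selections": [
    {"name": "kali", "syllables": ["ka", "li"], "score": 2, "rank": 1}
  ]
}
"""


def summarise_selection(doc: dict) -> str:
    """Hypothetical helper: one-line summary of a selections document."""
    meta = doc["metadata"]
    rate = meta["admitted"] / meta["total_evaluated"]
    top = doc["selections"][0]
    return (
        f"{meta['name_class']} ({meta['mode']}): "
        f"{meta['admitted']:,}/{meta['total_evaluated']:,} admitted ({rate:.1%}); "
        f"top name {top['name']!r} with score {top['score']}"
    )


doc = json.loads(SAMPLE)
meta = doc["metadata"]
# Consistency checks that hold for the example figures shown above.
assert meta["admitted"] + meta["rejected"] == meta["total_evaluated"]
assert sum(meta["score_distribution"].values()) == meta["admitted"]
print(summarise_selection(doc))
```

The two assertions reflect relationships that hold in the example metadata (admitted plus rejected equals total, and the score distribution sums to the admitted count); whether the tool guarantees them in every mode is not stated here.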

Policy Configuration

Policies are defined in YAML with the following structure:

version: "1.0"
name_classes:
  first_name:
    description: "Direct social address. Optimized for addressability."
    syllable_range: [2, 3]
    features:
      starts_with_vowel: preferred
      ends_with_vowel: preferred
      ends_with_stop: discouraged
      contains_liquid: preferred
      # ... all 12 features

Policy values:

preferred: +1 score when feature is present
tolerated: 0 score (neutral)
discouraged: Reject (hard mode) or -10 score (soft mode)
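The three policy values can be illustrated with a small scoring sketch. This is not the package's evaluate_candidate; the function score_features is a hypothetical stand-in, and two details are assumptions: features missing from the policy are treated as neutral, and a hard-mode rejection reports a score of 0.

```python
from typing import Optional

SOFT_PENALTY = -10  # penalty for a discouraged feature in soft mode


def score_features(
    features: dict,
    policy_features: dict,
    mode: str = "hard",
) -> tuple:
    """Hypothetical sketch returning (admitted, score, rejection_reason)."""
    score = 0
    reason: Optional[str] = None
    for feature, present in features.items():
        if not present:
            continue  # absent features never affect the result
        value = policy_features.get(feature)  # assumption: unknown = neutral
        if value == "preferred":
            score += 1
        elif value == "discouraged":
            if mode == "hard":
                # Assumption: rejected candidates report a score of 0.
                return False, 0, feature
            score += SOFT_PENALTY
        # "tolerated" contributes nothing
    return True, score, reason


policy_features = {"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"}
print(score_features({"ends_with_vowel": True, "ends_with_stop": False}, policy_features))
print(score_features({"ends_with_stop": True}, policy_features, mode="hard"))
print(score_features({"ends_with_stop": True}, policy_features, mode="soft"))
```

Because only present features are inspected, a discouraged feature that is false on the candidate has no effect in either mode.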

Integration Guide

The name selector is the governance layer of the Selection Policy Layer. It evaluates candidates produced by the name_combiner against name class policies.

Typical workflow:

# Generate candidates first
python -m build_tools.name_combiner \
  --run-dir _working/output/20260110_115453_pyphen/ \
  --syllables 2 \
  --count 10000

# Select for different name classes
python -m build_tools.name_selector \
  --run-dir _working/output/20260110_115453_pyphen/ \
  --candidates candidates/pyphen_candidates_2syl.json \
  --name-class first_name \
  --count 100

python -m build_tools.name_selector \
  --run-dir _working/output/20260110_115453_pyphen/ \
  --candidates candidates/pyphen_candidates_2syl.json \
  --name-class last_name \
  --count 100

# Select for other name classes as needed
python -m build_tools.name_selector \
  --run-dir _working/output/20260110_115453_pyphen/ \
  --candidates candidates/pyphen_candidates_2syl.json \
  --name-class organisation \
  --count 50

When to use this tool:

After generating candidates with name_combiner
When you need filtered, ranked name lists per class
For generating production-ready name pools
To analyze policy effectiveness via statistics output

Evaluation modes:

hard (default): Candidates with discouraged features are rejected entirely
soft: Candidates with discouraged features receive -10 penalty instead of rejection

Ordering modes:

alphabetical (default): Names with equal scores are sorted alphabetically for deterministic output
random: Names with equal scores are shuffled within score groups using a seeded RNG for variety while maintaining determinism
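The two ordering modes can be sketched roughly as below. This is an illustrative approximation rather than the selector's own implementation: rank_names is a hypothetical function, and it assumes candidates are simple (name, score) pairs.

```python
import random
from itertools import groupby


def rank_names(scored, order="alphabetical", seed=None):
    """Hypothetical sketch: sort by score (descending), then break ties."""
    # Alphabetical tie-breaking falls out of sorting on (-score, name).
    ordered = sorted(scored, key=lambda item: (-item[1], item[0]))
    if order == "random":
        rng = random.Random(seed)  # seeded RNG keeps the shuffle repeatable
        shuffled = []
        # Shuffle only within each group of equal scores.
        for _, group in groupby(ordered, key=lambda item: item[1]):
            members = list(group)
            rng.shuffle(members)
            shuffled.extend(members)
        ordered = shuffled
    return ordered


scored = [("mira", 1), ("kali", 2), ("bela", 1), ("tavi", 1)]
print(rank_names(scored))
print(rank_names(scored, order="random", seed=42))
```

In both modes the score order is preserved; only the relative position of equally scored names changes, and the same seed yields the same shuffle.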

Notes

Scoring:

Preferred features: +1 each
Tolerated features: 0
Discouraged features: Reject (hard) or -10 (soft)

Names are ranked by total score (descending). Tiebreaking for equal scores can be:

Alphabetical (default): Deterministic ordering by name for reproducibility
Random: Shuffled within score groups using a seed for variety while maintaining determinism

Syllable count filtering:

The selector filters by syllable count from the policy’s syllable_range before scoring. Candidates outside the range are excluded regardless of feature scores.
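A minimal sketch of this pre-filter, assuming each candidate carries a "syllables" list as in the output structure. The function within_syllable_range is illustrative; the package's own check is check_syllable_count, documented in the API reference.

```python
def within_syllable_range(candidate: dict, syllable_range: tuple) -> bool:
    """Illustrative check: is the syllable count inside [min, max], inclusive?"""
    low, high = syllable_range
    return low <= len(candidate["syllables"]) <= high


candidates = [
    {"name": "ka", "syllables": ["ka"]},
    {"name": "kali", "syllables": ["ka", "li"]},
    {"name": "kalima", "syllables": ["ka", "li", "ma"]},
    {"name": "kalimato", "syllables": ["ka", "li", "ma", "to"]},
]

# Keep only candidates a first_name-style policy (2-3 syllables) would score.
eligible = [c["name"] for c in candidates if within_syllable_range(c, (2, 3))]
print(eligible)
```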

Statistics output:

The CLI displays rejection statistics to help tune policies:

Evaluated: 10,000
Admitted: 7,420 (74.2%)
Rejected: 2,580
Rejection reasons:
  ends_with_stop: 2,580
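A report in this shape could be produced along the following lines. This is a sketch under assumptions: format_statistics is a hypothetical helper, and the input is assumed to be a list of (admitted, rejection_reason) pairs rather than the tool's internal data.

```python
from collections import Counter


def format_statistics(results) -> str:
    """Hypothetical helper: render counts in the layout shown above."""
    results = list(results)
    total = len(results)
    admitted = sum(1 for ok, _ in results if ok)
    # Tally the reason attached to each rejected candidate.
    reasons = Counter(reason for ok, reason in results if not ok)
    pct = (admitted / total * 100) if total else 0.0
    lines = [
        f"Evaluated: {total:,}",
        f"Admitted: {admitted:,} ({pct:.1f}%)",
        f"Rejected: {total - admitted:,}",
        "Rejection reasons:",
    ]
    lines += [f"  {reason}: {count:,}" for reason, count in reasons.most_common()]
    return "\n".join(lines)


sample = [(True, None)] * 3 + [(False, "ends_with_stop")]
print(format_statistics(sample))
```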

Build-time tool:

This is a build-time tool only - not used during runtime name generation.

API Reference


class build_tools.name_selector.NameClassPolicy(name, description, syllable_range, features=<factory>)[source]

Bases: object

Policy configuration for a single name class.

Defines feature preferences for evaluating name candidates. Policies are loaded from YAML and remain immutable during evaluation.

Attributes

name (str): Identifier for this name class (e.g., “first_name”, “place_name”).
description (str): Human-readable description of the name class purpose.
syllable_range (tuple[int, int]): Allowed syllable count range [min, max], inclusive.
features (dict[str, PolicyValue]): Mapping of feature name to policy value (“preferred”, “tolerated”, “discouraged”).

Examples

>>> policy = NameClassPolicy(
...     name="first_name",
...     description="Direct social address.",
...     syllable_range=(2, 3),
...     features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
... )
>>> policy.features["ends_with_vowel"]
'preferred'

__post_init__()[source]

Validate policy configuration.

Return type: None

description: str

features: dict[str, Literal['preferred', 'tolerated', 'discouraged']]

name: str

syllable_range: tuple[int, int]

build_tools.name_selector.evaluate_candidate(candidate, policy, mode='hard')[source]

Evaluate a name candidate against a name class policy.

Scores the candidate based on which of its TRUE features match preferred, tolerated, or discouraged designations in the policy.

Return type: tuple[bool, int, dict]

Parameters

candidate (dict): Candidate dictionary with “name”, “features”, and optionally “syllables”. Features must be a dict[str, bool].
policy (NameClassPolicy): The policy to evaluate against.
mode ({“hard”, “soft”}, optional): Evaluation mode. “hard” rejects on any discouraged feature. “soft” applies a -10 penalty instead. Default: “hard”.

Returns

tuple[bool, int, dict]

admitted: True if candidate passes policy, False if rejected
score: Numeric score (higher is better)
details: Evaluation details for debugging

Details dict structure:

preferred_hits: list[str] - Preferred features that are TRUE
tolerated_hits: list[str] - Tolerated features that are TRUE
discouraged_hits: list[str] - Discouraged features that are TRUE
rejection_reason: str | None - Reason for rejection (if any)

Examples

>>> # Candidate with preferred feature
>>> candidate = {"name": "kali", "features": {"ends_with_vowel": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy)
>>> admitted, score
(True, 1)

>>> # Candidate with discouraged feature (hard mode)
>>> candidate = {"name": "kalt", "features": {"ends_with_stop": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy, mode="hard")
>>> admitted
False
>>> details["rejection_reason"]
'ends_with_stop'

Notes

Only TRUE features are evaluated. If a feature is FALSE in the candidate, it does not contribute to the score regardless of its policy designation.

In other words, “discouraged” means “discouraged when present”, not “required to be absent”.

build_tools.name_selector.load_name_classes(yaml_path)[source]

Load name class policies from a YAML file.

Return type: dict[str, NameClassPolicy]

Parameters

yaml_path (str | Path): Path to the name_classes.yml file.

Returns

dict[str, NameClassPolicy]: Dictionary mapping name class identifiers to their policies.

Raises

FileNotFoundError: If the YAML file does not exist.
ValueError: If the YAML structure is invalid or policies fail validation.

Examples

>>> policies = load_name_classes("data/name_classes.yml")
>>> "first_name" in policies
True
>>> policies["first_name"].syllable_range
(2, 3)

build_tools.name_selector.select_names(candidates, policy, count=100, mode='hard', order='alphabetical', seed=None)[source]

Select and rank name candidates against a policy.

Evaluates all candidates, filters out rejected ones, ranks by score, and returns the top N.

Return type: list[dict]

Parameters

candidates (Sequence[dict]): List of candidate dictionaries from name_combiner output. Each must have “name”, “syllables”, and “features” keys.
policy (NameClassPolicy): The policy to evaluate against.
count (int, optional): Maximum number of names to return. Default: 100.
mode ({“hard”, “soft”}, optional): Evaluation mode. “hard” rejects on discouraged features. “soft” applies penalties. Default: “hard”.
order ({“alphabetical”, “random”}, optional): Ordering for names with equal scores. “alphabetical” sorts by name for deterministic output. “random” shuffles within score groups using the provided seed. Default: “alphabetical”.
seed (int, optional): RNG seed for random ordering. Only used when order="random". Required for deterministic random ordering. Default: None.

Returns

list[dict]: List of selected candidates, sorted by score (descending). Each candidate is augmented with “score”, “rank”, and “evaluation”.

Examples

>>> selected = select_names(candidates, policy, count=50)
>>> selected[0]["rank"]
1
>>> selected[0]["score"]  # Highest score
4
>>> len(selected)
50

Notes

The returned candidates are augmented with:

- score: int - The policy score
- rank: int - 1-based rank (1 = best)
- evaluation: dict - Detailed evaluation breakdown

Name class policy data models and YAML loading.

This module defines the dataclasses for representing name class policies and provides functions to load them from YAML configuration files.

The Name Class Matrix is externalized to data/name_classes.yml, separating policy configuration from code. This enables:

- Non-programmers to tune name classes
- Version control tracking of policy evolution
- Multiple projects sharing the codebase with different policies

Policy Structure

Each name class defines:

- description: Human-readable purpose
- syllable_range: [min, max] syllables (inclusive)
- features: Dict mapping feature names to policy values

Policy values:

- “preferred”: Actively sought (+1 score)
- “tolerated”: Neutral (0 score)
- “discouraged”: Rejected or penalized
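To make the policy structure concrete, here is a rough dataclass sketch. It is deliberately named SketchPolicy to avoid confusion with the real NameClassPolicy, whose validation rules are not spelled out in this document; the specific checks below (minimum of one syllable, min not above max, only the three known policy values) are assumptions.

```python
from dataclasses import dataclass, field

VALID_VALUES = {"preferred", "tolerated", "discouraged"}


@dataclass(frozen=True)
class SketchPolicy:
    """Hypothetical stand-in for a name class policy record."""

    name: str
    description: str
    syllable_range: tuple
    features: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed validation: a sane inclusive range and known policy values.
        low, high = self.syllable_range
        if low < 1 or low > high:
            raise ValueError(f"Invalid syllable_range: {self.syllable_range}")
        bad = {k: v for k, v in self.features.items() if v not in VALID_VALUES}
        if bad:
            raise ValueError(f"Unknown policy values: {bad}")


policy = SketchPolicy(
    name="first_name",
    description="Direct social address.",
    syllable_range=(2, 3),
    features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
)
print(policy.features["ends_with_vowel"])
```

Marking the dataclass as frozen mirrors the statement that policies remain immutable during evaluation, although the real class may achieve this differently.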

Usage

>>> from build_tools.name_selector.name_class import load_name_classes
>>> policies = load_name_classes("data/name_classes.yml")
>>> first_name_policy = policies["first_name"]
>>> first_name_policy.description
'Direct social address. Optimized for addressability and mouth-feel.'
>>> first_name_policy.features["ends_with_vowel"]
'preferred'

class build_tools.name_selector.name_class.NameClassPolicy(name, description, syllable_range, features=<factory>)[source]

Bases: object

Policy configuration for a single name class.

Defines feature preferences for evaluating name candidates. Policies are loaded from YAML and remain immutable during evaluation.

Attributes

name (str): Identifier for this name class (e.g., “first_name”, “place_name”).
description (str): Human-readable description of the name class purpose.
syllable_range (tuple[int, int]): Allowed syllable count range [min, max], inclusive.
features (dict[str, PolicyValue]): Mapping of feature name to policy value (“preferred”, “tolerated”, “discouraged”).

Examples

>>> policy = NameClassPolicy(
...     name="first_name",
...     description="Direct social address.",
...     syllable_range=(2, 3),
...     features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
... )
>>> policy.features["ends_with_vowel"]
'preferred'

__post_init__()[source]

Validate policy configuration.

Return type: None

description: str

features: dict[str, Literal['preferred', 'tolerated', 'discouraged']]

name: str

syllable_range: tuple[int, int]

build_tools.name_selector.name_class.get_default_policy_path()[source]

Get the default path to name_classes.yml.

Return type: Path

Returns

Path: Path to data/name_classes.yml relative to project root.

Notes

This assumes the project structure has data/name_classes.yml at the root.

build_tools.name_selector.name_class.load_name_classes(yaml_path)[source]

Load name class policies from a YAML file.

Return type: dict[str, NameClassPolicy]

Parameters

yaml_path (str | Path): Path to the name_classes.yml file.

Returns

dict[str, NameClassPolicy]: Dictionary mapping name class identifiers to their policies.

Raises

FileNotFoundError: If the YAML file does not exist.
ValueError: If the YAML structure is invalid or policies fail validation.

Examples

>>> policies = load_name_classes("data/name_classes.yml")
>>> "first_name" in policies
True
>>> policies["first_name"].syllable_range
(2, 3)

Policy evaluation logic for name candidates.

This module contains the core evaluation function that scores a name candidate against a name class policy. It implements the ✓/~/✗ scoring model defined in the Name Class Matrix.

Scoring Model

Preferred (✓): Feature present → +1 score
Tolerated (~): Feature present → 0 score (neutral)
Discouraged (✗): Feature present → Reject (hard) or -10 (soft)

The evaluation considers only features that are TRUE in the candidate. Features that are FALSE do not contribute to the score (absence is neutral).

Evaluation Modes

Hard Mode (default): Any discouraged feature present causes immediate rejection. The candidate is not scored further.
Soft Mode: Discouraged features apply a -10 penalty instead of rejection. Useful for exploring edge cases or when flexibility is needed.

Usage

>>> from build_tools.name_selector.policy import evaluate_candidate
>>> from build_tools.name_selector.name_class import NameClassPolicy
>>>
>>> policy = NameClassPolicy(
...     name="first_name",
...     description="Test",
...     syllable_range=(2, 3),
...     features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
... )
>>> candidate = {
...     "name": "kali",
...     "features": {"ends_with_vowel": True, "ends_with_stop": False},
... }
>>> admitted, score, details = evaluate_candidate(candidate, policy, mode="hard")
>>> admitted
True
>>> score
1
>>> details["preferred_hits"]
['ends_with_vowel']

build_tools.name_selector.policy.check_syllable_count(candidate, policy)[source]

Check if a candidate’s syllable count is within policy range.

Return type: bool

Parameters

candidate (dict): Candidate dictionary with “syllables” key (list of syllable strings).
policy (NameClassPolicy): The policy with syllable_range constraint.

Returns

bool: True if syllable count is within range, False otherwise.

Examples

>>> policy = NameClassPolicy(..., syllable_range=(2, 3))
>>> check_syllable_count({"syllables": ["ka", "li"]}, policy)
True
>>> check_syllable_count({"syllables": ["ka"]}, policy)
False

build_tools.name_selector.policy.evaluate_candidate(candidate, policy, mode='hard')[source]

Evaluate a name candidate against a name class policy.

Scores the candidate based on which of its TRUE features match preferred, tolerated, or discouraged designations in the policy.

Return type: tuple[bool, int, dict]

Parameters

candidate (dict): Candidate dictionary with “name”, “features”, and optionally “syllables”. Features must be a dict[str, bool].
policy (NameClassPolicy): The policy to evaluate against.
mode ({“hard”, “soft”}, optional): Evaluation mode. “hard” rejects on any discouraged feature. “soft” applies a -10 penalty instead. Default: “hard”.

Returns

tuple[bool, int, dict]

admitted: True if candidate passes policy, False if rejected
score: Numeric score (higher is better)
details: Evaluation details for debugging

Details dict structure:

preferred_hits: list[str] - Preferred features that are TRUE
tolerated_hits: list[str] - Tolerated features that are TRUE
discouraged_hits: list[str] - Discouraged features that are TRUE
rejection_reason: str | None - Reason for rejection (if any)

Examples

>>> # Candidate with preferred feature
>>> candidate = {"name": "kali", "features": {"ends_with_vowel": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy)
>>> admitted, score
(True, 1)

>>> # Candidate with discouraged feature (hard mode)
>>> candidate = {"name": "kalt", "features": {"ends_with_stop": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy, mode="hard")
>>> admitted
False
>>> details["rejection_reason"]
'ends_with_stop'

Notes

Only TRUE features are evaluated. If a feature is FALSE in the candidate, it does not contribute to the score regardless of its policy designation.

In other words, “discouraged” means “discouraged when present”, not “required to be absent”.

Main selector orchestration logic.

This module provides the high-level selection function that coordinates loading candidates, evaluating them against a policy, and producing ranked output.

The selector is the central orchestrator of the Selection Policy Layer. It ties together:

- Candidate loading (from name_combiner output)
- Policy evaluation (from policy.py)
- Result ranking and filtering

Usage

>>> import json
>>> from build_tools.name_selector import select_names, load_name_classes
>>>
>>> # Load policies and candidates
>>> policies = load_name_classes("data/name_classes.yml")
>>> with open("candidates/pyphen_candidates_2syl.json") as f:
...     candidates_data = json.load(f)
>>>
>>> # Select names
>>> selected = select_names(
...     candidates=candidates_data["candidates"],
...     policy=policies["first_name"],
...     count=100,
...     mode="hard",
... )
>>>
>>> for name in selected[:5]:
...     print(f"{name['name']}: score={name['score']}, rank={name['rank']}")

build_tools.name_selector.selector.compute_selection_statistics(candidates, policy, mode='hard')[source]

Compute statistics about a selection operation.

Evaluates all candidates and returns aggregate statistics without building the full result list.

Return type: dict

Parameters

candidates (Sequence[dict]): List of candidate dictionaries.
policy (NameClassPolicy): The policy to evaluate against.
mode ({“hard”, “soft”}, optional): Evaluation mode. Default: “hard”.

Returns

dict: Statistics dictionary containing:

- total_evaluated: int
- admitted: int
- rejected: int
- rejection_reasons: dict[str, int]
- score_distribution: dict[int, int] (score -> count)

Examples

>>> stats = compute_selection_statistics(candidates, policy)
>>> stats["admitted"]
2341
>>> stats["rejection_reasons"]["ends_with_stop"]
1234

build_tools.name_selector.selector.select_names(candidates, policy, count=100, mode='hard', order='alphabetical', seed=None)[source]

Select and rank name candidates against a policy.

Evaluates all candidates, filters out rejected ones, ranks by score, and returns the top N.

Return type: list[dict]

Parameters

candidates (Sequence[dict]): List of candidate dictionaries from name_combiner output. Each must have “name”, “syllables”, and “features” keys.
policy (NameClassPolicy): The policy to evaluate against.
count (int, optional): Maximum number of names to return. Default: 100.
mode ({“hard”, “soft”}, optional): Evaluation mode. “hard” rejects on discouraged features. “soft” applies penalties. Default: “hard”.
order ({“alphabetical”, “random”}, optional): Ordering for names with equal scores. “alphabetical” sorts by name for deterministic output. “random” shuffles within score groups using the provided seed. Default: “alphabetical”.
seed (int, optional): RNG seed for random ordering. Only used when order="random". Required for deterministic random ordering. Default: None.

Returns

list[dict]: List of selected candidates, sorted by score (descending). Each candidate is augmented with “score”, “rank”, and “evaluation”.

Examples

>>> selected = select_names(candidates, policy, count=50)
>>> selected[0]["rank"]
1
>>> selected[0]["score"]  # Highest score
4
>>> len(selected)
50

Notes

The returned candidates are augmented with:

- score: int - The policy score
- rank: int - 1-based rank (1 = best)
- evaluation: dict - Detailed evaluation breakdown