Name Selector

Overview

Name Selector - Policy-Based Name Filtering and Ranking

Evaluates name candidates against name class policies to produce ranked, admissible name lists. This is a build-time tool only - not used during runtime name generation.

This module is the second stage of the Selection Policy Layer. It performs policy evaluation on candidates produced by the name_combiner module.

Architectural Boundary:

The selector is the governance layer. All admissibility decisions, scoring, and rejection logic live here. The combiner upstream is purely structural.

Features:

  • Load name class policies from YAML configuration

  • Evaluate candidates against 12-feature policies

  • Hard mode (reject on discouraged) or soft mode (negative score)

  • Ranked output by score

  • Detailed evaluation metadata for debugging

Policy Logic:

  • Preferred feature present: +1 score

  • Tolerated feature present: 0 score

  • Discouraged feature present: Reject (hard) or -10 (soft)

Usage:
>>> from build_tools.name_selector import select_names, load_name_classes
>>> policies = load_name_classes("data/name_classes.yml")
>>> selected = select_names(candidates, policies["first_name"], count=100)
>>> for name in selected[:5]:
...     print(f"{name['name']}: score={name['score']}")

CLI:

python -m build_tools.name_selector \
    --run-dir _working/output/20260110_115453_pyphen/ \
    --candidates candidates/pyphen_candidates_2syl.json \
    --name-class first_name \
    --count 100

Command-Line Interface

Filter and rank name candidates against a name class policy. Evaluates candidates using the 12-feature policy matrix and produces ranked, admissible name lists. This is a build-time tool for the Selection Policy Layer.

usage: python -m build_tools.name_selector [-h] --run-dir RUN_DIR --candidates
                                           CANDIDATES --name-class NAME_CLASS
                                           [--policy-file POLICY_FILE]
                                           [--count COUNT]
                                           [--mode {hard,soft}]

Named Arguments

--run-dir

Path to extraction run directory. Example: _working/output/20260110_115453_pyphen/

--candidates

Path to candidates JSON file, relative to run-dir. If the wrong prefix is specified (e.g., nltk_ for a pyphen run), the correct file will be auto-detected. Example: candidates/pyphen_candidates_2syl.json

--name-class

Name class identifier from name_classes.yml. Examples: first_name, last_name, place_name

--policy-file

Path to name_classes.yml. Default: data/name_classes.yml, resolved from the project root.

--count

Maximum number of names to output. Default: 100.


--mode

Possible choices: hard, soft

Evaluation mode. ‘hard’ rejects candidates with discouraged features. ‘soft’ applies -10 penalty instead. Default: hard.
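The --candidates description mentions that a wrong prefix (e.g., nltk_ for a pyphen run) is auto-detected. How the tool does this is not documented here; the following is only a sketch of one plausible fallback, assuming the {prefix}_candidates_{N}syl.json naming scheme. The helper name resolve_candidates is hypothetical and not part of the module's API.

```python
import re
import tempfile
from pathlib import Path


def resolve_candidates(run_dir: Path, requested: str) -> Path:
    """Return the requested file, or a same-syllable-count sibling if it is missing.

    Hypothetical helper: the real auto-detection logic may differ.
    """
    path = run_dir / requested
    if path.exists():
        return path
    match = re.search(r"_candidates_(\d+)syl\.json$", path.name)
    if match is None:
        raise FileNotFoundError(path)
    # Look for any prefix with the same syllable count in the same directory.
    matches = sorted(path.parent.glob(f"*_candidates_{match.group(1)}syl.json"))
    if len(matches) != 1:
        raise FileNotFoundError(path)  # nothing found, or ambiguous
    return matches[0]


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "candidates").mkdir()
    (run_dir / "candidates" / "pyphen_candidates_2syl.json").write_text("{}")
    found = resolve_candidates(run_dir, "candidates/nltk_candidates_2syl.json")
    print(found.name)  # pyphen_candidates_2syl.json
```

Refusing to guess when more than one file matches is a design choice of this sketch, made so that a typo cannot silently select the wrong candidate set.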


Examples:

# Select first names from 2-syllable candidates
python -m build_tools.name_selector \
    --run-dir _working/output/20260110_115453_pyphen/ \
    --candidates candidates/pyphen_candidates_2syl.json \
    --name-class first_name \
    --count 100

# Select place names with soft mode (penalties instead of rejection)
python -m build_tools.name_selector \
    --run-dir _working/output/20260110_115453_pyphen/ \
    --candidates candidates/pyphen_candidates_3syl.json \
    --name-class place_name \
    --mode soft

# Use a custom policy file
python -m build_tools.name_selector \
    --run-dir _working/output/20260110_115453_pyphen/ \
    --candidates candidates/pyphen_candidates_2syl.json \
    --name-class first_name \
    --policy-file custom_policies.yml

Output:

Creates selections/{prefix}_{name_class}_{N}syl.json in the run directory. The prefix and syllable count are extracted from the candidates filename.
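The mapping from candidates filename to output filename can be sketched as follows. This is a minimal illustration that assumes the filename always matches {prefix}_candidates_{N}syl.json; the helper name selection_path and the regular expression are assumptions for the example, not the tool's actual code.

```python
import re
from pathlib import Path

# Assumed filename pattern: {prefix}_candidates_{N}syl.json
CANDIDATES_PATTERN = re.compile(r"^(?P<prefix>.+)_candidates_(?P<n>\d+)syl\.json$")


def selection_path(run_dir: str, candidates_file: str, name_class: str) -> Path:
    """Build the selections/ output path for a candidates file (hypothetical helper)."""
    match = CANDIDATES_PATTERN.match(Path(candidates_file).name)
    if match is None:
        raise ValueError(f"Unexpected candidates filename: {candidates_file}")
    filename = f"{match['prefix']}_{name_class}_{match['n']}syl.json"
    return Path(run_dir) / "selections" / filename


out = selection_path(
    "_working/output/20260110_115453_pyphen/",
    "candidates/pyphen_candidates_2syl.json",
    "first_name",
)
print(out.name)  # pyphen_first_name_2syl.json
```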

Output Format

Input/Output Contract

Inputs:

  • <run_directory>/candidates/{prefix}_candidates_{N}syl.json - From name_combiner

  • data/name_classes.yml - Policy configuration (or custom path)

Output:

  • <run_directory>/selections/{prefix}_{name_class}_{N}syl.json

Example directory structure after selection:

_working/output/20260110_115453_pyphen/
├── candidates/
│   └── pyphen_candidates_2syl.json      ← Input
├── selections/
│   ├── pyphen_first_name_2syl.json      ← Generated output
│   ├── pyphen_last_name_2syl.json
│   ├── pyphen_place_name_2syl.json
│   ├── pyphen_location_name_2syl.json
│   ├── pyphen_object_item_2syl.json
│   ├── pyphen_organisation_2syl.json
│   └── pyphen_title_epithet_2syl.json
├── data/
├── meta/
└── ...

Available Name Classes

The default policy file (data/name_classes.yml) defines these name classes:

Name Class     Optimization         Syllables  Key Constraints
first_name     Addressability       2-3        Prefers vowel endings, avoids heavy clusters
last_name      Durability           2-3        Prefers stop endings, avoids vowel endings
place_name     Stability            2-4        Prefers clusters, vowel endings
location_name  Meaning Compression  1-3        Prefers heavy clusters, all texture features
object_item    Distinction          1-2        Prefers short vowels, stop endings
organisation   Cadence              2-4        All texture features, long vowels, nasal/stop endings
title_epithet  Authority            1-2        Heavy clusters, long vowels, avoids short vowels

Output Structure

The selector produces JSON with this structure:

{
  "metadata": {
    "source_candidates": "pyphen_candidates_2syl.json",
    "name_class": "first_name",
    "policy_description": "Direct social address...",
    "policy_file": "data/name_classes.yml",
    "mode": "hard",
    "order": "alphabetical",
    "seed": 42,
    "total_evaluated": 10000,
    "admitted": 7420,
    "rejected": 2580,
    "rejection_reasons": {
      "ends_with_stop": 2580
    },
    "score_distribution": {
      "0": 5000,
      "1": 2000,
      "2": 420
    },
    "output_count": 100,
    "generated_at": "2026-01-10T12:00:00Z"
  },
  "selections": [
    {
      "name": "kali",
      "syllables": ["ka", "li"],
      "features": {...},
      "score": 2,
      "rank": 1,
      "evaluation": {
        "preferred_hits": ["ends_with_vowel", "contains_liquid"],
        "tolerated_hits": [],
        "discouraged_hits": [],
        "rejection_reason": null
      }
    }
  ]
}
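When consuming a selections file, a few sanity checks on the metadata block can catch truncated or mismatched output. The sketch below uses a trimmed sample in the shape shown above. The invariants it checks hold for that sample, but whether the tool guarantees them in every case is an assumption, and check_metadata is a hypothetical helper.

```python
import json

# Trimmed sample in the documented shape; "features" is left empty for brevity.
sample = json.loads("""
{
  "metadata": {
    "total_evaluated": 10000,
    "admitted": 7420,
    "rejected": 2580,
    "score_distribution": {"0": 5000, "1": 2000, "2": 420},
    "output_count": 100
  },
  "selections": [
    {"name": "kali", "syllables": ["ka", "li"], "features": {}, "score": 2, "rank": 1}
  ]
}
""")


def check_metadata(data: dict) -> list[str]:
    """Return a list of inconsistencies found in the metadata (empty if none)."""
    meta = data["metadata"]
    problems = []
    if meta["admitted"] + meta["rejected"] != meta["total_evaluated"]:
        problems.append("admitted + rejected != total_evaluated")
    if sum(meta["score_distribution"].values()) != meta["admitted"]:
        problems.append("score_distribution does not sum to admitted")
    if len(data["selections"]) > meta["output_count"]:
        problems.append("more selections than output_count")
    return problems


print(check_metadata(sample))  # []
```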

Policy Configuration

Policies are defined in YAML with the following structure:

version: "1.0"
name_classes:
  first_name:
    description: "Direct social address. Optimized for addressability."
    syllable_range: [2, 3]
    features:
      starts_with_vowel: preferred
      ends_with_vowel: preferred
      ends_with_stop: discouraged
      contains_liquid: preferred
      # ... all 12 features

Policy values:

  • preferred: +1 score when feature is present

  • tolerated: 0 score (neutral)

  • discouraged: Reject (hard mode) or -10 score (soft mode)
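The policy values above translate into a small scoring loop. The following is a sketch, not the module's implementation: score_features is a hypothetical name, and two behaviors are assumptions (features missing from the policy are treated as neutral, and the -10 soft-mode penalty is applied once per discouraged feature present).

```python
from typing import Literal

PolicyValue = Literal["preferred", "tolerated", "discouraged"]
SOFT_PENALTY = -10  # soft-mode penalty stated in the documentation


def score_features(
    features: dict[str, bool],
    policy: dict[str, PolicyValue],
    mode: str = "hard",
) -> tuple[bool, int]:
    """Return (admitted, score) for one candidate's feature flags."""
    score = 0
    for feature, present in features.items():
        if not present:
            continue  # absent features are neutral
        value = policy.get(feature, "tolerated")  # assumption: unlisted means neutral
        if value == "preferred":
            score += 1
        elif value == "discouraged":
            if mode == "hard":
                return False, 0  # rejected; the score is not meaningful
            score += SOFT_PENALTY  # assumption: penalty per discouraged feature
    return True, score


policy = {
    "ends_with_vowel": "preferred",
    "contains_liquid": "preferred",
    "ends_with_stop": "discouraged",
}
print(score_features({"ends_with_vowel": True, "contains_liquid": True}, policy))  # (True, 2)
print(score_features({"ends_with_stop": True, "contains_liquid": True}, policy))  # (False, 0)
print(score_features({"ends_with_stop": True, "contains_liquid": True}, policy, mode="soft"))  # (True, -9)
```

A penalty of -10 is larger than the maximum positive score available from 12 features, so under these assumptions a soft-mode candidate with any discouraged feature always ranks below every candidate without one.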

Integration Guide

The name selector is the governance layer of the Selection Policy Layer. It evaluates candidates produced by the name_combiner against name class policies.

Typical workflow:

# Generate candidates first
python -m build_tools.name_combiner \
  --run-dir _working/output/20260110_115453_pyphen/ \
  --syllables 2 \
  --count 10000

# Select for different name classes
python -m build_tools.name_selector \
  --run-dir _working/output/20260110_115453_pyphen/ \
  --candidates candidates/pyphen_candidates_2syl.json \
  --name-class first_name \
  --count 100

python -m build_tools.name_selector \
  --run-dir _working/output/20260110_115453_pyphen/ \
  --candidates candidates/pyphen_candidates_2syl.json \
  --name-class last_name \
  --count 100

# Select for other name classes as needed
python -m build_tools.name_selector \
  --run-dir _working/output/20260110_115453_pyphen/ \
  --candidates candidates/pyphen_candidates_2syl.json \
  --name-class organisation \
  --count 50

When to use this tool:

  • After generating candidates with name_combiner

  • When you need filtered, ranked name lists per class

  • For generating production-ready name pools

  • To analyze policy effectiveness via statistics output

Evaluation modes:

  • hard (default): Candidates with discouraged features are rejected entirely

  • soft: Candidates with discouraged features receive -10 penalty instead of rejection

Ordering modes:

  • alphabetical (default): Names with equal scores are sorted alphabetically for deterministic output

  • random: Names with equal scores are shuffled within score groups using a seeded RNG for variety while maintaining determinism
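The two ordering modes can be illustrated with a short sketch. It assumes scored candidates are available as (name, score) pairs; order_by_score is a hypothetical helper, and the real implementation may group or shuffle differently.

```python
import random
from itertools import groupby


def order_by_score(scored, order="alphabetical", seed=None):
    """Sort (name, score) pairs by score descending, then break ties."""
    # The alphabetical pass gives a deterministic baseline for both modes.
    ranked = sorted(scored, key=lambda item: (-item[1], item[0]))
    if order == "random":
        rng = random.Random(seed)  # a seeded RNG keeps the shuffle reproducible
        shuffled = []
        for _, group in groupby(ranked, key=lambda item: item[1]):
            members = list(group)
            rng.shuffle(members)  # shuffle only within one score group
            shuffled.extend(members)
        ranked = shuffled
    return ranked


scored = [("mira", 1), ("kali", 2), ("alto", 1), ("bela", 2)]
print(order_by_score(scored))
# [('bela', 2), ('kali', 2), ('alto', 1), ('mira', 1)]
print(order_by_score(scored, order="random", seed=42))
```

Sorting alphabetically before shuffling means the random mode depends only on the seed and the set of names, not on the input order.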

Notes

Scoring:

  • Preferred features: +1 each

  • Tolerated features: 0

  • Discouraged features: Reject (hard) or -10 (soft)

Names are ranked by total score (descending). Tiebreaking for equal scores can be:

  • Alphabetical (default): Deterministic ordering by name for reproducibility

  • Random: Shuffled within score groups using a seed for variety while maintaining determinism

Syllable count filtering:

The selector filters by syllable count from the policy’s syllable_range before scoring. Candidates outside the range are excluded regardless of feature scores.
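As a minimal sketch of this pre-filter, assuming each candidate carries a "syllables" list and the range is inclusive on both ends (within_syllable_range is a hypothetical name used only for illustration):

```python
def within_syllable_range(candidate: dict, syllable_range: tuple[int, int]) -> bool:
    """True if the candidate's syllable count falls inside the inclusive range."""
    low, high = syllable_range
    return low <= len(candidate["syllables"]) <= high


candidates = [
    {"name": "ka", "syllables": ["ka"]},
    {"name": "kali", "syllables": ["ka", "li"]},
    {"name": "kalinor", "syllables": ["ka", "li", "nor"]},
    {"name": "kalinora", "syllables": ["ka", "li", "no", "ra"]},
]
# The example first_name policy uses syllable_range [2, 3].
eligible = [c["name"] for c in candidates if within_syllable_range(c, (2, 3))]
print(eligible)  # ['kali', 'kalinor']
```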

Statistics output:

The CLI displays rejection statistics to help tune policies:

Evaluated: 10,000
Admitted: 7,420 (74.2%)
Rejected: 2,580
Rejection reasons:
  ends_with_stop: 2,580
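A report in this shape can be produced from a statistics dictionary with plain format specifiers. The sketch below assumes the key names used in the output metadata; format_statistics is a hypothetical helper, not the CLI's own formatter.

```python
def format_statistics(stats: dict) -> str:
    """Render a statistics dict as a short text report."""
    total = stats["total_evaluated"]
    admitted = stats["admitted"]
    percent = (admitted / total * 100) if total else 0.0
    lines = [
        f"Evaluated: {total:,}",
        f"Admitted: {admitted:,} ({percent:.1f}%)",
        f"Rejected: {stats['rejected']:,}",
        "Rejection reasons:",
    ]
    for reason, count in sorted(stats["rejection_reasons"].items()):
        lines.append(f"  {reason}: {count:,}")
    return "\n".join(lines)


report = format_statistics({
    "total_evaluated": 10000,
    "admitted": 7420,
    "rejected": 2580,
    "rejection_reasons": {"ends_with_stop": 2580},
})
print(report)
```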

Build-time tool:

This is a build-time tool only - not used during runtime name generation.

API Reference

class build_tools.name_selector.NameClassPolicy(name, description, syllable_range, features=<factory>)

Bases: object

Policy configuration for a single name class.

Defines feature preferences for evaluating name candidates. Policies are loaded from YAML and remain immutable during evaluation.

Attributes

name : str

Identifier for this name class (e.g., “first_name”, “place_name”).

description : str

Human-readable description of the name class purpose.

syllable_range : tuple[int, int]

Allowed syllable count range [min, max], inclusive.

features : dict[str, PolicyValue]

Mapping of feature name to policy value (“preferred”, “tolerated”, “discouraged”).

Examples

>>> policy = NameClassPolicy(
...     name="first_name",
...     description="Direct social address.",
...     syllable_range=(2, 3),
...     features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
... )
>>> policy.features["ends_with_vowel"]
'preferred'
__post_init__()

Validate policy configuration.

Return type:

None

description: str
features: dict[str, Literal['preferred', 'tolerated', 'discouraged']]
name: str
syllable_range: tuple[int, int]
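The documentation does not spell out which checks __post_init__ performs. As a rough illustration of the pattern, the sketch below validates the syllable range and the policy values in a dataclass. SketchPolicy is a hypothetical stand-in, and the specific checks and error messages are assumptions rather than the behavior of NameClassPolicy.

```python
from dataclasses import dataclass, field

ALLOWED_VALUES = {"preferred", "tolerated", "discouraged"}


@dataclass
class SketchPolicy:
    """Hypothetical stand-in for NameClassPolicy; the real checks may differ."""

    name: str
    description: str
    syllable_range: tuple[int, int]
    features: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        low, high = self.syllable_range
        if low < 1 or low > high:
            raise ValueError(f"Invalid syllable_range for {self.name}: {self.syllable_range}")
        for feature, value in self.features.items():
            if value not in ALLOWED_VALUES:
                raise ValueError(f"Invalid policy value for {feature}: {value!r}")


policy = SketchPolicy("first_name", "Direct social address.", (2, 3), {"ends_with_vowel": "preferred"})
print(policy.features["ends_with_vowel"])  # preferred

try:
    SketchPolicy("broken", "Bad range.", (3, 2))
except ValueError as exc:
    print(exc)  # Invalid syllable_range for broken: (3, 2)
```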
build_tools.name_selector.evaluate_candidate(candidate, policy, mode='hard')

Evaluate a name candidate against a name class policy.

Scores the candidate based on which of its TRUE features match preferred, tolerated, or discouraged designations in the policy.

Return type:

tuple[bool, int, dict]

Parameters

candidate : dict

Candidate dictionary with “name”, “features”, and optionally “syllables”. Features must be a dict[str, bool].

policy : NameClassPolicy

The policy to evaluate against.

mode : {“hard”, “soft”}, optional

Evaluation mode. “hard” rejects on any discouraged feature. “soft” applies a -10 penalty instead. Default: “hard”.

Returns

tuple[bool, int, dict]
  • admitted: True if candidate passes policy, False if rejected

  • score: Numeric score (higher is better)

  • details: Evaluation details for debugging

Details dict structure:
  • preferred_hits: list[str] - Preferred features that are TRUE

  • tolerated_hits: list[str] - Tolerated features that are TRUE

  • discouraged_hits: list[str] - Discouraged features that are TRUE

  • rejection_reason: str | None - Reason for rejection (if any)

Examples

>>> # Candidate with preferred feature
>>> candidate = {"name": "kali", "features": {"ends_with_vowel": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy)
>>> admitted, score
(True, 1)
>>> # Candidate with discouraged feature (hard mode)
>>> candidate = {"name": "kalt", "features": {"ends_with_stop": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy, mode="hard")
>>> admitted
False
>>> details["rejection_reason"]
'ends_with_stop'

Notes

Only TRUE features are evaluated. If a feature is FALSE in the candidate, it does not contribute to the score regardless of its policy designation.

In other words, “discouraged” means “discouraged when present”, not “required to be absent”.

build_tools.name_selector.load_name_classes(yaml_path)

Load name class policies from a YAML file.

Return type:

dict[str, NameClassPolicy]

Parameters

yaml_path : str | Path

Path to the name_classes.yml file.

Returns

dict[str, NameClassPolicy]

Dictionary mapping name class identifiers to their policies.

Raises

FileNotFoundError

If the YAML file does not exist.

ValueError

If the YAML structure is invalid or policies fail validation.

Examples

>>> policies = load_name_classes("data/name_classes.yml")
>>> "first_name" in policies
True
>>> policies["first_name"].syllable_range
(2, 3)
build_tools.name_selector.select_names(candidates, policy, count=100, mode='hard', order='alphabetical', seed=None)

Select and rank name candidates against a policy.

Evaluates all candidates, filters out rejected ones, ranks by score, and returns the top N.

Return type:

list[dict]

Parameters

candidates : Sequence[dict]

List of candidate dictionaries from name_combiner output. Each must have “name”, “syllables”, and “features” keys.

policy : NameClassPolicy

The policy to evaluate against.

count : int, optional

Maximum number of names to return. Default: 100.

mode : {“hard”, “soft”}, optional

Evaluation mode. “hard” rejects on discouraged features. “soft” applies penalties. Default: “hard”.

order : {“alphabetical”, “random”}, optional

Ordering for names with equal scores. “alphabetical” sorts by name for deterministic output. “random” shuffles within score groups using the provided seed. Default: “alphabetical”.

seed : int, optional

RNG seed for random ordering. Only used when order=“random”. Required for deterministic random ordering. Default: None.

Returns

list[dict]

List of selected candidates, sorted by score (descending). Each candidate is augmented with “score”, “rank”, and “evaluation”.

Examples

>>> selected = select_names(candidates, policy, count=50)
>>> selected[0]["rank"]
1
>>> selected[0]["score"]  # Highest score
4
>>> len(selected)
50

Notes

The returned candidates are augmented with:

  • score: int - The policy score

  • rank: int - 1-based rank (1 = best)

  • evaluation: dict - Detailed evaluation breakdown
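The rank augmentation can be pictured with a short sketch. It assumes candidates have already been scored and admitted, uses alphabetical tiebreaking, and works on (name, score) pairs for brevity; rank_top is a hypothetical helper rather than part of select_names.

```python
def rank_top(scored: list[tuple[str, int]], count: int) -> list[dict]:
    """Sort (name, score) pairs and attach a 1-based rank to the top `count`."""
    ordered = sorted(scored, key=lambda item: (-item[1], item[0]))
    return [
        {"name": name, "score": score, "rank": position}
        for position, (name, score) in enumerate(ordered[:count], start=1)
    ]


top = rank_top([("mira", 1), ("kali", 2), ("alto", 0)], count=2)
print(top)
# [{'name': 'kali', 'score': 2, 'rank': 1}, {'name': 'mira', 'score': 1, 'rank': 2}]
```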

Name class policy data models and YAML loading.

This module defines the dataclasses for representing name class policies and provides functions to load them from YAML configuration files.

The Name Class Matrix is externalized to data/name_classes.yml, separating policy configuration from code. This enables:

  • Non-programmers to tune name classes

  • Version control tracking of policy evolution

  • Multiple projects sharing the codebase with different policies

Policy Structure

Each name class defines:

  • description: Human-readable purpose

  • syllable_range: [min, max] syllables (inclusive)

  • features: Dict mapping feature names to policy values

Policy values:

  • “preferred”: Actively sought (+1 score)

  • “tolerated”: Neutral (0 score)

  • “discouraged”: Rejected or penalized

Usage

>>> from build_tools.name_selector.name_class import load_name_classes
>>> policies = load_name_classes("data/name_classes.yml")
>>> first_name_policy = policies["first_name"]
>>> first_name_policy.description
'Direct social address. Optimized for addressability and mouth-feel.'
>>> first_name_policy.features["ends_with_vowel"]
'preferred'
class build_tools.name_selector.name_class.NameClassPolicy(name, description, syllable_range, features=<factory>)

Bases: object

Policy configuration for a single name class.

Defines feature preferences for evaluating name candidates. Policies are loaded from YAML and remain immutable during evaluation.

Attributes

name : str

Identifier for this name class (e.g., “first_name”, “place_name”).

description : str

Human-readable description of the name class purpose.

syllable_range : tuple[int, int]

Allowed syllable count range [min, max], inclusive.

features : dict[str, PolicyValue]

Mapping of feature name to policy value (“preferred”, “tolerated”, “discouraged”).

Examples

>>> policy = NameClassPolicy(
...     name="first_name",
...     description="Direct social address.",
...     syllable_range=(2, 3),
...     features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
... )
>>> policy.features["ends_with_vowel"]
'preferred'
__post_init__()

Validate policy configuration.

Return type:

None

description: str
features: dict[str, Literal['preferred', 'tolerated', 'discouraged']]
name: str
syllable_range: tuple[int, int]
build_tools.name_selector.name_class.get_default_policy_path()

Get the default path to name_classes.yml.

Return type:

Path

Returns

Path

Path to data/name_classes.yml relative to project root.

Notes

This assumes the project structure has data/name_classes.yml at the root.

build_tools.name_selector.name_class.load_name_classes(yaml_path)

Load name class policies from a YAML file.

Return type:

dict[str, NameClassPolicy]

Parameters

yaml_path : str | Path

Path to the name_classes.yml file.

Returns

dict[str, NameClassPolicy]

Dictionary mapping name class identifiers to their policies.

Raises

FileNotFoundError

If the YAML file does not exist.

ValueError

If the YAML structure is invalid or policies fail validation.

Examples

>>> policies = load_name_classes("data/name_classes.yml")
>>> "first_name" in policies
True
>>> policies["first_name"].syllable_range
(2, 3)

Policy evaluation logic for name candidates.

This module contains the core evaluation function that scores a name candidate against a name class policy. It implements the ✓/~/✗ scoring model defined in the Name Class Matrix.

Scoring Model

  • Preferred (✓): Feature present → +1 score

  • Tolerated (~): Feature present → 0 score (neutral)

  • Discouraged (✗): Feature present → Reject (hard) or -10 (soft)

The evaluation considers only features that are TRUE in the candidate. Features that are FALSE do not contribute to the score (absence is neutral).

Evaluation Modes

Hard Mode (default):

Any discouraged feature present causes immediate rejection. The candidate is not scored further.

Soft Mode:

Discouraged features apply a -10 penalty instead of rejection. Useful for exploring edge cases or when flexibility is needed.

Usage

>>> from build_tools.name_selector.policy import evaluate_candidate
>>> from build_tools.name_selector.name_class import NameClassPolicy
>>>
>>> policy = NameClassPolicy(
...     name="first_name",
...     description="Test",
...     syllable_range=(2, 3),
...     features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
... )
>>> candidate = {
...     "name": "kali",
...     "features": {"ends_with_vowel": True, "ends_with_stop": False},
... }
>>> admitted, score, details = evaluate_candidate(candidate, policy, mode="hard")
>>> admitted
True
>>> score
1
>>> details["preferred_hits"]
['ends_with_vowel']
build_tools.name_selector.policy.check_syllable_count(candidate, policy)

Check if a candidate’s syllable count is within policy range.

Return type:

bool

Parameters

candidate : dict

Candidate dictionary with “syllables” key (list of syllable strings).

policy : NameClassPolicy

The policy with syllable_range constraint.

Returns

bool

True if syllable count is within range, False otherwise.

Examples

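>>> policy = NameClassPolicy(
...     name="first_name",
...     description="Test",
...     syllable_range=(2, 3),
...     features={"ends_with_vowel": "preferred", "ends_with_stop": "discouraged"},
... )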
>>> check_syllable_count({"syllables": ["ka", "li"]}, policy)
True
>>> check_syllable_count({"syllables": ["ka"]}, policy)
False
build_tools.name_selector.policy.evaluate_candidate(candidate, policy, mode='hard')

Evaluate a name candidate against a name class policy.

Scores the candidate based on which of its TRUE features match preferred, tolerated, or discouraged designations in the policy.

Return type:

tuple[bool, int, dict]

Parameters

candidate : dict

Candidate dictionary with “name”, “features”, and optionally “syllables”. Features must be a dict[str, bool].

policy : NameClassPolicy

The policy to evaluate against.

mode : {“hard”, “soft”}, optional

Evaluation mode. “hard” rejects on any discouraged feature. “soft” applies a -10 penalty instead. Default: “hard”.

Returns

tuple[bool, int, dict]
  • admitted: True if candidate passes policy, False if rejected

  • score: Numeric score (higher is better)

  • details: Evaluation details for debugging

Details dict structure:
  • preferred_hits: list[str] - Preferred features that are TRUE

  • tolerated_hits: list[str] - Tolerated features that are TRUE

  • discouraged_hits: list[str] - Discouraged features that are TRUE

  • rejection_reason: str | None - Reason for rejection (if any)

Examples

>>> # Candidate with preferred feature
>>> candidate = {"name": "kali", "features": {"ends_with_vowel": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy)
>>> admitted, score
(True, 1)
>>> # Candidate with discouraged feature (hard mode)
>>> candidate = {"name": "kalt", "features": {"ends_with_stop": True}}
>>> admitted, score, details = evaluate_candidate(candidate, policy, mode="hard")
>>> admitted
False
>>> details["rejection_reason"]
'ends_with_stop'

Notes

Only TRUE features are evaluated. If a feature is FALSE in the candidate, it does not contribute to the score regardless of its policy designation.

In other words, “discouraged” means “discouraged when present”, not “required to be absent”.

Main selector orchestration logic.

This module provides the high-level selection function that coordinates loading candidates, evaluating them against a policy, and producing ranked output.

The selector is the central orchestrator of the Selection Policy Layer. It ties together:

  • Candidate loading (from name_combiner output)

  • Policy evaluation (from policy.py)

  • Result ranking and filtering

Usage

>>> import json
>>> from build_tools.name_selector import select_names, load_name_classes
>>>
>>> # Load policies and candidates
>>> policies = load_name_classes("data/name_classes.yml")
>>> with open("candidates/pyphen_candidates_2syl.json") as f:
...     candidates_data = json.load(f)
>>>
>>> # Select names
>>> selected = select_names(
...     candidates=candidates_data["candidates"],
...     policy=policies["first_name"],
...     count=100,
...     mode="hard",
... )
>>>
>>> for name in selected[:5]:
...     print(f"{name['name']}: score={name['score']}, rank={name['rank']}")
build_tools.name_selector.selector.compute_selection_statistics(candidates, policy, mode='hard')

Compute statistics about a selection operation.

Evaluates all candidates and returns aggregate statistics without building the full result list.

Return type:

dict

Parameters

candidates : Sequence[dict]

List of candidate dictionaries.

policy : NameClassPolicy

The policy to evaluate against.

mode : {“hard”, “soft”}, optional

Evaluation mode. Default: “hard”.

Returns

dict

Statistics dictionary containing:

  • total_evaluated: int

  • admitted: int

  • rejected: int

  • rejection_reasons: dict[str, int]

  • score_distribution: dict[int, int] (score -> count)

Examples

>>> stats = compute_selection_statistics(candidates, policy)
>>> stats["admitted"]
2341
>>> stats["rejection_reasons"]["ends_with_stop"]
1234
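The aggregation behind these statistics can be sketched with collections.Counter. The example assumes each evaluation has been reduced to an (admitted, score, rejection_reason) triple; summarize is a hypothetical helper, and how the real function handles candidates outside the syllable range is not covered here.

```python
from collections import Counter


def summarize(results):
    """Aggregate (admitted, score, rejection_reason) triples into a statistics dict."""
    reasons = Counter()
    scores = Counter()
    admitted_count = 0
    for admitted, score, reason in results:
        if admitted:
            admitted_count += 1
            scores[score] += 1  # only admitted candidates count toward the distribution
        else:
            reasons[reason] += 1
    return {
        "total_evaluated": len(results),
        "admitted": admitted_count,
        "rejected": len(results) - admitted_count,
        "rejection_reasons": dict(reasons),
        "score_distribution": dict(scores),
    }


results = [(True, 2, None), (True, 0, None), (False, 0, "ends_with_stop"), (True, 2, None)]
stats = summarize(results)
print(stats["admitted"], stats["rejected"])  # 3 1
```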
build_tools.name_selector.selector.select_names(candidates, policy, count=100, mode='hard', order='alphabetical', seed=None)

Select and rank name candidates against a policy.

Evaluates all candidates, filters out rejected ones, ranks by score, and returns the top N.

Return type:

list[dict]

Parameters

candidates : Sequence[dict]

List of candidate dictionaries from name_combiner output. Each must have “name”, “syllables”, and “features” keys.

policy : NameClassPolicy

The policy to evaluate against.

count : int, optional

Maximum number of names to return. Default: 100.

mode : {“hard”, “soft”}, optional

Evaluation mode. “hard” rejects on discouraged features. “soft” applies penalties. Default: “hard”.

order : {“alphabetical”, “random”}, optional

Ordering for names with equal scores. “alphabetical” sorts by name for deterministic output. “random” shuffles within score groups using the provided seed. Default: “alphabetical”.

seed : int, optional

RNG seed for random ordering. Only used when order=“random”. Required for deterministic random ordering. Default: None.

Returns

list[dict]

List of selected candidates, sorted by score (descending). Each candidate is augmented with “score”, “rank”, and “evaluation”.

Examples

>>> selected = select_names(candidates, policy, count=50)
>>> selected[0]["rank"]
1
>>> selected[0]["score"]  # Highest score
4
>>> len(selected)
50

Notes

The returned candidates are augmented with:

  • score: int - The policy score

  • rank: int - 1-based rank (1 = best)

  • evaluation: dict - Detailed evaluation breakdown