Build Tools

The pipeworks_name_generation project includes build-time tools for analyzing and extracting phonetic patterns from text. They prepare the syllable data for the name generator.

Important: These are build-time tools only; they are not used during runtime name generation.

Tool Overview

Tool	Description
Pyphen Syllable Extractor	Dictionary-based syllable extraction using pyphen (LibreOffice dictionaries)
NLTK Syllable Extractor	Phonetically-guided syllable extraction using NLTK CMUDict with onset/coda principles
Pyphen Syllable Normaliser	3-step normalization pipeline for pyphen extractor output
NLTK Syllable Normaliser	NLTK-specific normalization with fragment cleaning for phonetically coherent syllables
Syllable Feature Annotator	Phonetic feature detection (onset, nucleus, coda features)
Name Combiner	Generate N-syllable name candidates with feature aggregation
Name Selector	Filter and rank candidates against name class policies
Corpus SQLite Builder	Convert annotated JSON to SQLite databases for fast Walker loading (TUI + web; optional performance optimization)
Syllable Walker	Explore phonetic feature space via cost-based random walks (CLI)
Syllable Walker Web	Combined Pipeline + Walker web interface with dual-patch corpus comparison
Syllable Walker TUI	Interactive TUI for exploring phonetic space with side-by-side patch configuration
Corpus Database	Build provenance ledger for tracking extraction runs (inputs, outputs, settings)
Corpus Database Viewer	Interactive TUI for viewing corpus database provenance records
Analysis Tools	Post-annotation analysis (feature signatures, t-SNE visualization, random sampling)
TUI Common Components	Shared TUI components (controls, browsers, keybinding config) for Textual-based tools
Pipeline TUI	Interactive TUI for running extraction, normalization, and annotation pipelines

Quick Start

# Extract syllables from text (choose one extractor)

# Option 1: pyphen extractor (40+ languages, typographic splits)
python -m build_tools.pyphen_syllable_extractor --file input.txt --auto

# Option 2: NLTK extractor (English only, phonetic splits)
python -m build_tools.nltk_syllable_extractor --file input.txt

# Normalize extracted syllables (both normalisers process the run directory in place)

# For pyphen extractor output:
python -m build_tools.pyphen_syllable_normaliser --run-dir _working/output/20260110_143022_pyphen/

# For NLTK extractor output:
python -m build_tools.nltk_syllable_normaliser --run-dir _working/output/20260110_095213_nltk/

# Annotate syllables with phonetic features
python -m build_tools.syllable_feature_annotator

# Generate name candidates
python -m build_tools.name_combiner \
    --run-dir _working/output/20260110_143022_pyphen/ \
    --syllables 2 --count 10000

# Select names for a class
python -m build_tools.name_selector \
    --run-dir _working/output/20260110_143022_pyphen/ \
    --candidates candidates/pyphen_candidates_2syl.json \
    --name-class first_name

# (Optional) Convert to SQLite for faster Walker loading (TUI and web)
python -m build_tools.corpus_sqlite_builder _working/output/20260110_143022_pyphen/

# Explore syllable walks (choose one interface)
python -m build_tools.syllable_walk_web      # Browser-based Pipeline + Walker interface
python -m build_tools.syllable_walk_tui      # Terminal TUI with side-by-side comparison

# Analyze and visualize
python -m build_tools.syllable_analysis.tsne_visualizer --interactive
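
The post-extraction steps above all point at the same run directory, so they can be chained behind a single variable. The following is a minimal sketch and not part of build_tools: the helper names latest_run_dir, run_step, and pyphen_pipeline and the DRY_RUN switch are hypothetical. It assumes run directories are named <date>_<time>_<extractor>, as in the examples above, so that the newest one sorts last. It reuses only the commands and flags already shown in this Quick Start.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers and the DRY_RUN switch are illustrative,
# they are not provided by build_tools.

# Print the newest run directory for an extractor ("pyphen" or "nltk").
# Assumption: names look like <YYYYMMDD>_<HHMMSS>_<extractor>, so the
# sorted glob expansion lists them oldest to newest.
latest_run_dir() {
    local base="$1" extractor="$2" dir latest=""
    for dir in "$base"/*_"$extractor"/; do
        # An unmatched glob stays literal, so keep only real directories.
        if [ -d "$dir" ]; then
            latest="$dir"
        fi
    done
    if [ -z "$latest" ]; then
        return 1
    fi
    printf '%s\n' "$latest"
}

# Run a command, or just print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Normalise, annotate, combine, and select for one pyphen run directory.
# Each step runs only if the previous one succeeded.
pyphen_pipeline() {
    local run_dir="$1"
    run_step python -m build_tools.pyphen_syllable_normaliser --run-dir "$run_dir" &&
    run_step python -m build_tools.syllable_feature_annotator &&
    run_step python -m build_tools.name_combiner \
        --run-dir "$run_dir" --syllables 2 --count 10000 &&
    run_step python -m build_tools.name_selector \
        --run-dir "$run_dir" \
        --candidates candidates/pyphen_candidates_2syl.json \
        --name-class first_name
}
```

For example, DRY_RUN=1 pyphen_pipeline "$(latest_run_dir _working/output pyphen)" prints the four commands without running them, which is a quick way to confirm the run directory before starting a long build.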

Detailed Documentation