Build Tools
The pipeworks_name_generation project includes build-time tools for analyzing and extracting phonetic patterns from text. These tools are used to prepare syllable data for the name generator.
Important: These are build-time tools only; they are not used during runtime name generation.
Tool Overview
| Tool | Description |
|---|---|
| Pyphen Syllable Extractor | Dictionary-based syllable extraction using pyphen (LibreOffice dictionaries) |
| NLTK Syllable Extractor | Phonetically-guided syllable extraction using NLTK CMUDict with onset/coda principles |
| Pyphen Syllable Normaliser | 3-step normalization pipeline for pyphen extractor output |
| NLTK Syllable Normaliser | NLTK-specific normalization with fragment cleaning for phonetically coherent syllables |
| Syllable Feature Annotator | Phonetic feature detection (onset, nucleus, coda features) |
| Name Combiner | Generate N-syllable name candidates with feature aggregation |
| Name Selector | Filter and rank candidates against name class policies |
| Corpus SQLite Builder | Convert annotated JSON to SQLite databases for fast Walker loading (TUI + web; optional performance optimization) |
| Syllable Walker | Explore phonetic feature space via cost-based random walks (CLI) |
| Syllable Walker Web | Combined Pipeline + Walker web interface with dual-patch corpus comparison |
| Syllable Walker TUI | Interactive TUI for exploring phonetic space with side-by-side patch configuration |
| Corpus Database | Build provenance ledger for tracking extraction runs (inputs, outputs, settings) |
| Corpus Database Viewer | Interactive TUI for viewing corpus database provenance records |
| Analysis Tools | Post-annotation analysis (feature signatures, t-SNE visualization, random sampling) |
| TUI Common Components | Shared TUI components (controls, browsers, keybinding config) for Textual-based tools |
| Pipeline TUI | Interactive TUI for running extraction, normalization, and annotation pipelines |
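To make the "onset, nucleus, coda" terminology in the overview concrete, here is a minimal sketch that splits a syllable by spelling alone. It is an illustration only and not the Syllable Feature Annotator's actual method: the function names `split_syllable` and `rough_features`, the feature keys, and the choice to treat only `a e i o u` as vowels are all assumptions made for this example.

```python
import re

# Assumption for this sketch: only these letters count as vowels.
VOWELS = "aeiou"


def split_syllable(syllable: str) -> dict[str, str]:
    """Split a syllable into onset, nucleus, and coda using spelling only."""
    s = syllable.lower()
    match = re.search(f"[{VOWELS}]+", s)
    if match is None:
        # No vowel letters found: treat the whole string as the onset.
        return {"onset": s, "nucleus": "", "coda": ""}
    return {
        "onset": s[: match.start()],   # consonants before the first vowel run
        "nucleus": match.group(),      # the first run of vowel letters
        "coda": s[match.end():],       # everything after that vowel run
    }


def rough_features(syllable: str) -> dict[str, bool]:
    """Derive a few hypothetical boolean features from the three parts."""
    parts = split_syllable(syllable)
    return {
        "has_onset": bool(parts["onset"]),
        "onset_cluster": len(parts["onset"]) > 1,
        "long_nucleus": len(parts["nucleus"]) > 1,
        "has_coda": bool(parts["coda"]),
    }


if __name__ == "__main__":
    for syl in ("stran", "ka", "oak"):
        print(syl, split_syllable(syl), rough_features(syl))
```

A spelling-based split like this is crude (digraphs such as "th" count as clusters, for instance), which is one reason a phonetic source such as CMUDict can give more coherent results for English.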
Quick Start
```bash
# Extract syllables from text (choose one extractor)
# Option 1: pyphen extractor (40+ languages, typographic splits)
python -m build_tools.pyphen_syllable_extractor --file input.txt --auto

# Option 2: NLTK extractor (English only, phonetic splits)
python -m build_tools.nltk_syllable_extractor --file input.txt

# Normalize extracted syllables (both use in-place processing)
# For pyphen extractor output:
python -m build_tools.pyphen_syllable_normaliser --run-dir _working/output/20260110_143022_pyphen/

# For NLTK extractor output:
python -m build_tools.nltk_syllable_normaliser --run-dir _working/output/20260110_095213_nltk/

# Annotate syllables with phonetic features
python -m build_tools.syllable_feature_annotator

# Generate name candidates
python -m build_tools.name_combiner \
    --run-dir _working/output/20260110_143022_pyphen/ \
    --syllables 2 --count 10000

# Select names for a class
python -m build_tools.name_selector \
    --run-dir _working/output/20260110_143022_pyphen/ \
    --candidates candidates/pyphen_candidates_2syl.json \
    --name-class first_name

# (Optional) Convert to SQLite for faster TUI loading
python -m build_tools.corpus_sqlite_builder _working/output/20260110_143022_pyphen/

# Explore syllable walks (choose one interface)
python -m build_tools.syllable_walk_web  # Browser-based Pipeline + Walker interface
python -m build_tools.syllable_walk_tui  # Terminal TUI with side-by-side comparison

# Analyze and visualize
python -m build_tools.syllable_analysis.tsne_visualizer --interactive
```
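If you run the post-extraction steps often, they can be chained from a small script. The sketch below only assembles the commands shown above and runs them in order, stopping at the first failure. The helper names `build_pipeline_commands` and `run_pipeline` are hypothetical and not part of `build_tools`. The sketch assumes a pyphen run directory and that the annotator is invoked without arguments, as in the Quick Start.

```python
import subprocess
import sys
from pathlib import Path


def build_pipeline_commands(
    run_dir: Path,
    syllables: int = 2,
    count: int = 10000,
    candidates: str = "candidates/pyphen_candidates_2syl.json",
    name_class: str = "first_name",
) -> list[list[str]]:
    """Return the Quick Start commands (normalise -> annotate -> combine -> select)."""
    py = sys.executable  # use the current interpreter instead of a bare "python"
    rd = str(run_dir)
    return [
        [py, "-m", "build_tools.pyphen_syllable_normaliser", "--run-dir", rd],
        # Invoked without arguments, exactly as in the Quick Start.
        [py, "-m", "build_tools.syllable_feature_annotator"],
        [py, "-m", "build_tools.name_combiner", "--run-dir", rd,
         "--syllables", str(syllables), "--count", str(count)],
        [py, "-m", "build_tools.name_selector", "--run-dir", rd,
         "--candidates", candidates, "--name-class", name_class],
    ]


def run_pipeline(run_dir: Path, dry_run: bool = False) -> None:
    """Run each step in order; check=True aborts on the first non-zero exit."""
    for cmd in build_pipeline_commands(run_dir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands without executing them.
    run_pipeline(Path("_working/output/20260110_143022_pyphen"), dry_run=True)
```

Keeping command construction separate from execution makes the sequence easy to inspect or log before anything is run.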
Detailed Documentation
- Pyphen Syllable Extractor
- NLTK Syllable Extractor
- Pyphen Syllable Normaliser
- NLTK Syllable Normaliser
- Syllable Feature Annotator
- Name Combiner
- Name Selector
- Corpus SQLite Builder
- Syllable Walker
- Syllable Walker Web
- Syllable Walker TUI
- Corpus Database
- Corpus Database Viewer
- Analysis Tools
- TUI Common Components
- Pipeline TUI