Build Tools
The pipeworks_name_generation project includes build-time tools for analyzing and extracting phonetic patterns from text. These tools prepare syllable data for the name generator.
Important: These are build-time tools only; they are not used during runtime name generation.
Tool Overview
| Tool | Description |
|---|---|
| pyphen_syllable_extractor | Dictionary-based syllable extraction using pyphen (LibreOffice dictionaries) |
| nltk_syllable_extractor | Phonetically-guided syllable extraction using NLTK CMUDict with onset/coda principles |
| pyphen_syllable_normaliser | 3-step normalization pipeline for pyphen extractor output |
| nltk_syllable_normaliser | NLTK-specific normalization with fragment cleaning for phonetically coherent syllables |
| syllable_feature_annotator | Phonetic feature detection (onset, nucleus, coda features) |
| syllable_walk | Explore phonetic feature space via cost-based random walks |
|  | Build provenance ledger for tracking extraction runs (inputs, outputs, settings) |
|  | Interactive TUI for viewing corpus database provenance records |
| syllable_analysis | Post-annotation analysis (feature signatures, t-SNE visualization, random sampling) |
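
To make the "cost-based random walk" idea from the overview more concrete, here is a minimal sketch of one possible reading of it. It is not the syllable_walk implementation: the syllables, the feature names, the cost function (number of differing features), and the `temperature` parameter are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical annotated syllables. The feature names are invented for this
# sketch and are NOT the schema produced by syllable_feature_annotator.
SYLLABLES = {
    "ka": frozenset({"starts_with_stop", "ends_with_vowel"}),
    "ta": frozenset({"starts_with_stop", "ends_with_vowel"}),
    "an": frozenset({"starts_with_vowel", "ends_with_nasal"}),
    "sur": frozenset({"starts_with_fricative", "ends_with_liquid"}),
    "kan": frozenset({"starts_with_stop", "ends_with_nasal"}),
}


def step_cost(a, b):
    """Assumed cost of moving from a to b: count of features that differ."""
    return len(SYLLABLES[a] ^ SYLLABLES[b])


def walk(start, steps, temperature=1.0, rng=None):
    """Random walk where cheaper moves are exponentially more likely."""
    rng = rng or random.Random()
    path = [start]
    current = start
    for _ in range(steps):
        candidates = [s for s in SYLLABLES if s != current]
        # Weight each candidate by exp(-cost / temperature): low-cost
        # (phonetically similar) syllables are favoured, never required.
        weights = [
            math.exp(-step_cost(current, s) / temperature) for s in candidates
        ]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(current)
    return path
```

Under these assumptions, a lower `temperature` keeps the walk close to similar-sounding syllables, while a higher one lets it wander more freely across the feature space.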
Quick Start
```bash
# Extract syllables from text (choose one extractor)
# Option 1: pyphen extractor (40+ languages, typographic splits)
python -m build_tools.pyphen_syllable_extractor --file input.txt --auto

# Option 2: NLTK extractor (English only, phonetic splits)
python -m build_tools.nltk_syllable_extractor --file input.txt

# Normalize extracted syllables (both use in-place processing)
# For pyphen extractor output:
python -m build_tools.pyphen_syllable_normaliser --run-dir _working/output/20260110_143022_pyphen/

# For NLTK extractor output:
python -m build_tools.nltk_syllable_normaliser --run-dir _working/output/20260110_095213_nltk/

# Annotate syllables with phonetic features
python -m build_tools.syllable_feature_annotator

# Explore syllable walks (interactive)
python -m build_tools.syllable_walk data/annotated/syllables_annotated.json --web

# Analyze and visualize
python -m build_tools.syllable_analysis.tsne_visualizer --interactive
```
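
The normaliser commands above take a timestamped run directory, which changes with every extraction run. The sketch below shows one way to locate the newest run directory and build the matching normaliser command. It assumes run directories are named `YYYYMMDD_HHMMSS_<extractor>`, which is inferred only from the two example paths above; `latest_run_dir` and `normaliser_command` are hypothetical helpers, not part of build_tools.

```python
import re
from pathlib import Path

# Assumed naming scheme, inferred from the example paths:
# 20260110_143022_pyphen, 20260110_095213_nltk
RUN_DIR_PATTERN = re.compile(r"^(\d{8}_\d{6})_(pyphen|nltk)$")


def latest_run_dir(output_root, extractor):
    """Return the newest run directory for an extractor, or None if absent."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    matches = []
    for child in root.iterdir():
        m = RUN_DIR_PATTERN.match(child.name)
        if child.is_dir() and m and m.group(2) == extractor:
            matches.append((m.group(1), child))
    if not matches:
        return None
    # Fixed-width timestamps sort chronologically as plain strings.
    return max(matches, key=lambda pair: pair[0])[1]


def normaliser_command(run_dir, extractor):
    """Build the argument list for the matching normaliser module."""
    return [
        "python",
        "-m",
        f"build_tools.{extractor}_syllable_normaliser",
        "--run-dir",
        str(run_dir),
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)`, for example after calling `latest_run_dir("_working/output", "pyphen")`. Only the `--run-dir` flag shown in the Quick Start is used here.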