Build Tools

The pipeworks_name_generation project includes build-time tools for analyzing and extracting phonetic patterns from text. These tools prepare the syllable data that the name generator consumes.

Important: These are build-time tools only. They are not used during runtime name generation.

Tool Overview

Tool	Description
Pyphen Syllable Extractor	Dictionary-based syllable extraction using pyphen (LibreOffice dictionaries)
NLTK Syllable Extractor	Phonetically-guided syllable extraction using NLTK CMUDict with onset/coda principles
Pyphen Syllable Normaliser	3-step normalization pipeline for pyphen extractor output
NLTK Syllable Normaliser	NLTK-specific normalization with fragment cleaning for phonetically coherent syllables
Syllable Feature Annotator	Phonetic feature detection (onset, nucleus, coda features)
Syllable Walker	Explore phonetic feature space via cost-based random walks
Corpus Database	Build provenance ledger for tracking extraction runs (inputs, outputs, settings)
Corpus Database Viewer	Interactive TUI for viewing corpus database provenance records
Analysis Tools	Post-annotation analysis (feature signatures, t-SNE visualization, random sampling)
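
To make the onset/nucleus/coda terminology in the table concrete, here is a minimal sketch of splitting a syllable into those three parts and deriving a few boolean features. It is not the Syllable Feature Annotator's actual implementation: the vowel set, the function names (split_syllable, annotate), and the feature names are all assumptions chosen for illustration, and the split is purely orthographic.

```python
from __future__ import annotations

# Assumption: a purely orthographic vowel set; the real annotator may differ.
VOWELS = set("aeiou")


def split_syllable(syllable: str) -> tuple[str, str, str]:
    """Split a syllable into (onset, nucleus, coda) around its first vowel run."""
    s = syllable.lower()
    # Index of the first vowel, or len(s) if the syllable has no vowel.
    start = next((i for i, ch in enumerate(s) if ch in VOWELS), len(s))
    end = start
    while end < len(s) and s[end] in VOWELS:
        end += 1
    return s[:start], s[start:end], s[end:]


def annotate(syllable: str) -> dict:
    """Return hypothetical boolean features derived from the three parts."""
    onset, nucleus, coda = split_syllable(syllable)
    return {
        "syllable": syllable,
        "starts_with_vowel": onset == "" and nucleus != "",
        "starts_with_cluster": len(onset) >= 2,
        "multi_letter_nucleus": len(nucleus) >= 2,
        "ends_with_vowel": coda == "" and nucleus != "",
        "ends_with_cluster": len(coda) >= 2,
    }


if __name__ == "__main__":
    for syl in ["stran", "ea", "kirk"]:
        print(split_syllable(syl), annotate(syl))
```

Each feature here depends on exactly one of the three parts, so a syllable's annotation is just a small fixed-length record of flags.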

Quick Start

# Extract syllables from text (choose one extractor)

# Option 1: pyphen extractor (40+ languages, typographic splits)
python -m build_tools.pyphen_syllable_extractor --file input.txt --auto

# Option 2: NLTK extractor (English only, phonetic splits)
python -m build_tools.nltk_syllable_extractor --file input.txt

# Normalize extracted syllables (both normalisers process the run directory in place)

# For pyphen extractor output:
python -m build_tools.pyphen_syllable_normaliser --run-dir _working/output/20260110_143022_pyphen/

# For NLTK extractor output:
python -m build_tools.nltk_syllable_normaliser --run-dir _working/output/20260110_095213_nltk/

# Annotate syllables with phonetic features
python -m build_tools.syllable_feature_annotator

# Explore syllable walks (interactive)
python -m build_tools.syllable_walk data/annotated/syllables_annotated.json --web

# Analyze and visualize
python -m build_tools.syllable_analysis.tsne_visualizer --interactive
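
The Syllable Walker is described above as exploring feature space via cost-based random walks. The sketch below illustrates that general idea only and is not the tool's own algorithm. It assumes each syllable maps to a tuple of 0/1 features, uses Hamming distance as the step cost, and picks the next syllable with probability proportional to exp(-cost / temperature). The function names, the temperature parameter, and the sample data are all hypothetical.

```python
import math
import random


def hamming(a, b):
    """Number of feature positions in which two feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def walk(features, start, steps, temperature=1.0, seed=None):
    """Random walk over syllables where cheaper (more similar) steps are likelier.

    features: dict mapping syllable -> tuple of 0/1 feature flags (assumed layout).
    """
    rng = random.Random(seed)
    path = [start]
    current = start
    for _ in range(steps):
        # Every other syllable is a candidate; similar ones get larger weights.
        candidates = [s for s in features if s != current]
        weights = [
            math.exp(-hamming(features[current], features[s]) / temperature)
            for s in candidates
        ]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(current)
    return path


if __name__ == "__main__":
    # Hypothetical sample data: four syllables with three binary features each.
    sample = {
        "ka": (0, 0, 1),
        "ko": (0, 0, 1),
        "stra": (1, 0, 1),
        "elk": (0, 1, 0),
    }
    print(walk(sample, "ka", steps=5, temperature=0.5, seed=1))
```

A low temperature keeps the walk close to phonetically similar syllables, while a high temperature makes every step nearly uniform.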

Detailed Documentation