Build Tools

The pipeworks_name_generation project includes build-time tools for analyzing and extracting phonetic patterns from text. These tools are used to prepare syllable data for the name generator.

Important: These are build-time tools only - they are not used during runtime name generation.

Tool Overview

Tool: Description
Pyphen Syllable Extractor: Dictionary-based syllable extraction using pyphen (LibreOffice dictionaries)
NLTK Syllable Extractor: Phonetically-guided syllable extraction using NLTK CMUDict with onset/coda principles
Pyphen Syllable Normaliser: 3-step normalization pipeline for pyphen extractor output
NLTK Syllable Normaliser: NLTK-specific normalization with fragment cleaning for phonetically coherent syllables
Syllable Feature Annotator: Phonetic feature detection (onset, nucleus, coda features)
Syllable Walker: Explore phonetic feature space via cost-based random walks
Corpus Database: Build provenance ledger for tracking extraction runs (inputs, outputs, settings)
Corpus Database Viewer: Interactive TUI for viewing corpus database provenance records
Analysis Tools: Post-annotation analysis (feature signatures, t-SNE visualization, random sampling)
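
To make the onset, nucleus, and coda terminology used above concrete, here is a minimal sketch that splits one written syllable into those three parts. It is purely illustrative: the function name `split_syllable` and the letter-based vowel set are assumptions, and the project's annotator may define these features differently (for example from phonemes rather than spelling).

```python
import re

# Assumption: only a, e, i, o, u count as vowels. Real phonetic analysis
# works on sounds, not letters, so this is a rough orthographic stand-in.
_SYLLABLE_RE = re.compile(r"^([^aeiou]*)([aeiou]+)(.*)$")


def split_syllable(syllable: str) -> tuple[str, str, str]:
    """Split a syllable into (onset, nucleus, coda) based on spelling."""
    text = syllable.lower()
    match = _SYLLABLE_RE.match(text)
    if match is None:
        # No vowel letter found: report everything as onset.
        return (text, "", "")
    return (match.group(1), match.group(2), match.group(3))


print(split_syllable("strand"))  # ('str', 'a', 'nd')
```

The onset is the leading consonant run, the nucleus is the first vowel run, and the coda is whatever follows.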

Quick Start

# Extract syllables from text (choose one extractor)

# Option 1: pyphen extractor (40+ languages, typographic splits)
python -m build_tools.pyphen_syllable_extractor --file input.txt --auto

# Option 2: NLTK extractor (English only, phonetic splits)
python -m build_tools.nltk_syllable_extractor --file input.txt

# Normalize extracted syllables (both normalisers process the run directory in place)

# For pyphen extractor output:
python -m build_tools.pyphen_syllable_normaliser --run-dir _working/output/20260110_143022_pyphen/

# For NLTK extractor output:
python -m build_tools.nltk_syllable_normaliser --run-dir _working/output/20260110_095213_nltk/

# Annotate syllables with phonetic features
python -m build_tools.syllable_feature_annotator

# Explore syllable walks (interactive)
python -m build_tools.syllable_walk data/annotated/syllables_annotated.json --web

# Analyze and visualize
python -m build_tools.syllable_analysis.tsne_visualizer --interactive
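
The extract, normalise, and annotate steps above can also be driven from a short Python script. The sketch below only assembles and runs the commands already shown. The helper names (`latest_run_dir`, `extract_command`, `normalise_command`, `run_pipeline`) are hypothetical, and the assumption that run directories are named `YYYYMMDD_HHMMSS_<extractor>` under `_working/output/` is inferred from the example paths rather than from documented behaviour. Whether the extractors run fully non-interactively with these flags is also not confirmed here.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Assumption: the annotator needs no arguments, as in the Quick Start.
ANNOTATE_COMMAND = [sys.executable, "-m", "build_tools.syllable_feature_annotator"]


def latest_run_dir(output_root: Path, extractor: str) -> Optional[Path]:
    """Return the newest run directory for an extractor, or None if absent.

    Assumes names like 20260110_143022_pyphen, which sort chronologically.
    """
    runs = sorted(p for p in output_root.glob(f"*_{extractor}") if p.is_dir())
    return runs[-1] if runs else None


def extract_command(input_file: str, extractor: str) -> list[str]:
    """Build the extractor command for 'pyphen' or 'nltk'."""
    command = [sys.executable, "-m", f"build_tools.{extractor}_syllable_extractor",
               "--file", input_file]
    if extractor == "pyphen":
        command.append("--auto")  # the Quick Start passes --auto only for pyphen
    return command


def normalise_command(run_dir: Path, extractor: str) -> list[str]:
    """Build the matching normaliser command for a run directory."""
    return [sys.executable, "-m", f"build_tools.{extractor}_syllable_normaliser",
            "--run-dir", str(run_dir)]


def run_pipeline(input_file: str, extractor: str = "pyphen",
                 output_root: Path = Path("_working/output")) -> None:
    """Run extraction, normalisation, and annotation in order."""
    subprocess.run(extract_command(input_file, extractor), check=True)
    run_dir = latest_run_dir(output_root, extractor)
    if run_dir is None:
        raise FileNotFoundError(f"no {extractor} run found under {output_root}")
    subprocess.run(normalise_command(run_dir, extractor), check=True)
    subprocess.run(ANNOTATE_COMMAND, check=True)
```

Calling `run_pipeline("input.txt", extractor="nltk")` would mirror Option 2 above. Using `check=True` stops the script at the first failing step, so a failed extraction is never followed by normalisation of an older run.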

Detailed Documentation