Build Tools

The pipeworks_name_generation project includes build-time tools for analyzing and extracting phonetic patterns from text. These tools are used to prepare syllable data for the name generator.

Important: These are build-time tools only; they are not used during runtime name generation.

Tool Overview

Tool                        Description
Pyphen Syllable Extractor   Dictionary-based syllable extraction using pyphen (LibreOffice dictionaries)
NLTK Syllable Extractor     Phonetically-guided syllable extraction using NLTK CMUDict with onset/coda principles
Pyphen Syllable Normaliser  3-step normalization pipeline for pyphen extractor output
NLTK Syllable Normaliser    NLTK-specific normalization with fragment cleaning for phonetically coherent syllables
Syllable Feature Annotator  Phonetic feature detection (onset, nucleus, coda features)
Name Combiner               Generate N-syllable name candidates with feature aggregation
Name Selector               Filter and rank candidates against name class policies
Corpus SQLite Builder       Convert annotated JSON to SQLite databases for fast Walker loading (TUI + web; optional performance optimization)
Syllable Walker             Explore phonetic feature space via cost-based random walks (CLI)
Syllable Walker Web         Combined Pipeline + Walker web interface with dual-patch corpus comparison
Syllable Walker TUI         Interactive TUI for exploring phonetic space with side-by-side patch configuration
Corpus Database             Build provenance ledger for tracking extraction runs (inputs, outputs, settings)
Corpus Database Viewer      Interactive TUI for viewing corpus database provenance records
Analysis Tools              Post-annotation analysis (feature signatures, t-SNE visualization, random sampling)
TUI Common Components       Shared TUI components (controls, browsers, keybinding config) for Textual-based tools
Pipeline TUI                Interactive TUI for running extraction, normalization, and annotation pipelines

Quick Start

# Extract syllables from text (choose one extractor)

# Option 1: pyphen extractor (40+ languages, typographic splits)
python -m build_tools.pyphen_syllable_extractor --file input.txt --auto

# Option 2: NLTK extractor (English only, phonetic splits)
python -m build_tools.nltk_syllable_extractor --file input.txt

# Normalize extracted syllables (both use in-place processing)

# For pyphen extractor output:
python -m build_tools.pyphen_syllable_normaliser --run-dir _working/output/20260110_143022_pyphen/

# For NLTK extractor output:
python -m build_tools.nltk_syllable_normaliser --run-dir _working/output/20260110_095213_nltk/

# Annotate syllables with phonetic features
python -m build_tools.syllable_feature_annotator

# Generate name candidates
python -m build_tools.name_combiner \
    --run-dir _working/output/20260110_143022_pyphen/ \
    --syllables 2 --count 10000

# Select names for a class
python -m build_tools.name_selector \
    --run-dir _working/output/20260110_143022_pyphen/ \
    --candidates candidates/pyphen_candidates_2syl.json \
    --name-class first_name

# (Optional) Convert to SQLite for faster TUI loading
python -m build_tools.corpus_sqlite_builder _working/output/20260110_143022_pyphen/

# Explore syllable walks (choose one interface)
python -m build_tools.syllable_walk_web      # Browser-based Pipeline + Walker interface
python -m build_tools.syllable_walk_tui      # Terminal TUI with side-by-side comparison

# Analyze and visualize
python -m build_tools.syllable_analysis.tsne_visualizer --interactive
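
The post-extraction steps above (normalise, annotate, combine, select) all work on the same run directory, so they can be chained from a small driver script. The following is a minimal sketch and not part of the project: build_pipeline_commands and run_pipeline are hypothetical helper names, and the only module names and flags used are the ones shown in the Quick Start. The candidates path is passed in explicitly because only the pyphen 2-syllable file name appears above.

```python
import shlex
import subprocess
import sys


def build_pipeline_commands(run_dir, candidates, extractor="pyphen",
                            syllables=2, count=10000, name_class="first_name"):
    """Assemble the post-extraction Quick Start commands as argv lists.

    Hypothetical helper, not part of build_tools. It uses only the module
    names and flags that appear in the Quick Start.
    """
    if extractor not in ("pyphen", "nltk"):
        raise ValueError(f"unknown extractor: {extractor!r}")
    py = sys.executable  # run the tools with the current interpreter
    return [
        # Normalise in place, using the normaliser that matches the extractor.
        [py, "-m", f"build_tools.{extractor}_syllable_normaliser",
         "--run-dir", run_dir],
        # Annotate syllables (invoked without arguments, as in the Quick Start).
        [py, "-m", "build_tools.syllable_feature_annotator"],
        # Generate N-syllable name candidates.
        [py, "-m", "build_tools.name_combiner", "--run-dir", run_dir,
         "--syllables", str(syllables), "--count", str(count)],
        # Filter and rank the candidates for one name class.
        [py, "-m", "build_tools.name_selector", "--run-dir", run_dir,
         "--candidates", candidates, "--name-class", name_class],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    commands = build_pipeline_commands(
        "_working/output/20260110_143022_pyphen/",
        "candidates/pyphen_candidates_2syl.json",
    )
    run_pipeline(commands)  # dry run: prints the four commands only
```

Calling run_pipeline(commands, dry_run=False) would execute the steps in order and stop at the first failure. Extraction is left out on purpose, since the run directory name is only known once the extractor has finished.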

Detailed Documentation