build_tools.pyphen_syllable_extractor

Syllable Extractor - Dictionary-Based Syllable Extraction

The syllable extractor uses dictionary-based hyphenation to extract syllables from text files. It is a build-time tool only and is not used during runtime name generation.

The tool supports two modes:

  • Interactive Mode - Guided prompts for single-file processing

  • Batch Mode - Automated processing of multiple files via command-line arguments

Features:

  • Dictionary-based hyphenation using pyphen (LibreOffice dictionaries)

  • Support for 40+ languages

  • Automatic language detection (optional, via langdetect)

  • Configurable syllable length constraints

  • Deterministic extraction (the same input always produces the same output)

  • Unicode support for accented characters

  • Comprehensive metadata and statistics

  • Automatic provenance tracking via corpus_db ledger (batch mode)
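The length constraints and deterministic output listed above can be illustrated with a small standalone sketch. It is not the module's implementation: `filter_syllables` is a hypothetical helper, its input is assumed to be words already hyphenated with `-` (the step pyphen performs in the real tool), and the lowercasing is an assumption inferred from the usage example in this page.

```python
from typing import Iterable, List


def filter_syllables(
    hyphenated_words: Iterable[str],
    min_len: int = 2,
    max_len: int = 8,
) -> List[str]:
    """Hypothetical helper: split pre-hyphenated words and apply length bounds."""
    unique = set()
    for word in hyphenated_words:
        # Assumption: syllables are normalised to lowercase before deduplication.
        for syllable in word.lower().split("-"):
            if min_len <= len(syllable) <= max_len:
                unique.add(syllable)
    # Sorting removes any dependence on set iteration order,
    # so the same input always yields the same list.
    return sorted(unique)


if __name__ == "__main__":
    # "a" is shorter than min_len and is dropped.
    print(filter_syllables(["hel-lo", "won-der-ful", "world", "a"]))
```

Returning a sorted list rather than a raw set is one simple way to keep output files stable between runs; the actual tool may achieve determinism differently.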

Main Components:

  • SyllableExtractor: Core extraction class

  • ExtractionResult: Data model for extraction results

  • FileProcessingResult: Result for a single file in batch mode

  • BatchResult: Aggregate results for batch processing

  • SUPPORTED_LANGUAGES: Dictionary of supported language codes

Usage:
>>> from pathlib import Path
>>> from build_tools.pyphen_syllable_extractor import SyllableExtractor
>>>
>>> # Initialize extractor for English (US)
>>> extractor = SyllableExtractor('en_US', min_syllable_length=2, max_syllable_length=8)
>>>
>>> # Extract syllables from text
>>> syllables = extractor.extract_syllables_from_text("Hello wonderful world")
>>> print(sorted(syllables))
['der', 'ful', 'hel', 'lo', 'won', 'world']
>>>
>>> # Extract from a file
>>> syllables = extractor.extract_syllables_from_file(Path('input.txt'))
>>>
>>> # Save results
>>> extractor.save_syllables(syllables, Path('output.txt'))

CLI Usage:

# Interactive mode
python -m build_tools.pyphen_syllable_extractor

# Single file with specific language
python -m build_tools.pyphen_syllable_extractor --file input.txt --lang en_US

# Batch processing with auto-detection
python -m build_tools.pyphen_syllable_extractor --source ~/docs/ --recursive --auto
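As a rough illustration of what a recursive batch scan such as `--source ~/docs/ --recursive` involves, the sketch below gathers candidate files with `pathlib`. It is an assumption about the general approach, not the tool's actual code: `collect_text_files` and the `*.txt` default pattern are hypothetical.

```python
from pathlib import Path
from typing import List


def collect_text_files(
    source: Path,
    recursive: bool = False,
    pattern: str = "*.txt",
) -> List[Path]:
    """Hypothetical helper: list matching files under source in a stable order."""
    # Expand a leading "~" so paths like ~/docs/ resolve to the home directory.
    source = source.expanduser()
    # rglob descends into subdirectories; glob looks at the top level only.
    matches = source.rglob(pattern) if recursive else source.glob(pattern)
    # Sorting keeps the processing order independent of filesystem ordering.
    return sorted(p for p in matches if p.is_file())
```

Each returned path could then be passed to `extract_syllables_from_file`, as in the usage example above.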

Submodules