build_tools.pyphen_syllable_extractor
Syllable Extractor - Dictionary-Based Syllable Extraction
The syllable extractor uses dictionary-based hyphenation to extract syllables from text files. It is a build-time tool only and is not used during runtime name generation.
The tool supports two modes:
Interactive Mode - Guided prompts for single-file processing
Batch Mode - Automated processing of multiple files via command-line arguments
Features:
Dictionary-based hyphenation using pyphen (LibreOffice dictionaries)
Support for 40+ languages
Automatic language detection (optional, via langdetect)
Configurable syllable length constraints
Deterministic extraction (same input = same output)
Unicode support for accented characters
Comprehensive metadata and statistics
Automatic provenance tracking via corpus_db ledger (batch mode)
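The length constraints and the determinism guarantee listed above can be illustrated with a minimal, standard-library-only sketch. This is not the package's implementation: `TOY_HYPHENATION`, `toy_hyphenate`, and `extract_syllables` are hypothetical names, and the small lookup table stands in for the pyphen dictionaries that the real tool relies on.

```python
import re
from typing import Callable, Set

# Hypothetical stand-in for a hyphenation dictionary. The real tool uses
# pyphen; this table exists only so the sketch runs without dependencies.
TOY_HYPHENATION = {
    "hello": "hel-lo",
    "wonderful": "won-der-ful",
    "world": "world",
}


def toy_hyphenate(word: str) -> str:
    """Return the word with '-' at hyphenation points (unknown words unchanged)."""
    return TOY_HYPHENATION.get(word, word)


def extract_syllables(
    text: str,
    hyphenate: Callable[[str], str] = toy_hyphenate,
    min_len: int = 2,
    max_len: int = 8,
) -> Set[str]:
    """Collect unique syllables whose length lies within [min_len, max_len]."""
    syllables: Set[str] = set()
    # [^\W\d_]+ matches runs of Unicode letters, so accented characters survive.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        for part in hyphenate(word).split("-"):
            if min_len <= len(part) <= max_len:
                syllables.add(part)
    return syllables


# Sorting the set gives a stable order, so the same input prints the same output.
print(sorted(extract_syllables("Hello wonderful world")))
```

Passing a different `hyphenate` callable is how a real dictionary lookup would be swapped in; the filtering and de-duplication steps stay the same.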
Main Components:
SyllableExtractor: Core extraction class
ExtractionResult: Data model for extraction results
FileProcessingResult: Result for single file in batch mode
BatchResult: Aggregate results for batch processing
SUPPORTED_LANGUAGES: Dictionary of supported language codes
Usage:
>>> from pathlib import Path
>>> from build_tools.pyphen_syllable_extractor import SyllableExtractor
>>>
>>> # Initialize extractor for English (US)
>>> extractor = SyllableExtractor('en_US', min_syllable_length=2, max_syllable_length=8)
>>>
>>> # Extract syllables from text
>>> syllables = extractor.extract_syllables_from_text("Hello wonderful world")
>>> print(sorted(syllables))
['der', 'ful', 'hel', 'lo', 'won', 'world']
>>>
>>> # Extract from a file
>>> syllables = extractor.extract_syllables_from_file(Path('input.txt'))
>>>
>>> # Save results
>>> extractor.save_syllables(syllables, Path('output.txt'))
CLI Usage:
# Interactive mode
python -m build_tools.pyphen_syllable_extractor

# Single file with specific language
python -m build_tools.pyphen_syllable_extractor --file input.txt --lang en_US

# Batch processing with auto-detection
python -m build_tools.pyphen_syllable_extractor --source ~/docs/ --recursive --auto
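As a rough illustration of how the flags shown above could select between the two modes, here is a hedged `argparse` sketch. The flag names come from the commands above; `build_parser`, `select_mode`, the help strings, and the rule "no input arguments means interactive mode" are assumptions for illustration, and the actual parser in the `cli` submodule may differ.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the CLI examples."""
    parser = argparse.ArgumentParser(prog="pyphen_syllable_extractor")
    parser.add_argument("--file", type=Path, help="single input file")
    parser.add_argument("--source", type=Path, help="directory of input files")
    parser.add_argument("--recursive", action="store_true",
                        help="descend into subdirectories of --source")
    parser.add_argument("--lang", help="language code, e.g. en_US")
    parser.add_argument("--auto", action="store_true",
                        help="detect the language automatically")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Assumed rule: any input argument means batch, otherwise interactive."""
    if args.file is not None or args.source is not None:
        return "batch"
    return "interactive"


if __name__ == "__main__":
    print(select_mode(build_parser().parse_args()))
```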
Submodules
- build_tools.pyphen_syllable_extractor.batch
- build_tools.pyphen_syllable_extractor.cli
- build_tools.pyphen_syllable_extractor.extractor
- build_tools.pyphen_syllable_extractor.file_io
- build_tools.pyphen_syllable_extractor.interactive
- build_tools.pyphen_syllable_extractor.language_detection
- build_tools.pyphen_syllable_extractor.languages
- build_tools.pyphen_syllable_extractor.models
Attributes
Package Contents
- build_tools.pyphen_syllable_extractor.main_interactive
- build_tools.pyphen_syllable_extractor.main_batch
- build_tools.pyphen_syllable_extractor.process_single_file_batch