build_tools.nltk_syllable_extractor
===================================

.. py:module:: build_tools.nltk_syllable_extractor

.. autoapi-nested-parse::

   NLTK Syllable Extractor - Phonetically-Guided Syllable Extraction

   The NLTK syllable extractor uses the CMU Pronouncing Dictionary (via the ``cmudict`` pip package)
   together with onset/coda principles to perform phonetically guided orthographic syllabification.
   This is a **build-time tool only**; it is not used during runtime name generation.

   The tool supports two modes:

   - **Interactive Mode** - Guided prompts for single-file processing
   - **Batch Mode** - Automated processing of multiple files via command-line arguments

   Features:

   - Phonetically-guided syllabification using the CMU Pronouncing Dictionary (via the ``cmudict`` package)
   - Onset/coda principles for natural consonant cluster splitting
   - English only (CMUDict limitation)
   - Preserves all syllables including duplicates (extraction only, no filtering)
   - Configurable syllable length constraints (defaults to no filtering)
   - Deterministic extraction (same input = same output)
   - Unicode support
   - Comprehensive metadata and statistics
   - Automatic provenance tracking via corpus_db ledger (batch mode)
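
   Because duplicates are preserved and no length filtering is applied by default,
   frequency counts and length limits can be handled downstream. The following is a
   minimal sketch of such post-processing, assuming the syllables arrive as a plain
   list of strings. The sample list and the ``min_len``/``max_len`` bounds are
   hypothetical and are not taken from the extractor itself.

   .. code-block:: python

      from collections import Counter

      # Hypothetical extraction output; duplicates are kept, as the extractor does.
      syllables = ["hel", "lo", "won", "der", "ful", "world", "hel", "lo"]

      # Frequency information survives because nothing was deduplicated.
      counts = Counter(syllables)

      # Illustrative length bounds applied after extraction (assumed values).
      min_len, max_len = 2, 4
      kept = [s for s in syllables if min_len <= len(s) <= max_len]

      print(counts["hel"])                      # 2
      print(len(syllables), len(set(syllables)))  # 8 6
      print(kept)  # ['hel', 'lo', 'won', 'der', 'ful', 'hel', 'lo']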

   Key Differences from pyphen Extractor:

   - Uses phonetic information (CMUDict) rather than typographic hyphenation rules
   - Respects phonotactic constraints via onset/coda principles
   - Produces more "natural" phonetic splits (e.g., "Andrew" → "An-drew", not "And-rew")
   - English only, whereas pyphen supports 40+ languages
   - Complementary tool, not a replacement
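
   The onset/coda idea can be illustrated with a minimal sketch of the maximal onset
   principle applied to ARPAbet phonemes. This is not the extractor's actual
   implementation: ``LEGAL_ONSETS`` is a small hypothetical subset of English onsets,
   ``split_phonemes`` is an illustrative helper, single consonants are always treated
   as legal onsets, and the pronunciation of "Andrew" is hard-coded as an assumed
   CMUDict-style entry rather than looked up through ``cmudict``.

   .. code-block:: python

      # Illustrative subset of legal English onset clusters (assumption, not the tool's table).
      LEGAL_ONSETS = {
          ("D", "R"), ("T", "R"), ("B", "R"), ("P", "L"), ("S", "T"), ("S", "T", "R"),
      }


      def is_vowel(phoneme):
          # CMUDict-style ARPAbet vowels end in a stress digit (0, 1, or 2).
          return phoneme[-1].isdigit()


      def split_phonemes(phonemes):
          """Split phonemes into syllables, giving each vowel the longest legal onset."""
          vowel_idx = [i for i, p in enumerate(phonemes) if is_vowel(p)]
          if not vowel_idx:
              return [list(phonemes)]
          boundaries = [0]
          for prev, nxt in zip(vowel_idx, vowel_idx[1:]):
              cluster = phonemes[prev + 1:nxt]
              split = len(cluster)  # fallback: every consonant stays in the coda
              # Try the longest candidate onset first, then shorter ones.
              for k in range(len(cluster) + 1):
                  onset = tuple(cluster[k:])
                  if len(onset) <= 1 or onset in LEGAL_ONSETS:
                      split = k
                      break
              boundaries.append(prev + 1 + split)
          boundaries.append(len(phonemes))
          return [list(phonemes[a:b]) for a, b in zip(boundaries, boundaries[1:])]


      # Assumed pronunciation for "Andrew".
      andrew = ["AE1", "N", "D", "R", "UW2"]
      print(split_phonemes(andrew))  # [['AE1', 'N'], ['D', 'R', 'UW2']]

   The cluster ``N D R`` is not in the onset set, but ``D R`` is, so ``N`` closes the
   first syllable and ``D R`` opens the second, which corresponds to "An-drew".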

   Main Components:

   - ``NltkSyllableExtractor``: Core extraction class
   - ``ExtractionResult``: Data model for extraction results
   - ``FileProcessingResult``: Result for a single file in batch mode
   - ``BatchResult``: Aggregate results for batch processing

   Usage:
       >>> from pathlib import Path
       >>> from build_tools.nltk_syllable_extractor import NltkSyllableExtractor
       >>>
       >>> # Initialize extractor for English (defaults to no length filtering)
       >>> extractor = NltkSyllableExtractor('en_US')
       >>>
       >>> # Extract syllables from text (preserves duplicates)
       >>> syllables, stats = extractor.extract_syllables_from_text("Hello wonderful world")
       >>> print(syllables)  # Note: includes all syllables with duplicates
       ['hel', 'lo', 'won', 'der', 'ful', 'world']
       >>> print(f"Total: {len(syllables)}, Unique: {len(set(syllables))}")
       Total: 6, Unique: 6
       >>>
       >>> # Extract from a file
       >>> syllables, stats = extractor.extract_syllables_from_file(Path('input.txt'))
       >>>
       >>> # Save results (preserves duplicates)
       >>> extractor.save_syllables(syllables, Path('output.txt'))

   CLI Usage:

       .. code-block:: bash

          # Interactive mode
          python -m build_tools.nltk_syllable_extractor

          # Single file
          python -m build_tools.nltk_syllable_extractor --file input.txt

          # Batch processing
          python -m build_tools.nltk_syllable_extractor --source ~/docs/ --recursive
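
   When batch mode is driven from another build script, the same invocation can be
   assembled programmatically. The sketch below only builds and prints the argument
   list; ``build_batch_command`` is a hypothetical helper, and only the ``--source``
   and ``--recursive`` flags shown above are assumed to exist.

   .. code-block:: python

      import shlex
      import sys
      from pathlib import Path


      def build_batch_command(source_dir, recursive=True):
          """Assemble the documented batch-mode invocation as an argument list."""
          # Expand "~" here, since no shell will do it when the list is run directly.
          source = str(Path(source_dir).expanduser())
          cmd = [sys.executable, "-m", "build_tools.nltk_syllable_extractor",
                 "--source", source]
          if recursive:
              cmd.append("--recursive")
          return cmd


      command = build_batch_command("corpus")
      print(shlex.join(command))

   The resulting list can be passed to ``subprocess.run(command, check=True)`` in an
   environment where the package is installed.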


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/build_tools/nltk_syllable_extractor/batch/index
   /autoapi/build_tools/nltk_syllable_extractor/cli/index
   /autoapi/build_tools/nltk_syllable_extractor/extractor/index
   /autoapi/build_tools/nltk_syllable_extractor/file_io/index
   /autoapi/build_tools/nltk_syllable_extractor/interactive/index
   /autoapi/build_tools/nltk_syllable_extractor/models/index


Attributes
----------

.. autoapisummary::

   build_tools.nltk_syllable_extractor.main_interactive
   build_tools.nltk_syllable_extractor.main_batch
   build_tools.nltk_syllable_extractor.process_single_file_batch


Package Contents
----------------

.. py:data:: main_interactive

.. py:data:: main_batch

.. py:data:: process_single_file_batch