build_tools.nltk_syllable_normaliser.cli
========================================

.. py:module:: build_tools.nltk_syllable_normaliser.cli

.. autoapi-nested-parse::

   Command-line interface for NLTK syllable normalization pipeline.

   This module provides the main CLI entry point for the nltk_syllable_normaliser
   tool, which runs NLTK extractor output through a fragment cleaning and
   normalization pipeline.


Functions
---------

.. autoapisummary::

   build_tools.nltk_syllable_normaliser.cli.detect_nltk_run_directories
   build_tools.nltk_syllable_normaliser.cli.run_full_pipeline
   build_tools.nltk_syllable_normaliser.cli.create_argument_parser
   build_tools.nltk_syllable_normaliser.cli.parse_arguments
   build_tools.nltk_syllable_normaliser.cli.main


Module Contents
---------------

.. py:function:: detect_nltk_run_directories(source_dir)

   Detect NLTK run directories within the source directory.

   Searches for directories matching the pattern YYYYMMDD_HHMMSS_nltk/ which
   contain a syllables/ subdirectory.

   :param source_dir: Directory to search for NLTK run directories.

   :returns: List of Path objects pointing to NLTK run directories, sorted by
             directory name (chronological order).

   .. admonition:: Example

      >>> from pathlib import Path
      >>> source = Path("_working/output/")
      >>> runs = detect_nltk_run_directories(source)
      >>> for run in runs:
      ...     print(run.name)
      20260110_095213_nltk
      20260110_143022_nltk


.. py:function:: run_full_pipeline(run_directory, config, verbose = False, skip_fragment_cleaning = False)

   Run complete NLTK normalization pipeline with in-place processing.

   Executes the full NLTK-specific workflow:

   1. Aggregate syllables from ``run_directory/syllables/*.txt``
   2. Fragment cleaning (NLTK-specific preprocessing)
   3. Canonicalize syllables (Unicode normalization, etc.)
   4. Frequency analysis
   5. Write 5 output files to run_directory (in-place)

   :param run_directory: NLTK run directory (e.g., _working/output/20260110_095213_nltk/).
   :param config: NormalizationConfig specifying normalization parameters.
   :param verbose: If True, print detailed progress information.
   :param skip_fragment_cleaning: If True, skip fragment cleaning step (for comparison).

   :returns: NormalizationResult containing all outputs, statistics, and file paths.

   :raises FileNotFoundError: If run_directory or syllables/ subdirectory doesn't exist.
   :raises ValueError: If run_directory is not a directory.

   .. admonition:: Example

      >>> from pathlib import Path
      >>> config = NormalizationConfig(min_length=2, max_length=8)
      >>> run_dir = Path("_working/output/20260110_095213_nltk/")
      >>> result = run_full_pipeline(
      ...     run_directory=run_dir,
      ...     config=config,
      ...     verbose=True
      ... )
      >>> result.stats.raw_count
      15234
      >>> result.stats.unique_canonical
      4821


.. py:function:: create_argument_parser()

   Create and return the argument parser for NLTK syllable normaliser.

   :returns: Configured ArgumentParser ready to parse command-line arguments.


.. py:function:: parse_arguments(args = None)

   Parse command-line arguments.


.. py:function:: main(args = None)

   Main entry point for CLI.

   :param args: Command-line arguments (for testing). If None, uses sys.argv.

   :returns: Exit code (0 for success, 1 for error).
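
The detection rule documented for ``detect_nltk_run_directories`` (a directory named YYYYMMDD_HHMMSS_nltk that contains a syllables/ subdirectory, with results sorted by name) can be sketched as below. This is a minimal illustration under stated assumptions, not the module's actual implementation: the function name ``find_nltk_run_dirs``, the regular expression, and the ``demo`` helper are all hypothetical and introduced only for this example.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming rule: 8-digit date, 6-digit time, then the "_nltk" suffix.
RUN_DIR_PATTERN = re.compile(r"\d{8}_\d{6}_nltk")


def find_nltk_run_dirs(source_dir: Path) -> list[Path]:
    """Hypothetical stand-in for detect_nltk_run_directories.

    Keeps only subdirectories whose name matches the timestamp pattern
    and which contain a syllables/ subdirectory.
    """
    candidates = (
        p
        for p in source_dir.iterdir()
        if p.is_dir()
        and RUN_DIR_PATTERN.fullmatch(p.name)
        and (p / "syllables").is_dir()
    )
    # Zero-padded timestamps make name order equal chronological order.
    return sorted(candidates, key=lambda p: p.name)


def demo() -> list[str]:
    """Build a throwaway directory tree and return the detected run names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "20260110_143022_nltk" / "syllables").mkdir(parents=True)
        (root / "20260110_095213_nltk" / "syllables").mkdir(parents=True)
        (root / "20260110_120000_nltk").mkdir()  # no syllables/, so skipped
        (root / "notes").mkdir()  # name does not match, so skipped
        return [p.name for p in find_nltk_run_dirs(root)]


if __name__ == "__main__":
    for name in demo():
        print(name)
```

Sorting by name is sufficient here only because the timestamp components are fixed-width and zero-padded; a different naming scheme would need explicit date parsing.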