build_tools.nltk_syllable_extractor.cli
=======================================

.. py:module:: build_tools.nltk_syllable_extractor.cli

.. autoapi-nested-parse::

   Command-line interface for the NLTK-based syllable extractor.

   This module provides both interactive and batch processing functionality for
   syllable extraction using NLTK's CMU Pronouncing Dictionary.


Attributes
----------

.. autoapisummary::

   build_tools.nltk_syllable_extractor.cli.CORPUS_DB_AVAILABLE
   build_tools.nltk_syllable_extractor.cli.EXTRACTOR_VERSION
   build_tools.nltk_syllable_extractor.cli.READLINE_AVAILABLE


Functions
---------

.. autoapisummary::

   build_tools.nltk_syllable_extractor.cli.path_completer
   build_tools.nltk_syllable_extractor.cli.setup_tab_completion
   build_tools.nltk_syllable_extractor.cli.input_with_completion
   build_tools.nltk_syllable_extractor.cli.discover_files
   build_tools.nltk_syllable_extractor.cli.process_single_file_batch
   build_tools.nltk_syllable_extractor.cli.process_batch
   build_tools.nltk_syllable_extractor.cli.create_argument_parser
   build_tools.nltk_syllable_extractor.cli.main_interactive
   build_tools.nltk_syllable_extractor.cli.main_batch
   build_tools.nltk_syllable_extractor.cli.main


Module Contents
---------------

.. py:data:: CORPUS_DB_AVAILABLE
   :value: True


.. py:data:: EXTRACTOR_VERSION
   :value: 'unknown'


.. py:data:: READLINE_AVAILABLE
   :value: True


.. py:function:: path_completer(text, state)

   Tab completion function for file paths.

   This enables bash-like tab completion for navigating directories
   and selecting files.

   :param text: The current text being completed
   :param state: The completion state (0 for the first call, incremented for each match)

   :returns: The next completion match, or None when there are no more matches


.. py:function:: setup_tab_completion()

   Configure readline for tab completion with file paths.

   This enables:

   - Tab completion for file and directory names
   - Tilde (~) expansion for the home directory
   - Standard bash-like completion behavior


.. py:function:: input_with_completion(prompt)

   Get user input with tab completion enabled.

   :param prompt: The prompt to display

   :returns: User input string


.. py:function:: discover_files(source, pattern = '*.txt', recursive = False)

   Discover text files in a directory matching the specified pattern.

   This function searches for files matching a glob pattern in the specified
   directory, optionally recursing into subdirectories. Results are sorted
   alphabetically for deterministic processing order.

   :param source: Directory to search for files. Must be an existing directory.
   :param pattern: Glob pattern for file matching (default: ``"*.txt"``).
                   Examples: ``"*.txt"``, ``"*.md"``, ``"data_*.csv"``
   :param recursive: If True, search recursively into subdirectories using rglob.
                     If False, search only the top level (default: False).

   :returns: List of Path objects for matching files, sorted alphabetically.
             Returns an empty list if no files match.

   :raises ValueError: If source is not a directory or doesn't exist.

   .. admonition:: Example

      >>> from pathlib import Path
      >>> # Find all .txt files in a directory
      >>> files = discover_files(Path("/data/texts"))
      >>> print(f"Found {len(files)} files")

      >>> # Find all .md files recursively
      >>> files = discover_files(Path("/data"), pattern="*.md", recursive=True)


.. py:function:: process_single_file_batch(input_path, min_len, max_len, output_dir, run_timestamp, verbose = False)

   Process a single file in batch mode with comprehensive error handling.

   This function attempts to extract syllables from a single file and saves
   the results. Unlike interactive mode, this function catches all exceptions
   and returns a result object indicating success or failure, allowing batch
   processing to continue even when individual files fail.

   :param input_path: Path to the input text file to process
   :param min_len: Minimum syllable length to include in results
   :param max_len: Maximum syllable length to include in results
   :param output_dir: Directory where output files should be saved
   :param run_timestamp: Timestamp for the batch run (shared across all files in the batch)
   :param verbose: If True, print detailed progress messages (default: False)

   :returns: FileProcessingResult object with success status, syllable count,
             output paths (if successful), or error message (if failed).

   .. note::

      This function never raises exceptions. All errors are caught and returned
      in the FileProcessingResult.error_message field. This design allows batch
      processing to continue despite individual failures.

   .. admonition:: Example

      >>> from datetime import datetime
      >>> from pathlib import Path
      >>> timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
      >>> result = process_single_file_batch(
      ...     Path("book.txt"),
      ...     min_len=2,
      ...     max_len=8,
      ...     output_dir=Path("output/"),
      ...     run_timestamp=timestamp,
      ...     verbose=True
      ... )
      >>> if result.success:
      ...     print(f"Extracted {result.syllables_count} syllables")
      ... else:
      ...     print(f"Failed: {result.error_message}")


.. py:function:: process_batch(files, min_len, max_len, output_dir, quiet = False, verbose = False)

   Process multiple files sequentially in batch mode.

   This function processes a list of files one at a time, extracting syllables
   from each and saving results to the specified output directory. All files
   in the batch share a single timestamped run directory, grouping them as
   one logical batch operation.

   :param files: List of input file paths to process
   :param min_len: Minimum syllable length to include
   :param max_len: Maximum syllable length to include
   :param output_dir: Output directory for all results (created if needed)
   :param quiet: If True, suppress all output except errors (default: False)
   :param verbose: If True, show detailed progress for each file (default: False).
                   Ignored if quiet=True.

   :returns: BatchResult with overall statistics and individual file results.

   .. admonition:: Example

      >>> from pathlib import Path
      >>> files = [Path("book1.txt"), Path("book2.txt"), Path("book3.txt")]
      >>> result = process_batch(
      ...     files,
      ...     min_len=2,
      ...     max_len=8,
      ...     output_dir=Path("output/")
      ... )
      >>> print(f"Processed {result.successful}/{result.total_files} files")
      >>> print(result.format_summary())

   .. note::

      Processing is sequential (not parallel). Files are processed in the order
      provided in the files list. All outputs share a single run directory
      identified by the batch start timestamp.


.. py:function:: create_argument_parser()

   Create and configure the argument parser for batch mode.

   This function sets up the argparse parser with all command-line options
   for batch processing mode.

   :returns: Configured ArgumentParser instance ready to parse sys.argv.

   .. admonition:: Example

      >>> parser = create_argument_parser()
      >>> args = parser.parse_args(["--file", "input.txt"])
      >>> args.file
      PosixPath('input.txt')


.. py:function:: main_interactive()

   Interactive mode entry point for the NLTK syllable extractor CLI.

   Workflow:

   1. Display tool information and CMUDict notice
   2. Configure extraction parameters (min/max syllable length)
   3. Prompt for input file path
   4. Extract syllables using CMUDict + onset/coda principles
   5. Generate timestamped output filenames
   6. Save syllables and metadata to separate files
   7. Display summary to console

   Output Files:

   - ``YYYYMMDD_HHMMSS.syllables.en_US.txt``: One syllable per line, sorted
   - ``YYYYMMDD_HHMMSS.meta.en_US.txt``: Extraction metadata and statistics

   Both files are saved to ``_working/output/`` by default.

   Corpus Database Integration:

   All interactive mode extractions are automatically recorded to the corpus
   database ledger for build provenance tracking. Recording is best-effort:
   extraction succeeds even if recording to the ledger fails.


.. py:function:: main_batch(args)

   Batch mode entry point for the NLTK syllable extractor CLI.

   This function processes multiple files based on command-line arguments,
   providing progress indicators and comprehensive error reporting.

   :param args: Parsed command-line arguments (an ``argparse.Namespace``) containing:

                - file: Single file path (optional)
                - files: List of file paths (optional)
                - source: Directory path for scanning (optional)
                - pattern: File pattern for directory scanning (default: ``"*.txt"``)
                - recursive: Whether to scan directories recursively
                - min: Minimum syllable length (default: 2)
                - max: Maximum syllable length (default: 8)
                - output: Output directory (default: ``_working/output/``)
                - quiet: Suppress progress indicators
                - verbose: Show detailed processing information

   Corpus Database Integration:

   All batch mode extractions are automatically recorded to the corpus
   database ledger for build provenance tracking. Recording is best-effort:
   extraction succeeds even if recording to the ledger fails.

   Exit Codes:

   - 0: All files processed successfully
   - 1: One or more files failed to process

   :raises SystemExit: On validation errors or processing completion


.. py:function:: main()

   Main entry point for the NLTK syllable extractor CLI.

   This function determines whether to run in interactive or batch mode
   based on the presence of command-line arguments.

   Modes:

   - Interactive Mode: No arguments provided. Prompts the user for all settings.
   - Batch Mode: Arguments provided. Processes files based on CLI flags.

   .. admonition:: Examples

      Interactive mode (no arguments)::

         $ python -m build_tools.nltk_syllable_extractor

      Batch mode (with arguments)::

         $ python -m build_tools.nltk_syllable_extractor --file input.txt
         $ python -m build_tools.nltk_syllable_extractor --files *.txt
         $ python -m build_tools.nltk_syllable_extractor --source ~/docs/ --recursive
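
   The mode selection described for ``main()`` could be sketched roughly as
   follows. This is a minimal illustration under stated assumptions, not the
   module's actual implementation: ``build_parser``, ``select_mode`` and
   ``dispatch`` are hypothetical names, the parser covers only a subset of the
   documented flags, and typing the path flags as ``Path`` is an assumption
   based on the ``create_argument_parser()`` example.

   .. code-block:: python

      import argparse
      import sys
      from pathlib import Path


      def build_parser() -> argparse.ArgumentParser:
          """Hypothetical stand-in for create_argument_parser() (subset of flags)."""
          parser = argparse.ArgumentParser(prog="nltk_syllable_extractor")
          parser.add_argument("--file", type=Path)               # single input file
          parser.add_argument("--files", type=Path, nargs="+")   # several input files
          parser.add_argument("--source", type=Path)             # directory to scan
          parser.add_argument("--recursive", action="store_true")
          return parser


      def select_mode(argv):
          """Return "interactive" when no CLI arguments are given, else "batch"."""
          return "batch" if argv else "interactive"


      def dispatch(argv=None):
          """Pick a mode and, for batch mode, parse the arguments.

          Returns a (mode, args) tuple; args is None in interactive mode.
          """
          argv = sys.argv[1:] if argv is None else argv
          if select_mode(argv) == "interactive":
              return "interactive", None
          return "batch", build_parser().parse_args(argv)

   In this sketch, ``dispatch([])`` selects interactive mode, while
   ``dispatch(["--source", "docs", "--recursive"])`` selects batch mode and
   returns the parsed namespace that a batch entry point would then consume.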