build_tools.nltk_syllable_extractor.cli
Command-line interface for the NLTK-based syllable extractor.
This module provides both interactive and batch processing functionality for syllable extraction using NLTK’s CMU Pronouncing Dictionary.
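Attributes
CORPUS_DB_AVAILABLE
EXTRACTOR_VERSION
READLINE_AVAILABLE
Functions
| path_completer(text, state) | Tab completion function for file paths. |
| setup_tab_completion() | Configure readline for tab completion with file paths. |
| input_with_completion(prompt) | Get user input with tab completion enabled. |
| discover_files(source[, pattern, recursive]) | Discover text files in a directory matching the specified pattern. |
| process_single_file_batch(input_path, min_len, max_len, output_dir, run_timestamp[, verbose]) | Process a single file in batch mode with comprehensive error handling. |
| process_batch(files, min_len, max_len, output_dir[, quiet, verbose]) | Process multiple files sequentially in batch mode. |
| create_argument_parser() | Create and configure the argument parser for batch mode. |
| main_interactive() | Interactive mode entry point for the NLTK syllable extractor CLI. |
| main_batch(args) | Batch mode entry point for the NLTK syllable extractor CLI. |
| main() | Main entry point for the NLTK syllable extractor CLI. |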
Module Contents
- build_tools.nltk_syllable_extractor.cli.CORPUS_DB_AVAILABLE = True
- build_tools.nltk_syllable_extractor.cli.EXTRACTOR_VERSION = 'unknown'
- build_tools.nltk_syllable_extractor.cli.READLINE_AVAILABLE = True
- build_tools.nltk_syllable_extractor.cli.path_completer(text, state)
Tab completion function for file paths.
This enables bash-like tab completion for navigating directories and selecting files.
- Parameters:
text – The current text being completed
state – The completion state (0 for first call, incremented for each match)
- Returns:
The next completion match, or None when no more matches
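readline calls a completer repeatedly with the same `text` and `state` = 0, 1, 2, ... until it returns None. The sketch below shows one way such a completer can be written with `glob`; it is an illustration of the protocol, not this module's actual code. Expanding `~` and appending a separator to directories are assumptions based on the "bash-like" behavior described above.

```python
import glob
import os
from typing import Optional


def path_completer(text: str, state: int) -> Optional[str]:
    """Sketch of a readline completer: return the state-th path that starts with `text`."""
    # Expand a leading "~" so "~/Doc" completes against the home directory.
    expanded = os.path.expanduser(text)
    # Every filesystem entry whose path starts with the typed prefix, in a stable order.
    matches = sorted(glob.glob(expanded + "*"))
    # Append a separator to directories so the user can keep tabbing deeper.
    matches = [m + os.sep if os.path.isdir(m) else m for m in matches]
    # None tells readline there are no further candidates.
    return matches[state] if state < len(matches) else None
```

Recomputing the match list on every call keeps the function stateless; a production completer might cache the list when `state == 0` instead.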
- build_tools.nltk_syllable_extractor.cli.setup_tab_completion()
Configure readline for tab completion with file paths.
This enables:
  - Tab completion for file and directory names
  - Tilde (~) expansion for the home directory
  - Standard bash-like completion behavior
- build_tools.nltk_syllable_extractor.cli.input_with_completion(prompt)
Get user input with tab completion enabled.
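A minimal sketch of how `setup_tab_completion` and `input_with_completion` might fit together, guarded by a `READLINE_AVAILABLE` flag like the module attribute of that name. The compact completer, the `bool` return value of the setup function, and the `.strip()` on the input are assumptions made for illustration; the `readline` calls (`set_completer_delims`, `set_completer`, `parse_and_bind`) are standard-library APIs.

```python
import glob
import os

try:
    import readline  # absent on some platforms, e.g. stock Python on Windows
    READLINE_AVAILABLE = True
except ImportError:
    READLINE_AVAILABLE = False


def path_completer(text, state):
    # Compact completer: the state-th path starting with `text`, with "~" expanded.
    matches = sorted(glob.glob(os.path.expanduser(text) + "*"))
    return matches[state] if state < len(matches) else None


def setup_tab_completion() -> bool:
    """Install the path completer; returns False (and does nothing) without readline."""
    if not READLINE_AVAILABLE:
        return False
    # Drop "/", "~" and "." from the delimiters so a whole path is completed as one word.
    readline.set_completer_delims(" \t\n;")
    readline.set_completer(path_completer)
    # Python builds linked against libedit (common on macOS) need a different bind syntax.
    if "libedit" in (readline.__doc__ or ""):
        readline.parse_and_bind("bind ^I rl_complete")
    else:
        readline.parse_and_bind("tab: complete")
    return True


def input_with_completion(prompt: str) -> str:
    """Prompt for one line of input with tab completion enabled when possible."""
    setup_tab_completion()
    return input(prompt).strip()
```

Falling back to plain `input()` when readline is missing keeps the interactive mode usable everywhere, just without completion.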
- build_tools.nltk_syllable_extractor.cli.discover_files(source, pattern='*.txt', recursive=False)
Discover text files in a directory matching the specified pattern.
This function searches for files matching a glob pattern in the specified directory, optionally recursing into subdirectories. Results are sorted alphabetically for deterministic processing order.
- Parameters:
source (pathlib.Path) – Directory to search for files. Must be an existing directory.
pattern (str) – Glob pattern for file matching (default: “*.txt”). Examples: “*.txt”, “*.md”, “data_*.csv”
recursive (bool) – If True, search recursively into subdirectories using rglob. If False, search only the top level (default: False).
- Returns:
List of Path objects for matching files, sorted alphabetically. Returns an empty list if no files match.
- Raises:
ValueError – If source is not a directory or doesn’t exist.
- Return type:
List[pathlib.Path]
Example
>>> # Find all .txt files in a directory
>>> files = discover_files(Path("/data/texts"))
>>> print(f"Found {len(files)} files")
>>> # Find all .md files recursively
>>> files = discover_files(Path("/data"), pattern="*.md", recursive=True)
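The behavior documented above (directory check, `glob` vs. `rglob`, alphabetical sort) maps directly onto `pathlib`. The sketch below is an illustration under that reading, not the module's source; skipping matches that are directories is an added assumption.

```python
from pathlib import Path
from typing import List


def discover_files(source: Path, pattern: str = "*.txt", recursive: bool = False) -> List[Path]:
    """Sketch: return files under `source` matching `pattern`, sorted alphabetically."""
    # is_dir() is False both for missing paths and for regular files.
    if not source.is_dir():
        raise ValueError(f"Not an existing directory: {source}")
    # rglob descends into subdirectories; glob only inspects the top level.
    candidates = source.rglob(pattern) if recursive else source.glob(pattern)
    # Assumption: keep regular files only, then sort for a deterministic order.
    return sorted(p for p in candidates if p.is_file())
```

Sorting the results matters because `glob` and `rglob` yield entries in filesystem order, which differs between platforms and runs.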
- build_tools.nltk_syllable_extractor.cli.process_single_file_batch(input_path, min_len, max_len, output_dir, run_timestamp, verbose=False)
Process a single file in batch mode with comprehensive error handling.
This function attempts to extract syllables from a single file and saves the results. Unlike interactive mode, this function catches all exceptions and returns a result object indicating success or failure, allowing batch processing to continue even when individual files fail.
- Parameters:
input_path (pathlib.Path) – Path to the input text file to process
min_len (int) – Minimum syllable length to include in results
max_len (int) – Maximum syllable length to include in results
output_dir (pathlib.Path) – Directory where output files should be saved
run_timestamp (str) – Timestamp for the batch run (shared across all files in batch)
verbose (bool) – If True, print detailed progress messages (default: False)
- Returns:
FileProcessingResult object with success status, syllables count, output paths (if successful), or error message (if failed).
- Return type:
build_tools.nltk_syllable_extractor.models.FileProcessingResult
Note
This function never raises exceptions. All errors are caught and returned in the FileProcessingResult.error_message field. This design allows batch processing to continue despite individual failures.
Example
>>> timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
>>> result = process_single_file_batch(
...     Path("book.txt"),
...     min_len=2,
...     max_len=8,
...     output_dir=Path("output/"),
...     run_timestamp=timestamp,
...     verbose=True
... )
>>> if result.success:
...     print(f"Extracted {result.syllables_count} syllables")
... else:
...     print(f"Failed: {result.error_message}")
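The "never raises" contract described in the Note can be implemented by wrapping the whole per-file pipeline in one broad `try` and converting any exception into a failed result. The sketch below illustrates that pattern only. `FileResult`, `safe_process_file`, the injected `extract` callable, the per-run subdirectory, and the output filename are all hypothetical; the character-based length filter is likewise an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, Optional


@dataclass
class FileResult:
    # Hypothetical stand-in for models.FileProcessingResult; the real fields may differ.
    input_path: Path
    success: bool
    syllables_count: int = 0
    output_path: Optional[Path] = None
    error_message: Optional[str] = None


def safe_process_file(
    input_path: Path,
    min_len: int,
    max_len: int,
    output_dir: Path,
    run_timestamp: str,
    extract: Callable[[str], Iterable[str]],  # hypothetical syllable extractor
) -> FileResult:
    """Never raises: any failure is captured in the returned FileResult."""
    try:
        text = input_path.read_text(encoding="utf-8")
        # Assumed: length counts characters, inclusive at both ends.
        syllables = sorted({s for s in extract(text) if min_len <= len(s) <= max_len})
        # Hypothetical layout: one directory per run, shared by all files in the batch.
        run_dir = output_dir / run_timestamp
        run_dir.mkdir(parents=True, exist_ok=True)
        out_path = run_dir / f"{input_path.stem}.syllables.txt"
        out_path.write_text("".join(f"{s}\n" for s in syllables), encoding="utf-8")
        return FileResult(input_path, True, len(syllables), output_path=out_path)
    except Exception as exc:  # deliberately broad so one bad file cannot stop the batch
        return FileResult(input_path, False, error_message=f"{type(exc).__name__}: {exc}")
```

Catching `Exception` rather than `BaseException` still lets Ctrl-C (`KeyboardInterrupt`) abort the whole batch.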
- build_tools.nltk_syllable_extractor.cli.process_batch(files, min_len, max_len, output_dir, quiet=False, verbose=False)
Process multiple files sequentially in batch mode.
This function processes a list of files one at a time, extracting syllables from each and saving results to the specified output directory. All files in the batch share a single timestamped run directory, grouping them as one logical batch operation.
- Parameters:
files (List[pathlib.Path]) – List of input file paths to process
min_len (int) – Minimum syllable length to include
max_len (int) – Maximum syllable length to include
output_dir (pathlib.Path) – Output directory for all results (created if needed)
quiet (bool) – If True, suppress all output except errors (default: False)
verbose (bool) – If True, show detailed progress for each file (default: False). Ignored if quiet=True.
- Returns:
BatchResult with overall statistics and individual file results.
- Return type:
BatchResult
Example
>>> files = [Path("book1.txt"), Path("book2.txt"), Path("book3.txt")]
>>> result = process_batch(
...     files,
...     min_len=2,
...     max_len=8,
...     output_dir=Path("output/")
... )
>>> print(f"Processed {result.successful}/{result.total_files} files")
>>> print(result.format_summary())
Note
Processing is sequential (not parallel). Files are processed in the order provided in the files list. All outputs share a single run directory identified by the batch start timestamp.
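The two properties stated in the Note, sequential processing in input order and one timestamp shared by the whole run, can be sketched as follows. `BatchSummary` and `run_sequential_batch` are hypothetical names: `BatchSummary` models only the `successful` and `total_files` counters used in the example, and the per-file work is passed in as a callable instead of being hard-wired.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any, Callable, List


@dataclass
class BatchSummary:
    # Hypothetical stand-in for BatchResult; only the counters from the example are modelled.
    run_timestamp: str
    results: List[Any] = field(default_factory=list)

    @property
    def total_files(self) -> int:
        return len(self.results)

    @property
    def successful(self) -> int:
        return sum(1 for r in self.results if r.success)

    @property
    def failed(self) -> int:
        return self.total_files - self.successful


def run_sequential_batch(files: List[Path], process_one: Callable[[Path, str], Any]) -> BatchSummary:
    # The timestamp is taken once, at batch start, and reused for every file.
    run_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    summary = BatchSummary(run_timestamp)
    for path in files:  # sequential, in the order the caller supplied
        summary.results.append(process_one(path, run_timestamp))
    return summary
```

Taking the timestamp once, rather than inside the loop, is what guarantees that a batch spanning a second boundary still writes into a single run directory.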
- build_tools.nltk_syllable_extractor.cli.create_argument_parser()
Create and configure the argument parser for batch mode.
This function sets up the argparse parser with all command-line options for batch processing mode.
- Returns:
Configured ArgumentParser instance ready to parse sys.argv.
- Return type:
argparse.ArgumentParser
Example
>>> parser = create_argument_parser()
>>> args = parser.parse_args(["--file", "input.txt"])
>>> args.file
PosixPath('input.txt')
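A sketch of a parser consistent with the options this page documents. The `--file`, `--files`, `--source`, and `--recursive` flags appear in the examples for `main()`; the remaining flag names (`--pattern`, `--min`, `--max`, `--output`, `--quiet`, `--verbose`) are assumed to match the namespace attributes listed under `main_batch`, and the help strings are illustrative. Input-source validation is left to `main_batch`, so the three input options are plain optional arguments here.

```python
import argparse
from pathlib import Path


def create_argument_parser() -> argparse.ArgumentParser:
    """Sketch of a batch-mode parser; flag names partly assumed from the documented attributes."""
    parser = argparse.ArgumentParser(
        prog="python -m build_tools.nltk_syllable_extractor",
        description="Extract syllables from text files in batch mode.",
    )
    # Input sources: one file, several files, or a directory to scan.
    parser.add_argument("--file", type=Path, help="single input file")
    parser.add_argument("--files", type=Path, nargs="+", help="several input files")
    parser.add_argument("--source", type=Path, help="directory to scan for input files")
    parser.add_argument("--pattern", default="*.txt", help="glob used with --source")
    parser.add_argument("--recursive", action="store_true", help="scan --source recursively")
    # Extraction and output settings, with the defaults listed under main_batch.
    parser.add_argument("--min", type=int, default=2, help="minimum syllable length")
    parser.add_argument("--max", type=int, default=8, help="maximum syllable length")
    parser.add_argument("--output", type=Path, default=Path("_working/output"))
    # Not mutually exclusive: process_batch documents that verbose is ignored when quiet is set.
    parser.add_argument("--quiet", action="store_true", help="suppress progress output")
    parser.add_argument("--verbose", action="store_true", help="show per-file details")
    return parser
```

Using `type=Path` is what makes `args.file` come back as a `PosixPath` (on POSIX systems), as in the example above.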
- build_tools.nltk_syllable_extractor.cli.main_interactive()
Interactive mode entry point for the NLTK syllable extractor CLI.
- Workflow:
1. Display tool information and CMUDict notice
2. Configure extraction parameters (min/max syllable length)
3. Prompt for input file path
4. Extract syllables using CMUDict + onset/coda principles
5. Generate timestamped output filenames
6. Save syllables and metadata to separate files
7. Display summary to console
- Output Files:
YYYYMMDD_HHMMSS.syllables.en_US.txt: One syllable per line, sorted
YYYYMMDD_HHMMSS.meta.en_US.txt: Extraction metadata and statistics
Both files are saved to _working/output/ by default.
- Corpus Database Integration:
All interactive mode extractions are automatically recorded to the corpus database ledger for build provenance tracking. Recording failures are non-fatal: extraction succeeds even if the ledger write fails.
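The output naming scheme above can be reproduced with `datetime.strftime("%Y%m%d_%H%M%S")`. The helpers below (`interactive_output_paths`, `save_syllables`) are hypothetical, written only to show how the two paths share one timestamp and how a sorted, one-per-line syllables file can be produced; the module may structure this differently.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional, Tuple


def interactive_output_paths(
    output_dir: Path = Path("_working/output"), now: Optional[datetime] = None
) -> Tuple[Path, Path]:
    """Hypothetical helper: syllables and metadata paths sharing one timestamp."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return (
        output_dir / f"{stamp}.syllables.en_US.txt",
        output_dir / f"{stamp}.meta.en_US.txt",
    )


def save_syllables(path: Path, syllables: Iterable[str]) -> None:
    """Hypothetical helper: write one syllable per line, sorted."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{s}\n" for s in sorted(syllables)), encoding="utf-8")
```

Computing the stamp once and deriving both names from it keeps the syllables and metadata files paired even if the two writes straddle a second boundary.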
- build_tools.nltk_syllable_extractor.cli.main_batch(args)
Batch mode entry point for the NLTK syllable extractor CLI.
This function processes multiple files based on command-line arguments, providing progress indicators and comprehensive error reporting.
- Parameters:
args (argparse.Namespace) – Parsed command-line arguments. The namespace contains:
  - file: Single file path (optional)
  - files: List of file paths (optional)
  - source: Directory path to scan (optional)
  - pattern: File pattern for directory scanning (default: “*.txt”)
  - recursive: Whether to scan directories recursively
  - min: Minimum syllable length (default: 2)
  - max: Maximum syllable length (default: 8)
  - output: Output directory (default: _working/output/)
  - quiet: Suppress progress indicators
  - verbose: Show detailed processing information
- Corpus Database Integration:
All batch mode extractions are automatically recorded to the corpus database ledger for build provenance tracking. Recording failures are non-fatal: extraction succeeds even if the ledger write fails.
- Exit Codes:
0: All files processed successfully
1: One or more files failed to process
- Raises:
SystemExit – On validation errors or processing completion
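A sketch of the batch control flow implied by the parameters and exit codes above: resolve `file`, `files`, or `source` into one list, process every file, then exit with 0 or 1. `resolve_input_files`, `main_batch_sketch`, and the `process_file` callable are hypothetical; the precedence order between the three input options and the exit status used for validation errors are assumptions, since only the 0/1 completion codes are documented.

```python
import argparse
import sys
from pathlib import Path
from typing import Callable, List


def resolve_input_files(args: argparse.Namespace) -> List[Path]:
    """Hypothetical helper: merge file / files / source into one ordered list."""
    if args.file:
        return [args.file]
    if args.files:
        return list(args.files)
    if args.source:
        if not args.source.is_dir():
            raise ValueError(f"Not an existing directory: {args.source}")
        found = args.source.rglob(args.pattern) if args.recursive else args.source.glob(args.pattern)
        return sorted(p for p in found if p.is_file())
    return []


def main_batch_sketch(args: argparse.Namespace, process_file: Callable[[Path], bool]) -> None:
    try:
        files = resolve_input_files(args)
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        sys.exit(1)  # assumed status for validation errors
    if not files:
        print("error: no input files", file=sys.stderr)
        sys.exit(1)  # assumed status for validation errors
    # Every file is processed; failures are counted rather than aborting the run.
    failures = sum(1 for f in files if not process_file(f))
    # Documented contract: 0 if every file succeeded, 1 if any file failed.
    sys.exit(0 if failures == 0 else 1)
```

Exiting through `sys.exit` (which raises `SystemExit`) rather than returning lets shell scripts and CI jobs branch on the result.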
- build_tools.nltk_syllable_extractor.cli.main()
Main entry point for the NLTK syllable extractor CLI.
This function determines whether to run in interactive or batch mode based on the presence of command-line arguments.
- Modes:
Interactive Mode: No arguments provided. Prompts user for all settings.
Batch Mode: Arguments provided. Processes files based on CLI flags.
Examples
- Interactive mode (no arguments):
$ python -m build_tools.nltk_syllable_extractor
- Batch mode (with arguments):
$ python -m build_tools.nltk_syllable_extractor --file input.txt
$ python -m build_tools.nltk_syllable_extractor --files *.txt
$ python -m build_tools.nltk_syllable_extractor --source ~/docs/ --recursive
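The mode selection described for `main()` reduces to checking whether any arguments follow the program name. `choose_mode` below is a hypothetical helper that isolates that decision so it can be tested without touching `sys.argv`; the commented dispatch shows roughly how `main()` could use it, assuming it forwards to the documented entry points.

```python
import sys
from typing import List, Optional


def choose_mode(argv: Optional[List[str]] = None) -> str:
    """Hypothetical helper: "batch" if any CLI arguments were given, else "interactive"."""
    args = sys.argv[1:] if argv is None else argv
    return "batch" if args else "interactive"


# A main() built on this could then dispatch roughly as follows:
#     if choose_mode() == "batch":
#         main_batch(create_argument_parser().parse_args())
#     else:
#         main_interactive()
```

Because the check is "any argument at all", even `--help` routes to batch mode, which is where argparse prints the usage text.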