build_tools.nltk_syllable_extractor.cli

Command-line interface for the NLTK-based syllable extractor.

This module provides both interactive and batch processing functionality for syllable extraction using NLTK’s CMU Pronouncing Dictionary.

Attributes

CORPUS_DB_AVAILABLE

EXTRACTOR_VERSION

READLINE_AVAILABLE

Functions

path_completer(text, state)

Tab completion function for file paths.

setup_tab_completion()

Configure readline for tab completion with file paths.

input_with_completion(prompt)

Get user input with tab completion enabled.

discover_files(source[, pattern, recursive])

Discover text files in a directory matching the specified pattern.

process_single_file_batch(input_path, min_len, ...[, ...])

Process a single file in batch mode with comprehensive error handling.

process_batch(files, min_len, max_len, output_dir[, ...])

Process multiple files sequentially in batch mode.

create_argument_parser()

Create and configure the argument parser for batch mode.

main_interactive()

Interactive mode entry point for the NLTK syllable extractor CLI.

main_batch(args)

Batch mode entry point for the NLTK syllable extractor CLI.

main()

Main entry point for the NLTK syllable extractor CLI.

Module Contents

build_tools.nltk_syllable_extractor.cli.CORPUS_DB_AVAILABLE = True
build_tools.nltk_syllable_extractor.cli.EXTRACTOR_VERSION = 'unknown'
build_tools.nltk_syllable_extractor.cli.READLINE_AVAILABLE = True
build_tools.nltk_syllable_extractor.cli.path_completer(text, state)

Tab completion function for file paths.

This enables bash-like tab completion for navigating directories and selecting files.

Parameters:
  • text – The current text being completed

  • state – The completion state (0 for first call, incremented for each match)

Returns:

The next completion match, or None when there are no more matches
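Here is a minimal sketch of a completer that follows this readline protocol. It assumes completions come from `glob` on the typed prefix, with `~` expanded. `path_completer_sketch` is a hypothetical name, and the real function may filter or format matches differently. readline calls the completer with state 0, 1, 2 and so on, and stops at the first None.

```python
import glob
import os
from typing import Optional


def path_completer_sketch(text: str, state: int) -> Optional[str]:
    """Return the state-th path completion for `text`, or None when exhausted."""
    # Expand a leading "~" so that "~/Doc" completes against the home directory.
    expanded = os.path.expanduser(text)
    # Escape glob metacharacters in the typed prefix, then match anything after it.
    candidates = sorted(glob.glob(glob.escape(expanded) + "*"))
    # Mark directories with a trailing separator so the user can keep tabbing inward.
    candidates = [c + os.sep if os.path.isdir(c) else c for c in candidates]
    return candidates[state] if state < len(candidates) else None
```

The candidate list is rebuilt on every call. This keeps the function stateless, at the cost of repeating the glob once per match.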

build_tools.nltk_syllable_extractor.cli.setup_tab_completion()

Configure readline for tab completion with file paths.

This enables:
  • Tab completion for file and directory names

  • Tilde (~) expansion for the home directory

  • Standard bash-like completion behavior

build_tools.nltk_syllable_extractor.cli.input_with_completion(prompt)

Get user input with tab completion enabled.

Parameters:

prompt (str) – The prompt to display

Returns:

User input string

Return type:

str

build_tools.nltk_syllable_extractor.cli.discover_files(source, pattern='*.txt', recursive=False)

Discover text files in a directory matching the specified pattern.

This function searches for files matching a glob pattern in the specified directory, optionally recursing into subdirectories. Results are sorted alphabetically for deterministic processing order.

Parameters:
  • source (pathlib.Path) – Directory to search for files. Must be an existing directory.

  • pattern (str) – Glob pattern for file matching (default: “*.txt”). Examples: “*.txt”, “*.md”, “data_*.csv”

  • recursive (bool) – If True, search recursively into subdirectories using rglob. If False, search only the top level (default: False).

Returns:

List of Path objects for matching files, sorted alphabetically. Returns an empty list if no files match.

Raises:

ValueError – If source is not a directory or doesn’t exist.

Return type:

List[pathlib.Path]

Example

>>> # Find all .txt files in a directory
>>> files = discover_files(Path("/data/texts"))
>>> print(f"Found {len(files)} files")
>>> # Find all .md files recursively
>>> files = discover_files(Path("/data"), pattern="*.md", recursive=True)
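A minimal sketch of the documented behavior covers three things: the directory check, the choice between `glob` and `rglob`, and the alphabetical sort. It assumes that only regular files should be returned, since a directory name could also match the pattern. `discover_files_sketch` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def discover_files_sketch(source: Path, pattern: str = "*.txt",
                          recursive: bool = False) -> List[Path]:
    # is_dir() is False for missing paths too, covering both documented errors.
    if not source.is_dir():
        raise ValueError(f"Not an existing directory: {source}")
    # rglob also matches files at the top level, then descends into subdirectories.
    matches = source.rglob(pattern) if recursive else source.glob(pattern)
    # Keep regular files only and sort for a deterministic processing order.
    return sorted(p for p in matches if p.is_file())
```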
build_tools.nltk_syllable_extractor.cli.process_single_file_batch(input_path, min_len, max_len, output_dir, run_timestamp, verbose=False)

Process a single file in batch mode with comprehensive error handling.

This function attempts to extract syllables from a single file and saves the results. Unlike interactive mode, this function catches all exceptions and returns a result object indicating success or failure, allowing batch processing to continue even when individual files fail.

Parameters:
  • input_path (pathlib.Path) – Path to the input text file to process

  • min_len (int) – Minimum syllable length to include in results

  • max_len (int) – Maximum syllable length to include in results

  • output_dir (pathlib.Path) – Directory where output files should be saved

  • run_timestamp (str) – Timestamp for the batch run (shared across all files in batch)

  • verbose (bool) – If True, print detailed progress messages (default: False)

Returns:

FileProcessingResult object with the success status and syllable count, plus the output paths on success or the error message on failure.

Return type:

build_tools.nltk_syllable_extractor.models.FileProcessingResult

Note

This function never raises exceptions. All errors are caught and returned in the FileProcessingResult.error_message field. This design allows batch processing to continue despite individual failures.

Example

>>> timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
>>> result = process_single_file_batch(
...     Path("book.txt"),
...     min_len=2,
...     max_len=8,
...     output_dir=Path("output/"),
...     run_timestamp=timestamp,
...     verbose=True
... )
>>> if result.success:
...     print(f"Extracted {result.syllables_count} syllables")
... else:
...     print(f"Failed: {result.error_message}")
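The never-raise contract can be sketched as a single try/except that converts any exception into a failed result. Everything here is a hypothetical stand-in, not the real `FileProcessingResult` or extraction pipeline: `FileResultSketch`, `process_one_sketch`, the injected `extract` callable, character-based length filtering, de-duplication, and the output filename.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional


@dataclass
class FileResultSketch:
    # Hypothetical stand-in for models.FileProcessingResult; field names are assumptions.
    input_path: Path
    success: bool
    syllables_count: int = 0
    output_path: Optional[Path] = None
    error_message: Optional[str] = None


def process_one_sketch(input_path: Path, min_len: int, max_len: int,
                       output_dir: Path, run_timestamp: str,
                       extract: Callable[[str], List[str]]) -> FileResultSketch:
    try:
        text = input_path.read_text(encoding="utf-8")
        # Keep unique syllables whose character length lies in [min_len, max_len].
        syllables = sorted({s for s in extract(text) if min_len <= len(s) <= max_len})
        output_dir.mkdir(parents=True, exist_ok=True)
        out = output_dir / f"{run_timestamp}_{input_path.stem}.syllables.en_US.txt"
        out.write_text("\n".join(syllables) + "\n", encoding="utf-8")
        return FileResultSketch(input_path, True, len(syllables), out)
    except Exception as exc:  # deliberately broad: one bad file must not stop the batch
        return FileResultSketch(input_path, False,
                                error_message=f"{type(exc).__name__}: {exc}")
```

Catching `Exception` rather than `BaseException` still lets Ctrl+C (`KeyboardInterrupt`) abort the whole batch.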
build_tools.nltk_syllable_extractor.cli.process_batch(files, min_len, max_len, output_dir, quiet=False, verbose=False)

Process multiple files sequentially in batch mode.

This function processes a list of files one at a time, extracting syllables from each and saving results to the specified output directory. All files in the batch share a single timestamped run directory, grouping them as one logical batch operation.

Parameters:
  • files (List[pathlib.Path]) – List of input file paths to process

  • min_len (int) – Minimum syllable length to include

  • max_len (int) – Maximum syllable length to include

  • output_dir (pathlib.Path) – Output directory for all results (created if needed)

  • quiet (bool) – If True, suppress all output except errors (default: False)

  • verbose (bool) – If True, show detailed progress for each file (default: False). Ignored if quiet=True.

Returns:

BatchResult with overall statistics and individual file results.

Return type:

build_tools.nltk_syllable_extractor.models.BatchResult

Example

>>> files = [Path("book1.txt"), Path("book2.txt"), Path("book3.txt")]
>>> result = process_batch(
...     files,
...     min_len=2,
...     max_len=8,
...     output_dir=Path("output/")
... )
>>> print(f"Processed {result.successful}/{result.total_files} files")
>>> print(result.format_summary())

Note

Processing is sequential (not parallel). Files are processed in the order provided in the files list. All outputs share a single run directory identified by the batch start timestamp.
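This sketch shows the sequential loop sharing one timestamp across the batch. It assumes the run directory is `output_dir/<timestamp>` and that the per-file step is passed in as a callable returning True on success. `BatchResultSketch` and `process_batch_sketch` are hypothetical simplifications of `BatchResult` and `process_batch`.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Callable, List


@dataclass
class BatchResultSketch:
    # Hypothetical stand-in for models.BatchResult: one success flag per file.
    run_timestamp: str
    results: List[bool] = field(default_factory=list)

    @property
    def total_files(self) -> int:
        return len(self.results)

    @property
    def successful(self) -> int:
        return sum(self.results)

    @property
    def failed(self) -> int:
        return self.total_files - self.successful


def process_batch_sketch(files: List[Path], output_dir: Path,
                         process_one: Callable[[Path, Path, str], bool],
                         quiet: bool = False) -> BatchResultSketch:
    # One timestamp for the whole batch, so every file lands in the same run directory.
    run_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = output_dir / run_timestamp
    run_dir.mkdir(parents=True, exist_ok=True)
    batch = BatchResultSketch(run_timestamp)
    for index, path in enumerate(files, start=1):  # sequential, in the given order
        ok = process_one(path, run_dir, run_timestamp)
        batch.results.append(ok)
        if not quiet:
            print(f"[{index}/{len(files)}] {'OK  ' if ok else 'FAIL'} {path.name}")
    return batch
```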

build_tools.nltk_syllable_extractor.cli.create_argument_parser()

Create and configure the argument parser for batch mode.

This function sets up the argparse parser with all command-line options for batch processing mode.

Returns:

Configured ArgumentParser instance ready to parse sys.argv.

Return type:

argparse.ArgumentParser

Example

>>> parser = create_argument_parser()
>>> args = parser.parse_args(["--file", "input.txt"])
>>> print(args.file)
PosixPath('input.txt')
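One way to build a parser with the options that main_batch documents is sketched below. The flag names and defaults come from the main_batch parameter list. The required, mutually exclusive input group and the help strings are assumptions. `type=Path` matches the PosixPath shown in the example above.

```python
import argparse
from pathlib import Path


def create_argument_parser_sketch() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="python -m build_tools.nltk_syllable_extractor",
        description="Extract syllables from text files using the CMU Pronouncing Dictionary.",
    )
    # Assumption: exactly one input source per run, enforced by argparse itself.
    inputs = parser.add_mutually_exclusive_group(required=True)
    inputs.add_argument("--file", type=Path, help="single input file")
    inputs.add_argument("--files", type=Path, nargs="+", help="explicit list of input files")
    inputs.add_argument("--source", type=Path, help="directory to scan for input files")
    parser.add_argument("--pattern", default="*.txt", help="glob pattern used with --source")
    parser.add_argument("--recursive", action="store_true", help="scan --source recursively")
    parser.add_argument("--min", type=int, default=2, help="minimum syllable length")
    parser.add_argument("--max", type=int, default=8, help="maximum syllable length")
    parser.add_argument("--output", type=Path, default=Path("_working/output"),
                        help="output directory")
    parser.add_argument("--quiet", action="store_true", help="suppress progress output")
    parser.add_argument("--verbose", action="store_true", help="show per-file details")
    return parser
```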
build_tools.nltk_syllable_extractor.cli.main_interactive()

Interactive mode entry point for the NLTK syllable extractor CLI.

Workflow:
  1. Display tool information and CMUDict notice

  2. Configure extraction parameters (min/max syllable length)

  3. Prompt for input file path

  4. Extract syllables using CMUDict + onset/coda principles

  5. Generate timestamped output filenames

  6. Save syllables and metadata to separate files

  7. Display summary to console

Output Files:
  • YYYYMMDD_HHMMSS.syllables.en_US.txt: One syllable per line, sorted

  • YYYYMMDD_HHMMSS.meta.en_US.txt: Extraction metadata and statistics

Corpus Database Integration:

All interactive mode extractions are automatically recorded to the corpus database ledger for build provenance tracking. Recording is optional: extraction succeeds even if the ledger update fails.

Both output files are saved to _working/output/ by default.
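The timestamped naming scheme for the two output files can be sketched as follows, assuming metadata is written as simple `key: value` lines. `output_paths_sketch` and `save_outputs_sketch` are hypothetical helpers, and the real metadata layout is not documented here.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple


def output_paths_sketch(output_dir: Path,
                        now: Optional[datetime] = None) -> Tuple[Path, Path]:
    """Build the paired syllables/metadata paths sharing one YYYYMMDD_HHMMSS stamp."""
    stamp = (now if now is not None else datetime.now()).strftime("%Y%m%d_%H%M%S")
    return (output_dir / f"{stamp}.syllables.en_US.txt",
            output_dir / f"{stamp}.meta.en_US.txt")


def save_outputs_sketch(syllables: Iterable[str], metadata: Dict[str, object],
                        output_dir: Path,
                        now: Optional[datetime] = None) -> Tuple[Path, Path]:
    syllables_path, meta_path = output_paths_sketch(output_dir, now)
    output_dir.mkdir(parents=True, exist_ok=True)
    # One syllable per line, sorted, as the documented output format describes.
    syllables_path.write_text("".join(f"{s}\n" for s in sorted(syllables)), encoding="utf-8")
    # Assumed metadata layout: plain "key: value" lines in insertion order.
    meta_path.write_text("".join(f"{k}: {v}\n" for k, v in metadata.items()), encoding="utf-8")
    return syllables_path, meta_path
```

Computing both names from one timestamp keeps each syllables file visibly paired with its metadata file.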

build_tools.nltk_syllable_extractor.cli.main_batch(args)

Batch mode entry point for the NLTK syllable extractor CLI.

This function processes multiple files based on command-line arguments, providing progress indicators and comprehensive error reporting.

Parameters:

args (argparse.Namespace) – Parsed command-line arguments containing:
  • file: Single file path (optional)
  • files: List of file paths (optional)
  • source: Directory path for scanning (optional)
  • pattern: File pattern for directory scanning (default: “*.txt”)
  • recursive: Whether to scan directories recursively
  • min: Minimum syllable length (default: 2)
  • max: Maximum syllable length (default: 8)
  • output: Output directory (default: _working/output/)
  • quiet: Suppress progress indicators
  • verbose: Show detailed processing information

Corpus Database Integration:

All batch mode extractions are automatically recorded to the corpus database ledger for build provenance tracking. Recording is optional: extraction succeeds even if the ledger update fails.

Exit Codes:

  • 0: All files processed successfully

  • 1: One or more files failed to process

Raises:

SystemExit – On validation errors, or on completion with one of the exit codes above
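The exit-code rule and the optional ledger recording can be sketched together. This assumes the ledger write is passed in as a callable and gated on a flag like `CORPUS_DB_AVAILABLE`. `finish_batch_sketch` is a hypothetical helper, not part of the module.

```python
import sys
from typing import Callable


def finish_batch_sketch(failed: int, db_available: bool,
                        record_ledger: Callable[[], None]) -> int:
    """Record the run if possible, then map the failure count to an exit code."""
    if db_available:
        try:
            record_ledger()
        except Exception as exc:  # ledger problems must never fail the extraction
            print(f"Warning: corpus database ledger not updated: {exc}", file=sys.stderr)
    # Documented contract: 0 if every file succeeded, 1 if any failed.
    return 0 if failed == 0 else 1
```

A caller would then end with `raise SystemExit(finish_batch_sketch(...))`, which is consistent with main_batch terminating via SystemExit.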

build_tools.nltk_syllable_extractor.cli.main()

Main entry point for the NLTK syllable extractor CLI.

This function determines whether to run in interactive or batch mode based on the presence of command-line arguments.

Modes:
  • Interactive Mode: No arguments provided. Prompts user for all settings.

  • Batch Mode: Arguments provided. Processes files based on CLI flags.

Examples

Interactive mode (no arguments):

$ python -m build_tools.nltk_syllable_extractor

Batch mode (with arguments):

$ python -m build_tools.nltk_syllable_extractor --file input.txt
$ python -m build_tools.nltk_syllable_extractor --files *.txt
$ python -m build_tools.nltk_syllable_extractor --source ~/docs/ --recursive
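The mode selection can be sketched as a check on `sys.argv[1:]`, assuming that any argument at all selects batch mode. `main_sketch` is hypothetical. It takes both entry points as callables instead of importing the real `main_interactive` and `main_batch`, a design chosen here only so the dispatch is easy to test.

```python
import sys
from typing import Callable, List, Optional


def main_sketch(run_interactive: Callable[[], None],
                run_batch: Callable[[List[str]], None],
                argv: Optional[List[str]] = None) -> None:
    # sys.argv[0] is the program name, so "no arguments" means an empty argv[1:].
    args = sys.argv[1:] if argv is None else argv
    if args:
        run_batch(args)
    else:
        run_interactive()
```

In the real module, the batch branch presumably parses the arguments with create_argument_parser before calling main_batch. That wiring is not shown here.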