build_tools.nltk_syllable_extractor.batch

Batch mode for the NLTK syllable extractor.

This module provides batch processing functionality for extracting syllables from multiple files using NLTK’s CMU Pronouncing Dictionary.

Functions

process_single_file(input_path, min_len, max_len, ...)

Process a single file in batch mode with comprehensive error handling.

process_batch(files, min_len, max_len, output_dir[, ...])

Process multiple files sequentially in batch mode.

run_batch(args)

Batch mode entry point for the NLTK syllable extractor CLI.

Module Contents

build_tools.nltk_syllable_extractor.batch.process_single_file(input_path, min_len, max_len, output_dir, run_timestamp, verbose=False)

Process a single file in batch mode with comprehensive error handling.

This function attempts to extract syllables from a single file and save the results. Unlike interactive mode, it catches all exceptions and returns a result object indicating success or failure, so batch processing can continue even when individual files fail.

Parameters:
  • input_path (pathlib.Path) – Path to the input text file to process

  • min_len (int) – Minimum syllable length to include in results

  • max_len (int) – Maximum syllable length to include in results

  • output_dir (pathlib.Path) – Directory where output files should be saved

  • run_timestamp (str) – Timestamp for the batch run (shared across all files in batch)

  • verbose (bool) – If True, print detailed progress messages (default: False)

Returns:

FileProcessingResult object with the success status, the syllable count, and either the output paths (if successful) or an error message (if failed).

Return type:

build_tools.nltk_syllable_extractor.models.FileProcessingResult

Note

This function never raises exceptions. All errors are caught and returned in the FileProcessingResult.error_message field.
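The error-handling contract described above (catch every exception and report it through the result object) can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's code: `FileResult` and `process_one_safely` are hypothetical names, their fields only approximate FileProcessingResult, the length bounds are assumed to be inclusive, and `extract` stands in for the real NLTK-based extraction, which this page does not describe. Writing output files is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class FileResult:
    """Hypothetical stand-in for FileProcessingResult; field names are assumptions."""

    input_path: Path
    success: bool
    syllables_count: int = 0
    error_message: str | None = None


def process_one_safely(
    input_path: Path,
    min_len: int,
    max_len: int,
    extract: Callable[[str], list[str]],
) -> FileResult:
    """Never raises: any failure is reported as FileResult(success=False)."""
    try:
        text = input_path.read_text(encoding="utf-8")
        # Assumed inclusive bounds: keep syllables with min_len <= len <= max_len.
        syllables = [s for s in extract(text) if min_len <= len(s) <= max_len]
        # Writing output files is omitted from this sketch.
        return FileResult(input_path, True, syllables_count=len(syllables))
    except Exception as exc:  # deliberately broad, mirroring the documented contract
        return FileResult(input_path, False, error_message=f"{type(exc).__name__}: {exc}")
```

Because the catch is broad, even programming errors surface as failed results rather than tracebacks; that is the trade-off a batch run accepts in exchange for never aborting partway through.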

build_tools.nltk_syllable_extractor.batch.process_batch(files, min_len, max_len, output_dir, quiet=False, verbose=False)

Process multiple files sequentially in batch mode.

This is a backwards-compatible wrapper around run_batch_extraction.

Parameters:
  • files (list[pathlib.Path]) – List of input file paths to process

  • min_len (int) – Minimum syllable length to include

  • max_len (int) – Maximum syllable length to include

  • output_dir (pathlib.Path) – Output directory for all results (created if needed)

  • quiet (bool) – If True, suppress all output except errors (default: False)

  • verbose (bool) – If True, show detailed progress for each file (default: False)

Returns:

BatchResult with overall statistics and individual file results.

Return type:

build_tools.nltk_syllable_extractor.models.BatchResult
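The documented behavior of process_batch (files handled one at a time, per-file results rolled up into overall statistics) can be pictured with this sketch. It does not reproduce run_batch_extraction, whose internals this page does not show; `FileOutcome`, `BatchSummary`, `run_sequential_batch`, and the timestamp format are all hypothetical. It also shows one way a single run timestamp could be computed once and shared by every file, as the run_timestamp parameter of process_single_file describes.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Callable


@dataclass
class FileOutcome:
    """Hypothetical per-file result (not the real FileProcessingResult)."""

    path: Path
    success: bool
    error_message: str | None = None


@dataclass
class BatchSummary:
    """Hypothetical stand-in for BatchResult; field names are assumptions."""

    run_timestamp: str
    results: list[FileOutcome] = field(default_factory=list)
    elapsed_seconds: float = 0.0

    @property
    def successful(self) -> int:
        return sum(1 for r in self.results if r.success)

    @property
    def failed(self) -> int:
        return len(self.results) - self.successful


def run_sequential_batch(
    files: list[Path],
    process_one: Callable[[Path, str], FileOutcome],
    quiet: bool = False,
) -> BatchSummary:
    """Process files one after another; one file failing never stops the loop."""
    start = time.perf_counter()
    # One timestamp for the whole run, shared by every file (format is hypothetical).
    summary = BatchSummary(run_timestamp=datetime.now().strftime("%Y%m%d_%H%M%S"))
    for i, path in enumerate(files, start=1):
        # process_one is assumed never to raise, like process_single_file.
        outcome = process_one(path, summary.run_timestamp)
        summary.results.append(outcome)
        if not quiet:
            status = "ok" if outcome.success else f"FAILED ({outcome.error_message})"
            print(f"[{i}/{len(files)}] {path.name}: {status}")
    summary.elapsed_seconds = time.perf_counter() - start
    return summary
```

Keeping the loop sequential makes the progress output deterministic and the shared timestamp trivially consistent; parallelizing would need both to be handled more carefully.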

build_tools.nltk_syllable_extractor.batch.run_batch(args)

Batch mode entry point for the NLTK syllable extractor CLI.

This function processes multiple files based on command-line arguments, providing progress indicators and comprehensive error reporting.

Parameters:

args (argparse.Namespace) – Parsed command-line arguments containing:
  • file: Single file path (optional)
  • files: List of file paths (optional)
  • source: Directory path to scan (optional)
  • pattern: File pattern for directory scanning (default: "*.txt")
  • recursive: Whether to scan directories recursively
  • min: Minimum syllable length (default: 1)
  • max: Maximum syllable length (default: 999)
  • output: Output directory (default: _working/output/)
  • quiet: Suppress progress indicators
  • verbose: Show detailed processing information
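The input options (file, files, or source combined with pattern and recursive) must be resolved into a concrete list of paths before any processing happens. One possible resolution is sketched here; the helper name `collect_input_files` and the precedence order (file, then files, then source) are assumptions, not taken from the CLI's source.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def collect_input_files(args: argparse.Namespace) -> list[Path]:
    """Hypothetical helper: turn the documented input options into a file list.

    The precedence (file, then files, then source) is an assumption.
    """
    if getattr(args, "file", None):
        return [Path(args.file)]
    if getattr(args, "files", None):
        return [Path(f) for f in args.files]
    if getattr(args, "source", None):
        source = Path(args.source)
        pattern = getattr(args, "pattern", None) or "*.txt"
        # rglob descends into subdirectories; glob only matches the top level.
        if getattr(args, "recursive", False):
            matches = source.rglob(pattern)
        else:
            matches = source.glob(pattern)
        # Sort for a stable processing order and skip directories that match.
        return sorted(p for p in matches if p.is_file())
    return []
```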

Exit Codes:

  • 0: All files processed successfully
  • 1: One or more files failed to process

Raises:

SystemExit – On validation errors or processing completion
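The documented exit codes and the SystemExit behavior fit together as in this minimal sketch: `sys.exit` raises SystemExit carrying the status code, which is how a CLI entry point reports 0 or 1 to the shell. The helpers `exit_code_for` and `finish_batch` are hypothetical names used only for illustration.

```python
import sys


def exit_code_for(failed_count: int) -> int:
    """Map the number of failed files to the documented exit codes."""
    # 0: all files processed successfully; 1: one or more files failed.
    return 0 if failed_count == 0 else 1


def finish_batch(failed_count: int) -> None:
    """Hypothetical final step of a batch entry point: always ends in SystemExit."""
    sys.exit(exit_code_for(failed_count))
```

Note that `sys.exit(0)` also raises SystemExit, which is consistent with the documentation listing processing completion, not only validation errors, as a cause.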