build_tools.tui_common.batch

Shared batch processing utilities for syllable extractors.

This module provides batch processing orchestration shared by the pyphen and NLTK syllable extractors. It handles the common pattern of processing multiple files and leaves extractor-specific logic to a caller-supplied function.

Usage:

from build_tools.pyphen_syllable_extractor.models import BatchResult
from build_tools.tui_common.batch import run_batch_extraction

# Define extractor-specific single-file processor
def process_file(input_path, output_dir, run_timestamp, verbose):
    # ... extraction logic ...
    return FileProcessingResult(...)

# Run batch with shared orchestration
result = run_batch_extraction(
    files=files_to_process,
    output_dir=output_dir,
    process_file_func=process_file,
    batch_result_class=BatchResult,  # required: has no default in the signature
    extractor_name="pyphen",
    language_display="en_US",
    min_len=2,
    max_len=8,
    quiet=False,
    verbose=True,
)

Attributes

SingleFileProcessor

Functions

run_batch_extraction(files, output_dir, ...[, quiet, ...])

Run batch extraction with shared orchestration logic.

collect_files_from_args(file_arg, files_arg, ...)

Collect files to process from CLI arguments.

validate_extraction_params(min_len, max_len)

Validate extraction parameters.

Module Contents

build_tools.tui_common.batch.SingleFileProcessor
build_tools.tui_common.batch.run_batch_extraction(files, output_dir, process_file_func, batch_result_class, extractor_name, language_display, min_len, max_len, quiet=False, verbose=False)[source]

Run batch extraction with shared orchestration logic.

This function provides the common batch processing pattern:

  • Generate shared timestamp for the batch run

  • Create output directory

  • Display batch header

  • Process each file with progress indicators

  • Collect and return results

Parameters:
  • files (list[pathlib.Path]) – List of input file paths to process

  • output_dir (pathlib.Path) – Output directory for all results

  • process_file_func (SingleFileProcessor) – Callable that processes a single file. Signature: (input_path, output_dir, run_timestamp, verbose) -> FileProcessingResult

  • batch_result_class (type[Any]) – Class to use for BatchResult (from models module)

  • extractor_name (str) – Name of extractor for display (“pyphen” or “nltk”)

  • language_display (str) – Language string for display (e.g., “en_US”, “auto”)

  • min_len (int) – Minimum syllable length (for display)

  • max_len (int) – Maximum syllable length (for display)

  • quiet (bool) – Suppress all output except errors

  • verbose (bool) – Show detailed progress for each file

Returns:

BatchResult with overall statistics and individual file results.

Return type:

Any

Example

>>> from pathlib import Path
>>> from build_tools.pyphen_syllable_extractor.models import BatchResult
>>> from build_tools.tui_common.batch import run_batch_extraction
>>>
>>> def my_processor(path, out_dir, timestamp, verbose):
...     # Process file and return FileProcessingResult
...     pass
>>>
>>> result = run_batch_extraction(
...     files=[Path("a.txt"), Path("b.txt")],
...     output_dir=Path("output/"),
...     process_file_func=my_processor,
...     batch_result_class=BatchResult,
...     extractor_name="pyphen",
...     language_display="en_US",
...     min_len=2,
...     max_len=8,
... )
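
The orchestration steps listed for run_batch_extraction can be sketched roughly as follows. This is a minimal illustration, not the module's implementation: SketchFileResult, SketchBatchResult, SketchProcessor, and sketch_run_batch are hypothetical names, the timestamp format is a guess, and catching per-file exceptions so the batch continues is an assumption the documentation does not confirm.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


# Hypothetical stand-ins for the real result models, which are not shown here.
@dataclass
class SketchFileResult:
    input_path: Path
    success: bool
    error: Optional[str] = None


@dataclass
class SketchBatchResult:
    total_files: int
    results: list = field(default_factory=list)

    @property
    def failed(self) -> int:
        return sum(1 for r in self.results if not r.success)


# Assumed shape of SingleFileProcessor, based on the documented signature:
# (input_path, output_dir, run_timestamp, verbose) -> result
SketchProcessor = Callable[[Path, Path, str, bool], SketchFileResult]


def sketch_run_batch(files, output_dir, process_file_func, quiet=False, verbose=False):
    # One timestamp shared by every file in this run (format is assumed).
    run_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_dir.mkdir(parents=True, exist_ok=True)

    if not quiet:
        print(f"Batch run {run_timestamp}: {len(files)} file(s)")

    batch = SketchBatchResult(total_files=len(files))
    for index, path in enumerate(files, start=1):
        if not quiet:
            print(f"[{index}/{len(files)}] {path.name}")
        try:
            result = process_file_func(path, output_dir, run_timestamp, verbose)
        except Exception as exc:
            # Assumption: one failing file is recorded and does not stop the batch.
            result = SketchFileResult(path, success=False, error=str(exc))
        batch.results.append(result)
    return batch
```

Passing the processor as a callable is what lets both extractors reuse the same loop: only the body of the per-file function differs.
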
build_tools.tui_common.batch.collect_files_from_args(file_arg, files_arg, source_arg, pattern, recursive)[source]

Collect files to process from CLI arguments.

Validates and resolves paths from the three mutually exclusive input modes:

  • Single file (--file)

  • Multiple files (--files)

  • Directory scan (--source)

Parameters:
  • file_arg (pathlib.Path | None) – Single file path (from --file)

  • files_arg (list[pathlib.Path] | None) – List of file paths (from --files)

  • source_arg (pathlib.Path | None) – Directory path (from --source)

  • pattern (str) – File pattern for directory scanning

  • recursive (bool) – Whether to scan directories recursively

Returns:

Tuple of (list of resolved file paths, source directory or None)

Raises:
  • ValueError – If validation fails (file not found, not a file, etc.)

  • SystemExit – If no input is specified

Return type:

tuple[list[pathlib.Path], pathlib.Path | None]

Example

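>>> from pathlib import Path
>>> from build_tools.tui_common.batch import collect_files_from_args
>>>
>>> files, source_dir = collect_files_from_args(
...     file_arg=Path("input.txt"),
...     files_arg=None,
...     source_arg=None,
...     pattern="*.txt",
...     recursive=False,
... )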
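
To make the three input modes concrete, here is a minimal sketch of how such a resolver could behave. The name sketch_collect_files is hypothetical, and the details (checking modes in this order, sorting the results, returning the resolved source directory) are assumptions rather than the documented behaviour of collect_files_from_args.

```python
from pathlib import Path


def sketch_collect_files(file_arg, files_arg, source_arg, pattern, recursive):
    # Mode 1: a single file.
    if file_arg is not None:
        if not file_arg.is_file():
            raise ValueError(f"Not a file: {file_arg}")
        return [file_arg.resolve()], None

    # Mode 2: an explicit list of files.
    if files_arg:
        for path in files_arg:
            if not path.is_file():
                raise ValueError(f"Not a file: {path}")
        return [path.resolve() for path in files_arg], None

    # Mode 3: scan a directory for files matching the pattern.
    if source_arg is not None:
        if not source_arg.is_dir():
            raise ValueError(f"Not a directory: {source_arg}")
        matches = source_arg.rglob(pattern) if recursive else source_arg.glob(pattern)
        files = sorted(p.resolve() for p in matches if p.is_file())
        return files, source_arg.resolve()

    # No mode selected: exit, mirroring the documented SystemExit.
    raise SystemExit("Error: no input specified")
```

Only the directory-scan mode returns a source directory; the other two return None in its place, which matches the documented return type.
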
build_tools.tui_common.batch.validate_extraction_params(min_len, max_len)[source]

Validate extraction parameters.

Parameters:
  • min_len (int) – Minimum syllable length

  • max_len (int) – Maximum syllable length

Raises:

SystemExit – If validation fails
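
The documentation does not state which checks are applied, so the following is only a sketch of a plausible validator. The name sketch_validate_params and both rules (min_len at least 1, max_len not below min_len) are assumptions for illustration.

```python
def sketch_validate_params(min_len: int, max_len: int) -> None:
    # Assumed rule: syllable lengths start at 1.
    if min_len < 1:
        raise SystemExit(f"Error: min_len must be at least 1 (got {min_len})")
    # Assumed rule: the range must not be inverted.
    if max_len < min_len:
        raise SystemExit(
            f"Error: max_len ({max_len}) must be >= min_len ({min_len})"
        )
```

Raising SystemExit with a message makes a CLI exit with a non-zero status and print the message to stderr, which suits validation that runs before any batch work begins.
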