build_tools.pyphen_syllable_extractor.cli

Command-line interface for the syllable extractor.

This module provides both interactive and batch processing functionality for syllable extraction, including language selection, user input prompts, tab completion, and command-line argument parsing for batch operations.

Attributes

CORPUS_DB_AVAILABLE

EXTRACTOR_VERSION

READLINE_AVAILABLE

Functions

path_completer(text, state)

Tab completion function for file paths.

setup_tab_completion()

Configure readline for tab completion with file paths.

input_with_completion(prompt)

Get user input with tab completion enabled.

select_language()

Interactive prompt to select a language from supported options.

discover_files(source[, pattern, recursive])

Discover text files in a directory matching the specified pattern.

process_single_file_batch(input_path, language_code, ...)

Process a single file in batch mode with comprehensive error handling.

process_batch(files, language_code, min_len, max_len, ...)

Process multiple files sequentially in batch mode.

create_argument_parser()

Create and configure the argument parser for batch mode.

main_interactive()

Interactive mode entry point for the syllable extractor CLI.

main_batch(args)

Batch mode entry point for the syllable extractor CLI.

main()

Main entry point for the syllable extractor CLI.

Module Contents

build_tools.pyphen_syllable_extractor.cli.CORPUS_DB_AVAILABLE = True
build_tools.pyphen_syllable_extractor.cli.EXTRACTOR_VERSION = 'unknown'
build_tools.pyphen_syllable_extractor.cli.READLINE_AVAILABLE = True
build_tools.pyphen_syllable_extractor.cli.path_completer(text, state)[source]

Tab completion function for file paths.

This enables bash-like tab completion for navigating directories and selecting files.

Parameters:
  • text – The current text being completed

  • state – The completion state (0 for first call, incremented for each match)

Returns:

The next completion match, or None when no more matches
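
The state protocol described above can be illustrated with a minimal sketch. It is not the module's implementation: `path_completer_sketch` is a hypothetical name, and the use of `glob` to list candidates is an assumption.

```python
import glob
import os


def path_completer_sketch(text, state):
    """Return the state-th filesystem match for text, or None when exhausted."""
    # Expand a leading ~ so that "~/doc" can match entries in the home directory.
    expanded = os.path.expanduser(text)
    matches = sorted(glob.glob(glob.escape(expanded) + "*"))
    # Append a separator to directories so the next Tab descends into them.
    matches = [m + os.sep if os.path.isdir(m) else m for m in matches]
    return matches[state] if state < len(matches) else None
```

readline calls a completer repeatedly with state 0, 1, 2, ... and stops at the first None, which is why the function returns one match per call instead of a list.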

build_tools.pyphen_syllable_extractor.cli.setup_tab_completion()[source]

Configure readline for tab completion with file paths.

This enables:
  • Tab completion for file and directory names

  • Tilde (~) expansion for home directory

  • Standard bash-like completion behavior
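
A rough sketch of such a configuration step follows, using only standard `readline` calls. The function name, the `READLINE_OK` flag, and the choice of delimiters are assumptions for illustration, not the module's actual settings.

```python
try:
    import readline  # Not available on every platform (e.g. stock Windows).
    READLINE_OK = True
except ImportError:
    READLINE_OK = False


def setup_tab_completion_sketch(completer):
    """Register completer with readline; return False when readline is missing."""
    if not READLINE_OK:
        return False
    readline.set_completer(completer)
    # Treat only whitespace as a word break so "/" and "~" stay part of the path.
    readline.set_completer_delims(" \t\n")
    readline.parse_and_bind("tab: complete")
    return True
```

Guarding the import mirrors the idea behind a flag such as READLINE_AVAILABLE: callers can fall back to plain input() when the function returns False.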

build_tools.pyphen_syllable_extractor.cli.input_with_completion(prompt)[source]

Get user input with tab completion enabled.

Parameters:

prompt (str) – The prompt to display

Returns:

User input string

Return type:

str

build_tools.pyphen_syllable_extractor.cli.select_language()[source]

Interactive prompt to select a language from supported options.

Returns:

The pyphen language code for the selected language, or “auto” for automatic language detection

Return type:

str

Note

Exits the program if the user asks to quit or still provides invalid input after multiple attempts.
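
The prompt-and-retry behaviour can be sketched as below. The menu contents, the quit keywords, the attempt limit, and the exit codes are all hypothetical; the source only states that the function returns a language code or “auto” and exits on quit or repeated invalid input.

```python
import sys

# Hypothetical menu; the real list of supported languages is not shown here.
LANGUAGES_SKETCH = {"1": "en_US", "2": "de_DE"}


def select_language_sketch(read=input, max_attempts=3):
    """Return a language code or "auto"; exit on quit or repeated bad input."""
    for _ in range(max_attempts):
        choice = read("Language (number, 'auto', or 'q' to quit): ").strip().lower()
        if choice in ("q", "quit"):
            sys.exit(0)
        if choice == "auto":
            return "auto"
        if choice in LANGUAGES_SKETCH:
            return LANGUAGES_SKETCH[choice]
        print(f"Invalid selection: {choice!r}")
    # Attempts exhausted without a valid answer.
    sys.exit(1)
```

Passing the reader as a parameter (`read=input`) is a design choice of this sketch only; it makes the loop testable without a terminal.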

build_tools.pyphen_syllable_extractor.cli.discover_files(source, pattern='*.txt', recursive=False)[source]

Discover text files in a directory matching the specified pattern.

This function searches for files matching a glob pattern in the specified directory, optionally recursing into subdirectories. Results are sorted alphabetically for deterministic processing order.

Parameters:
  • source (pathlib.Path) – Directory to search for files. Must be an existing directory.

  • pattern (str) – Glob pattern for file matching (default: “*.txt”). Examples: “*.txt”, “*.md”, “data_*.csv”

  • recursive (bool) – If True, search recursively into subdirectories using rglob. If False, search only the top level (default: False).

Returns:

List of Path objects for matching files, sorted alphabetically. Returns empty list if no files match.

Raises:

ValueError – If source is not a directory or doesn’t exist.

Return type:

List[pathlib.Path]

Example

>>> from pathlib import Path
>>> # Find all .txt files in a directory
>>> files = discover_files(Path("/data/texts"))
>>> print(f"Found {len(files)} files")
>>> # Find all .md files recursively
>>> files = discover_files(Path("/data"), pattern="*.md", recursive=True)
>>> # Find files with custom pattern
>>> files = discover_files(Path("/data"), pattern="book_*.txt")

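
The documented behaviour (glob or rglob, sorted results, ValueError for a non-directory) can be approximated with pathlib alone. This is a sketch under those stated rules, not the module's code; in particular, filtering out directories that happen to match the pattern is an assumption.

```python
from pathlib import Path
from typing import List


def discover_files_sketch(source: Path, pattern: str = "*.txt", recursive: bool = False) -> List[Path]:
    """Return files under source matching pattern, sorted for a stable order."""
    if not source.is_dir():
        raise ValueError(f"Not a directory: {source}")
    candidates = source.rglob(pattern) if recursive else source.glob(pattern)
    # Keep regular files only; a glob pattern can also match directory names.
    return sorted(p for p in candidates if p.is_file())
```

Sorting matters because the order in which glob yields entries depends on the filesystem; a sorted list keeps batch runs reproducible.
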
build_tools.pyphen_syllable_extractor.cli.process_single_file_batch(input_path, language_code, min_len, max_len, output_dir, run_timestamp, verbose=False)[source]

Process a single file in batch mode with comprehensive error handling.

This function attempts to extract syllables from a single file and saves the results. Unlike interactive mode, this function catches all exceptions and returns a result object indicating success or failure, allowing batch processing to continue even when individual files fail.

Parameters:
  • input_path (pathlib.Path) – Path to the input text file to process

  • language_code (str) – Language code (e.g., “en_US”, “de_DE”) or “auto” for automatic language detection

  • min_len (int) – Minimum syllable length to include in results

  • max_len (int) – Maximum syllable length to include in results

  • output_dir (pathlib.Path) – Directory where output files should be saved

  • run_timestamp (str) – Timestamp for the batch run (shared across all files in batch)

  • verbose (bool) – If True, print detailed progress messages (default: False)

Returns:

FileProcessingResult object with success status, syllables count, output paths (if successful), or error message (if failed).

Return type:

build_tools.pyphen_syllable_extractor.models.FileProcessingResult

Note

This function never raises exceptions. All errors are caught and returned in the FileProcessingResult.error_message field. This design allows batch processing to continue despite individual failures.
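
The never-raises contract can be sketched as follows. `ResultSketch` is a hypothetical stand-in for FileProcessingResult (only success, syllables_count, and error_message are named in this page), and token counting is a placeholder for the real extraction step.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ResultSketch:
    """Hypothetical stand-in for FileProcessingResult."""
    input_path: Path
    success: bool
    syllables_count: int = 0
    error_message: Optional[str] = None


def process_file_safely(input_path: Path) -> ResultSketch:
    """Count whitespace-separated tokens as a placeholder for syllable extraction."""
    try:
        text = input_path.read_text(encoding="utf-8")
        return ResultSketch(input_path, success=True, syllables_count=len(text.split()))
    except Exception as exc:  # Deliberately broad: one bad file must not stop the batch.
        return ResultSketch(input_path, success=False, error_message=f"{type(exc).__name__}: {exc}")
```

Returning a result object on both paths lets the caller treat every file uniformly and aggregate failures at the end of the run.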

Example

>>> from datetime import datetime
>>> from pathlib import Path
>>> timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
>>> result = process_single_file_batch(
...     Path("book.txt"),
...     language_code="en_US",
...     min_len=2,
...     max_len=8,
...     output_dir=Path("output/"),
...     run_timestamp=timestamp,
...     verbose=True
... )
>>> if result.success:
...     print(f"Extracted {result.syllables_count} syllables")
... else:
...     print(f"Failed: {result.error_message}")

build_tools.pyphen_syllable_extractor.cli.process_batch(files, language_code, min_len, max_len, output_dir, quiet=False, verbose=False)[source]

Process multiple files sequentially in batch mode.

This function processes a list of files one at a time, extracting syllables from each and saving results to the specified output directory. All files in the batch share a single timestamped run directory, grouping them as one logical batch operation.

Parameters:
  • files (List[pathlib.Path]) – List of input file paths to process

  • language_code (str) – Language code (e.g., “en_US”) or “auto” for detection

  • min_len (int) – Minimum syllable length to include

  • max_len (int) – Maximum syllable length to include

  • output_dir (pathlib.Path) – Output directory for all results (created if needed)

  • quiet (bool) – If True, suppress all output except errors (default: False)

  • verbose (bool) – If True, show detailed progress for each file (default: False). Ignored if quiet=True.

Returns:

BatchResult with overall statistics and individual file results.

Return type:

build_tools.pyphen_syllable_extractor.models.BatchResult

Example

>>> from pathlib import Path
>>> files = [Path("book1.txt"), Path("book2.txt"), Path("book3.txt")]
>>> result = process_batch(
...     files,
...     language_code="auto",
...     min_len=2,
...     max_len=8,
...     output_dir=Path("output/")
... )
>>> print(f"Processed {result.successful}/{result.total_files} files")
>>> print(result.format_summary())

Note

Processing is sequential (not parallel). Files are processed in the order provided in the files list. All outputs share a single run directory identified by the batch start timestamp.
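
A minimal sketch of the sequential loop with a shared run timestamp is shown below. The `process_one` callback and the layout of the run directory (a subdirectory of output_dir named after the timestamp) are assumptions for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable, List, Tuple


def run_batch_sketch(
    files: List[Path],
    output_dir: Path,
    process_one: Callable[[Path, Path, str], bool],
) -> Tuple[int, int]:
    """Process files in the given order; return (successful, total)."""
    # One timestamp per batch, taken before the loop, so every file shares it.
    run_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = output_dir / run_timestamp  # Assumed layout for the run directory.
    run_dir.mkdir(parents=True, exist_ok=True)

    successful = 0
    for path in files:  # Strictly one file at a time, no parallelism.
        if process_one(path, run_dir, run_timestamp):
            successful += 1
    return successful, len(files)
```

Taking the timestamp once, outside the loop, is what keeps a long batch from being split across several run directories.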

build_tools.pyphen_syllable_extractor.cli.create_argument_parser()[source]

Create and configure the argument parser for batch mode.

This function sets up the argparse parser with all command-line options for batch processing mode. The parser supports mutually exclusive groups for input specification and language selection.

Returns:

Configured ArgumentParser instance ready to parse sys.argv.

Return type:

argparse.ArgumentParser
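
The mutually exclusive groups can be sketched with standard argparse calls. The flags --file, --files, --source, --recursive, --lang, and --auto appear in this page's usage examples; the spellings --pattern, --min, and --max are inferred from the documented argument names, and the types and help text are assumptions.

```python
import argparse
from pathlib import Path


def build_parser_sketch() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Syllable extractor (batch mode sketch)")

    # At most one way of naming the input may be used per invocation.
    inputs = parser.add_mutually_exclusive_group()
    inputs.add_argument("--file", type=Path)
    inputs.add_argument("--files", type=Path, nargs="+")
    inputs.add_argument("--source", type=Path)

    # A manual language code and automatic detection cannot be combined.
    language = parser.add_mutually_exclusive_group()
    language.add_argument("--lang")
    language.add_argument("--auto", action="store_true")

    parser.add_argument("--pattern", default="*.txt")
    parser.add_argument("--recursive", action="store_true")
    parser.add_argument("--min", type=int, default=2)
    parser.add_argument("--max", type=int, default=8)
    return parser
```

With this setup argparse itself rejects a command line such as `--lang en_US --auto` and exits with a usage error, so no extra validation code is needed for that case.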

Example

>>> parser = create_argument_parser()
>>> args = parser.parse_args(["--file", "input.txt", "--lang", "en_US"])
>>> args.file
PosixPath('input.txt')

build_tools.pyphen_syllable_extractor.cli.main_interactive()[source]

Interactive mode entry point for the syllable extractor CLI.

Workflow:
  1. Prompt user to select a language (or ‘auto’ for automatic detection)

  2. Configure extraction parameters (min/max syllable length)

  3. Prompt for input file path

  4. Extract syllables from input file (with optional auto-detection)

  5. Generate timestamped output filenames

  6. Save syllables and metadata to separate files

  7. Display summary to console

Language Detection:
  • If ‘auto’ is selected and langdetect is installed, the tool will automatically detect the language of the input text

  • Detection needs a minimum of roughly 20 to 50 characters of text for reliable results

  • Falls back to English (en_US) if detection fails
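
The fallback rules above can be sketched as follows. `langdetect.detect` is the real entry point of the langdetect package, but the mapping table, the 20-character threshold, and the function name are assumptions made for this example.

```python
# Hypothetical mapping from langdetect's ISO 639-1 codes to pyphen locale codes.
PYPHEN_CODES_SKETCH = {"en": "en_US", "de": "de_DE"}


def detect_language_sketch(text: str, min_chars: int = 20, default: str = "en_US") -> str:
    """Return a pyphen-style code for text, or default when detection is not possible."""
    if len(text.strip()) < min_chars:
        return default  # Too little text for a reliable guess.
    try:
        from langdetect import detect  # Optional third-party dependency.
        return PYPHEN_CODES_SKETCH.get(detect(text), default)
    except Exception:  # Missing package or a detection error: fall back.
        return default
```

Importing langdetect inside the function keeps the sketch usable when the package is not installed, in which case every call returns the default.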

Output Files:
  • YYYYMMDD_HHMMSS.syllables.LANG.txt: One syllable per line, sorted

  • YYYYMMDD_HHMMSS.meta.LANG.txt: Extraction metadata and statistics
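
The naming scheme can be reproduced with a short helper. The function name and signature are hypothetical; only the two filename patterns come from the list above.

```python
from datetime import datetime
from pathlib import Path
from typing import Tuple


def output_paths_sketch(output_dir: Path, language_code: str, now: datetime) -> Tuple[Path, Path]:
    """Return (syllables_path, metadata_path) for one extraction run."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    syllables = output_dir / f"{stamp}.syllables.{language_code}.txt"
    metadata = output_dir / f"{stamp}.meta.{language_code}.txt"
    return syllables, metadata
```

For example, a run at 2024-01-02 03:04:05 with language en_US yields 20240102_030405.syllables.en_US.txt and 20240102_030405.meta.en_US.txt.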

Corpus Database Integration:

All interactive mode extractions are automatically recorded to the corpus database ledger (data/raw/syllable_extractor.db) for build provenance tracking. Recording is best-effort: extraction still succeeds if writing to the ledger fails.

Both output files (syllables and metadata) are saved to _working/output/ by default.

build_tools.pyphen_syllable_extractor.cli.main_batch(args)[source]

Batch mode entry point for the syllable extractor CLI.

This function processes multiple files based on command-line arguments, providing progress indicators and comprehensive error reporting.

Parameters:

args (argparse.Namespace) – Parsed command-line arguments containing:
  • file: Single file path (optional)

  • files: List of file paths (optional)

  • source: Directory path for scanning (optional)

  • pattern: File pattern for directory scanning (default: “*.txt”)

  • recursive: Whether to scan directories recursively

  • lang: Manual language code (mutually exclusive with auto)

  • auto: Use automatic language detection (mutually exclusive with lang)

  • min: Minimum syllable length (default: 2)

  • max: Maximum syllable length (default: 8)

  • output: Output directory (default: _working/output/)

  • quiet: Suppress progress indicators

  • verbose: Show detailed processing information

Corpus Database Integration:

All batch mode extractions are automatically recorded to the corpus database ledger (data/raw/syllable_extractor.db) for build provenance tracking. Recording is best-effort: extraction still succeeds if writing to the ledger fails.
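
Ledger recording that does not interrupt extraction can be sketched with the standard sqlite3 module. The table name and columns here are invented for illustration; the actual schema of syllable_extractor.db is not described on this page.

```python
import sqlite3
from pathlib import Path


def record_run_sketch(db_path: Path, source_file: str, syllables_count: int) -> bool:
    """Try to append one row to a ledger table; return False instead of raising."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # Commits on success, rolls back on error.
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS runs_sketch "
                    "(source_file TEXT, syllables_count INTEGER)"
                )
                conn.execute(
                    "INSERT INTO runs_sketch VALUES (?, ?)",
                    (source_file, syllables_count),
                )
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        # The ledger is secondary; the caller keeps its extraction results.
        return False
```

A caller would typically log a warning when the function returns False and continue, which matches the statement that extraction succeeds even if the ledger write does not.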

Exit Codes:

  • 0: All files processed successfully

  • 1: One or more files failed to process

Raises:

SystemExit – On validation errors or processing completion
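
The exit-code rule can be expressed in a few lines. The helper name and the use of a list of success flags are assumptions for this sketch; the real function works from its own batch result object.

```python
from typing import Sequence


def exit_code_for(results: Sequence[bool]) -> int:
    """Map per-file success flags to the process exit code (0 or 1)."""
    return 0 if all(results) else 1
```

A batch entry point would then end with `sys.exit(exit_code_for(flags))`, which is what raises SystemExit on completion.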

build_tools.pyphen_syllable_extractor.cli.main()[source]

Main entry point for the syllable extractor CLI.

This function determines whether to run in interactive or batch mode based on the presence of command-line arguments.

Modes:
  • Interactive Mode: No arguments provided. Prompts user for all settings.

  • Batch Mode: Arguments provided. Processes files based on CLI flags.
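
The mode selection can be sketched as a check on whether any arguments follow the program name. `choose_mode_sketch` is a hypothetical helper; the real main() presumably calls the interactive or batch entry point directly.

```python
import sys
from typing import List, Optional


def choose_mode_sketch(argv: Optional[List[str]] = None) -> str:
    """Return "interactive" when no CLI arguments were given, else "batch"."""
    args = sys.argv[1:] if argv is None else argv
    return "batch" if args else "interactive"
```

Accepting an explicit argv list is a convenience of this sketch so the decision can be exercised without touching sys.argv.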

Examples

Interactive mode (no arguments):

$ python -m build_tools.syllable_extractor

Batch mode (with arguments):

$ python -m build_tools.syllable_extractor --file input.txt --lang en_US
$ python -m build_tools.syllable_extractor --files *.txt --auto
$ python -m build_tools.syllable_extractor --source ~/docs/ --recursive --auto