build_tools.nltk_syllable_normaliser.cli

Command-line interface for NLTK syllable normalization pipeline.

This module provides the main CLI entry point for the nltk_syllable_normaliser tool, which runs NLTK extractor output through a fragment-cleaning and normalization pipeline.

Functions

detect_nltk_run_directories(source_dir)

Detect NLTK run directories within source directory.

run_full_pipeline(run_directory, config[, verbose, ...])

Run complete NLTK normalization pipeline with in-place processing.

create_argument_parser()

Create and return the argument parser for NLTK syllable normaliser.

parse_arguments([args])

Parse command-line arguments.

main([argv])

Main entry point for CLI.

Module Contents

build_tools.nltk_syllable_normaliser.cli.detect_nltk_run_directories(source_dir)[source]

Detect NLTK run directories within source directory.

Searches for directories matching the pattern YYYYMMDD_HHMMSS_nltk/ which contain a syllables/ subdirectory.

Parameters:

source_dir (pathlib.Path) – Directory to search for NLTK run directories.

Returns:

List of Path objects pointing to NLTK run directories, sorted by directory name (chronological order).

Return type:

List[pathlib.Path]

Example

>>> from pathlib import Path
>>> source = Path("_working/output/")
>>> runs = detect_nltk_run_directories(source)
>>> for run in runs:
...     print(run.name)
20260110_095213_nltk
20260110_143022_nltk
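
The detection rule described for detect_nltk_run_directories (a timestamped name ending in _nltk plus a syllables/ subdirectory) can be approximated as follows. This is a minimal sketch, not the module's actual implementation: the regular expression and the names RUN_DIR_PATTERN, find_nltk_runs, and demo are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Assumed pattern for run directory names: YYYYMMDD_HHMMSS_nltk
RUN_DIR_PATTERN = re.compile(r"^\d{8}_\d{6}_nltk$")


def find_nltk_runs(source_dir: Path) -> List[Path]:
    """Return run directories under source_dir that contain syllables/."""
    runs = [
        entry
        for entry in source_dir.iterdir()
        if entry.is_dir()
        and RUN_DIR_PATTERN.match(entry.name)
        and (entry / "syllables").is_dir()
    ]
    # Timestamped names sort chronologically when compared as strings.
    return sorted(runs, key=lambda path: path.name)


def demo() -> List[str]:
    """Build a throwaway directory tree and list the detected run names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "20260110_143022_nltk" / "syllables").mkdir(parents=True)
        (root / "20260110_095213_nltk" / "syllables").mkdir(parents=True)
        # Matching name but no syllables/ subdirectory: skipped.
        (root / "20260110_120000_nltk").mkdir()
        # Has syllables/ but the name does not end in _nltk: skipped.
        (root / "20260110_120000_pyphen" / "syllables").mkdir(parents=True)
        return [run.name for run in find_nltk_runs(root)]
```

Sorting by name is enough to get chronological order here because the timestamp fields are fixed-width and ordered from most to least significant.
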
build_tools.nltk_syllable_normaliser.cli.run_full_pipeline(run_directory, config, verbose=False, skip_fragment_cleaning=False)[source]

Run complete NLTK normalization pipeline with in-place processing.

Executes the full NLTK-specific workflow:

  1. Aggregate syllables from run_directory/syllables/*.txt
  2. Fragment cleaning (NLTK-specific preprocessing)
  3. Canonicalize syllables (Unicode normalization, etc.)
  4. Frequency analysis
  5. Write 5 output files to run_directory (in-place)

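Parameters:
  • run_directory – NLTK run directory to process. It must exist and contain a syllables/ subdirectory.

  • config – Normalization settings, e.g. a NormalizationConfig as in the example below.

  • verbose (bool) – Enable verbose output. Defaults to False.

  • skip_fragment_cleaning (bool) – Skip the fragment cleaning step. Defaults to False.

Returns: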

NormalizationResult containing all outputs, statistics, and file paths.

Raises:
  • FileNotFoundError – If run_directory or syllables/ subdirectory doesn’t exist.

  • ValueError – If run_directory is not a directory.

Return type:

build_tools.pyphen_syllable_normaliser.NormalizationResult

Example

>>> from pathlib import Path
>>> config = NormalizationConfig(min_length=2, max_length=8)
>>> run_dir = Path("_working/output/20260110_095213_nltk/")
>>> result = run_full_pipeline(
...     run_directory=run_dir,
...     config=config,
...     verbose=True
... )
>>> result.stats.raw_count
15234
>>> result.stats.unique_canonical
4821
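
The aggregation, canonicalization, and frequency steps listed for run_full_pipeline can be sketched with the standard library alone. This is an illustrative approximation under stated assumptions, not the tool's code: the canonicalization rules (NFKD, dropping combining marks, lowercasing), the one-syllable-per-line file layout, the length filter, and all function names below are hypothetical. Fragment cleaning and output writing are left out because their behaviour is not described here.

```python
import tempfile
import unicodedata
from collections import Counter
from pathlib import Path
from typing import Iterable, List


def aggregate_syllables(syllables_dir: Path) -> List[str]:
    """Collect one syllable per non-empty line from every *.txt file."""
    raw: List[str] = []
    for txt_file in sorted(syllables_dir.glob("*.txt")):
        for line in txt_file.read_text(encoding="utf-8").splitlines():
            if line.strip():
                raw.append(line.strip())
    return raw


def canonicalize(syllable: str) -> str:
    """Assumed rules: NFKD-normalize, drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFKD", syllable)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def frequency_table(
    raw: Iterable[str], min_length: int = 2, max_length: int = 8
) -> Counter:
    """Count canonical syllables whose length lies within the given bounds."""
    canonical = (canonicalize(s) for s in raw)
    kept = [s for s in canonical if min_length <= len(s) <= max_length]
    return Counter(kept)


def demo() -> Counter:
    """Run the sketch against two small temporary syllable files."""
    with tempfile.TemporaryDirectory() as tmp:
        syllables_dir = Path(tmp) / "syllables"
        syllables_dir.mkdir()
        (syllables_dir / "a.txt").write_text("ka\nKa\ntion\n", encoding="utf-8")
        (syllables_dir / "b.txt").write_text("a\nk\u00e1\n\n", encoding="utf-8")
        raw = aggregate_syllables(syllables_dir)
        return frequency_table(raw, min_length=2, max_length=8)
```

In this sketch the raw count is the length of the aggregated list and the unique canonical count is the number of keys in the resulting Counter, which mirrors the two statistics shown in the example above.
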
build_tools.nltk_syllable_normaliser.cli.create_argument_parser()[source]

Create and return the argument parser for NLTK syllable normaliser.

Returns:

Configured ArgumentParser ready to parse command-line arguments.

Return type:

argparse.ArgumentParser

build_tools.nltk_syllable_normaliser.cli.parse_arguments(args=None)[source]

Parse command-line arguments.

build_tools.nltk_syllable_normaliser.cli.main(argv=None)[source]

Main entry point for CLI.

Parameters:

argv (Optional[List[str]]) – Command-line arguments (for testing). If None, uses sys.argv.

Returns:

Exit code (0 for success, 1 for error).

Return type:

int
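
The create_argument_parser / parse_arguments / main split follows a common argparse pattern in which main accepts an optional argument list and returns an exit code. The sketch below shows that pattern only. The flags --run-dir and --verbose and the names build_parser and sketch_main are hypothetical; the real tool's options are not listed in this reference and may differ.

```python
import argparse
import sys
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical flags, for illustration only."""
    parser = argparse.ArgumentParser(prog="nltk_syllable_normaliser_sketch")
    parser.add_argument("--run-dir", type=Path, required=True)
    parser.add_argument("--verbose", action="store_true")
    return parser


def sketch_main(argv: Optional[List[str]] = None) -> int:
    """Return 0 on success and 1 on error, like the documented main()."""
    parser = build_parser()
    # parse_args(None) falls back to sys.argv[1:], so tests can pass a list.
    args = parser.parse_args(argv)
    if not args.run_dir.is_dir():
        print(f"error: {args.run_dir} is not a directory", file=sys.stderr)
        return 1
    if args.verbose:
        print(f"processing {args.run_dir}")
    return 0
```

Returning the exit code instead of calling sys.exit inside the function keeps it easy to test; a script entry point would typically pass the return value to sys.exit.
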