build_tools.pyphen_syllable_normaliser.cli
Command-line interface for the syllable normalization pipeline.
This module provides the main CLI entry point for the syllable_normaliser tool, which processes pyphen extractor output with a 3-step normalization pipeline.
Functions
- Detect pyphen run directories within source directory.
- Run complete pyphen normalization pipeline with in-place processing.
- Create and return the argument parser for pyphen syllable normaliser.
- Parse command-line arguments.
- Main entry point for CLI.
Module Contents
- build_tools.pyphen_syllable_normaliser.cli.detect_pyphen_run_directories(source_dir)[source]
Detect pyphen run directories within source directory.
Searches for directories matching the pattern YYYYMMDD_HHMMSS_pyphen/ which contain a syllables/ subdirectory.
- Parameters:
source_dir (pathlib.Path) – Directory to search for pyphen run directories.
- Returns:
List of Path objects pointing to pyphen run directories, sorted by directory name (chronological order).
- Return type:
List[pathlib.Path]
Example
>>> from pathlib import Path
>>> source = Path("_working/output/")
>>> runs = detect_pyphen_run_directories(source)
>>> for run in runs:
...     print(run.name)
20260110_143022_pyphen
20260110_153045_pyphen
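The detection rule described above (a YYYYMMDD_HHMMSS_pyphen/ name plus a syllables/ subdirectory) could be sketched roughly as follows. This is an illustrative stand-in, not the module's actual implementation: the function name find_run_directories, the regular expression, and the choice to return an empty list for a missing source directory are all assumptions.

```python
import re
from pathlib import Path
from typing import List

# Assumed pattern: eight digits, underscore, six digits, then "_pyphen".
RUN_DIR_PATTERN = re.compile(r"^\d{8}_\d{6}_pyphen$")


def find_run_directories(source_dir: Path) -> List[Path]:
    """Hypothetical stand-in for detect_pyphen_run_directories."""
    if not source_dir.is_dir():
        # Assumption: a missing source directory yields no runs.
        return []
    matches = [
        entry
        for entry in source_dir.iterdir()
        if entry.is_dir()
        and RUN_DIR_PATTERN.match(entry.name)
        and (entry / "syllables").is_dir()
    ]
    # Timestamped names sort chronologically when sorted as plain strings.
    return sorted(matches, key=lambda path: path.name)
```

Sorting by name is enough for chronological order here because the timestamp prefix is fixed-width and zero-padded.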
- build_tools.pyphen_syllable_normaliser.cli.run_full_pipeline(run_directory, config, verbose=False, quiet=False)[source]
Run complete pyphen normalization pipeline with in-place processing.
Executes the full pyphen-specific workflow:
1. Aggregate syllables from run_directory/syllables/*.txt
2. Canonicalize syllables (Unicode normalization, etc.)
3. Frequency analysis
4. Write 5 output files to run_directory (in-place)
- Parameters:
run_directory (pathlib.Path) – Pyphen run directory (e.g., _working/output/20260110_143022_pyphen/).
config (build_tools.pyphen_syllable_normaliser.models.NormalizationConfig) – NormalizationConfig specifying normalization parameters.
verbose (bool) – If True, print detailed progress information.
quiet (bool) – If True, suppress all output except errors.
- Returns:
NormalizationResult containing all outputs, statistics, and file paths.
- Raises:
FileNotFoundError – If run_directory or syllables/ subdirectory doesn’t exist.
ValueError – If run_directory is not a directory.
- Return type:
build_tools.pyphen_syllable_normaliser.models.NormalizationResult
Example
>>> from pathlib import Path
>>> from build_tools.pyphen_syllable_normaliser.models import NormalizationConfig
>>> config = NormalizationConfig(min_length=2, max_length=8)
>>> run_dir = Path("_working/output/20260110_143022_pyphen/")
>>> result = run_full_pipeline(
...     run_directory=run_dir,
...     config=config,
...     verbose=True
... )
>>> result.stats.raw_count
15234
>>> result.stats.unique_canonical
4821
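To make the workflow above more concrete, here is a minimal sketch of the first three steps (aggregate, canonicalize, count) using only the standard library. It is not the package's implementation. The helper names, the canonicalization rules (NFKD decomposition, dropping combining marks, lowercasing, length filtering), and the returned dictionary are assumptions chosen for illustration. The real function returns a NormalizationResult and also writes output files, which this sketch omits because their names are not given here.

```python
import unicodedata
from collections import Counter
from pathlib import Path
from typing import Dict, List, Optional


def aggregate_syllables(syllables_dir: Path) -> List[str]:
    """Step 1: collect one syllable per non-empty line from every *.txt file."""
    raw: List[str] = []
    for txt_file in sorted(syllables_dir.glob("*.txt")):
        for line in txt_file.read_text(encoding="utf-8").splitlines():
            if line.strip():
                raw.append(line.strip())
    return raw


def canonicalize(syllable: str, min_length: int, max_length: int) -> Optional[str]:
    """Step 2 (assumed rules): NFKD, drop combining marks, lowercase, length filter."""
    decomposed = unicodedata.normalize("NFKD", syllable)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    candidate = stripped.lower()
    if min_length <= len(candidate) <= max_length:
        return candidate
    return None  # Rejected: outside the configured length bounds.


def sketch_pipeline(run_directory: Path, min_length: int = 2, max_length: int = 8) -> Dict:
    """Hypothetical, simplified stand-in for run_full_pipeline (no files written)."""
    if not run_directory.exists():
        raise FileNotFoundError(f"Run directory not found: {run_directory}")
    if not run_directory.is_dir():
        raise ValueError(f"Not a directory: {run_directory}")
    syllables_dir = run_directory / "syllables"
    if not syllables_dir.is_dir():
        raise FileNotFoundError(f"Missing syllables/ in: {run_directory}")

    raw = aggregate_syllables(syllables_dir)
    canonical = [
        c for c in (canonicalize(s, min_length, max_length) for s in raw) if c is not None
    ]
    # Step 3: frequency analysis over the canonical forms.
    frequencies = Counter(canonical)
    return {
        "raw_count": len(raw),
        "unique_canonical": len(frequencies),
        "frequencies": frequencies,
    }
```

The error checks mirror the documented Raises section: a missing run directory or syllables/ subdirectory raises FileNotFoundError, and a path that exists but is not a directory raises ValueError.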
- build_tools.pyphen_syllable_normaliser.cli.create_argument_parser()[source]
Create and return the argument parser for pyphen syllable normaliser.
- Returns:
Configured ArgumentParser ready to parse command-line arguments.
- Return type: