build_tools.syllable_analysis.random_sampler
Random sampling utility for annotated syllables.
This module provides functionality to randomly sample annotated syllables for inspection and quality assurance. It reads the output of the syllable feature annotator and generates a random sample in JSON format.
This module has been refactored (Phase 2) to use common utilities from the analysis.common package, eliminating code duplication.
- Usage:
# Sample 100 syllables (default)
python -m build_tools.syllable_analysis.random_sampler

# Sample a specific number of syllables
python -m build_tools.syllable_analysis.random_sampler --samples 50

# Specify custom input/output paths
python -m build_tools.syllable_analysis.random_sampler --input data/annotated/syllables_annotated.json --output _working/samples.json --samples 200

# Use a specific random seed for reproducibility
python -m build_tools.syllable_analysis.random_sampler --samples 50 --seed 42
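The read, sample, write flow behind these commands can be approximated with the standard library alone. The following is a minimal sketch, not the module's own code: it assumes the annotated input is a JSON array of records, and the helper name `sample_file` and the record fields are hypothetical.

```python
import json
import random
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


def sample_file(
    input_path: Path,
    output_path: Path,
    samples: int,
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Read a JSON array of records, sample it, and write the sample as JSON."""
    # Assumption: the input file holds a top-level JSON array.
    records = json.loads(input_path.read_text(encoding="utf-8"))
    # A seeded local generator makes the selection repeatable.
    sampled = random.Random(seed).sample(records, samples)
    output_path.write_text(
        json.dumps(sampled, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sampled


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "syllables_annotated.json"
    dst = Path(tmp) / "samples.json"
    # Hypothetical records; the real annotated schema may differ.
    src.write_text(
        json.dumps([{"syllable": f"s{i}"} for i in range(10)]), encoding="utf-8"
    )
    written = sample_file(src, dst, samples=4, seed=42)
    reloaded = json.loads(dst.read_text(encoding="utf-8"))
```

Running it twice with the same seed and the same input produces the same sample file, which is the property the `--seed` option is described as providing.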
Functions
sample_syllables(records, sample_count, seed=None) | Randomly sample syllables from the full corpus.
create_argument_parser() | Create and return the argument parser for the random sampler.
Parse command-line arguments.
Main entry point for random sampling.
Module Contents
- build_tools.syllable_analysis.random_sampler.sample_syllables(records, sample_count, seed=None)[source]
Randomly sample syllables from the full corpus.
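- Parameters:
records – Full list of annotated syllable records to sample from.
sample_count – Number of records to sample.
seed – Optional random seed for reproducibility; defaults to None.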
- Returns:
List of sampled syllable records.
- Raises:
ValueError – If sample_count is larger than available records.
- Return type:
List[Dict[str, Any]]
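The documented contract (sampling without replacement, an optional seed, and a ValueError when too many records are requested) can be illustrated with a short stand-in. This is a sketch under assumptions, not the module's actual implementation: the name `sample_syllables_sketch`, the error message, and the record fields are hypothetical.

```python
import random
from typing import Any, Dict, List, Optional


def sample_syllables_sketch(
    records: List[Dict[str, Any]],
    sample_count: int,
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Illustrative stand-in for the documented behaviour."""
    if sample_count > len(records):
        raise ValueError(
            f"Requested {sample_count} samples but only "
            f"{len(records)} records are available"
        )
    # A local Random instance leaves the global random state untouched.
    rng = random.Random(seed)
    # random.sample draws without replacement, so no record repeats.
    return rng.sample(records, sample_count)


# Hypothetical records; the real annotated schema may differ.
corpus = [{"syllable": s} for s in ("ka", "lo", "mi", "ren", "tu")]
first = sample_syllables_sketch(corpus, 3, seed=42)
second = sample_syllables_sketch(corpus, 3, seed=42)
```

With the same seed and the same input, `first` and `second` hold identical records in the same order.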
- build_tools.syllable_analysis.random_sampler.create_argument_parser()[source]
Create and return the argument parser for the random sampler.
This function creates the ArgumentParser with all CLI options but does not parse arguments. This separation allows Sphinx documentation tools to introspect the parser and auto-generate CLI documentation.
- Returns:
Configured ArgumentParser ready to parse command-line arguments.
- Return type:
argparse.ArgumentParser
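Separating parser construction from parsing could look like the sketch below. The flag names come from the usage examples and the default of 100 samples from the usage comment; the function name `create_argument_parser_sketch`, the help strings, and the `None` defaults for `--input` and `--output` are assumptions, since the real defaults are not shown here.

```python
import argparse


def create_argument_parser_sketch() -> argparse.ArgumentParser:
    """Build the parser only; parsing is left to the caller."""
    parser = argparse.ArgumentParser(
        prog="random_sampler",
        description="Randomly sample annotated syllables (illustrative sketch).",
    )
    # Assumption: None stands in for the real default paths, which are not documented here.
    parser.add_argument("--input", default=None, help="Annotated syllables JSON file")
    parser.add_argument("--output", default=None, help="Destination for the sampled JSON")
    parser.add_argument("--samples", type=int, default=100, help="Number of syllables to sample")
    parser.add_argument("--seed", type=int, default=None, help="Random seed for reproducibility")
    return parser


parser = create_argument_parser_sketch()
# Passing an explicit list avoids reading sys.argv, which keeps the example testable.
args = parser.parse_args(["--samples", "50", "--seed", "42"])
defaults = parser.parse_args([])
```

Because the function returns an unparsed parser, documentation tools and tests can inspect its options without triggering command-line parsing.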