build_tools.syllable_analysis.random_sampler

Random sampling utility for annotated syllables.

This module provides functionality to randomly sample annotated syllables for inspection and quality assurance. It reads the output of the syllable feature annotator and generates a random sample in JSON format.

This module has been refactored (Phase 2) to use common utilities from the analysis.common package, eliminating code duplication.

Usage:

# Sample 100 syllables (default)
python -m build_tools.syllable_analysis.random_sampler

# Sample specific number of syllables
python -m build_tools.syllable_analysis.random_sampler --samples 50

# Specify custom input/output paths
python -m build_tools.syllable_analysis.random_sampler \
    --input data/annotated/syllables_annotated.json \
    --output _working/samples.json \
    --samples 200

# Use a specific random seed for reproducibility
python -m build_tools.syllable_analysis.random_sampler --samples 50 --seed 42

Functions

sample_syllables(records, sample_count[, seed])

Randomly sample syllables from the full corpus.

create_argument_parser()

Create and return the argument parser for the random sampler.

parse_arguments()

Parse command-line arguments.

main()

Main entry point for random sampling.

Module Contents

build_tools.syllable_analysis.random_sampler.sample_syllables(records, sample_count, seed=None)

Randomly sample syllables from the full corpus.

Parameters:
  • records (list[dict[str, Any]]) – List of annotated syllable records.

  • sample_count (int) – Number of samples to draw.

  • seed (int | None) – Optional random seed for reproducibility.

Returns:

List of sampled syllable records.

Raises:

ValueError – If sample_count is larger than the number of available records.

Return type:

list[dict[str, Any]]
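
The documented contract (optional seed, ValueError when too many samples are requested) can be illustrated with a minimal sketch. This is not the module's actual source: the name sample_syllables_sketch, the use of a private random.Random instance, and the record contents are assumptions made for illustration only.

```python
from __future__ import annotations

import random
from typing import Any


def sample_syllables_sketch(
    records: list[dict[str, Any]],
    sample_count: int,
    seed: int | None = None,
) -> list[dict[str, Any]]:
    """Illustrative stand-in for sample_syllables (hypothetical, not the real code)."""
    if sample_count > len(records):
        raise ValueError(
            f"sample_count ({sample_count}) exceeds available records ({len(records)})"
        )
    # Assumption: a private Random instance is used so the global
    # random state is left untouched and a seed gives repeatable output.
    rng = random.Random(seed)
    # Sampling without replacement: each record appears at most once.
    return rng.sample(records, sample_count)


# Hypothetical records; the real annotator output has its own schema.
records = [{"syllable": s} for s in ["ka", "lo", "mi", "ne", "tu"]]

first = sample_syllables_sketch(records, 3, seed=42)
second = sample_syllables_sketch(records, 3, seed=42)
print(first == second)  # same seed, same sample
```

Passing the same seed twice yields the same sample, which is what makes a quality-assurance review repeatable; omitting the seed gives a different sample on each run.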

build_tools.syllable_analysis.random_sampler.create_argument_parser()

Create and return the argument parser for the random sampler.

This function creates the ArgumentParser with all CLI options but does not parse arguments. This separation allows Sphinx documentation tools to introspect the parser and auto-generate CLI documentation.

Returns:

Configured ArgumentParser ready to parse command-line arguments.

Return type:

argparse.ArgumentParser
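
The split between building the parser and parsing arguments might look roughly like the sketch below. The flag names and the default of 100 samples come from the usage section; the function names with the _sketch suffix, the help strings, and the default input/output paths are assumptions, since the real defaults are not stated here.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def create_argument_parser_sketch() -> argparse.ArgumentParser:
    """Build the parser without parsing, so tools can introspect it."""
    parser = argparse.ArgumentParser(
        prog="random_sampler",
        description="Randomly sample annotated syllables for inspection.",
    )
    # Assumed default paths: placeholders borrowed from the usage example.
    parser.add_argument(
        "--input",
        type=Path,
        default=Path("data/annotated/syllables_annotated.json"),
        help="Annotated syllables JSON file to read.",
    )
    parser.add_argument(
        "--output",
        type=Path,
        default=Path("_working/samples.json"),
        help="JSON file to write the sample to.",
    )
    # Default of 100 matches the documented default sample size.
    parser.add_argument("--samples", type=int, default=100, help="Number of samples.")
    parser.add_argument("--seed", type=int, default=None, help="Optional random seed.")
    return parser


def parse_arguments_sketch(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return create_argument_parser_sketch().parse_args(argv)


args = parse_arguments_sketch(["--samples", "50", "--seed", "42"])
defaults = parse_arguments_sketch([])
print(args.samples, args.seed, defaults.samples)
```

Keeping parser construction in its own function means a documentation tool can call it and walk the parser's options without triggering any argument parsing or touching sys.argv.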

build_tools.syllable_analysis.random_sampler.parse_arguments()

Parse command-line arguments.

Returns:

Parsed argument namespace.

Return type:

argparse.Namespace

build_tools.syllable_analysis.random_sampler.main()

Main entry point for random sampling.

Returns:

Exit code (0 for success, 1 for error).

Return type:

int
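
One way an entry point could map success and failure onto the documented exit codes is sketched below. It is a self-contained illustration under stated assumptions, not the module's code: run_sampler_sketch is a hypothetical name, the input is assumed to be a JSON list of records, and the real module presumably delegates file handling to the analysis.common utilities rather than inlining it.

```python
from __future__ import annotations

import json
import random
import sys
import tempfile
from pathlib import Path


def run_sampler_sketch(
    input_path: Path,
    output_path: Path,
    sample_count: int,
    seed: int | None = None,
) -> int:
    """Read records, sample them, write JSON; return 0 on success, 1 on error."""
    try:
        # Assumption: the annotator output is a JSON list of record objects.
        records = json.loads(input_path.read_text(encoding="utf-8"))
        if sample_count > len(records):
            raise ValueError(
                f"sample_count ({sample_count}) exceeds available records ({len(records)})"
            )
        sampled = random.Random(seed).sample(records, sample_count)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(
            json.dumps(sampled, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so bad JSON lands here too.
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    return 0


# Demonstration against temporary files with hypothetical records.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "syllables_annotated.json"
    dst = Path(tmp) / "samples.json"
    src.write_text(
        json.dumps([{"syllable": s} for s in ["ka", "lo", "mi"]]), encoding="utf-8"
    )
    ok_code = run_sampler_sketch(src, dst, 2, seed=1)
    written = json.loads(dst.read_text(encoding="utf-8"))
    too_many_code = run_sampler_sketch(src, dst, 10)
    missing_code = run_sampler_sketch(Path(tmp) / "missing.json", dst, 1)
```

Returning an integer rather than calling sys.exit directly keeps the function testable; a caller can wrap it as sys.exit(main()) at the module's entry point.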