build_tools.syllable_analysis
Analysis tools for annotated syllables.
This subpackage provides post-annotation analysis utilities for inspecting and understanding the annotated syllable corpus.
Subpackages
common: Shared utilities (data I/O, paths, output management)
dimensionality: Dimensionality reduction (feature matrices, t-SNE, mapping)
plotting: Visualization utilities (static matplotlib, interactive Plotly)
Available Tools
random_sampler: Random sampling utility for QA and inspection
feature_signatures: Feature signature analysis and distribution reporting
tsne_visualizer: t-SNE visualization of feature signature space
Quick Start
Random sampling:
$ python -m build_tools.syllable_analysis.random_sampler --samples 50
Feature signature analysis:
$ python -m build_tools.syllable_analysis.feature_signatures
t-SNE visualization:
$ python -m build_tools.syllable_analysis.tsne_visualizer
Programmatic Usage
Using common utilities:
>>> from build_tools.syllable_analysis import (
...     default_paths,
...     load_annotated_syllables,
...     ensure_output_dir,
... )
>>> # Load data using default paths
>>> records = load_annotated_syllables(default_paths.annotated_syllables)
>>> # Prepare output directory
>>> output_dir = ensure_output_dir(default_paths.analysis_output_dir("my_tool"))
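For readers without the package at hand, here is a minimal sketch of what an output-directory helper of this kind might do. The name `ensure_dir_sketch` is a hypothetical stand-in, not the package's `ensure_output_dir`, whose exact behavior is not documented here. The sketch assumes the helper only needs to create the directory (with parents) and return the path.

```python
import tempfile
from pathlib import Path


def ensure_dir_sketch(path: Path) -> Path:
    """Hypothetical stand-in: create the directory (and parents) if missing."""
    path.mkdir(parents=True, exist_ok=True)
    return path


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out = ensure_dir_sketch(Path(tmp) / "analysis" / "my_tool")
    created = out.is_dir()
    # A second call is harmless because exist_ok=True suppresses the error.
    same = ensure_dir_sketch(out) == out
```

Returning the path lets the call be chained inline, as in the `output_dir = ...` line above.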
Random sampling:
>>> from pathlib import Path
>>> from build_tools.syllable_analysis import (
...     load_annotated_syllables,
...     sample_syllables,
...     save_json_output,
... )
>>> records = load_annotated_syllables(Path("data/annotated/syllables_annotated.json"))
>>> samples = sample_syllables(records, 50, seed=42)
>>> save_json_output(samples, Path("output.json"))
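The `seed=42` argument above suggests that sampling is reproducible. As an illustration only, the sketch below shows one common way to get seeded sampling without replacement using the standard library. `sample_records_sketch` and the `"syllable"` record field are assumptions made for this example and may not match how `sample_syllables` works internally.

```python
import random


def sample_records_sketch(records, sample_count, seed=None):
    """Hypothetical stand-in for seeded sampling without replacement."""
    # A private generator keeps the global random state untouched.
    rng = random.Random(seed)
    # Clamp so that asking for more samples than records does not raise.
    k = min(sample_count, len(records))
    return rng.sample(records, k)


# Toy records; the real annotated records likely carry more fields.
records = [{"syllable": s} for s in ["ka", "to", "mi", "ra", "en", "lo"]]
first = sample_records_sketch(records, 3, seed=42)
second = sample_records_sketch(records, 3, seed=42)
```

With the same seed and the same input order, `first` and `second` contain the same records, which is useful when a QA sample needs to be revisited later.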
Feature signature analysis:
>>> from pathlib import Path
>>> from build_tools.syllable_analysis import (
...     run_analysis,
...     extract_signature,
...     analyze_feature_signatures,
... )
>>> result = run_analysis(
...     input_path=Path("data/annotated/syllables_annotated.json"),
...     output_dir=Path("_working/analysis/"),
...     limit=20,
... )
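To make the idea of a feature signature concrete, here is a small self-contained sketch. It assumes each record carries a `"features"` dict of boolean flags and treats a signature as the sorted tuple of active feature names. The record layout, the feature names, and `signature_sketch` are all hypothetical; the package's `extract_signature` and `analyze_feature_signatures` may differ.

```python
from collections import Counter


def signature_sketch(features):
    """Hypothetical: the sorted tuple of feature names whose value is True."""
    return tuple(sorted(name for name, active in features.items() if active))


# Toy records with made-up feature names.
records = [
    {"syllable": "ka", "features": {"starts_with_vowel": False, "ends_with_vowel": True}},
    {"syllable": "to", "features": {"starts_with_vowel": False, "ends_with_vowel": True}},
    {"syllable": "en", "features": {"starts_with_vowel": True, "ends_with_vowel": False}},
    {"syllable": "a", "features": {"starts_with_vowel": True, "ends_with_vowel": True}},
]

# Count how often each signature occurs across the corpus.
counts = Counter(signature_sketch(r["features"]) for r in records)
# Keep only the most frequent signature for reporting.
top = counts.most_common(1)
```

Sorting the names makes the signature independent of dict ordering, so two syllables with the same active features always map to the same key.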
t-SNE visualization:
>>> from pathlib import Path
>>> from build_tools.syllable_analysis import (
...     run_tsne_visualization,
...     extract_feature_matrix,
... )
>>> result = run_tsne_visualization(
...     input_path=Path("data/annotated/syllables_annotated.json"),
...     output_dir=Path("_working/analysis/tsne/"),
... )
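t-SNE operates on a numeric matrix with one row per item, so boolean features first have to be encoded as numbers. The sketch below shows one plausible encoding as a 0/1 matrix with a fixed column order. `feature_matrix_sketch`, the `"features"` field, and the feature names are assumptions for illustration and are not the package's `extract_feature_matrix`.

```python
def feature_matrix_sketch(records, feature_names):
    """Hypothetical: encode boolean features as rows of 0/1 in a fixed column order."""
    return [
        [1 if record["features"].get(name, False) else 0 for name in feature_names]
        for record in records
    ]


# Toy records with made-up feature names.
records = [
    {"syllable": "ka", "features": {"starts_with_vowel": False, "ends_with_vowel": True}},
    {"syllable": "en", "features": {"starts_with_vowel": True, "ends_with_vowel": False}},
    {"syllable": "a", "features": {"starts_with_vowel": True, "ends_with_vowel": True}},
]
feature_names = ["starts_with_vowel", "ends_with_vowel"]

matrix = feature_matrix_sketch(records, feature_names)
```

Rows built this way could then be handed to a t-SNE implementation such as scikit-learn's `sklearn.manifold.TSNE`. That step is left out here to keep the sketch free of third-party dependencies.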