build_tools.syllable_analysis.tsne_visualizer
=============================================

.. py:module:: build_tools.syllable_analysis.tsne_visualizer

.. autoapi-nested-parse::

   t-SNE Visualization for Feature Signature Space

   This build-time analysis tool creates a t-SNE (t-distributed Stochastic Neighbor
   Embedding) visualization of the feature signature space in the annotated syllable
   corpus. t-SNE is a dimensionality reduction technique that projects high-dimensional
   feature vectors into 2D space while preserving local structure.

   This visualization helps identify:

   - Clustering patterns in the feature space
   - Syllable similarity based on phonetic features
   - Natural groupings and outliers in the corpus

   The visualization uses:

   - Position (x, y): t-SNE projection of 12-dimensional feature vectors
   - Size: Syllable frequency (larger points = more common syllables)
   - Color: Syllable frequency (warmer colors = more common syllables)

   Technical Details:

   - Uses the Hamming distance metric (well suited to binary feature vectors)
   - Perplexity=30 (balances local vs global structure)
   - Fixed random seed for reproducibility (seed=42)

   Output Formats:

   - Static PNG: High-resolution matplotlib visualization (always generated)
   - Interactive HTML: Plotly-based interactive visualization (optional, requires
     the ``--interactive`` flag)

   Usage::

       # Generate static PNG visualization with default paths
       python -m build_tools.syllable_analysis.tsne_visualizer

       # Generate both static PNG and interactive HTML
       python -m build_tools.syllable_analysis.tsne_visualizer \
           --interactive \
           --save-mapping

       # Custom input/output paths
       python -m build_tools.syllable_analysis.tsne_visualizer \
           --input data/annotated/syllables_annotated.json \
           --output _working/analysis/tsne/ \
           --interactive

       # Adjust t-SNE parameters
       python -m build_tools.syllable_analysis.tsne_visualizer \
           --perplexity 50 \
           --random-state 123 \
           --interactive

       # High-resolution output with interactive HTML
       python -m build_tools.syllable_analysis.tsne_visualizer \
           --dpi 600 \
           --interactive \
           --save-mapping

   Programmatic Usage:

   >>> from pathlib import Path
   >>> from build_tools.syllable_analysis import (
   ...     run_tsne_visualization,
   ...     extract_feature_matrix
   ... )
   >>> result = run_tsne_visualization(
   ...     input_path=Path("data/annotated/syllables_annotated.json"),
   ...     output_dir=Path("_working/analysis/tsne/"),
   ...     perplexity=30,
   ...     random_state=42,
   ...     interactive=True,
   ...     save_mapping=True
   ... )
   >>> print(f"Static visualization: {result['output_path']}")
   >>> print(f"Interactive HTML: {result['interactive_path']}")

   Architecture:

   This module orchestrates calls to specialized modules:

   - common.data_io: Load annotated syllables
   - common.paths: Default path configuration
   - common.output: Output directory and file management
   - dimensionality.feature_matrix: Extract feature matrices
   - dimensionality.tsne_core: Apply t-SNE reduction
   - dimensionality.mapping: Create and save coordinate mappings
   - plotting.static: Create and save matplotlib PNG visualizations
   - plotting.interactive: Create and save Plotly HTML visualizations


Functions
---------

.. autoapisummary::

   build_tools.syllable_analysis.tsne_visualizer.run_tsne_visualization
   build_tools.syllable_analysis.tsne_visualizer.create_argument_parser
   build_tools.syllable_analysis.tsne_visualizer.parse_args
   build_tools.syllable_analysis.tsne_visualizer.main


Module Contents
---------------

.. py:function:: run_tsne_visualization(input_path, output_dir, perplexity = 30, random_state = 42, dpi = 300, verbose = False, save_mapping = False, interactive = False)

   Run the complete t-SNE visualization pipeline.

   This is the main entry point for programmatic use. It handles the full workflow:

   1. Load annotated syllables
   2. Extract feature matrix
   3. Apply t-SNE dimensionality reduction
   4. Create visualization
   5. Save outputs (PNG + optional HTML + optional mapping)

   :param input_path: Path to syllables_annotated.json
   :param output_dir: Directory to save visualization outputs
   :param perplexity: t-SNE perplexity parameter (default: 30)
   :param random_state: Random seed for reproducibility (default: 42)
   :param dpi: Output resolution in dots per inch (default: 300)
   :param verbose: Print detailed progress information (default: False)
   :param save_mapping: Save syllable→features→coordinates mapping as JSON (default: False)
   :param interactive: Generate interactive HTML visualization (requires Plotly, default: False)

   :returns: Dictionary containing:

             - syllable_count: Number of syllables visualized
             - feature_count: Number of features (always 12)
             - output_path: Path to saved visualization PNG
             - metadata_path: Path to saved metadata file
             - tsne_coordinates: numpy array of 2D coordinates
             - mapping_path: Path to mapping JSON (None if save_mapping=False)
             - interactive_path: Path to interactive HTML (None if interactive=False
               or Plotly unavailable)
             - processing_time: Total processing time in seconds
   :rtype: dict

   :raises FileNotFoundError: If input file does not exist
   :raises ImportError: If required dependencies are missing
   :raises ValueError: If input data is invalid

   .. admonition:: Example

      >>> from pathlib import Path
      >>> result = run_tsne_visualization(
      ...     input_path=Path("data/annotated/syllables_annotated.json"),
      ...     output_dir=Path("_working/analysis/tsne/"),
      ...     interactive=True,
      ...     save_mapping=True
      ... )
      >>> print(f"Visualized {result['syllable_count']} syllables")
      >>> print(f"Interactive HTML: {result['interactive_path']}")


.. py:function:: create_argument_parser()

   Create and return the argument parser for t-SNE visualization.

   This function creates the ArgumentParser with all CLI options but does not parse
   arguments. This separation allows Sphinx documentation tools to introspect the
   parser and auto-generate CLI documentation.

   :returns: Configured ArgumentParser ready to parse command-line arguments
   :rtype: argparse.ArgumentParser


.. py:function:: parse_args()

   Parse command-line arguments.

   :returns: Parsed argument namespace with validated parameters


.. py:function:: main()

   Main entry point for the t-SNE visualization tool.
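
The split between ``create_argument_parser()`` and ``parse_args()`` described above could look roughly like the following. This is a minimal sketch, not the module's actual code: the flag names and the defaults for perplexity, random seed, and DPI come from the usage examples and parameter list on this page, while the default paths, the ``float`` type for ``--perplexity``, the optional ``argv`` parameter, and the positivity check are assumptions made for illustration.

```python
import argparse
from pathlib import Path

# Assumed placeholder defaults; the real tool takes its defaults from common.paths.
ASSUMED_INPUT = Path("data/annotated/syllables_annotated.json")
ASSUMED_OUTPUT = Path("_working/analysis/tsne/")


def create_argument_parser() -> argparse.ArgumentParser:
    """Build the parser without parsing, so documentation tools can introspect it."""
    parser = argparse.ArgumentParser(
        prog="tsne_visualizer",
        description="t-SNE visualization of the feature signature space (sketch).",
    )
    parser.add_argument("--input", type=Path, default=ASSUMED_INPUT)
    parser.add_argument("--output", type=Path, default=ASSUMED_OUTPUT)
    # float is an assumption; the page only shows integer values such as 30 and 50.
    parser.add_argument("--perplexity", type=float, default=30)
    parser.add_argument("--random-state", type=int, default=42)
    parser.add_argument("--dpi", type=int, default=300)
    parser.add_argument("--interactive", action="store_true")
    parser.add_argument("--save-mapping", action="store_true")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv (sys.argv[1:] when None) and apply a hypothetical validation."""
    parser = create_argument_parser()
    args = parser.parse_args(argv)
    if args.perplexity <= 0:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--perplexity must be positive")
    return args


if __name__ == "__main__":
    args = parse_args(["--perplexity", "50", "--random-state", "123", "--interactive"])
    print(args.perplexity, args.random_state, args.dpi, args.interactive)
```

Keeping parser construction free of side effects means the same parser object can serve both the command line and documentation generators, while validation stays in one place in ``parse_args``.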