build_tools.syllable_analysis.dimensionality.feature_matrix
===========================================================

.. py:module:: build_tools.syllable_analysis.dimensionality.feature_matrix

.. autoapi-nested-parse::

   Feature matrix extraction for dimensionality reduction.

   This module provides utilities for extracting numerical feature matrices
   from annotated syllable records. The matrices are suitable for
   dimensionality reduction algorithms like t-SNE, PCA, UMAP, etc.


Attributes
----------

.. autoapisummary::

   build_tools.syllable_analysis.dimensionality.feature_matrix.ALL_FEATURES


Functions
---------

.. autoapisummary::

   build_tools.syllable_analysis.dimensionality.feature_matrix.extract_feature_matrix
   build_tools.syllable_analysis.dimensionality.feature_matrix.validate_feature_matrix
   build_tools.syllable_analysis.dimensionality.feature_matrix.get_feature_vector


Module Contents
---------------

.. py:data:: ALL_FEATURES
   :value: ['contains_liquid', 'contains_plosive', 'contains_fricative', 'contains_nasal', 'long_vowel',...


.. py:function:: extract_feature_matrix(records, feature_names = ALL_FEATURES)

   Extract binary feature matrix from annotated syllable records.

   Converts feature dictionaries to a numerical matrix suitable for
   dimensionality reduction algorithms. Each row represents a syllable,
   each column represents a feature (0 or 1).

   :param records: List of annotated syllable records with 'features' and
                   'frequency' keys. Each record should have structure::

                       {
                           "syllable": "ka",
                           "frequency": 187,
                           "features": {"contains_liquid": False, "contains_plosive": True, ...}
                       }
   :param feature_names: Ordered list of feature names to extract
                         (default: ALL_FEATURES). Order determines column
                         order in output matrix.

   :returns: - feature_matrix: numpy array of shape (n_syllables, n_features) with binary values
             - frequencies: List of frequency counts for each syllable
   :rtype: Tuple of (feature_matrix, frequencies)

   .. admonition:: Example

      >>> records = [
      ...     {
      ...         "syllable": "ka",
      ...         "frequency": 187,
      ...         "features": {"contains_liquid": False, "contains_plosive": True, ...}
      ...     }
      ... ]
      >>> matrix, freqs = extract_feature_matrix(records)
      >>> matrix.shape
      (1, 12)
      >>> freqs
      [187]

   .. admonition:: Notes

      - Missing features default to False (0)
      - Feature values are converted to int (True→1, False→0)
      - Output matrix dtype is int for memory efficiency
      - Empty record list returns (0, n_features) shaped array


.. py:function:: validate_feature_matrix(feature_matrix, expected_features = 12)

   Validate feature matrix shape and contents.

   Ensures the feature matrix has the expected structure for
   dimensionality reduction algorithms.

   :param feature_matrix: Binary feature matrix
   :param expected_features: Expected number of features (default: 12)

   :raises ValueError: If validation fails (wrong shape, non-binary values, etc.)

   .. admonition:: Example

      >>> import numpy as np
      >>> matrix = np.array([[1, 0, 1], [0, 1, 0]])
      >>> validate_feature_matrix(matrix, expected_features=3)  # OK
      >>> validate_feature_matrix(matrix, expected_features=4)  # Raises ValueError


.. py:function:: get_feature_vector(features, feature_names = ALL_FEATURES)

   Extract a single feature vector from a feature dictionary.

   Converts a dictionary of feature flags to an ordered binary vector.
   Useful for extracting vectors from individual syllables.

   :param features: Dictionary of feature name → boolean value
   :param feature_names: Ordered list of feature names (default: ALL_FEATURES)

   :returns: Binary feature vector matching feature_names order

   .. admonition:: Example

      >>> features = {"contains_liquid": True, "contains_plosive": False}
      >>> vector = get_feature_vector(features, ["contains_liquid", "contains_plosive"])
      >>> vector
      [1, 0]

   .. admonition:: Notes

      - Missing features default to False (0)
      - Order of output matches order of feature_names
      - Output is Python list, not numpy array (for flexibility)