build_tools.syllable_analysis.feature_signatures
================================================

.. py:module:: build_tools.syllable_analysis.feature_signatures

.. autoapi-nested-parse::

   Feature Signature Analysis Tool

   This build-time analysis tool examines the annotated syllable corpus to identify
   which feature combinations actually exist in the data and how frequently each
   combination appears.

   A "feature signature" is the set of all active (True) features for a syllable.
   For example, a syllable with only "starts_with_vowel" and "ends_with_vowel" active
   would have the signature: ('ends_with_vowel', 'starts_with_vowel').

   This analysis helps answer questions like:

   - What feature patterns are most common in natural language?
   - Are certain feature combinations rare or impossible?
   - How diverse is the feature space in the corpus?

   Output is saved to _working/analysis/feature_signatures/ for review.


Functions
---------

.. autoapisummary::

   build_tools.syllable_analysis.feature_signatures.extract_signature
   build_tools.syllable_analysis.feature_signatures.analyze_feature_signatures
   build_tools.syllable_analysis.feature_signatures.format_signature_report
   build_tools.syllable_analysis.feature_signatures.save_report
   build_tools.syllable_analysis.feature_signatures.run_analysis
   build_tools.syllable_analysis.feature_signatures.create_argument_parser
   build_tools.syllable_analysis.feature_signatures.parse_args
   build_tools.syllable_analysis.feature_signatures.main


Module Contents
---------------

.. py:function:: extract_signature(features)

   Extract the feature signature from a feature dictionary.

   A signature is a sorted tuple of feature names where the feature value is True.
   This creates a canonical representation of the active feature set.

   :param features: Dictionary mapping feature names to boolean values

   :returns: Sorted tuple of feature names that are active (True)

   .. admonition:: Example

      >>> extract_signature({"starts_with_vowel": True, "ends_with_vowel": False})
      ('starts_with_vowel',)


.. py:function:: analyze_feature_signatures(records)

   Analyze feature signatures across all syllable records.

   Counts how many syllables share each unique feature signature.

   :param records: List of syllable records from syllables_annotated.json.
                   Each record should have "syllable", "frequency", and "features" keys

   :returns: Counter mapping feature signatures to occurrence counts

   .. admonition:: Example

      >>> records = [
      ...     {"syllable": "ka", "features": {"starts_with_vowel": False}},
      ...     {"syllable": "a", "features": {"starts_with_vowel": True}}
      ... ]
      >>> counter = analyze_feature_signatures(records)
      >>> counter[('starts_with_vowel',)]
      1


.. py:function:: format_signature_report(signature_counter, total_syllables, limit = None)

   Format the signature analysis results as a human-readable report.

   :param signature_counter: Counter of signatures to their occurrence counts
   :param total_syllables: Total number of syllables in the corpus
   :param limit: Maximum number of signatures to include (None = all)

   :returns: Formatted multi-line string report


.. py:function:: save_report(report, output_dir)

   Save the formatted report to the output directory.

   :param report: Formatted report string
   :param output_dir: Directory to save the report in

   :returns: Path to the saved report file


.. py:function:: run_analysis(input_path, output_dir, limit = None)

   Run the complete feature signature analysis pipeline.

   :param input_path: Path to syllables_annotated.json
   :param output_dir: Directory to save analysis results
   :param limit: Maximum number of signatures to include in report (None = all)

   :returns: Dictionary with analysis results including:

             - total_syllables: Total number of syllables analyzed
             - unique_signatures: Number of unique feature signatures
             - output_path: Path to the saved report


.. py:function:: create_argument_parser()

   Create and return the argument parser for feature signature analysis.

   This function creates the ArgumentParser with all CLI options but does not
   parse arguments. This separation allows Sphinx documentation tools to
   introspect the parser and auto-generate CLI documentation.

   :returns: Configured ArgumentParser ready to parse command-line arguments
   :rtype: argparse.ArgumentParser


.. py:function:: parse_args()

   Parse command-line arguments.


.. py:function:: main()

   Main entry point for the feature signature analysis tool.
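
The signature extraction and counting described above can be illustrated with a minimal, self-contained sketch. It is not the module's actual implementation: the ``sketch_*`` function names, the report line layout, and the sample records are all assumptions made for illustration. It only follows the documented behaviour, where a signature is the sorted tuple of feature names whose value is True and signatures are tallied in a ``collections.Counter``.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Signature = Tuple[str, ...]


def sketch_extract_signature(features: Dict[str, bool]) -> Signature:
    """Hypothetical stand-in for extract_signature (assumed behaviour)."""
    # Keep only names whose value is exactly True, then sort so that the
    # same active set always yields the same tuple.
    return tuple(sorted(name for name, value in features.items() if value is True))


def sketch_count_signatures(records: List[dict]) -> Counter:
    """Hypothetical stand-in for analyze_feature_signatures (assumed behaviour)."""
    # One tally per record; records without a "features" key count as
    # having the empty signature.
    return Counter(sketch_extract_signature(r.get("features", {})) for r in records)


def sketch_report_lines(counter: Counter, total: int, limit: Optional[int] = None) -> List[str]:
    """Hypothetical report formatter; the real report layout is not documented here."""
    lines = []
    # most_common(None) returns every signature, most frequent first.
    for signature, count in counter.most_common(limit):
        share = 100.0 * count / total if total else 0.0
        label = ", ".join(signature) or "(no active features)"
        lines.append(f"{count:>6}  {share:5.1f}%  {label}")
    return lines


# Made-up sample records, shaped like the entries described for
# syllables_annotated.json ("syllable", "frequency", "features").
records = [
    {"syllable": "ka", "frequency": 3,
     "features": {"starts_with_vowel": False, "ends_with_vowel": True}},
    {"syllable": "a", "frequency": 5,
     "features": {"starts_with_vowel": True, "ends_with_vowel": True}},
    {"syllable": "e", "frequency": 2,
     "features": {"starts_with_vowel": True, "ends_with_vowel": True}},
]

counter = sketch_count_signatures(records)
for line in sketch_report_lines(counter, total=len(records)):
    print(line)
```

Sorting the active names is what makes the tuple canonical: two syllables with the same active features always map to the same ``Counter`` key, regardless of the key order in their feature dictionaries.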