build_tools.syllable_analysis.feature_signatures

Feature Signature Analysis Tool

This build-time analysis tool examines the annotated syllable corpus to identify which feature combinations actually exist in the data and how frequently each combination appears.

A “feature signature” is the set of all active (True) features for a syllable. For example, a syllable with only “starts_with_vowel” and “ends_with_vowel” active would have the signature ('ends_with_vowel', 'starts_with_vowel').

This analysis helps answer questions like:
  • What feature patterns are most common in natural language?

  • Are certain feature combinations rare or impossible?

  • How diverse is the feature space in the corpus?

Output is saved to _working/analysis/feature_signatures/ for review.

Functions

extract_signature(features)

Extract the feature signature from a feature dictionary.

analyze_feature_signatures(records)

Analyze feature signatures across all syllable records.

format_signature_report(signature_counter, total_syllables[, limit])

Format the signature analysis results as a human-readable report.

save_report(report, output_dir)

Save the formatted report to the output directory.

run_analysis(input_path, output_dir[, limit])

Run the complete feature signature analysis pipeline.

create_argument_parser()

Create and return the argument parser for feature signature analysis.

parse_args()

Parse command-line arguments.

main()

Main entry point for the feature signature analysis tool.

Module Contents

build_tools.syllable_analysis.feature_signatures.extract_signature(features)

Extract the feature signature from a feature dictionary.

A signature is a sorted tuple of feature names where the feature value is True. This creates a canonical representation of the active feature set.

Parameters:

features (dict[str, bool]) – Dictionary mapping feature names to boolean values

Returns:

Sorted tuple of feature names that are active (True)

Return type:

tuple[str, ...]

Example

>>> extract_signature({"starts_with_vowel": True, "ends_with_vowel": False})
('starts_with_vowel',)
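
The canonical form described above can be sketched in a few lines. This is a minimal illustration, not the module's actual source: `extract_signature_sketch` is a hypothetical name, and it treats any truthy value as active.

```python
from __future__ import annotations


def extract_signature_sketch(features: dict[str, bool]) -> tuple[str, ...]:
    """Return the names of all active features as a sorted tuple."""
    # Keep the names whose value is truthy, then sort so that the result
    # does not depend on the dictionary's insertion order.
    return tuple(sorted(name for name, active in features.items() if active))


# Two dictionaries with the same active features but different key order
# produce the same signature.
first = extract_signature_sketch({"starts_with_vowel": True, "ends_with_vowel": True})
second = extract_signature_sketch({"ends_with_vowel": True, "starts_with_vowel": True})
print(first == second)
```

Because the result is a tuple, it is hashable and can be used directly as a dictionary or Counter key.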

build_tools.syllable_analysis.feature_signatures.analyze_feature_signatures(records)

Analyze feature signatures across all syllable records.

Counts how many syllables share each unique feature signature.

Parameters:

records (list[dict]) – List of syllable records from syllables_annotated.json. Each record should have “syllable”, “frequency”, and “features” keys.

Returns:

Counter mapping feature signatures to occurrence counts

Return type:

collections.Counter

Example

>>> records = [
...     {"syllable": "ka", "features": {"starts_with_vowel": False}},
...     {"syllable": "a", "features": {"starts_with_vowel": True}}
... ]
>>> counter = analyze_feature_signatures(records)
>>> counter[('starts_with_vowel',)]
1
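
The counting step might look roughly like the sketch below. It assumes each record is counted once regardless of its “frequency” value, which matches the description above but is not confirmed against the module's source. The name `count_signatures_sketch` and the sample frequencies are illustrative only.

```python
from collections import Counter


def count_signatures_sketch(records):
    """Count how many records share each feature signature."""
    counter = Counter()
    for record in records:
        features = record.get("features", {})
        # Same canonical form as extract_signature: sorted tuple of active names.
        signature = tuple(sorted(name for name, active in features.items() if active))
        counter[signature] += 1  # one count per record, not weighted by frequency
    return counter


records = [
    {"syllable": "ka", "frequency": 3,
     "features": {"starts_with_vowel": False, "ends_with_vowel": True}},
    {"syllable": "a", "frequency": 5,
     "features": {"starts_with_vowel": True, "ends_with_vowel": True}},
    {"syllable": "e", "frequency": 2,
     "features": {"starts_with_vowel": True, "ends_with_vowel": True}},
]
counts = count_signatures_sketch(records)
print(counts.most_common())
```

In this sketch, a record with no active features is counted under the empty tuple `()`.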

build_tools.syllable_analysis.feature_signatures.format_signature_report(signature_counter, total_syllables, limit=None)

Format the signature analysis results as a human-readable report.

Parameters:
  • signature_counter (collections.Counter) – Counter of signatures to their occurrence counts

  • total_syllables (int) – Total number of syllables in the corpus

  • limit (int | None) – Maximum number of signatures to include (None = all)

Returns:

Formatted multi-line string report

Return type:

str
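
The actual report layout is not documented here, so the following is only a sketch of one plausible shape: a short header followed by signatures ranked by count, with each count shown as a share of the corpus. The function name, column layout, and the “(no active features)” label are all assumptions.

```python
from collections import Counter


def format_report_sketch(signature_counter, total_syllables, limit=None):
    """Build a plain-text report of signatures ranked by count."""
    lines = [
        f"Total syllables: {total_syllables}",
        f"Unique signatures: {len(signature_counter)}",
        "",
    ]
    # Counter.most_common(None) returns every entry, most frequent first.
    for signature, count in signature_counter.most_common(limit):
        share = 100.0 * count / total_syllables if total_syllables else 0.0
        label = ", ".join(signature) if signature else "(no active features)"
        lines.append(f"{count:>6}  {share:5.1f}%  {label}")
    return "\n".join(lines)


example_counter = Counter({("ends_with_vowel",): 3, (): 1})
report = format_report_sketch(example_counter, total_syllables=4, limit=1)
print(report)
```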

build_tools.syllable_analysis.feature_signatures.save_report(report, output_dir)

Save the formatted report to the output directory.

Parameters:
  • report (str) – Formatted report string

  • output_dir (pathlib.Path) – Directory to save the report in

Returns:

Path to the saved report file

Return type:

pathlib.Path
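
A minimal sketch of the save step, assuming the directory should be created when it does not exist. The file name `feature_signatures.txt` is a placeholder chosen for this example; the real tool may name its output differently.

```python
import tempfile
from pathlib import Path


def save_report_sketch(report: str, output_dir: Path,
                       filename: str = "feature_signatures.txt") -> Path:
    """Write the report into output_dir and return the path written."""
    # Create the directory (and any missing parents) before writing.
    output_dir.mkdir(parents=True, exist_ok=True)
    output_path = output_dir / filename  # hypothetical file name
    output_path.write_text(report, encoding="utf-8")
    return output_path


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_report_sketch("example report\n", Path(tmp) / "analysis" / "feature_signatures")
    saved_text = saved.read_text(encoding="utf-8")
    saved_name = saved.name
```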

build_tools.syllable_analysis.feature_signatures.run_analysis(input_path, output_dir, limit=None)

Run the complete feature signature analysis pipeline.

Parameters:
  • input_path (pathlib.Path) – Path to syllables_annotated.json

  • output_dir (pathlib.Path) – Directory to save analysis results

  • limit (int | None) – Maximum number of signatures to include in report (None = all)

Returns:

Dictionary with analysis results, including:

  • total_syllables: Total number of syllables analyzed

  • unique_signatures: Number of unique feature signatures

  • output_path: Path to the saved report

Return type:

dict
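
The pipeline can be pictured as load, count, format, save. The sketch below is a compressed stand-in rather than the module's code. It assumes the input JSON is a list of records, writes a simplified tab-separated report, and uses a placeholder output file name.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def run_analysis_sketch(input_path: Path, output_dir: Path, limit=None) -> dict:
    """Load records, count signatures, write a report, return a summary."""
    # Assumption: the file holds a JSON list of syllable records.
    records = json.loads(input_path.read_text(encoding="utf-8"))
    counter = Counter(
        tuple(sorted(name for name, on in rec.get("features", {}).items() if on))
        for rec in records
    )
    # Simplified report: one "count<TAB>signature" line per signature.
    lines = [f"{count}\t{', '.join(sig) or '(none)'}"
             for sig, count in counter.most_common(limit)]
    output_dir.mkdir(parents=True, exist_ok=True)
    output_path = output_dir / "feature_signatures.txt"  # hypothetical name
    output_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return {
        "total_syllables": len(records),
        "unique_signatures": len(counter),
        "output_path": output_path,
    }


with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "syllables_annotated.json"
    corpus.write_text(json.dumps([
        {"syllable": "ka", "frequency": 3, "features": {"ends_with_vowel": True}},
        {"syllable": "a", "frequency": 5,
         "features": {"starts_with_vowel": True, "ends_with_vowel": True}},
        {"syllable": "ta", "frequency": 1, "features": {"ends_with_vowel": True}},
    ]), encoding="utf-8")
    result = run_analysis_sketch(corpus, Path(tmp) / "out", limit=10)
    report_exists = result["output_path"].exists()
```

Here `limit` only restricts how many signatures appear in the report; the returned totals still cover the whole corpus.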

build_tools.syllable_analysis.feature_signatures.create_argument_parser()

Create and return the argument parser for feature signature analysis.

This function creates the ArgumentParser with all CLI options but does not parse arguments. This separation allows Sphinx documentation tools to introspect the parser and auto-generate CLI documentation.

Returns:

Configured ArgumentParser ready to parse command-line arguments

Return type:

argparse.ArgumentParser
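
The split between building the parser and parsing with it can be illustrated as follows. The option names `--input`, `--output`, and `--limit` are assumptions made for this sketch and are not taken from the tool's real CLI. The default output directory is the one mentioned in the module overview.

```python
import argparse
from pathlib import Path


def create_argument_parser_sketch() -> argparse.ArgumentParser:
    """Build the parser without parsing, so tools can introspect it."""
    parser = argparse.ArgumentParser(
        description="Analyze feature signatures in an annotated syllable corpus."
    )
    # All three option names below are hypothetical.
    parser.add_argument("--input", type=Path, required=True,
                        help="Path to syllables_annotated.json")
    parser.add_argument("--output", type=Path,
                        default=Path("_working/analysis/feature_signatures"),
                        help="Directory to save analysis results")
    parser.add_argument("--limit", type=int, default=None,
                        help="Maximum number of signatures in the report")
    return parser


def parse_args_sketch(argv=None) -> argparse.Namespace:
    """Parse argv, or sys.argv[1:] when argv is None."""
    return create_argument_parser_sketch().parse_args(argv)


args = parse_args_sketch(["--input", "syllables_annotated.json", "--limit", "20"])
print(args.input, args.output, args.limit)
```

Keeping parser construction in its own function means a documentation tool can call it and inspect the options without triggering argument parsing.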

build_tools.syllable_analysis.feature_signatures.parse_args()

Parse command-line arguments.

build_tools.syllable_analysis.feature_signatures.main()

Main entry point for the feature signature analysis tool.