=========================== Syllable Feature Annotator =========================== .. currentmodule:: build_tools.syllable_feature_annotator Overview -------- .. automodule:: build_tools.syllable_feature_annotator :no-members: Command-Line Interface ---------------------- .. argparse:: :module: build_tools.syllable_feature_annotator.cli :func: create_argument_parser :prog: python -m build_tools.syllable_feature_annotator Output Format ------------- Input/Output Contract ~~~~~~~~~~~~~~~~~~~~~ **Inputs** (from syllable normaliser): - ``syllables_unique.txt`` - One canonical syllable per line - ``syllables_frequencies.json`` - ``{"syllable": count}`` mapping **Output**: - ``syllables_annotated.json`` - Array of syllable records with features Output Structure ~~~~~~~~~~~~~~~~ The annotator produces JSON with this structure: .. code-block:: json [ { "syllable": "kran", "frequency": 7, "features": { "starts_with_vowel": false, "starts_with_cluster": true, "starts_with_heavy_cluster": false, "contains_plosive": true, "contains_fricative": false, "contains_liquid": true, "contains_nasal": true, "short_vowel": true, "long_vowel": false, "ends_with_vowel": false, "ends_with_nasal": true, "ends_with_stop": false } } ] **Feature set:** All 12 features are applied to every syllable: - Onset features (starts_with_vowel, starts_with_cluster, starts_with_heavy_cluster) - Content features (contains_plosive, contains_fricative, contains_liquid, contains_nasal) - Vowel features (short_vowel, long_vowel) - Coda features (ends_with_vowel, ends_with_nasal, ends_with_stop) Integration Guide ----------------- The feature annotator sits between the normaliser and pattern development: .. code-block:: bash # Step 1: Normalize syllables from corpus python -m build_tools.pyphen_syllable_normaliser \ --source data/corpus/ \ --output data/normalized/ # Step 2: Annotate normalized syllables with features python -m build_tools.syllable_feature_annotator \ --syllables data/normalized/syllables_unique.txt \ --frequencies data/normalized/syllables_frequencies.json \ --output data/annotated/syllables_annotated.json # Step 3: Use annotated syllables for pattern generation (future) **When to use this tool:** - After syllable normalization is complete - Before developing phonotactic patterns or constraints - To add structural feature metadata to your syllable corpus - For analysis tasks requiring feature-based filtering or grouping Notes ----- **Features are structural observations:** Features are structural observations based on phoneme presence, not linguistic interpretations. This ensures deterministic, language-agnostic detection. **Processing characteristics:** - Fast and deterministic (same input = same output) - All 12 features applied to every syllable (no selective detection) - Designed to integrate seamlessly with syllable normalizer output **Build-time tool:** This is a build-time tool only - not used during runtime name generation. API Reference ------------- .. automodule:: build_tools.syllable_feature_annotator :members: :undoc-members: :show-inheritance: