build_tools.syllable_feature_annotator.feature_rules
Feature detection rules for syllable annotation.
This module defines pure, deterministic feature detectors that observe structural patterns in syllables. Each detector is a boolean function that takes a syllable string and returns True or False based on observable character patterns.
Design Principles
Deterministic: Same input always produces same output
Pure Functions: No state, no side effects, no randomness, no I/O
Language-Agnostic: Structural patterns only, no linguistic interpretation
Feature Independence: Detectors don’t depend on each other
Conservative Detection: Approximate patterns without overthinking
Feature Categories
- Onset Features - Syllable-initial patterns
starts_with_vowel: Syllable begins with vowel (open onset)
starts_with_cluster: Initial consonant cluster (2+ consonants)
starts_with_heavy_cluster: Heavy initial cluster (3+ consonants)
- Internal Features - Manner of articulation presence
contains_plosive: Contains plosive consonant (p, t, k, b, d, g)
contains_fricative: Contains fricative consonant (f, s, z, v, h)
contains_liquid: Contains liquid consonant (l, r, w)
contains_nasal: Contains nasal consonant (m, n)
- Nucleus Features - Vowel structure (length proxies)
short_vowel: Exactly one vowel (closed/short syllable)
long_vowel: Two or more vowels (open/long syllable)
- Coda Features - Syllable-final patterns
ends_with_vowel: Syllable ends with vowel (open syllable)
ends_with_nasal: Syllable ends with nasal consonant
ends_with_stop: Syllable ends with stop consonant
Usage
Feature detectors can be called directly:
>>> from build_tools.syllable_feature_annotator.feature_rules import (
... starts_with_cluster, contains_plosive, short_vowel
... )
>>> starts_with_cluster("kran")
True
>>> contains_plosive("kran")
True
>>> short_vowel("kran")
True
Or accessed via the feature registry:
>>> from build_tools.syllable_feature_annotator.feature_rules import FEATURE_DETECTORS
>>> detector = FEATURE_DETECTORS["starts_with_cluster"]
>>> detector("kran")
True
Applying all features to a syllable:
>>> syllable = "spla"
>>> features = {
... name: detector(syllable)
... for name, detector in FEATURE_DETECTORS.items()
... }
>>> features["starts_with_heavy_cluster"]
True
Implementation Notes
Nucleus Logic is Intentionally Simple:
The short_vowel and long_vowel detectors are NOT linguistic vowel length. They are structural proxies for: - Syllable weight (light vs heavy) - Openness/closedness patterns - Nucleus complexity
This is a deliberate simplification that provides sufficient signal for downstream pattern generation without requiring linguistic analysis.
Heavy Cluster Definition is Future-Safe:
The heavy cluster detector (3+ consonants) is a placeholder that can be refined later without breaking downstream consumers. Current definition is conservative and catches the most obvious cases.
Conservative Detection:
Detectors use simple character-based rules. For example, starts_with_cluster just checks if first two characters are non-vowels. This intentionally: - Catches clear cases (tr, kr, st, etc.) - Avoids overthinking language-specific rules - Maintains determinism across different syllable sources
Why Feature Independence Matters
No detector depends on another detector’s output. This is critical because:
Composability: Downstream consumers can combine features freely
Invertibility: Features can be weighted positively or negatively
Extensibility: New features don’t break existing ones
Testability: Each feature can be tested in isolation
Clarity: Each rule has one clear responsibility
Examples
Classify a simple syllable:
>>> syllable = "na"
>>> starts_with_cluster("na")
False
>>> short_vowel("na")
True
>>> ends_with_vowel("na")
True
Classify a complex cluster:
>>> syllable = "spla"
>>> starts_with_heavy_cluster("spla")
True
>>> starts_with_cluster("spla") # Also true - heavy clusters are clusters
True
>>> contains_liquid("spla")
True
Classify a closed syllable:
>>> syllable = "takt"
>>> contains_plosive("takt")
True
>>> ends_with_stop("takt")
True
>>> short_vowel("takt")
True
Test edge cases:
>>> starts_with_vowel("") # Empty string
False
>>> short_vowel("a") # Single vowel
True
>>> long_vowel("ae") # Diphthong
True
Attributes
Functions
Detect if syllable starts with a vowel (vowel-initial or open onset). |
|
Detect if syllable starts with a consonant cluster (2+ consonants). |
|
Detect if syllable starts with a heavy consonant cluster (3+ consonants). |
|
Detect if syllable contains any plosive consonant. |
|
Detect if syllable contains any fricative consonant. |
|
Detect if syllable contains any liquid consonant. |
|
Detect if syllable contains any nasal consonant. |
|
|
Detect if syllable has exactly one vowel (short vowel proxy). |
|
Detect if syllable has two or more vowels (long vowel proxy). |
Detect if syllable ends with a vowel (open syllable). |
|
Detect if syllable ends with a nasal consonant (nasal coda). |
|
Detect if syllable ends with a stop consonant (stop coda). |
Module Contents
- build_tools.syllable_feature_annotator.feature_rules.starts_with_vowel(s)[source]
Detect if syllable starts with a vowel (vowel-initial or open onset).
This feature identifies syllables that begin directly with a vowel, without any initial consonant. Such syllables have an “open onset” in phonological terms.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable starts with vowel, False otherwise
Examples
>>> starts_with_vowel("apple") True >>> starts_with_vowel("kran") False >>> starts_with_vowel("a") True >>> starts_with_vowel("") # Edge case: empty string False
Notes
Empty strings return False (no onset to analyze)
Only checks the first character
Vowels are defined in phoneme_sets.VOWELS (a, e, i, o, u)
- build_tools.syllable_feature_annotator.feature_rules.starts_with_cluster(s)[source]
Detect if syllable starts with a consonant cluster (2+ consonants).
A consonant cluster is two or more adjacent consonants at the beginning of a syllable. This creates increased phonetic complexity and affects pronunciation difficulty and syllable weight.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable starts with 2+ consonants, False otherwise
Examples
>>> starts_with_cluster("kran") True >>> starts_with_cluster("train") True >>> starts_with_cluster("na") False >>> starts_with_cluster("a") False
Notes
Requires at least 2 characters
Checks that first two characters are both non-vowels
Conservative detection: catches obvious clusters (tr, kr, st, etc.)
Does not handle vowel-glides or language-specific edge cases
Heavy clusters (3+ consonants) will also trigger this detector
- build_tools.syllable_feature_annotator.feature_rules.starts_with_heavy_cluster(s)[source]
Detect if syllable starts with a heavy consonant cluster (3+ consonants).
Heavy clusters are particularly complex initial consonant sequences. These are relatively rare in natural language but create distinctive phonetic patterns when present.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable starts with 3+ consonants, False otherwise
Examples
>>> starts_with_heavy_cluster("spla") True >>> starts_with_heavy_cluster("stra") True >>> starts_with_heavy_cluster("kran") False >>> starts_with_heavy_cluster("na") False
Notes
Requires at least 3 characters
Checks that first three characters are all non-vowels
Future-safe: can be refined or replaced without breaking consumers
This is a placeholder definition that catches obvious cases
Syllables with heavy clusters will also trigger starts_with_cluster
- build_tools.syllable_feature_annotator.feature_rules.contains_plosive(s)[source]
Detect if syllable contains any plosive consonant.
Plosives (p, t, k, b, d, g) are consonants produced by completely blocking airflow then releasing it suddenly. They inject “hardness” and percussive texture into syllables.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable contains any plosive, False otherwise
Examples
>>> contains_plosive("takt") True >>> contains_plosive("pat") True >>> contains_plosive("sal") False >>> contains_plosive("") False
Notes
Checks entire syllable, not just specific positions
Plosives defined in phoneme_sets.PLOSIVES (p, t, k, b, d, g)
Multiple plosives in one syllable still return True
Empty strings return False
- build_tools.syllable_feature_annotator.feature_rules.contains_fricative(s)[source]
Detect if syllable contains any fricative consonant.
Fricatives (f, s, z, v, h) are consonants produced by forcing air through a narrow channel, creating turbulent airflow and friction. They create “hissing” or “buzzing” texture.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable contains any fricative, False otherwise
Examples
>>> contains_fricative("fish") True >>> contains_fricative("zone") True >>> contains_fricative("bat") False >>> contains_fricative("") False
Notes
Checks entire syllable, not just specific positions
Fricatives defined in phoneme_sets.FRICATIVES (f, s, z, v, h)
Multiple fricatives in one syllable still return True
Empty strings return False
- build_tools.syllable_feature_annotator.feature_rules.contains_liquid(s)[source]
Detect if syllable contains any liquid consonant.
Liquids (l, r, w) are consonants with vowel-like qualities that flow smoothly. They have lateral (l) or rhotic (r) characteristics and contribute to syllable fluidity.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable contains any liquid, False otherwise
Examples
>>> contains_liquid("kran") True >>> contains_liquid("slow") True >>> contains_liquid("bat") False >>> contains_liquid("") False
Notes
Checks entire syllable, not just specific positions
Liquids defined in phoneme_sets.LIQUIDS (l, r, w)
‘w’ is included due to its semi-vowel/glide properties
Multiple liquids in one syllable still return True
Empty strings return False
- build_tools.syllable_feature_annotator.feature_rules.contains_nasal(s)[source]
Detect if syllable contains any nasal consonant.
Nasals (m, n) are consonants where air flows through the nasal cavity. They have resonant qualities and often appear in coda positions, contributing to syllable closure patterns.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable contains any nasal, False otherwise
Examples
>>> contains_nasal("kran") True >>> contains_nasal("man") True >>> contains_nasal("bat") False >>> contains_nasal("") False
Notes
Checks entire syllable, not just specific positions
Nasals defined in phoneme_sets.NASALS (m, n)
Multiple nasals in one syllable still return True
Empty strings return False
See also: ends_with_nasal for coda-specific detection
- build_tools.syllable_feature_annotator.feature_rules.short_vowel(s)[source]
Detect if syllable has exactly one vowel (short vowel proxy).
This is a structural proxy for syllable weight and nucleus complexity, not linguistic vowel length. Syllables with one vowel tend to be lighter and more closed.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable contains exactly one vowel, False otherwise
Examples
>>> short_vowel("bat") True >>> short_vowel("kran") True >>> short_vowel("beat") # 'ea' = 2 vowels False >>> short_vowel("") False
Notes
Counts total vowels in syllable (any position)
Returns True only if count == 1
Not linguistic vowel length (short vs long /a/ vs /aː/)
Provides proxy for syllable weight and openness
Mutually exclusive with long_vowel
Empty strings return False (no nucleus)
- build_tools.syllable_feature_annotator.feature_rules.long_vowel(s)[source]
Detect if syllable has two or more vowels (long vowel proxy).
This is a structural proxy for syllable weight and nucleus complexity, not linguistic vowel length. Syllables with multiple vowels tend to be heavier and more open, including diphthongs and vowel sequences.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable contains 2+ vowels, False otherwise
Examples
>>> long_vowel("beat") # 'ea' = 2 vowels True >>> long_vowel("aura") # 'au' + 'a' = 3 vowels True >>> long_vowel("bat") False >>> long_vowel("") False
Notes
Counts total vowels in syllable (any position)
Returns True if count >= 2
Not linguistic vowel length (short vs long /a/ vs /aː/)
Catches diphthongs (ae, au, etc.) and vowel sequences
Provides proxy for syllable weight and complexity
Mutually exclusive with short_vowel
Empty strings return False (no nucleus)
- build_tools.syllable_feature_annotator.feature_rules.ends_with_vowel(s)[source]
Detect if syllable ends with a vowel (open syllable).
Syllables ending in vowels are “open” in phonological terms. They tend to have higher sonority and different prosodic properties compared to consonant-final syllables.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable ends with vowel, False otherwise
Examples
>>> ends_with_vowel("na") True >>> ends_with_vowel("hello") True >>> ends_with_vowel("bat") False >>> ends_with_vowel("") False
Notes
Only checks the final character
Vowels defined in phoneme_sets.VOWELS (a, e, i, o, u)
Open syllables (vowel-final) vs closed syllables (consonant-final)
Empty strings return False (no coda to analyze)
Mutually exclusive with ends_with_nasal and ends_with_stop
- build_tools.syllable_feature_annotator.feature_rules.ends_with_nasal(s)[source]
Detect if syllable ends with a nasal consonant (nasal coda).
Nasal codas (m, n) create specific closure patterns and resonance. They are common syllable-final consonants across many languages and contribute to syllable weight.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable ends with nasal, False otherwise
Examples
>>> ends_with_nasal("turn") True >>> ends_with_nasal("man") True >>> ends_with_nasal("bat") False >>> ends_with_nasal("") False
Notes
Only checks the final character
Nasals defined in phoneme_sets.NASALS (m, n)
Nasal codas are distinct from stop codas in sonority
Empty strings return False (no coda to analyze)
See also: contains_nasal for position-independent detection
- build_tools.syllable_feature_annotator.feature_rules.ends_with_stop(s)[source]
Detect if syllable ends with a stop consonant (stop coda).
Stop codas create abrupt syllable termination with complete airflow closure. They include plosives and other stops that contribute to syllable closure and weight.
Parameters
- sstr
Syllable string to analyze
Returns
- bool
True if syllable ends with stop, False otherwise
Examples
>>> ends_with_stop("takt") True >>> ends_with_stop("bat") True >>> ends_with_stop("man") False >>> ends_with_stop("") False
Notes
Only checks the final character
Stops defined in phoneme_sets.STOPS (p, t, k, b, d, g, q)
STOPS includes all PLOSIVES plus ‘q’ (terminal closure)
Stop codas create heavier, more closed syllables
Empty strings return False (no coda to analyze)
Distinction: STOPS for coda detection, PLOSIVES for internal texture
- build_tools.syllable_feature_annotator.feature_rules.FEATURE_DETECTORS