build_tools.syllable_feature_annotator.phoneme_sets
Character class definitions for syllable feature annotation.
This module defines character sets used for structural pattern detection in syllables. These sets are language-agnostic and based purely on observable character properties.
Design Principles
Pure data structures: Sets only, no logic or behavior
Explicit membership: Clear, enumerable character classes
Immutable definitions: Constants that don’t change at runtime (by convention, since Python's set type is itself mutable)
Set-based lookup: Average-case O(1) membership testing for performance
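The documented type of each class is set[str], so immutability rests on convention rather than on the type. If stricter enforcement were ever wanted, frozenset is one option. The sketch below is an alternative shown for illustration, not what the module does; the name VOWELS_FROZEN is hypothetical.

```python
# Assumption: frozenset is an illustrative alternative; the documented type is set[str].
VOWELS_FROZEN = frozenset("aeiou")

# Membership testing works exactly as it does with a regular set.
is_vowel = "a" in VOWELS_FROZEN

# Mutation is rejected because frozenset has no add() method.
try:
    VOWELS_FROZEN.add("y")
    mutation_blocked = False
except AttributeError:
    mutation_blocked = True
```

A frozenset still supports union, intersection, and difference, so code that only reads the classes would behave the same way.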
Character Classes
- VOWELS: set[str]
  Vowel characters (a, e, i, o, u)
- PLOSIVES: set[str]
  Plosive/stop consonants that inject hardness (p, t, k, b, d, g)
- FRICATIVES: set[str]
  Fricative consonants with continuous airflow (f, s, z, v, h)
- NASALS: set[str]
  Nasal consonants (m, n)
- LIQUIDS: set[str]
  Liquid consonants (l, r, w)
- STOPS: set[str]
  Consonants that terminate flow (plosives + q)
Usage
Character sets are used for membership testing in feature detection:
>>> from build_tools.syllable_feature_annotator.phoneme_sets import VOWELS, PLOSIVES
>>> 'a' in VOWELS
True
>>> 't' in PLOSIVES
True
>>> 's' in PLOSIVES
False
Implementation Notes
Set Construction: Using set("abc") converts a string to a character set efficiently
Set Operations: STOPS is constructed using the set union (|) operator
Performance: Set membership testing is O(1), making it ideal for frequent lookups
Immutability: These are module-level constants and should not be modified at runtime
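Taken together, these notes suggest definitions along the following lines. This is a sketch reconstructed from the documented membership of each class; the actual module source may be laid out differently.

```python
# Sketch only: membership is copied from the documented character classes.
VOWELS = set("aeiou")
PLOSIVES = set("ptkbdg")
FRICATIVES = set("fszvh")
NASALS = set("mn")
LIQUIDS = set("lrw")

# STOPS extends PLOSIVES with 'q' via set union.
# The union builds a new set, so PLOSIVES itself is left unchanged.
STOPS = PLOSIVES | set("q")
```

Because the union creates a new set, 'q' is a member of STOPS without ever being added to PLOSIVES.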
Why Sets?
Using sets instead of lists or strings provides:
Fast Membership Testing: Average-case O(1) vs O(n) for lists and strings
Clear Intent: “Does this character belong to this class?”
Set Operations: Easy to combine classes (union, intersection, difference)
No Duplicates: Character uniqueness enforced automatically
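The last two points can be illustrated with a short sketch. The sets are redefined locally so the snippet runs on its own, and their membership is assumed to match the documented classes; the variable names are hypothetical.

```python
# Local copies of two documented classes (membership assumed from the docs).
PLOSIVES = set("ptkbdg")
FRICATIVES = set("fszvh")

# No duplicates: repeated characters collapse to a single member.
unique_chars = set("banana")

# Set operations combine or compare classes directly.
obstruents = PLOSIVES | FRICATIVES            # union of both classes
overlap = PLOSIVES & FRICATIVES               # intersection (empty for these classes)
non_plosive_chars = unique_chars - PLOSIVES   # difference: drops 'b'
```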
Example
Check if a syllable starts with a vowel:
from build_tools.syllable_feature_annotator.phoneme_sets import VOWELS

syllable = "apple"
# Guard against empty strings before indexing the first character
if syllable and syllable[0] in VOWELS:
    print("Starts with vowel")
Combining character classes:
from build_tools.syllable_feature_annotator.phoneme_sets import PLOSIVES, FRICATIVES

syllable = "strand"
# All consonants that are either plosives or fricatives
obstruents = PLOSIVES | FRICATIVES
if any(char in obstruents for char in syllable):
    print("Contains obstruent")
Design Notes
Why ‘q’ is in STOPS but not PLOSIVES:
The distinction between PLOSIVES and STOPS is subtle but intentional:
PLOSIVES: Characters that inject hardness/texture anywhere in a syllable
STOPS: Characters that specifically terminate flow at syllable boundaries
The character ‘q’ contributes to closure (stopping flow) but doesn’t necessarily contribute the same internal plosive texture as ‘p’, ‘t’, ‘k’, etc. This separation allows for more nuanced feature detection in coda positions.
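A minimal sketch of how the two classes could feed separate detectors follows. The function names contains_plosive and ends_with_stop are hypothetical and are not taken from the annotator; the sets are redefined locally, with membership assumed from the documented classes.

```python
# Local copies (membership assumed from the documented classes).
PLOSIVES = set("ptkbdg")
STOPS = PLOSIVES | set("q")


def contains_plosive(syllable: str) -> bool:
    """Hypothetical detector: a plosive appears anywhere in the syllable."""
    return any(char in PLOSIVES for char in syllable)


def ends_with_stop(syllable: str) -> bool:
    """Hypothetical detector: the final character is a stop."""
    return bool(syllable) and syllable[-1] in STOPS
```

Under these assumptions, a syllable such as "maq" ends with a stop without containing a plosive, which is the case the separate STOPS class is meant to capture.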
Why these specific characters:
These character classes are designed for the canonical syllables produced by the syllable normalizer, which strips diacritics and normalizes to ASCII lowercase. The sets focus on the most common phonetic patterns in the normalized corpus.
Future Extensions
Additional character classes can be added as needed for more sophisticated feature detection (e.g., APPROXIMANTS, SIBILANTS, GLIDES). The modular design makes extension straightforward without affecting existing detectors.
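As an illustration of such an extension, a new class could be defined next to the existing ones and used by a new detector. The membership chosen for SIBILANTS below and the function name contains_sibilant are assumptions made for this sketch only; neither exists in the module.

```python
# Local copy of a documented class (membership assumed from the docs).
FRICATIVES = set("fszvh")

# Hypothetical new class; its membership is an illustrative assumption.
SIBILANTS = set("sz")


def contains_sibilant(syllable: str) -> bool:
    """Hypothetical detector built on the new class."""
    return any(char in SIBILANTS for char in syllable)


# With this membership, the new class is a subset of FRICATIVES.
is_subset = SIBILANTS <= FRICATIVES
```

Adding the class this way leaves FRICATIVES untouched, so detectors that already depend on it would keep their current behavior.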
Attributes
Module Contents
- build_tools.syllable_feature_annotator.phoneme_sets.VOWELS
- build_tools.syllable_feature_annotator.phoneme_sets.PLOSIVES
- build_tools.syllable_feature_annotator.phoneme_sets.FRICATIVES
- build_tools.syllable_feature_annotator.phoneme_sets.NASALS
- build_tools.syllable_feature_annotator.phoneme_sets.LIQUIDS
- build_tools.syllable_feature_annotator.phoneme_sets.STOPS