build_tools.syllable_feature_annotator.phoneme_sets

Character class definitions for syllable feature annotation.

This module defines character sets used for structural pattern detection in syllables. These sets are language-agnostic and based purely on observable character properties.

Design Principles

  1. Pure data structures: Sets only, no logic or behavior

  2. Explicit membership: Clear, enumerable character classes

  3. Immutable definitions: Constants that don’t change at runtime

  4. Set-based lookup: O(1) membership testing for performance

Character Classes

VOWELSset[str]

Vowel characters (a, e, i, o, u)

PLOSIVESset[str]

Plosive/stop consonants that inject hardness (p, t, k, b, d, g)

FRICATIVESset[str]

Fricative consonants with continuous airflow (f, s, z, v, h)

NASALSset[str]

Nasal consonants (m, n)

LIQUIDSset[str]

Liquid consonants (l, r, w)

STOPSset[str]

Consonants that terminate flow (plosives + q)

Usage

Character sets are used for membership testing in feature detection:

>>> from build_tools.syllable_feature_annotator.phoneme_sets import VOWELS, PLOSIVES
>>> 'a' in VOWELS
True
>>> 't' in PLOSIVES
True
>>> 's' in PLOSIVES
False

Implementation Notes

  • Set Construction: Using set("abc") converts a string to a character set efficiently

  • Set Operations: STOPS is constructed using set union (|) operator

  • Performance: Set membership testing is O(1), making it ideal for frequent lookups

  • Immutability: These are module-level constants and should not be modified at runtime

Why Sets?

Using sets instead of lists or strings provides:

  1. Fast Membership Testing: O(1) vs O(n) for lists

  2. Clear Intent: “Does this character belong to this class?”

  3. Set Operations: Easy to combine classes (union, intersection, difference)

  4. No Duplicates: Character uniqueness enforced automatically

Example

Check if a syllable starts with a vowel:

from build_tools.syllable_feature_annotator.phoneme_sets import VOWELS

syllable = "apple"
if syllable and syllable[0] in VOWELS:
    print("Starts with vowel")

Combining character classes:

from build_tools.syllable_feature_annotator.phoneme_sets import PLOSIVES, FRICATIVES

# All consonants that are either plosives or fricatives
obstruents = PLOSIVES | FRICATIVES

if any(char in obstruents for char in syllable):
    print("Contains obstruent")

Design Notes

Why ‘q’ is in STOPS but not PLOSIVES:

The distinction between PLOSIVES and STOPS is subtle but intentional:

  • PLOSIVES: Characters that inject hardness/texture anywhere in a syllable

  • STOPS: Characters that specifically terminate flow at syllable boundaries

The character ‘q’ contributes to closure (stopping flow) but doesn’t necessarily contribute the same internal plosive texture as ‘p’, ‘t’, ‘k’, etc. This separation allows for more nuanced feature detection in coda positions.

Why these specific characters?:

These character classes are designed for the canonical syllables produced by the syllable normalizer, which strips diacritics and normalizes to ASCII lowercase. The sets focus on the most common phonetic patterns in the normalized corpus.

Future Extensions

Additional character classes can be added as needed for more sophisticated feature detection (e.g., APPROXIMANTS, SIBILANTS, GLIDES). The modular design makes extension straightforward without affecting existing detectors.

Attributes

VOWELS

PLOSIVES

FRICATIVES

NASALS

LIQUIDS

STOPS

Module Contents

build_tools.syllable_feature_annotator.phoneme_sets.VOWELS
build_tools.syllable_feature_annotator.phoneme_sets.PLOSIVES
build_tools.syllable_feature_annotator.phoneme_sets.FRICATIVES
build_tools.syllable_feature_annotator.phoneme_sets.NASALS
build_tools.syllable_feature_annotator.phoneme_sets.LIQUIDS
build_tools.syllable_feature_annotator.phoneme_sets.STOPS