build_tools.syllable_feature_annotator.feature_rules

Feature detection rules for syllable annotation.

This module defines pure, deterministic feature detectors that observe structural patterns in syllables. Each detector is a boolean function that takes a syllable string and returns True or False based on observable character patterns.

Design Principles

  1. Deterministic: Same input always produces same output

  2. Pure Functions: No state, no side effects, no randomness, no I/O

  3. Language-Agnostic: Structural patterns only, no linguistic interpretation

  4. Feature Independence: Detectors don’t depend on each other

  5. Conservative Detection: Simple character-level rules that catch clear cases without language-specific special-casing

Feature Categories

Onset Features - Syllable-initial patterns
  • starts_with_vowel: Syllable begins with vowel (open onset)

  • starts_with_cluster: Initial consonant cluster (2+ consonants)

  • starts_with_heavy_cluster: Heavy initial cluster (3+ consonants)

Internal Features - Manner of articulation presence
  • contains_plosive: Contains plosive consonant (p, t, k, b, d, g)

  • contains_fricative: Contains fricative consonant (f, s, z, v, h)

  • contains_liquid: Contains liquid consonant (l, r, w)

  • contains_nasal: Contains nasal consonant (m, n)

Nucleus Features - Vowel structure (length proxies)
  • short_vowel: Exactly one vowel (light/short nucleus proxy)

  • long_vowel: Two or more vowels (heavy/long nucleus proxy)

Coda Features - Syllable-final patterns
  • ends_with_vowel: Syllable ends with vowel (open syllable)

  • ends_with_nasal: Syllable ends with nasal consonant

  • ends_with_stop: Syllable ends with stop consonant
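
Every category above reduces to membership tests against a few character sets. The sketch below shows what those sets might look like. It assumes phoneme_sets exposes them as frozensets of lowercase letters; the container type is an assumption, but the letters are the ones documented on this page.

```python
# Hypothetical reconstruction of the character sets referenced on this page.
# Members come from the documentation; using frozensets is an assumption.
VOWELS = frozenset("aeiou")
PLOSIVES = frozenset("ptkbdg")
FRICATIVES = frozenset("fszvh")
LIQUIDS = frozenset("lrw")      # 'w' included as a glide
NASALS = frozenset("mn")
STOPS = PLOSIVES | {"q"}        # all plosives plus 'q'

# Every manner class is consonantal, so none may overlap the vowel set.
for name, group in [("PLOSIVES", PLOSIVES), ("FRICATIVES", FRICATIVES),
                    ("LIQUIDS", LIQUIDS), ("NASALS", NASALS), ("STOPS", STOPS)]:
    assert group.isdisjoint(VOWELS), name
```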

Usage

Feature detectors can be called directly:

>>> from build_tools.syllable_feature_annotator.feature_rules import (
...     starts_with_cluster, contains_plosive, short_vowel
... )
>>> starts_with_cluster("kran")
True
>>> contains_plosive("kran")
True
>>> short_vowel("kran")
True

Or accessed via the feature registry:

>>> from build_tools.syllable_feature_annotator.feature_rules import FEATURE_DETECTORS
>>> detector = FEATURE_DETECTORS["starts_with_cluster"]
>>> detector("kran")
True

Applying all features to a syllable:

>>> syllable = "spla"
>>> features = {
...     name: detector(syllable)
...     for name, detector in FEATURE_DETECTORS.items()
... }
>>> features["starts_with_heavy_cluster"]
True
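
The dictionary comprehension above generalizes to a batch annotator. This sketch is self-contained. It re-implements three detectors from the rules documented on this page and uses a hypothetical DETECTORS registry in place of the real FEATURE_DETECTORS, whose full contents are not reproduced here.

```python
from typing import Callable, Dict, List

VOWELS = frozenset("aeiou")  # assumed to match phoneme_sets.VOWELS


def starts_with_cluster(s: str) -> bool:
    # First two characters are both non-vowels.
    return len(s) >= 2 and s[0] not in VOWELS and s[1] not in VOWELS


def starts_with_heavy_cluster(s: str) -> bool:
    # First three characters are all non-vowels.
    return len(s) >= 3 and all(c not in VOWELS for c in s[:3])


def ends_with_vowel(s: str) -> bool:
    return len(s) > 0 and s[-1] in VOWELS


# Hypothetical registry holding a subset of the features: name -> detector.
DETECTORS: Dict[str, Callable[[str], bool]] = {
    "starts_with_cluster": starts_with_cluster,
    "starts_with_heavy_cluster": starts_with_heavy_cluster,
    "ends_with_vowel": ends_with_vowel,
}


def annotate(syllables: List[str]) -> List[Dict[str, bool]]:
    """Return one {feature_name: bool} dict per syllable, in input order."""
    return [{name: fn(syl) for name, fn in DETECTORS.items()} for syl in syllables]


rows = annotate(["spla", "kran", "na"])
```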

Implementation Notes

Nucleus Logic is Intentionally Simple:

The short_vowel and long_vowel detectors do NOT measure linguistic vowel length. They are structural proxies for:

  • Syllable weight (light vs heavy)

  • Openness/closedness patterns

  • Nucleus complexity

This is a deliberate simplification that provides sufficient signal for downstream pattern generation without requiring linguistic analysis.
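
Because the two nucleus detectors are simple vowel counts, together they sort syllables into three buckets rather than two. The nucleus_weight helper below is hypothetical and not part of this module. It makes the three buckets explicit, assuming VOWELS is a, e, i, o, u.

```python
VOWELS = frozenset("aeiou")  # assumed to match phoneme_sets.VOWELS


def nucleus_weight(s: str) -> str:
    """Hypothetical helper: bucket a syllable by its vowel count.

    'light' corresponds to short_vowel() being True, 'heavy' to
    long_vowel() being True, and 'none' to both being False.
    """
    n = sum(c in VOWELS for c in s)  # booleans sum as 0/1
    if n == 0:
        return "none"
    return "light" if n == 1 else "heavy"
```

A vowelless string such as "psst" makes both short_vowel and long_vowel return False. That is why the two are mutually exclusive but not complementary.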

Heavy Cluster Definition is Future-Safe:

The heavy cluster detector (3+ consonants) is a placeholder that can be refined later without breaking downstream consumers. The current definition is conservative and catches the most obvious cases.
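
One way to keep the heavy-cluster definition refinable is to isolate the threshold. In this sketch, _initial_consonant_run and HEAVY_CLUSTER_MIN are hypothetical names. With the threshold at 3, the detector matches the documented rule that the first three characters are all non-vowels.

```python
VOWELS = frozenset("aeiou")  # assumed to match phoneme_sets.VOWELS

# Hypothetical threshold: refining it changes which syllables match, while the
# detector's name and bool-returning signature stay stable for consumers.
HEAVY_CLUSTER_MIN = 3


def _initial_consonant_run(s: str) -> int:
    """Hypothetical helper: length of the leading run of non-vowel characters."""
    n = 0
    for c in s:
        if c in VOWELS:
            break
        n += 1
    return n


def starts_with_heavy_cluster(s: str) -> bool:
    return _initial_consonant_run(s) >= HEAVY_CLUSTER_MIN
```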

Conservative Detection:

Detectors use simple character-based rules. For example, starts_with_cluster only checks whether the first two characters are both non-vowels. This intentionally:

  • Catches clear cases (tr, kr, st, etc.)

  • Avoids overthinking language-specific rules

  • Maintains determinism across different syllable sources
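
Because the cluster rule is purely character-based, orthographic digraphs get no special treatment. The sketch below illustrates this, assuming VOWELS is a, e, i, o, u as documented.

```python
VOWELS = frozenset("aeiou")  # assumed to match phoneme_sets.VOWELS


def starts_with_cluster(s: str) -> bool:
    # Purely character-based: any two leading non-vowel characters qualify.
    return len(s) >= 2 and s[0] not in VOWELS and s[1] not in VOWELS


# Digraphs are not special-cased, so "th" and "sh" (single sounds in
# English) still register as clusters; 'y' counts as a non-vowel.
examples = {syl: starts_with_cluster(syl) for syl in ["tra", "the", "shi", "na", "yo"]}
```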

Why Feature Independence Matters

No detector depends on another detector’s output. This is critical because:

  1. Composability: Downstream consumers can combine features freely

  2. Invertibility: Features can be weighted positively or negatively

  3. Extensibility: New features don’t break existing ones

  4. Testability: Each feature can be tested in isolation

  5. Clarity: Each rule has one clear responsibility
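
Invertibility and composability can be illustrated with a weighted score. The weights below are hypothetical, and the three detectors are re-implemented from the documented rules. The point is that a consumer can change a weight or flip its sign without touching any detector.

```python
VOWELS = frozenset("aeiou")    # assumed to match phoneme_sets.VOWELS
PLOSIVES = frozenset("ptkbdg")
LIQUIDS = frozenset("lrw")


# Three detectors written per the documented rules; none calls another.
def starts_with_cluster(s: str) -> bool:
    return len(s) >= 2 and s[0] not in VOWELS and s[1] not in VOWELS


def contains_plosive(s: str) -> bool:
    return any(c in PLOSIVES for c in s)


def contains_liquid(s: str) -> bool:
    return any(c in LIQUIDS for c in s)


# Hypothetical downstream weights: positive values favour a feature,
# negative values penalise it.
WEIGHTS = {
    starts_with_cluster: 1.0,
    contains_plosive: -0.5,
    contains_liquid: 2.0,
}


def score(s: str) -> float:
    # Sum the weights of every feature that fires for this syllable.
    return sum(w for fn, w in WEIGHTS.items() if fn(s))
```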

Examples

Classify a simple syllable:

>>> syllable = "na"
>>> starts_with_cluster("na")
False
>>> short_vowel("na")
True
>>> ends_with_vowel("na")
True

Classify a complex cluster:

>>> syllable = "spla"
>>> starts_with_heavy_cluster("spla")
True
>>> starts_with_cluster("spla")  # Also true - heavy clusters are clusters
True
>>> contains_liquid("spla")
True

Classify a closed syllable:

>>> syllable = "takt"
>>> contains_plosive("takt")
True
>>> ends_with_stop("takt")
True
>>> short_vowel("takt")
True

Test edge cases:

>>> starts_with_vowel("")  # Empty string
False
>>> short_vowel("a")  # Single vowel
True
>>> long_vowel("ae")  # Diphthong
True

Attributes

FEATURE_DETECTORS

Registry mapping each feature name to its detector function.

Functions

starts_with_vowel(s)

Detect if syllable starts with a vowel (vowel-initial or open onset).

starts_with_cluster(s)

Detect if syllable starts with a consonant cluster (2+ consonants).

starts_with_heavy_cluster(s)

Detect if syllable starts with a heavy consonant cluster (3+ consonants).

contains_plosive(s)

Detect if syllable contains any plosive consonant.

contains_fricative(s)

Detect if syllable contains any fricative consonant.

contains_liquid(s)

Detect if syllable contains any liquid consonant.

contains_nasal(s)

Detect if syllable contains any nasal consonant.

short_vowel(s)

Detect if syllable has exactly one vowel (short vowel proxy).

long_vowel(s)

Detect if syllable has two or more vowels (long vowel proxy).

ends_with_vowel(s)

Detect if syllable ends with a vowel (open syllable).

ends_with_nasal(s)

Detect if syllable ends with a nasal consonant (nasal coda).

ends_with_stop(s)

Detect if syllable ends with a stop consonant (stop coda).

Module Contents

build_tools.syllable_feature_annotator.feature_rules.starts_with_vowel(s)

Detect if syllable starts with a vowel (vowel-initial or open onset).

This feature identifies syllables that begin directly with a vowel, without any initial consonant. Such syllables have an empty onset, referred to here as an “open onset”.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable starts with vowel, False otherwise

Examples

>>> starts_with_vowel("apple")
True
>>> starts_with_vowel("kran")
False
>>> starts_with_vowel("a")
True
>>> starts_with_vowel("")  # Edge case: empty string
False

Notes

  • Empty strings return False (no onset to analyze)

  • Only checks the first character

  • Vowels are defined in phoneme_sets.VOWELS (a, e, i, o, u)
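
Below is a minimal sketch of the documented behaviour, assuming VOWELS is the lowercase set a, e, i, o, u. This page does not say whether input is case-normalized, so the sketch treats "A" as a non-vowel.

```python
VOWELS = frozenset("aeiou")  # assumed to match phoneme_sets.VOWELS


def starts_with_vowel(s: str) -> bool:
    # Guard the empty string first: there is no first character to test.
    return len(s) > 0 and s[0] in VOWELS
```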

build_tools.syllable_feature_annotator.feature_rules.starts_with_cluster(s)

Detect if syllable starts with a consonant cluster (2+ consonants).

A consonant cluster is two or more adjacent consonants at the beginning of a syllable. Clusters increase onset complexity and generally make a syllable harder to pronounce.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable starts with 2+ consonants, False otherwise

Examples

>>> starts_with_cluster("kran")
True
>>> starts_with_cluster("train")
True
>>> starts_with_cluster("na")
False
>>> starts_with_cluster("a")
False

Notes

  • Requires at least 2 characters

  • Checks that first two characters are both non-vowels

  • Conservative detection: catches obvious clusters (tr, kr, st, etc.)

  • Does not handle vowel-glides or language-specific edge cases

  • Heavy clusters (3+ consonants) will also trigger this detector

build_tools.syllable_feature_annotator.feature_rules.starts_with_heavy_cluster(s)

Detect if syllable starts with a heavy consonant cluster (3+ consonants).

Heavy clusters are particularly complex initial consonant sequences. These are relatively rare in natural language but create distinctive phonetic patterns when present.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable starts with 3+ consonants, False otherwise

Examples

>>> starts_with_heavy_cluster("spla")
True
>>> starts_with_heavy_cluster("stra")
True
>>> starts_with_heavy_cluster("kran")
False
>>> starts_with_heavy_cluster("na")
False

Notes

  • Requires at least 3 characters

  • Checks that first three characters are all non-vowels

  • Future-safe: can be refined or replaced without breaking consumers

  • This is a placeholder definition that catches obvious cases

  • Syllables with heavy clusters will also trigger starts_with_cluster
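
The overlap noted above follows directly from the two rules, because three leading non-vowels always include two. The sketch below re-implements both rules as documented and checks the implication exhaustively over a small alphabet. The alphabet is arbitrary and chosen only for illustration.

```python
from itertools import product

VOWELS = frozenset("aeiou")  # assumed to match phoneme_sets.VOWELS


def starts_with_cluster(s: str) -> bool:
    return len(s) >= 2 and s[0] not in VOWELS and s[1] not in VOWELS


def starts_with_heavy_cluster(s: str) -> bool:
    return len(s) >= 3 and all(c not in VOWELS for c in s[:3])


# Enumerate every string of length 0-4 over a small mixed alphabet and
# collect any case where the heavy detector fires but the cluster one does not.
ALPHABET = "aspl"
candidates = ("".join(chars) for n in range(5) for chars in product(ALPHABET, repeat=n))
violations = [s for s in candidates
              if starts_with_heavy_cluster(s) and not starts_with_cluster(s)]
```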

build_tools.syllable_feature_annotator.feature_rules.contains_plosive(s)

Detect if syllable contains any plosive consonant.

Plosives (p, t, k, b, d, g) are consonants produced by completely blocking airflow then releasing it suddenly. They inject “hardness” and percussive texture into syllables.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable contains any plosive, False otherwise

Examples

>>> contains_plosive("takt")
True
>>> contains_plosive("pat")
True
>>> contains_plosive("sal")
False
>>> contains_plosive("")
False

Notes

  • Checks entire syllable, not just specific positions

  • Plosives defined in phoneme_sets.PLOSIVES (p, t, k, b, d, g)

  • Multiple plosives in one syllable still return True

  • Empty strings return False
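
Below is a minimal sketch of the position-independent check, assuming PLOSIVES holds the six letters listed above.

```python
PLOSIVES = frozenset("ptkbdg")  # assumed to match phoneme_sets.PLOSIVES


def contains_plosive(s: str) -> bool:
    # Scan every character; any() stops at the first hit
    # and returns False for the empty string.
    return any(c in PLOSIVES for c in s)
```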

build_tools.syllable_feature_annotator.feature_rules.contains_fricative(s)

Detect if syllable contains any fricative consonant.

Fricatives (f, s, z, v, h) are consonants produced by forcing air through a narrow channel, creating turbulent airflow and friction. They create “hissing” or “buzzing” texture.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable contains any fricative, False otherwise

Examples

>>> contains_fricative("fish")
True
>>> contains_fricative("zone")
True
>>> contains_fricative("bat")
False
>>> contains_fricative("")
False

Notes

  • Checks entire syllable, not just specific positions

  • Fricatives defined in phoneme_sets.FRICATIVES (f, s, z, v, h)

  • Multiple fricatives in one syllable still return True

  • Empty strings return False
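
The same membership test can also be written with set operations. This sketch assumes FRICATIVES holds f, s, z, v and h. It uses frozenset.isdisjoint, which accepts any iterable, including a string.

```python
FRICATIVES = frozenset("fszvh")  # assumed to match phoneme_sets.FRICATIVES


def contains_fricative(s: str) -> bool:
    # isdisjoint() consumes the string character by character;
    # "not disjoint" means at least one character is a fricative.
    return not FRICATIVES.isdisjoint(s)
```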

build_tools.syllable_feature_annotator.feature_rules.contains_liquid(s)

Detect if syllable contains any liquid consonant.

Liquids (l, r) are consonants with vowel-like qualities that flow smoothly, with lateral (l) or rhotic (r) articulation. This module also counts the glide w as a liquid. Together they contribute to syllable fluidity.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable contains any liquid, False otherwise

Examples

>>> contains_liquid("kran")
True
>>> contains_liquid("slow")
True
>>> contains_liquid("bat")
False
>>> contains_liquid("")
False

Notes

  • Checks entire syllable, not just specific positions

  • Liquids defined in phoneme_sets.LIQUIDS (l, r, w)

  • ‘w’ is included due to its semi-vowel/glide properties

  • Multiple liquids in one syllable still return True

  • Empty strings return False
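
Including ‘w’ changes the outcome only for syllables in which w is the sole member of the set. The sketch below compares the documented set against a hypothetical strict set of l and r. The contains_strict_liquid function is illustrative only and is not part of the module.

```python
LIQUIDS = frozenset("lrw")        # as documented: 'w' is included
STRICT_LIQUIDS = frozenset("lr")  # hypothetical narrower set, for comparison only


def contains_liquid(s: str) -> bool:
    return any(c in LIQUIDS for c in s)


def contains_strict_liquid(s: str) -> bool:
    # Hypothetical variant without 'w'; not part of the module.
    return any(c in STRICT_LIQUIDS for c in s)


# Only syllables whose sole liquid-set member is 'w' differ.
diff = [s for s in ["wa", "slow", "kran", "bat"]
        if contains_liquid(s) != contains_strict_liquid(s)]
```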

build_tools.syllable_feature_annotator.feature_rules.contains_nasal(s)

Detect if syllable contains any nasal consonant.

Nasals (m, n) are consonants where air flows through the nasal cavity. They have resonant qualities and often appear in coda positions, contributing to syllable closure patterns.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable contains any nasal, False otherwise

Examples

>>> contains_nasal("kran")
True
>>> contains_nasal("man")
True
>>> contains_nasal("bat")
False
>>> contains_nasal("")
False

Notes

  • Checks entire syllable, not just specific positions

  • Nasals defined in phoneme_sets.NASALS (m, n)

  • Multiple nasals in one syllable still return True

  • Empty strings return False

  • See also: ends_with_nasal for coda-specific detection

build_tools.syllable_feature_annotator.feature_rules.short_vowel(s)

Detect if syllable has exactly one vowel (short vowel proxy).

This is a structural proxy for syllable weight and nucleus complexity, not linguistic vowel length. Syllables with one vowel tend to be lighter and more closed.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable contains exactly one vowel, False otherwise

Examples

>>> short_vowel("bat")
True
>>> short_vowel("kran")
True
>>> short_vowel("beat")  # 'ea' = 2 vowels
False
>>> short_vowel("")
False

Notes

  • Counts total vowels in syllable (any position)

  • Returns True only if count == 1

  • Not linguistic vowel length (short vs long /a/ vs /aː/)

  • Provides proxy for syllable weight and openness

  • Mutually exclusive with long_vowel

  • Empty strings return False (no nucleus)
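
Below is a minimal sketch of the count-based rule, assuming VOWELS is a, e, i, o, u.

```python
VOWELS = frozenset("aeiou")  # assumed to match phoneme_sets.VOWELS


def short_vowel(s: str) -> bool:
    # Count vowels anywhere in the syllable; exactly one counts as "short".
    return sum(1 for c in s if c in VOWELS) == 1
```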

build_tools.syllable_feature_annotator.feature_rules.long_vowel(s)

Detect if syllable has two or more vowels (long vowel proxy).

This is a structural proxy for syllable weight and nucleus complexity, not linguistic vowel length. Syllables with multiple vowels tend to be heavier and more open, including diphthongs and vowel sequences.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable contains 2+ vowels, False otherwise

Examples

>>> long_vowel("beat")  # 'ea' = 2 vowels
True
>>> long_vowel("aura")  # 'au' + 'a' = 3 vowels
True
>>> long_vowel("bat")
False
>>> long_vowel("")
False

Notes

  • Counts total vowels in syllable (any position)

  • Returns True if count >= 2

  • Not linguistic vowel length (short vs long /a/ vs /aː/)

  • Catches diphthongs (ae, au, etc.) and vowel sequences

  • Provides proxy for syllable weight and complexity

  • Mutually exclusive with short_vowel

  • Empty strings return False (no nucleus)
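
The sketch below implements both nucleus detectors on a shared vowel counter, a hypothetical private helper rather than a detector. It then checks the mutual-exclusivity note over a few samples.

```python
VOWELS = frozenset("aeiou")  # assumed to match phoneme_sets.VOWELS


def _vowel_count(s: str) -> int:
    # Hypothetical shared helper (not a detector): vowels in any position.
    return sum(1 for c in s if c in VOWELS)


def short_vowel(s: str) -> bool:
    return _vowel_count(s) == 1


def long_vowel(s: str) -> bool:
    return _vowel_count(s) >= 2


# The proxies never fire together, but both are False when there is no
# vowel at all, so they are mutually exclusive rather than complementary.
samples = ["beat", "aura", "bat", "", "psst"]
both = [s for s in samples if short_vowel(s) and long_vowel(s)]
neither = [s for s in samples if not short_vowel(s) and not long_vowel(s)]
```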

build_tools.syllable_feature_annotator.feature_rules.ends_with_vowel(s)

Detect if syllable ends with a vowel (open syllable).

Syllables ending in vowels are “open” in phonological terms. They tend to have higher sonority and different prosodic properties compared to consonant-final syllables.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable ends with vowel, False otherwise

Examples

>>> ends_with_vowel("na")
True
>>> ends_with_vowel("hello")
True
>>> ends_with_vowel("bat")
False
>>> ends_with_vowel("")
False

Notes

  • Only checks the final character

  • Vowels defined in phoneme_sets.VOWELS (a, e, i, o, u)

  • Open syllables (vowel-final) vs closed syllables (consonant-final)

  • Empty strings return False (no final character to check)

  • Mutually exclusive with ends_with_nasal and ends_with_stop
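
The mutual-exclusivity note holds because VOWELS, NASALS and STOPS share no letters, and all three coda detectors inspect only the final character. The sketch below assumes the sets documented on this page.

```python
VOWELS = frozenset("aeiou")   # assumed to match phoneme_sets.VOWELS
NASALS = frozenset("mn")      # assumed to match phoneme_sets.NASALS
STOPS = frozenset("ptkbdgq")  # assumed to match phoneme_sets.STOPS


def ends_with_vowel(s: str) -> bool:
    return len(s) > 0 and s[-1] in VOWELS


def ends_with_nasal(s: str) -> bool:
    return len(s) > 0 and s[-1] in NASALS


def ends_with_stop(s: str) -> bool:
    return len(s) > 0 and s[-1] in STOPS


# The sets are disjoint, so at most one coda detector fires per syllable.
# Finals outside all three sets (e.g. 'l', 's') leave every detector False.
for syl in ["na", "man", "bat", "sal", ""]:
    fired = [ends_with_vowel(syl), ends_with_nasal(syl), ends_with_stop(syl)]
    assert sum(fired) <= 1, syl
```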

build_tools.syllable_feature_annotator.feature_rules.ends_with_nasal(s)

Detect if syllable ends with a nasal consonant (nasal coda).

Nasal codas (m, n) create specific closure patterns and resonance. They are common syllable-final consonants across many languages and contribute to syllable weight.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable ends with nasal, False otherwise

Examples

>>> ends_with_nasal("turn")
True
>>> ends_with_nasal("man")
True
>>> ends_with_nasal("bat")
False
>>> ends_with_nasal("")
False

Notes

  • Only checks the final character

  • Nasals defined in phoneme_sets.NASALS (m, n)

  • Nasal codas are distinct from stop codas in sonority

  • Empty strings return False (no coda to analyze)

  • See also: contains_nasal for position-independent detection
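
The difference between the coda-specific nasal check and the position-independent one is easiest to see side by side. The sketch below assumes NASALS is m and n.

```python
NASALS = frozenset("mn")  # assumed to match phoneme_sets.NASALS


def contains_nasal(s: str) -> bool:
    # Any position in the syllable.
    return any(c in NASALS for c in s)


def ends_with_nasal(s: str) -> bool:
    # Final character only; the empty string has no coda.
    return len(s) > 0 and s[-1] in NASALS


# (contains_nasal, ends_with_nasal) for each syllable.
table = {syl: (contains_nasal(syl), ends_with_nasal(syl))
         for syl in ["turn", "mat", "bat"]}
```

Whenever ends_with_nasal is True, contains_nasal is also True, but not the reverse. "mat" has a nasal only in its onset.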

build_tools.syllable_feature_annotator.feature_rules.ends_with_stop(s)

Detect if syllable ends with a stop consonant (stop coda).

Stop codas end the syllable abruptly with complete closure of the airflow. In this module they comprise all plosives plus ‘q’, and they contribute to syllable closure and weight.

Parameters

s : str

Syllable string to analyze

Returns

bool

True if syllable ends with stop, False otherwise

Examples

>>> ends_with_stop("takt")
True
>>> ends_with_stop("bat")
True
>>> ends_with_stop("man")
False
>>> ends_with_stop("")
False

Notes

  • Only checks the final character

  • Stops defined in phoneme_sets.STOPS (p, t, k, b, d, g, q)

  • STOPS includes all PLOSIVES plus ‘q’ (terminal closure)

  • Stop codas create heavier, more closed syllables

  • Empty strings return False (no coda to analyze)

  • Distinction: STOPS for coda detection, PLOSIVES for internal texture
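
The STOPS/PLOSIVES split matters only for ‘q’. The sketch below assumes the sets documented above and shows a syllable on which the two detectors disagree.

```python
PLOSIVES = frozenset("ptkbdg")  # assumed to match phoneme_sets.PLOSIVES
STOPS = PLOSIVES | {"q"}        # documented: all plosives plus 'q'


def contains_plosive(s: str) -> bool:
    return any(c in PLOSIVES for c in s)


def ends_with_stop(s: str) -> bool:
    # Final character only, checked against the wider STOPS set.
    return len(s) > 0 and s[-1] in STOPS


# A 'q'-final syllable with no plosive shows where the two sets diverge.
result = (contains_plosive("saq"), ends_with_stop("saq"))
```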

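build_tools.syllable_feature_annotator.feature_rules.FEATURE_DETECTORS

Registry mapping each feature name (a string such as "starts_with_cluster") to its detector function.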