build_tools.syllable_walk.db

SQLite database access layer for the syllable walker web interface.

This module provides functions to query the corpus.db SQLite database for syllable data, avoiding the need to load large JSON files into memory.

The database schema stores syllables with their 12 phonetic features and frequency counts, with indexes optimized for common query patterns.
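
This page does not document the actual table layout. As a hypothetical sketch only, a single `syllables` table could hold one row per syllable with a `frequency` column and one 0/1 integer column per feature, plus a secondary index for frequency-ordered queries. The table name, column types, and index name are assumptions. Only the four feature names visible in `FEATURE_COLUMNS` are used here, not all twelve.

```python
import sqlite3

# Assumption: only the four FEATURE_COLUMNS names visible on this page;
# the real schema has 12 feature columns.
FEATURES = [
    "starts_with_vowel",
    "starts_with_cluster",
    "starts_with_heavy_cluster",
    "contains_plosive",
]


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical `syllables` table and a frequency index."""
    # SQLite has no BOOLEAN type, so each feature is stored as 0 or 1.
    feature_cols = ", ".join(f"{name} INTEGER NOT NULL" for name in FEATURES)
    # TEXT PRIMARY KEY gives SQLite an automatic unique index for exact lookups.
    conn.execute(
        "CREATE TABLE syllables ("
        "syllable TEXT PRIMARY KEY, frequency INTEGER NOT NULL, "
        f"{feature_cols})"
    )
    # Secondary index for queries that filter or sort by frequency.
    conn.execute("CREATE INDEX idx_syllables_frequency ON syllables(frequency)")


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO syllables VALUES (?, ?, ?, ?, ?, ?)",
    [("ab", 1, 1, 0, 0, 1), ("tra", 7, 0, 1, 0, 1)],
)
most_common = conn.execute(
    "SELECT syllable FROM syllables ORDER BY frequency DESC LIMIT 1"
).fetchone()[0]
print(most_common)  # tra
```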

Functions:

  • load_syllables_from_sqlite: Load all syllables with features

  • get_syllable_count: Get total syllable count

  • syllable_exists: Check if a syllable exists

  • get_random_syllable: Get a random syllable

Attributes

FEATURE_COLUMNS

Functions

load_syllables_from_sqlite(db_path)

Load all syllables with features from the database.

get_syllable_count(db_path)

Get the total number of syllables in the database.

syllable_exists(db_path, syllable)

Check if a syllable exists in the database.

get_syllable_data(db_path, syllable)

Get data for a specific syllable.

get_random_syllable(db_path[, seed])

Get a random syllable from the database.

load_syllables_from_json(json_path)

Load syllables from an annotated JSON file (fallback when no database is available).

load_syllables([db_path, json_path])

Load syllables from database or JSON, with automatic fallback.

Module Contents

build_tools.syllable_walk.db.FEATURE_COLUMNS = ['starts_with_vowel', 'starts_with_cluster', 'starts_with_heavy_cluster', 'contains_plosive',...

build_tools.syllable_walk.db.load_syllables_from_sqlite(db_path)

Load all syllables with features from the database.

This is the primary data loading function for the web interface. Returns data in the same format as the annotated JSON files.

Parameters:

db_path (pathlib.Path) – Path to corpus.db

Returns:

List of dicts with ‘syllable’, ‘frequency’, and ‘features’ keys

Return type:

list[dict]

Example

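>>> from pathlib import Path
>>> from build_tools.syllable_walk.db import load_syllables_from_sqlite
>>> syllables = load_syllables_from_sqlite(Path("corpus.db"))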
>>> syllables[0]
{'syllable': 'ab', 'frequency': 1, 'features': {'starts_with_vowel': True, ...}}
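
The loader's internals are not shown on this page. A minimal sketch of the row-to-dict conversion, assuming a hypothetical `syllables` table with one 0/1 column per feature (only two feature names are used) and alphabetical ordering. Unlike the documented function, the hypothetical `rows_to_syllables` takes an open `sqlite3.Connection`, so the example can run against an in-memory database.

```python
import sqlite3

# Assumption: a hypothetical subset of FEATURE_COLUMNS stored as 0/1 columns.
FEATURES = ["starts_with_vowel", "contains_plosive"]


def rows_to_syllables(conn: sqlite3.Connection) -> list[dict]:
    """Return one {'syllable', 'frequency', 'features'} dict per row."""
    cols = ", ".join(FEATURES)  # column names come from a fixed list, not user input
    cursor = conn.execute(
        f"SELECT syllable, frequency, {cols} FROM syllables ORDER BY syllable"
    )
    result = []
    for syllable, frequency, *flags in cursor:
        result.append(
            {
                "syllable": syllable,
                "frequency": frequency,
                # Convert stored 0/1 integers back to Python booleans.
                "features": {name: bool(flag) for name, flag in zip(FEATURES, flags)},
            }
        )
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE syllables (syllable TEXT PRIMARY KEY, frequency INTEGER, "
    "starts_with_vowel INTEGER, contains_plosive INTEGER)"
)
conn.executemany(
    "INSERT INTO syllables VALUES (?, ?, ?, ?)",
    [("tra", 7, 0, 1), ("ab", 1, 1, 1)],
)
syllables = rows_to_syllables(conn)
print(syllables[0])
# {'syllable': 'ab', 'frequency': 1, 'features': {'starts_with_vowel': True, 'contains_plosive': True}}
```

Converting the integers with `bool()` is what makes the result match the `True`/`False` values shown in the example above.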

build_tools.syllable_walk.db.get_syllable_count(db_path)

Get the total number of syllables in the database.

Parameters:

db_path (pathlib.Path) – Path to corpus.db

Returns:

Total syllable count

Return type:

int
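
A minimal sketch of how such a count is typically obtained, assuming a hypothetical `syllables` table. `count_syllables` is an illustrative name, not this module's function. It also shows why `contextlib.closing` is used: a `sqlite3.Connection` used directly as a context manager commits or rolls back but does not close.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def count_syllables(db_path: Path) -> int:
    """Sketch: count rows in a hypothetical `syllables` table."""
    # closing() guarantees the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        (count,) = conn.execute("SELECT COUNT(*) FROM syllables").fetchone()
    return count


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "corpus.db"
    with closing(sqlite3.connect(db)) as conn:
        conn.execute("CREATE TABLE syllables (syllable TEXT PRIMARY KEY, frequency INTEGER)")
        conn.executemany("INSERT INTO syllables VALUES (?, ?)", [("ab", 1), ("tra", 7)])
        conn.commit()
    total = count_syllables(db)

print(total)  # 2
```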

build_tools.syllable_walk.db.syllable_exists(db_path, syllable)

Check if a syllable exists in the database.

Parameters:
  • db_path (pathlib.Path) – Path to corpus.db

  • syllable (str) – Syllable to check

Returns:

True if syllable exists, False otherwise

Return type:

bool
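
An existence check like this maps naturally onto a parameterized `SELECT 1 ... LIMIT 1` query. The sketch below assumes a hypothetical `syllables` table, and `syllable_in_db` is an illustrative name, not the module's implementation.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def syllable_in_db(db_path: Path, syllable: str) -> bool:
    """Sketch: True if `syllable` has a row in a hypothetical `syllables` table."""
    with closing(sqlite3.connect(db_path)) as conn:
        # "?" binds the value as a parameter (no manual quoting, no SQL injection);
        # LIMIT 1 lets SQLite stop at the first match.
        row = conn.execute(
            "SELECT 1 FROM syllables WHERE syllable = ? LIMIT 1", (syllable,)
        ).fetchone()
    return row is not None


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "corpus.db"
    with closing(sqlite3.connect(db)) as conn:
        conn.execute("CREATE TABLE syllables (syllable TEXT PRIMARY KEY, frequency INTEGER)")
        conn.execute("INSERT INTO syllables VALUES ('ab', 1)")
        conn.commit()
    found = syllable_in_db(db, "ab")
    missing = syllable_in_db(db, "zz")

print(found, missing)  # True False
```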

build_tools.syllable_walk.db.get_syllable_data(db_path, syllable)

Get data for a specific syllable.

Parameters:
  • db_path (pathlib.Path) – Path to corpus.db

  • syllable (str) – Syllable to look up

Returns:

Dict with syllable data, or None if not found

Return type:

dict | None
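
The page does not specify the shape of the returned dict. A hedged sketch, assuming it mirrors one entry of `load_syllables_from_sqlite` and a hypothetical `syllables` table with two feature columns. `fetch_syllable` is an illustrative name. `sqlite3.Row` allows column access by name.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path
from typing import Optional

FEATURES = ["starts_with_vowel", "contains_plosive"]  # hypothetical subset


def fetch_syllable(db_path: Path, syllable: str) -> Optional[dict]:
    """Sketch: return one syllable record, or None if it is not present."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.row_factory = sqlite3.Row  # enables row["column_name"] access
        row = conn.execute(
            "SELECT * FROM syllables WHERE syllable = ?", (syllable,)
        ).fetchone()
        if row is None:
            return None
        # Assumed output shape: same keys as the bulk loader's entries.
        return {
            "syllable": row["syllable"],
            "frequency": row["frequency"],
            "features": {name: bool(row[name]) for name in FEATURES},
        }


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "corpus.db"
    with closing(sqlite3.connect(db)) as conn:
        conn.execute(
            "CREATE TABLE syllables (syllable TEXT PRIMARY KEY, frequency INTEGER, "
            "starts_with_vowel INTEGER, contains_plosive INTEGER)"
        )
        conn.execute("INSERT INTO syllables VALUES ('ab', 1, 1, 1)")
        conn.commit()
    hit = fetch_syllable(db, "ab")
    miss = fetch_syllable(db, "zz")

print(hit)
# {'syllable': 'ab', 'frequency': 1, 'features': {'starts_with_vowel': True, 'contains_plosive': True}}
print(miss)  # None
```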

build_tools.syllable_walk.db.get_random_syllable(db_path, seed=None)

Get a random syllable from the database.

When seed is provided, uses frequency-weighted random selection seeded with that value, so results are reproducible.

Parameters:
  • db_path (pathlib.Path) – Path to corpus.db

  • seed (int | None) – Optional random seed for reproducibility

Returns:

Random syllable string

Return type:

str
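
The docstring only describes the seeded case. A minimal sketch of seeded, frequency-weighted selection using `random.Random.choices`, assuming a hypothetical `syllables` table. `pick_random_syllable` is an illustrative name. For simplicity the sketch applies the same weighting when `seed` is None and draws from an unseeded generator; the documented function may behave differently in that case.

```python
import random
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path
from typing import Optional


def pick_random_syllable(db_path: Path, seed: Optional[int] = None) -> str:
    """Sketch: frequency-weighted pick from a hypothetical `syllables` table."""
    with closing(sqlite3.connect(db_path)) as conn:
        # ORDER BY fixes the row order; without it SQLite may return rows in any
        # order, and the same seed could then pick a different syllable.
        rows = conn.execute(
            "SELECT syllable, frequency FROM syllables ORDER BY syllable"
        ).fetchall()
    syllables = [syl for syl, _ in rows]
    weights = [freq for _, freq in rows]
    # A private Random instance keeps the seed local instead of reseeding the
    # global generator; Random(None) seeds from an unpredictable source.
    rng = random.Random(seed)
    return rng.choices(syllables, weights=weights, k=1)[0]


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "corpus.db"
    with closing(sqlite3.connect(db)) as conn:
        conn.execute("CREATE TABLE syllables (syllable TEXT PRIMARY KEY, frequency INTEGER)")
        conn.executemany(
            "INSERT INTO syllables VALUES (?, ?)", [("ab", 1), ("ka", 2), ("tra", 7)]
        )
        conn.commit()
    first = pick_random_syllable(db, seed=42)
    second = pick_random_syllable(db, seed=42)

print(first == second)  # True
```

This reads every (syllable, frequency) pair on each call. That keeps the sketch simple but is not tuned for frequent calls on a large corpus.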

build_tools.syllable_walk.db.load_syllables_from_json(json_path)

Load syllables from an annotated JSON file (fallback when no database is available).

Parameters:

json_path (pathlib.Path) – Path to *_syllables_annotated.json

Returns:

List of dicts with syllable data

Return type:

list[dict]
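
A hedged sketch of the JSON fallback. It assumes the annotated file holds a top-level list of per-syllable dicts, which is consistent with the SQLite loader returning "the same format as the annotated JSON files". `read_annotated_json` is an illustrative name, and the sample record is made up.

```python
import json
import tempfile
from pathlib import Path


def read_annotated_json(json_path: Path) -> list[dict]:
    """Sketch: read a JSON list of syllable records."""
    with json_path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: the file holds a top-level list of per-syllable dicts.
    if not isinstance(data, list):
        raise ValueError(f"expected a top-level JSON list in {json_path}")
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_syllables_annotated.json"
    record = {"syllable": "ab", "frequency": 1, "features": {"starts_with_vowel": True}}
    path.write_text(json.dumps([record]), encoding="utf-8")
    loaded = read_annotated_json(path)

print(loaded == [record])  # True
```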

build_tools.syllable_walk.db.load_syllables(db_path=None, json_path=None)

Load syllables from database or JSON, with automatic fallback.

Prefers the SQLite database for performance and falls back to JSON if the database is not available.

Parameters:
  • db_path (pathlib.Path | None) – Path to corpus.db (optional)

  • json_path (pathlib.Path | None) – Path to annotated JSON (optional)

Returns:

Tuple of (syllables list, source description)

Raises:

ValueError – If neither db_path nor json_path is valid

Return type:

tuple[list[dict], str]
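
The fallback order can be sketched as follows. This assumes "available" means "the file exists" and a hypothetical `syllables` table. `load_with_fallback` is an illustrative name, and the `sqlite:`/`json:` source-description strings are invented for the example. Feature columns are omitted to keep it short.

```python
import json
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path
from typing import Optional


def load_with_fallback(
    db_path: Optional[Path] = None, json_path: Optional[Path] = None
) -> tuple[list[dict], str]:
    """Sketch: prefer SQLite, fall back to JSON, else raise ValueError."""
    # Check existence first: sqlite3.connect() silently creates an empty
    # database file when the path does not exist.
    if db_path is not None and db_path.is_file():
        with closing(sqlite3.connect(db_path)) as conn:
            rows = conn.execute("SELECT syllable, frequency FROM syllables").fetchall()
        # Feature columns omitted to keep the sketch short.
        data = [{"syllable": s, "frequency": f, "features": {}} for s, f in rows]
        return data, f"sqlite:{db_path.name}"
    if json_path is not None and json_path.is_file():
        data = json.loads(json_path.read_text(encoding="utf-8"))
        return data, f"json:{json_path.name}"
    raise ValueError("Neither db_path nor json_path points to an existing file")


with tempfile.TemporaryDirectory() as tmp:
    missing_db = Path(tmp) / "corpus.db"  # never created
    json_file = Path(tmp) / "demo_syllables_annotated.json"
    json_file.write_text(json.dumps([{"syllable": "ab", "frequency": 1}]), encoding="utf-8")
    data, source = load_with_fallback(db_path=missing_db, json_path=json_file)
    db_was_created = missing_db.exists()

    real_db = Path(tmp) / "real.db"
    with closing(sqlite3.connect(real_db)) as conn:
        conn.execute("CREATE TABLE syllables (syllable TEXT, frequency INTEGER)")
        conn.execute("INSERT INTO syllables VALUES ('tra', 7)")
        conn.commit()
    _, preferred = load_with_fallback(db_path=real_db, json_path=json_file)

try:
    load_with_fallback()
    raised = False
except ValueError:
    raised = True

print(source, len(data), raised)  # json:demo_syllables_annotated.json 1 True
print(preferred)  # sqlite:real.db
```

Returning the source description alongside the data lets a caller such as the web interface report which backend was actually used.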