build_tools.syllable_walk.db
SQLite database access layer for the syllable walker web interface.
This module provides functions to query the corpus.db SQLite database for syllable data, avoiding the need to load large JSON files into memory.
The database schema stores syllables with their 12 phonetic features and frequency counts, with indexes optimized for common query patterns.
Functions:
- load_syllables_from_sqlite: Load all syllables with features
- get_syllable_count: Get total syllable count
- syllable_exists: Check if a syllable exists
- get_random_syllable: Get a random syllable
Attributes
- FEATURE_COLUMNS
Functions
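- load_syllables_from_sqlite(db_path): Load all syllables with features from the database.
- get_syllable_count(db_path): Get the total number of syllables in the database.
- syllable_exists(db_path, syllable): Check if a syllable exists in the database.
- get_syllable_data(db_path, syllable): Get data for a specific syllable.
- get_random_syllable(db_path, seed=None): Get a random syllable from the database.
- load_syllables_from_json(json_path): Load syllables from annotated JSON file (fallback when no DB).
- load_syllables(db_path=None, json_path=None): Load syllables from database or JSON, with automatic fallback.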
Module Contents
- build_tools.syllable_walk.db.FEATURE_COLUMNS = ['starts_with_vowel', 'starts_with_cluster', 'starts_with_heavy_cluster', 'contains_plosive',...
- build_tools.syllable_walk.db.load_syllables_from_sqlite(db_path)
Load all syllables with features from the database.
This is the primary data loading function for the web interface. Returns data in the same format as the annotated JSON files.
- Parameters:
db_path (pathlib.Path) – Path to corpus.db
- Returns:
List of dicts with 'syllable', 'frequency', and 'features' keys
- Return type:
Example
>>> syllables = load_syllables_from_sqlite(Path("corpus.db"))
>>> syllables[0]
{'syllable': 'ab', 'frequency': 1, 'features': {'starts_with_vowel': True, ...}}
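The return format described above can be illustrated with a minimal sketch. It is not the module's implementation: the table name `syllables`, its column layout, the helper `sketch_load_syllables`, and the two-entry `SKETCH_FEATURES` list are all assumptions made for illustration (the real FEATURE_COLUMNS list has 12 entries).

```python
import sqlite3

# Hypothetical subset of feature columns, for illustration only.
SKETCH_FEATURES = ["starts_with_vowel", "starts_with_cluster"]


def sketch_load_syllables(conn):
    """Return rows as dicts with 'syllable', 'frequency', and 'features' keys."""
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    cols = ", ".join(SKETCH_FEATURES)
    rows = conn.execute(
        f"SELECT syllable, frequency, {cols} FROM syllables ORDER BY syllable"
    ).fetchall()
    return [
        {
            "syllable": row["syllable"],
            "frequency": row["frequency"],
            # SQLite stores booleans as 0/1 integers, so convert explicitly.
            "features": {name: bool(row[name]) for name in SKETCH_FEATURES},
        }
        for row in rows
    ]


# Build a tiny in-memory database with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE syllables (syllable TEXT PRIMARY KEY, frequency INTEGER, "
    "starts_with_vowel INTEGER, starts_with_cluster INTEGER)"
)
conn.executemany(
    "INSERT INTO syllables VALUES (?, ?, ?, ?)",
    [("ab", 1, 1, 0), ("kra", 3, 0, 1)],
)
loaded = sketch_load_syllables(conn)
conn.close()
```

Because the rows are fetched and converted before the connection is closed, the resulting list holds plain dicts that remain usable afterwards.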
- build_tools.syllable_walk.db.get_syllable_count(db_path)
Get the total number of syllables in the database.
- Parameters:
db_path (pathlib.Path) – Path to corpus.db
- Returns:
Total syllable count
- Return type:
- build_tools.syllable_walk.db.syllable_exists(db_path, syllable)
Check if a syllable exists in the database.
- Parameters:
db_path (pathlib.Path) – Path to corpus.db
syllable (str) – Syllable to check
- Returns:
True if syllable exists, False otherwise
- Return type:
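A count and an existence check of this kind could be written as parameterized queries. The sketch below assumes a hypothetical `syllables` table with a `syllable` column. The helper names are illustrative and are not the module's functions, which take a `db_path` rather than an open connection.

```python
import sqlite3


def sketch_syllable_count(conn):
    """Count all rows in the assumed 'syllables' table."""
    return conn.execute("SELECT COUNT(*) FROM syllables").fetchone()[0]


def sketch_syllable_exists(conn, syllable):
    """Return True if the syllable is present, using a bound parameter."""
    row = conn.execute(
        "SELECT 1 FROM syllables WHERE syllable = ? LIMIT 1", (syllable,)
    ).fetchone()
    return row is not None


# Tiny in-memory database with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE syllables (syllable TEXT PRIMARY KEY, frequency INTEGER)")
conn.executemany("INSERT INTO syllables VALUES (?, ?)", [("ab", 1), ("kra", 3)])

count = sketch_syllable_count(conn)
has_ab = sketch_syllable_exists(conn, "ab")
has_zz = sketch_syllable_exists(conn, "zz")
# The bound parameter is treated as data, so this string matches nothing.
has_injected = sketch_syllable_exists(conn, "ab' OR '1'='1")
conn.close()
```

Binding the syllable with `?` rather than formatting it into the SQL string keeps user input from the web interface from being interpreted as SQL.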
- build_tools.syllable_walk.db.get_syllable_data(db_path, syllable)
Get data for a specific syllable.
- Parameters:
db_path (pathlib.Path) – Path to corpus.db
syllable (str) – Syllable to look up
- Returns:
Dict with syllable data, or None if not found
- Return type:
dict | None
- build_tools.syllable_walk.db.get_random_syllable(db_path, seed=None)
Get a random syllable from the database.
Uses frequency-weighted random selection if seed is provided for reproducibility.
- Parameters:
db_path (pathlib.Path) – Path to corpus.db
seed (int | None) – Optional random seed for reproducibility
- Returns:
Random syllable string
- Return type:
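The description does not say how the selection is performed internally. As one possible reading, a seeded, frequency-weighted pick can be sketched with the standard library alone. The helper `sketch_weighted_pick` and the `(syllable, frequency)` row shape are assumptions for illustration.

```python
import random


def sketch_weighted_pick(rows, seed=None):
    """Pick one syllable, weighting each candidate by its frequency.

    rows: list of (syllable, frequency) pairs.
    seed: optional int; the same seed and rows yield the same pick.
    """
    rng = random.Random(seed)  # private generator, leaves global state untouched
    syllables = [syllable for syllable, _ in rows]
    weights = [frequency for _, frequency in rows]
    return rng.choices(syllables, weights=weights, k=1)[0]


rows = [("ab", 1), ("kra", 3), ("to", 6)]
first = sketch_weighted_pick(rows, seed=42)
second = sketch_weighted_pick(rows, seed=42)
```

Using a dedicated `random.Random(seed)` instance rather than `random.seed()` keeps reproducibility local to the call and avoids side effects on other code that uses the global generator.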
- build_tools.syllable_walk.db.load_syllables_from_json(json_path)
Load syllables from annotated JSON file (fallback when no DB).
- Parameters:
json_path (pathlib.Path) – Path to *_syllables_annotated.json
- Returns:
List of dicts with syllable data
- Return type:
- build_tools.syllable_walk.db.load_syllables(db_path=None, json_path=None)
Load syllables from database or JSON, with automatic fallback.
Prefers the SQLite database for performance and falls back to JSON if the database is not available.
- Parameters:
db_path (pathlib.Path | None) – Path to corpus.db (optional)
json_path (pathlib.Path | None) – Path to annotated JSON (optional)
- Returns:
Tuple of (syllables list, source description)
- Raises:
ValueError – If neither db_path nor json_path is valid
- Return type:
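The fallback order described above can be sketched as a small decision function. This is an illustration only: it assumes that a path counts as "valid" when it is not None and exists on disk, and `sketch_choose_source` is a hypothetical helper that returns just the source label rather than the loaded data.

```python
import tempfile
from pathlib import Path


def sketch_choose_source(db_path=None, json_path=None):
    """Prefer the database, fall back to JSON, otherwise raise ValueError."""
    if db_path is not None and Path(db_path).exists():
        return "sqlite"
    if json_path is not None and Path(json_path).exists():
        return "json"
    raise ValueError("Neither db_path nor json_path is valid")


with tempfile.TemporaryDirectory() as tmp:
    json_file = Path(tmp) / "demo_syllables_annotated.json"
    json_file.write_text("[]")
    missing_db = Path(tmp) / "corpus.db"  # deliberately never created

    # The database is missing, so the JSON file is chosen.
    chosen = sketch_choose_source(db_path=missing_db, json_path=json_file)
```

Calling the helper with no usable path raises `ValueError`, which mirrors the documented error condition.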