build_tools.corpus_sqlite_builder
Corpus SQLite Builder - JSON to SQLite Conversion Tool
Converts large annotated JSON files into optimized SQLite databases for efficient querying in interactive tools like the syllable_walk_tui.
This is a build-time tool only - not used during runtime name generation.
Features:
- Memory-efficient conversion of 100MB+ JSON files
- Batched transactions for performance
- Idempotent conversion (safe to re-run)
- Auto-discovery of annotated JSON files
- Batch conversion support
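The batched-transaction and idempotency features can be illustrated with a minimal sketch. It is not the module's actual implementation: the `build_db` and `batched` helpers, the `syllables` table, and the `syllable` record key are all hypothetical names chosen for illustration, and the records are assumed to arrive as an iterable of dicts.

```python
import json
import sqlite3
from pathlib import Path


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_db(records, db_path, batch_size=10_000, force=False):
    """Write records to SQLite, committing once per batch (hypothetical helper)."""
    db_path = Path(db_path)
    if db_path.exists():
        if not force:
            # Idempotent: an existing database is left untouched on re-run.
            return db_path
        db_path.unlink()

    conn = sqlite3.connect(db_path)
    try:
        # Assumed schema: one row per syllable, full record stored as JSON text.
        conn.execute(
            "CREATE TABLE syllables (syllable TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        for batch in batched(records, batch_size):
            # `with conn` wraps the batch in one transaction and commits on exit.
            with conn:
                conn.executemany(
                    "INSERT INTO syllables VALUES (?, ?)",
                    ((r["syllable"], json.dumps(r)) for r in batch),
                )
    finally:
        conn.close()
    return db_path
```

Committing once per batch rather than once per row avoids a disk sync for every insert, while keeping each transaction bounded in size.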
Usage:
>>> from build_tools.corpus_sqlite_builder import convert_json_to_sqlite
>>> from pathlib import Path
>>> corpus_dir = Path("_working/output/20260110_115453_pyphen")
>>> db_path = convert_json_to_sqlite(corpus_dir)
>>> print(f"Created: {db_path}")
Command-line usage:
# Convert single corpus
python -m build_tools.corpus_sqlite_builder _working/output/20260110_115453_pyphen/
# Force overwrite
python -m build_tools.corpus_sqlite_builder _working/output/20260110_115453_pyphen/ --force
# Batch convert all
python -m build_tools.corpus_sqlite_builder --batch _working/output/
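As a rough sketch of what auto-discovery for `--batch` might look like, the function below scans each run directory under an output root for an annotated JSON file in its `data/` subdirectory. The `discover_corpora` name and the `*_annotated.json` filename pattern are assumptions made for illustration; the tool's real discovery rules may differ.

```python
from pathlib import Path


def discover_corpora(root, pattern="*_annotated.json"):
    """Return (run_dir, json_path) pairs for runs that contain annotated JSON.

    Hypothetical helper: the filename pattern is an assumption.
    """
    found = []
    for run_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        data_dir = run_dir / "data"
        if not data_dir.is_dir():
            continue  # Skip runs without a data/ subdirectory.
        matches = sorted(data_dir.glob(pattern))
        if matches:
            found.append((run_dir, matches[0]))
    return found
```

Each discovered run directory could then be passed to `convert_json_to_sqlite` in turn.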
Design Philosophy:
- JSON is the canonical source of truth (human-readable, portable)
- SQLite is derived data (optimized for queries, regeneratable)
- Both formats coexist in the data/ subdirectory
- The TUI prefers SQLite and falls back to JSON
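The "prefer SQLite, fall back to JSON" rule could be sketched as follows. This is an illustration under stated assumptions, not the TUI's actual loader: `load_syllables`, the `corpus.db` and `syllables_annotated.json` filenames, the `syllables` table, and the `syllable` key are all hypothetical.

```python
import json
import sqlite3
from pathlib import Path


def load_syllables(corpus_dir, db_name="corpus.db",
                   json_name="syllables_annotated.json"):
    """Return (source, syllables), reading SQLite if present, else JSON.

    Hypothetical helper: file, table, and key names are assumptions.
    """
    data_dir = Path(corpus_dir) / "data"
    db_path = data_dir / db_name

    if db_path.exists():
        # Derived data is available: query it instead of parsing the JSON.
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute(
                "SELECT syllable FROM syllables ORDER BY syllable"
            ).fetchall()
        finally:
            conn.close()
        return "sqlite", [row[0] for row in rows]

    # Fallback: the canonical JSON is always expected to exist.
    with open(data_dir / json_name, encoding="utf-8") as f:
        records = json.load(f)
    return "json", sorted(r["syllable"] for r in records)
```

Because the database is derived data, deleting it is safe: the loader falls back to JSON until the database is regenerated.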