build_tools.corpus_sqlite_builder.schema

SQLite schema definitions for corpus databases.

This module defines the database schema for storing syllable corpus data, including syllables, features, and metadata.

Attributes

CORPUS_SCHEMA_VERSION

CREATE_METADATA_TABLE

CREATE_SYLLABLES_TABLE

CREATE_INDEXES

OPTIMIZATION_PRAGMAS

Functions

create_database(db_path)

Create a new corpus database with the standard schema.

insert_metadata(conn, metadata)

Insert metadata key-value pairs into the database.

get_metadata(conn)

Retrieve all metadata from the database.

verify_schema_version(conn)

Verify the database schema version matches the current version.

Module Contents

build_tools.corpus_sqlite_builder.schema.CORPUS_SCHEMA_VERSION = 1
build_tools.corpus_sqlite_builder.schema.CREATE_METADATA_TABLE = Multiline-String
"""
CREATE TABLE metadata (
    key TEXT PRIMARY KEY,
    value TEXT NOT NULL
);
"""
build_tools.corpus_sqlite_builder.schema.CREATE_SYLLABLES_TABLE = Multiline-String
"""
CREATE TABLE syllables (
    syllable TEXT PRIMARY KEY,
    frequency INTEGER NOT NULL,
    starts_with_vowel INTEGER NOT NULL,
    starts_with_cluster INTEGER NOT NULL,
    starts_with_heavy_cluster INTEGER NOT NULL,
    contains_plosive INTEGER NOT NULL,
    contains_fricative INTEGER NOT NULL,
    contains_liquid INTEGER NOT NULL,
    contains_nasal INTEGER NOT NULL,
    short_vowel INTEGER NOT NULL,
    long_vowel INTEGER NOT NULL,
    ends_with_vowel INTEGER NOT NULL,
    ends_with_nasal INTEGER NOT NULL,
    ends_with_stop INTEGER NOT NULL
);
"""
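The syllables table stores one row per syllable, with its frequency and a fixed set of feature columns. The sketch below creates the table from the DDL shown above in an in-memory database, inserts two rows, and filters on a feature. It assumes the INTEGER feature columns hold 0/1 boolean flags, since SQLite has no native boolean type. The helper `make_row` and the sample syllables and frequencies are hypothetical and exist only for illustration.

```python
import sqlite3

# DDL copied verbatim from CREATE_SYLLABLES_TABLE above.
SYLLABLES_DDL = """
CREATE TABLE syllables (
    syllable TEXT PRIMARY KEY,
    frequency INTEGER NOT NULL,
    starts_with_vowel INTEGER NOT NULL,
    starts_with_cluster INTEGER NOT NULL,
    starts_with_heavy_cluster INTEGER NOT NULL,
    contains_plosive INTEGER NOT NULL,
    contains_fricative INTEGER NOT NULL,
    contains_liquid INTEGER NOT NULL,
    contains_nasal INTEGER NOT NULL,
    short_vowel INTEGER NOT NULL,
    long_vowel INTEGER NOT NULL,
    ends_with_vowel INTEGER NOT NULL,
    ends_with_nasal INTEGER NOT NULL,
    ends_with_stop INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SYLLABLES_DDL)

# Read the column order back from SQLite instead of hard-coding it.
columns = [info[1] for info in conn.execute("PRAGMA table_info(syllables)")]


def make_row(syllable, frequency, **flags):
    """Hypothetical helper: any feature not passed in defaults to 0 (false)."""
    values = {"syllable": syllable, "frequency": frequency}
    # Assumption: feature columns are 0/1 flags.
    values.update({col: int(flags.get(col, False)) for col in columns[2:]})
    return [values[col] for col in columns]


placeholders = ", ".join("?" for _ in columns)
conn.executemany(
    f"INSERT INTO syllables VALUES ({placeholders})",
    [
        make_row("ka", 42, contains_plosive=True, short_vowel=True, ends_with_vowel=True),
        make_row("an", 17, starts_with_vowel=True, contains_nasal=True, ends_with_nasal=True),
    ],
)

# With integer flags, filtering on a feature is a plain equality test.
vowel_final = conn.execute(
    "SELECT syllable, frequency FROM syllables WHERE ends_with_vowel = 1"
).fetchall()
print(vowel_final)  # [('ka', 42)]
```

Because `syllable` is the primary key, each syllable can appear only once. Under this reading, `frequency` carries the count instead of duplicated rows.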
build_tools.corpus_sqlite_builder.schema.CREATE_INDEXES = ['CREATE INDEX idx_starts_with_vowel ON syllables(starts_with_vowel);', 'CREATE INDEX...
build_tools.corpus_sqlite_builder.schema.OPTIMIZATION_PRAGMAS = ['PRAGMA journal_mode=WAL;', 'PRAGMA synchronous=NORMAL;', 'PRAGMA cache_size=-64000;', 'PRAGMA...
build_tools.corpus_sqlite_builder.schema.create_database(db_path)

Create a new corpus database with the standard schema.

Parameters:

db_path (pathlib.Path) – Path where the database will be created

Returns:

SQLite connection to the newly created database

Raises:

sqlite3.Error – If database creation fails

Return type:

sqlite3.Connection
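The following is a minimal sketch of what `create_database` plausibly does: open a connection, apply the optimization pragmas, create both tables and the indexes, and return the connection. It is not the module's implementation. `CREATE_INDEXES` and `OPTIMIZATION_PRAGMAS` are truncated above, so the sketch uses only their visible entries. The name `create_database_sketch` is hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

# Only the entries visible in the truncated attribute values above are used;
# the real OPTIMIZATION_PRAGMAS and CREATE_INDEXES lists contain more items.
PRAGMAS = [
    "PRAGMA journal_mode=WAL;",
    "PRAGMA synchronous=NORMAL;",
    "PRAGMA cache_size=-64000;",
]
INDEXES = [
    "CREATE INDEX idx_starts_with_vowel ON syllables(starts_with_vowel);",
]

FEATURES = [
    "starts_with_vowel", "starts_with_cluster", "starts_with_heavy_cluster",
    "contains_plosive", "contains_fricative", "contains_liquid", "contains_nasal",
    "short_vowel", "long_vowel", "ends_with_vowel", "ends_with_nasal", "ends_with_stop",
]
METADATA_DDL = "CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT NOT NULL);"
# Same columns as CREATE_SYLLABLES_TABLE, generated from the feature list.
SYLLABLES_DDL = (
    "CREATE TABLE syllables (syllable TEXT PRIMARY KEY, frequency INTEGER NOT NULL, "
    + ", ".join(f"{name} INTEGER NOT NULL" for name in FEATURES)
    + ");"
)


def create_database_sketch(db_path: Path) -> sqlite3.Connection:
    """Hypothetical stand-in for create_database(); not the module's code."""
    conn = sqlite3.connect(db_path)
    try:
        # Pragmas run first, outside any transaction, because SQLite will not
        # switch into WAL mode while a transaction is open.
        for pragma in PRAGMAS:
            conn.execute(pragma)
        conn.execute(METADATA_DDL)
        conn.execute(SYLLABLES_DDL)
        for statement in INDEXES:
            conn.execute(statement)
        conn.commit()
    except sqlite3.Error:
        # Do not leak the connection; re-raise as the docs describe.
        conn.close()
        raise
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = create_database_sketch(Path(tmp) / "corpus.db")
    names = {row[0] for row in conn.execute("SELECT name FROM sqlite_master")}
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()

print(mode)  # wal
```

WAL mode applies only to file-backed databases, which is why the usage example writes to a temporary file instead of `:memory:`.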

build_tools.corpus_sqlite_builder.schema.insert_metadata(conn, metadata)

Insert metadata key-value pairs into the database.

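Parameters:

conn (sqlite3.Connection) – SQLite database connection

metadata – Metadata key-value pairs to insert

Common metadata keys: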
  • schema_version: Database schema version (int as string)

  • source_tool: Name of the tool that created this database

  • source_version: Version of the source tool

  • generated_at: ISO 8601 timestamp of creation

  • total_syllables: Number of syllables in the database (int as string)

  • source_json_path: Path to the source JSON file
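The sketch below writes the common metadata keys listed above into the `metadata` table. It assumes existing keys are overwritten (`INSERT OR REPLACE`) and that values are coerced to text to match the `TEXT NOT NULL` column. Neither behavior is confirmed by this page. The function name `insert_metadata_sketch` and the tool name, version, and path values are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def insert_metadata_sketch(conn: sqlite3.Connection, metadata: dict) -> None:
    """Hypothetical stand-in for insert_metadata(); not the module's code."""
    # Assumption: re-inserting a key replaces the old value (key is the PK).
    conn.executemany(
        "INSERT OR REPLACE INTO metadata (key, value) VALUES (?, ?)",
        # Coerce every value to text, matching "int as string" in the docs.
        [(key, str(value)) for key, value in metadata.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT NOT NULL);")
insert_metadata_sketch(conn, {
    "schema_version": "1",
    "source_tool": "example_tool",      # hypothetical tool name
    "source_version": "0.1.0",          # hypothetical version
    "generated_at": datetime.now(timezone.utc).isoformat(),  # ISO 8601
    "total_syllables": "2",
    "source_json_path": "corpus.json",  # hypothetical path
})
```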

build_tools.corpus_sqlite_builder.schema.get_metadata(conn)

Retrieve all metadata from the database.

Parameters:

conn (sqlite3.Connection) – SQLite database connection

Returns:

Dictionary of metadata key-value pairs

Return type:

dict[str, str]
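Because the `metadata` table has exactly two columns, each result row is a key-value pair. The whole table can therefore be read straight into a `dict[str, str]`. The sketch below shows this. `get_metadata_sketch` is a hypothetical name and not the module's implementation. Values come back as strings, so numeric entries such as `total_syllables` still need an explicit `int()` conversion.

```python
import sqlite3


def get_metadata_sketch(conn: sqlite3.Connection) -> dict[str, str]:
    """Hypothetical stand-in for get_metadata(): read the whole table."""
    # A cursor yields (key, value) tuples, which dict() accepts directly.
    return dict(conn.execute("SELECT key, value FROM metadata"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT NOT NULL);")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [("schema_version", "1"), ("total_syllables", "2")],
)
meta = get_metadata_sketch(conn)
print(int(meta["total_syllables"]))  # 2
```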

build_tools.corpus_sqlite_builder.schema.verify_schema_version(conn)

Verify the database schema version matches the current version.

Parameters:

conn (sqlite3.Connection) – SQLite database connection

Returns:

Schema version number from the database

Raises:

ValueError – If schema version is missing or incompatible

Return type:

int
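A minimal sketch of the check described above: read `schema_version` from the metadata table, convert it to an integer, and compare it against `CORPUS_SCHEMA_VERSION`. A `ValueError` is raised when the key is missing or the version differs. Two points are assumptions, not documented behavior: that only an exact match counts as compatible, and that a non-numeric value is also reported as `ValueError`. The function name `verify_schema_version_sketch` is hypothetical.

```python
import sqlite3

CORPUS_SCHEMA_VERSION = 1  # value documented above


def verify_schema_version_sketch(conn: sqlite3.Connection) -> int:
    """Hypothetical stand-in for verify_schema_version(); not the module's code."""
    row = conn.execute(
        "SELECT value FROM metadata WHERE key = 'schema_version'"
    ).fetchone()
    if row is None:
        raise ValueError("schema_version is missing from the metadata table")
    try:
        # Stored as text ("int as string"), so convert before comparing.
        version = int(row[0])
    except ValueError as exc:
        raise ValueError(f"schema_version is not an integer: {row[0]!r}") from exc
    # Assumption: only an exact match is treated as compatible.
    if version != CORPUS_SCHEMA_VERSION:
        raise ValueError(
            f"schema version {version} is incompatible; expected {CORPUS_SCHEMA_VERSION}"
        )
    return version


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (key TEXT PRIMARY KEY, value TEXT NOT NULL);")
conn.execute("INSERT INTO metadata VALUES ('schema_version', '1')")
print(verify_schema_version_sketch(conn))  # 1
```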