build_tools.tui_common.ledger

Shared corpus database ledger helpers for extraction tools.

This module provides a context manager and helper functions for integrating with the corpus database ledger. The ledger is observational only: it records what happened but does not influence extraction behavior.

These utilities eliminate duplicated corpus DB integration patterns across the pyphen and NLTK syllable extractors.

Usage:

from build_tools.tui_common.ledger import ExtractionLedgerContext

with ExtractionLedgerContext(
    extractor_tool="pyphen_syllable_extractor",
    extractor_version="0.5.0",
    min_len=2,
    max_len=8,
    quiet=False,
) as ctx:
    # Record inputs
    ctx.record_input(input_path)

    # ... do extraction ...

    # Record outputs
    ctx.record_output(
        output_path=syllables_path,
        unique_syllable_count=len(syllables),
        meta_path=metadata_path,
    )

    # Mark success or failure
    ctx.set_result(success=True)

Classes

ExtractionLedgerContext

Context manager for corpus database ledger integration.

Module Contents

class build_tools.tui_common.ledger.ExtractionLedgerContext(extractor_tool, extractor_version='unknown', pyphen_lang=None, min_len=None, max_len=None, recursive=False, pattern=None, command_line=None, quiet=False)[source]

Context manager for corpus database ledger integration.

Handles the full lifecycle of ledger operations:
  • Initialize ledger on entry

  • Start run with extraction parameters

  • Record inputs and outputs during extraction

  • Complete run with success/failure status on exit

  • Close ledger connection

All operations are safe: failures are logged but do not block extraction.
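The "log but never block" behavior described above can be illustrated with a minimal sketch. This is not the module's actual implementation: SafeLedgerSketch, its _safe helper, the events list, and broken_factory are all hypothetical names introduced only for illustration, and the real class may structure its error handling differently.

```python
import logging

logger = logging.getLogger("ledger_sketch")


class SafeLedgerSketch:
    """Hypothetical stand-in, NOT the real ExtractionLedgerContext."""

    def __init__(self, backend_factory, quiet=False):
        # Assumption: the ledger backend is created by a callable.
        self._backend_factory = backend_factory
        self._backend = None
        self.quiet = quiet
        self.events = []  # illustrative record of failed operations

    def _safe(self, label, func, *args):
        """Run one ledger operation; log and swallow any failure."""
        try:
            return func(*args)
        except Exception as exc:
            if not self.quiet:
                logger.warning("ledger %s failed: %s", label, exc)
            self.events.append(f"{label}:failed")
            return None

    def __enter__(self):
        # Initialization failure leaves the backend unset instead of raising.
        self._backend = self._safe("init", self._backend_factory)
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._backend is not None:
            self._safe("close", self._backend.close)
        return False  # exceptions raised by the extraction itself still propagate


def broken_factory():
    """Simulates an unavailable corpus database."""
    raise RuntimeError("corpus DB unavailable")


extraction_ran = False
with SafeLedgerSketch(broken_factory, quiet=True) as ctx:
    extraction_ran = True  # the body runs even though the ledger failed to initialize
```

In this sketch the ledger failure is captured in ctx.events while the body of the with block still executes, which is the property the description above attributes to the real context manager.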

extractor_tool

Name of the extraction tool

extractor_version

Version string of the tool

pyphen_lang

Language code for pyphen (None for NLTK or auto-detect)

min_len

Minimum syllable length constraint

max_len

Maximum syllable length constraint

recursive

Whether directory scanning was recursive

pattern

File pattern for directory scanning

command_line

Full command-line invocation

quiet

Suppress warning messages

Example

>>> from pathlib import Path
>>> from build_tools.tui_common.ledger import ExtractionLedgerContext
>>> with ExtractionLedgerContext(
...     extractor_tool="pyphen_syllable_extractor",
...     extractor_version="0.5.0",
...     pyphen_lang="en_US",
...     min_len=2,
...     max_len=8,
... ) as ctx:
...     ctx.record_input(Path("input.txt"))
...     # ... extraction ...
...     ctx.record_output(syllables_path, len(syllables), metadata_path)
...     ctx.set_result(success=True)

Initialize the ledger context.

Parameters:
  • extractor_tool (str) – Name of the extraction tool

  • extractor_version (str) – Version string of the tool

  • pyphen_lang (str | None) – Language code for pyphen (None for NLTK or auto-detect)

  • min_len (int | None) – Minimum syllable length constraint

  • max_len (int | None) – Maximum syllable length constraint

  • recursive (bool) – Whether directory scanning was recursive

  • pattern (str | None) – File pattern for directory scanning

  • command_line (str | None) – Full command-line invocation (defaults to sys.argv)

  • quiet (bool) – Suppress warning messages

extractor_tool
extractor_version = 'unknown'
pyphen_lang = None
min_len = None
max_len = None
recursive = False
pattern = None
command_line = ''
quiet = False
property is_available: bool

Check if corpus DB integration is available and initialized.

property run_id: int | None

Get the current run ID, or None if not initialized.
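As a rough illustration of how a caller might use these two properties together, the sketch below branches on is_available and run_id. The describe_ledger helper is hypothetical, and SimpleNamespace objects stand in for a real context so the snippet runs without the corpus database; only the two property names come from this page.

```python
from types import SimpleNamespace


def describe_ledger(ctx):
    """Hypothetical helper: summarize ledger state for a status line."""
    if ctx.is_available and ctx.run_id is not None:
        return f"ledger run {ctx.run_id}"
    return "ledger unavailable; extraction continues without it"


# Stand-ins exposing the same two properties as the documented class.
active = SimpleNamespace(is_available=True, run_id=7)
inactive = SimpleNamespace(is_available=False, run_id=None)

print(describe_ledger(active))
print(describe_ledger(inactive))
```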

set_result(success)[source]

Explicitly set the extraction result.

Call this before exiting the context to indicate success or failure. If not called, success is assumed unless an exception occurs.

Parameters:

success (bool) – True if extraction succeeded, False if failed
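The default described above (success assumed unless an exception occurs) can be sketched with a small stand-in context manager. ResultTrackerSketch and its final_success attribute are hypothetical. The sketch also assumes that an exception overrides an explicit set_result call, which this page does not state.

```python
class ResultTrackerSketch:
    """Hypothetical stand-in showing how a final result might be resolved."""

    def __init__(self):
        self._explicit = None      # value passed to set_result, if any
        self.final_success = None  # resolved when the context exits

    def set_result(self, success):
        self._explicit = success

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Assumption: an exception always marks the run as failed.
            self.final_success = False
        elif self._explicit is not None:
            self.final_success = self._explicit
        else:
            # set_result was never called and nothing was raised.
            self.final_success = True
        return False  # do not suppress the exception


with ResultTrackerSketch() as implicit:
    pass  # no set_result call, no exception

with ResultTrackerSketch() as explicit:
    explicit.set_result(success=False)

try:
    with ResultTrackerSketch() as crashed:
        raise ValueError("extraction failed")
except ValueError:
    pass
```

Under these assumptions, implicit resolves to success, while explicit and crashed both resolve to failure.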

record_input(source_path, file_count=None)[source]

Record an input source for this run.

Parameters:
  • source_path (pathlib.Path) – Path to input file or directory

  • file_count (int | None) – Number of files if source_path is a directory

record_inputs(files, source_dir=None)[source]

Record multiple input files for this run.

If source_dir is provided, records the directory with file count. Otherwise, records each file individually.

Parameters:
  • files (list[pathlib.Path]) – List of input file paths

  • source_dir (pathlib.Path | None) – Source directory (if files were discovered from a directory)
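The directory-versus-individual-file behavior can be sketched as a small planning function. plan_input_records is a hypothetical helper, not part of the module. It only models which (path, file_count) pairs would be passed on to record_input, assuming the file count is simply the length of the files list.

```python
from pathlib import Path


def plan_input_records(files, source_dir=None):
    """Hypothetical model of the dispatch: return (path, file_count) pairs."""
    if source_dir is not None:
        # One record for the directory, carrying the number of files found.
        return [(source_dir, len(files))]
    # Otherwise one record per file, with no file count.
    return [(path, None) for path in files]


files = [Path("corpus/a.txt"), Path("corpus/b.txt")]
by_directory = plan_input_records(files, source_dir=Path("corpus"))
by_file = plan_input_records(files)
```

Here by_directory holds a single entry for the directory with a count of 2, and by_file holds one entry per path.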

record_output(output_path, unique_syllable_count=None, meta_path=None)[source]

Record an output file for this run.

Parameters:
  • output_path (pathlib.Path) – Path to generated syllables file

  • unique_syllable_count (int | None) – Number of unique syllables extracted

  • meta_path (pathlib.Path | None) – Path to corresponding metadata file