build_tools.tui_common.ledger
Shared corpus database ledger helpers for extraction tools.
This module provides a context manager and helper functions for integrating with the corpus database ledger. The ledger is purely observational: it records what happened but does not influence extraction behavior.
These utilities eliminate duplicated corpus DB integration patterns across the pyphen and NLTK syllable extractors.
Usage:
    from build_tools.tui_common.ledger import ExtractionLedgerContext

    with ExtractionLedgerContext(
        extractor_tool="pyphen_syllable_extractor",
        extractor_version="0.5.0",
        min_len=2,
        max_len=8,
        quiet=False,
    ) as ctx:
        # Record inputs
        ctx.record_input(input_path)

        # ... do extraction ...

        # Record outputs
        ctx.record_output(
            output_path=syllables_path,
            unique_syllable_count=len(syllables),
            meta_path=metadata_path,
        )

        # Mark success or failure
        ctx.set_result(success=True)
Classes
ExtractionLedgerContext: Context manager for corpus database ledger integration.
Module Contents
- class build_tools.tui_common.ledger.ExtractionLedgerContext(extractor_tool, extractor_version='unknown', pyphen_lang=None, min_len=None, max_len=None, recursive=False, pattern=None, command_line=None, quiet=False)[source]
Context manager for corpus database ledger integration.
Handles the full lifecycle of ledger operations:
- Initialize ledger on entry
- Start run with extraction parameters
- Record inputs and outputs during extraction
- Complete run with success/failure status on exit
- Close ledger connection
All ledger operations are fail-safe: failures are logged but do not block extraction.
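The "logged but not blocking" behaviour described above can be illustrated with a minimal sketch. This is not the module's actual implementation: `SafeLedgerSketch`, its `open_ledger` argument, and the `events` list are hypothetical names used only to show the pattern of wrapping every ledger call so that a failure produces a warning instead of an exception.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("ledger_sketch")


class SafeLedgerSketch:
    """Hypothetical stand-in; NOT the real ExtractionLedgerContext."""

    def __init__(self, open_ledger: Callable[[], object], quiet: bool = False):
        self._open_ledger = open_ledger  # assumed factory for a ledger handle
        self._quiet = quiet
        self.ledger: Optional[object] = None
        self.events = []

    def _safe(self, label: str, action: Callable[[], object]):
        """Run a ledger action; log and swallow any failure."""
        try:
            return action()
        except Exception as exc:
            if not self._quiet:
                logger.warning("ledger %s failed: %s", label, exc)
            return None

    def __enter__(self):
        # A failed initialisation leaves self.ledger as None but still
        # lets the extraction body run.
        self.ledger = self._safe("init", self._open_ledger)
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        # Returning False means exceptions raised by the extraction
        # itself are never suppressed by the ledger wrapper.
        return False
```

With a factory that raises, the `with` body still executes and only the ledger handle is missing, which is the property the documentation describes.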
- extractor_tool
Name of the extraction tool
- extractor_version
Version string of the tool
- pyphen_lang
Language code for pyphen (None for NLTK)
- min_len
Minimum syllable length constraint
- max_len
Maximum syllable length constraint
- recursive
Whether directory scanning was recursive
- pattern
File pattern for directory scanning
- command_line
Full command-line invocation
- quiet
Suppress warning messages
Example
>>> with ExtractionLedgerContext(
...     extractor_tool="pyphen_syllable_extractor",
...     extractor_version="0.5.0",
...     pyphen_lang="en_US",
...     min_len=2,
...     max_len=8,
... ) as ctx:
...     ctx.record_input(Path("input.txt"))
...     # ... extraction ...
...     ctx.record_output(syllables_path, len(syllables), metadata_path)
...     ctx.set_result(success=True)
Initialize the ledger context.
- Parameters:
extractor_tool (str) – Name of the extraction tool
extractor_version (str) – Version string of the tool
pyphen_lang (str | None) – Language code for pyphen (None for NLTK or auto-detect)
min_len (int | None) – Minimum syllable length constraint
max_len (int | None) – Maximum syllable length constraint
recursive (bool) – Whether directory scanning was recursive
pattern (str | None) – File pattern for directory scanning
command_line (str | None) – Full command-line invocation (defaults to sys.argv)
quiet (bool) – Suppress warning messages
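The `command_line` parameter is documented as defaulting to `sys.argv`. How the argument list is turned into a single string is not stated here, so the following is only a sketch of one plausible approach: `resolve_command_line` is a hypothetical helper, and the use of `shlex.join` for quoting is an assumption rather than the module's confirmed behaviour.

```python
import shlex
import sys
from typing import Optional, Sequence


def resolve_command_line(
    command_line: Optional[str] = None,
    argv: Optional[Sequence[str]] = None,
) -> str:
    """Hypothetical helper: return an explicit command line, else build one."""
    if command_line is not None:
        return command_line
    # Assumption: fall back to the process arguments, shell-quoted so that
    # arguments containing spaces stay unambiguous in the recorded string.
    return shlex.join(sys.argv if argv is None else argv)
```

Passing `argv` explicitly is only there to make the helper easy to exercise without touching the real process arguments.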
- extractor_tool
- extractor_version = 'unknown'
- pyphen_lang = None
- min_len = None
- max_len = None
- recursive = False
- pattern = None
- command_line = ''
- quiet = False
- set_result(success)[source]
Explicitly set the extraction result.
Call this before exiting the context to indicate success or failure. If not called, success is assumed unless an exception occurs.
- Parameters:
success (bool) – True if extraction succeeded, False if failed
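The rule for deriving the final status can be written as a small decision function. `resolve_success` is a hypothetical name used for illustration. The documented cases are: no explicit call and no exception means success, and no explicit call with an exception means failure. How an explicit `set_result(success=True)` interacts with a later exception is not specified above, so the sketch assumes the exception wins.

```python
from typing import Optional


def resolve_success(explicit: Optional[bool], exception_raised: bool) -> bool:
    """Hypothetical sketch of how the final run status could be derived.

    explicit is the value passed to set_result(), or None if never called.
    """
    if exception_raised:
        # Assumption: an exception marks the run as failed even if
        # set_result(success=True) was called earlier.
        return False
    if explicit is None:
        # Documented default: success is assumed when nothing was set.
        return True
    return explicit
```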
- record_input(source_path, file_count=None)[source]
Record an input source for this run.
- Parameters:
source_path (pathlib.Path) – Path to input file or directory
file_count (int | None) – Number of files if source_path is a directory
- record_inputs(files, source_dir=None)[source]
Record multiple input files for this run.
If source_dir is provided, records the directory with file count. Otherwise, records each file individually.
- Parameters:
files (list[pathlib.Path]) – List of input file paths
source_dir (pathlib.Path | None) – Source directory (if files were discovered from a directory)
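The branching between "one directory record with a count" and "one record per file" can be sketched as a pure function. `plan_input_records` is a hypothetical helper, not part of the module; each returned tuple corresponds to the `(source_path, file_count)` arguments that `record_input` accepts.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def plan_input_records(
    files: List[Path],
    source_dir: Optional[Path] = None,
) -> List[Tuple[Path, Optional[int]]]:
    """Hypothetical sketch: decide which (source_path, file_count) pairs to record."""
    if source_dir is not None:
        # Directory case: a single record carrying the number of files.
        return [(source_dir, len(files))]
    # File case: one record per file, with no file count.
    return [(path, None) for path in files]
```

For example, two files discovered under `corpus/` would produce the single record `(Path("corpus"), 2)`, whereas the same two files passed without `source_dir` would produce two records.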
- record_output(output_path, unique_syllable_count=None, meta_path=None)[source]
Record an output file for this run.
- Parameters:
output_path (pathlib.Path) – Path to generated syllables file
unique_syllable_count (int | None) – Number of unique syllables extracted
meta_path (pathlib.Path | None) – Path to corresponding metadata file