Corpus Database Viewer

Overview

Corpus Database Viewer - Interactive TUI for Build Provenance

Interactive terminal user interface for viewing corpus database provenance records. This is a build-time tool only - for inspecting extraction run history and outputs.

This tool provides:

  • Interactive table browsing with pagination

  • Schema inspection (columns, types, indexes)

  • Data export to CSV and JSON formats

  • Keyboard-driven navigation

Replaces: Flask-based web viewer (archived in _working/)

Features:

  • Browse all database tables interactively

  • Paginated data display (50 rows per page)

  • View table schemas and CREATE TABLE statements

  • Export data to CSV or JSON

  • Read-only database access (safe inspection)

  • Keyboard shortcuts for efficient navigation

Main Components:

  • CorpusDBViewerApp: Main Textual TUI application class

  • queries: Database query functions (table lists, schema, data)

  • formatters: Export functions for CSV and JSON

  • main: CLI entry point with argument parsing

CLI Usage:

# Launch interactive TUI with default database
python -m build_tools.corpus_db_viewer

# Specify custom database path
python -m build_tools.corpus_db_viewer --db /path/to/database.db

# Set custom export directory
python -m build_tools.corpus_db_viewer --export-dir _working/my_exports/

Keyboard Shortcuts (in TUI):

  • ↑/↓: Navigate rows

  • ←/→: Previous/Next page

  • t: Switch table

  • i: Show schema info

  • e: Export data

  • q: Quit

  • ?: Show help

Python API Usage:
>>> from build_tools.corpus_db_viewer import queries
>>> from pathlib import Path
>>>
>>> # Get list of tables
>>> db_path = Path("data/raw/syllable_extractor.db")
>>> tables = queries.get_tables_list(db_path)
>>>
>>> # Get schema for a table
>>> schema = queries.get_table_schema(db_path, "runs")
>>> print(schema['columns'])
>>>
>>> # Get paginated data
>>> data = queries.get_table_data(db_path, "runs", page=1, limit=50)
>>> print(f"Total rows: {data['total']}")

The Corpus Database Viewer is an interactive terminal user interface (TUI) for inspecting corpus database provenance records. Built with Textual, it provides a keyboard-driven interface for browsing extraction run history.

Replaces: Flask-based web viewer (archived in _working/_archived/pipeworks_db_viewer_flask/)

Key Features:

  • Browse all database tables interactively

  • Paginated data display (50 rows per page)

  • View table schemas and CREATE TABLE statements

  • Export data to CSV or JSON

  • Read-only database access (safe inspection)

  • Keyboard shortcuts for efficient navigation

Command-Line Interface

Interactive TUI for viewing corpus database provenance records

usage: corpus_db_viewer [-h] [--db DB_PATH] [--export-dir EXPORT_DIR]
                        [--page-size PAGE_SIZE]

Named Arguments

--db

Path to corpus database. Default: data/raw/syllable_extractor.db


--export-dir

Directory for exported data. Default: _working/exports/


--page-size

Number of rows per page. Default: 50

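The three options above map naturally onto an argparse parser. The following is a minimal sketch that reproduces the documented usage line and defaults; the function name build_parser and the choice of pathlib.Path as the argument type are assumptions for illustration, not the tool's actual implementation.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="corpus_db_viewer",
        description="Interactive TUI for viewing corpus database provenance records",
    )
    # dest="db_path" yields the DB_PATH metavar shown in the usage line.
    parser.add_argument(
        "--db",
        dest="db_path",
        type=Path,
        default=Path("data/raw/syllable_extractor.db"),
        help="Path to corpus database",
    )
    parser.add_argument(
        "--export-dir",
        type=Path,
        default=Path("_working/exports"),
        help="Directory for exported data",
    )
    parser.add_argument(
        "--page-size",
        type=int,
        default=50,
        help="Number of rows per page",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(["--db", "/path/to/database.db"])
print(args.db_path, args.export_dir, args.page_size)
```

Options that are not given on the command line fall back to the defaults listed above.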

# Launch viewer with default database
python -m build_tools.corpus_db_viewer

# Specify custom database path
python -m build_tools.corpus_db_viewer --db /path/to/database.db

# Set custom export directory
python -m build_tools.corpus_db_viewer --export-dir _working/my_exports/

Keyboard shortcuts:

  • ↑/↓: Navigate rows

  • ←/→: Previous/Next page

  • PageUp/Dn: Jump pages

  • Home/End: First/Last page

  • t: Switch table (table selector)

  • i: Show schema info

  • e: Export current view

  • r: Refresh data

  • q: Quit application

  • ?: Show help screen

  • Use arrow keys to navigate through table data

  • Press ‘t’ to open table selector and choose a different table

  • Press ‘i’ to view detailed schema information

  • Press ‘e’ to export the current table or view to CSV/JSON

  • Exports are saved to the export directory (default: _working/exports/)

  • Files are named: <table_name>_<timestamp>.<format>

  • Both CSV and JSON formats are supported

Output Format

Export Files

The viewer can export table data to two formats:

CSV Format:

Comma-separated values with header row:

id,run_timestamp,extractor_tool,status
1,2026-01-09T14:30:22,syllable_extractor,completed
2,2026-01-09T15:12:45,syllable_extractor,completed

JSON Format:

Array of objects, one per row. JSON keeps numbers and strings distinct, whereas every CSV value is read back as text:

[
  {
    "id": 1,
    "run_timestamp": "2026-01-09T14:30:22",
    "extractor_tool": "syllable_extractor",
    "status": "completed"
  }
]
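To make the difference between the two formats concrete, here is a minimal sketch using only the standard csv and json modules. The helper names rows_to_csv and rows_to_json are hypothetical and are not the functions in the tool's formatters module; the sample row is the one shown above.

```python
import csv
import io
import json

rows = [
    {
        "id": 1,
        "run_timestamp": "2026-01-09T14:30:22",
        "extractor_tool": "syllable_extractor",
        "status": "completed",
    },
]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV text with a header row (hypothetical helper)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_json(rows: list[dict]) -> str:
    """Serialize rows as a JSON array of objects (hypothetical helper)."""
    return json.dumps(rows, indent=2)


csv_text = rows_to_csv(rows)
json_text = rows_to_json(rows)

# Read both exports back: CSV yields strings only, JSON restores the integer id.
csv_back = list(csv.DictReader(io.StringIO(csv_text)))
json_back = json.loads(json_text)
print(type(csv_back[0]["id"]).__name__)   # str
print(type(json_back[0]["id"]).__name__)  # int
```

If downstream analysis depends on numeric columns, the JSON export avoids a type-conversion step.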

Export file naming:

Files are timestamped and named by table:

_working/exports/
├── runs_20260109_143022.csv
├── runs_20260109_143022.json
└── outputs_20260109_143145.csv

Important: Exports include ALL rows, not just the current page.

Database Structure

The corpus database tracks syllable extraction runs:

runs - Extraction run metadata

Tool name, version, status, timestamps, command-line arguments

inputs - Source files processed

Input files or directories used for each run

outputs - Generated files

Output .syllables and .meta files with syllable counts

Integration Guide

Use the viewer to inspect corpus database provenance after extraction runs:

# Step 1: Extract syllables (populates database)
python -m build_tools.pyphen_syllable_extractor \
  --source data/corpus/ \
  --lang en_US

# Step 2: Inspect extraction history with TUI viewer
python -m build_tools.corpus_db_viewer

When to use this tool:

  • To verify extraction runs completed successfully

  • To inspect which corpus files were processed

  • To track provenance of generated syllable files

  • To export extraction history for reporting or analysis

  • To debug failed extraction runs by examining status and error messages

Common workflows:

  1. Browse recent runs: Launch viewer → select “runs” table → sort by timestamp

  2. Find run details: Press i to view schema → browse rows for run metadata

  3. Export history: Press e → select format (CSV/JSON) → save to export directory

  4. Track file provenance: Switch to “outputs” table → identify which run created specific files

Advanced Topics

Keyboard Shortcuts

Navigation:

Key(s)             Action
↑ / ↓              Navigate rows
← / →              Previous/Next page
PageUp / PageDn    Jump 10 pages
Home / End         First/Last page

Actions:

Key    Action
t      Switch table
i      Show schema info
e      Export data
r      Refresh
?      Show help
q      Quit
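The paging keys listed above reduce to a little arithmetic on the current page number. The sketch below assumes 1-based page numbers (as suggested by page=1 in the Python API example) and that jumps are clamped to the valid range; the function names are hypothetical and the viewer's real behaviour at the boundaries may differ.

```python
import math


def total_pages(total_rows: int, page_size: int = 50) -> int:
    """Number of pages needed for total_rows; an empty table still has one page."""
    return max(1, math.ceil(total_rows / page_size))


def clamp_page(page: int, total_rows: int, page_size: int = 50) -> int:
    """Keep a requested page inside the range 1..total_pages."""
    return min(max(1, page), total_pages(total_rows, page_size))


def page_offset(page: int, page_size: int = 50) -> int:
    """Row offset of the first row on a 1-based page (usable as a SQL OFFSET)."""
    return (page - 1) * page_size


total = 1234                           # rows in the table, so 25 pages of 50
page = 3
page = clamp_page(page + 10, total)    # PageDn: jump forward 10 pages
page = clamp_page(page + 10, total)    # PageDn again
page = clamp_page(page + 10, total)    # would overshoot, so clamp to the last page
print(page, page_offset(page))

page = clamp_page(1, total)            # Home: first page
page_end = total_pages(total)          # End: last page
```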

Usage Examples

Browsing Tables:

Launch the viewer; it automatically loads the first table. To switch to another table:

  1. Press t to focus the table list

  2. Use ↑/↓ to navigate tables

  3. Press Enter to select

Or click directly on table names in the sidebar.

Viewing Schema:

Press i to view detailed schema information:

  • Column definitions (name, type, constraints)

  • Indexes (name, columns, UNIQUE flags)

  • CREATE TABLE statement (original SQL)

Example output:

Schema: runs

Columns:
  • id: INTEGER [PRIMARY KEY]
  • run_timestamp: TEXT NOT NULL
  • extractor_tool: TEXT NOT NULL
  • status: TEXT

Indexes:
  • idx_status (status)
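The same information is available from SQLite itself through PRAGMA statements and the sqlite_master table. The sketch below rebuilds the example runs table in an in-memory database and reads its columns, indexes, and CREATE TABLE statement; it illustrates the underlying SQLite queries and is not necessarily how queries.get_table_schema is implemented.

```python
import sqlite3

# In-memory stand-in for the example schema shown above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs ("
    "id INTEGER PRIMARY KEY, "
    "run_timestamp TEXT NOT NULL, "
    "extractor_tool TEXT NOT NULL, "
    "status TEXT)"
)
conn.execute("CREATE INDEX idx_status ON runs (status)")

# PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
columns = [
    {"name": name, "type": col_type, "not_null": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk in conn.execute("PRAGMA table_info(runs)")
]

# PRAGMA index_list rows start with (seq, name, unique, ...)
indexes = []
for row in conn.execute("PRAGMA index_list(runs)").fetchall():
    index_name, is_unique = row[1], bool(row[2])
    # PRAGMA index_info rows: (seqno, cid, name)
    index_columns = [info[2] for info in conn.execute(f"PRAGMA index_info({index_name})")]
    indexes.append({"name": index_name, "columns": index_columns, "unique": is_unique})

# The original CREATE TABLE text is stored in sqlite_master.
create_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", ("runs",)
).fetchone()[0]
conn.close()

for column in columns:
    print(column)
print(indexes)
print(create_sql)
```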

Exporting Data:

Press e to export the current table:

  1. Edit filename (optional)

  2. Choose CSV or JSON format

  3. File saved to export directory (default: _working/exports/)

Design Philosophy

Read-Only Access:

The viewer opens databases in read-only mode (?mode=ro) to prevent accidental modifications.
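In Python's sqlite3 module, mode=ro is passed through a file: URI with uri=True. The following minimal sketch shows the effect on a throwaway database; open_read_only is a hypothetical helper, not the viewer's own function.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write attempt fails."""
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "demo.db"

    # Create a throwaway database with one row over a normal read-write connection.
    writer = sqlite3.connect(db_path)
    writer.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, status TEXT)")
    writer.execute("INSERT INTO runs (status) VALUES ('completed')")
    writer.commit()
    writer.close()

    reader = open_read_only(db_path)
    row_count = reader.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
    try:
        reader.execute("INSERT INTO runs (status) VALUES ('failed')")
        write_blocked = False
    except sqlite3.OperationalError:
        # SQLite rejects the write because the connection is read-only.
        write_blocked = True
    finally:
        reader.close()

print(row_count, write_blocked)
```

Unlike a default connection, mode=ro also refuses to create a missing database file, so a wrong path produces an error rather than an empty database.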

Observational Tool:

Like the corpus_db ledger, this viewer is observational only - it displays run history but doesn’t control or modify build processes.

Benefits Over Flask Version:

The Textual TUI offers advantages over the previous Flask-based web viewer:

  • No web server overhead (terminal-native)

  • Better build tools integration

  • Reduced dependencies (no Flask, pandas, Werkzeug)

  • Single-language codebase (Python only)

  • Native keyboard navigation

Trade-offs:

  • No SQL query interface (may be added later)

  • No cross-table search (may be added later)

  • Terminal-only (no browser UI)

Notes

Dependencies:

Requires Textual library for TUI functionality. Install with:

pip install -e ".[dev]"

Troubleshooting:

Database Not Found:

Error: Database not found: data/raw/syllable_extractor.db

Solution: Ensure the database exists or specify a different path:

python -m build_tools.corpus_db_viewer --db /path/to/database.db

Textual Not Installed:

Error: Textual library not found

Solution: Install development dependencies:

pip install -e ".[dev]"

Terminal Too Small:

If the layout looks broken, resize your terminal. Minimum recommended: 80 columns × 24 rows.

Database Access:

  • Database opened in read-only mode for safety

  • No modification operations available

  • Safe to run while extraction tools are active

Build-time tool:

This is a build-time inspection tool - not used during runtime name generation.


API Reference

build_tools.corpus_db_viewer.main(args=None)

Main CLI entry point.

Return type:

int

Parameters

args : list[str] | None, optional

List of argument strings. If None, uses sys.argv[1:]

Returns

int

Exit code (0 for success, 1 for error)