Corpus Database Viewer
Overview
Corpus Database Viewer - Interactive TUI for Build Provenance
Interactive terminal user interface for viewing corpus database provenance records. This is a build-time tool only - for inspecting extraction run history and outputs.
This tool provides:
- Interactive table browsing with pagination
- Schema inspection (columns, types, indexes)
- Data export to CSV and JSON formats
- Keyboard-driven navigation
Replaces: Flask-based web viewer (archived in _working/)
Features:
- Browse all database tables interactively
- Paginated data display (50 rows per page)
- View table schemas and CREATE TABLE statements
- Export data to CSV or JSON
- Read-only database access (safe inspection)
- Keyboard shortcuts for efficient navigation
Main Components:
- CorpusDBViewerApp: Main Textual TUI application class
- queries: Database query functions (table lists, schema, data)
- formatters: Export functions for CSV and JSON
- main: CLI entry point with argument parsing
CLI Usage:
# Launch interactive TUI with default database
python -m build_tools.corpus_db_viewer
# Specify custom database path
python -m build_tools.corpus_db_viewer --db /path/to/database.db
# Set custom export directory
python -m build_tools.corpus_db_viewer --export-dir _working/my_exports/
Keyboard Shortcuts (in TUI):
↑/↓ Navigate rows
←/→ Previous/Next page
t Switch table
i Show schema info
e Export data
q Quit
? Show help
Python API Usage:
>>> from build_tools.corpus_db_viewer import queries
>>> from pathlib import Path
>>>
>>> # Get list of tables
>>> db_path = Path("data/raw/syllable_extractor.db")
>>> tables = queries.get_tables_list(db_path)
>>>
>>> # Get schema for a table
>>> schema = queries.get_table_schema(db_path, "runs")
>>> print(schema['columns'])
>>>
>>> # Get paginated data
>>> data = queries.get_table_data(db_path, "runs", page=1, limit=50)
>>> print(f"Total rows: {data['total']}")
The Corpus Database Viewer is an interactive terminal user interface (TUI) for inspecting corpus database provenance records. Built with Textual, it provides a keyboard-driven interface for browsing extraction run history.
Replaces: Flask-based web viewer (archived in _working/_archived/pipeworks_db_viewer_flask/)
Key Features:
Browse all database tables interactively
Paginated data display (50 rows per page)
View table schemas and CREATE TABLE statements
Export data to CSV or JSON
Read-only database access (safe inspection)
Keyboard shortcuts for efficient navigation
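The paginated display can be pictured as a LIMIT/OFFSET query plus a row count. The sketch below is illustrative only: fetch_page and the keys of its return value are hypothetical names, not the viewer's queries.get_table_data implementation, and ordering by rowid assumes ordinary (not WITHOUT ROWID) SQLite tables.

```python
import sqlite3


def fetch_page(conn, table, page=1, limit=50):
    """Return one page of rows plus paging metadata (hypothetical helper)."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")

    # Table names cannot be bound as SQL parameters, so check the name
    # against the tables that actually exist before interpolating it.
    known = {
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    if table not in known:
        raise ValueError(f"unknown table: {table}")

    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    offset = (page - 1) * limit

    # Assumption: ordering by rowid keeps pages stable between requests.
    rows = conn.execute(
        f'SELECT * FROM "{table}" ORDER BY rowid LIMIT ? OFFSET ?',
        (limit, offset),
    ).fetchall()

    total_pages = max(1, -(-total // limit))  # ceiling division
    return {"rows": rows, "total": total, "page": page, "total_pages": total_pages}
```

With 120 rows and the default page size of 50, page 3 would hold the remaining 20 rows.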
Command-Line Interface
Interactive TUI for viewing corpus database provenance records
usage: corpus_db_viewer [-h] [--db DB_PATH] [--export-dir EXPORT_DIR]
[--page-size PAGE_SIZE]
Named Arguments
--db
    Path to corpus database. Default: data/raw/syllable_extractor.db
--export-dir
    Directory for exported data. Default: _working/exports/
--page-size
    Number of rows per page. Default: 50
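For readers who want to script around these options, the following is a minimal sketch of an argparse parser that would produce the usage line above. The function name build_parser, the dest names, and the help strings' placement are assumptions; the viewer's actual main module may be organised differently.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser mirroring the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        prog="corpus_db_viewer",
        description="Interactive TUI for viewing corpus database provenance records",
    )
    # dest="db_path" makes argparse print the option as "--db DB_PATH".
    parser.add_argument(
        "--db",
        dest="db_path",
        type=Path,
        default=Path("data/raw/syllable_extractor.db"),
        help="Path to corpus database",
    )
    parser.add_argument(
        "--export-dir",
        type=Path,
        default=Path("_working/exports/"),
        help="Directory for exported data",
    )
    parser.add_argument(
        "--page-size",
        type=int,
        default=50,
        help="Number of rows per page",
    )
    return parser
```

Calling build_parser().parse_args([]) yields the documented defaults, and any option given on the command line overrides its default.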
# Launch viewer with default database
python -m build_tools.corpus_db_viewer
# Specify custom database path
python -m build_tools.corpus_db_viewer --db /path/to/database.db
# Set custom export directory
python -m build_tools.corpus_db_viewer --export-dir _working/my_exports/
↑/↓: Navigate rows
←/→: Previous/Next page
PageUp/PageDown: Jump 10 pages
Home/End: First/Last page
t: Switch table (table selector)
i: Show schema info
e: Export current view
r: Refresh data
q: Quit application
?: Show help screen
Use arrow keys to navigate through table data
Press ‘t’ to open table selector and choose a different table
Press ‘i’ to view detailed schema information
Press ‘e’ to export the current table or view to CSV/JSON
Exports are saved to the export directory (default: _working/exports/)
Files are named: <table_name>_<timestamp>.<format>
Both CSV and JSON formats are supported
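The naming pattern can be sketched as a small helper. This is an illustration, not the viewer's own code: build_export_path is a hypothetical name, and the YYYYMMDD_HHMMSS timestamp layout is inferred from the example file names in the Export file naming listing.

```python
from datetime import datetime
from pathlib import Path


def build_export_path(export_dir, table_name, fmt, now=None):
    """Compose <export_dir>/<table_name>_<timestamp>.<format> (hypothetical helper)."""
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported export format: {fmt}")
    # Assumption: timestamps use the YYYYMMDD_HHMMSS layout seen in the examples.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(export_dir) / f"{table_name}_{stamp}.{fmt}"
```

Passing a fixed datetime as now makes the result reproducible, which is convenient in tests.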
Output Format
Export Files
The viewer can export table data to two formats:
CSV Format:
Comma-separated values with header row:
id,run_timestamp,extractor_tool,status
1,2026-01-09T14:30:22,syllable_extractor,completed
2,2026-01-09T15:12:45,syllable_extractor,completed
JSON Format:
Array of objects with full type preservation:
[
{
"id": 1,
"run_timestamp": "2026-01-09T14:30:22",
"extractor_tool": "syllable_extractor",
"status": "completed"
}
]
Export file naming:
Files are timestamped and named by table:
_working/exports/
├── runs_20260109_143022.csv
├── runs_20260109_143022.json
└── outputs_20260109_143145.csv
Important: Exports include ALL rows, not just the current page.
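A full-table export of this kind can be approximated with the standard csv and json modules. The sketch below is an assumption about the general approach, not the code in the formatters module: export_table is a hypothetical name, the caller is expected to pass a validated table name, and BLOB columns are not handled.

```python
import csv
import json
import sqlite3
from pathlib import Path


def export_table(conn, table, out_path):
    """Write every row of a table to CSV or JSON, chosen by file suffix."""
    # No LIMIT clause: the export covers all rows, not just one page.
    # The table name is interpolated, so it must be validated by the caller.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    if out_path.suffix == ".csv":
        # newline="" lets the csv module control line endings itself.
        with out_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(columns)  # header row
            writer.writerows(rows)
    elif out_path.suffix == ".json":
        # JSON keeps integers and nulls typed; CSV flattens everything to text.
        records = [dict(zip(columns, row)) for row in rows]
        with out_path.open("w", encoding="utf-8") as handle:
            json.dump(records, handle, indent=2)
    else:
        raise ValueError(f"unsupported export format: {out_path.suffix}")
    return len(rows)
```

The return value is the number of rows written, which can be compared against the table's total row count as a quick sanity check.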
Database Structure
The corpus database tracks syllable extraction runs:
- runs - Extraction run metadata
Tool name, version, status, timestamps, command-line arguments
- inputs - Source files processed
Input files or directories used for each run
- outputs - Generated files
Output .syllables and .meta files with syllable counts
Integration Guide
Use the viewer to inspect corpus database provenance after extraction runs:
# Step 1: Extract syllables (populates database)
python -m build_tools.pyphen_syllable_extractor \
--source data/corpus/ \
--lang en_US
# Step 2: Inspect extraction history with TUI viewer
python -m build_tools.corpus_db_viewer
When to use this tool:
To verify extraction runs completed successfully
To inspect which corpus files were processed
To track provenance of generated syllable files
To export extraction history for reporting or analysis
To debug failed extraction runs by examining status and error messages
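The same checks can be scripted outside the TUI. The sketch below assumes the runs table has the id, run_timestamp, extractor_tool and status columns shown in the export examples, and that finished runs carry the status "completed"; other status values are not documented here, so the query simply lists everything that is not marked completed. find_unfinished_runs is a hypothetical name.

```python
import sqlite3


def find_unfinished_runs(conn):
    """List runs whose status is missing or not 'completed', newest first."""
    return conn.execute(
        "SELECT id, run_timestamp, extractor_tool, status "
        "FROM runs "
        "WHERE status IS NULL OR status != 'completed' "
        # ISO 8601 timestamps stored as TEXT sort chronologically.
        "ORDER BY run_timestamp DESC"
    ).fetchall()
```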
Common workflows:
Browse recent runs: Launch viewer → select “runs” table → sort by timestamp
Find run details: Press i to view schema → browse rows for run metadata
Export history: Press e → select format (CSV/JSON) → save to export directory
Track file provenance: Switch to “outputs” table → identify which run created specific files
Advanced Topics
Keyboard Shortcuts
Navigation:
| Key(s) | Action |
|---|---|
| ↑/↓ | Navigate rows |
| ←/→ | Previous/Next page |
| PageUp/PageDown | Jump 10 pages |
| Home/End | First/Last page |
Actions:
| Key | Action |
|---|---|
| t | Switch table |
| i | Show schema info |
| e | Export data |
| r | Refresh |
| ? | Show help |
| q | Quit |
Usage Examples
Browsing Tables:
Launch the viewer and it automatically loads the first table. Navigate using:
Press t to focus the table list
Use ↑/↓ to navigate tables
Press Enter to select
Or click directly on table names in the sidebar.
Viewing Schema:
Press i to view detailed schema information:
Column definitions (name, type, constraints)
Indexes (name, columns, UNIQUE flags)
CREATE TABLE statement (original SQL)
Example output:
Schema: runs
Columns:
• id: INTEGER [PRIMARY KEY]
• run_timestamp: TEXT NOT NULL
• extractor_tool: TEXT NOT NULL
• status: TEXT
Indexes:
• idx_status (status)
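Schema details like these can be read from SQLite itself through PRAGMA table_info, PRAGMA index_list, PRAGMA index_info and the sqlite_master table. The following is a minimal sketch under that assumption; describe_table and the shape of its return value are hypothetical and need not match what queries.get_table_schema returns.

```python
import sqlite3


def _quote(identifier):
    """Quote an SQL identifier; PRAGMA arguments cannot be bound parameters."""
    return '"' + identifier.replace('"', '""') + '"'


def describe_table(conn, table):
    """Collect columns, indexes and the CREATE TABLE statement for a table."""
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = [
        {"name": row[1], "type": row[2], "notnull": bool(row[3]), "pk": bool(row[5])}
        for row in conn.execute(f"PRAGMA table_info({_quote(table)})")
    ]

    indexes = []
    # index_list rows: (seq, name, unique, origin, partial)
    for row in conn.execute(f"PRAGMA index_list({_quote(table)})").fetchall():
        name, unique = row[1], bool(row[2])
        # index_info rows: (seqno, cid, name)
        index_columns = [
            info[2] for info in conn.execute(f"PRAGMA index_info({_quote(name)})")
        ]
        indexes.append({"name": name, "unique": unique, "columns": index_columns})

    # sqlite_master keeps the original CREATE TABLE text.
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return {"columns": columns, "indexes": indexes, "create_sql": row[0] if row else None}
```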
Exporting Data:
Press e to export the current table:
Edit filename (optional)
Choose CSV or JSON format
File saved to export directory (default: _working/exports/)
Design Philosophy
Read-Only Access:
The viewer opens databases in read-only mode (?mode=ro) to prevent accidental modifications.
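With Python's sqlite3 module, a read-only connection is requested by passing a file: URI with mode=ro and uri=True. The sketch below shows that pattern; open_read_only is a hypothetical helper name, and the error message merely mirrors the one listed under Troubleshooting.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open an existing SQLite database in read-only mode (illustrative sketch)."""
    path = Path(db_path).resolve()
    if not path.exists():
        # mode=ro never creates a file, but failing early gives a clearer message.
        raise FileNotFoundError(f"Database not found: {db_path}")
    # as_uri() yields a percent-encoded file:// URI; mode=ro rejects all writes.
    return sqlite3.connect(f"{path.as_uri()}?mode=ro", uri=True)
```

Any INSERT, UPDATE or DELETE issued on such a connection raises sqlite3.OperationalError, so browsing code cannot modify the ledger by accident.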
Observational Tool:
Like the corpus_db ledger, this viewer is observational only - it displays run history but doesn’t control or modify build processes.
Benefits Over Flask Version:
The Textual TUI offers advantages over the previous Flask-based web viewer:
No web server overhead (terminal-native)
Better build tools integration
Reduced dependencies (no Flask, pandas, Werkzeug)
Single-language codebase (Python only)
Native keyboard navigation
Trade-offs:
No SQL query interface (may be added later)
No cross-table search (may be added later)
Terminal-only (no browser UI)
Notes
Dependencies:
Requires Textual library for TUI functionality. Install with:
pip install -e ".[dev]"
Troubleshooting:
Database Not Found:
Error: Database not found: data/raw/syllable_extractor.db
Solution: Ensure the database exists or specify a different path:
python -m build_tools.corpus_db_viewer --db /path/to/database.db
Textual Not Installed:
Error: Textual library not found
Solution: Install development dependencies:
pip install -e ".[dev]"
Terminal Too Small:
If the layout looks broken, resize your terminal. Minimum recommended: 80 columns × 24 rows.
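A wrapper script can check the terminal size before launching the viewer. This is a small sketch using shutil.get_terminal_size from the standard library; terminal_is_large_enough is a hypothetical helper and not part of the viewer.

```python
import shutil

MIN_COLUMNS, MIN_ROWS = 80, 24  # recommended minimum from this guide


def terminal_is_large_enough(size=None):
    """Return True if the terminal meets the recommended minimum size."""
    # The fallback is used when the size cannot be detected (e.g. piped output).
    if size is None:
        size = shutil.get_terminal_size(fallback=(MIN_COLUMNS, MIN_ROWS))
    columns, rows = size
    return columns >= MIN_COLUMNS and rows >= MIN_ROWS
```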
Database Access:
Database opened in read-only mode for safety
No modification operations available
Safe to run while extraction tools are active
Build-time tool:
This is a build-time inspection tool - not used during runtime name generation.
Related Documentation:
Corpus Database - Build provenance ledger that this tool reads
Pyphen Syllable Extractor - Pyphen tool that populates the database
NLTK Syllable Extractor - NLTK tool that populates the database