Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.4.0 (2026-01-10)
Features
build_tools: Add corpus_db tracking to interactive mode (d1fa97f)
build_tools: Add corpus_db_viewer TUI for database inspection (7df65c3)
build_tools: Add extractor identifiers to output directory names (5bbe49b)
build_tools: Add NLTK syllable extractor for phonetic syllabification (16c5237)
build_tools: Add NLTK syllable normaliser with fragment cleaning (baf223a)
build_tools: Add pyphen_ prefix to syllable normaliser outputs (c1c2b8a)
build_tools: Make pyphen extractor language argument optional (deeb4c7)
Documentation
build_tools: Add Basic Usage sections to CLI documentation (a482971)
build_tools: Add documentation for NLTK syllable extractor (8121d04)
build_tools: Add NLTK normaliser documentation (e06da84)
build_tools: Add Sphinx documentation for corpus_db_viewer (0681e4d)
build_tools: Fix bash command formatting in CLI documentation (4dfbd87)
build_tools: Fix bash formatting in module docstrings (8162ef6)
build_tools: Improve corpus_db_viewer documentation formatting (b652ac1)
build_tools: Standardize RST documentation and eliminate redundancy (12bb279)
build_tools: Update NLTK extractor docs for duplicate preservation (3971524)
Fix broken cross-references in syllable_walk.rst (f1c582a)
Update documentation titles and references for pyphen tools (955c90b)
Update README and CLAUDE.md with NLTK extractor setup instructions (8beb55c)
0.3.0 (2026-01-08)
Features
Add batch processing CLI for syllable extractor (1b6f1f8)
Add CLI support for automatic language detection (60b1dd2)
Add comprehensive badges and codecov token to CI (8cc155a)
Add comprehensive CI/CD infrastructure and syllable extractor enhancements (8487f9f)
Add interactive HTML visualization to t-SNE visualizer (8df6fa7)
Add language code to output filenames for multi-language support (ba1c3bf)
Add optional language auto-detection for syllable extraction (705261d)
Add parameter logging and optional mapping to t-SNE visualizer (f877d10)
Add syllable walker for phonetic feature space exploration (9d1b7e8)
Add t-SNE visualization tool for feature signature space (5c8b44a)
build_tools: Add corpus_db ledger for extraction run provenance (53894ee)
build_tools: Add syllable walker for phonetic space exploration (058dec5)
build_tools: Integrate corpus_db into syllable_extractor CLI (28c0ee9)
Complete Phase 1-4 of analysis refactoring - add dimensionality modules (c8035e3)
Complete Phase 5-6 of analysis refactoring - add plotting modules and refactor tsne_visualizer (d8d4097)
Improve test coverage for syllable_extractor from 41% to 43% (a149cb4)
Make interactive t-SNE visualization responsive with min-width constraint (9b8305a)
Bug Fixes
Add missing sphinx-argparse dependency for ReadTheDocs (a73bf2f)
Clean up test_tsne_visualizer to keep only integration tests (74b0cef)
Configure matplotlib to use non-interactive backend for CI (d7e0450)
corpus_db: Store paths in POSIX format for cross-platform compatibility (d4a19e5)
corpus_db: Update test to use POSIX path format for comparison (20c96f2)
Create extractor instance after auto-detection for file saving (3d5b13f)
docs: Link changelog.rst to auto-generated CHANGELOG.md (dea252a)
docs: Replace invalid JSON placeholder with valid example (51e3b05)
Expand mypy coverage to include tests and build_tools (eb875c2)
Fix colorbar title overlapping with values in interactive t-SNE visualization (4506aef)
Increase documentation warning threshold to accommodate dataclass warnings (8d9657e)
Make dimensionality modules optional and update CI dependencies (c596a08)
Make pyphen import optional for documentation builds (ecb33f1)
Make t-SNE visualizer dependencies optional for CI (d14eac4)
Remove _working/analysis_refactor.md from version control (dc27bff)
Remove imported-members from autoapi to prevent duplicate warnings (976cc36)
Resolve 3 CI test failures (964c2d7)
Resolve Black formatting and mypy type errors in syllable extractor (c3eb953)
Resolve markdownlint issues across documentation files (56479a7)
Skip permission test on Windows (different permission model) (6ddef3c)
Suppress expected Sphinx warnings for dataclass attributes and underscores (495d5e3)
Use time.perf_counter() for higher precision timing on Windows (8d32817)
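As an illustration of the corpus_db path fix listed above (d4a19e5), the following is a minimal sketch of normalising a path string to POSIX form before storing it. The helper name `to_posix` is hypothetical and is not taken from the corpus_db code; only the standard-library `pathlib` calls are real.

```python
from pathlib import PureWindowsPath


def to_posix(path_text: str) -> str:
    """Return path_text with forward slashes, suitable for storing in a database.

    Hypothetical helper for illustration; not the project's actual function.
    """
    # PureWindowsPath treats both "\\" and "/" as separators and never touches
    # the filesystem, so it can parse paths recorded on either platform.
    return PureWindowsPath(path_text).as_posix()


if __name__ == "__main__":
    print(to_posix("data\\corpus\\en_US.txt"))
    print(to_posix("data/corpus/en_US.txt"))
```

Both calls yield the same stored string, so a comparison in a test behaves identically on Windows and on POSIX systems. One caveat of this sketch: a POSIX filename that legitimately contains a backslash would be split into separate components.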
Documentation
Add corpus_db to Claude Code documentation (a261826)
Add corpus_db to README.md Build Tools section (68912b1)
Add documentation content rules to CLAUDE.md (23d9234)
Add pre-commit hook for CLI documentation sync reminders (4a064f3)
Add table of contents and navigation links to README (199e34b)
Automate CLI documentation with sphinx-argparse (9a993e7)
build_tools: Add corpus_db to Sphinx documentation (65faef7)
Complete Phase 7-8 of analysis refactoring with full documentation (b70373d)
pilot: Refactor syllable_extractor to use auto-generated docs (38fbfb2)
Refactor CLAUDE.md into modular documentation structure (054ce99)
Remove redundant API Reference section from analysis_tools.rst (48fead8)
rollout: Complete auto-generated documentation refactor for all build tools (97cdd8f)
Streamline README to focus on quick start and overview (7bfad15)
Update README with new syllable_extractor package usage (3f06376)
0.2.1 (2026-01-08)
Features
Add batch processing CLI for syllable extractor (1b6f1f8)
Add CLI support for automatic language detection (60b1dd2)
Add comprehensive badges and codecov token to CI (8cc155a)
Add comprehensive CI/CD infrastructure and syllable extractor enhancements (8487f9f)
Add interactive HTML visualization to t-SNE visualizer (8df6fa7)
Add language code to output filenames for multi-language support (ba1c3bf)
Add optional language auto-detection for syllable extraction (705261d)
Add parameter logging and optional mapping to t-SNE visualizer (f877d10)
Add syllable walker for phonetic feature space exploration (9d1b7e8)
Add t-SNE visualization tool for feature signature space (5c8b44a)
build_tools: Add corpus_db ledger for extraction run provenance (53894ee)
build_tools: Add syllable walker for phonetic space exploration (058dec5)
build_tools: Integrate corpus_db into syllable_extractor CLI (28c0ee9)
Complete Phase 1-4 of analysis refactoring - add dimensionality modules (c8035e3)
Complete Phase 5-6 of analysis refactoring - add plotting modules and refactor tsne_visualizer (d8d4097)
Improve test coverage for syllable_extractor from 41% to 43% (a149cb4)
Make interactive t-SNE visualization responsive with min-width constraint (9b8305a)
Bug Fixes
Add missing sphinx-argparse dependency for ReadTheDocs (a73bf2f)
Clean up test_tsne_visualizer to keep only integration tests (74b0cef)
Configure matplotlib to use non-interactive backend for CI (d7e0450)
corpus_db: Store paths in POSIX format for cross-platform compatibility (d4a19e5)
corpus_db: Update test to use POSIX path format for comparison (20c96f2)
Create extractor instance after auto-detection for file saving (3d5b13f)
docs: Link changelog.rst to auto-generated CHANGELOG.md (dea252a)
docs: Replace invalid JSON placeholder with valid example (51e3b05)
Expand mypy coverage to include tests and build_tools (eb875c2)
Fix colorbar title overlapping with values in interactive t-SNE visualization (4506aef)
Increase documentation warning threshold to accommodate dataclass warnings (8d9657e)
Make dimensionality modules optional and update CI dependencies (c596a08)
Make pyphen import optional for documentation builds (ecb33f1)
Make t-SNE visualizer dependencies optional for CI (d14eac4)
Remove _working/analysis_refactor.md from version control (dc27bff)
Remove imported-members from autoapi to prevent duplicate warnings (976cc36)
Resolve 3 CI test failures (964c2d7)
Resolve Black formatting and mypy type errors in syllable extractor (c3eb953)
Resolve markdownlint issues across documentation files (56479a7)
Skip permission test on Windows (different permission model) (6ddef3c)
Suppress expected Sphinx warnings for dataclass attributes and underscores (495d5e3)
Use time.perf_counter() for higher precision timing on Windows (8d32817)
Documentation
Add corpus_db to Claude Code documentation (a261826)
Add corpus_db to README.md Build Tools section (68912b1)
Add documentation content rules to CLAUDE.md (23d9234)
Add pre-commit hook for CLI documentation sync reminders (4a064f3)
Add table of contents and navigation links to README (199e34b)
Automate CLI documentation with sphinx-argparse (9a993e7)
build_tools: Add corpus_db to Sphinx documentation (65faef7)
Complete Phase 7-8 of analysis refactoring with full documentation (b70373d)
pilot: Refactor syllable_extractor to use auto-generated docs (38fbfb2)
Refactor CLAUDE.md into modular documentation structure (054ce99)
Remove redundant API Reference section from analysis_tools.rst (48fead8)
rollout: Complete auto-generated documentation refactor for all build tools (97cdd8f)
Streamline README to focus on quick start and overview (7bfad15)
Update README with new syllable_extractor package usage (3f06376)
[0.2.0] - 2026-01-08
This release significantly expands the build tools infrastructure while retaining the Phase 1 proof-of-concept generator. It focuses on a corpus linguistics pipeline for syllable extraction, normalization, feature annotation, and phonetic space analysis.
Features
Build Tools Suite
Syllable Extractor: Dictionary-based hyphenation using pyphen (LibreOffice dictionaries)
Support for 40+ languages
Automatic language detection with langdetect
Batch processing capabilities
Configurable syllable length constraints
Multi-language output file support
Syllable Normalizer: 3-step normalization pipeline
Character decomposition and normalization
Phoneme-based normalization
Length and structure filtering
Syllable Feature Annotator: Phonetic feature detection system
12 phonetic feature detectors (consonant clusters, vowel patterns, etc.)
Binary feature signatures for each syllable
JSON output with metadata
Syllable Walker: Phonetic space exploration tool
Navigate through similar syllables based on feature signatures
Step-by-step phonetic transformations
Interactive exploration of syllable relationships
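To make the relationship between the Syllable Feature Annotator and the Syllable Walker concrete, here is a minimal sketch of binary feature signatures and a single walking step. The three detectors, the Hamming-distance metric, and the function names are assumptions for illustration; they are not the project's 12 detectors or its actual walking algorithm.

```python
from typing import Callable, Dict, List, Tuple

VOWELS = set("aeiou")

# Hypothetical detectors; the project's own 12 detectors are not reproduced here.
DETECTORS: Dict[str, Callable[[str], bool]] = {
    "starts_with_vowel": lambda s: s[:1] in VOWELS,
    "ends_with_vowel": lambda s: s[-1:] in VOWELS,
    "has_consonant_cluster": lambda s: any(
        a not in VOWELS and b not in VOWELS for a, b in zip(s, s[1:])
    ),
}


def signature(syllable: str) -> Tuple[int, ...]:
    """Return a binary feature signature: one 0/1 flag per detector."""
    return tuple(int(detect(syllable)) for detect in DETECTORS.values())


def hamming(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Count the positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(current: str, candidates: List[str]) -> str:
    """Take one step: pick the other syllable with the closest signature."""
    target = signature(current)
    others = [c for c in candidates if c != current]
    return min(others, key=lambda c: hamming(target, signature(c)))


if __name__ == "__main__":
    syllables = ["ka", "ta", "str", "a"]
    for s in syllables:
        print(s, signature(s))
    print("step from 'ka':", nearest("ka", syllables))
```

Repeating `nearest` from each newly chosen syllable gives a simple walk through the signature space; a real walker would likely also need to avoid revisiting syllables and to break ties deliberately.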
Analysis Tools
Feature Signature Analysis: Statistical analysis of annotated syllables
Feature frequency distributions
Correlation analysis
Comprehensive reporting
t-SNE Visualization: Dimensionality reduction and visualization
Interactive HTML visualizations with plotly
Static matplotlib plots
Parameter logging and syllable mapping
Responsive design with min-width constraints
Optional dependencies for CI compatibility
Random Sampler: Stratified random sampling of annotated syllables
Documentation
Automated CLI Documentation: Integration with sphinx-argparse for auto-generated command-line reference
Modular Documentation Structure: Reorganized CLAUDE.md into topic-specific files in claude/ directory
Architecture and Design
Development Guide
CI/CD Pipeline
Build Tools Documentation
Documentation Content Rules: Single source of truth policy for docstrings vs RST files
Pre-commit Hook: Reminders for CLI documentation synchronization
Internal Changes
Analysis Tools Reorganization: Moved to top-level build_tools/syllable_analysis/ structure
Syllable Extractor Modularization: Extracted into proper package structure
CI Improvements: Optional dependencies handling for matplotlib and dimensionality modules
Test Coverage Improvements: Expanded coverage across build tools
Fixes
Platform compatibility fixes (Windows permission handling, matplotlib backend configuration)
Sphinx documentation warnings resolution
ReadTheDocs build improvements (optional pyphen import, dependency handling)
Interactive visualization improvements (colorbar overlap fix, responsive design)
Test suite cleanup and CI stability improvements
[0.1.0] - Initial Release
Initial proof-of-concept release with Phase 1 generator:
Basic NameGenerator class with deterministic seeding
“simple” pattern with hardcoded syllables
Zero runtime dependencies
Comprehensive CI/CD infrastructure (GitHub Actions, pre-commit hooks)
Sphinx documentation with ReadTheDocs integration
GPL-3.0-or-later license