Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.4.0 (2026-01-10)

Features

  • build_tools: Add corpus_db tracking to interactive mode (d1fa97f)

  • build_tools: Add corpus_db_viewer TUI for database inspection (7df65c3)

  • build_tools: Add extractor identifiers to output directory names (5bbe49b)

  • build_tools: Add NLTK syllable extractor for phonetic syllabification (16c5237)

  • build_tools: Add NLTK syllable normaliser with fragment cleaning (baf223a)

  • build_tools: Add pyphen_ prefix to syllable normaliser outputs (c1c2b8a)

  • build_tools: Make pyphen extractor language argument optional (deeb4c7)

Documentation

  • build_tools: Add Basic Usage sections to CLI documentation (a482971)

  • build_tools: Add documentation for NLTK syllable extractor (8121d04)

  • build_tools: Add NLTK normaliser documentation (e06da84)

  • build_tools: Add Sphinx documentation for corpus_db_viewer (0681e4d)

  • build_tools: Fix bash command formatting in CLI documentation (4dfbd87)

  • build_tools: Fix bash formatting in module docstrings (8162ef6)

  • build_tools: Improve corpus_db_viewer documentation formatting (b652ac1)

  • build_tools: Standardize RST documentation and eliminate redundancy (12bb279)

  • build_tools: Update NLTK extractor docs for duplicate preservation (3971524)

  • Fix broken cross-references in syllable_walk.rst (f1c582a)

  • Update documentation titles and references for pyphen tools (955c90b)

  • Update README and CLAUDE.md with NLTK extractor setup instructions (8beb55c)

0.3.0 (2026-01-08)

Features

  • Add batch processing CLI for syllable extractor (1b6f1f8)

  • Add CLI support for automatic language detection (60b1dd2)

  • Add comprehensive badges and codecov token to CI (8cc155a)

  • Add comprehensive CI/CD infrastructure and syllable extractor enhancements (8487f9f)

  • Add interactive HTML visualization to t-SNE visualizer (8df6fa7)

  • Add language code to output filenames for multi-language support (ba1c3bf)

  • Add optional language auto-detection for syllable extraction (705261d)

  • Add parameter logging and optional mapping to t-SNE visualizer (f877d10)

  • Add syllable walker for phonetic feature space exploration (9d1b7e8)

  • Add t-SNE visualization tool for feature signature space (5c8b44a)

  • build_tools: Add corpus_db ledger for extraction run provenance (53894ee)

  • build_tools: Add syllable walker for phonetic space exploration (058dec5)

  • build_tools: Integrate corpus_db into syllable_extractor CLI (28c0ee9)

  • Complete Phase 1-4 of analysis refactoring - add dimensionality modules (c8035e3)

  • Complete Phase 5-6 of analysis refactoring - add plotting modules and refactor tsne_visualizer (d8d4097)

  • Improve test coverage for syllable_extractor from 41% to 43% (a149cb4)

  • Make interactive t-SNE visualization responsive with min-width constraint (9b8305a)

Bug Fixes

  • Add missing sphinx-argparse dependency for ReadTheDocs (a73bf2f)

  • Clean up test_tsne_visualizer to keep only integration tests (74b0cef)

  • Configure matplotlib to use non-interactive backend for CI (d7e0450)

  • corpus_db: Store paths in POSIX format for cross-platform compatibility (d4a19e5)

  • corpus_db: Update test to use POSIX path format for comparison (20c96f2)

  • Create extractor instance after auto-detection for file saving (3d5b13f)

  • docs: Link changelog.rst to auto-generated CHANGELOG.md (dea252a)

  • docs: Replace invalid JSON placeholder with valid example (51e3b05)

  • Expand mypy coverage to include tests and build_tools (eb875c2)

  • Fix colorbar title overlapping with values in interactive t-SNE visualization (4506aef)

  • Increase documentation warning threshold to accommodate dataclass warnings (8d9657e)

  • Make dimensionality modules optional and update CI dependencies (c596a08)

  • Make pyphen import optional for documentation builds (ecb33f1)

  • Make t-SNE visualizer dependencies optional for CI (d14eac4)

  • Remove _working/analysis_refactor.md from version control (dc27bff)

  • Remove imported-members from autoapi to prevent duplicate warnings (976cc36)

  • Resolve 3 CI test failures (964c2d7)

  • Resolve Black formatting and mypy type errors in syllable extractor (c3eb953)

  • Resolve markdownlint issues across documentation files (56479a7)

  • Skip permission test on Windows (different permission model) (6ddef3c)

  • Suppress expected Sphinx warnings for dataclass attributes and underscores (495d5e3)

  • Use time.perf_counter() for higher precision timing on Windows (8d32817)
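
The corpus_db path fix listed above stores paths in POSIX format so that records written on Windows compare equal to records written elsewhere. A minimal sketch of that idea using only the standard library follows. The helper name `to_stored_path` and the example path are hypothetical and are not taken from corpus_db itself. The sketch also assumes that no stored filename legitimately contains a backslash.

```python
from pathlib import PureWindowsPath


def to_stored_path(raw: str) -> str:
    """Return a forward-slash form of a relative path (hypothetical helper)."""
    # PureWindowsPath accepts both "\\" and "/" as separators, so Windows-style
    # and POSIX-style inputs are normalised to the same string.
    return PureWindowsPath(raw).as_posix()


# Hypothetical example paths; both spellings yield one stored representation.
windows_style = to_stored_path(r"output\20260108\syllables.txt")
posix_style = to_stored_path("output/20260108/syllables.txt")
```

Storing one canonical spelling means tests and queries can compare path strings directly, whichever platform produced the record.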

Documentation

  • Add corpus_db to Claude Code documentation (a261826)

  • Add corpus_db to README.md Build Tools section (68912b1)

  • Add documentation content rules to CLAUDE.md (23d9234)

  • Add pre-commit hook for CLI documentation sync reminders (4a064f3)

  • Add table of contents and navigation links to README (199e34b)

  • Automate CLI documentation with sphinx-argparse (9a993e7)

  • build_tools: Add corpus_db to Sphinx documentation (65faef7)

  • Complete Phase 7-8 of analysis refactoring with full documentation (b70373d)

  • pilot: Refactor syllable_extractor to use auto-generated docs (38fbfb2)

  • Refactor CLAUDE.md into modular documentation structure (054ce99)

  • Remove redundant API Reference section from analysis_tools.rst (48fead8)

  • rollout: Complete auto-generated documentation refactor for all build tools (97cdd8f)

  • Streamline README to focus on quick start and overview (7bfad15)

  • Update README with new syllable_extractor package usage (3f06376)

0.2.1 (2026-01-08)

Features

  • Add batch processing CLI for syllable extractor (1b6f1f8)

  • Add CLI support for automatic language detection (60b1dd2)

  • Add comprehensive badges and codecov token to CI (8cc155a)

  • Add comprehensive CI/CD infrastructure and syllable extractor enhancements (8487f9f)

  • Add interactive HTML visualization to t-SNE visualizer (8df6fa7)

  • Add language code to output filenames for multi-language support (ba1c3bf)

  • Add optional language auto-detection for syllable extraction (705261d)

  • Add parameter logging and optional mapping to t-SNE visualizer (f877d10)

  • Add syllable walker for phonetic feature space exploration (9d1b7e8)

  • Add t-SNE visualization tool for feature signature space (5c8b44a)

  • build_tools: Add corpus_db ledger for extraction run provenance (53894ee)

  • build_tools: Add syllable walker for phonetic space exploration (058dec5)

  • build_tools: Integrate corpus_db into syllable_extractor CLI (28c0ee9)

  • Complete Phase 1-4 of analysis refactoring - add dimensionality modules (c8035e3)

  • Complete Phase 5-6 of analysis refactoring - add plotting modules and refactor tsne_visualizer (d8d4097)

  • Improve test coverage for syllable_extractor from 41% to 43% (a149cb4)

  • Make interactive t-SNE visualization responsive with min-width constraint (9b8305a)

Bug Fixes

  • Add missing sphinx-argparse dependency for ReadTheDocs (a73bf2f)

  • Clean up test_tsne_visualizer to keep only integration tests (74b0cef)

  • Configure matplotlib to use non-interactive backend for CI (d7e0450)

  • corpus_db: Store paths in POSIX format for cross-platform compatibility (d4a19e5)

  • corpus_db: Update test to use POSIX path format for comparison (20c96f2)

  • Create extractor instance after auto-detection for file saving (3d5b13f)

  • docs: Link changelog.rst to auto-generated CHANGELOG.md (dea252a)

  • docs: Replace invalid JSON placeholder with valid example (51e3b05)

  • Expand mypy coverage to include tests and build_tools (eb875c2)

  • Fix colorbar title overlapping with values in interactive t-SNE visualization (4506aef)

  • Increase documentation warning threshold to accommodate dataclass warnings (8d9657e)

  • Make dimensionality modules optional and update CI dependencies (c596a08)

  • Make pyphen import optional for documentation builds (ecb33f1)

  • Make t-SNE visualizer dependencies optional for CI (d14eac4)

  • Remove _working/analysis_refactor.md from version control (dc27bff)

  • Remove imported-members from autoapi to prevent duplicate warnings (976cc36)

  • Resolve 3 CI test failures (964c2d7)

  • Resolve Black formatting and mypy type errors in syllable extractor (c3eb953)

  • Resolve markdownlint issues across documentation files (56479a7)

  • Skip permission test on Windows (different permission model) (6ddef3c)

  • Suppress expected Sphinx warnings for dataclass attributes and underscores (495d5e3)

  • Use time.perf_counter() for higher precision timing on Windows (8d32817)

Documentation

  • Add corpus_db to Claude Code documentation (a261826)

  • Add corpus_db to README.md Build Tools section (68912b1)

  • Add documentation content rules to CLAUDE.md (23d9234)

  • Add pre-commit hook for CLI documentation sync reminders (4a064f3)

  • Add table of contents and navigation links to README (199e34b)

  • Automate CLI documentation with sphinx-argparse (9a993e7)

  • build_tools: Add corpus_db to Sphinx documentation (65faef7)

  • Complete Phase 7-8 of analysis refactoring with full documentation (b70373d)

  • pilot: Refactor syllable_extractor to use auto-generated docs (38fbfb2)

  • Refactor CLAUDE.md into modular documentation structure (054ce99)

  • Remove redundant API Reference section from analysis_tools.rst (48fead8)

  • rollout: Complete auto-generated documentation refactor for all build tools (97cdd8f)

  • Streamline README to focus on quick start and overview (7bfad15)

  • Update README with new syllable_extractor package usage (3f06376)

[0.2.0] - 2026-01-08

This release significantly expands the build tools infrastructure while keeping the Phase 1 proof-of-concept generator unchanged. The focus is a corpus linguistics pipeline covering syllable extraction, normalization, feature annotation, and phonetic space analysis.

Features

Build Tools Suite

  • Syllable Extractor: Dictionary-based hyphenation using pyphen (LibreOffice dictionaries)

    • Support for 40+ languages

    • Automatic language detection with langdetect

    • Batch processing capabilities

    • Configurable syllable length constraints

    • Multi-language output file support

  • Syllable Normalizer: 3-step normalization pipeline

    • Character decomposition and normalization

    • Phoneme-based normalization

    • Length and structure filtering

  • Syllable Feature Annotator: Phonetic feature detection system

    • 12 phonetic feature detectors (consonant clusters, vowel patterns, etc.)

    • Binary feature signatures for each syllable

    • JSON output with metadata

  • Syllable Walker: Phonetic space exploration tool

    • Navigate through similar syllables based on feature signatures

    • Step-by-step phonetic transformations

    • Interactive exploration of syllable relationships
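
The binary feature signatures and the signature-based navigation described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the three detectors stand in for the 12 shipped with the annotator, the names `signature`, `hamming`, and `nearest` are hypothetical, and Hamming distance is only one plausible similarity measure, not necessarily the one the Syllable Walker uses. Syllables are assumed to be non-empty lowercase strings.

```python
from typing import Callable, List, Tuple

VOWELS = set("aeiou")

# Three illustrative detectors (hypothetical, not the project's own).
DETECTORS: List[Tuple[str, Callable[[str], bool]]] = [
    ("starts_with_vowel", lambda s: s[0] in VOWELS),
    ("ends_with_vowel", lambda s: s[-1] in VOWELS),
    (
        "has_consonant_cluster",
        lambda s: any(a not in VOWELS and b not in VOWELS for a, b in zip(s, s[1:])),
    ),
]


def signature(syllable: str) -> Tuple[int, ...]:
    """Run every detector and collect the results as a tuple of 0/1 flags."""
    return tuple(int(detect(syllable)) for _, detect in DETECTORS)


def hamming(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Count the positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(current: str, candidates: List[str]) -> str:
    """Pick the candidate whose signature is closest to the current syllable."""
    sig = signature(current)
    others = [c for c in candidates if c != current]
    return min(others, key=lambda c: hamming(sig, signature(c)))


step = nearest("stra", ["an", "ka", "stra"])
```

Repeating `nearest` from each newly chosen syllable gives a simple walk through signature space, one small feature change at a time.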

Analysis Tools

  • Feature Signature Analysis: Statistical analysis of annotated syllables

    • Feature frequency distributions

    • Correlation analysis

    • Comprehensive reporting

  • t-SNE Visualization: Dimensionality reduction and visualization

    • Interactive HTML visualizations with plotly

    • Static matplotlib plots

    • Parameter logging and syllable mapping

    • Responsive design with min-width constraints

    • Optional dependencies for CI compatibility

  • Random Sampler: Stratified random sampling of annotated syllables

Documentation

  • Automated CLI Documentation: Integration with sphinx-argparse for auto-generated command-line reference

  • Modular Documentation Structure: Reorganized CLAUDE.md into topic-specific files in claude/ directory

    • Architecture and Design

    • Development Guide

    • CI/CD Pipeline

    • Build Tools Documentation

  • Documentation Content Rules: Single source of truth policy for docstrings vs RST files

  • Pre-commit Hook: Reminders for CLI documentation synchronization

Internal Changes

  • Analysis Tools Reorganization: Moved to top-level build_tools/syllable_analysis/ structure

  • Syllable Extractor Modularization: Extracted into proper package structure

  • CI Improvements: Optional dependencies handling for matplotlib and dimensionality modules

  • Test Coverage Improvements: Expanded coverage across build tools

Fixes

  • Platform compatibility fixes (Windows permission handling, matplotlib backend configuration)

  • Sphinx documentation warnings resolution

  • ReadTheDocs build improvements (optional pyphen import, dependency handling)

  • Interactive visualization improvements (colorbar overlap fix, responsive design)

  • Test suite cleanup and CI stability improvements

[0.1.0] - Initial Release

Initial proof-of-concept release with Phase 1 generator:

  • Basic NameGenerator class with deterministic seeding

  • “simple” pattern with hardcoded syllables

  • Zero runtime dependencies

  • Comprehensive CI/CD infrastructure (GitHub Actions, pre-commit hooks)

  • Sphinx documentation with ReadTheDocs integration

  • GPL-3.0-or-later license
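
Deterministic seeding, as mentioned for the initial generator, means the same seed always yields the same sequence of names. A minimal sketch of the idea, assuming a hypothetical class `SeededNameSketch` and a made-up syllable list; this is not the actual NameGenerator API or its hardcoded syllables:

```python
import random
from typing import List

# Hypothetical hardcoded syllables, for illustration only.
SYLLABLES = ["ka", "lo", "mi", "ra", "te", "vu"]


class SeededNameSketch:
    """Illustrative stand-in for a seeded name generator."""

    def __init__(self, seed: int) -> None:
        # A private Random instance keeps the global random state untouched.
        self._rng = random.Random(seed)

    def generate(self, syllable_count: int = 2) -> str:
        """Join randomly chosen syllables and capitalise the result."""
        parts = [self._rng.choice(SYLLABLES) for _ in range(syllable_count)]
        return "".join(parts).capitalize()


def names(seed: int, count: int) -> List[str]:
    """Generate `count` names from a fresh generator with the given seed."""
    gen = SeededNameSketch(seed)
    return [gen.generate() for _ in range(count)]


run_a = names(42, 3)
run_b = names(42, 3)  # Same seed, so the same three names in the same order.
```

Using a per-instance `random.Random` rather than the module-level functions keeps two generators with different seeds from interfering with each other.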