build_tools.nltk_syllable_extractor.models

Data models for NLTK-based syllable extraction results.

This module defines the data structures used to represent extraction results and their associated metadata for the NLTK syllable extractor.

Classes

`ExtractionResult`	Container for syllable extraction results and associated metadata.
`FileProcessingResult`	Result of processing a single file in batch mode.
`BatchResult`	Aggregate results from a batch processing operation.

Module Contents

class build_tools.nltk_syllable_extractor.models.ExtractionResult

Container for syllable extraction results and associated metadata.

This dataclass stores both the extracted syllables and all relevant metadata about the extraction process for reporting and persistence.

syllables: List of all syllables extracted (includes duplicates)

language_code: Language code used (always “en_US” for NLTK extractor)

min_syllable_length: Minimum syllable length constraint

max_syllable_length: Maximum syllable length constraint

input_path: Path to the input text file

timestamp: When the extraction was performed

only_hyphenated: Whether whole words were excluded

length_distribution: Map of syllable length to count

sample_syllables: Representative sample of extracted syllables

total_words: Total words found in source text

fallback_count: Words not in CMUDict (used fallback heuristics)

rejected_syllables: Syllables rejected due to length constraints

processed_words: Words that were successfully processed

syllables: list[str]

language_code: str

min_syllable_length: int

max_syllable_length: int

input_path: pathlib.Path

timestamp: datetime.datetime

only_hyphenated: bool = True

length_distribution: dict[int, int]

sample_syllables: list[str] = []

total_words: int = 0

fallback_count: int = 0

rejected_syllables: int = 0

processed_words: int = 0
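The count fields above (`rejected_syllables`, `length_distribution`) can be derived from a raw syllable list in a few lines. The sketch below uses a hypothetical helper, `filter_by_length`, and is not the extractor's own code; it assumes the length bounds are inclusive and that `length_distribution` counts only the syllables that passed the filter.

```python
from collections import Counter


def filter_by_length(
    syllables: list[str], min_len: int, max_len: int
) -> tuple[list[str], int, dict[int, int]]:
    """Hypothetical helper: keep syllables whose length is in [min_len, max_len].

    Returns the kept syllables (duplicates preserved), the number rejected,
    and a length -> count map shaped like ``length_distribution``.
    """
    kept = [s for s in syllables if min_len <= len(s) <= max_len]
    rejected = len(syllables) - len(kept)
    # Count kept syllables per length, sorted by length for stable display
    distribution = dict(sorted(Counter(len(s) for s in kept).items()))
    return kept, rejected, distribution


kept, rejected, dist = filter_by_length(
    ["a", "hel", "lo", "ex", "tract", "ion"], min_len=2, max_len=4
)
print(kept, rejected, dist)  # ['hel', 'lo', 'ex', 'ion'] 2 {2: 2, 3: 2}
```

Duplicates are kept on purpose, matching the note that `syllables` includes duplicates.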

format_metadata()

Format extraction metadata as a human-readable string.

Returns: Multi-line string containing all extraction metadata, formatted for display or file output.
Return type: str
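A minimal sketch of a `format_metadata`-style method follows. `ExtractionResultSketch` is a hypothetical stand-in carrying a subset of the documented fields, and the "Label: value" layout is an assumption; the real method's exact output is not documented here. The sketch also shows how a dataclass field gets a mutable default, since dataclasses reject a bare `{}` or `[]` default and require `field(default_factory=...)`.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path


@dataclass
class ExtractionResultSketch:
    """Hypothetical stand-in mirroring a subset of ExtractionResult's fields."""

    syllables: list[str]
    language_code: str
    min_syllable_length: int
    max_syllable_length: int
    input_path: Path
    timestamp: datetime
    only_hyphenated: bool = True
    # Mutable defaults must go through default_factory in a dataclass
    length_distribution: dict[int, int] = field(default_factory=dict)
    total_words: int = 0
    rejected_syllables: int = 0

    def format_metadata(self) -> str:
        """Render metadata as 'Label: value' lines (layout is an assumption)."""
        lines = [
            f"Input File: {self.input_path}",
            f"Language: {self.language_code}",
            f"Timestamp: {self.timestamp.isoformat(timespec='seconds')}",
            f"Syllable Length: {self.min_syllable_length}-{self.max_syllable_length}",
            f"Only Hyphenated: {self.only_hyphenated}",
            f"Total Words: {self.total_words}",
            f"Total Syllables: {len(self.syllables)}",
            f"Unique Syllables: {len(set(self.syllables))}",
            f"Rejected Syllables: {self.rejected_syllables}",
            "Length Distribution:",
        ]
        # One indented line per syllable length, shortest first
        for length, count in sorted(self.length_distribution.items()):
            lines.append(f"  {length} chars: {count}")
        return "\n".join(lines)


result = ExtractionResultSketch(
    syllables=["hel", "lo", "lo"],
    language_code="en_US",
    min_syllable_length=2,
    max_syllable_length=4,
    input_path=Path("book.txt"),
    timestamp=datetime(2024, 1, 1, 12, 0, 0),
    length_distribution={2: 2, 3: 1},
    total_words=2,
)
text = result.format_metadata()
print(text)
```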

class build_tools.nltk_syllable_extractor.models.FileProcessingResult

Result of processing a single file in batch mode.

This dataclass stores the outcome of processing one file during batch operations, including the success status, the number of extracted syllables, and error information if processing failed.

input_path: Path to the input file that was processed

success: Whether processing completed successfully

syllables_count: Number of unique syllables extracted (0 if failed)

language_code: Language code used (always “en_US”)

syllables_output_path: Path where syllables were saved (None if failed)

metadata_output_path: Path where metadata was saved (None if failed)

error_message: Error message if processing failed (None if success)

processing_time: Time taken to process this file in seconds

Example

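>>> from pathlib import Path
>>> from build_tools.nltk_syllable_extractor.models import FileProcessingResult
>>> result = FileProcessingResult(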
...     input_path=Path("book.txt"),
...     success=True,
...     syllables_count=245,
...     language_code="en_US",
...     syllables_output_path=Path("output.syllables.en_US.txt"),
...     metadata_output_path=Path("output.meta.en_US.txt"),
...     processing_time=2.45
... )
>>> print(f"Processed {result.syllables_count} syllables")
Processed 245 syllables

input_path: pathlib.Path

success: bool

syllables_count: int

language_code: str

syllables_output_path: pathlib.Path | None = None

metadata_output_path: pathlib.Path | None = None

error_message: str | None = None

processing_time: float = 0.0
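The per-file contract described above (a zero count and `None` paths on failure, with an `error_message` instead of a raised exception) can be sketched as a wrapper that times one file and converts any exception into a failed result. `process_one`, the `extract` callable, and the output file naming (modeled loosely on the example paths above) are hypothetical; `FileProcessingResultSketch` stands in for the real class.

```python
from __future__ import annotations

import tempfile
import time
from collections.abc import Callable
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileProcessingResultSketch:
    """Hypothetical stand-in with the same fields as FileProcessingResult."""

    input_path: Path
    success: bool
    syllables_count: int
    language_code: str
    syllables_output_path: Path | None = None
    metadata_output_path: Path | None = None
    error_message: str | None = None
    processing_time: float = 0.0


def process_one(
    input_path: Path, extract: Callable[[str], list[str]], output_dir: Path
) -> FileProcessingResultSketch:
    """Hypothetical per-file step: records failures instead of raising."""
    start = time.perf_counter()
    try:
        text = input_path.read_text(encoding="utf-8")
        unique = sorted(set(extract(text)))  # syllables_count is unique syllables
        syllables_path = output_dir / f"{input_path.stem}.syllables.en_US.txt"
        meta_path = output_dir / f"{input_path.stem}.meta.en_US.txt"
        syllables_path.write_text("\n".join(unique) + "\n", encoding="utf-8")
        meta_path.write_text(f"Unique syllables: {len(unique)}\n", encoding="utf-8")
        return FileProcessingResultSketch(
            input_path=input_path,
            success=True,
            syllables_count=len(unique),
            language_code="en_US",
            syllables_output_path=syllables_path,
            metadata_output_path=meta_path,
            processing_time=time.perf_counter() - start,
        )
    except Exception as exc:  # one bad file must not abort the whole batch
        return FileProcessingResultSketch(
            input_path=input_path,
            success=False,
            syllables_count=0,
            language_code="en_US",
            error_message=f"{type(exc).__name__}: {exc}",
            processing_time=time.perf_counter() - start,
        )


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    src = out / "book.txt"
    src.write_text("hel lo hel", encoding="utf-8")
    ok = process_one(src, str.split, out)  # str.split is a stand-in extractor
    missing = process_one(out / "missing.txt", str.split, out)
    print(ok.success, ok.syllables_count)  # True 2
    print(missing.success, missing.syllables_output_path)  # False None
```

Catching a broad `Exception` here is a deliberate batch-mode choice: the failure becomes data in the result rather than stopping the remaining files.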

class build_tools.nltk_syllable_extractor.models.BatchResult

Aggregate results from a batch processing operation.

This dataclass stores summary statistics and individual file results from processing multiple files in batch mode.

total_files: Total number of files attempted in the batch

successful: Number of files processed successfully

failed: Number of files that failed to process

results: List of individual FileProcessingResult objects

total_time: Total time taken for entire batch operation in seconds

output_directory: Directory where all outputs were saved

Example

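>>> from pathlib import Path
>>> from build_tools.nltk_syllable_extractor.models import BatchResult
>>> result = BatchResult(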
...     total_files=5,
...     successful=4,
...     failed=1,
...     results=[...],
...     total_time=12.34,
...     output_directory=Path("_working/output")
... )
>>> print(f"Success rate: {result.successful/result.total_files*100:.1f}%")
Success rate: 80.0%

total_files: int

successful: int

failed: int

results: list[FileProcessingResult]

total_time: float

output_directory: pathlib.Path
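A batch result can be assembled from per-file results by counting successes, so that `total_files == successful + failed` always holds. `from_results` and `success_rate` below are hypothetical conveniences, not documented members of `BatchResult`, and `FileOutcome` is a minimal stand-in for `FileProcessingResult`. The rate guards against an empty batch, which the plain division in the example above would not.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileOutcome:
    """Hypothetical minimal stand-in for FileProcessingResult."""

    input_path: Path
    success: bool
    syllables_count: int = 0


@dataclass
class BatchResultSketch:
    """Hypothetical stand-in with BatchResult's documented fields."""

    total_files: int
    successful: int
    failed: int
    results: list[FileOutcome]
    total_time: float
    output_directory: Path

    @classmethod
    def from_results(
        cls, results: list[FileOutcome], total_time: float, output_directory: Path
    ) -> BatchResultSketch:
        # Derive the counts from the per-file results so they cannot disagree
        successful = sum(1 for r in results if r.success)
        return cls(
            total_files=len(results),
            successful=successful,
            failed=len(results) - successful,
            results=list(results),
            total_time=total_time,
            output_directory=output_directory,
        )

    @property
    def success_rate(self) -> float:
        # Percentage of successful files; 0.0 for an empty batch
        return 100.0 * self.successful / self.total_files if self.total_files else 0.0


outcomes = [FileOutcome(Path(f"book{i}.txt"), success=(i != 3)) for i in range(5)]
batch = BatchResultSketch.from_results(outcomes, 12.34, Path("_working/output"))
print(batch.successful, batch.failed, f"{batch.success_rate:.1f}%")  # 4 1 80.0%
```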

format_summary()

Format batch processing summary as a human-readable string.

Creates a detailed summary report showing overall statistics, successful extractions with details, and failed files with error messages.

Returns: Multi-line formatted string with batch statistics and results
Return type: str

Example

>>> summary = batch_result.format_summary()
>>> print(summary)
======================================================================
BATCH PROCESSING SUMMARY
======================================================================
Total Files:        5
Successful:         4 (80.0%)
...
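The first lines of the sample output can be reproduced with labels left-justified to 20 characters, a width inferred from the sample's alignment. Everything past the two statistics lines shown in the example (the remaining labels, the section titles, and the rule width of 70) is an assumption in this hypothetical `format_summary_sketch`, which works on a minimal stand-in, `Outcome`, rather than the real classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class Outcome:
    """Hypothetical minimal stand-in for FileProcessingResult."""

    input_path: Path
    success: bool
    syllables_count: int = 0
    error_message: str | None = None


def format_summary_sketch(
    results: list[Outcome], total_time: float, output_directory: Path, width: int = 70
) -> str:
    """Hypothetical summary report; only the header mirrors the sample above."""
    total = len(results)
    ok = [r for r in results if r.success]
    bad = [r for r in results if not r.success]

    def pct(n: int) -> float:
        # Share of the batch as a percentage; 0.0 when the batch is empty
        return 100.0 * n / total if total else 0.0

    rule = "=" * width
    lines = [
        rule,
        "BATCH PROCESSING SUMMARY",
        rule,
        f"{'Total Files:':<20}{total}",
        f"{'Successful:':<20}{len(ok)} ({pct(len(ok)):.1f}%)",
        f"{'Failed:':<20}{len(bad)} ({pct(len(bad)):.1f}%)",
        f"{'Total Time:':<20}{total_time:.2f}s",
        f"{'Output Directory:':<20}{output_directory}",
    ]
    if ok:
        lines += ["", "SUCCESSFUL EXTRACTIONS:"]
        lines += [f"  {r.input_path.name}: {r.syllables_count} syllables" for r in ok]
    if bad:
        lines += ["", "FAILED FILES:"]
        lines += [f"  {r.input_path.name}: {r.error_message}" for r in bad]
    lines.append(rule)
    return "\n".join(lines)


results = [Outcome(Path(f"book{i}.txt"), True, 200) for i in range(4)]
results.append(Outcome(Path("bad.txt"), False, error_message="FileNotFoundError"))
summary = format_summary_sketch(results, 12.34, Path("_working/output"))
print(summary)
```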