build_tools.nltk_syllable_extractor.models

Data models for NLTK-based syllable extraction results.

This module defines the data structures used to represent extraction results and their associated metadata for the NLTK syllable extractor.

Classes

ExtractionResult

Container for syllable extraction results and associated metadata.

FileProcessingResult

Result of processing a single file in batch mode.

BatchResult

Aggregate results from a batch processing operation.

Module Contents

class build_tools.nltk_syllable_extractor.models.ExtractionResult

Container for syllable extraction results and associated metadata.

This dataclass stores both the extracted syllables and all relevant metadata about the extraction process for reporting and persistence.

syllables

List of all syllables extracted (includes duplicates)

language_code

Language code used (always "en_US" for the NLTK extractor)

min_syllable_length

Minimum syllable length constraint

max_syllable_length

Maximum syllable length constraint

input_path

Path to the input text file

timestamp

When the extraction was performed

only_hyphenated

Whether whole words were excluded

length_distribution

Map of syllable length to count

sample_syllables

Representative sample of extracted syllables

total_words

Total words found in source text

fallback_count

Number of words not found in CMUDict (handled with fallback heuristics)

rejected_syllables

Number of syllables rejected due to length constraints

processed_words

Number of words that were successfully processed

syllables: list[str]
language_code: str
min_syllable_length: int
max_syllable_length: int
input_path: pathlib.Path
timestamp: datetime.datetime
only_hyphenated: bool = True
length_distribution: dict[int, int]
sample_syllables: list[str] = []
total_words: int = 0
fallback_count: int = 0
rejected_syllables: int = 0
processed_words: int = 0

format_metadata()

Format extraction metadata as a human-readable string.

Returns:

Multi-line string containing all extraction metadata formatted for display or file output.

Return type:

str
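
The relationship between `syllables`, `length_distribution`, and `sample_syllables` can be illustrated with a short sketch. This is not the module's implementation: `ExtractionResultSketch` is a hypothetical stand-in that mirrors only some of the documented fields, and the `summarize` helper, its use of `collections.Counter`, and the five-item sorted sample are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path


@dataclass
class ExtractionResultSketch:
    """Hypothetical stand-in mirroring a subset of the documented fields."""

    syllables: list[str]
    language_code: str
    min_syllable_length: int
    max_syllable_length: int
    input_path: Path
    timestamp: datetime
    # Mutable defaults in a dataclass must go through default_factory.
    length_distribution: dict[int, int] = field(default_factory=dict)
    sample_syllables: list[str] = field(default_factory=list)


def summarize(syllables: list[str], sample_size: int = 5) -> tuple[dict[int, int], list[str]]:
    """Return (length -> count, sorted sample of unique syllables)."""
    distribution = dict(Counter(len(s) for s in syllables))
    sample = sorted(set(syllables))[:sample_size]
    return distribution, sample


syllables = ["hel", "lo", "won", "der", "ful", "lo"]  # duplicates are kept
distribution, sample = summarize(syllables)

result = ExtractionResultSketch(
    syllables=syllables,
    language_code="en_US",
    min_syllable_length=2,
    max_syllable_length=8,
    input_path=Path("book.txt"),
    timestamp=datetime.now(),
    length_distribution=distribution,
    sample_syllables=sample,
)
print(result.length_distribution)
```

Because `syllables` keeps duplicates, the counts in `length_distribution` here sum to `len(syllables)`, while the sample is drawn from the unique values only.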

class build_tools.nltk_syllable_extractor.models.FileProcessingResult

Result of processing a single file in batch mode.

This dataclass stores the outcome of processing one file during batch operations, including success status, extracted syllables count, and any error information if processing failed.

input_path

Path to the input file that was processed

success

Whether processing completed successfully

syllables_count

Number of unique syllables extracted (0 if failed)

language_code

Language code used (always "en_US")

syllables_output_path

Path where syllables were saved (None if failed)

metadata_output_path

Path where metadata was saved (None if failed)

error_message

Error message if processing failed (None if success)

processing_time

Time taken to process this file in seconds

Example

>>> from pathlib import Path
>>> from build_tools.nltk_syllable_extractor.models import FileProcessingResult
>>> result = FileProcessingResult(
...     input_path=Path("book.txt"),
...     success=True,
...     syllables_count=245,
...     language_code="en_US",
...     syllables_output_path=Path("output.syllables.en_US.txt"),
...     metadata_output_path=Path("output.meta.en_US.txt"),
...     processing_time=2.45,
... )
>>> print(f"Processed {result.syllables_count} syllables")
Processed 245 syllables

input_path: pathlib.Path
success: bool
syllables_count: int
language_code: str
syllables_output_path: pathlib.Path | None = None
metadata_output_path: pathlib.Path | None = None
error_message: str | None = None
processing_time: float = 0.0
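
The documented failure conventions (a count of 0, output paths left as None, and an error message) can be sketched as below. This is an illustration under stated assumptions, not the module's code: `FileProcessingResultSketch` is a hypothetical stand-in with the documented fields, and `run_one`, `failing_extract`, and the shape of the `extract` callable are invented for the example.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class FileProcessingResultSketch:
    """Hypothetical stand-in with the documented fields and defaults."""

    input_path: Path
    success: bool
    syllables_count: int
    language_code: str
    syllables_output_path: Optional[Path] = None
    metadata_output_path: Optional[Path] = None
    error_message: Optional[str] = None
    processing_time: float = 0.0


def run_one(
    path: Path,
    extract: Callable[[Path], tuple[int, Path, Path]],
) -> FileProcessingResultSketch:
    """Run a (hypothetical) extract callable and wrap the outcome."""
    start = time.perf_counter()
    try:
        count, syllables_path, metadata_path = extract(path)
    except Exception as exc:
        # Failure: count is 0, output paths stay None, error is recorded.
        return FileProcessingResultSketch(
            input_path=path,
            success=False,
            syllables_count=0,
            language_code="en_US",
            error_message=str(exc),
            processing_time=time.perf_counter() - start,
        )
    return FileProcessingResultSketch(
        input_path=path,
        success=True,
        syllables_count=count,
        language_code="en_US",
        syllables_output_path=syllables_path,
        metadata_output_path=metadata_path,
        processing_time=time.perf_counter() - start,
    )


def failing_extract(path: Path) -> tuple[int, Path, Path]:
    """Placeholder extractor that always fails."""
    raise FileNotFoundError(f"missing input: {path}")


failed = run_one(Path("missing.txt"), failing_extract)
print(failed.success, failed.error_message)
```

Recording the error on the result, rather than letting the exception propagate, lets a batch continue with the remaining files.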

class build_tools.nltk_syllable_extractor.models.BatchResult

Aggregate results from a batch processing operation.

This dataclass stores summary statistics and individual file results from processing multiple files in batch mode.

total_files

Total number of files attempted in the batch

successful

Number of files processed successfully

failed

Number of files that failed to process

results

List of individual FileProcessingResult objects

total_time

Total time taken for entire batch operation in seconds

output_directory

Directory where all outputs were saved

Example

>>> from pathlib import Path
>>> from build_tools.nltk_syllable_extractor.models import BatchResult
>>> result = BatchResult(
...     total_files=5,
...     successful=4,
...     failed=1,
...     results=[...],
...     total_time=12.34,
...     output_directory=Path("_working/output"),
... )
>>> print(f"Success rate: {result.successful / result.total_files * 100:.1f}%")
Success rate: 80.0%

total_files: int
successful: int
failed: int
results: list[FileProcessingResult]
total_time: float
output_directory: pathlib.Path

format_summary()

Format batch processing summary as a human-readable string.

Creates a detailed summary report showing overall statistics, successful extractions with details, and failed files with error messages.

Returns:

Multi-line formatted string with batch statistics and results

Return type:

str

Example

>>> summary = batch_result.format_summary()
>>> print(summary)
======================================================================
BATCH PROCESSING SUMMARY
======================================================================
Total Files:        5
Successful:         4 (80.0%)
...
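
How the summary statistics relate to the individual results can be sketched as follows. This is a hypothetical illustration, not the module's implementation: `FileOutcome` and `BatchResultSketch` are trimmed stand-ins, `aggregate` and `success_rate` are invented helpers, and summing per-file times into `total_time` is an assumption (the real value may be wall-clock time for the whole batch).

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileOutcome:
    """Hypothetical, trimmed stand-in for FileProcessingResult."""

    input_path: Path
    success: bool
    processing_time: float = 0.0


@dataclass
class BatchResultSketch:
    """Hypothetical stand-in with the documented BatchResult fields."""

    total_files: int
    successful: int
    failed: int
    results: list[FileOutcome]
    total_time: float
    output_directory: Path

    def success_rate(self) -> float:
        """Percentage of successful files; 0.0 for an empty batch."""
        if self.total_files == 0:
            return 0.0
        return self.successful / self.total_files * 100


def aggregate(results: list[FileOutcome], output_directory: Path) -> BatchResultSketch:
    """Derive the summary counts from the individual file outcomes."""
    successful = sum(1 for r in results if r.success)
    return BatchResultSketch(
        total_files=len(results),
        successful=successful,
        failed=len(results) - successful,
        results=results,
        # Assumption: total_time is the sum of per-file processing times.
        total_time=sum(r.processing_time for r in results),
        output_directory=output_directory,
    )


outcomes = [
    FileOutcome(Path("a.txt"), True, 1.0),
    FileOutcome(Path("b.txt"), True, 2.0),
    FileOutcome(Path("c.txt"), False, 0.5),
    FileOutcome(Path("d.txt"), True, 1.5),
    FileOutcome(Path("e.txt"), True, 0.25),
]
batch = aggregate(outcomes, Path("_working/output"))
print(f"Success rate: {batch.success_rate():.1f}%")
```

Guarding the empty batch in `success_rate` avoids a `ZeroDivisionError` when no files were attempted.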