build_tools.nltk_syllable_extractor.file_io
File I/O operations for NLTK-based syllable extraction.
This module handles all file reading, writing, and output generation for the NLTK syllable extractor.
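Attributes
| DEFAULT_OUTPUT_DIR | |
Functions
| generate_output_filename(output_dir=None, language_code=None, run_timestamp=None, input_filename=None) | Generate output filenames in a run-based subdirectory structure. |
| save_metadata(result, output_path) | Save extraction metadata to a text file. |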
Module Contents
- build_tools.nltk_syllable_extractor.file_io.DEFAULT_OUTPUT_DIR
- build_tools.nltk_syllable_extractor.file_io.generate_output_filename(output_dir=None, language_code=None, run_timestamp=None, input_filename=None)
Generate output filenames in a run-based subdirectory structure.
Creates a run directory named with a timestamp and the ‘nltk’ identifier, then organizes outputs into syllables/ and meta/ subdirectories:
- output_dir/YYYYMMDD_HHMMSS_nltk/syllables/filename.txt
- output_dir/YYYYMMDD_HHMMSS_nltk/meta/filename.txt
This structure groups each extraction run’s outputs together, making it easier to manage, archive, or delete complete runs as atomic units.
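As an illustration of the layout described above, here is a minimal sketch that only builds the two paths and does not touch the filesystem. The names `sketch_output_paths` and `SKETCH_DEFAULT_OUTPUT_DIR` are hypothetical, and so is the `ValueError` raised when neither naming input is given, since this page does not document that case; the real function may behave differently.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical stand-in for the documented default (_working/output/).
SKETCH_DEFAULT_OUTPUT_DIR = Path("_working/output")


def sketch_output_paths(
    output_dir: Optional[Path] = None,
    language_code: Optional[str] = None,
    run_timestamp: Optional[str] = None,
    input_filename: Optional[str] = None,
) -> Tuple[Path, Path]:
    """Hypothetical sketch of the documented layout; builds paths only, creates nothing."""
    base = output_dir if output_dir is not None else SKETCH_DEFAULT_OUTPUT_DIR
    # A caller-supplied timestamp wins; otherwise stamp the run now.
    stamp = run_timestamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = base / f"{stamp}_nltk"

    # input_filename takes precedence over language_code for naming.
    if input_filename:
        name = input_filename
    elif language_code:
        name = f"{language_code}.txt"
    else:
        # Assumption: this page does not say what the real function does here.
        raise ValueError("need input_filename or language_code")

    # Both files share one filename, split into syllables/ and meta/.
    return run_dir / "syllables" / name, run_dir / "meta" / name
```

Computing both paths from a single `run_dir` value is what keeps the syllables and metadata files of one call inside the same run directory.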
- Parameters:
output_dir (Optional[pathlib.Path]) – Base output directory. Defaults to _working/output/.
language_code (Optional[str]) – Optional language code (e.g., ‘en_US’), used to name the output files when input_filename is not provided.
run_timestamp (Optional[str]) – Optional timestamp string in YYYYMMDD_HHMMSS format. If provided, it is used for the run directory name; otherwise a new timestamp is generated with datetime.now(). This is critical for batch processing: pass the same timestamp for every file in a batch to group them into one run directory.
input_filename (Optional[str]) – Optional input filename to use for output naming. If provided, output files use this name (e.g., ‘alice.txt’), taking precedence over language_code.
- Returns:
Tuple of (syllables_path, metadata_path)
- Return type:
Tuple[pathlib.Path, pathlib.Path]
Example
>>> # Interactive mode: single file with language code
>>> syllables_path, meta_path = generate_output_filename(language_code='en_US')
>>> print(syllables_path)
_working/output/20260110_153022_nltk/syllables/en_US.txt
>>> # Batch mode: multiple files sharing one run directory
>>> from datetime import datetime
>>> timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
>>> s1, m1 = generate_output_filename(
...     run_timestamp=timestamp,
...     input_filename='alice.txt'
... )
>>> s2, m2 = generate_output_filename(
...     run_timestamp=timestamp,
...     input_filename='middlemarch.txt'
... )
>>> print(s1)
_working/output/20260110_153022_nltk/syllables/alice.txt
>>> print(s2)
_working/output/20260110_153022_nltk/syllables/middlemarch.txt
>>> # Both files share the same run directory
Note
For batch processing, always pass the same run_timestamp so that all outputs land in a single run directory. That directory represents one logical batch operation, regardless of how many input files are processed.
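The reason for stamping once per batch follows from the documented default: each call without run_timestamp takes a fresh datetime.now() reading, so a batch that straddles a second boundary would be split across two run directories. The helper below, `plan_batch`, is hypothetical and only illustrates the pattern; in practice the shared timestamp would be passed to generate_output_filename for each file.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable, Optional


def plan_batch(
    input_files: Iterable[str],
    output_dir: Path = Path("_working/output"),
    run_timestamp: Optional[str] = None,
) -> Dict[str, Path]:
    """Hypothetical helper: map each input file to a syllables path in ONE run directory."""
    # Stamp once, outside the loop. Stamping per file could split a batch
    # across two directories if the clock crosses a second boundary mid-batch.
    stamp = run_timestamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = output_dir / f"{stamp}_nltk"
    return {name: run_dir / "syllables" / name for name in input_files}
```

Because `stamp` is computed before the comprehension, every entry shares the same parent directory no matter how long the batch takes.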
- build_tools.nltk_syllable_extractor.file_io.save_metadata(result, output_path)
Save extraction metadata to a text file.
- Parameters:
result (build_tools.nltk_syllable_extractor.models.ExtractionResult) – ExtractionResult containing metadata to save
output_path (pathlib.Path) – Path to the output metadata file
- Raises:
IOError – If the metadata file cannot be written.
Example
>>> from pathlib import Path
>>> result = ExtractionResult(...)
>>> save_metadata(result, Path("output.meta.txt"))
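The page does not document which fields ExtractionResult carries or how the metadata file is laid out, so the following is only a sketch under stated assumptions: `SketchExtractionResult`, its three fields, the `key: value` line format, and the parent-directory creation are all hypothetical. It shows one way to honor the documented contract of writing a text file and surfacing write failures as IOError, which in Python 3 is an alias of OSError.

```python
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class SketchExtractionResult:
    """Hypothetical stand-in: the real ExtractionResult fields are not documented here."""
    language_code: str
    input_file: str
    syllable_count: int


def format_metadata(result: SketchExtractionResult) -> str:
    """Render one 'key: value' line per field, in declaration order (format is assumed)."""
    return "".join(f"{key}: {value}\n" for key, value in asdict(result).items())


def sketch_save_metadata(result: SketchExtractionResult, output_path: Path) -> None:
    """Write the rendered metadata; failures surface as IOError (an alias of OSError)."""
    try:
        # Assumption: create meta/ if missing; the real function may expect it to exist.
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(format_metadata(result), encoding="utf-8")
    except OSError as exc:
        # Add the target path to the message, matching the documented "Raises: IOError".
        raise IOError(f"could not write metadata to {output_path}") from exc
```

Keeping the rendering in a separate pure function makes the text format easy to check without touching the filesystem.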