build_tools.nltk_syllable_extractor.file_io

File I/O operations for NLTK-based syllable extraction.

This module handles all file reading, writing, and output generation for the NLTK syllable extractor.

Attributes

DEFAULT_OUTPUT_DIR

Functions

generate_output_filename([output_dir, language_code, ...])

Generate output filenames in run-based subdirectory structure.

save_metadata(result, output_path)

Save extraction metadata to a text file.

Module Contents

build_tools.nltk_syllable_extractor.file_io.DEFAULT_OUTPUT_DIR
build_tools.nltk_syllable_extractor.file_io.generate_output_filename(output_dir=None, language_code=None, run_timestamp=None, input_filename=None)

Generate output filenames in run-based subdirectory structure.

Creates a run directory named with a timestamp and the ‘nltk’ identifier, then organizes outputs into syllables/ and meta/ subdirectories:

  • output_dir/YYYYMMDD_HHMMSS_nltk/syllables/filename.txt

  • output_dir/YYYYMMDD_HHMMSS_nltk/meta/filename.txt

This structure groups each extraction run’s outputs together, making it easier to manage, archive, or delete complete runs as atomic units.
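The layout described above can be sketched with pathlib. This is a minimal illustration, not the module's implementation: sketch_run_paths is a hypothetical helper, and it only builds the two paths without creating any directories.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple


def sketch_run_paths(
    base_dir: Path, filename: str, run_timestamp: Optional[str] = None
) -> Tuple[Path, Path]:
    """Hypothetical helper that mirrors the documented layout.

    Illustration only: it builds the two paths and does not touch the filesystem.
    """
    # Reuse the caller's timestamp when given, otherwise generate one now.
    stamp = run_timestamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = base_dir / f"{stamp}_nltk"
    return run_dir / "syllables" / filename, run_dir / "meta" / filename


syllables_path, meta_path = sketch_run_paths(
    Path("_working/output"), "en_US.txt", run_timestamp="20260110_153022"
)
print(syllables_path.as_posix())  # _working/output/20260110_153022_nltk/syllables/en_US.txt
print(meta_path.as_posix())       # _working/output/20260110_153022_nltk/meta/en_US.txt
```

Because both paths hang off one run directory, removing or archiving that directory takes the syllable file and its metadata together.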

Parameters:
  • output_dir (pathlib.Path | None) – Base output directory. Defaults to _working/output/

  • language_code (str | None) – Optional language code (e.g., ‘en_US’). Used for the filename if input_filename is not provided.

  • run_timestamp (str | None) – Optional timestamp string in YYYYMMDD_HHMMSS format, used for the run directory name. If omitted, a new timestamp is generated with datetime.now(). For batch processing, pass the same timestamp to every call so that all files from the batch land in one run directory.

  • input_filename (str | None) – Optional input filename to use for output naming. If provided, output files will use this name (e.g., ‘alice.txt’). Takes precedence over language_code for naming.

Returns:

Tuple of (syllables_path, metadata_path)

Return type:

tuple[pathlib.Path, pathlib.Path]

Example

>>> # Interactive mode - single file with language code
>>> syllables_path, meta_path = generate_output_filename(language_code='en_US')
>>> print(syllables_path)
_working/output/20260110_153022_nltk/syllables/en_US.txt
>>> # Batch mode - multiple files sharing one run directory
>>> from datetime import datetime
>>> timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
>>> s1, m1 = generate_output_filename(
...     run_timestamp=timestamp,
...     input_filename='alice.txt'
... )
>>> s2, m2 = generate_output_filename(
...     run_timestamp=timestamp,
...     input_filename='middlemarch.txt'
... )
>>> print(s1)
_working/output/20260110_153022_nltk/syllables/alice.txt
>>> print(s2)
_working/output/20260110_153022_nltk/syllables/middlemarch.txt
>>> # Both files share the same run directory

Note

For batch processing, always pass the same run_timestamp to group all outputs into a single run directory. This represents one logical batch operation, regardless of how many input files are processed.
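One way to follow this advice is to compute the timestamp once and hand it to every call. The sketch below is an assumption-laden illustration: plan_batch_outputs and _stand_in_make_paths are hypothetical names, and the stand-in only imitates the documented layout so the snippet runs without the package installed. In real use, generate_output_filename would be passed as make_paths, since it accepts the same run_timestamp and input_filename keywords.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple

PathPair = Tuple[Path, Path]


def plan_batch_outputs(
    input_filenames: Iterable[str], make_paths: Callable[..., PathPair]
) -> Dict[str, PathPair]:
    """Hypothetical helper: one timestamp per batch, reused for every file."""
    # Computed once, outside the loop, so all files share one run directory.
    run_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return {
        name: make_paths(run_timestamp=run_timestamp, input_filename=name)
        for name in input_filenames
    }


def _stand_in_make_paths(run_timestamp: str, input_filename: str) -> PathPair:
    """Stand-in that imitates the documented layout (not the real function)."""
    run_dir = Path("_working/output") / f"{run_timestamp}_nltk"
    return run_dir / "syllables" / input_filename, run_dir / "meta" / input_filename


planned = plan_batch_outputs(["alice.txt", "middlemarch.txt"], _stand_in_make_paths)
run_dirs = {syllables.parent.parent for syllables, _meta in planned.values()}
print(len(run_dirs))  # 1
```

Generating the timestamp inside the loop instead could split a batch across several run directories whenever the clock ticks over between files.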

build_tools.nltk_syllable_extractor.file_io.save_metadata(result, output_path)

Save extraction metadata to a text file.

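Parameters:
  • result – The extraction result whose metadata is written (an ExtractionResult in the example below).

  • output_path – Path of the metadata file to write (a pathlib.Path in the example below).

Raises: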

IOError – If there’s an error writing the file

Example

>>> from pathlib import Path
>>> result = ExtractionResult(...)
>>> save_metadata(result, Path("output.meta.txt"))
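Since the function is documented to raise IOError on write failures, a caller may want to guard the call. The following is a sketch under stated assumptions: write_metadata_text is a hypothetical stand-in for save_metadata, the metadata line is made-up placeholder content, and creating parent directories is an assumption rather than documented behaviour. In Python 3, IOError is an alias of OSError, so catching OSError covers it.

```python
import tempfile
from pathlib import Path


def write_metadata_text(text: str, output_path: Path) -> None:
    """Hypothetical stand-in for save_metadata; writes plain text to a file."""
    # Assumption: parent directories are created here; the real function may not.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(text, encoding="utf-8")


def try_write_metadata(text: str, output_path: Path) -> bool:
    """Return True on success, False if the file could not be written."""
    try:
        write_metadata_text(text, output_path)
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        print(f"Could not write {output_path}: {exc}")
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "20260110_153022_nltk" / "meta" / "alice.txt"
    ok = try_write_metadata("language: en_US\n", target)  # placeholder content
    written = target.read_text(encoding="utf-8")
```

Returning a boolean keeps a batch loop running when one file fails; re-raising would be the better choice if a partial run directory should never be left behind.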