build_tools.pyphen_syllable_normaliser.frequency
================================================

.. py:module:: build_tools.pyphen_syllable_normaliser.frequency

.. autoapi-nested-parse::

   Frequency analysis for canonical syllables.

   This module handles Step 3 of the normalization pipeline: analyzing the
   frequency distribution of canonical syllables and generating frequency
   intelligence data structures. It captures how often each canonical syllable
   occurs before identity is collapsed (that is, before deduplication), which
   is essential for understanding natural language patterns in the source
   corpus.


Classes
-------

.. autoapisummary::

   build_tools.pyphen_syllable_normaliser.frequency.FrequencyAnalyzer


Functions
---------

.. autoapisummary::

   build_tools.pyphen_syllable_normaliser.frequency.load_frequencies_from_file
   build_tools.pyphen_syllable_normaliser.frequency.load_unique_syllables_from_file


Module Contents
---------------

.. py:class:: FrequencyAnalyzer

   Analyzes frequency distribution of canonical syllables.

   This class handles the intelligence capture phase of the normalization
   pipeline. It counts occurrences of each canonical syllable, creates
   frequency rankings, and generates output files for downstream analysis
   and feature annotation.

   .. admonition:: Example

      >>> from pathlib import Path
      >>> analyzer = FrequencyAnalyzer()
      >>> syllables = ['ka', 'ra', 'mi', 'ka', 'ta', 'ka']
      >>> frequencies = analyzer.calculate_frequencies(syllables)
      >>> frequencies
      {'ka': 3, 'ra': 1, 'mi': 1, 'ta': 1}
      >>> analyzer.save_frequencies(frequencies, Path("syllables_frequencies.json"))
      >>> unique = analyzer.extract_unique_syllables(syllables)
      >>> unique
      ['ka', 'mi', 'ra', 'ta']


   .. py:method:: calculate_frequencies(syllables)

      Calculate frequency counts for canonical syllables.

      Counts how many times each unique syllable appears in the input list.
      This captures the natural frequency distribution from the source corpus
      before deduplication.

      :param syllables: List of canonical syllables (may contain duplicates).

      :returns: Dictionary mapping each unique syllable to its occurrence count.

      .. admonition:: Example

         >>> analyzer = FrequencyAnalyzer()
         >>> syllables = ['ka', 'ra', 'mi', 'ka', 'ta', 'ka', 'ra']
         >>> frequencies = analyzer.calculate_frequencies(syllables)
         >>> frequencies
         {'ka': 3, 'ra': 2, 'mi': 1, 'ta': 1}
         >>> sum(frequencies.values())  # Total syllable count
         7

      .. note::

         The returned dictionary is not sorted by frequency. Use
         ``create_frequency_entries()`` to generate sorted frequency rankings.
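
      The counting step can be pictured with the standard library. The following
      is a minimal sketch, not the implementation of ``calculate_frequencies()``:
      it assumes the method behaves like ``collections.Counter``, and the helper
      name ``count_syllables`` is hypothetical.

      .. code-block:: python

         from collections import Counter


         def count_syllables(syllables):
             """Hypothetical stand-in: count occurrences of each syllable."""
             # Counter tallies duplicates; dict() keeps first-seen key order.
             return dict(Counter(syllables))


         counts = count_syllables(['ka', 'ra', 'mi', 'ka', 'ta', 'ka', 'ra'])
         total = sum(counts.values())  # equals the length of the input list

      In this sketch the keys come back in first-seen order rather than by
      frequency, so any ranking has to happen in a separate step.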


   .. py:method:: create_frequency_entries(frequencies)

      Create ranked frequency entries from frequency counts.

      Converts a frequency dictionary into a list of FrequencyEntry objects
      with ranking information and percentage calculations. Entries are
      sorted by frequency (descending) then alphabetically (ascending).

      :param frequencies: Dictionary mapping syllable to occurrence count.

      :returns: List of FrequencyEntry objects sorted by frequency (highest first),
                with alphabetical secondary sort for ties.

      .. admonition:: Example

         >>> analyzer = FrequencyAnalyzer()
         >>> frequencies = {'ka': 187, 'ra': 162, 'mi': 145, 'ta': 98}
         >>> entries = analyzer.create_frequency_entries(frequencies)
         >>> entries[0]
         FrequencyEntry(canonical='ka', frequency=187, rank=1, percentage=31.6)
         >>> entries[0].canonical
         'ka'
         >>> entries[0].rank
         1

      .. note::

         Percentage is calculated as ``(frequency / total_count) * 100``,
         where ``total_count`` is the sum of all frequency counts.
         Ranks start at 1 (the most frequent syllable has ``rank=1``).
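
      The ordering and percentage rules can be sketched as follows. This is an
      illustration under stated assumptions, not the actual
      ``FrequencyEntry`` definition: the ``RankedEntry`` dataclass and
      ``rank_frequencies`` helper are hypothetical, tied syllables are assumed
      to receive consecutive ranks, percentages are left unrounded, and the
      zero-total guard is an added precaution.

      .. code-block:: python

         from dataclasses import dataclass


         @dataclass(frozen=True)
         class RankedEntry:
             """Hypothetical record mirroring the fields shown in the example."""
             canonical: str
             frequency: int
             rank: int
             percentage: float


         def rank_frequencies(frequencies):
             total = sum(frequencies.values())
             # Sort by count descending, then by syllable ascending for ties.
             ordered = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
             return [
                 RankedEntry(
                     canonical=syllable,
                     frequency=count,
                     rank=rank,
                     # Guard against an empty input to avoid dividing by zero.
                     percentage=(count / total) * 100 if total else 0.0,
                 )
                 for rank, (syllable, count) in enumerate(ordered, start=1)
             ]


         entries = rank_frequencies({'ta': 2, 'ka': 3, 'mi': 2})
         order = [entry.canonical for entry in entries]  # 'mi' precedes 'ta' on the tie

      Negating the count in the sort key keeps the whole ordering in a single
      ``sorted()`` call while leaving the alphabetical tie-break ascending.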


   .. py:method:: extract_unique_syllables(syllables)

      Extract unique syllables and return in sorted order.

      Removes duplicates from the syllable list and returns a sorted list
      of unique canonical syllables. This creates the authoritative syllable
      inventory for downstream feature annotation.

      :param syllables: List of canonical syllables (may contain duplicates).

      :returns: Sorted list of unique syllable strings (alphabetical order).

      .. admonition:: Example

         >>> analyzer = FrequencyAnalyzer()
         >>> syllables = ['ka', 'ra', 'mi', 'ka', 'ta', 'ka', 'ra']
         >>> unique = analyzer.extract_unique_syllables(syllables)
         >>> unique
         ['ka', 'mi', 'ra', 'ta']
         >>> len(unique)
         4

      .. note::

         Sorting is alphabetical (a-z) for deterministic output.
         Empty syllable lists return an empty list.
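
      A minimal sketch of deduplicate-then-sort, assuming the method is
      equivalent to combining ``set`` and ``sorted``. The helper name
      ``unique_sorted`` is hypothetical.

      .. code-block:: python

         def unique_sorted(syllables):
             """Hypothetical stand-in: drop duplicates, then sort."""
             # set() removes duplicates; sorted() returns a new ordered list.
             return sorted(set(syllables))


         inventory = unique_sorted(['ka', 'ra', 'mi', 'ka', 'ta', 'ka', 'ra'])
         empty = unique_sorted([])

      Python's built-in ``sorted()`` compares strings by Unicode code point, so
      in this sketch uppercase letters would sort before lowercase ones.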


   .. py:method:: save_frequencies(frequencies, output_path)

      Save frequency dictionary to JSON file.

      Writes the frequency intelligence to a JSON file for downstream
      analysis. The output is formatted with indentation for readability
      and sorted by key for deterministic output.

      :param frequencies: Dictionary mapping syllable to occurrence count.
      :param output_path: Path where the JSON file should be saved.

      :raises PermissionError: If the output file cannot be written.
      :raises OSError: If there are filesystem issues (disk full, etc.).

      .. admonition:: Example

         >>> from pathlib import Path
         >>> analyzer = FrequencyAnalyzer()
         >>> frequencies = {'ka': 187, 'ra': 162, 'mi': 145}
         >>> analyzer.save_frequencies(frequencies, Path("syllables_frequencies.json"))
         # File contains:
         # {
         #   "ka": 187,
         #   "mi": 145,
         #   "ra": 162
         # }

      .. note::

         The JSON is formatted with 2-space indentation and keys are
         sorted alphabetically for consistent diffs in version control.
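
      The documented output format corresponds to two ``json.dump`` options.
      The sketch below is an assumption about how such a writer could look,
      not the body of ``save_frequencies()``; the helper name
      ``write_frequencies`` and the UTF-8 encoding are hypothetical choices.

      .. code-block:: python

         import json
         import tempfile
         from pathlib import Path


         def write_frequencies(frequencies, output_path):
             """Hypothetical writer: 2-space indentation, keys sorted."""
             with output_path.open("w", encoding="utf-8") as handle:
                 json.dump(frequencies, handle, indent=2, sort_keys=True)


         # Write to a temporary directory so the sketch leaves no files behind.
         with tempfile.TemporaryDirectory() as tmp:
             target = Path(tmp) / "syllables_frequencies.json"
             write_frequencies({'ka': 187, 'ra': 162, 'mi': 145}, target)
             written = target.read_text(encoding="utf-8")

      Sorting keys at write time means the file content does not depend on the
      insertion order of the dictionary.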


   .. py:method:: save_unique_syllables(unique_syllables, output_path)

      Save unique syllables to text file.

      Writes the deduplicated canonical syllable inventory to a text file,
      one syllable per line. This creates the authoritative syllable list
      for feature annotation and downstream processing.

      :param unique_syllables: Sorted list of unique canonical syllables.
      :param output_path: Path where the text file should be saved.

      :raises PermissionError: If the output file cannot be written.
      :raises OSError: If there are filesystem issues (disk full, etc.).

      .. admonition:: Example

         >>> from pathlib import Path
         >>> analyzer = FrequencyAnalyzer()
         >>> unique = ['ka', 'mi', 'ra', 'ta']
         >>> analyzer.save_unique_syllables(unique, Path("syllables_unique.txt"))
         # File contains:
         # ka
         # mi
         # ra
         # ta

      .. note::

         Syllables should be sorted alphabetically before calling this
         method. ``extract_unique_syllables()`` returns sorted output;
         otherwise, sort the list manually.
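
      A one-syllable-per-line writer might look like the sketch below. It is
      illustrative only: the helper name ``write_unique_syllables``, the UTF-8
      encoding, and the trailing newline after the last syllable are
      assumptions, not confirmed details of ``save_unique_syllables()``.

      .. code-block:: python

         import tempfile
         from pathlib import Path


         def write_unique_syllables(unique_syllables, output_path):
             """Hypothetical writer: one syllable per line, input order kept."""
             # The list is written as given, so callers must sort it first.
             text = "".join(f"{syllable}\n" for syllable in unique_syllables)
             output_path.write_text(text, encoding="utf-8")


         # Write to a temporary directory so the sketch leaves no files behind.
         with tempfile.TemporaryDirectory() as tmp:
             target = Path(tmp) / "syllables_unique.txt"
             write_unique_syllables(['ka', 'mi', 'ra', 'ta'], target)
             written = target.read_text(encoding="utf-8")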


.. py:function:: load_frequencies_from_file(file_path)

   Load frequency dictionary from JSON file.

   Reads a previously saved ``syllables_frequencies.json`` file and returns
   the frequency dictionary. Useful for analysis and inspection of
   normalization results.

   :param file_path: Path to the JSON frequency file.

   :returns: Dictionary mapping syllable to occurrence count.

   :raises FileNotFoundError: If the file does not exist.
   :raises json.JSONDecodeError: If the file is not valid JSON.
   :raises PermissionError: If the file cannot be read.

   .. admonition:: Example

      >>> from pathlib import Path
      >>> frequencies = load_frequencies_from_file(Path("syllables_frequencies.json"))
      >>> frequencies['ka']
      187
      >>> len(frequencies)
      412

   .. note::

      The JSON file must have been created by ``save_frequencies()`` or
      follow the same format: ``{"syllable": count, ...}``
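
   For files produced by other tools, the expected shape can be checked
   before use. The validator below is a hypothetical sketch, not part of this
   module: ``parse_frequencies`` and its error messages are assumptions, and
   whether ``load_frequencies_from_file()`` performs any such check is not
   stated here.

   .. code-block:: python

      import json


      def parse_frequencies(text):
          """Hypothetical check that JSON text maps syllables to counts."""
          data = json.loads(text)
          if not isinstance(data, dict):
              raise ValueError("expected a JSON object of syllable counts")
          for syllable, count in data.items():
              # bool is a subclass of int, so exclude it explicitly.
              if isinstance(count, bool) or not isinstance(count, int) or count < 0:
                  raise ValueError(f"invalid count for {syllable!r}: {count!r}")
          return data


      frequencies = parse_frequencies('{"ka": 187, "mi": 145, "ra": 162}')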


.. py:function:: load_unique_syllables_from_file(file_path)

   Load unique syllables from text file.

   Reads a previously saved ``syllables_unique.txt`` file and returns the
   syllable list. Useful for loading the authoritative syllable inventory
   for feature annotation or analysis.

   :param file_path: Path to the text file containing unique syllables.

   :returns: List of syllable strings (one per line from file).

   :raises FileNotFoundError: If the file does not exist.
   :raises PermissionError: If the file cannot be read.
   :raises UnicodeDecodeError: If the file contains invalid UTF-8.

   .. admonition:: Example

      >>> from pathlib import Path
      >>> syllables = load_unique_syllables_from_file(Path("syllables_unique.txt"))
      >>> syllables[:5]
      ['ka', 'mi', 'ra', 'ta', 'wa']
      >>> len(syllables)
      412

   .. note::

      Empty lines are skipped. Leading/trailing whitespace is stripped
      from each line.
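
   The line handling described in the note can be sketched on plain text.
   This is an assumption about equivalent behaviour, not the function's
   source, and the helper name ``parse_syllable_lines`` is hypothetical.

   .. code-block:: python

      def parse_syllable_lines(text):
          """Hypothetical parser: strip each line and skip empty ones."""
          stripped = (line.strip() for line in text.splitlines())
          # Lines that are empty after stripping are dropped.
          return [line for line in stripped if line]


      syllables = parse_syllable_lines("ka\n  mi  \n\nra\nta\n")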