build_tools.pyphen_syllable_extractor.language_detection
========================================================

.. py:module:: build_tools.pyphen_syllable_extractor.language_detection

.. autoapi-nested-parse::

   Language auto-detection for syllable extraction.

   This module provides automatic language detection using the langdetect
   library. It maps ISO 639-1/639-3 language codes to pyphen-compatible locale
   codes for seamless integration with the syllable extractor.

   Language detection is optional and only used when explicitly requested.
   It requires the langdetect package, which must be installed separately.

   Typical Usage:

   >>> from build_tools.pyphen_syllable_extractor import detect_language_code
   >>> text = "Bonjour le monde, comment allez-vous aujourd'hui?"
   >>> code = detect_language_code(text)
   >>> print(code)
   fr

   >>> # With custom default
   >>> code = detect_language_code("???", default='en_US')
   >>> print(code)
   en_US

   >>> # Check if available
   >>> from build_tools.pyphen_syllable_extractor.language_detection import is_detection_available
   >>> if is_detection_available():
   ...     code = detect_language_code(text)

   .. note::

      Language detection requires at least 20-50 characters for reliable results.
      Very short text may produce inaccurate detections.


Attributes
----------

.. autoapisummary::

   build_tools.pyphen_syllable_extractor.language_detection.LANGDETECT_AVAILABLE
   build_tools.pyphen_syllable_extractor.language_detection.ISO_TO_PYPHEN_MAP
   build_tools.pyphen_syllable_extractor.language_detection.ALTERNATIVE_LOCALES


Functions
---------

.. autoapisummary::

   build_tools.pyphen_syllable_extractor.language_detection.is_detection_available
   build_tools.pyphen_syllable_extractor.language_detection.detect_language_code
   build_tools.pyphen_syllable_extractor.language_detection.get_alternative_locales
   build_tools.pyphen_syllable_extractor.language_detection.get_default_locale
   build_tools.pyphen_syllable_extractor.language_detection.list_supported_languages


Module Contents
---------------

.. py:data:: LANGDETECT_AVAILABLE
   :value: True


.. py:data:: ISO_TO_PYPHEN_MAP
   :type: Dict[str, str]


.. py:data:: ALTERNATIVE_LOCALES
   :type: Dict[str, list[str]]


.. py:function:: is_detection_available()

   Check if language detection is available.

   :returns: True if langdetect is installed and functional, False otherwise.

   .. admonition:: Example

      >>> if is_detection_available():
      ...     print("Language detection is available")
      ... else:
      ...     print("Install langdetect: pip install langdetect")


.. py:function:: detect_language_code(text, default = 'en_US', min_confidence_length = 20, suppress_warnings = False)

   Auto-detect language from text and return pyphen-compatible language code.

   This function analyzes the input text using langdetect and maps the detected
   ISO 639-1 language code to a pyphen-compatible locale code (e.g., "en" -> "en_US").

   The function requires at least ``min_confidence_length`` characters for reliable
   detection. Shorter text returns the default language with a warning.

   :param text: Input text to analyze. Should be at least 20-50 characters for
                reliable detection. Mixed-language text may produce unpredictable results.
   :param default: Default language code to return if detection fails or
                   langdetect is not installed (default: "en_US").
   :param min_confidence_length: Minimum text length (in characters) required for
                                 a detection attempt (default: 20). Text shorter
                                 than this returns the default language.
   :param suppress_warnings: If True, suppress warning messages when detection
                             fails or langdetect is unavailable (default: False).

   :returns: A pyphen-compatible language code (e.g., "en_US", "de_DE", "fr").
             Returns ``default`` if detection fails, the text is too short, or
             langdetect is not available.

   :raises ImportError: If langdetect is not installed (only when suppress_warnings=False)

   .. admonition:: Example

      >>> # Detect English text
      >>> text = "Hello world, this is a test of language detection"
      >>> detect_language_code(text)
      'en_US'

      >>> # Detect French text
      >>> text = "Bonjour le monde, comment allez-vous aujourd'hui?"
      >>> detect_language_code(text)
      'fr'

      >>> # Short text falls back to default
      >>> detect_language_code("Hello")
      'en_US'

      >>> # Custom default for unknown language
      >>> detect_language_code("???", default='de_DE')
      'de_DE'

      >>> # Suppress warnings for production use
      >>> code = detect_language_code("abc", default='en_US', suppress_warnings=True)

   .. note::

      - Detection accuracy decreases significantly for text shorter than 50 characters
      - Mixed-language text detection is unreliable
      - Some languages may map to different locales than expected (e.g., "pt" -> "pt_PT")
      - Use get_alternative_locales() to see all available variants for a language
      - Requires langdetect: pip install langdetect


.. py:function:: get_alternative_locales(iso_code)

   Get alternative pyphen locale codes for a given ISO language code.

   Some languages have multiple regional variants (e.g., English has en_US and en_GB).
   This function returns all available pyphen locales for a language.

   :param iso_code: ISO 639-1 language code (e.g., "en", "de", "pt")

   :returns: List of pyphen locale codes for the language, or None if the
             language is unknown or has only one locale (no alternatives).

   .. admonition:: Example

      >>> get_alternative_locales("en")
      ['en_US', 'en_GB']

      >>> get_alternative_locales("de")
      ['de_DE', 'de_AT', 'de_CH']

      >>> get_alternative_locales("pt")
      ['pt_PT', 'pt_BR']

      >>> get_alternative_locales("fr")  # Only one variant
      None

      >>> get_alternative_locales("xx")  # Unknown language
      None


.. py:function:: get_default_locale(iso_code)

   Get the default pyphen locale for an ISO language code.

   This is the locale that detect_language_code() returns when the
   specified language is detected.

   :param iso_code: ISO 639-1 language code (e.g., "en", "de", "pt")

   :returns: Default pyphen locale code (e.g., "en_US"), or None if the language
             is not supported.

   .. admonition:: Example

      >>> get_default_locale("en")
      'en_US'

      >>> get_default_locale("pt")
      'pt_PT'

      >>> get_default_locale("de")
      'de_DE'

      >>> get_default_locale("xx")  # Unknown language
      None


.. py:function:: list_supported_languages()

   Get a dictionary of all ISO codes and their default pyphen locales.

   :returns: Dictionary mapping ISO 639-1 codes to pyphen locale codes.

   .. admonition:: Example

      >>> langs = list_supported_languages()
      >>> print(f"English: {langs['en']}")
      English: en_US
      >>> print(f"German: {langs['de']}")
      German: de_DE
      >>> print(f"Total languages: {len(langs)}")
      Total languages: 40+