build_tools.pyphen_syllable_extractor.language_detection
Language auto-detection for syllable extraction.
This module provides automatic language detection functionality using the langdetect library. It maps ISO 639-1/639-3 language codes to pyphen-compatible locale codes for seamless integration with the syllable extractor.
The language detection is optional and only used when explicitly requested. It requires the langdetect package to be installed separately.
- Typical Usage:
>>> from build_tools.pyphen_syllable_extractor import detect_language_code
>>> text = "Bonjour le monde, comment allez-vous aujourd'hui?"
>>> code = detect_language_code(text)
>>> print(code)
fr

>>> # With custom default
>>> code = detect_language_code("???", default='en_US')
>>> print(code)
en_US

>>> # Check if available
>>> from build_tools.pyphen_syllable_extractor.language_detection import is_detection_available
>>> if is_detection_available():
...     code = detect_language_code(text)
Note
Language detection requires at least 20-50 characters for reliable results. Very short text may produce inaccurate detections.
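Attributes
LANGDETECT_AVAILABLE
Functions
is_detection_available() – Check if language detection is available.
detect_language_code(text[, default, min_confidence_length, suppress_warnings]) – Auto-detect language from text and return pyphen-compatible language code.
get_alternative_locales(iso_code) – Get alternative pyphen locale codes for a given ISO language code.
get_default_locale(iso_code) – Get the default pyphen locale for an ISO language code.
list_supported_languages() – Get a dictionary of all ISO codes and their default pyphen locales.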
Module Contents
- build_tools.pyphen_syllable_extractor.language_detection.LANGDETECT_AVAILABLE = True
- build_tools.pyphen_syllable_extractor.language_detection.is_detection_available()[source]
Check if language detection is available.
- Returns:
True if langdetect is installed and functional, False otherwise.
- Return type:
bool
Example
>>> if is_detection_available():
...     print("Language detection is available")
... else:
...     print("Install langdetect: pip install langdetect")
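The availability check can be illustrated with the usual optional-import pattern. This is a minimal sketch, assuming the flag is set by a guarded import at module load time; how the real module sets LANGDETECT_AVAILABLE is not shown in this reference.

```python
# Sketch of an optional-dependency flag (assumed pattern, not the module's
# confirmed implementation).
try:
    import langdetect  # noqa: F401  (imported only to probe availability)

    LANGDETECT_AVAILABLE = True
except ImportError:
    LANGDETECT_AVAILABLE = False


def is_detection_available() -> bool:
    """Return True if the optional langdetect package could be imported."""
    return LANGDETECT_AVAILABLE
```

With this pattern the import cost and the failure handling occur once, and callers only consult a boolean.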
- build_tools.pyphen_syllable_extractor.language_detection.detect_language_code(text, default='en_US', min_confidence_length=20, suppress_warnings=False)[source]
Auto-detect language from text and return pyphen-compatible language code.
This function analyzes the input text using langdetect and maps the detected ISO 639-1 language code to a pyphen-compatible locale code (e.g., “en” -> “en_US”).
The function requires at least min_confidence_length characters for reliable detection. Shorter text will return the default language with a warning.
- Parameters:
text (str) – Input text to analyze. Should be at least 20-50 characters for reliable detection. Mixed-language text may produce unpredictable results.
default (str) – Default language code to return if detection fails or langdetect is not installed (default: “en_US”).
min_confidence_length (int) – Minimum text length (in characters) required for detection attempt (default: 20). Text shorter than this returns the default language.
suppress_warnings (bool) – If True, suppress warning messages when detection fails or langdetect is unavailable (default: False).
- Returns:
A pyphen-compatible language code (e.g., “en_US”, “de_DE”, “fr”). Returns default if detection fails, text is too short, or langdetect is not available.
- Raises:
ImportError – If langdetect is not installed (only when suppress_warnings=False)
- Return type:
str
Example
>>> # Detect English text
>>> text = "Hello world, this is a test of language detection"
>>> detect_language_code(text)
'en_US'

>>> # Detect French text
>>> text = "Bonjour le monde, comment allez-vous aujourd'hui?"
>>> detect_language_code(text)
'fr'

>>> # Short text falls back to default
>>> detect_language_code("Hello")
'en_US'

>>> # Custom default for unknown language
>>> detect_language_code("???", default='de_DE')
'de_DE'

>>> # Suppress warnings for production use
>>> code = detect_language_code("abc", default='en_US', suppress_warnings=True)
Note
Detection accuracy decreases significantly for text shorter than 50 characters.
Mixed-language text detection is unreliable.
Some languages may map to a different locale than expected (e.g., “pt” -> “pt_PT”).
Use get_alternative_locales() to see all available variants for a language.
Requires langdetect: pip install langdetect
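The length guard and fallback rules described for detect_language_code() can be sketched without langdetect by injecting the detector as a callable. The names detect_with_fallback, detector, and ISO_TO_PYPHEN are hypothetical and exist only for this illustration; the mapping holds just the pairs mentioned in this reference, and the real function may behave differently in detail.

```python
import warnings
from typing import Callable, Dict

# Illustrative subset of an ISO 639-1 -> pyphen locale mapping (assumption:
# only pairs that appear in this documentation are listed).
ISO_TO_PYPHEN: Dict[str, str] = {
    "en": "en_US",
    "de": "de_DE",
    "fr": "fr",
    "pt": "pt_PT",
}


def detect_with_fallback(
    text: str,
    detector: Callable[[str], str],
    default: str = "en_US",
    min_confidence_length: int = 20,
    suppress_warnings: bool = False,
) -> str:
    """Hypothetical sketch: return a pyphen locale or fall back to `default`."""

    def _warn(message: str) -> None:
        if not suppress_warnings:
            warnings.warn(message, stacklevel=3)

    # Text below the minimum length is never passed to the detector.
    if len(text) < min_confidence_length:
        _warn(f"Text shorter than {min_confidence_length} characters; using {default!r}")
        return default

    # Any detector failure falls back to the default.
    try:
        iso_code = detector(text)
    except Exception as exc:
        _warn(f"Detection failed ({exc}); using {default!r}")
        return default

    # Unmapped ISO codes also fall back to the default.
    locale = ISO_TO_PYPHEN.get(iso_code)
    if locale is None:
        _warn(f"No pyphen locale for {iso_code!r}; using {default!r}")
        return default
    return locale
```

Passing the detector in as an argument keeps the fallback logic testable on machines where langdetect is not installed.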
- build_tools.pyphen_syllable_extractor.language_detection.get_alternative_locales(iso_code)[source]
Get alternative pyphen locale codes for a given ISO language code.
Some languages have multiple regional variants (e.g., English has en_US and en_GB). This function returns all available pyphen locales for a language.
- Parameters:
iso_code (str) – ISO 639-1 language code (e.g., “en”, “de”, “pt”)
- Returns:
List of pyphen locale codes for the language. Returns None if the language is unknown or has only one locale (no alternatives).
- Return type:
list[str] | None
Example
>>> get_alternative_locales("en")
['en_US', 'en_GB']

>>> get_alternative_locales("de")
['de_DE', 'de_AT', 'de_CH']

>>> get_alternative_locales("pt")
['pt_PT', 'pt_BR']

>>> print(get_alternative_locales("fr"))  # Only one variant
None

>>> print(get_alternative_locales("xx"))  # Unknown language
None
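One way to picture the data behind these results is a table that maps each ISO code to its locales, with the default listed first. This is a sketch under that assumption; LOCALES, default_locale, and alternative_locales are hypothetical names, and only the entries shown in the examples above are included.

```python
from typing import Dict, List, Optional

# Assumption: default locale first, regional variants after it.
LOCALES: Dict[str, List[str]] = {
    "en": ["en_US", "en_GB"],
    "de": ["de_DE", "de_AT", "de_CH"],
    "pt": ["pt_PT", "pt_BR"],
    "fr": ["fr"],
}


def default_locale(iso_code: str) -> Optional[str]:
    """Return the first (default) locale, or None for an unknown code."""
    locales = LOCALES.get(iso_code)
    return locales[0] if locales else None


def alternative_locales(iso_code: str) -> Optional[List[str]]:
    """Return all locales when there is more than one, otherwise None."""
    locales = LOCALES.get(iso_code)
    if locales is None or len(locales) < 2:
        return None
    return list(locales)  # copy, so callers cannot mutate the table
```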
- build_tools.pyphen_syllable_extractor.language_detection.get_default_locale(iso_code)[source]
Get the default pyphen locale for an ISO language code.
This is the locale that will be used by detect_language_code() when the specified language is detected.
- Parameters:
iso_code (str) – ISO 639-1 language code (e.g., “en”, “de”, “pt”)
- Returns:
Default pyphen locale code (e.g., “en_US”), or None if language is not supported.
- Return type:
str | None
Example
>>> get_default_locale("en")
'en_US'

>>> get_default_locale("pt")
'pt_PT'

>>> get_default_locale("de")
'de_DE'

>>> print(get_default_locale("xx"))  # Unknown language
None
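Because the default for a language may not be the variant a project needs (for example “pt” resolves to “pt_PT”), a caller might layer its own preferences on top of the default lookup. The sketch below uses a small stand-in dictionary in place of get_default_locale(); resolve_locale and its overrides parameter are hypothetical and not part of this module.

```python
from typing import Dict, Optional

# Stand-in for get_default_locale(); values are taken from the examples above.
_DEFAULTS: Dict[str, str] = {"en": "en_US", "pt": "pt_PT", "de": "de_DE"}


def resolve_locale(
    iso_code: str, overrides: Optional[Dict[str, str]] = None
) -> Optional[str]:
    """Hypothetical helper: prefer a caller-supplied locale over the default."""
    if overrides and iso_code in overrides:
        return overrides[iso_code]
    return _DEFAULTS.get(iso_code)


# A project that targets Brazilian Portuguese overrides only that entry.
project_overrides = {"pt": "pt_BR"}
```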
- build_tools.pyphen_syllable_extractor.language_detection.list_supported_languages()[source]
Get a dictionary of all ISO codes and their default pyphen locales.
Example
>>> langs = list_supported_languages()
>>> print(f"English: {langs['en']}")
English: en_US
>>> print(f"German: {langs['de']}")
German: de_DE
>>> len(langs) >= 40  # The mapping covers 40+ languages
True
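The returned dictionary also supports reverse lookups, for instance to find which ISO codes resolve to a given pyphen locale. The following sketch works on a small sample dictionary with the same shape; iso_codes_for_locale is a hypothetical helper and the sample contains only pairs mentioned in this reference.

```python
from typing import Dict, List


def iso_codes_for_locale(supported: Dict[str, str], locale: str) -> List[str]:
    """Return the sorted ISO codes whose default pyphen locale is `locale`."""
    return sorted(iso for iso, loc in supported.items() if loc == locale)


# Sample with the same shape as the list_supported_languages() result.
sample = {"en": "en_US", "de": "de_DE", "fr": "fr", "pt": "pt_PT"}
```

In real use, the sample dictionary would be replaced by the value returned from list_supported_languages().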