build_tools.pyphen_syllable_extractor.language_detection

Language auto-detection for syllable extraction.

This module detects the language of a text with the langdetect library and maps the detected ISO 639-1/639-3 language code to a pyphen-compatible locale code, so the result can be passed directly to the syllable extractor.

The language detection is optional and only used when explicitly requested. It requires the langdetect package to be installed separately.

Typical Usage:
>>> from build_tools.pyphen_syllable_extractor import detect_language_code
>>> text = "Bonjour le monde, comment allez-vous aujourd'hui?"
>>> code = detect_language_code(text)
>>> print(code)
fr
>>> # With custom default
>>> code = detect_language_code("???", default='en_US')
>>> print(code)
en_US
>>> # Check if available
>>> from build_tools.pyphen_syllable_extractor.language_detection import is_detection_available
>>> if is_detection_available():
...     code = detect_language_code(text)

Note

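Language detection needs roughly 20-50 characters of text to be reliable. By default, detect_language_code() does not attempt detection on text shorter than 20 characters (min_confidence_length) and returns the default instead. Text just above that threshold may still be detected inaccurately.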

Attributes

LANGDETECT_AVAILABLE

ISO_TO_PYPHEN_MAP

ALTERNATIVE_LOCALES

Functions

is_detection_available()

Check if language detection is available.

detect_language_code(text[, default, ...])

Auto-detect language from text and return pyphen-compatible language code.

get_alternative_locales(iso_code)

Get alternative pyphen locale codes for a given ISO language code.

get_default_locale(iso_code)

Get the default pyphen locale for an ISO language code.

list_supported_languages()

Get a dictionary of all ISO codes and their default pyphen locales.

Module Contents

build_tools.pyphen_syllable_extractor.language_detection.LANGDETECT_AVAILABLE = True
build_tools.pyphen_syllable_extractor.language_detection.ISO_TO_PYPHEN_MAP: Dict[str, str]
build_tools.pyphen_syllable_extractor.language_detection.ALTERNATIVE_LOCALES: Dict[str, list[str]]
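
The three attributes above are listed without descriptions. As a minimal sketch of how such an optional-dependency flag and lookup tables are commonly set up (an assumption about the pattern, not the module's actual source; the table entries are only the values that appear in the examples on this page):

```python
# Sketch only: the real module source is not shown on this page.
try:
    import langdetect  # noqa: F401  (optional third-party dependency)
    LANGDETECT_AVAILABLE = True
except ImportError:
    LANGDETECT_AVAILABLE = False

# Illustrative subset; the documented mapping is larger.
ISO_TO_PYPHEN_MAP = {"en": "en_US", "de": "de_DE", "pt": "pt_PT", "fr": "fr"}

# Only languages with more than one regional variant appear here.
ALTERNATIVE_LOCALES = {
    "en": ["en_US", "en_GB"],
    "de": ["de_DE", "de_AT", "de_CH"],
    "pt": ["pt_PT", "pt_BR"],
}


def is_detection_available() -> bool:
    """Report whether the optional langdetect import succeeded."""
    return LANGDETECT_AVAILABLE
```

With this layout, callers can branch on the flag once instead of wrapping every detection call in their own try/except.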
build_tools.pyphen_syllable_extractor.language_detection.is_detection_available()

Check if language detection is available.

Returns:

True if langdetect is installed and functional, False otherwise.

Return type:

bool

Example

>>> if is_detection_available():
...     print("Language detection is available")
... else:
...     print("Install langdetect: pip install langdetect")
build_tools.pyphen_syllable_extractor.language_detection.detect_language_code(text, default='en_US', min_confidence_length=20, suppress_warnings=False)

Auto-detect language from text and return pyphen-compatible language code.

This function analyzes the input text using langdetect and maps the detected ISO 639-1 language code to a pyphen-compatible locale code (e.g., “en” -> “en_US”).

The function requires at least min_confidence_length characters before it attempts detection. For shorter text, it skips detection, returns the default language, and emits a warning.

Parameters:
  • text (str) – Input text to analyze. Should be at least 20-50 characters for reliable detection. Mixed-language text may produce unpredictable results.

  • default (str) – Default language code to return if detection fails or langdetect is not installed (default: “en_US”).

  • min_confidence_length (int) – Minimum text length (in characters) required for detection attempt (default: 20). Text shorter than this returns the default language.

  • suppress_warnings (bool) – If True, suppress warning messages when detection fails or langdetect is unavailable (default: False).

Returns:

A pyphen-compatible language code (e.g., “en_US”, “de_DE”, “fr”). Returns default if detection fails, text is too short, or langdetect is not available.

Raises:

ImportError – If langdetect is not installed (only when suppress_warnings=False)

Return type:

str

Example

>>> # Detect English text
>>> text = "Hello world, this is a test of language detection"
>>> detect_language_code(text)
'en_US'
>>> # Detect French text
>>> text = "Bonjour le monde, comment allez-vous aujourd'hui?"
>>> detect_language_code(text)
'fr'
>>> # Short text falls back to default
>>> detect_language_code("Hello")
'en_US'
>>> # Custom default for unknown language
>>> detect_language_code("???", default='de_DE')
'de_DE'
>>> # Suppress warnings for production use
>>> code = detect_language_code("abc", default='en_US', suppress_warnings=True)

Note

  • Detection accuracy drops significantly for text shorter than 50 characters

  • Mixed-language text detection is unreliable

  • Some languages may map to different locales than expected (e.g., “pt” -> “pt_PT”)

  • Use get_alternative_locales() to see all available variants for a language

  • Requires langdetect: pip install langdetect
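
The flow described above (length check, detection, ISO-to-pyphen mapping, fallback to the default) can be sketched as follows. This is a hypothetical re-creation under stated assumptions, not the function's real source: detect_code_sketch, its detector parameter, and the fallback for unmapped codes are illustrative, and the mapping is a small subset taken from the examples on this page. A stub detector is injected so the sketch runs without langdetect; with the package installed, langdetect.detect could be passed instead.

```python
import warnings
from typing import Callable

# Illustrative subset of the ISO -> pyphen mapping (values from this page).
ISO_TO_PYPHEN = {"en": "en_US", "de": "de_DE", "pt": "pt_PT", "fr": "fr"}


def detect_code_sketch(
    text: str,
    detector: Callable[[str], str],
    default: str = "en_US",
    min_confidence_length: int = 20,
    suppress_warnings: bool = False,
) -> str:
    """Hypothetical re-creation of the documented flow, not the real source."""
    # Step 1: skip detection for text below the length threshold.
    if len(text) < min_confidence_length:
        if not suppress_warnings:
            warnings.warn(f"Text too short for detection; using {default!r}")
        return default
    # Step 2: run the detector; any failure falls back to the default.
    try:
        iso_code = detector(text)
    except Exception:
        if not suppress_warnings:
            warnings.warn(f"Detection failed; using {default!r}")
        return default
    # Step 3: map the ISO code to a pyphen locale (assumed: unmapped -> default).
    return ISO_TO_PYPHEN.get(iso_code, default)


# A stub detector keeps the example runnable without langdetect installed.
print(detect_code_sketch("Bonjour le monde, comment allez-vous ?", lambda _: "fr"))  # fr
print(detect_code_sketch("Hello", lambda _: "en", suppress_warnings=True))  # en_US
```

Injecting the detector also makes the fallback paths easy to unit-test, since a detector that raises can stand in for undetectable input.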

build_tools.pyphen_syllable_extractor.language_detection.get_alternative_locales(iso_code)

Get alternative pyphen locale codes for a given ISO language code.

Some languages have multiple regional variants (e.g., English has en_US and en_GB). This function returns all available pyphen locales for a language.

Parameters:

iso_code (str) – ISO 639-1 language code (e.g., “en”, “de”, “pt”)

Returns:

List of pyphen locale codes for the language. Returns None if the language is unknown or has only one locale (no alternatives).

Return type:

Optional[list[str]]

Example

>>> get_alternative_locales("en")
['en_US', 'en_GB']
>>> get_alternative_locales("de")
['de_DE', 'de_AT', 'de_CH']
>>> get_alternative_locales("pt")
['pt_PT', 'pt_BR']
>>> get_alternative_locales("fr") is None  # Only one variant
True
>>> get_alternative_locales("xx") is None  # Unknown language
True
build_tools.pyphen_syllable_extractor.language_detection.get_default_locale(iso_code)

Get the default pyphen locale for an ISO language code.

This is the locale that will be used by detect_language_code() when the specified language is detected.

Parameters:

iso_code (str) – ISO 639-1 language code (e.g., “en”, “de”, “pt”)

Returns:

Default pyphen locale code (e.g., “en_US”), or None if language is not supported.

Return type:

Optional[str]

Example

>>> get_default_locale("en")
'en_US'
>>> get_default_locale("pt")
'pt_PT'
>>> get_default_locale("de")
'de_DE'
>>> get_default_locale("xx") is None  # Unknown language
True
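
The default and alternative lookups can be combined to pick a regional variant when one exists. The sketch below is illustrative only: resolve_locale is a hypothetical helper that is not part of this module, and the two tables are small stand-ins built from the values shown in the examples on this page.

```python
from typing import Dict, List, Optional

# Illustrative subsets; values are copied from the examples on this page.
DEFAULTS: Dict[str, str] = {"en": "en_US", "de": "de_DE", "pt": "pt_PT", "fr": "fr"}
ALTERNATIVES: Dict[str, List[str]] = {
    "en": ["en_US", "en_GB"],
    "de": ["de_DE", "de_AT", "de_CH"],
    "pt": ["pt_PT", "pt_BR"],
}


def resolve_locale(iso_code: str, region: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: prefer a regional variant, else the default locale."""
    if region is not None:
        wanted = f"{iso_code}_{region.upper()}"
        if wanted in ALTERNATIVES.get(iso_code, []):
            return wanted
    # .get() yields None for unsupported codes, like the documented lookups.
    return DEFAULTS.get(iso_code)


print(resolve_locale("pt", "br"))  # pt_BR
print(resolve_locale("fr", "CA"))  # fr
print(resolve_locale("xx"))        # None
```

Falling back to the default rather than raising keeps the helper safe to call with a region that has no matching locale.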
build_tools.pyphen_syllable_extractor.language_detection.list_supported_languages()

Get a dictionary of all ISO codes and their default pyphen locales.

Returns:

Dictionary mapping ISO 639-1 codes to pyphen locale codes.

Return type:

Dict[str, str]

Example

>>> langs = list_supported_languages()
>>> print(f"English: {langs['en']}")
English: en_US
>>> print(f"German: {langs['de']}")
German: de_DE
>>> len(langs) >= 40  # The mapping covers 40+ languages
True