Follow-up to #85 / #184 (patronymic_name_order). That feature handles East-Slavic patronymics (-ovich/-ovna, Latin + Cyrillic) but deliberately left out Turkic patronymic markers common in Azerbaijani and Central-Asian names, e.g.:
oglu/ogly ("son of"), qizi/kizi/kyzy/gyzy ("daughter of"), uly/uulu ("son of"), plus Cyrillic forms (оглу, кызы, улы…).
Why this is a separate, larger piece (not just adding a regex): Slavic patronymics are a suffix on a single token (Ivanovich), so the existing rule keys off one trailing word. Turkic markers are standalone words, so the patronymic spans two tokens ("Said oglu"). That breaks the current "single-token last/middle" detection rule and requires multi-token patronymic handling — exactly the redistribution complexity kept out of the initial design.
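A minimal sketch of the multi-token step, assuming a plain list of whitespace-split tokens as input. `TURKIC_MARKERS` and `merge_turkic_patronymics` are hypothetical names for illustration, not existing code in parser.py, and the marker list only covers the examples given in this issue:

```python
import re

# Hypothetical marker list, copied from the examples in this issue; not exhaustive.
TURKIC_MARKERS = re.compile(
    r"^(oglu|ogly|qizi|kizi|kyzy|gyzy|uly|uulu|оглу|кызы|улы)$",
    re.IGNORECASE,
)


def merge_turkic_patronymics(tokens):
    """Join each standalone marker with the token before it.

    The result has one entry per name part, so a two-token patronymic
    such as "Heydar oglu" can be treated as a single unit downstream.
    """
    merged = []
    for token in tokens:
        if merged and TURKIC_MARKERS.match(token):
            # The marker belongs to the preceding given name.
            merged[-1] = f"{merged[-1]} {token}"
        else:
            merged.append(token)
    return merged


print(merge_turkic_patronymics("Aliyev Ilham Heydar oglu".split()))
```

Merging before the existing single-token check runs is only one option; it keeps the Slavic rule untouched, but how the merged unit is then redistributed between last/middle is the open design question.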
Extension point: the is_patronymic() helper in parser.py is where Turkic detection would join; the flag is already named agnostically (patronymic_name_order) so it can cover this without renaming.
PR #154 contains reference regexes (turkic_patronymic_suffixes + Cyrillic) and example handling that can inform the approach.