Can you please split the thread, @rockedge, so we don't pollute it with this?
MochiMoppel wrote: ↑Sun Jul 18, 2021 2:20 am
    misko_2083 wrote: ↑Fri Jul 16, 2021 6:35 pm
    Our language is completely vocalized with 30 sounds and 30 Cyrillic characters.
    However we also have a Latin script with 27 chars and 3 digraphs.
Serbian?
    Sometimes people use English keyboards that don't have some letters (or lack a different keyboard input, or are too lazy to make a switch).
    Instead they type the closest match and the software transliterates to a completely different letter in Cyrillic.
I assume that the "software" is not iconv and that there is no codepage around that can convert Basic Latin (ASCII) to Cyrillic equivalents, e.g. changing an ASCII L to a Cyrillic Л. On the other side a more or less sophisticated search/replace script should be able to do it, starting with the digraphs and leaving only ambiguous characters like Z (could be Ж or З) for manual correction. Don't know if this describes your task, it's just my imagination. A real life example would help.
Serbian language.
lj, nj, and dž are digraphs. Sometimes people write dj instead of đ (as in https://en.wikipedia.org/wiki/Novak_Djokovic), which may cause confusion with words written in the Ijekavian dialect.
The characters missing from the English alphabet are š, đ, č, ć and ž.
People sometimes type:
s for both s and š (Cyrillic с and ш)
z for both z and ž (Cyrillic з and ж)
c for c, ć and č (Cyrillic ц, ћ and ч)
example:
Mozemo li da idemo na rucak?
When transliterated:
Моземо ли да идемо на руцак?
How it should be:
Možemo li da idemo na ručak?
Можемо ли да идемо на ручак?
Perhaps it's easier to fix the Latin text first and then transliterate it.
Fred's (@fredx181) copy-code-paste-from-clipboard script with a yad UI would be ideal for this.
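A minimal sketch of the "fix the Latin text first" idea, assuming a small hand-made correction list. The FIXES table and the fix_line function are made up for illustration, and only words with a single sensible reading belong in such a list. It needs bash 4 or newer (associative arrays, ${var,,} and ${var^}).

```bash
#!/bin/bash
# Hypothetical correction list: ASCII spelling -> intended Latin spelling.
# Ambiguous words (those with more than one valid reading) must stay out of it.
declare -A FIXES=( [mozemo]="možemo" [rucak]="ručak" )

fix_line() {
    local token core tail key fixed
    local -a tokens out=()
    read -ra tokens <<< "$1"          # split the line on whitespace
    for token in "${tokens[@]}"; do
        core=$token
        tail=""
        # Split off trailing punctuation such as "?" or ","
        if [[ $token =~ ^([[:alpha:]]+)([[:punct:]]+)$ ]]; then
            core=${BASH_REMATCH[1]}
            tail=${BASH_REMATCH[2]}
        fi
        key=${core,,}                 # look the word up in lower case
        if [[ -n ${FIXES[$key]+set} ]]; then
            fixed=${FIXES[$key]}
            # Restore an initial capital if the typed word had one
            if [[ $core == [[:upper:]]* ]]; then
                fixed=${fixed^}
            fi
            core=$fixed
        fi
        out+=("$core$tail")
    done
    printf '%s\n' "${out[*]}"
}

fix_line "Mozemo li da idemo na rucak?"
```

With the two entries above, the example sentence comes out as "Možemo li da idemo na ručak?". Words that are not in the list pass through unchanged, so they still need a manual check.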
koze, kože
Cyrillic - Latin - English / explanation
з, ж - z, ž
козе - koze - goats
коже - kože - skins, leathers
kuce, kuće, kuče
Cyrillic - Latin - English / explanation
ц, ч, ћ - c, č, ć
куче - kuče - dog
куце - kuce - doggies / diminutive of the word kuče is 'куца - kuca'; in plural 'куце - kuce'
куће - kuće - houses / plural of the word 'кућа - kuća'
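The pairs above show why a script cannot decide on its own. As a rough sketch, the hypothetical function candidates below lists every Latin spelling an ASCII-typed word could stand for, so that a person can pick the right one. It expands only the s, z and c ambiguities described earlier (dj for đ is not handled) and expects a lower-case ASCII word.

```bash
#!/bin/bash
# Hypothetical helper: print all spellings an ASCII-typed word might stand for.
candidates() {
    local word=$1 ch prefix opt i
    local -a results=("") next opts
    for ((i = 0; i < ${#word}; i++)); do
        ch=${word:i:1}
        case $ch in
            s) opts=(s š) ;;
            z) opts=(z ž) ;;
            c) opts=(c č ć) ;;
            *) opts=("$ch") ;;
        esac
        # Extend every spelling built so far with every option for this letter
        next=()
        for prefix in "${results[@]}"; do
            for opt in "${opts[@]}"; do
                next+=("$prefix$opt")
            done
        done
        results=("${next[@]}")
    done
    printf '%s\n' "${results[@]}"
}

candidates koze    # prints: koze, kože (one per line)
candidates kuce    # prints: kuce, kuče, kuće (one per line)
```

The number of candidates multiplies with every s, z or c in the word, so this is only practical for single words, not for whole sentences.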
Something like this would change the characters:
Code: Select all
#!/bin/bash
declare -A LO_J=(["ј"]="j")
declare -A UP_J=(["Ј"]="J")
declare -A LO_SR_CYR_TO_LAT_DICT=(
["а"]="a"
["б"]="b"
["в"]="v"
["г"]="g"
["д"]="d"
["ђ"]="đ"
["е"]="e"
["ж"]="ž"
["з"]="z"
["и"]="i"
["к"]="k"
["л"]="l"
["љ"]="lj"
["м"]="m"
["н"]="n"
["њ"]="nj"
["о"]="o"
["п"]="p"
["р"]="r"
["с"]="s"
["т"]="t"
["ћ"]="ć"
["у"]="u"
["ф"]="f"
["х"]="h"
["ц"]="c"
["ч"]="č"
["џ"]="dž"
["ш"]="š"
)
declare -A UP_SR_CYR_TO_LAT_DICT=(
["А"]="A"
["Б"]="B"
["В"]="V"
["Г"]="G"
["Д"]="D"
["Ђ"]="Đ"
["Е"]="E"
["Ж"]="Ž"
["З"]="Z"
["И"]="I"
["К"]="K"
["Л"]="L"
["Љ"]="Lj"
["М"]="M"
["Н"]="N"
["Њ"]="Nj"
["О"]="O"
["П"]="P"
["Р"]="R"
["С"]="S"
["Т"]="T"
["Ћ"]="Ć"
["У"]="U"
["Ф"]="F"
["Х"]="H"
["Ц"]="C"
["Ч"]="Č"
["Џ"]="Dž"
["Ш"]="Š"
)
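string="абвгдђежзијклљмнњопрстћуфхцчџш"
echo "String: $string"

# Cyrillic -> Latin: each Cyrillic letter has exactly one Latin equivalent,
# so the unordered iteration over the associative array is safe here.
for letter in "${!LO_SR_CYR_TO_LAT_DICT[@]}"; do
    string="${string//$letter/${LO_SR_CYR_TO_LAT_DICT[$letter]}}"
done
string="${string//${!LO_J[@]}/${LO_J[@]}}"
echo "Romanized: $string"

# Latin -> Cyrillic: the digraphs lj, nj and dž must be replaced first.
# Otherwise l, n or d may be converted on their own and "lj" ends up as "лј" instead of "љ".
for letter in љ њ џ; do
    string="${string//${LO_SR_CYR_TO_LAT_DICT[$letter]}/$letter}"
done
for letter in "${!LO_SR_CYR_TO_LAT_DICT[@]}"; do
    string="${string//${LO_SR_CYR_TO_LAT_DICT[$letter]}/$letter}"
done
# j must be replaced last because of the digraphs lj and nj
string="${string//${LO_J[@]}/${!LO_J[@]}}"
echo "Cyrillic: ${string}"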
Converting character by character is easy, but there are strings I would not want to transliterate, such as web and email addresses.
https://forum.puppylinux.com transliterates to хттпс://форум.пуппyлинуx.цом
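One possible way around this, sketched under some assumptions: split the text on whitespace and leave any token that looks like a URL or an email address untouched. The names PAIRS, to_cyrillic, skip_token and transliterate_line are made up for this example, the skip rule is a crude guess rather than a real URL parser, and info@example.com is a placeholder address. Only lower-case letters are covered.

```bash
#!/bin/bash
# Ordered Latin=Cyrillic pairs; the digraphs come first so that
# "lj" becomes "љ" and not "лј".
PAIRS=(lj=љ nj=њ dž=џ a=а b=б v=в g=г d=д đ=ђ e=е ž=ж z=з i=и j=ј k=к
       l=л m=м n=н o=о p=п r=р s=с t=т ć=ћ u=у f=ф h=х c=ц č=ч š=ш)

to_cyrillic() {
    local s=$1 pair
    for pair in "${PAIRS[@]}"; do
        # ${pair%%=*} is the Latin side, ${pair#*=} the Cyrillic side
        s=${s//${pair%%=*}/${pair#*=}}
    done
    printf '%s' "$s"
}

# Assumed rule: anything with "://", a leading "www." or an "@" followed
# by a dot is treated as an address and skipped.
skip_token() {
    [[ $1 == *://* || $1 == www.* || $1 == *@*.* ]]
}

transliterate_line() {
    local token
    local -a tokens out=()
    read -ra tokens <<< "$1"          # note: runs of spaces collapse to one
    for token in "${tokens[@]}"; do
        if skip_token "$token"; then
            out+=("$token")
        else
            out+=("$(to_cyrillic "$token")")
        fi
    done
    printf '%s\n' "${out[*]}"
}

transliterate_line "posetite https://forum.puppylinux.com ili pišite na info@example.com"
```

Here the two addresses stay in Latin while the surrounding words are converted. Punctuation glued to an address (a trailing comma, for instance) stays with it, and ordinary words that should remain Latin would still need a manual way to mark them.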
Hm, just thinking: if only I could think of an easy way to select the words in a text that should be reverted back to Latin, or that should skip the conversion altogether.
Mouse click would be ideal, maybe a combo box or a popup list dialog?