TRANSTEXT - Complementary alternative to gettext

Language Packs

gyrog
Posts: 594
Joined: Thu Oct 01, 2020 8:17 am
Location: Australia
Has thanked: 14 times
Been thanked: 180 times
Contact:

Post by gyrog »

I've done some testing using your "transtext-0.1.tar.gz".

In a test Puppy, I dropped the "gettext" script into the savefolder as "/usr/bin/gettext", effectively replacing the installed "gettext".
I also added an "export lng=fr" line at the beginning of the script.

I then ran FrugalPup v40, which calls "gettext":

1) "gettext" failed with a syntax error (unexpected "else"), so I commented out the following lines:

Code: Select all

#			else
#				echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
#			fi	
#			unset WORK ALT_WORK ORIGINAL
#			rm -f "$track"/"$lng"-options

2) I got errors about a non-existent file or directory "/usr/share/locale/transtext/fr", so I created it manually.
But the real problem is that the "transtext" function does not create this directory,
while the actual translation code does.

3) I got confused when confronted with French text in the "gettext" dialog,
so I got "transtext" to do nothing, just return "$1".
(My test case is a little odd because I am translating away from my native language.)

4) The resulting FrugalPup screen contained no text, but if I exited FrugalPup and ran it again,
I got the translated screen.
The problem is that after storing a "translation", nothing is echoed back to the calling script.
I added:

Code: Select all

		WORK=$(cat /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext)
		echo "$WORK"

after the "esac", and it worked as expected.

5) It would probably be easier to use if the check boxes were radio buttons, but yad does not support that.

6) Lots of scripts in Puppy call "gettext", so it became frustrating when they all wanted to be translated.
So I disabled the transtext "gettext" before doing anything else.

PS: I'm working on a "transtext-g", which will be my take on "transtext" for use in the automatic translation of scripts.

Later I still intend to write a utility which translates a ".po" file by simply calling the interactive "transtext" version of "gettext" for each "msgid". It will generate only the ".transtext" files, it will not write a "translated" ".po" file.

stemsee
Posts: 656
Joined: Sun Jul 26, 2020 8:11 am
Location: lattitude 0
Has thanked: 160 times
Been thanked: 104 times
Contact:

Re: TRANSTEXT - Complimentary alternative to gettext

Post by stemsee »

Thanks for testing and feedback!
Yes, I had 'observed' everything you pointed out but hadn't tackled it, so I merged your adjustments.

Re-using TEXTDOMAIN might provide two advantages:
1) quicker retrieval, because a fuller path reduces the search.
2) By specifying a TEXTDOMAIN, it would be possible to have the editor open only when translating for a given TEXTDOMAIN, while other tasks complete in the background.

So it should look like this!
Except for not translating one's own language (en > en, fr > fr ...), which cannot depend on the system locale, because any given script can be in any other language. I considered this, and testing for it, but in the end decided it's better to self-translate and have the language pack, as those strings will get re-used at some point. In the case where $1 is the same as $WORK, don't open the editor.

So now there needs to be a preference to turn the popup editor on and off, although it does time out!

transtext as a script

Code: Select all

#!/bin/bash
# Copyright (C) 2023 Marcos Contant (stemsee)
# with valuable input and debugging by gyro
# transtext
# depends on trans script (a google api implementation) and google online translation services
# gpl 3 license for personal use. <http://www.opensource.org/licenses/GPL-3.0/>
# A Unique Arbitrary License applies for occupational usage from Copyright Holder.
# Needs bash: it uses 'function', [[ ]] and 'export -f'.

# transtext: lookup/translation without the editor, used for the editor's own labels
function transtext {
	# default language: the part of $LANG before the '_' (e.g. "fr" from "fr_FR.UTF-8")
	[ ! "$lng" ] && lng=$(echo "$LANG" | cut -f1 -d'_')
	[[ ! -d /usr/share/locale/transtext/"$lng" ]] && mkdir -p /usr/share/locale/transtext/"$lng"
	unset WORK
	HANDLE=$(echo "$1" | md5sum | awk '{print $1}')
	if [ -f /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext ]; then
		WORK=$(cat /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext)
		echo "$WORK"
	elif [[ $(type -p trans) != "" ]]; then
		WORK=$(echo -e "$1" | trans -e google -no-auto -b -tl "$lng")
		[ "$WORK" ] && echo "$WORK" || echo "$1"
		[ ! -f /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext ] && echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
	else
		echo "$1"
	fi
}; export -f transtext

export original="$1"
[[ ! "$lng" ]] && lng=$(echo "$LANG" | cut -f1 -d'_')
[ ! -d /usr/share/locale/transtext/"$lng" ] && mkdir -p /usr/share/locale/transtext/"$lng"
unset WORK
HANDLE=$(echo "$1" | md5sum | awk '{print $1}')
if [[ -f /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext ]]; then
	# already stored: just return it
	WORK=$(cat /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext)
	echo "$WORK"
elif [[ $(type -p trans) != "" ]]; then
	export WORK=$(echo -e "$1" | trans -e google -no-auto -no-warn -b -tl "$lng")
	[ "$alt_lng" ] && export ALT_WORK=$(echo -e "$2" | trans -e google -no-auto -no-warn -b -tl "$alt_lng")
	if [[ "$WORK" != "$1" ]]; then
		# no '&': yad must run in the foreground for $? to be its exit status
		OPTIONS=$(yad --title="$(transtext 'Translation Text Editor')" --window-icon="$camino/icons/trans.png" --form --field="$(transtext 'Original')":txt "$original" --field="$(transtext "$lng Translation")":txt "$WORK" --field="$(transtext 'Alternative')":txt "$ALT_WORK" --field="$(transtext 'Accept Translation')":chk "TRUE" --field="$(transtext 'Save un-translated')":chk "FALSE" --field="$(transtext 'Alternative Language Save')":chk "FALSE" --timeout 20)
		STATUS=$?
	else
		# translation identical to the original: skip the editor and store it as-is
		STATUS=70
	fi
	case "$STATUS" in
	70)	# timed out (or skipped): store the machine translation
		echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
		;;
	*)	ORIGINAL="$(echo "$OPTIONS" | cut -f1 -d'|')"
		WORK="$(echo "$OPTIONS" | cut -f2 -d'|')"
		ALT_WORK="$(echo "$OPTIONS" | cut -f3 -d'|')"
		SAVE="$(echo "$OPTIONS" | cut -f4 -d'|')"
		SAVE_UNTRANSLATED="$(echo "$OPTIONS" | cut -f5 -d'|')"
		ALT_SAVE="$(echo "$OPTIONS" | cut -f6 -d'|')"	# was 'alt_save', but tested below as $ALT_SAVE
		if [ "$SAVE_UNTRANSLATED" = TRUE ]; then
			echo -e "$ORIGINAL" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
		fi
		if [ "$ALT_SAVE" = TRUE ]; then
			echo -e "$ALT_WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
		fi
		if [ "$SAVE" = TRUE ]; then
			echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
		fi
		if [ "$ORIGINAL" = "" ]; then
			echo "" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
		fi
		;;
	esac
	# echo the stored text back to the calling script (fall back to the original text)
	WORK=$(cat /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext 2>/dev/null)
	echo "${WORK:-$1}"
else
	# 'trans' is not installed: return the original text
	echo "$1"
fi

gyrog

Post by gyrog »

stemsee wrote: Fri Jun 16, 2023 8:07 pm

Re-Using TEXTDOMAIN might provide two advantages
1) quicker retrieval due to a fuller path reducing search.
2) By specifying a TEXTDOMAIN it would be possible to have editor open only when translating for given TEXTDOMAIN, while other tasks completed in the background.

Having TEXTDOMAIN in the path is a double-edged sword.
It has the advantages you mentioned, but it also means that translations are no longer shared across different TEXTDOMAINs, so more translations have to be produced.
To help with 1), an old hashed-directory trick could be employed. Replace:

Code: Select all

echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext

with

Code: Select all

echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"${HANDLE:0:10}"/"${HANDLE:10}".transtext

The transtext files are separated into sub-directories based on the first 10 (or some other number) characters of $HANDLE. The base filename becomes the remainder of $HANDLE.
The full path of any transtext file can still be accurately generated from just $lng and the "msgid".
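
A minimal sketch of the one-level split, assuming bash and the same `echo "$1" | md5sum` handle used in the scripts above. `tt_path`, `tt_save` and the `TT_ROOT` override are hypothetical names for illustration. It also shows a step the one-line replacement leaves implicit: the sub-directory has to be created before the redirection can write the file.

```bash
#!/bin/bash
# Sketch: one level of sub-directories named after the first 10 characters of HANDLE.
# tt_path, tt_save and TT_ROOT are hypothetical names used for illustration.
TT_ROOT=${TT_ROOT:-/usr/share/locale/transtext}

# print the transtext file path for msgid $1 in language $lng
tt_path() {
	local handle
	handle=$(echo "$1" | md5sum | awk '{print $1}')
	echo "$TT_ROOT/$lng/${handle:0:10}/${handle:10}.transtext"
}

# store translation $2 for msgid $1
tt_save() {
	local file
	file=$(tt_path "$1")
	# the sub-directory must exist before '>' can create the file in it
	mkdir -p "${file%/*}" && echo -e "$2" > "$file"
}
```

Because the path is derived only from $lng and the msgid, a lookup simply recomputes `tt_path` and reads that file.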

gyrog

Post by gyrog »

I have attached 'transtext_g-0.1.sfs'.
This contains my take on transtext.

Unfortunately I have ended up significantly "hacking" your code.
Please look on it as suggestions of some alternate possibilities, and one way of implementing them.

1) There is a '/bin/gettext' which provides a mechanism to enable/disable the transtext 'gettext'.
If a flag file '/var/local/use_transtext_gettext' exists, it executes '/sbin/gettext',
otherwise it executes '/usr/bin/gettext'.
The Puppy PATH starts with "/bin:/usr/bin:/sbin:/usr/sbin".
The installed '/usr/bin/gettext' remains available for use.

2) In '/sbin' as well as 'gettext' there is also 'enable-transtext', 'disable-transtext' and 'trans.png'.
(I don't have a "$camino/icons/trans.png")
'disable-transtext' simply deletes the flag file.
'enable-transtext' writes $lng and $txtLng to the flag file.
($txtLng is the language to translate from.)
If 'enable-transtext' is called without any parameter,
$lng is derived from $LANG and $txtLng='en'.
If it is called with a simple parameter, e.g. 'fr',
$lng='fr' and $txtLng='en'.
If it is called with a complex parameter, e.g. 'fr:en',
$lng='en' and $txtLng='fr'.
'gettext' now includes the flag file if it exists, thus setting the 2 variables.
So any translation direction can be preset before 'gettext' is called.
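
A minimal sketch of the 'enable-transtext' / 'disable-transtext' parameter handling described above, assuming bash and a flag file written as plain `lng=` / `txtLng=` assignments, so that 'gettext' can include it with `.`. The function names and the `FLAG` override are hypothetical; the scripts in the .sfs may do this differently.

```bash
#!/bin/bash
# Sketch of enable/disable-transtext; FLAG is a hypothetical override, handy for testing.
FLAG=${FLAG:-/var/local/use_transtext_gettext}

enable_transtext() {
	local lng txtLng
	case "$1" in
		"")  lng=$(echo "$LANG" | cut -f1 -d'_'); txtLng=en ;;  # no parameter
		*:*) txtLng=${1%%:*}; lng=${1#*:} ;;                     # "from:to", e.g. fr:en
		*)   lng=$1; txtLng=en ;;                                # just the target, e.g. fr
	esac
	# plain assignments, so that 'gettext' can include the file with: . "$FLAG"
	printf 'lng=%s\ntxtLng=%s\n' "$lng" "$txtLng" > "$FLAG"
}

disable_transtext() {
	rm -f "$FLAG"
}
```

The "from:to" order follows the convention 'trans' itself uses for its language argument.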

I want to be able to run a "setup" script, then run the program(s) I want to translate,
and then run an "un-setup" script before running other Puppy scripts that call 'gettext'.
Using just "export" did not work.
Hmmm...If I used an "AppRun" script it could export some variable(s), run the program.
I might try this as a possible alternate to a flag file.

3) '/sbin/gettext' will now run 'disable-transtext' if it decides that no useful translating can occur.
This includes the case where $lng = $txtLng.

4) '/sbin/gettext' always writes an empty string to the transtext file if "Original" is chosen.
Only the contents of the "$lng Translation" txt field are used, the contents of the 'Original' txt field are ignored.
If an empty string is returned from the transtext file, "$original" is returned.
A translation only occurs if there is no transtext file.

5) 'trans' is called with "${txtLng}:${lng}" rather than "-tl $lng".
Don't rely on google to guess the source language.

6) The support for an alternate language is removed.
(Basically because I cannot envision a use for it.)

7) The "transtext" function does nothing if $txtLng = 'en', the 'gettext' dialog is already in English.
(Helps with my testing.)

8) The dialog uses buttons to select the options instead of chk boxes.
"Cancel", "Original" and "$lng Translation".
The "Cancel" button will 'disable-transtext'.

9) The scripts all use "#!/bin/ash", it's rumoured to be faster.

10) Some formatting changes.

Attachments
transtext_g-0.1.sfs
should work if loaded with sfs_load
(4 KiB)

gyrog

Post by gyrog »

Concerning

Code: Select all

echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"${HANDLE:0:10}"/"${HANDLE:10}".transtext

it would be better as

Code: Select all

echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"${HANDLE:0:2}"/"${HANDLE:2}".transtext

or even

Code: Select all

echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"${HANDLE:0:2}"/"${HANDLE:2:2}"/"${HANDLE:4}".transtext

Since $HANDLE is a hex value, the 2nd one divides it into a theoretical 256 sub-directories,
and the 3rd one divides it into a theoretical 65,536 sub-directories.

For a ridiculous theoretical 16,777,216 sub-directories:

Code: Select all

echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"${HANDLE:0:2}"/"${HANDLE:2:2}"/"${HANDLE:4:2}"/"${HANDLE:6}".transtext
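
A sketch, assuming bash, of a path builder where the number of 2-character levels is a parameter, so the variants above differ only in one argument. `tt_split_path` is a hypothetical name. With 2 levels it reproduces the example given further down the thread (`0c7ab368606ff1ed99acb06dc1de9428` becomes `0c/7a/b368606ff1ed99acb06dc1de9428.transtext`).

```bash
#!/bin/bash
# Sketch: transtext path with a configurable number of 2-hex-character directory
# levels (0 = flat, 1 = 256 dirs, 2 = 65,536, 3 = 16,777,216).
# tt_split_path is a hypothetical name, not taken from the posted scripts.
tt_split_path() {   # $1 = language, $2 = 32-character HANDLE, $3 = number of levels
	local path="/usr/share/locale/transtext/$1" rest="$2" i
	for ((i = 0; i < $3; i++)); do
		path="$path/${rest:0:2}"   # the next two hex characters become a directory
		rest=${rest:2}
	done
	echo "$path/$rest.transtext"
}
```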

Concerning the language of the original text:

I want the original language to be tied to a script/program, because the developer knows what language they are using for their text messages.
There is no need for this to be "guessed" by the translator.
There should not be an assumption that the original language is always English.

Programmers could be encouraged to "export TEXTLANG=<language-id>" along with TEXTDOMAIN.

A "$TEXTLANG" variable could be stored in "/usr/share/locale/transtext/config/$TEXTDOMAIN.txt".
This provides a mechanism that does not require changing the source code of the appropriate script/program.
This file could be independently created by some "setup" script by a user who knows,
or by using "trans -b -identify $msgid".

In transtext 'gettext':
if it finds that $TEXTLANG is not defined as a variable,
it includes "/usr/share/locale/transtext/config/$TEXTDOMAIN.txt", if the file exists.

If $TEXTLANG is still not defined, simply assume that it is 'en'.

When $TEXTLANG is defined,

Code: Select all

WORK=$(echo -e "$1" | trans -e google -no-auto -no-warn -b "${TEXTLANG}:${lng}")

otherwise

Code: Select all

WORK=$(echo -e "$1" | trans -e google -no-auto -no-warn -b -tl "$lng")
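
A sketch of the proposed $TEXTLANG lookup order (exported variable, then the per-TEXTDOMAIN config file, then 'en'), assuming bash and a config file that is a shell fragment such as `TEXTLANG=fr`. `resolve_textlang` and the `TT_CONFIG_DIR` override are hypothetical names.

```bash
#!/bin/bash
# Sketch of the $TEXTLANG lookup; TT_CONFIG_DIR is a hypothetical override.
TT_CONFIG_DIR=${TT_CONFIG_DIR:-/usr/share/locale/transtext/config}

# print the language of the original text for the current $TEXTDOMAIN
resolve_textlang() {
	# 1) a TEXTLANG exported by the calling script wins
	# 2) otherwise include $TEXTDOMAIN.txt (e.g. "TEXTLANG=fr"), if it exists
	if [ -z "$TEXTLANG" ] && [ -n "$TEXTDOMAIN" ] && [ -f "$TT_CONFIG_DIR/$TEXTDOMAIN.txt" ]; then
		. "$TT_CONFIG_DIR/$TEXTDOMAIN.txt"
	fi
	# 3) still undefined: assume English
	echo "${TEXTLANG:-en}"
}
```

The result would then feed the `"${TEXTLANG}:${lng}"` form of the 'trans' call.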

gyrog

Post by gyrog »

I have attached 'transtext_g-0.2.sfs'.
It implements another few ideas.

1) It implements splitting the transtext files over 2 levels of sub-directories,
with a theoretical 65,536 sub-directories.
When I tested this on the FrugalPup main screen, it worked fine,
but I never got more than 1 file per sub-directory, and only a small number of top-level sub-directories contained more than 1 sub-directory.

2) Implements the $TEXTLANG concept.
This introduces a new utility script, 'setup-textlang'.
A simple script that takes 2 parameters and writes a "/usr/share/locale/transtext/config/$TEXTDOMAIN.txt" file.
Of course unless I have something in a non-English language to translate to English,
this doesn't make much difference.
But it does simplify 'enable-transtext' which now only deals with the "$lng" variable.

3) I've moved the transtext files from '/sbin' to '/usr/local/transtext'.
The '/bin/gettext' file is modified to reference the new location.
The utility scripts have symbolic-links to them in '/root/my-applications/bin',
so they are still available via "$PATH".
(I pulled the plug when I noticed I had introduced a 'trans.png' file into '/sbin'.)

4) I have included 'trans' version 0.9.7.1-release in '/root/my-applications/bin',
so the whole thing should work out of the box.

Attachments
transtext_g-0.2.sfs
(48 KiB)

stemsee

Post by stemsee »

I'll wait for the outcome of your experiments.

It should be noted that "transtext" is essentially a text storage and retrieval system using hashes of the string itself. The trans script implements several translation APIs; google is the most functional. GNU-gettext, by comparison, is a wasteful text storage and retrieval system (only $TEXTDOMAIN has access), with more complexity and interim operations.

The assumption of 'en' has been removed from the code, it relies on google detection.

The complex languages argument is a good idea. Gives the coder more control.

Buttons in the popup are better than checkboxes.

$camino should be defined or replaced with /usr/share/pixmaps.

I added " -download-audio-as /usr/share/locale/transtext/"$lng"/"$HANDLE".mp3 "... I need to research how to let a keybinding trigger a menu, or take the text under the cursor, to generate the md5sum that enables retrieving and playing back the translation audio.

But I don't understand the benefit of this

Code: Select all

echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"${HANDLE:0:2}"/"${HANDLE:2:2}"/"${HANDLE:4}".transtext

as the $HANDLE is generated by the original string to be translated, so it is unique. With your proposal one string could get so many sub-directories ... but for what?

$TEXTDOMAIN is already used by script writers using gettext, so keeping it provides backwards compatibility. It can be 'used' by transtext gettext to single out returned strings for editing. Apart from that, $TEXTDOMAIN would have no other use... no directory gets created using $TEXTDOMAIN, as is the case with GNU-gettext.

Last edited by stemsee on Wed Jun 21, 2023 7:36 am, edited 1 time in total.

gyrog

Post by gyrog »

stemsee wrote: Sun Jun 18, 2023 11:52 am

But I don't understand the benefit of this

Code: Select all

echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"${HANDLE:0:2}"/"${HANDLE:2:2}"/"${HANDLE:4}".transtext

as the $HANDLE is generated by the original string to be translated, so it is unique. With your proposal one string could get so many sub-directories ... but for what?

This is about performance.
If you have 100,000 translated messages for a single language, that's an awful lot of files to house in a single directory.
But using the method above, this can be split over many directories.

As an example:
I have a message with a HANDLE of "0c7ab368606ff1ed99acb06dc1de9428", the normal approach is to store this as,
"/usr/share/locale/transtext/$lng/0c7ab368606ff1ed99acb06dc1de9428.transtext".
But with the new approach it gets stored as,
"/usr/share/locale/transtext/$lng/0c/7a/b368606ff1ed99acb06dc1de9428.transtext".
The first 4 characters have been stripped from the base filename and used as a relative path to the file.
It will share its directory, "$lng/0c/7a/", only with other files whose HANDLE starts with "0c7a".

gyrog

Post by gyrog »

I have uploaded 'po2transtext', a script to generate transtext files from the msgids in a .po file.
It calls '/usr/local/transtext/gettext' to do the translation, and provide the gui.

'po2transtext' takes 3 parameters, a .po file, a from-language and a to-language, e.g.:

Code: Select all

./po2transtext frugalpup.po en fr

It's rather basic, but it works.

It's an example of a script that 'export's TEXTLANG.
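
The core of a po2transtext-style script is pulling the msgids out of the .po file. A rough sketch in bash/awk, assuming only the basic PO syntax (`msgid "..."`, optionally continued on following `"..."` lines); msgctxt and plural forms are ignored, and escape sequences are left exactly as they appear in the file. `po_msgids` is a hypothetical name, not taken from the attached script.

```bash
#!/bin/bash
# Sketch: print each non-empty msgid of a .po file on its own line.
# Continuation lines ("...") after a msgid are joined; the empty header msgid is skipped.
po_msgids() {   # $1 = .po file
	awk '
		function flush() { if (inid && id != "") print id; inid = 0; id = "" }
		/^msgid "/   { flush(); inid = 1; s = $0; sub(/^msgid "/, "", s); sub(/"$/, "", s); id = s; next }
		/^"/ && inid { s = $0; sub(/^"/, "", s); sub(/"$/, "", s); id = id s; next }
		             { flush() }
		END          { flush() }
	' "$1"
}
```

Each extracted msgid could then be handed to the transtext 'gettext' in a `while IFS= read -r` loop, with TEXTLANG and lng exported beforehand.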

I ran this against a 'frugalpup.po' and this produced a bit over 240 translations.
The 2 layers of sub-directories worked fine, but are obviously over-kill for this small number of messages.
I could not find any directory with more than 1 transtext file in it.
So for testing it's probably best to start with just 1 layer of sub-directories.
It should not be too difficult to write a script to add an extra layer of sub-directories.
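
A sketch of such a utility, assuming bash and the 2-character split discussed earlier: it moves each flat `$HANDLE.transtext` in one language directory to `xx/<rest>.transtext`. `tt_add_layer` is a hypothetical name; it only converts a flat directory, and the lookup code would need the matching path change afterwards.

```bash
#!/bin/bash
# Sketch: move flat $HANDLE.transtext files into 2-character sub-directories.
# tt_add_layer is a hypothetical name; it converts a flat language directory only.
tt_add_layer() {   # $1 = language directory, e.g. /usr/share/locale/transtext/fr
	local f base handle
	for f in "$1"/*.transtext; do
		[ -f "$f" ] || continue        # no matches: the glob stays unexpanded
		base=${f##*/}                  # e.g. 0c7a...9428.transtext
		handle=${base%.transtext}      # e.g. 0c7a...9428
		mkdir -p "$1/${handle:0:2}" || return 1
		mv "$f" "$1/${handle:0:2}/${handle:2}.transtext" || return 1
	done
}
```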

The timeout on the 'gettext' dialog gets in the road a bit when you would like to edit a translation.
After a while I gave up; I just chose "Original" or "Translation", and I'll wait until I've got the "review" utility working.

I've gone off the idea of a complex parameter to 'enable-transtext' in favour of $TEXTLANG.
I could release a version of FrugalPup that exports TEXTLANG='en', and that remains true for everyone, everywhere.
But the $lng variable depends on the requirements of the current user.

It might be useful to have a separate file with some basic functions:
1) Given a msgid, generate a HANDLE.
2) Given a HANDLE, read a transtext file.
3) Given a HANDLE and a msgstr, write a transtext file.
4) Given a msgid, generate a msgstr.
This file would be included into 'gettext', and any other possible scripts.
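
A sketch of such a helper file, assuming bash, the flat `$lng/$HANDLE.transtext` layout and the `echo "$1" | md5sum` handle used so far. All `tt_*` names and the `TT_ROOT` override are hypothetical; a script would include the file with `.`.

```bash
#!/bin/bash
# Hypothetical transtext helper library covering the four functions listed above.
TT_ROOT=${TT_ROOT:-/usr/share/locale/transtext}

# 1) msgid -> HANDLE (same bytes as: echo "$1" | md5sum)
tt_handle() {
	printf '%s\n' "$1" | md5sum | cut -d' ' -f1
}

# path of the transtext file for HANDLE $1 in language $lng (flat layout)
tt_file() {
	printf '%s/%s/%s.transtext\n' "$TT_ROOT" "$lng" "$1"
}

# 2) HANDLE -> contents of its transtext file (non-zero status if there is none)
tt_read() {
	local f
	f=$(tt_file "$1")
	[ -f "$f" ] && cat "$f"
}

# 3) HANDLE + msgstr -> transtext file
tt_write() {
	local f
	f=$(tt_file "$1")
	mkdir -p "${f%/*}" && printf '%s\n' "$2" > "$f"
}

# 4) msgid -> msgstr: stored text, else a new translation via 'trans', else the msgid
tt_msgstr() {
	local handle work
	handle=$(tt_handle "$1")
	if work=$(tt_read "$handle"); then
		printf '%s\n' "$work"
	elif command -v trans >/dev/null 2>&1; then
		work=$(printf '%s\n' "$1" | trans -e google -no-auto -no-warn -b -tl "$lng")
		[ -n "$work" ] && tt_write "$handle" "$work"
		printf '%s\n' "${work:-$1}"
	else
		printf '%s\n' "$1"
	fi
}
```

Keeping the HANDLE rule in one place would also make a later change of that rule (or of the directory layout) a one-function edit.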

Attachments
po2transtext.gz
remove fake ".gz" to produce script
(952 Bytes)

stemsee

Post by stemsee »

gyrog wrote: Mon Jun 19, 2023 10:50 am

I ran this against a 'frugalpup.po' and this produced a bit over 240 translations.
The 2 layers of sub-directories worked fine, but are obviously over-kill for this small number of messages.
I could not find any directory with more than 1 transtext file in it.
So for testing it's probably best to start with just 1 layer of sub-directories.
It should not be too difficult to write a script to add an extra layer of sub-directories

Of the 240 translations, how many will be re-used by other apps? 'File', 'Save As', 'Open'... these examples are so ubiquitous across $TEXTDOMAINs that the 'savings' of work over time across millions of servers and devices are undeniable.

The timeout on the 'gettext' dialog gets in the road a bit when you would like to edit a translation.
After a while I gave up, I just chose "Original" or "Translation", and wait until I've got the "review" utility working.

A professional translator and proficient typist (like me :lol: ) finds it adequate; it's really just a quick visual check of the translation. But a button putting it on a review list for later would be useful, and would allow a shorter timeout.

I've gone off the idea of a complex parameter to 'enable-transtext' in favour of $TEXTLANG.
I could release a version of FrugalPup that exports TEXTLANG='en', and that remains true for everyone, everywhere.
But the $lng variable depends on the requirements of the current user.

It might be useful to have a separate file with some basic functions:
1) Given a msgid, generate a HANDLE.
2) Given a HANDLE, read a transtext file.
3) Given a HANDLE and a msgstr, write a transtext file.
4) Given a msgid, generate a msgstr.
This file would be included into 'gettext', and any other possible scripts.

Interesting.

I am going to take the google API commands and directly implement them in the gettext function, thus removing the need for the trans script, which is overkill for the purposes at hand.

Ideally, if Cambridge, Oxford and other dictionary maintainers hashed all vocabularies and entries, a universal retrieval system not requiring translation would be in place. For a computer department this should be trivial. Even gettext could be modified to use it, and of course it is already native to transtext.

stemsee

Post by stemsee »

Hello Aliya Kahn. Welcome!

What's your interest in transtext?

stemsee

stemsee

Post by stemsee »

To integrate easily with GNU-gettext, so as to get a .po file that is compliant with both GNU-gettext (GGT) and transtext-gettext (TGT), it would be necessary to change code in the xgettext source: as it searches a script for 'gettext' calls and builds a .po file, it would also pipe and tee each string off to TGT for translation and saving. Ideally the transtext hashes would be added to the .po per string as easily distinguishable comments, something like #TGT#0556874bcb9894bc3c059c2893e3443b
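
Short of patching xgettext, an existing .po file could be post-processed to carry the hashes. A rough sketch, assuming bash, single-line msgids only, and that the hash is taken over the msgid text exactly as written in the .po file (escape sequences not expanded). `tt_annotate_po` is a hypothetical name; the `#TGT#` prefix is the format suggested above.

```bash
#!/bin/bash
# Sketch: copy a .po file to stdout, adding "#TGT#<md5>" above each single-line msgid.
# The hash uses the same rule as the scripts in this thread: echo "$msgid" | md5sum.
tt_annotate_po() {   # $1 = .po file
	local line id
	while IFS= read -r line || [ -n "$line" ]; do
		case "$line" in
			'msgid ""') ;;                   # header or multi-line msgid: left alone
			msgid\ \"*\")
				id=${line#msgid \"}          # strip the leading: msgid "
				id=${id%\"}                  # strip the closing quote
				printf '#TGT#%s\n' "$(printf '%s\n' "$id" | md5sum | cut -d' ' -f1)"
				;;
		esac
		printf '%s\n' "$line"
	done < "$1"
}
```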

stemsee

Post by stemsee »

Updated

removed alt_lng.
adopted buttons instead of checkboxes in pre-save editor.
pre-save editor is active if /tmp/TGT flag is present.
retrieves also audio.
commented out audio playback but function is in place.

The post editor function is complete in my system nonsuch app.

Selects a $lng directory full of $HANDLE.transtext files and $HANDLE.mp3 files.

Shows contents or plays back audio on selection in list.

Allows search by generating hashes from text string ... could be extended to search file contents for string.

Allows easy editing and deleting of .transtext files.

Could conceivably be used to merge transtext to .po and vice versa, but not yet coded.

Just needs to be standalone.

[Screenshot: 1687823627.png]

Code: Select all

#!/bin/bash
# Transtext Copyright (C) 2023 Marcos Contant (stemsee)
# with input and debugging by gyro
#
# currently depends on trans script (google/bing/yandex api implementations) and google online translation services
# gpl 3 license for personal use. <http://www.opensource.org/licenses/GPL-3.0/>
# A Unique Arbitrary License applies for occupational usage from Copyright Holder.

function transtext {
	# default language: the part of $LANG before the '_' (e.g. "fr" from "fr_FR.UTF-8")
	[ ! "$lng" ] && lng=$(echo "$LANG" | cut -f1 -d'_')
	[[ ! -d /usr/share/locale/transtext/"$lng" ]] && mkdir -p /usr/share/locale/transtext/"$lng"
	unset WORK
	HANDLE=$(echo "$1" | md5sum | awk '{print $1}')
	if [[ -f /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext ]]; then
		WORK=$(cat /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext)
		echo "$WORK"
	elif [[ $(type -p trans) != "" ]]; then
		WORK=$(echo -e "$1" | trans -e google -no-auto -b -tl "$lng" -download-audio-as /usr/share/locale/transtext/"$lng"/"$HANDLE".mp3)
		[ "$WORK" ] && echo "$WORK" || echo "$1"
		[[ ! -f /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext ]] && echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
	else
		echo "$1"
	fi
}; export -f transtext

function audiofn {
	[[ -f /usr/share/locale/transtext/"$lng/$1".mp3 ]] && mpv /usr/share/locale/transtext/"$lng/$1".mp3 &
}; export -f audiofn

export ORIGINAL="$1"
[[ ! "$lng" ]] && lng="$(echo "$LANG" | cut -f1 -d'_')"
[ ! -d /usr/share/locale/transtext/"$lng" ] && mkdir -p /usr/share/locale/transtext/"$lng"
unset WORK
HANDLE=$(echo "$1" | md5sum | awk '{print $1}')
if [[ -f /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext ]]; then
	WORK=$(cat /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext)
	echo "$WORK"
		#if [[  -f /usr/share/locale/transtext/"$lng"/"$HANDLE".mp3 ]]; then
			#audiofn $HANDLE
		#else
		#	export	AUDIOWORK=$(echo -e "$1" | trans -e google -no-auto -no-warn -b -tl "$lng" -download-audio-as /usr/share/locale/transtext/$lng/$HANDLE.mp3 &)
			#audiofn $HANDLE
		#fi
	exit
elif [[ ! -f /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext ]]; then
	export WORK=$(echo -e "$1" | trans -e google -no-auto -no-warn -b -tl "$lng" -download-audio-as /usr/share/locale/transtext/"$lng"/"$HANDLE".mp3)

	# the pre-save editor is only shown when the /tmp/TGT flag file exists
	if [[ -f /tmp/TGT ]]; then
		OPTIONS=$(yad --title="$(transtext 'Translation Text Editor') $LANG" --window-icon="/usr/local/transtext/trans.png" --timeout 12 --form \
			--field="$(transtext 'Original')":txt "$1" \
			--field="$(transtext "$lng Translation")":txt "$WORK" \
			--button="$(transtext 'Cancel')"!gtk-cancel:1 --button="$(transtext 'Original')"!gtk-media-previous:6 --button="$(transtext 'Translation')"!gtk-ok:2)
		# save yad's exit status at once; any later command would overwrite $?
		STATUS=$?
		case "$STATUS" in
			2)	# "Translation": store the (possibly edited) translation field
				WORK="$(echo "$OPTIONS" | cut -f2 -d'|')"
				echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
				echo "$WORK"
				;;
			6)	# "Original": store the (possibly edited) original field
				ORIGINAL="$(echo "$OPTIONS" | cut -f1 -d'|')"
				echo -e "$ORIGINAL" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
				echo "$ORIGINAL"
				;;
			70)	# timed out: store the machine translation unchanged
				echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
				echo "$WORK"
				;;
			*)	# "Cancel" (1) or window closed (252): store nothing, return the original text
				echo "$1"
				;;
		esac
	else
		echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext
		echo "$WORK"
	fi
fi
			#if [[  -f /usr/share/locale/transtext/"$lng"/"$HANDLE".mp3 ]]; then
			#audiofn $HANDLE
		#	else
		#		export	WORK=$(echo -e "$1" | trans -e google -no-auto -no-warn -b -tl "$lng" -download-audio-as /usr/share/locale//transtext/$lng/$HANDLE.mp3)
				#audiofn $HANDLE
			#fi
			#else
				#echo -e "$WORK" > /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext

stemsee

Post by stemsee »

Going forward, maybe the script and function should be named 'transtext'. That way users can, if they want, link transtext to gettext in the PATH, or prepare scripts with $(transtext "$TEXT") instead of gettext. Any thoughts?

gyrog

Post by gyrog »

stemsee wrote: Fri Jul 07, 2023 11:11 am

going forward maybe the script and function should be named 'transtext' that way users if they want can link transtext to gettext in path, or prepare scripts with $(transtext "$TEXT") instead of gettext. Any thoughts?

This would make things a lot less daunting for testers.
There are a lot of scripts in Puppy that run "gettext", many times. Simply replacing the normal "gettext" program with transtext as "gettext" could be rather daunting for a first-time tester.
During the testing phase, there needs to be a way to enable and disable transtext.
I intend to develop some utilities to enable "priming the pump" with transtext translations before trying a full scale replacement.

Changing subject to the performance of the translation store:
Since transtext stores all the translations for a single language in a single directory, there could be performance issues for that directory if all of a Puppy was translated with transtext. (I have already mentioned a method of alleviating this.)
But it seems that it depends on the filesystem that the transtext files are stored on.
If ext4 is used, a directory should be able to cope readily with random access to many thousands of files, because it indexes directories.
(Listing all entries in such a directory will still be slow, but transtext does not do that.)
Filesystems that don't index directories, but depend on linear searches, could struggle with a complete transtext load.
I'm using ext4 so I'm not going to worry about that.
(I still intend to write a utility to add subdirectories to the transtext file store.)

gyrog

Post by gyrog »

Suggestion:
Replace

Code: Select all

HANDLE=$(echo "$1" | md5sum | awk '{print $1}')

with

Code: Select all

HANDLE=$(echo -en "$1" | md5sum | awk '{print $1}')

Why?
So that when the same text is being translated, once containing a "\n" sequence
and once containing a real newline character instead, both end up with the same "transtext" handle.
And to provide consistency with possible future implementations in other programming languages.

I've been playing with programming with Python3 in general, and implementing transtext in particular.
With this Python program:

Code: Select all

#!/usr/bin/env python3
from hashlib import md5
hstr = "hello\nworld"
print(hstr)
print(md5(hstr.encode('utf-8')).hexdigest())

I get this output:

Code: Select all

hello
world
9195d0beb2a889e1be05ed6bb1954837

With this script:

Code: Select all

#!/bin/bash
hstr="hello\nworld"
echo "$hstr"
echo "$hstr" | md5sum | cut -f1 -d' '
echo -en "$hstr" | md5sum | cut -f1 -d' '

I get this output:

Code: Select all

hello\nworld
bb05e0a9587549a57c1d797f6bdf7701
9195d0beb2a889e1be05ed6bb1954837

The point is that only the "echo -en" script md5sum matches the Python md5sum.

What's going on?
1) Python, like many programming languages, embeds a real newline character in a string when a "\n" is specified.
Bash does not, as demonstrated by the different output from "printing" the variable.
2) By default "echo" appends a newline character to any string before outputting it.
So, "-e" ensures that any "\n" sequences are changed to real newline characters,
and "-n" stops the appending of a newline character.

Note: I get the same result with Perl as with Python.
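For a Python implementation that shares a store with the bash script, the handle has to be computed the way `echo -en "$1" | md5sum` computes it. Here is a minimal sketch. `transtext_handle` is a hypothetical name, and it only expands the "\n" escape, while `echo -e` also interprets others (\t, \\, ...).

```python
#!/usr/bin/env python3
"""Sketch: compute a transtext handle the same way as `echo -en "$1" | md5sum`."""
from hashlib import md5


def transtext_handle(text: str) -> str:
    """md5 hex digest of *text* after turning literal "\\n" sequences into real
    newlines, with no trailing newline appended.

    Only "\\n" is expanded; other `echo -e` escapes are deliberately ignored.
    """
    expanded = text.replace("\\n", "\n")  # mimic `echo -e` for "\n"
    return md5(expanded.encode("utf-8")).hexdigest()  # nothing appended: mimic `echo -n`


if __name__ == "__main__":
    # Both spellings of the same two-line message share one handle.
    print(transtext_handle("hello\\nworld"))
    print(transtext_handle("hello\nworld"))
```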

stemsee
Posts: 656
Joined: Sun Jul 26, 2020 8:11 am
Location: lattitude 0
Has thanked: 160 times
Been thanked: 104 times
Contact:

Re: TRANSTEXT - Complimentary alternative to gettext

Post by stemsee »

In some countries Google Translate is not available, but Bing is. Better still would be to find another translation service.

gyrog

Re: TRANSTEXT - Complimentary alternative to gettext

Post by gyrog »

To implement TransText in Python, I use this module: https://pypi.org/project/deep-translator/
It supports many translation services, some of which require a free "key".
The documentation for this module has some information on the various services (as well as how to use them in Python code).
I have only used it with google translator.
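A rough sketch of the transtext lookup in Python follows: return the stored translation if there is one, otherwise translate, store and return it. That last step is the one that was missing in the original bash script. The translator is passed in as a plain function so the sketch runs without a network connection. The function name `transtext` and its parameters are my own assumptions, and it mirrors the bash store layout `/usr/share/locale/transtext/<lng>/<handle>.transtext`.

```python
#!/usr/bin/env python3
"""Sketch: transtext lookup in Python. Return the stored translation if there is
one; otherwise translate, store and return it."""
import os
import tempfile
from hashlib import md5
from typing import Callable

STORE = "/usr/share/locale/transtext"  # same store as the bash script


def transtext(text: str, lng: str, translate: Callable[[str, str], str],
              store: str = STORE) -> str:
    # Same handle as `echo -en "$1" | md5sum` for text whose only escape is "\n".
    handle = md5(text.replace("\\n", "\n").encode("utf-8")).hexdigest()
    path = os.path.join(store, lng, handle + ".transtext")
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            return f.read().rstrip("\n")  # like $(cat file) in bash
    result = translate(text, lng)
    os.makedirs(os.path.dirname(path), exist_ok=True)  # create <store>/<lng> if needed
    with open(path, "w", encoding="utf-8") as f:
        f.write(result + "\n")  # trailing newline, like `echo -e "$WORK" >`
    return result  # always hand the translation back to the caller


if __name__ == "__main__":
    # Demo with a fake translator and a throwaway store.
    def fake(s: str, lng: str) -> str:
        return f"[{lng}] {s}"

    with tempfile.TemporaryDirectory() as demo_store:
        print(transtext("Hello", "fr", fake, demo_store))  # translated and stored
        print(transtext("Hello", "fr", fake, demo_store))  # read back from the store
```

With deep-translator installed, the `translate` argument could be `lambda s, lng: GoogleTranslator(source="auto", target=lng).translate(s)` after `from deep_translator import GoogleTranslator`.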

An interesting thing about the package is that it provides direct command-line access,
although I suspect that this still needs a Python interpreter to be available.
There are supposed to be ways of producing a real binary from Python, but I haven't sorted this out yet.

I haven't done anything on this project for a while, and I am currently sidetracked.

But I'm still OK with helping you to bring this thing to fruition, if I can.

stemsee

Re: TRANSTEXT - Complimentary alternative to gettext

Post by stemsee »

Just thought of an idea to allow manual additions to the local transtext hashed content: simply write lists of words or strings of text, and have the function hash each line and save it to transtext, for local creation and retrieval without online translation.

Ultimately there should be a universal transtext database and retrieval system with each string translated once for everyone.
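The manual-addition idea could look roughly like this in Python. The input format is an assumption: one "original<TAB>translation" pair per line, with "#" lines as comments. The function name `import_pairs` is hypothetical. Each original is hashed the way `echo -en` would hash it, and the translation is written where the bash script would look for it.

```python
#!/usr/bin/env python3
"""Sketch: load hand-written translations into a transtext store, no online
service involved. The input format is an assumption: one
"original<TAB>translation" pair per line; "#" lines are comments."""
import os
import sys
import tempfile
from hashlib import md5
from typing import Iterable


def import_pairs(lines: Iterable[str], lng: str, store: str) -> int:
    """Hash each original string and save its translation. Returns entries written."""
    lang_dir = os.path.join(store, lng)
    os.makedirs(lang_dir, exist_ok=True)
    written = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#") or "\t" not in line:
            continue  # skip blank, comment and malformed lines
        original, translation = line.split("\t", 1)
        # Expand "\n" on both sides, like `echo -en` (handle) and `echo -e` (stored text).
        handle = md5(original.replace("\\n", "\n").encode("utf-8")).hexdigest()
        with open(os.path.join(lang_dir, handle + ".transtext"), "w", encoding="utf-8") as f:
            f.write(translation.replace("\\n", "\n") + "\n")
        written += 1
    return written


if __name__ == "__main__":
    if len(sys.argv) == 3:  # usage: import_pairs.py LANG STORE < pairs.tsv
        print(import_pairs(sys.stdin, sys.argv[1], sys.argv[2]), "entries written")
    else:  # demo on a throwaway store
        with tempfile.TemporaryDirectory() as demo_store:
            print(import_pairs(["Hello\tBonjour\n"], "fr", demo_store), "entries written")
```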

gyrog

Re: TRANSTEXT - Complimentary alternative to gettext

Post by gyrog »

@stemsee,
Bold concept.

The repository might have to get rather large before it was seriously useful.
(The number of possible sentences in English alone is virtually endless,
so the translators would need to concentrate on "common" sentences and phrases.)

For someone doing this on their own computer, the logistics are not a big issue.
Creating a single database "in the cloud" is a big challenge.
But it's not impossible; after all, "Wikipedia" works, and its model of allowing folk to update the content might work here too.
It would probably need web pages to submit/update the stored translations,
and an API, including a local cache, for software like 'gettext' to access it.

It would have to be very reliable for people to use it.
The implementation would need to consider "large-scale" storage, not just md5 keys in a single directory.
The hashing function would need to be fast, with a minimal chance of "collisions",
but the surrounding software would still need to cater for possible "collisions".
Maybe look at database software.
Use a local server that queries a remote server if it doesn't have the answer,
a bit like DNS servers.
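The "local server that asks a remote server" idea, plus the collision handling, could be sketched like this. It assumes each stored record keeps the msgid next to the msgstr, so that a handle match with different text is detected as a collision rather than returned as the translation. The `remote` callable, `TranslationResolver` and the in-memory cache are all hypothetical stand-ins, not an existing service.

```python
#!/usr/bin/env python3
"""Sketch: a DNS-like two-tier lookup. A local cache answers first; on a miss it
asks a remote repository (a hypothetical callable here) and caches the answer.
Records keep the msgid beside the msgstr so hash collisions can be detected."""
from hashlib import md5
from typing import Callable, Dict, Optional, Tuple

Record = Tuple[str, str]  # (msgid, msgstr)
Remote = Callable[[str, str], Optional[Record]]  # (lng, handle) -> record or None


class TranslationResolver:
    def __init__(self, remote: Remote) -> None:
        self.remote = remote
        self.cache: Dict[Tuple[str, str], Record] = {}  # in-memory stand-in for a local cache

    @staticmethod
    def handle(msgid: str) -> str:
        return md5(msgid.encode("utf-8")).hexdigest()

    def lookup(self, msgid: str, lng: str) -> Optional[str]:
        key = (lng, self.handle(msgid))
        record = self.cache.get(key)
        if record is None:
            record = self.remote(*key)  # forward the query upstream
            if record is None:
                return None  # unknown upstream too (misses are not cached here)
            self.cache[key] = record
        stored_msgid, msgstr = record
        if stored_msgid != msgid:
            return None  # same handle, different text: a collision, so don't use it
        return msgstr
```

A real resolver, like a DNS cache, would probably also cache misses for a limited time; this sketch does not.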

If the project succeeds, it may be necessary to completely re-invent the backend, i.e. scale up.
To this end, it would help if the "transtext" repository could be walked to retrieve key,translation pairs
(where a translation might be <lang>=<translated text>),
i.e. "walk" the keys, and retrieve each translation.
This could then be stored in a new repository using completely different hashing/storage technologies.

Again, could "prime the pump" with a utility that could accept translated ".po" files.

gyrog

Re: TRANSTEXT - Complimentary alternative to gettext

Post by gyrog »

A possible way to store the "msgid"s so they can be walked:

Let's assume that the current md5 hash, and a single directory for each language, are OK.

The translated messages, ("msgstr") are stored in:
/usr/share/locale/transtext/"$lng"/"$HANDLE".transtext

But it would be possible to also store the "msgid" in:
/usr/share/locale/transtext/keys/"$HANDLE".transtext
i.e. like having lng='keys'

To "walk" the repository:

Code: Select all

TT=/usr/share/locale/transtext
for keyfile in "$TT"/keys/*.transtext; do
    [ -f "$keyfile" ] || continue               # glob did not match: no keys yet
    HANDLE=$(basename "$keyfile" .transtext)    # $HANDLE from the filename
    msgid=$(cat "$keyfile")                     # $msgid (and possibly also $TEXTLANG)
    for langdir in "$TT"/*/; do
        lng=$(basename "$langdir")
        [ "$lng" = keys ] && continue           # "keys" is not a language
        [ -f "$langdir$HANDLE.transtext" ] || continue
        msgstr=$(cat "$langdir$HANDLE.transtext")
        # process "$lng" "$msgid" "$msgstr", e.g. re-hash and add to a new repository (md5 -> blake3)
    done
done
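The walk plus the "re-hash into a new repository" step could also be sketched in Python. BLAKE2b from the standard library stands in for blake3, which needs a third-party package. The sketch assumes the keys files contain only the msgid, not the "$lng=$msgid" variant from note 5 below. `walk_store` and `migrate` are hypothetical names.

```python
#!/usr/bin/env python3
"""Sketch: walk a transtext store that has a "keys" directory and copy every
translation into a new store keyed by a different hash. BLAKE2b stands in for
blake3, which is not in Python's standard library."""
import os
import tempfile
from hashlib import blake2b
from typing import Iterator, Tuple


def _read(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read().rstrip("\n")  # like $(cat file) in bash


def walk_store(root: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (lng, msgid, msgstr) for every translation that has a keys entry."""
    languages = sorted(d for d in os.listdir(root)
                       if d != "keys" and os.path.isdir(os.path.join(root, d)))
    for name in sorted(os.listdir(os.path.join(root, "keys"))):
        if not name.endswith(".transtext"):
            continue
        msgid = _read(os.path.join(root, "keys", name))  # assumes keys hold only the msgid
        for lng in languages:
            path = os.path.join(root, lng, name)  # same "$HANDLE".transtext name
            if os.path.isfile(path):
                yield lng, msgid, _read(path)


def migrate(old_root: str, new_root: str) -> int:
    """Re-hash each msgid with BLAKE2b (128-bit) and write a new store."""
    count = 0
    for lng, msgid, msgstr in walk_store(old_root):
        handle = blake2b(msgid.encode("utf-8"), digest_size=16).hexdigest()
        for sub, text in (("keys", msgid), (lng, msgstr)):
            os.makedirs(os.path.join(new_root, sub), exist_ok=True)
            with open(os.path.join(new_root, sub, handle + ".transtext"), "w",
                      encoding="utf-8") as f:
                f.write(text + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Demo: migrate a one-entry store built in a temporary directory.
    with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
        for sub, text in (("keys", "Hello"), ("fr", "Bonjour")):
            os.makedirs(os.path.join(old, sub))
            with open(os.path.join(old, sub, "demo.transtext"), "w", encoding="utf-8") as f:
                f.write(text + "\n")
        print(migrate(old, new), "translation(s) copied")
```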

Notes:

1) While '"$HANDLE".transtext' can appear many times, once in each "$lng" directory,
it only needs to appear once in "keys", because there can be only one "msgid" per handle.
(And the implementation does not support any more.)

2) When writing a new translation to /usr/share/locale/transtext/"$lng"/"$HANDLE".transtext,
if /usr/share/locale/transtext/keys/"$HANDLE".transtext does not exist,
write the "msgid" to /usr/share/locale/transtext/keys/"$HANDLE".transtext.

3) /usr/share/locale/transtext/keys/"$HANDLE".transtext is not used in normal gettext processing.

4) This also means that all the translations for a particular "$lng" could be reviewed.
For each '"$HANDLE".transtext' file in /usr/share/locale/transtext/"$lng":
find the "msgid" for the current "msgstr" by extracting the $HANDLE from the filename,
and reading /usr/share/locale/transtext/keys/"$HANDLE".transtext.
(I'm not sure if this would be useful.)

5) If it were deemed useful, "$lng=$msgid" could be stored in /usr/share/locale/transtext/keys/"$HANDLE".transtext.
So the original language ($TEXTLANG) for the "msgid" could also be retrieved.
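Notes 2 and 4 could be sketched in Python as follows. It assumes the keys file holds only the msgid, and that the handle is the md5 of the msgid with real newlines (what `echo -en` gives). `store_translation` and `review_language` are hypothetical names.

```python
#!/usr/bin/env python3
"""Sketch of notes 2 and 4: record the msgid in <root>/keys when a translation is
stored, and later list a language's translations next to their originals."""
import os
import tempfile
from hashlib import md5
from typing import Iterator, Tuple


def _write(path: str, text: str) -> None:
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text + "\n")


def _read(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read().rstrip("\n")  # like $(cat file) in bash


def store_translation(root: str, lng: str, msgid: str, msgstr: str) -> str:
    """Write the translation; write keys/<handle>.transtext only if it is missing."""
    handle = md5(msgid.encode("utf-8")).hexdigest()
    _write(os.path.join(root, lng, handle + ".transtext"), msgstr)
    key_path = os.path.join(root, "keys", handle + ".transtext")
    if not os.path.exists(key_path):
        _write(key_path, msgid)
    return handle


def review_language(root: str, lng: str) -> Iterator[Tuple[str, str]]:
    """Yield (msgid, msgstr) for each translation in <root>/<lng> that has a key."""
    lang_dir = os.path.join(root, lng)
    for name in sorted(os.listdir(lang_dir)):
        key_path = os.path.join(root, "keys", name)
        if name.endswith(".transtext") and os.path.isfile(key_path):
            yield _read(key_path), _read(os.path.join(lang_dir, name))


if __name__ == "__main__":
    # Demo on a throwaway store.
    with tempfile.TemporaryDirectory() as demo_root:
        store_translation(demo_root, "fr", "Hello", "Bonjour")
        store_translation(demo_root, "de", "Hello", "Hallo")
        print(list(review_language(demo_root, "fr")))
```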
