Re: Abiword and Geany character coding issues (SOLVED)
Posted: Sat May 01, 2021 9:10 am
by MochiMoppel
For those readers who wonder what this thread was all about and want to verify some of the claims that were made, I have created a demo file with part of the text used by the OP. I also attach a more readable screenshot, which hopefully makes clear that the "gibberish" contains valuable information, namely the hex value of the character that can't be interpreted by the reading application.
When removing the fake gz extension and allowing Abiword or Geany to open the file with automatic encoding detection, 3 of the characters will appear as blocks with a number in them. I briefly checked with other applications and can confirm that Leafpad can read the file correctly. Worse than Abiword and Geany were the command-line editors e3 and mp, which would either skip the three problematic characters or replace them with useless ASCII characters like ~V.
Worst of all was LibreOffice 4.3.0.4. Unlike all other applications LibreOffice Writer failed with the international characters in the demo file. I couldn't even find an option to select an encoding. Maybe a newer version works better.
The demo file also contains some explanations of why I claim that the screenshot presented by the OP shows a WINDOWS encoded file and cannot be an ISO or IBM file.
- WINDOWS_encoded.png (42.57 KiB)
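For anyone who wants to reproduce the effect without the demo file, here is a minimal Python sketch. The sample bytes are hypothetical (they are not taken from the demo file); they only assume that the text contains typical Windows "smart" quotes. Decoded as CP1252 the three bytes become printable punctuation, while decoded as ISO-8859-1 they land on control characters, which is presumably why an editor that assumes ISO-8859-1 can only show a block with the hex value.

```python
import unicodedata

# Hypothetical sample: the bytes 0x92, 0x93 and 0x94 are curly quotes in CP1252.
sample = b"It\x92s \x93quoted\x94"

as_cp1252 = sample.decode("cp1252")
as_latin1 = sample.decode("iso-8859-1")


def describe(text):
    """List every non-ASCII character as (code point, Unicode category)."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch))
        for ch in text
        if ord(ch) > 0x7F
    ]


# CP1252 yields punctuation (categories Pi/Pf), ISO-8859-1 yields
# control characters (category Cc) with the same numeric value as the byte.
print(describe(as_cp1252))
print(describe(as_latin1))
```

The code points printed for the ISO-8859-1 reading are the same numbers as the original byte values, so the number inside the block tells you which byte the editor could not render.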
Re: Abiword and Geany character coding issues (SOLVED)
Posted: Sat May 01, 2021 9:18 am
by puppy_apprentice
From time to time I have problems with Polish text files in Geany and AbiWord, especially files made on Windows (sometimes Geany doesn't show anything, as if they were binary files). My solution is to open those files with Leafpad and save them again. After this, Geany and AbiWord open those files without problems.
Re: Abiword and Geany character coding issues (SOLVED)
Posted: Sat May 01, 2021 9:50 am
by greengeek
MochiMoppel wrote: ↑Sat May 01, 2021 9:10 am
When removing the fake gz extension and allowing Abiword or Geany to open the file with automatic encoding detection, 3 of the characters will appear as blocks with a number in it.
Here is the comparison in my retouched Tahr 6.0.6 :
- WINDOWS-1252-demo_TahrGeany.jpg (118.63 KiB)
- WINDOWS-1252-demo_TahrLeafpad.jpg (115.64 KiB)
Re: Abiword and Geany character coding issues (SOLVED)
Posted: Sat May 01, 2021 11:01 am
by amethyst
MochiMoppel wrote: why I claim that the screenshot presented by the OP is a WINDOWS encoded file and can not be an ISO or IBM file
I claim you are talking bullshit. Cheers.
Re: Abiword and Geany character coding issues (SOLVED)
Posted: Sat May 01, 2021 12:21 pm
by JakeSFR
MochiMoppel wrote: Worst of all was LibreOffice 4.3.0.4. Unlike all other applications LibreOffice Writer failed with the international characters in the demo file. I couldn't even find an option to select an encoding. Maybe a newer version works better.
LO-6.2.8.1: 'File -> Open -> Text - Choose Encoding (from the long list at the bottom right corner)'.
___________
puppy_apprentice wrote: I have from time to time problem with polish text files (especially made on Windows, sometimes Geany don't show anything like for binary files) in Geany and AbiWord. My solution is to open those files with leafpad and save them again. After this Geany and AbiWord opens those files without problems.
I looked into Leafpad's source and it takes into account the current locale (LANG), when choosing which codepage to use (see: src/encoding.c, get_encoding_code()).
So, if LANG is e.g. en_US.UTF-8 instead of pl_PL.UTF-8, Leafpad also fails to display Polish (non-UTF8) text correctly.
Seems like Geany and LO/Abiword don't use that method...
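To illustrate the idea, here is a rough Python sketch of locale-dependent fallback. It is not Leafpad's actual code: the function name, the language table and the CP1252 default are all made up for illustration, and the only thing taken from the above is the principle that LANG decides which legacy code page gets tried when the file is not valid UTF-8.

```python
import os

# Hypothetical language -> legacy code page table, for illustration only.
# It is NOT copied from Leafpad's src/encoding.c.
LEGACY_BY_LANGUAGE = {
    "pl": "cp1250",
    "cs": "cp1250",
    "en": "cp1252",
    "de": "cp1252",
}


def guess_decode(data, lang=None):
    """Decode as UTF-8 if valid, else fall back to a locale-dependent code page.

    Returns a (text, codec_name) tuple.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    if lang is None:
        lang = os.environ.get("LANG", "")
    # "pl_PL.UTF-8" -> "pl"; an empty or unknown value falls through to the default.
    language = lang.split("_")[0].split(".")[0].lower()
    codec = LEGACY_BY_LANGUAGE.get(language, "cp1252")
    # errors="replace" keeps the sketch from crashing on bytes the codec lacks.
    return data.decode(codec, errors="replace"), codec


polish = "zażółć gęślą jaźń".encode("cp1250")
print(guess_decode(polish, "pl_PL.UTF-8"))
print(guess_decode(polish, "en_US.UTF-8"))
```

With a Polish locale the text comes back intact; with en_US the same bytes are read through the wrong code page and the accented letters come out wrong, which matches the behaviour described above.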
Another solution (that I use in Geany) is to 'Reload As -> East European -> Central European (WINDOWS-1250) [or ISO-8859-2]' and then 'Document -> Set Encoding -> Unicode -> Unicode (UTF-8)', and save it.
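The same reload-and-save-as-UTF-8 step can be done outside Geany. A minimal Python sketch follows; the function names are hypothetical, and it assumes you already know the source encoding (it does no detection at all).

```python
from pathlib import Path


def to_utf8(data, source_encoding="cp1250"):
    """Re-encode raw bytes from source_encoding to UTF-8.

    Raises UnicodeDecodeError if the bytes do not fit source_encoding.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file(path, source_encoding="cp1250"):
    """Rewrite a text file in place as UTF-8 (keep a backup of the original)."""
    p = Path(path)
    p.write_bytes(to_utf8(p.read_bytes(), source_encoding))
```

Usage would be something like convert_file("notes.txt", "iso-8859-2"), where notes.txt is just a placeholder name. Working on bytes rather than opening the file in text mode leaves the line endings as they were.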
___________
amethyst wrote: why I claim that the screenshot presented by the OP is a WINDOWS encoded file and can not be an ISO or IBM file
I claim you are talking bullshit. Cheers.
This screenshot actually suggests that it's not ISO-8859-1 encoding.
If it really was, the text would not contain those faulty characters.
It only means that Geany *thinks* it's ISO-8859-1 and interprets it as such.
Anyway, here's a random online charset detector: https://nlp.fi.muni.cz/projects/chared/
It correctly detects Mochi's CP1252 demo, as well as CP1250 and ISO-8859-2 encoded Polish text files.
You could submit a relevant snippet there and see what it says.
Greetings!
Re: Abiword and Geany character coding issues (SOLVED)
Posted: Sat May 01, 2021 12:34 pm
by amethyst
JakeSFR wrote: This screenshot actually suggests that it's not ISO-8859-1 encoding.
If it really was, the text would not contain those faulty characters.
It only means that Geany *thinks* it's ISO-8859-1 and interprets it as such.
Reloading the file as Windows-1252 = No gibberish (displays correctly)
Reloading the file as ISO-8859-1 = Gibberish
So we must conclude that Abiword and Geany do not identify the encoding (or non-encoding) correctly. That's what my point was with this thread from the start. ROX identifies the file as non-iso extended ASCII English text which is probably what it is supposed to be.
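For what it's worth, a label like "non-ISO extended ASCII" can be reproduced with a very simple byte-range check. The Python sketch below is only an assumption about the kind of test involved, not ROX's actual logic; the function name and the returned labels are made up.

```python
def classify(data):
    """Very rough text classification based on byte ranges only."""
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # ISO-8859-x has no printable characters at 0x80-0x9F (they are control
    # codes), whereas Windows code pages put quotes, dashes etc. there.
    if any(0x80 <= b <= 0x9F for b in data):
        return "non-ISO extended-ASCII"
    return "ISO-8859"


print(classify(b"plain text"))
print(classify(b"caf\xe9"))
print(classify(b"It\x92s"))
```

A check like this can say that a file is not ISO-8859-x, but it cannot say which Windows code page it is; that still needs a guess based on language or locale.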
Re: Abiword and Geany character coding issues (SOLVED)
Posted: Sat May 01, 2021 12:55 pm
by MochiMoppel
amethyst wrote: ↑Sat May 01, 2021 12:34 pm
Reloading file as ISO-8859-1= Gibberish
In other words: ISO-8859-1 =
Good, you are making progress.
amethyst wrote: ↑Sat May 01, 2021 12:34 pm
ROX identifies the file as non-iso extended ASCII English text which is probably what it is supposed to be.
Not "probably", it's exactly what ROX says: a non-iso extended ASCII. You quoted this ROX message earlier and it should have rung a bell.
amethyst wrote: ↑Sat May 01, 2021 11:01 am
I claim you are talking bullshit.
You may want to rephrase your embarrassing comment. It's never too late to admit a mistake.
Re: Abiword and Geany character coding issues (SOLVED)
Posted: Sun May 02, 2021 5:46 am
by amethyst
What mistake? Geany says it's ISO whatever. You said I must give you evidence of this and I did. Go fight with the Geany and Abiword people about their crap applications which can't recognize charsets correctly. Besides, we have seen many embarrassing moments from YOU on this thread. Gems like: "these are not apostrophes" and not knowing the difference between ANSI and ASCII. Cheers.
Re: Abiword and Geany character coding issues (SOLVED)
Posted: Sun May 02, 2021 8:13 am
by amethyst
Hi, rockedge. Can you lock this thread now please. Thanks.