Abiword and Geany character coding issues (SOLVED)

Re: Abiword and Geany character coding issues (SOLVED)

Post by MochiMoppel »

For those readers who wonder what this thread was all about and want to verify some of the claims that were made, I have created a demo file with part of the text used by the OP. I also attach a more readable screenshot, which hopefully makes clear that the "gibberish" contains valuable information, namely the hex value of the character that can't be interpreted by the reading application.

When removing the fake gz extension and allowing Abiword or Geany to open the file with automatic encoding detection, 3 of the characters will appear as blocks with a number in it. I briefly checked other applications and can confirm that Leafpad reads the file correctly. Worse than Abiword and Geany were the command-line editors e3 and mp, which would either skip the three problematic characters or replace them with useless ASCII characters like ~V.

Worst of all was LibreOffice 4.3.0.4. Unlike all other applications LibreOffice Writer failed with the international characters in the demo file. I couldn't even find an option to select an encoding. Maybe a newer version works better.

The demo file also contains some explanations of why I claim that the screenshot presented by the OP is a WINDOWS encoded file and can not be an ISO or IBM file.
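
For anyone who wants to reproduce the WINDOWS-vs-ISO difference without the demo file, here is a minimal Python sketch. The three byte values are my own assumption (typical Windows "smart" punctuation) and are not taken from the demo file. It shows that bytes in the range 0x80-0x9F are printable characters in WINDOWS-1252 but only C1 control codes in ISO-8859-1, which is presumably why an editor falls back to drawing a box with the hex value.

```python
# Bytes 0x80-0x9F are printable punctuation in WINDOWS-1252,
# but unprintable C1 control codes in ISO-8859-1.
import unicodedata

# Assumed sample: left/right curly double quotes and an en dash as CP1252 bytes.
sample = bytes([0x93, 0x94, 0x96])

as_windows = sample.decode("cp1252")
as_iso = sample.decode("iso-8859-1")

for byte, win_ch, iso_ch in zip(sample, as_windows, as_iso):
    print(
        f"0x{byte:02X}  cp1252: {win_ch!r} ({unicodedata.name(win_ch)})  "
        f"iso-8859-1: U+{ord(iso_ch):04X}, category {unicodedata.category(iso_ch)}"
    )
```

Category "Cc" in the last column means "control character", i.e. there is nothing to draw for it.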

WINDOWS-1252-demo.txt.fake.gz (2.22 KiB)
WINDOWS_encoded.png (42.57 KiB)

Post by puppy_apprentice »

From time to time I have problems with Polish text files in Geany and AbiWord (especially files made on Windows; sometimes Geany doesn't show anything, as if they were binary files). My solution is to open those files with Leafpad and save them again. After this, Geany and AbiWord open those files without problems.

Post by greengeek »

MochiMoppel wrote: Sat May 01, 2021 9:10 am

When removing the fake gz extension and allowing Abiword or Geany to open the file with automatic encoding detection, 3 of the characters will appear as blocks with a number in it.

Here is the comparison in my retouched Tahr 6.0.6 :

WINDOWS-1252-demo_TahrGeany.jpg (118.63 KiB)
WINDOWS-1252-demo_TahrLeafpad.jpg (115.64 KiB)

Post by amethyst »

why I claim that the screenshot presented by the OP is a WINDOWS encoded file and can not be an ISO or IBM file

I claim you are talking bullshit. Cheers.

Attachments
Screenshot.jpg (147.61 KiB)

Post by JakeSFR »

MochiMoppel wrote:

Worst of all was LibreOffice 4.3.0.4. Unlike all other applications LibreOffice Writer failed with the international characters in the demo file. I couldn't even find an option to select an encoding. Maybe a newer version works better.

LO-6.2.8.1: 'File -> Open -> Text - Choose Encoding (from the long list at the bottom right corner)'.
___________

puppy_apprentice wrote:

From time to time I have problems with Polish text files in Geany and AbiWord (especially files made on Windows; sometimes Geany doesn't show anything, as if they were binary files). My solution is to open those files with Leafpad and save them again. After this, Geany and AbiWord open those files without problems.

I looked into Leafpad's source and it takes into account the current locale (LANG) when choosing which codepage to use (see: src/encoding.c, get_encoding_code()).
So, if LANG is e.g. en_US.UTF-8 instead of pl_PL.UTF-8, Leafpad also fails to display Polish (non-UTF8) text correctly.
Seems like Geany and LO/Abiword don't use that method...
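
To illustrate the idea (this is not Leafpad's actual code), here is a rough Python sketch of a locale-driven fallback. The language-to-codepage table, the function names, and the "try UTF-8 first" step are all my own assumptions for the sake of the example.

```python
# Hypothetical language -> legacy code page table (illustration only,
# NOT copied from Leafpad's src/encoding.c).
LEGACY_BY_LANGUAGE = {
    "pl": "cp1250",
    "en": "cp1252",
}


def legacy_encoding_for(lang_value: str, default: str = "cp1252") -> str:
    """Pick a legacy code page from a LANG-style string such as 'pl_PL.UTF-8'."""
    language = lang_value.split("_")[0].split(".")[0].lower()
    return LEGACY_BY_LANGUAGE.get(language, default)


def decode_text(data: bytes, lang_value: str) -> str:
    """Assumed strategy: try UTF-8, then fall back to the locale's code page."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(legacy_encoding_for(lang_value), errors="replace")


polish = "Zażółć gęślą jaźń"
raw = polish.encode("cp1250")  # a "Windows-made" Polish text file

print(decode_text(raw, "pl_PL.UTF-8"))  # decoded with cp1250
print(decode_text(raw, "en_US.UTF-8"))  # same bytes decoded with cp1252
```

With the Polish locale the text comes back intact; with the English one the same bytes are read through the wrong code page and the accented letters turn into other characters.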

Another solution (that I use in Geany) is to 'Reload As -> East European -> Central European (WINDOWS-1250) [or ISO-8859-2]' and then 'Document -> Set Encoding -> Unicode -> Unicode (UTF-8)', and save it.
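
The same "reload as WINDOWS-1250, then save as UTF-8" step can also be scripted. A minimal Python sketch, assuming you already know the source encoding (the function name and the default encoding are just my choices for the example):

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1250") -> None:
    """Re-encode a text file to UTF-8, assuming it really is in source_encoding."""
    # Work on raw bytes so that line endings are passed through unchanged.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


# Small self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "polish_cp1250.txt"
    converted = Path(tmp) / "polish_utf8.txt"
    original.write_bytes("Zażółć gęślą jaźń\r\n".encode("cp1250"))

    convert_to_utf8(original, converted)
    print(converted.read_bytes().decode("utf-8"))
```

If the guessed source encoding is wrong, the script will happily write wrong characters as valid UTF-8, so check the result in an editor before overwriting anything.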
___________

amethyst wrote:

why I claim that the screenshot presented by the OP is a WINDOWS encoded file and can not be an ISO or IBM file

I claim you are talking bullshit. Cheers.
Image

This screenshot actually suggests that it's not ISO-8859-1 encoding.
If it really was, the text would not contain those faulty characters.
It only means that Geany *thinks* it's ISO-8859-1 and interprets it as such.

Anyway, here's a random online charset detector: https://nlp.fi.muni.cz/projects/chared/
It correctly detects Mochi's CP1252 demo, as well as CP1250 and ISO-8859-2 encoded Polish text files.
You could submit a relevant snippet there and see what it says.
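
To show why "Geany says ISO-8859-1" proves nothing: ISO-8859-1 assigns something to every byte value, so decoding with it never fails. Below is a crude Python sketch of a byte-range check (my own simplification, not what Geany or the online detector actually does; the function name and labels are made up for this example).

```python
def guess_single_byte_family(data: bytes) -> str:
    """Very rough heuristic, not a real charset detector."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # ISO-8859-x has only control codes in 0x80-0x9F, so ordinary text
    # containing such bytes is unlikely to be ISO-8859-x.
    if any(0x80 <= b <= 0x9F for b in data):
        return "non-iso"
    # Without such bytes, ISO-8859-x and WINDOWS-125x cannot be told apart here.
    return "iso-or-windows"


# Every possible byte "decodes" as ISO-8859-1, so success means nothing.
all_bytes = bytes(range(256))
print(len(all_bytes.decode("iso-8859-1")))

print(guess_single_byte_family(b"it\x92s"))    # 0x92 is in the C1 range
print(guess_single_byte_family(b"caf\xe9"))    # 0xE9 is valid in both families
```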

Greetings!

[O]bdurate [R]ules [D]estroy [E]nthusiastic [R]ebels => [C]reative [H]umans [A]lways [O]pen [S]ource
Omnia mea mecum porto.

Post by amethyst »

This screenshot actually suggests that it's not ISO-8859-1 encoding.
If it really was, the text would not contain those faulty characters.
It only means that Geany *thinks* it's ISO-8859-1 and interprets it as such.

Reloading the file as Windows-1252= No gibberish (displays correctly)
Reloading file as ISO-8859-1= Gibberish
So we must conclude that Abiword and Geany do not identify the encoding (or non-encoding) correctly. That's what my point was with this thread from the start. ROX identifies the file as non-iso extended ASCII English text which is probably what it is supposed to be.

Post by MochiMoppel »

amethyst wrote: Sat May 01, 2021 12:34 pm

Reloading file as ISO-8859-1= Gibberish

In other words: ISO-8859-1 = 🐂💩
Good, you are making progress.
 

amethyst wrote: Sat May 01, 2021 12:34 pm

ROX identifies the file as non-iso extended ASCII English text which is probably what it is supposed to be.

Not "probably", it's exactly what ROX says: a non-iso extended ASCII. You quoted this ROX message earlier and it should have rung a bell.
 

amethyst wrote: Sat May 01, 2021 11:01 am

I claim you are talking bullshit.

You may want to rephrase your embarrassing comment. It's never too late to admit a mistake.

Post by amethyst »

What mistake? Geany says it's ISO whatever. You said I must give you evidence of this and I did. Go fight with the Geany and Abiword people about their crap applications which can't recognize charsets correctly. Besides, we have seen many embarrassing moments from YOU on this thread. Gems like: "these are not apostrophes" and not knowing the difference between ANSI and ASCII. Cheers.

Post by amethyst »

Hi, rockedge. Can you lock this thread now please. Thanks.

Locked