Glitches in oldforum.puppylinux

Ideas and discussion


Post Reply
User avatar
MochiMoppel
Posts: 1137
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 18 times
Been thanked: 372 times

Glitches in oldforum.puppylinux

Post by MochiMoppel »

@rockedge Unicode characters are not rendered properly (see the post titles and messages of the French and German subforums). Is this fixable?

In the original Murga forum only the last forum "Truly off-topic conversations" was for members only, now also "Programming" and "Security" need a login. These forums should be open to the public and should still be accessible for Google links.

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

@MochiMoppel I am not sure yet! I noticed this and it occurred during the actual conversion. Even though I specifically set everything in UTF8-unicode-ci.

Looks like I need to see if I can do something to fix them all at once....or we're going to have to fix them manually.

The conversion tool did it somehow.

User avatar
MochiMoppel
Posts: 1137
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 18 times
Been thanked: 372 times

Re: Glitches in oldforum.puppylinux

Post by MochiMoppel »

@rockedge I think I understand now what happened and I doubt that the conversion can be reversed.
Your conversion tool did not treat the source text as Unicode. It interpreted each single byte as an Extended ASCII character (apparently codepage 1252). That sounds very odd but I found this pattern even for converted Japanese characters.

Here an example from a French post:
française
Should render 2-byte character ç (Unicode U+00E7 hex \xc3\xa7)
Tool interpreted each of the 2 bytes as the hex value of a single byte character:
\xc3 => Ã
\xa7 => §

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

@MochiMoppel there is a solution and we're going to have to script this ourselves. What if we wrote a PHP program that looks at the topic and post tables, identifies and locates the incorrectly encoded characters by directly accessing the phpBB database? The program would scan those tables looking for the known patterns and or identify the encoding and swap in the correction. I think it can be done, a fix tool we create. Something we can research and with some help I'll probably be able to write such a tool.

Well, at least give it a try. I have spare databases that are exact copies of the converted one, so mistakes can be made and recovered from without involving the production version

User avatar
MochiMoppel
Posts: 1137
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 18 times
Been thanked: 372 times

Re: Glitches in oldforum.puppylinux

Post by MochiMoppel »

rockedge wrote: Tue Apr 13, 2021 7:07 pm

@MochiMoppel there is a solution and we're going to have to script this ourselves. What if we wrote a PHP program that looks at the topic and post tables, identifies and locates the incorrectly encoded characters by directly accessing the phpBB database?

I'm afraid that only 95 ASCII7 + 3 control characters = 98 characters survived the conversion, which means that even if we exclude the majority of Unicode characters (Asian and those that can't be displayed by DejaVus Sans ) this still leaves thousands of patterns, and these patterns may even overlap (e.g. ㉇ and ㉠may represent 2 different characters)

The program would scan those tables looking for the known patterns and or identify the encoding and swap in the correction. I think it can be done, a fix tool we create.

With abundant time nothing is impossible, but is this realistic? You would need a huge database of potential patterns and even then I see a risk of wrecking the database. I understand that it is tempting to focus only on a few characters. E.g. in German language there are only 7 characters affected and probably a few more in French and Spanish, but this would fix only the most frequent glitches and they are not even serious, because the correct characters can be guessed from context. More serious are those that can't be guessed or - heaven forbid - appear in code listings.

I have spare databases that are exact copies of the converted one, so mistakes can be made and recovered from without involving the production version

Do you have a copy of the original one? Honestly I think the cleanest and probably fastest approach would be to redo the conversion. Since the Murga forum is frozen in time this would not be urgent and can be done whenever you have time. Feasible?

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

MochiMoppel wrote:

probably fastest approach would be to redo the conversion

Yes I do have the original database. How else would I have done this with out it? The conversion takes days to do and in 3-4 separate conversions going up a chain of versions. I have to go in steps.

But it is possible and once most of the glitches are under control I will perform another attempt. Just uploading the mysql dump takes a lot of time. The import is touchy because of the size of it all.

I would need to look at the converter tool to change some parameter to ensure UTF8-unicode

User avatar
MochiMoppel
Posts: 1137
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 18 times
Been thanked: 372 times

Re: Glitches in oldforum.puppylinux

Post by MochiMoppel »

rockedge wrote: Wed Apr 14, 2021 5:14 am

Yes I do have the original database. How else would I have done this with out it?

Lost in translation. You made a point that you have copies of the "converted" database, which I understood being the database after conversion, which wouldn't suffice and made me worry if you still have access to the original. Never mind, it's clear now.

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

the other problem is the original forum version runs on PHP 5.4 and already by PHP 5.9 that no longer is the case. So the first step is from 2.0.21 -> 2.0.23 and the "oldest" web server I have on hand ready to go is PHP 5.9. Of course 2.0.21 does not run on 5.9. So the conversion starts and each next version the database structure is changing. So basically it's on server #1 for 2.0.21 -> 2.0.23 -> 3.0.7pl1 -> 3.0.14

Now that newly converted database and entire file system needs to be transferred to the next web server on a totally different machine.

So now on web server #2 with a Hiawatha web server and PHP 7.2, I take 3.0.14 (which does not run cleanly on PHP 7) and convert it to 3.3.3 which took several tries. As a matter of fact, all of the steps took multiple tries with mistakes and corrections and improvements

Importing the database from machine #1 to machine #2 took 27 hours and that's doing it directly from the command line.

Then comes the kicker......transferring the 3.3.3 database to the host in England. No command line possible so I run a PHP mysql importer that does it in stages. Also hours and hours. Each step the slightest error the smallest mistake will crash the conversion.

User avatar
MochiMoppel
Posts: 1137
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 18 times
Been thanked: 372 times

Re: Glitches in oldforum.puppylinux

Post by MochiMoppel »

😱Looks like your initial idea is more realistic.

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

@MochiMoppel I sent you a PM! I need some of your expertise to help write an Apache URL rewrite.

And HELLO everyone! You might see some weird errors as I am now on the murga VPS and working on some experiments to keep Links working throughout this transition

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

@MochiMoppel

This is what we need to do I think: detecting-utf8-broken-characters-in-mysql

this is a beginning of what looks promising.

Code: Select all

update `table` set `field` = replace(`field`, 'â€', '"');
update `table` set `field` = replace(`field`, '–', '–');
update `table` set `field` = replace(`field`, '•', '-');
update `table` set `field` = replace(`field`, '“', '"');

update `table` set `field` = replace(`field`, '¡', '¡');
update `table` set `field` = replace(`field`, '¢', '¢');
update `table` set `field` = replace(`field`, '£', '£');
update `table` set `field` = replace(`field`, '¤', '¤');
update `table` set `field` = replace(`field`, 'Â¥', '¥');
update `table` set `field` = replace(`field`, '¦', '¦');
update `table` set `field` = replace(`field`, '§', '§');
update `table` set `field` = replace(`field`, '¨', '¨');
update `table` set `field` = replace(`field`, '©', '©');
update `table` set `field` = replace(`field`, 'ª', 'ª');
update `table` set `field` = replace(`field`, '«', '«');
update `table` set `field` = replace(`field`, '¬', '¬');
# This one looks like it's missing a character, but it's there. 0xad
update `table` set `field` = replace(`field`, '­', '­');
update `table` set `field` = replace(`field`, '®', '®');
update `table` set `field` = replace(`field`, '¯', '¯');
update `table` set `field` = replace(`field`, '°', '°');
update `table` set `field` = replace(`field`, '±', '±');
update `table` set `field` = replace(`field`, '²', '²');
update `table` set `field` = replace(`field`, '³', '³');
update `table` set `field` = replace(`field`, '´', '´');
update `table` set `field` = replace(`field`, 'µ', 'µ');
update `table` set `field` = replace(`field`, '¶', '¶');
update `table` set `field` = replace(`field`, '·', '·');
update `table` set `field` = replace(`field`, '¸', '¸');
update `table` set `field` = replace(`field`, '¹', '¹');
update `table` set `field` = replace(`field`, 'º', 'º');
update `table` set `field` = replace(`field`, '»', '»');
update `table` set `field` = replace(`field`, '¼', '¼');
update `table` set `field` = replace(`field`, '½', '½');
update `table` set `field` = replace(`field`, '¾', '¾');
update `table` set `field` = replace(`field`, '¿', '¿');

update `table` set `field` = replace(`field`, 'À', 'À');
# This one looks like it's missing a character, but it's there. 0x81
update `table` set `field` = replace(`field`, 'Ã', 'Á');
update `table` set `field` = replace(`field`, 'Â', 'Â');
update `table` set `field` = replace(`field`, 'Ã', 'Ã');
update `table` set `field` = replace(`field`, 'Ä', 'Ä');
update `table` set `field` = replace(`field`, 'Ã…', 'Å');
update `table` set `field` = replace(`field`, 'Æ', 'Æ');
update `table` set `field` = replace(`field`, 'Ç', 'Ç');
update `table` set `field` = replace(`field`, 'È', 'È');
update `table` set `field` = replace(`field`, 'É', 'É');
update `table` set `field` = replace(`field`, 'Ê', 'Ê');
update `table` set `field` = replace(`field`, 'Ë', 'Ë');
update `table` set `field` = replace(`field`, 'ÃŒ', 'Ì');
# This one looks like it's missing a character, but it's there. 0x8d
update `table` set `field` = replace(`field`, 'Ã', 'Í');
update `table` set `field` = replace(`field`, 'ÃŽ', 'Î');
# This one looks like it's missing a character, but it's there. 0x8f
update `table` set `field` = replace(`field`, 'Ã', 'Ï');
# This one looks like it's missing a character, but it's there. 0x90
update `table` set `field` = replace(`field`, 'Ã', 'Ð');
update `table` set `field` = replace(`field`, 'Ñ', 'Ñ');
update `table` set `field` = replace(`field`, 'Ã’', 'Ò');
update `table` set `field` = replace(`field`, 'Ó', 'Ó');
update `table` set `field` = replace(`field`, 'Ô', 'Ô');
update `table` set `field` = replace(`field`, 'Õ', 'Õ');
update `table` set `field` = replace(`field`, 'Ö', 'Ö');
update `table` set `field` = replace(`field`, '×', '×');
update `table` set `field` = replace(`field`, 'Ø', 'Ø');
update `table` set `field` = replace(`field`, 'Ù', 'Ù');
update `table` set `field` = replace(`field`, 'Ú', 'Ú');
update `table` set `field` = replace(`field`, 'Û', 'Û');
update `table` set `field` = replace(`field`, 'Ãœ', 'Ü');
# This one looks like it's missing a character, but it's there. 0x9d
update `table` set `field` = replace(`field`, 'Ã', 'Ý');
update `table` set `field` = replace(`field`, 'Þ', 'Þ');
update `table` set `field` = replace(`field`, 'ß', 'ß');
update `table` set `field` = replace(`field`, 'Ã ', 'à');
update `table` set `field` = replace(`field`, 'á', 'á');
update `table` set `field` = replace(`field`, 'â', 'â');
update `table` set `field` = replace(`field`, 'ã', 'ã');
update `table` set `field` = replace(`field`, 'ä', 'ä');
update `table` set `field` = replace(`field`, 'Ã¥', 'å');
update `table` set `field` = replace(`field`, 'æ', 'æ');
update `table` set `field` = replace(`field`, 'ç', 'ç');
update `table` set `field` = replace(`field`, 'è', 'è');
update `table` set `field` = replace(`field`, 'é', 'é');
update `table` set `field` = replace(`field`, 'ê', 'ê');
update `table` set `field` = replace(`field`, 'ë', 'ë');
update `table` set `field` = replace(`field`, 'ì', 'ì');
# This one looks like it's missing a character, but it's there. 0xad
update `table` set `field` = replace(`field`, '­­Ã', 'í');
update `table` set `field` = replace(`field`, 'î', 'î');
update `table` set `field` = replace(`field`, 'ï', 'ï');
update `table` set `field` = replace(`field`, 'ð', 'ð');
update `table` set `field` = replace(`field`, 'ñ', 'ñ');
update `table` set `field` = replace(`field`, 'ò', 'ò');
update `table` set `field` = replace(`field`, 'ó', 'ó');
update `table` set `field` = replace(`field`, 'ô', 'ô');
update `table` set `field` = replace(`field`, 'õ', 'õ');
update `table` set `field` = replace(`field`, 'ö', 'ö');
update `table` set `field` = replace(`field`, '÷', '÷');
update `table` set `field` = replace(`field`, 'ø', 'ø');
update `table` set `field` = replace(`field`, 'ù', 'ù');
update `table` set `field` = replace(`field`, 'ú', 'ú');
update `table` set `field` = replace(`field`, 'û', 'û');
update `table` set `field` = replace(`field`, 'ü', 'ü');
update `table` set `field` = replace(`field`, 'ý', 'ý');
update `table` set `field` = replace(`field`, 'þ', 'þ');
update `table` set `field` = replace(`field`, 'ÿ', 'ÿ');

update `table` set `field` = replace(`field` ,'ç','ç');
update `table` set `field` = replace(`field` ,'ã','ã');
update `table` set `field` = replace(`field` ,'á','á');
update `table` set `field` = replace(`field` ,'â','â');
update `table` set `field` = replace(`field` ,'é','é');
update `table` set `field` = replace(`field` ,'í','í');
update `table` set `field` = replace(`field` ,'õ','õ');
update `table` set `field` = replace(`field` ,'ú','ú');
update `table` set `field` = replace(`field` ,'ç','ç');
update `table` set `field` = replace(`field` ,'Á','Á');
update `table` set `field` = replace(`field` ,'Â','Â');
update `table` set `field` = replace(`field` ,'É','É');
update `table` set `field` = replace(`field` ,'Í','Í');
update `table` set `field` = replace(`field` ,'Õ','Õ');
update `table` set `field` = replace(`field` ,'Ú','Ú');
update `table` set `field` = replace(`field` ,'Ç','Ç');
update `table` set `field` = replace(`field` ,'Ã','Ã');
update `table` set `field` = replace(`field` ,'À','À');
update `table` set `field` = replace(`field` ,'Ê','Ê');
update `table` set `field` = replace(`field` ,'Ó','Ó');
update `table` set `field` = replace(`field` ,'Ô','Ô');
update `table` set `field` = replace(`field` ,'Ü','Ü');
update `table` set `field` = replace(`field` ,'ã','ã');
update `table` set `field` = replace(`field` ,'à','à');
update `table` set `field` = replace(`field` ,'ê','ê');
update `table` set `field` = replace(`field` ,'ó','ó');
update `table` set `field` = replace(`field` ,'ô','ô');
update `table` set `field` = replace(`field` ,'ü','ü');
update `table` set `field` = replace(`field` ,'&','&');
update `table` set `field` = replace(`field` ,'>','>');
update `table` set `field` = replace(`field` ,'&lt;','<');
update `table` set `field` = replace(`field` ,'&circ;','ˆ');
update `table` set `field` = replace(`field` ,'&tilde;','˜');
update `table` set `field` = replace(`field` ,'&uml;','¨');
update `table` set `field` = replace(`field` ,'&cute;','´');
update `table` set `field` = replace(`field` ,'&cedil;','¸');
update `table` set `field` = replace(`field` ,'&quot;','"');
update `table` set `field` = replace(`field` ,'&ldquo;','“');
update `table` set `field` = replace(`field` ,'&rdquo;','”');
update `table` set `field` = replace(`field` ,'&lsquo;','‘');
update `table` set `field` = replace(`field` ,'&rsquo;','’');
update `table` set `field` = replace(`field` ,'&lsaquo;','‹');
update `table` set `field` = replace(`field` ,'&rsaquo;','›');
update `table` set `field` = replace(`field` ,'&laquo;','«');
update `table` set `field` = replace(`field` ,'&raquo;','»');
update `table` set `field` = replace(`field` ,'&ordm;','º');
update `table` set `field` = replace(`field` ,'&ordf;','ª');
update `table` set `field` = replace(`field` ,'&ndash;','–');
update `table` set `field` = replace(`field` ,'&mdash;','—');
update `table` set `field` = replace(`field` ,'&macr;','¯');
update `table` set `field` = replace(`field` ,'&hellip;','…');
update `table` set `field` = replace(`field` ,'&brvbar;','¦');
update `table` set `field` = replace(`field` ,'&bull;','•');
update `table` set `field` = replace(`field` ,'&para;','¶');
update `table` set `field` = replace(`field` ,'&sect;','§');
update `table` set `field` = replace(`field` ,'&sup1;','¹');
update `table` set `field` = replace(`field` ,'&sup2;','²');
update `table` set `field` = replace(`field` ,'&sup3;','³');
update `table` set `field` = replace(`field` ,'&frac12;','½');
update `table` set `field` = replace(`field` ,'&frac14;','¼');
update `table` set `field` = replace(`field` ,'&frac34;','¾');
update `table` set `field` = replace(`field` ,'&#8539;','⅛');
update `table` set `field` = replace(`field` ,'&#8540;','⅜');
update `table` set `field` = replace(`field` ,'&#8541;','⅝');
update `table` set `field` = replace(`field` ,'&#8542;','⅞');
update `table` set `field` = replace(`field` ,'&gt;','>');
update `table` set `field` = replace(`field` ,'&lt;','<');
update `table` set `field` = replace(`field` ,'&plusmn;','±');
update `table` set `field` = replace(`field` ,'&minus;','−');
update `table` set `field` = replace(`field` ,'&times;','×');
update `table` set `field` = replace(`field` ,'&divide;','÷');
update `table` set `field` = replace(`field` ,'&lowast;','∗');
update `table` set `field` = replace(`field` ,'&frasl;','⁄');
update `table` set `field` = replace(`field` ,'&permil;','‰');
update `table` set `field` = replace(`field` ,'&int;','∫');
update `table` set `field` = replace(`field` ,'&sum;','∑');
update `table` set `field` = replace(`field` ,'&prod;','∏');
update `table` set `field` = replace(`field` ,'&radic;','√');
update `table` set `field` = replace(`field` ,'&infin;','∞');
update `table` set `field` = replace(`field` ,'&asymp;','≈');
update `table` set `field` = replace(`field` ,'&cong;','≅');
update `table` set `field` = replace(`field` ,'&prop;','∝');
update `table` set `field` = replace(`field` ,'&equiv;','≡');
update `table` set `field` = replace(`field` ,'&ne;','≠');
update `table` set `field` = replace(`field` ,'&le;','≤');
update `table` set `field` = replace(`field` ,'&ge;','≥');
update `table` set `field` = replace(`field` ,'&there4;','∴');
update `table` set `field` = replace(`field` ,'&sdot;','⋅');
update `table` set `field` = replace(`field` ,'&middot;','·');
update `table` set `field` = replace(`field` ,'&part;','∂');
update `table` set `field` = replace(`field` ,'&image;','ℑ');
update `table` set `field` = replace(`field` ,'&real;','ℜ');
update `table` set `field` = replace(`field` ,'&prime;','′');
update `table` set `field` = replace(`field` ,'&Prime;','″');
update `table` set `field` = replace(`field` ,'&deg;','°');
update `table` set `field` = replace(`field` ,'&ang;','∠');
update `table` set `field` = replace(`field` ,'&perp;','⊥');
update `table` set `field` = replace(`field` ,'&nabla;','∇');
update `table` set `field` = replace(`field` ,'&oplus;','⊕');
update `table` set `field` = replace(`field` ,'&otimes;','⊗');
update `table` set `field` = replace(`field` ,'&alefsym;','ℵ');
update `table` set `field` = replace(`field` ,'&oslash;','ø');
update `table` set `field` = replace(`field` ,'&Oslash;','Ø');
update `table` set `field` = replace(`field` ,'&isin;','∈');
update `table` set `field` = replace(`field` ,'&notin;','∉');
update `table` set `field` = replace(`field` ,'&cap;','∩');
update `table` set `field` = replace(`field` ,'&cup;','∪');
update `table` set `field` = replace(`field` ,'&sub;','⊂');
update `table` set `field` = replace(`field` ,'&sup;','⊃');
update `table` set `field` = replace(`field` ,'&sube;','⊆');
update `table` set `field` = replace(`field` ,'&supe;','⊇');
update `table` set `field` = replace(`field` ,'&exist;','∃');
update `table` set `field` = replace(`field` ,'&forall;','∀');
update `table` set `field` = replace(`field` ,'&empty;','∅');
update `table` set `field` = replace(`field` ,'&not;','¬');
update `table` set `field` = replace(`field` ,'&and;','∧');
update `table` set `field` = replace(`field` ,'&or;','∨');
update `table` set `field` = replace(`field` ,'&crarr;','↵');
User avatar
MochiMoppel
Posts: 1137
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 18 times
Been thanked: 372 times

Re: Glitches in oldforum.puppylinux

Post by MochiMoppel »

@rockedge I don't understand the need for the first 4 patterns and I doubt that they work (after replacing †is will be impossible to find “) and I also don't understand the need to convert HTML entities (if such entities are used they normally are used on purpose and should not be converted to the characters they represent. I haven't seen a case where the database conversion turned let's say a 'ü' into a '&uuml;' )

The rest in the middle looks good and should be able to take care of common Roman characters :thumbup2:

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

@MochiMoppel

I did some fixes with encoding. Some worked well but the French language sub forum titles are still wrong. What might be happening is double or triple conversion. That the uft8 was converted again to utf8. This is the SQL code I used directly in phpmyadmin SQL console to convert the individual tables:

Code: Select all

UPDATE `phpbb_posts` SET `post_text`= CONVERT(CAST(CONVERT(`post_text` USING latin1) AS BINARY) USING utf8)
UPDATE `phpbb_forums` SET `forum_name`= CONVERT(CAST(CONVERT(`forum_name` USING latin1) AS BINARY) USING utf8)

I could create a PHP script that would loop through an array with the table names to convert and initiate a query but I went and did it manually directly to the database tables that contain the text. I tested it out on the development/test platforms several times which entails moving around large SQL dumps so it is time consuming to run just one test. But was mostly successful except for the French language titles of the topics. Like I mentioned, it could be the encoding conversion doubly encoded fields it shouldn't have.

Perhaps you can track down which encoding those French sub forum topic titles might be in?

User avatar
MochiMoppel
Posts: 1137
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 18 times
Been thanked: 372 times

Re: Glitches in oldforum.puppylinux

Post by MochiMoppel »

rockedge wrote: Wed Apr 21, 2021 1:16 pm

I did some fixes with encoding.

Did you use the "promising" replacement patterns? Where? Only on the test platform or on the productive system? I'm asking because I see that the post texts in the German/French/Spanish subforums seem all OK now.

Some worked well but the French language sub forum titles are still wrong.

German and Spanish topic titles are also wrong.

This is the SQL code I used directly in phpmyadmin SQL console to convert the individual tables:

Code: Select all

UPDATE `phpbb_posts` SET `post_text`= CONVERT(CAST(CONVERT(`post_text` USING latin1) AS BINARY) USING utf8)
UPDATE `phpbb_forums` SET `forum_name`= CONVERT(CAST(CONVERT(`forum_name` USING latin1) AS BINARY) USING utf8)

Sorry, I don't speak SQL or PHP. If this tries to convert from Latin1 to UTF8 I don't understand the objective. For example the French résolu consists of 7 letters of the Latin 1 codepage. Converting them to UTF8 will encode them to 5 letters of Unicode block "Basic Latin" and 2 letters of Unicode block "Latin-1 Supplement", but this is irrelevant for the reader because the resulting string is the same: résolu. Garbled strings remain garbled.

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

@MochiMoppel the basic idea is the query takes the individual database table and column for example

Code: Select all

UPDATE `phpbb_topic` SET `topic_title`= CONVERT(CAST(CONVERT(`topic_title` USING latin1) AS BINARY) USING utf8)

and first converts the table->column encoding from latin1 to BINARY then encodes from BINARY to UTF8. So the idea is translate the database table in one query from LATIN1 -> BINARY -> UTF8.

This seems to have worked with the actual POST data, and but not on the Topic Title so I am working on 2 cloned databases (just in case) to try out different methods to fix the Topic Titles.

User avatar
MochiMoppel
Posts: 1137
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 18 times
Been thanked: 372 times

Re: Glitches in oldforum.puppylinux

Post by MochiMoppel »

MochiMoppel wrote: Wed Apr 21, 2021 3:03 pm

Did you use the "promising" replacement patterns?

Let me rephrase: Did you try it?

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

No. Not yet. I have to write some code to try it out. It might be the best way.

I spent the last 20 hours copying the test database into a second copy before I try it. Some time today I'll run it.

User avatar
MochiMoppel
Posts: 1137
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 18 times
Been thanked: 372 times

Re: Glitches in oldforum.puppylinux

Post by MochiMoppel »

rockedge wrote: Thu Apr 22, 2021 2:39 pm

Some time today I'll run it.

Did it work?
The post subjects seem to be fixed now. Not much left to fix.
If you did this with the "LATIN1 -> BINARY -> UTF8" method and not with the "promising" method then this is much superior to any "promising" SQL/sed/awk workaround, isn't it. Still looks like a miracle to me, but sometimes miracles happen :)

User avatar
puppy_apprentice
Posts: 662
Joined: Tue Oct 06, 2020 8:43 pm
Location: land of bigos and schabowy ;)
Has thanked: 4 times
Been thanked: 108 times

Re: Glitches in oldforum.puppylinux

Post by puppy_apprentice »

The browsers padlock is broken for old forum.
Image

have to use third-party picture service because attached picture is broken - uploading error?
have to use third-party picture service because attached picture is broken - uploading error?
http_not_https.jpg (42.88 KiB) Viewed 357 times
User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

@MochiMoppel I think we are close to finished. The Latin1 > Binary > UTF8 method seems to work well.

Now the problem is I am finding html tags that are not correct at all (<r>, <E>) that somehow in the conversion were placed in the text in some posts where the post has many Quotes and emoji's and this causes some topic pages to show up blank. Some posts I fixed manually would show pages 3 through 5 but not 1 or 2. I go into the mysql table and remove these false html tags and the topic/post works again.

So there might be a reason to do the awk/sed search and replace to fix that. Other wise I do have the other converted database runs that seem to not have this problem. Maybe I might try to swap in and change the database by importing one of the versions that do not have these tags in them. But that involves large data imports and I do not want to break what already works.

The converter tool translated some character patterns into html tags that don't exist and should not be in the mysql post_text table. The latest test conversions I did on test platforms did not have this problem it seems.

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

@puppy_apprentice I have been looking at that also to remove the broken lock. I have it now gone when I'm logged in by changing my avatar's address to use https (Gravatar) in my personal board preferences.

We are working towards tracking down loose ends.

User avatar
puppy_apprentice
Posts: 662
Joined: Tue Oct 06, 2020 8:43 pm
Location: land of bigos and schabowy ;)
Has thanked: 4 times
Been thanked: 108 times

Re: Glitches in oldforum.puppylinux

Post by puppy_apprentice »

Ok. I think that some pictures of AlanStyle-Subsilver urls should be changed from http:// to https:// like main banner of old forum (murga_forum_header110.png).

I see this when i try to login. It says that login is insecure:
Image

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

That's the odd thing, I have all the images as https for the header banner. I don't have the broken lock at the moment on the oldforum so I'll test it out on one of the development platforms to see if I can find out more.

perdido

Re: Glitches in oldforum.puppylinux

Post by perdido »

Just a heads up, looking kind of strange over at the old forum, replies missing subject headers on the main page.
Maybe there is some maintenance happening and it will correct itself.....

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

@perdido Are you still encountering it?

I did run a database backup that might have caused that error.

perdido

Re: Glitches in oldforum.puppylinux

Post by perdido »

Hi rockedge,
As of right now, yes. 4 different browsers show it the same.

User avatar
Makoto
Posts: 76
Joined: Thu Jul 16, 2020 6:29 am
Has thanked: 10 times
Been thanked: 8 times

Re: Glitches in oldforum.puppylinux

Post by Makoto »

I've seen other forums with the more recent versions of phpBB do that... I haven't looked into it, or tried it with different browsers, but for me it did seem to happen with replies to a thread (and not the post that starts a thread).

Edit - yeah, for me, it's only happening with replies to threads that do NOT have a title/subject line of their own added by the user replying. I see it both on the main index listing and in the posts themselves (it doesn't carry over the thread's title for every post, and only posts with added titles will... well, show a title).

Tested it with both Firefox and Pale Moon. It doesn't happen here, just on the old forum.

[ FossaPup64 9.5, Frugal install ]

perdido

Re: Glitches in oldforum.puppylinux

Post by perdido »

Makoto wrote: Wed May 05, 2021 6:43 am

Edit - yeah, for me, it's only happening with replies to threads that do NOT have a title/subject line of their own added by the user replying. I see it both on the main index listing and in the posts themselves (it doesn't carry over the thread's title for every post, and only posts with added titles will... well, show a title).

Somebody is going to have to re-add all those missing title/subject lines manually then? :D

User avatar
rockedge
Site Admin
Posts: 5842
Joined: Mon Dec 02, 2019 1:38 am
Location: Connecticut,U.S.A.
Has thanked: 2093 times
Been thanked: 2186 times
Contact:

Re: Glitches in oldforum.puppylinux

Post by rockedge »

I am working on a better copy of the converted database right now. I have noticed problems with subject lines and topic titles among other problems. There are stray html tags or encoding of characters that are incorrect, due to some process of the conversion tool, all through the database tables. These incorrectly encoded characters as well as the stray html tags that actually don't exist, cause the parsing of the pages to fail.

All the data is there in the current database, BUT the badly encoded junk and strangely converted incorrect html tags is causing the pages to be blank or show an error.

Now after doing the conversion of the original to the latest many more times, the process has improved. I just learned more by doing the entire process over again and again, on several different versions of web server combinations.

My idea is when I have this latest version of the database all squared away is to replace the current one with it. I rushed the first working version into production status to just get the murga forum up, with the idea in mind to keep on improving it once the glitches were discovered.

Otherwise....yes.....it will have to be done manually.....I've already done a bunch as I ran across them, but it involves direct MySQL database editing most of the time. So I am very interested in getting this database I'm working with now, finished with all the glitches addressed and fixed.

Oh and the settings for each user in their UCP that enables BBCode and emoji's in their Signature field, has flipped to "disabled" and so now any one with a signature field might notice there signatures don't render they'll need to go to their user control panel and select "signature" and uncheck those 2 boxes. This should be this way by default, but something in the conversion changed it to "disabled"

User avatar
mikewalsh
Moderator
Posts: 5669
Joined: Tue Dec 03, 2019 1:40 pm
Location: King's Lynn, UK
Has thanked: 606 times
Been thanked: 1742 times

Re: Glitches in oldforum.puppylinux

Post by mikewalsh »

@rockedge :-

Hah! I wondered why my siggie wasn't showing up as it should. I 'flipped' those two checkboxes back again, and everything's now behaving itself.

Thanks for the info! :thumbup:

Mike. ;)

Puppy "stuff" ~ MORE Puppy "stuff" ~ ....and MORE! :D
_______________________________________________________

Image

Post Reply

Return to “Forum Organization & Structure Council”