sed doesn't find all file names

amethyst
Posts: 2470
Joined: Tue Dec 22, 2020 6:35 am
Has thanked: 57 times
Been thanked: 520 times

sed doesn't find all file names

Post by amethyst »

Here is an extract from a line in SFR's multicopypaste script:

NAME="$(echo "$1" | sed -e 's~&~\&amp;~g' -e 's~<~\&lt;~g' -e 's~>~\&gt;~g')"

I find that in some (rare) cases the above sed command does not pick up all names, and the operation then fails. For instance, working with a file named "Spaarrekeningstaat%20-%2013_02_2021.pdf" fails with a "No such file or directory" error. Any ideas how the sed command can be changed to handle ALL file names? Cheers.
PS: Dragging and dropping this file (without using the application) works, so it can be copied/moved.
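
For reference, here is a minimal sketch of what the quoted line does to a name. The function name `escape_markup` is hypothetical and not part of multicopypaste; it only wraps the same three sed expressions. It suggests that sed rewrites only `&`, `<` and `>`, and that a `%` passes through untouched:

```shell
#!/bin/bash
# Hypothetical wrapper around the sed call quoted above (the name is an assumption).
escape_markup() {
    printf '%s\n' "$1" | sed -e 's~&~\&amp;~g' -e 's~<~\&lt;~g' -e 's~>~\&gt;~g'
}

escape_markup 'Tom & Jerry <1>.pdf'                       # the three markup characters are escaped
escape_markup 'Spaarrekeningstaat%20-%2013_02_2021.pdf'   # printed unchanged
```

If the second name comes out unchanged, the sed command itself is probably not what loses the file.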

JakeSFR
Posts: 287
Joined: Wed Jul 15, 2020 2:23 pm
Been thanked: 171 times

Re: sed question

Post by JakeSFR »

It's ROX's fault.
When a filename contains percent-encoded hex codes (URL-style %xx sequences), ROX can replace them with the characters they encode in some circumstances.

For example, when you right-click the Spaarrekeningstaat%20-%2013_02_2021.pdf file and use 'Open With/Send To', ROX sends the invalid filename Spaarrekeningstaat - 13_02_2021.pdf to the target application.
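
To illustrate the effect, here is a rough sketch of percent-decoding in bash. It is not ROX's actual code, and the function name `percent_decode` is made up for this example; it only mimics the conversion described above:

```shell
#!/bin/bash
# Illustration only: percent-decoding as ROX appears to apply it (not ROX's code).
percent_decode() {
    local s=$1 out= ch
    while [[ -n $s ]]; do
        if [[ $s =~ ^%([0-9a-fA-F]{2}) ]]; then
            printf -v ch '%b' "\\x${BASH_REMATCH[1]}"   # turn the two hex digits into one byte
            out+=$ch
            s=${s:3}
        else
            out+=${s:0:1}                               # copy any other character unchanged
            s=${s:1}
        fi
    done
    printf '%s\n' "$out"
}

percent_decode 'Spaarrekeningstaat%20-%2013_02_2021.pdf'
```

The decoded result is the name with plain spaces, which no longer matches the file on disk.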

It's fixed in jun7's fork:
https://github.com/jun7/rox-filer/issues/197

Greetings!

[O]bdurate [R]ules [D]estroy [E]nthusiastic [R]ebels => [C]reative [H]umans [A]lways [O]pen [S]ource
Omnia mea mecum porto.
MochiMoppel
Posts: 1343
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 22 times
Been thanked: 521 times

Re: sed question

Post by MochiMoppel »

@JakeSFR I don't understand how this answers the sed question.
Assuming that $1 holds the value 'Spaarrekeningstaat%20-%2013_02_2021.pdf' and further assuming that the sed command is supposed to convert characters that conflict with Pango markup, e.g. '&' and '<', to HTML entities, wouldn't the '%' then also have to be converted? For reasons I can't remember, I have '%' in my list of critical characters. It should be OK, but it must have caused trouble in the past ...


Re: sed question

Post by JakeSFR »

@MochiMoppel: No, because the filename processed by 'sed' in that line no longer contains those '%xy' entries (ROX replaces them beforehand), so there's nothing to convert.

What MultiCopyPaste receives from ROX is, e.g., 'abc def', regardless of whether the actual filename was 'abc def' or 'abc%20def'.

I don't see any good and clean way to fix that ambiguity at the script level, only in ROX itself...

Greetings!


Re: sed question

Post by amethyst »

It seems to me that the combination of % and a digit in the file name creates this problem. I've tried other weird characters and names, and they seem to work without problems.

To be clear: you can have multiple % in a name, no problem. You can have other characters (not %) and numbers in the name, no problem. % and other characters, no problem. % and numbers, no go.


Re: sed question

Post by MochiMoppel »

JakeSFR wrote:

I don't see any good and clean way to fix that ambiguity at the script level, only in ROX itself...

Maybe something like this:

Code: Select all

[[ -e $1 ]] && NAME=$1 || NAME=${1// /%20}
NAME="$(echo "$NAME" | sed -e 's~&~\&amp;~g' -e 's~<~\&lt;~g' -e 's~>~\&gt;~g')"

Re: sed question

Post by amethyst »

MochiMoppel wrote: Thu Feb 25, 2021 2:20 pm
JakeSFR wrote:

I don't see any good and clean way to fix that ambiguity at the script level, only in ROX itself...

Maybe something like this:

Code: Select all

[[ -e $1 ]] && NAME=$1 || NAME=${1// /%20}
NAME="$(echo "$NAME" | sed -e 's~&~\&amp;~g' -e 's~<~\&lt;~g' -e 's~>~\&gt;~g')"

All combinations of % and digits?


Re: sed question

Post by JakeSFR »

MochiMoppel wrote: Thu Feb 25, 2021 2:20 pm
JakeSFR wrote:

I don't see any good and clean way to fix that ambiguity at the script level, only in ROX itself...

Maybe something like this:

Code: Select all

[[ -e $1 ]] && NAME=$1 || NAME=${1// /%20}
NAME="$(echo "$NAME" | sed -e 's~&~\&amp;~g' -e 's~<~\&lt;~g' -e 's~>~\&gt;~g')"

Yeah, but the problem is that %20 is not the only one; there could be many combinations.
And what if both 'abc def' and 'abc%20def' files exist in a given location? Which one is the right one?

Anyway, I did something like that, but more extensively, in UExtract's AppRun at some point (before it got properly fixed in ROX), but it's neither good nor clean and I'm far from proud of that hack.
Actually, I might want to remove it in the next version:

Code: Select all

# Try to properly handle filenames with HTML % triplets, which are replaced by ROX to ASCII characters.
# Fixed by jun7 in https://github.com/jun7/rox-filer/commit/5a3126c7e9780007ebd318dae0a6403027cb0d79
for FILE in $(seq 0 $((${#IFILES[@]}-1))); do
	if [ ! -f "${IFILES[$FILE]}" ]; then
		TMPFILENAME="${IFILES[$FILE]}"
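		# Entries with a leading backslash are escaped so that they match literally as patterns.
		for i in ' ' '!' '\*' "\'" '(' ')' ';' ':' '@' '&' '=' '+' ',' '\/' '\?' '%' '#' '[' ']'; do
			# ${i#\\} drops that backslash so printf encodes the character itself, not '\' (%5c).
			TMPFILENAME="${TMPFILENAME//$i/$(printf "%%%x" "'${i#\\}")}"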
			[ -f "$TMPFILENAME" ] && IFILES[$FILE]="$TMPFILENAME" && break
		done
	fi
done

It works in the most common cases, when there's only one specific '%xy' value in the real filename, but won't work in cases like 'abc%20def%23ghi'.

amethyst wrote:

All combinations of % and digits?

Yeah, basically all possible combinations (or, the other way around, all files available at a given location) would have to be checked, which is quite insane...
And even then, there's still room for ambiguity.
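
To make the "other way around" idea concrete, here is a rough sketch. It assumes bash, and both function names (`percent_decode`, `find_candidates`) are invented for this example. It scans the directory of the received path and prints every file whose percent-decoded name equals the received name:

```shell
#!/bin/bash
# Sketch only: list every file in the same directory whose percent-decoded
# name equals the name received from ROX. Function names are assumptions.
percent_decode() {
    local s=$1 out= ch
    while [[ -n $s ]]; do
        if [[ $s =~ ^%([0-9a-fA-F]{2}) ]]; then
            printf -v ch '%b' "\\x${BASH_REMATCH[1]}"; out+=$ch; s=${s:3}
        else
            out+=${s:0:1}; s=${s:1}
        fi
    done
    printf '%s' "$out"
}

find_candidates() {
    local received=$1 dir f
    dir=$(dirname -- "$received")
    for f in "$dir"/*; do                      # note: hidden files are not scanned
        [[ -e $f ]] || continue
        if [[ $(percent_decode "${f##*/}") == "${received##*/}" ]]; then
            printf '%s\n' "$f"
        fi
    done
}
```

More than one line of output means the name is ambiguous, as with 'abc def' and 'abc%20def' side by side, and the script cannot tell which file was meant.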

Greetings!


Re: sed question

Post by amethyst »

Jake, so how would one implement that into ROX or your script? Thanks. No rush... :)


Re: sed question

Post by JakeSFR »

amethyst wrote: Thu Feb 25, 2021 3:11 pm

Jake, so how would one implement that into ROX or your script? Thanks. No rush... :)

In a script, it's pretty pointless, as I've already shown.
EDIT: if you only care about '%20' values, MochiMoppel's example should do the trick, though.

As for ROX, I think you need to contact the maintainer of your Puppy version or, directly, the maintainer (if there is one) of the specific ROX build you use.

The relevant commit from jun7's fork would be a good starting point for the developer(s) to back-port the fix:
https://github.com/jun7/rox-filer/commi ... 3027cb0d79

Greetings!


Re: sed question

Post by MochiMoppel »

amethyst wrote:

% and other characters, no problem. % and numbers, no go.

Oh, it's worse than you think.
%bla.txt is OK, but %Bbla.txt is not! The rule is: % followed by 2 hex digits (0-9, a-f,A-F) creates a problem.
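
The rule can be written as a one-line bash test. This is just a sketch, and the function name `has_percent_triplet` is hypothetical:

```shell
#!/bin/bash
# Hypothetical check: does a name contain % followed by two hex digits?
has_percent_triplet() {
    [[ $1 =~ %[0-9a-fA-F]{2} ]]
}

for name in '%bla.txt' '%Bbla.txt' '100%.txt' 'a %2b b.txt'; do
    if has_percent_triplet "$name"; then
        printf '%s: affected\n' "$name"
    else
        printf '%s: safe\n' "$name"
    fi
done
```

In '%bla.txt' the second character after the % is 'l', which is not a hex digit, so there is no triplet; '%Bbla.txt' starts with the triplet %Bb.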

JakeSFR wrote:

In a script, it's pretty pointless, as I've already shown.

Do not despair. :)
It's not as hopeless as you think.

I don't know multicopypaste, so my solution may not fit this purpose, but, with a few caveats, it can correct all unwanted UTF-8 name conversions.
You should remember that ROX puts a copy of all selected, unconverted filenames into the primary selection buffer. So if the filenames in the buffer contain %xx triplets, you know that ROX will not pass correct names to a right-click app. In this case you should use the buffer.

My solution requires xclip or xsel (one of them should be present in all decent distros), though it works without them as long as names don't contain %xx. If neither is installed, the resulting names may have to be checked for existence and the user alerted.
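
The existence check mentioned above could look roughly like this. It is a sketch under assumptions: the helper name `report_missing` is made up, and how the user is alerted (Xdialog, gxmessage, ...) is left to the calling script:

```shell
#!/bin/bash
# Hypothetical helper: print every argument that does not exist as a file.
# Returns 1 if at least one name is missing, 0 otherwise.
report_missing() {
    local f missing=0
    for f in "$@"; do
        if [[ ! -e $f ]]; then
            printf 'Not found: %s\n' "$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# In a SendTo script this could be called as: report_missing "$@" || exit 1
```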

My test files:
/opt/a %2b b.txt
/opt/a %C3%98 b.txt

and what ROX would output:
/opt/a + b.txt
/opt/a Ø b.txt

The following test script should be copied to or linked in the directory /root/.config/rox.sourceforge.net/SendTo.
After selecting files in ROX-Filer the script should be called from the right-click menu ("Open With...")

Code: Select all

#!/bin/bash
clip=$(type -p xclip)||clip=$(type -p xsel) # assume that one of the two is installed
[[ -n $clip ]] && buf=$($clip -o)           # assign contents of primary selection buffer to variable buf (skipped if no tool was found)
TMP=$IFS ; IFS=$'\n'                        # prepare for filenames containing spaces or tabs
if [[ $# = 1 && ! -e $1 && -e $clip && ! -e $buf ]];then # if the user directly right-clicked a single file whose name contains %xx, then this file is not in the buffer and ROX converted the name to UTF-8
    Xdialog -msg "File not found\nPlease select file in ROX-Filer with Ctrl+Lclick, then try again" x && exit
elif [[ $buf =~ %[0-9a-fA-F]{2} ]] ;then    # if the primary selection buffer contains %xx
    buf=${buf// \//$'\n/'}                  # put every filename (may contain spaces) on a separate line
    set -f                                  # no globbing while splitting: names may contain * or ?
    set -- $buf                             # forget the crappy args passed by ROX and assign to $@ the correct file names contained in the buffer
    set +f
fi

# More code is needed here if xclip or xsel is not installed: test for non-existing files (original %xx converted to UTF-8)

gxmessage -c  "First file: $1"
gxmessage -c  "Second file: $2"
gxmessage -c  "All files: $*"
IFS=$TMP    #in case original IFS is needed in any following code
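
The splitting step in the script above can be tried on its own, without ROX or a clipboard tool. This is a small sketch; the function name `split_selection` is hypothetical, and the sample string only imitates a buffer holding the two test files:

```shell
#!/bin/bash
# Sketch of the splitting step; the sample buffer mimics two selected files.
split_selection() {
    local buf=$1
    buf=${buf// \//$'\n/'}      # every " /" is taken as the start of a new absolute path
    printf '%s\n' "$buf"
}

split_selection '/opt/a %2b b.txt /opt/a %C3%98 b.txt'
```

One caveat of this approach: a directory whose name ends in a space also produces " /" inside a path, so such a path would be split in the wrong place.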

As you see, the only problem occurs if a user directly right-clicks a single %xx file to trigger the context menu. ROX does not select a right-clicked file and doesn't put its path into the buffer. (Make a funny test: right-click a single file but do not select anything from the menu; instead, select another window. The right-clicked file is not selected or marked as active, but the window remains active and the selected window is not focused.) For multiple selections this problem does not occur, since the user first has to select the files with Ctrl+Lclick before triggering the context menu. If he selects a single file this way, no problem either; if not, he has to be educated :lol:


Re: sed doesn't find all file names

Post by amethyst »

Just drag and drop "problematic" files. I encounter these files rarely, so not really an issue. Besides, it seems as if the problem has been addressed in new ROX versions... but thanks for your effort.


Re: sed doesn't find all file names

Post by JakeSFR »

MochiMoppel wrote:

You should remember that ROX puts a copy of all selected and unconverted filenames into the primary selection buffer.

I think I wasn't aware of that...

MochiMoppel wrote:

So if the filenames in the buffer contain %xx triplets you know that ROX will not pass correct names to a right-click app. In this case you should use the buffer.

Almost! If it also worked for direct right-clicking a single file, which is probably the most common use case, it could be a viable workaround indeed.

But still, the real solution is to fix it in ROX itself...

Greetings!
