sed cannot wild match garbled characters when using -z option (solved)
Posted: Tue Sep 19, 2023 3:45 am
by miltonx
I tried to treat the entire text as one line by using the -z option, but then the .* wildcard failed:
Code: Select all
echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
(outputs nothing)
When -z is removed, it correctly matches:
Code: Select all
echo -e 'firstline\n2ndline' | sed -E "s|.*|x|"
Output:
x
x
I'm running this on Debian 11.
Any ideas why this happens?
Re: sed cannot wild match when using -z option
Posted: Tue Sep 19, 2023 4:55 am
by MochiMoppel
miltonx wrote: Tue Sep 19, 2023 3:45 am
I tried to treat the entire text as one line by using the -z option, but then the .* wildcard failed:
Code: Select all
echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
(outputs nothing)
Works here (BW64) as expected. Outputs a single 'x'.
Re: sed cannot wild match when using -z option
Posted: Tue Sep 19, 2023 5:49 am
by Burunduk
Works on Fossapup, same sed 4.7 as in Debian 11.
With -z sed removes line feeds too. Maybe you've just overlooked that x before the prompt string.
Code: Select all
root# echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
xroot#
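The missing trailing newline is easy to overlook at an interactive prompt. A minimal sketch, assuming GNU sed and a POSIX od, that makes the raw output visible; the variable names out and shown are only illustrative:

```shell
# With -z the record separator is NUL, so the whole input is one record
# and .* also consumes the embedded line feeds.
out=$(printf 'firstline\n2ndline\n' | sed -z -E 's|.*|x|' | od -An -c)
echo "$out"      # the dump shows a lone x with no \n after it

# Append a newline yourself so the result does not hide in front of the prompt.
shown=$(printf 'firstline\n2ndline\n' | sed -z -E 's|.*|x|'; echo)
echo "$shown"
```

Piping through od -c is just one way to inspect the bytes; adding "; echo" after the command is enough for a quick look.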
Re: sed cannot wild match when using -z option
Posted: Tue Sep 19, 2023 8:22 am
by miltonx
Burunduk wrote: Tue Sep 19, 2023 5:49 am
Works on Fossapup, same sed 4.7 as in Debian 11.
With -z sed removes line feeds too. Maybe you've just overlooked that x before the prompt string.
Code: Select all
root# echo -e 'firstline\n2ndline' | sed -z -E "s|.*|x|"
xroot#
Yes, I overlooked that x lurking there.
Re: sed cannot wild match when using -z option
Posted: Tue Sep 19, 2023 8:36 am
by miltonx
But I still have a problem with the following sample.
There is a text file /z whose content (including some garbled characters) looks like this:
Code: Select all
ffmpeg version 2.8.11 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 5.3.0 (GCC)
configuration: --prefix=/usr --libdir=/usr/lib64 --enable-libmp3lame --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-pthreads --enable-small --enable-postproc --enable-libvorbis --enable-gpl --enable-shared --enable-nonfree --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-debug --enable-bzlib --enable-zlib --enable-libspeex --enable-version3 --enable-runtime-cpudetect --enable-x11grab --enable-libschroedinger --enable-libtheora --enable-libxvid --enable-swscale --enable-libvpx
libavutil 54. 31.100 / 54. 31.100
libavcodec 56. 60.100 / 56. 60.100
libavformat 56. 40.101 / 56. 40.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 40.101 / 5. 40.101
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 2.101 / 1. 2.101
libpostproc 53. 3.100 / 53. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/sda2/wreckit2.mp4':
Metadata:
major_brand : mp42
minor_version : 1
compatible_brands: isomavc1mp423gp5
creation_time : 2019-01-04 16:08:31
encoder : My MP4Box GUI 0.6.0.6 <http://my-mp4box-gui.zymichost.com>
Duration: 01:51:12.12, start: 0.000000, bitrate: 1345 kb/s
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1246 kb/s, 25 fps, 25 tbr, 25k tbn, 50 tbc (default)
Metadata:
creation_time : 2019-01-04 16:08:31
handler_name : ÃÞµÃÃƻµÃõ2£ºÂ´Ã³Ãֻ¥ÁªÃø.Ralph.Breaks.the.Internet.2018.HD-720p.X264.AAC-99Mp4_track201.h264
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 95 kb/s (default)
Metadata:
creation_time : 2019-01-04 16:08:44
handler_name : [91xinpian.com]ÃÞµÃÃƻµÃõ2£ºÂ´Ã³Ãֻ¥ÁªÃøHD1080P¸ÃÃÃ¥Ó¢ÃïÃÃÃÃÃÃˮӡ_track2_und_track1_und.aac
Stream #0:2(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
Metadata:
creation_time : 2019-01-04 16:08:47
handler_name : GPAC MPEG-4 OD Handler
Stream #0:3(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
Metadata:
creation_time : 2019-01-04 16:08:47
handler_name : GPAC MPEG-4 Scene Description Handler
At least one output file must be specified
Running:
Code: Select all
cat /z | sed -z -E "s|.*(Duration.*)At least.*|\1|" > /zz
The resulting /zz is identical to the input (nothing matched):
Code: Select all
ffmpeg version 2.8.11 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 5.3.0 (GCC)
configuration: --prefix=/usr --libdir=/usr/lib64 --enable-libmp3lame --enable-libx264 --enable-libx265 --enable-libfdk-aac --enable-pthreads --enable-small --enable-postproc --enable-libvorbis --enable-gpl --enable-shared --enable-nonfree --enable-libopencore-amrnb --enable-libopencore-amrwb --disable-debug --enable-bzlib --enable-zlib --enable-libspeex --enable-version3 --enable-runtime-cpudetect --enable-x11grab --enable-libschroedinger --enable-libtheora --enable-libxvid --enable-swscale --enable-libvpx
libavutil 54. 31.100 / 54. 31.100
libavcodec 56. 60.100 / 56. 60.100
libavformat 56. 40.101 / 56. 40.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 40.101 / 5. 40.101
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 2.101 / 1. 2.101
libpostproc 53. 3.100 / 53. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/sda2/wreckit2.mp4':
Metadata:
major_brand : mp42
minor_version : 1
compatible_brands: isomavc1mp423gp5
creation_time : 2019-01-04 16:08:31
encoder : My MP4Box GUI 0.6.0.6 <http://my-mp4box-gui.zymichost.com>
Duration: 01:51:12.12, start: 0.000000, bitrate: 1345 kb/s
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1246 kb/s, 25 fps, 25 tbr, 25k tbn, 50 tbc (default)
Metadata:
creation_time : 2019-01-04 16:08:31
handler_name : ÃÞµÃÃƻµÃõ2£ºÂ´Ã³Ãֻ¥ÁªÃø.Ralph.Breaks.the.Internet.2018.HD-720p.X264.AAC-99Mp4_track201.h264
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 95 kb/s (default)
Metadata:
creation_time : 2019-01-04 16:08:44
handler_name : [91xinpian.com]ÃÞµÃÃƻµÃõ2£ºÂ´Ã³Ãֻ¥ÁªÃøHD1080P¸ÃÃÃ¥Ó¢ÃïÃÃÃÃÃÃˮӡ_track2_und_track1_und.aac
Stream #0:2(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
Metadata:
creation_time : 2019-01-04 16:08:47
handler_name : GPAC MPEG-4 OD Handler
Stream #0:3(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
Metadata:
creation_time : 2019-01-04 16:08:47
handler_name : GPAC MPEG-4 Scene Description Handler
At least one output file must be specified
Re: sed cannot wild match when using -z option
Posted: Tue Sep 19, 2023 10:22 am
by MochiMoppel
miltonx wrote: Tue Sep 19, 2023 8:36 am
Running:
Code: Select all
cat /z | sed -z -E "s|.*(Duration.*)At least.*|\1|" > /zz
Results in
Code: Select all
Duration: 01:51:12.12, start: 0.000000, bitrate: 1345 kb/s
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 1246 kb/s, 25 fps, 25 tbr, 25k tbn, 50 tbc (default)
Metadata:
creation_time : 2019-01-04 16:08:31
handler_name : ÃÞµÃÃƻµÃõ2£ºÂ´Ã³Ãֻ¥ÁªÃø.Ralph.Breaks.the.Internet.2018.HD-720p.X264.AAC-99Mp4_track201.h264
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 95 kb/s (default)
Metadata:
creation_time : 2019-01-04 16:08:44
handler_name : [91xinpian.com]ÃÞµÃÃƻµÃõ2£ºÂ´Ã³Ãֻ¥ÁªÃøHD1080P¸ÃÃÃ¥Ó¢ÃïÃÃÃÃÃÃˮӡ_track2_und_track1_und.aac
Stream #0:2(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
Metadata:
creation_time : 2019-01-04 16:08:47
handler_name : GPAC MPEG-4 OD Handler
Stream #0:3(und): Data: none (mp4s / 0x7334706D), 0 kb/s (default)
Metadata:
creation_time : 2019-01-04 16:08:47
handler_name : GPAC MPEG-4 Scene Description Handler
Works as expected.
Maybe you examined the input file instead of the output file...
Re: sed cannot wild match when using -z option
Posted: Tue Sep 19, 2023 8:19 pm
by Burunduk
Is it my turn to say something?
Well, this code puts zz into the system root directory next to z already there. For some reason I don't like it.
Other than that, it works OK. I think it'll work for you too if you copy the ffmpeg output back. It's now a valid unicode sequence but it probably wasn't initially. You can try to run this:
LC_ALL=C sed -z -E "s|.*(Duration.*)At least.*|\1|" ffmpeg.out > ffmpeg.txt
If it works, the problem is in those garbled characters. See GNU sed manual, paragraph 5.9.1
For example:
Code: Select all
root# echo -e 'abÄba\nab\xc4ba'
abÄba
abÄba
root# echo -e 'abÄba\nab\xc4ba'| sed -E 's/a(.*)a/\1/'
bÄb
abÄba
root# echo -e 'abÄba\nab\xc4ba'| LC_ALL=C sed -E 's/a(.*)a/\1/'
bÄb
bÄb
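To check whether a file really contains bytes that are invalid in UTF-8, one option is to round-trip it through iconv. This is a sketch assuming glibc iconv and GNU sed; the temporary file stands in for the ffmpeg output, and the names tmp, valid and result are hypothetical:

```shell
# Two-line sample: a valid UTF-8 "Ä" (bytes C3 84) and a lone Latin-1 byte C4.
tmp=$(mktemp)
printf 'ab\303\204ba\nab\304ba\n' > "$tmp"

# iconv exits non-zero when the input is not valid UTF-8.
if iconv -f UTF-8 -t UTF-8 "$tmp" >/dev/null 2>&1; then
    valid=yes
else
    valid=no
fi
echo "valid UTF-8: $valid"

# In the C locale every byte counts as a character, so .* spans the stray byte.
# The result is dumped as hex to keep the terminal clean.
result=$(LC_ALL=C sed -E 's/a(.*)a/\1/' "$tmp" | od -An -tx1 | tr -d ' \n')
echo "$result"
rm -f "$tmp"
```

If iconv reports the file as invalid, running sed with LC_ALL=C is the simplest workaround, at the cost of treating multibyte characters as separate bytes.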
Re: sed cannot wild match when using -z option
Posted: Wed Sep 20, 2023 2:18 am
by miltonx
Burunduk wrote: Tue Sep 19, 2023 8:19 pm
Is it my turn to say something?
Well, this code puts zz into the system root directory next to z already there. For some reason I don't like it.
Other than that, it works OK. I think it'll work for you too if you copy the ffmpeg output back. It's now a valid unicode sequence but it probably wasn't initially. You can try to run this:
LC_ALL=C sed -z -E "s|.*(Duration.*)At least.*|\1|" ffmpeg.out > ffmpeg.txt
If it works, the problem is in those garbled characters. See GNU sed manual, paragraph 5.9.1
For example:
Code: Select all
root# echo -e 'abÄba\nab\xc4ba'
abÄba
abÄba
root# echo -e 'abÄba\nab\xc4ba'| sed -E 's/a(.*)a/\1/'
bÄb
abÄba
root# echo -e 'abÄba\nab\xc4ba'| LC_ALL=C sed -E 's/a(.*)a/\1/'
bÄb
bÄb
It's not good practice to put randomly named files under the / directory, but this was purely for quickly experimenting with this script.
After testing, it looks like the garbled characters caused the failure to match. The sed locale considerations page provides very good information. That answers my question.
@MochiMoppel probably got it to work because the garbled characters were modified when they were posted to this forum. When I copy the text back to /z, it also works. But when I run ffmpeg again and redirect the result to /z, sed fails.
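For anyone who lands here with the same problem, the fix can be sketched like this, assuming GNU sed. The function name extract is hypothetical, and the printf input is a shortened stand-in for the real ffmpeg output with two invalid bytes in the handler_name line:

```shell
# Force the C locale so that .* matches any byte, valid UTF-8 or not.
extract() {
    LC_ALL=C sed -z -E 's|.*(Duration.*)At least.*|\1|'
}

# Shortened stand-in for the ffmpeg output, with raw bytes \304\336.
out=$(printf 'ffmpeg version 2.8.11\nDuration: 01:51:12.12\nhandler_name : \304\336\nAt least one output file must be specified\n' | extract)

# Only the first extracted line is printed, to keep raw bytes off the terminal.
printf '%s\n' "$out" | head -n 1

# With the real file this might look like (ffmpeg logs to stderr):
#   ffmpeg -i /mnt/sda2/wreckit2.mp4 2>&1 | extract > ffmpeg.txt
```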