How to use sed to match multiple lines?

HerrBert
Posts: 333
Joined: Mon Jul 13, 2020 6:14 pm
Location: Germany, NRW
Has thanked: 17 times
Been thanked: 112 times

How to use sed to match multiple lines?

Post by HerrBert »

What it is about:
I want to get a text-block containing the header and all occurrences of strings appearing in this text-block.

Looking for some magic using sed

What I have so far is code that prints every single occurrence with its corresponding header:

Code: Select all

# ls -AhlpR /mnt/sda3/x86_64/ | sed -ne '/^\//h;/.* \+.* \+.* \+.* \+.* \+.* \+.* \+.*spot/{x;s/^\//\n\//;p;x;p}'

/mnt/sda3/x86_64/:
-rw-r--r--  1 root root 492K Jan  8 09:58 .ff_esr_64-spotprofile.tar.gz

/mnt/sda3/x86_64/brave-browser:
-rw-r--r--  1 root root  26M Aug 25  2021 brave-spotprofile.tar.gz

/mnt/sda3/x86_64/brave-browser:
-rwxr-xr-x  1 root root  453 Jul 14  2021 save-spot

/mnt/sda3/x86_64/light:
-rw-r--r-- 1 root root 369K Sep 24  2021 light-spotprofile.tar.gz

/mnt/sda3/x86_64/light:
drwx------ 8 spot spot 4,0K Jan 14 16:35 spot/

/mnt/sda3/x86_64/min-browser:
-rw-r--r--  1 root root 3,4M Nov 27 16:24 min-spotprofile-221127-180223.tar.gz

/mnt/sda3/x86_64/min-browser:
-rw-r--r--  1 root root 3,3M Nov 27 18:02 min-spotprofile.tar.gz

/mnt/sda3/x86_64/min-browser:
-rwxr-xr-x  1 root root  416 Aug 26  2021 save-spot

/mnt/sda3/x86_64/palemoon:
-rw-r--r--  1 root root  90K Aug 20  2022 pm64-spotprofile.tar.gz

/mnt/sda3/x86_64/seamonkey:
-rw-r--r--  1 root root  4,9M Apr  6  2022 seamonkey-spotprofile.tar.gz

/mnt/sda3/x86_64/slimjet:
-rwxr-xr-x  1 root root  457 Aug 13  2021 save-spot

My desired output would look like:

Code: Select all

/mnt/sda3/x86_64/:
-rw-r--r--  1 root root 492K Jan  8 09:58 .ff_esr_64-spotprofile.tar.gz

/mnt/sda3/x86_64/brave-browser:
-rw-r--r--  1 root root  26M Aug 25  2021 brave-spotprofile.tar.gz
-rwxr-xr-x  1 root root  453 Jul 14  2021 save-spot

/mnt/sda3/x86_64/light:
-rw-r--r-- 1 root root 369K Sep 24  2021 light-spotprofile.tar.gz
drwx------ 8 spot spot 4,0K Jan 14 16:35 spot/

/mnt/sda3/x86_64/min-browser:
-rw-r--r--  1 root root 3,4M Nov 27 16:24 min-spotprofile-221127-180223.tar.gz
-rw-r--r--  1 root root 3,3M Nov 27 18:02 min-spotprofile.tar.gz
-rwxr-xr-x  1 root root  416 Aug 26  2021 save-spot

/mnt/sda3/x86_64/palemoon:
-rw-r--r--  1 root root  90K Aug 20  2022 pm64-spotprofile.tar.gz

/mnt/sda3/x86_64/seamonkey:
-rw-r--r--  1 root root  4,9M Apr  6  2022 seamonkey-spotprofile.tar.gz

/mnt/sda3/x86_64/slimjet:
-rwxr-xr-x  1 root root  457 Aug 13  2021 save-spot

I don't want to change the formatting of ls's output and I don't want to match a user or group in my pattern.

Any suggestions on this???

mow9902
Posts: 178
Joined: Fri Jul 24, 2020 11:57 pm
Has thanked: 13 times
Been thanked: 51 times

Re: How to use sed to match multiple lines?

Post by mow9902 »

I'm no programmer - but why don't you set a variable to read the header, and then only print the variable when the value changes?
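
One way to read this suggestion, as a minimal sketch in plain shell rather than sed: remember the most recent header in a variable and print it only once, just before the first matching entry of its block. The function name `filter_spot`, the sample listing, and the assumption that file names contain no spaces (so the last word of a line is the name) are illustrative and not from the thread.

```shell
#!/bin/sh
# Hypothetical sample in the format of `ls -lR`; not real output.
sample='/d/a:
total 4
-rw-r--r-- 1 root root 5 Jan  1 10:00 x-spot.tar.gz
-rw-r--r-- 1 root root 5 Jan  1 10:00 notes.txt
-rwxr-xr-x 1 root root 5 Jan  1 10:00 save-spot

/d/b:
total 4
-rw-r--r-- 1 spot spot 5 Jan  1 10:00 readme

/d/c:
total 4
drwx------ 2 spot spot 5 Jan  1 10:00 spot/'

filter_spot() {
    header=''   # most recent header that has not been printed yet
    first=1     # stays 1 until the first block has been printed
    while IFS= read -r line; do
        case $line in
            /*:) header=$line ;;          # remember the header, print nothing yet
            *)
                name=${line##* }          # last word; ASSUMED to be the file name
                case $name in
                    *spot*)
                        if [ -n "$header" ]; then
                            [ "$first" -eq 1 ] || printf '\n'   # blank line between blocks
                            printf '%s\n' "$header"
                            header=''     # this header is printed only once
                            first=0
                        fi
                        printf '%s\n' "$line"
                        ;;
                esac
                ;;
        esac
    done
}

printf '%s\n' "$sample" | filter_spot
```

Testing only the last word keeps the user and group columns out of the match, so the `readme` owned by `spot` in `/d/b` is not listed.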

cobaka
Posts: 521
Joined: Thu Jul 16, 2020 6:04 am
Location: Central Coast, NSW - au
Has thanked: 87 times
Been thanked: 49 times

Re: How to use sed to match multiple lines?

Post by cobaka »

Hello @HerrBert

You present a delicious problem to chew on!
In order to understand what you want I need to understand the words 'header' and 'text-block' correctly.

Text-block:
Do you mean a block of UTF-8 (or ASCII) text (characters) in a text file or on the screen?

Header:
The word 'header' appears to have many meanings in Linux. Your example shows the 'long listing' from the Linux 'ls' command. Can you explain how you would like me to understand "header"?

Observation: sed is versatile, but so is grep.

Thanks,

cobaka (woof)

собака --> это Русский --> an old dog
"so-baka" (not "co", as in coast or crib).

some1
Posts: 71
Joined: Wed Aug 19, 2020 4:32 am
Has thanked: 17 times
Been thanked: 11 times

Re: How to use sed to match multiple lines?

Post by some1 »

Just in case -
I came up with some awk-code which will do the parsing of your shown data into the shown output.
So - eventually you can pipe the sed-output into awk.
----
But you want a sed solution, so let's stay on topic.
----
By the way - those of us - who seldom use sed -
might appreciate an explanation of the sed-code
above used to maul the ls-output.
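
For readers who seldom use sed, here is a rough annotated reading of the one-liner from the first post. It assumes GNU sed (`\+` and `\n` in the replacement are GNU extensions); the sample listing and the function name `repeat_headers` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample in the format of `ls -lR`; not real output.
sample='/d/a:
total 4
-rw-r--r-- 1 root root 5 Jan  1 10:00 x-spot.tar.gz
-rwxr-xr-x 1 root root 5 Jan  1 10:00 save-spot'

repeat_headers() {
    sed -n '
        /^\//h              # a line starting with / is a header: copy it to the hold space
        /.* \+.* \+.* \+.* \+.* \+.* \+.* \+.*spot/{   # "spot" preceded by at least 7 spaces
            x               # swap: pattern space = header, hold space = file line
            s/^\//\n\//     # put a newline in front of the header (a no-op once it is there)
            p               # print the header
            x               # swap back: pattern space = file line
            p               # print the file line
        }
    '
}

printf '%s\n' "$sample" | repeat_headers
```

Because the header is fetched from the hold space and printed again for every match, a directory with two matches shows its header twice, which is the repetition the first post wants to get rid of.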

MochiMoppel
Posts: 1123
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 17 times
Been thanked: 361 times

Re: How to use sed to match multiple lines?

Post by MochiMoppel »

@HerrBert
Not a final solution, but maybe it gives you a better start. Removing the enclosing quotes (if you don't like them) and removing listed headers without a match can be done in a second step, certainly also with sed. It may even be done within the original sed command but that would be tricky.

Code: Select all

ls -AhlpRQ /mnt/sda3/x86_64 | sed -n '
/^$/p            #keep empty lines
/:$/p            #keep headers (lines that end with colon)
/".*spot.*"$/p   #keep files/directories that contain spot
'

[EDIT] Simplified and fixed code

Last edited by MochiMoppel on Wed Mar 29, 2023 8:02 am, edited 1 time in total.
some1
Posts: 71
Joined: Wed Aug 19, 2020 4:32 am
Has thanked: 17 times
Been thanked: 11 times

Re: How to use sed to match multiple lines?

Post by some1 »

@MochiMoppel: Thanks!

Burunduk
Posts: 244
Joined: Thu Jun 16, 2022 6:16 pm
Has thanked: 6 times
Been thanked: 122 times

Re: How to use sed to match multiple lines?

Post by Burunduk »

The MochiMoppel's code is too readable! Here is a write-only variant:

Code: Select all

ls -AhlpR /mnt/sda3/x86_64/ | sed -n '/^\//h;/^\([^/ ]\+  *\)\{4\}.*spot/H;/^$/{x;/\n/{G;p}};${g;/\n/p}'

Note: it expects that paths in headers start with /. For relative paths quotes are necessary.
Explanation: H - append \n to the hold space (this \n will be subsequently used as a marker to see if a group is empty or not) and then append the pattern space.
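
Spelled out over several lines, the one-liner above can be read as follows. This is an interpretive sketch, assuming GNU sed; the sample listing and the name `group_matches` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample in the format of `ls -lR`; not real output.
sample='/d/a:
total 4
-rw-r--r-- 1 root root 5 Jan  1 10:00 x-spot.tar.gz
-rw-r--r-- 1 root root 5 Jan  1 10:00 notes.txt
-rwxr-xr-x 1 root root 5 Jan  1 10:00 save-spot

/d/b:
total 4
-rw-r--r-- 1 spot spot 5 Jan  1 10:00 readme

/d/c:
total 4
drwx------ 2 spot spot 5 Jan  1 10:00 spot/'

group_matches() {
    sed -n '
        /^\//h                          # header: overwrite the hold space with it
        /^\([^/ ]\+  *\)\{4\}.*spot/H   # skip 4 fields (mode, links, user, group); on "spot" append \n + line
        /^$/{                           # a blank line ends the block
            x                           # fetch the collected block (the hold space gets the empty line)
            /\n/{                       # a \n means at least one file line was appended
                G                       # append \n + the now empty hold space: a separating blank line
                p
            }
        }
        ${                              # last input line: flush the final block
            g
            /\n/p
        }
    '
}

printf '%s\n' "$sample" | group_matches
```

On this sample the header `/d/b:` never gets a `\n` appended, so it is dropped, and the `spot` user and group columns are skipped by the four-field prefix.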

HerrBert
Posts: 333
Joined: Mon Jul 13, 2020 6:14 pm
Location: Germany, NRW
Has thanked: 17 times
Been thanked: 112 times

Re: How to use sed to match multiple lines?

Post by HerrBert »

@Burunduk
Excellent :thumbup2:

I tried several other approaches inspired by posts from unix-/stackexchange... I didn't find a matching solution.
Thought about reversing the output of the hold space, but TBH that's beyond my knowledge... :oops:
So here is my code that runs sed twice, but produces the desired output, too:

Code: Select all

ls -AhlpR /mnt/sda3/x86_64 | sed -ne '/^\//h;/.* \+.* \+.* \+.* \+.* \+.* \+.* \+.*spot/H;/^$/{x;s/^\//\n\//;p}' | sed -e '/./{H;$!d};x;/.* .*spot/!d'
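
The second sed in this pipeline is a variant of the well-known "print a paragraph only if it contains a pattern" idiom. A reduced sketch on made-up input (the name `keep_paragraphs` and the sample are illustrative; the sample stands in for what a first filtering pass might leave behind):

```shell
#!/bin/sh
# Hypothetical input: headers, matching lines and blank separators.
sample='/d/a:
-rw-r--r-- 1 root root 5 Jan  1 10:00 x-spot.tar.gz

/d/b:

/d/c:
-rwxr-xr-x 1 root root 5 Jan  1 10:00 save-spot'

keep_paragraphs() {
    # 1st expression: collect every non-empty line in the hold space and
    #                 delete it, except on the last line of input.
    # 2nd expression: reached only on a blank line or on the last line;
    #                 fetch the collected paragraph and delete it unless it
    #                 contains a space followed later by "spot".
    sed -e '/./{H;$!d;}' -e 'x;/.* .*spot/!d'
}

printf '%s\n' "$sample" | keep_paragraphs
```

Each kept paragraph starts with an empty line because `H` always puts a newline in front of what it appends. The orphan header `/d/b:` contains no space at all, so its paragraph is deleted.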
HerrBert
Posts: 333
Joined: Mon Jul 13, 2020 6:14 pm
Location: Germany, NRW
Has thanked: 17 times
Been thanked: 112 times

Re: How to use sed to match multiple lines?

Post by HerrBert »

Also fiddled with @MochiMoppel 's code now.
Works okay with second instance of sed:

Code: Select all

# ls -AhlpRQ /mnt/sda3/x86_64 | sed -n '/^$/p;/^"\//p;/".*spot.*"/p' | sed -e '/./{H;$!d};x;/ ".*spot.*"/!d'

"/mnt/sda3/x86_64":
-rw-r--r--  1 root root 492K Jan  8 09:58 ".ff_esr_64-spotprofile.tar.gz"

"/mnt/sda3/x86_64/brave-browser":
-rw-r--r--  1 root root  26M Aug 25  2021 "brave-spotprofile.tar.gz"
-rwxr-xr-x  1 root root  453 Jul 14  2021 "save-spot"

"/mnt/sda3/x86_64/light":
-rw-r--r-- 1 root root 369K Sep 24  2021 "light-spotprofile.tar.gz"
drwx------ 8 spot spot 4,0K Jan 14 16:35 "spot"/

"/mnt/sda3/x86_64/min-browser":
-rw-r--r--  1 root root 3,4M Nov 27 16:24 "min-spotprofile-221127-180223.tar.gz"
-rw-r--r--  1 root root 3,3M Nov 27 18:02 "min-spotprofile.tar.gz"
-rwxr-xr-x  1 root root  416 Aug 26  2021 "save-spot"

"/mnt/sda3/x86_64/palemoon":
-rw-r--r--  1 root root  90K Aug 20  2022 "pm64-spotprofile.tar.gz"

"/mnt/sda3/x86_64/seamonkey":
-rw-r--r--  1 root root  4,9M Apr  6  2022 "seamonkey-spotprofile.tar.gz"

"/mnt/sda3/x86_64/slimjet":
-rwxr-xr-x  1 root root  457 Aug 13  2021 "save-spot"
# 

Looks much cleaner than my roundabout way ;)

My use case is a bit different. I have huge text files with unquoted output of ls which will be the source for sed
to find/compare files from other backup drives. The piped ls command was just intended to give a reproducible
example.
I have to admit that the -Q option to ls is very useful for similar scenarios.

MochiMoppel
Posts: 1123
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 17 times
Been thanked: 361 times

Re: How to use sed to match multiple lines?

Post by MochiMoppel »

Burunduk wrote: Mon Mar 27, 2023 1:35 pm

The MochiMoppel's code is too readable!

I've already been accused of many things, but this is a first :lol:
Your code is brilliant.

@HerrBert I've simplified my code a bit. I also added a second sed instance, but unlike you and Burunduk I tried to avoid the hold space. I just can't remember how it works and have to consult the manual each time :cry:

This should work too. Correction: the second sed may not always work :o :

Code: Select all

ls -AhlpRQ /mnt/sda3/x86_64 | sed -n '
/^$/p            #keep empty lines
/:$/p            #keep headers (lines that end with colon)
/".*spot.*"$/p   #keep files/directories that contain spot
' | sed '
$,/:$/d          #delete last line when it ends with colon (= orphan header). Following code would not delete it.
N                #treat 2 lines as 1 
/:\n$/d          #delete any remaining orphan headers (they end with colon and newline)
s/"//g           #replace any quotation marks with nothing (i.e  delete them)
'

The code assumes that matched filenames contain no quotation marks. If they do then the -Q option would have escaped them and the final replace command needs to be modified.

[EDIT] Added missing $ in the first sed
/".*spot.*"$/p

Last edited by MochiMoppel on Wed Mar 29, 2023 4:47 am, edited 2 times in total.
some1
Posts: 71
Joined: Wed Aug 19, 2020 4:32 am
Has thanked: 17 times
Been thanked: 11 times

Re: How to use sed to match multiple lines?

Post by some1 »

Could it be - that if sed-code is readable - it's somewhat dubious?

@HerrBert ,@MochiMoppel
Try
make a dir /mnt/sda3/x86_64/spottify
create a file in spottify named badspot
Then run the codepieces.
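
The effect of this test case can be reproduced without touching the disk. A small sketch on a hand-written listing in `ls -Q` style (the paths and the function names `filter_loose` and `filter_anchored` are illustrative): without a trailing `$`, the pattern `".*spot.*"` also matches the quoted header of a directory whose name contains "spot", so that header is printed twice.

```shell
#!/bin/sh
# Hypothetical listing in the style of `ls -lRQ`; not real output.
listing='"/d/spottify":
total 0
-rw-r--r-- 1 root root 0 Mar 29 02:50 "badspot"'

# Name pattern without the trailing $ (matches the header too).
filter_loose() {
    sed -n '/^$/p; /:$/p; /".*spot.*"/p'
}

# Name pattern anchored with $ (a header ends with a colon, so it no longer matches).
filter_anchored() {
    sed -n '/^$/p; /:$/p; /".*spot.*"$/p'
}

echo "--- loose ---"
printf '%s\n' "$listing" | filter_loose
echo "--- anchored ---"
printf '%s\n' "$listing" | filter_anchored
```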

Furthermore - when I tried MochiMoppel's latest -
isolated header/path items were NOT deleted.

Burunduk's write-only code seems OK/excellent.
A simple awk program also works well - albeit slower
and less challenging.
and less challenging.

Anyway - a nice thread.

MochiMoppel
Posts: 1123
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 17 times
Been thanked: 361 times

Re: How to use sed to match multiple lines?

Post by MochiMoppel »

some1 wrote: Wed Mar 29, 2023 2:52 am

Could it be - that if sed-code is readable - it's somewhat dubious?

Indeed :lol:
That's the fun with simple code. You never know when the 💩‎ hits the fan.

I added a single dollar. @HerrBert's code should work now , mine still doesn't :twisted:
Anyway, thanks for testing :thumbup2:

Burunduk
Posts: 244
Joined: Thu Jun 16, 2022 6:16 pm
Has thanked: 6 times
Been thanked: 122 times

Re: How to use sed to match multiple lines?

Post by Burunduk »

MochiMoppel wrote: Wed Mar 29, 2023 4:24 am

mine still doesn't :twisted:

It works, with a couple of small changes:

Code: Select all

ls -AhlpRQ /mnt/sda3/x86_64 | sed -n '
/^$/p            #keep empty lines
/:$/p            #keep headers (lines that end with colon)
/".*spot.*"$/p   #keep files/directories that contain spot
' | sed '
${/:$/d}         #instead of $,/:$/d [1] delete last line when it ends with colon (= orphan header). Following code would not delete it.
$!{/./N}         #instead of N [2] treat 2 lines as 1
/:\n$/d          #delete any remaining orphan headers (they end with colon and newline)
s/"\(.*\)"/\1/mg #instead of s/"//g [3] replace outer quotation marks with nothing (i.e  delete them)
'

1. , is not && : sed '1,/foo/d' - delete a range of lines from the first one to the one containing "foo". If only the first line contains "foo", all the lines are deleted.

2a. $! - if no lines left to append, sed quits without running the remaining commands.

2b. /./ - if a block contains an even number of lines, the next line is empty. N then appends the next header to it and the pattern space contains \n at the start, so lonely headers are deleted or not depending on parity.

3. For filenames with quotes. m here - don't match \n.
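
Point 1 is easy to check on toy input. A minimal sketch (the input strings are made up): a comma builds a range from the first address to the next line matching the second one, while `{...}` applies a second condition to the same line.

```shell
#!/bin/sh
# Toy input standing in for "header followed by a matching file".
listing='header:
file-spot'

# Range: starts on the last line and runs to the end of input,
# so the last line is deleted even though it does not end with a colon.
range_last=$(printf '%s\n' "$listing" | sed '$,/:$/d')

# Group: delete the last line only if it ends with a colon.
group_last=$(printf '%s\n' "$listing" | sed '${/:$/d;}')

# The example from note 1: only line 1 contains foo, so the range never closes.
range_first=$(printf 'foo\nbar\nbaz\n' | sed '1,/foo/d')
group_first=$(printf 'foo\nbar\nbaz\n' | sed '1{/foo/d;}')

printf 'range on $: [%s]\n' "$range_last"
printf 'group on $: [%s]\n' "$group_last"
printf 'range on 1: [%s]\n' "$range_first"
printf 'group on 1: [%s]\n' "$group_first"
```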

MochiMoppel
Posts: 1123
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 17 times
Been thanked: 361 times

Re: How to use sed to match multiple lines?

Post by MochiMoppel »

Burunduk wrote: Wed Mar 29, 2023 6:12 am

It works, with a couple of small changes:
<snip>
s/"\(.*\)"/\1/mg #instead of s/"//g [3] replace outer quotation marks with nothing (i.e delete them)
<snip>
3. For filenames with quotes. m here - don't match \n.

Where did you find the m command? Through trial and error I can see what it does but I can't find it in the GNU manual.

A simple mind like me would think that it does the same as s/[^\\]"//g . Seems to work, though I'm cautious to make such claims.
This would change blabla "file\"name" to blabla file\"name
And then there is still the job to remove the escape character from file\"name
My last lines would then look like this:

Code: Select all

s/[^\\]"//g  # replace any quotation mark, unless it is escaped
s/\\"/"/g    # remove preceding escape character from escaped quotation marks
Burunduk
Posts: 244
Joined: Thu Jun 16, 2022 6:16 pm
Has thanked: 6 times
Been thanked: 122 times

Re: How to use sed to match multiple lines?

Post by Burunduk »

MochiMoppel wrote: Thu Mar 30, 2023 2:41 am

Code: Select all

s/[^\\]"//g  # replace any quotation mark, unless it is escaped

You probably mean s/(?<!\\)"//g but sed doesn't support this. Could be s/\([^\]\)"/\1/g but there is no "blabla" in a header, so s/\(^\|[^\]\)"/\1/g - write-only again.

In s/"\(.*\)"/\1/mg, m is a modifier to the s command. This is why it's not in the index. A dot doesn't match \n characters with it.

Code: Select all

root# echo $'"dir":\nbla bla "file\\"name"' | sed 's/[^\]"//g' # \ is not special inside []
"di:
bla blafile\"nam
root# echo $'"dir":\nbla bla "file\\"name"' | perl -pe 's/(?<!\\)"//g' # [OK]
dir:
bla bla file\"name
root# echo $'"dir":\nbla bla "file\\"name"' | sed 's/\([^\]\)"/\1/g'
"dir:
bla bla file\"name
root# echo $'"dir":\nbla bla "file\\"name"' | sed 's/\(^\|[^\]\)"/\1/g' # [OK]
dir:
bla bla file\"name
root# echo $'"dir":\nbla bla "file\\"name"' | sed 'N;s/"\(.*\)"/\1/g'
dir":
bla bla "file\"name
root# echo $'"dir":\nbla bla "file\\"name"' | sed 'N;s/"\(.*\)"/\1/mg' # [OK]
dir:
bla bla file\"name

Is it necessary to remove remaining '\' characters? The file\"name can be used as is: rm file\"name and filenames can contain other weird chars, for example a tab: "file\tname" will become "filetname". Or is it at all a bad idea?

Speaking of modifiers, this one is crazy:

Code: Select all

root# echo $'"dir":\nbla bla "file\\"name"' | sed 's/.*/echo &/e' # [not OK!]
dir:
bla bla file"name
root# echo $'$(whoami)' | sed 's/.*/echo "&"/e' # runs whoami
root
HerrBert
Posts: 333
Joined: Mon Jul 13, 2020 6:14 pm
Location: Germany, NRW
Has thanked: 17 times
Been thanked: 112 times

Re: How to use sed to match multiple lines and only print matches in paragraphs?

Post by HerrBert »

Another approach:

Code: Select all

sed -e '/^\//h;{/\(.*[^ ] \+\)\{4\}.*spot/H;/^$/!d}; x; /\(.*[^ ] \+\)\{4\}.*spot/!d;s/^\//\n\//'

Many thanks again to @Burunduk for grouping code :thumbup2:
Annotated:

Code: Select all

sed -e '
/^\//h				# copy to holdspace if line begins with /
{				# start of a command group (no address, so it runs for every line; it is not a loop)
/\(.*[^ ] \+\)\{4\}.*spot/H	# append line to holdspace if searchpattern is found
/^$/!d				# unless the line is empty: delete patternspace and start the next cycle
}				# end of group; sed gets here only on an empty line, where d did not run
x				# exchange holdspace and patternspace
/\(.*[^ ] \+\)\{4\}.*spot/!d	# do not delete (=print) patternspace if searchpattern is found
s/^\//\n\//			# replace leading / with newline/
' <<<"$(ls -AhlpR /mnt/sda3/x86_64)"

What causes me a headache is the conditional loop (cycle) using {..;..} and the question when the condition
becomes true and sed continues with following commands...
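
On the question of when sed gets past the group: as far as I understand it, `{...}` is not a loop, only a way to attach several commands to one address, and it is `d` that cuts the cycle short. A minimal sketch on made-up input:

```shell
#!/bin/sh
# Three lines; only the middle one is empty.
input='one

two'

# /^$/!d  deletes every non-empty line and starts the next cycle at once,
#         so the s command below is never reached for those lines.
# s/^$/reached/ therefore runs only for the empty line.
out=$(printf '%s\n' "$input" | sed -e '/^$/!d' -e 's/^$/reached/')

printf '%s\n' "$out"
```

Only the empty line survives the `d` and is rewritten, which suggests that everything after `/^$/!d` in the script above runs just once per block, on the blank separator line.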

I feel like I don't want to flag this solved... ;)

Burunduk
Posts: 244
Joined: Thu Jun 16, 2022 6:16 pm
Has thanked: 6 times
Been thanked: 122 times

Re: How to use sed to match multiple lines?

Post by Burunduk »

This code doesn't use the hold space but uses a loop to illustrate how it works in sed (it's not been thoroughly tested). More commands can be used for loops but b is the simplest. It branches unconditionally so the condition should be provided as an address: /regex/b[label] - if the pattern space contains regex then branch [to a label].

Code: Select all

ls -AhlpR /mnt/sda3/x86_64 | sed '
/^\//N                    # If the line starts with /, it is a header - keep it and append the next line.
:a                        # This is a label. When we branch here, we do not wipe the pattern space and do not read the next line.
/^\(.* \b\)\{4\}.*spot/Mb # If "spot" is here, branch to the end of the script: b without a label skips the remaining commands, so the pattern space is printed and a new cycle reads the next line.
/\n/{                     # If the pattern space contains 2 lines...
s/\n..*//                 # ... try to delete the second one. It must have at least 1 character.
Tb                        # The previous s command fails if an empty line was added. If it happens, the T command branches to :b.
N;ba                      # Else append the next line and branch to :a to see if the added line is "right".
}                         # Endif. {...;...} groups commands for one address. If an address does not match, the whole group is skipped.
:b                        # A label for the T command. A similar sed command t branches if a previous s command finds a match and replaces something.
/^$/!d                    # Delete everything that has not been processed before. Except the empty lines.
'
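
A much smaller loop of the same kind, as a sketch: the address `$!` supplies the condition and `b a` jumps back to the label until the last line has been appended. This is the classic "join all lines" idiom, not part of the script above.

```shell
#!/bin/sh
# :a      label
# N       append the next input line to the pattern space
# $!ba    if this is not the last line, branch back to :a
# s/../   runs once, after the loop has collected all lines
joined=$(printf 'a\nb\nc\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')

printf '%s\n' "$joined"
```

Passing each command with its own `-e` avoids any question about where the label name ends.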

@MochiMoppel: Have you found the description of the sed command modifiers? There is one more place where they can appear (besides s): /.../Ms/.../.../Ig (chapters 3.3 and 4.3 of the manual).

Edit: modified the sed command.

MochiMoppel
Posts: 1123
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 17 times
Been thanked: 361 times

Re: How to use sed to match multiple lines?

Post by MochiMoppel »

Burunduk wrote: Sun Apr 02, 2023 4:11 pm

@MochiMoppel: Have you found the description of the sed command modifiers? There is one more place where they can appear (besides s): /.../Ms/.../.../Ig (chapters 3.3 and 4.3 of the manual).

Yes, thank you. Found it when searching in the manual for modifier. I shouldn't have delved into this stuff in the first place... at least not so late at night and definitely not after a can of "Suntory Kinmugi Lager". I felt a bit - how can I say? - sedated.
