regex string processing

Post by stemsee »

I have seen several topics that feature processing strings with regex, but I was not able to find a dedicated regex thread. So I decided to start one, if no one objects, as there are notable experts on the forum who regularly offer guidance and solutions.

In fact, starting the thread is just a convenience to get my own programming needs met. Currently I am trying to implement some basic text formatting options, namely: quotes, bullets, tabs, sequenced-number prefixes, upper to lower and lower to upper (case). Each of these should operate on selected multi-line text, either adding or removing said formatting. I thought these must be easy to achieve, so I earnestly began learning regex with sed and grep. However, to cut a long story short, it is very difficult to construct a regex argument that accommodates all manner of exceptions.

Actually, I achieved my goals only in the most basic conditions, such as properly ended sentences with no special characters, and even in this case it took a few days to find partly working solutions. So now for the finesse.

Here is the body of text on which I am testing my sed constructions. The text is written to a file with echo -e, and the operations act on the text in that file, but piping would be preferred.

This text has instances of the targeted strings at the beginning and end of words, bracketed, quoted and whatnot. It is just a real example.

The function I have so far is this, and any pointers will be gratefully implemented.

Code: Select all

echo -e "$EDITS" | for i in \( \) \{ \[ \} \] \? \& \$ \# \" \! \? \* \+ \: \@ \| \' \_ \- \. \, \; \= \^ \~ \% \/; do sed "s/$i/\\$i/g"; done > "$track"/MO

export L=$(echo -e "$pattern" | grep -v '^$' | wc -l)

export cnt=1

while read line 
do
if [[ $(echo -e "$line" | grep -v '^$') != "" ]]; then
case "$BSSID" in 

tabs+) sed -i "s/$line$/^\t$line/" "$track"/MO;;
tabs-) sed -i "/$line/s/\t//" "$track"/MO;;

upper) NP=$(echo -e "$line" | tr a-z A-Z)
sed -i "s/$line/$NP/g" "$track"/MO;;

lower) NP=$(echo -e "$line" | tr A-Z a-z)
sed -i "s/$line/$NP/g" "$track"/MO;;

bullets+) sed -i -e "/$line$/s/$line$/• $line/" "$track"/MO;;
bullets-) NP=$(echo -e $line | sed -e "s/• //")
sed -i "s/$line/$NP/" "$track"/MO;;

numbers+) sed -i -e "/$line$/s/$line$/$cnt\) $line/" "$track"/MO;;
numbers-) NP=$(echo -e "$line" | sed -E "s/[0-99]\) //") 
sed -i "s/$line/$NP/" "$track"/MO;;

quotes+) case "$SCALE" in
0)[ "$cnt" -eq 1 ] && sed -i -e "s/$line$/$IDENTITY$line/" "$track"/MO
[ "$cnt" -eq "$L" ] && sed -i -e "s/$line$/$line$DOCS/" "$track"/MO;;

1)  sed -i -e "s/$line/$IDENTITY$line$DOCS/" "$track"/MO;;
esac;;

quotes-)  case "$SCALE" in
0)[ "$cnt" -eq 1 ] && sed -i -e "s/^$IDENTITY//" "$track"/MO
[ "$cnt" -eq "$L" ] && sed -i -e "/$DOCS$/s/$DOCS//" "$track"/MO;;

1)  sed -r -i -e "s/^[$IDENTITY].*[$DOCS]$/s/^$IDENTITY//" -e "/[$DOCS]$/s/$DOCS$//g" "$track"/MO;;
2) NP=$(echo -e "$line" | cut -c2- | rev | cut -c2- | rev); sed -i "s/$line/$NP/g" "$track"/MO;;
esac;;
esac
export cnt=$((cnt + 1))
fi

done <<< "$pattern"

MODIFIED=$(cat "$track"/MO) 
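
One possible way to keep regex out of the selection step altogether: the loop above pastes each selected line into a sed pattern, so any `.`, `/`, `(` or `?` in the text is read as regex syntax or as the delimiter. The sketch below compares whole lines as plain strings in awk instead. It is only an illustration, assuming the selection arrives as newline-separated text; the names `upper_selected` and `SEL` are made up here and are not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper: upper-case only those stdin lines that occur verbatim
# in the selection given as $1 (newline-separated). No regex is involved,
# so characters such as . / ( ) ? need no escaping.
upper_selected() {
    # Pass the selection through the environment: unlike awk -v, ENVIRON
    # does not interpret backslash escapes in the value.
    SEL="$1" awk '
    BEGIN {
        n = split(ENVIRON["SEL"], lines, "\n")
        for (i = 1; i <= n; i++)
            if (lines[i] != "") selected[lines[i]] = 1
    }
    ($0 in selected) { print toupper($0); next }   # exact string match
    { print }                                      # everything else unchanged
    '
}

# Example: only the second line is selected.
printf '%s\n' 'I can swim.' 'Can I borrow your pen? (can / may)' |
    upper_selected 'Can I borrow your pen? (can / may)'
```

The same shape would work for lower case with `tolower`. Note that identical lines elsewhere in the text are also converted, since the match is on content and not on position.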

text

Code: Select all

Lesson Title: Modal Verbs
Comment:
Icon: tick
Objective:

Materials:


Guided Practice (15 minutes):

Independent Practice (10 minutes):

Closure (5 minutes):

Notes:
24 Modal Auxiliary Verbs Table
Introduction:

Modal verbs are a type of auxiliary verb used to express various meanings such as

	obligation
	possibility
	permission
	ability
	suggestion
	certainty
	expectation

In English, the most commonly used modal verbs are:

	 can
	 could
	 may
	 might
	 shall
	 should
	 will
	 would
	 must
	 bought to

	could is past tense of can.
	would is past tense of will.
	should is past tense of shall.
	might is past tense of may.

In this lesson, we'll review the different uses of modal verbs and how to form sentences using them.

Modal verbs are an important part of English grammar, and mastering their use can greatly
enhance your ability to communicate effectively in various situations.
By understanding the different uses of modal verbs and practicing their use in context,
you can become a more confident and skilled speaker of English.

Lesson: 1

Expressing Ability: Can and could are used to express ability in present and past tense respectively.
Example: I can swim. She could play the guitar when she was younger.

Be able to can be used instead of can or could to express ability.
Example: I am able to speak Spanish fluently.

Expressing Possibility: May and might are used to express possibility.
Example: It may rain later today. She might come to the party if she finishes work early.

Could is also used to express possibility in a less certain way.
Example: He could be at home or at work.

Expressing Obligation:
Must and have to are used to express obligation.
Example: I must finish this report by tomorrow. You have to wear a helmet when riding a motorcycle.

Should and ought to are used to express advice or strong suggestion.
Example: You should drink plenty of water every day. She ought to get more exercise.

Expressing Permission:
Can and may are used to express permission.
Example: Can I borrow your pen? May I leave the meeting early?

Could and might are used to ask for permission in a more indirect or polite way.
Example: Could I ask you a question? Might I suggest an alternative approach?

Expressing Certainty:
Must is used to express a strong sense of certainty or deduction.
Example: He must be at home because his car is in the driveway.

Should is used to express a high degree of certainty or expectation.
Example: The train should arrive at the station at 8:30.

MODALS FUNCTIONS EXAMPLES

Will	asking					possibility / suggestion

	Will you go to school?
It will rain today.You will not keep late hours at night before the exam.

Would 						requesting 
	Would you give me a pen?

Shall						asking / possibility

	Shall I do the work?

I hope I shall complete the project within a week.

Should						suggestion / seeking advice

You should walk a mile in the morning.
	Should we go for a walk?

Can						ability / possibility

The boy can speak English fluently.
We can hold a condolence meeting for his death this Sunday.

Could						ability / requesting

He could do the sum.
Could you help me to solve the problem?

May						possibility / permission / offering

	He may come here today.
	May I come in?
	May I get you a cup of tea?

Might						possibility / suggestion

His statement might be true.
You might just as well go.			certainty/strong probability / prohibition

	You must obey your teacher.
	Man must die one day.
	You must be tired after a long journey.
	We must not waste our time.

						Challenge / negative force/ interrogation
	He dare not say so. (not dares’)
	I dare you to prove that you’ve said so.
	He dare not follow you.
	Who dares to enter the room?
	Interrogation
	You need not (needn’t) come here.

Need						prohibition / interrogation

You need not (needn’t) come here.
Need he go there?

Used to habitual action in the past My father used to teach me English.

Ought to obligation strong likelihood You ought to work hard.
The lawyer ought to be able to help you.

Lesson: 2

Can and could are used to express ability in present and past tense respectively.
Example: I can swim. She could play the guitar when she was younger.


Example: I am able to speak Spanish fluently.


May and might are used to express possibility.
Example: It may rain later today. She might come to the party if she finishes work early.


Example: He could be at home or at work.


Must and have to are used to express obligation.
Example: I must finish this report by tomorrow. You have to wear a helmet when riding a motorcycle.


Example: You should drink plenty of water every day. She ought to get more exercise.


Example: Can I borrow your pen? May I leave the meeting early?


Example: Could I ask you a question? Might I suggest an alternative approach?

Must is used to express a strong sense of certainty or deduction.

Example: He must be at home because his car is in the driveway.


Example: The train should arrive at the station at 8:30.

EXERCISE:

Now, let's practice using modal verbs in sentences. Fill in the blanks with the appropriate modal verb:

I ___________ finish this project by the end of the week. (must / could / should)

She ___________ be at work right now. (must / might / can)

You ___________ turn off your phone during the movie. (should / must / can)
___________ I borrow your car tonight? (can / may / could)

We ___________ play tennis tomorrow if the weather is good. (can / may / should)


Re: regex string processing

Post by stemsee »

I ___________ finish this project by the end of the week. (must / could / should)

She ___________ be at work right now. (must / might / can)

You ___________ turn off your phone during the movie. (should / must / can)
___________ I borrow your car tonight? (can / may / could)

We ___________ play tennis tomorrow if the weather is good. (can / may / should)

My code cannot insert initial tabs, quotes or bullets into the text block above. Only by capturing the first character or word and adding the formatting to it (e.g. I > \tI, or I > "I), and likewise with the last character, could I programmatically format these lines.
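
A minimal sketch of that idea, anchoring on the start (`^`) and end (`$`) of each line instead of matching the line's own text, so slashes, brackets and underscores in the text never reach the pattern. All function names are hypothetical, and each filter reads stdin and writes stdout so it can be piped. As an aside, `[0-99]` in the earlier numbers- case is a bracket expression that matches a single digit; `[0-9][0-9]*` is used here for one or more digits.

```shell
#!/bin/sh
# Hypothetical filter names; each reads stdin and writes stdout.
TAB=$(printf '\t')   # literal tab, so the sed scripts do not rely on \t support

# "/./" restricts a command to non-empty lines.
tabs_add()       { sed "/./s/^/$TAB/"; }
tabs_remove()    { sed "s/^$TAB//"; }

bullets_add()    { sed '/./s/^/• /'; }
bullets_remove() { sed 's/^• //'; }

# "&" in the replacement stands for the whole matched line.
quotes_add()     { sed '/./s/.*/"&"/'; }
quotes_remove()  { sed 's/^"\(.*\)"$/\1/'; }

# Number non-blank lines 1) 2) 3) ...; blank lines pass through unnumbered.
numbers_add()    { awk 'NF { printf "%d) %s\n", ++n, $0; next } { print }'; }
numbers_remove() { sed 's/^[0-9][0-9]*) //'; }

# Example: number the lines, then indent them.
printf '%s\n' 'She ___________ be at work right now. (must / might / can)' '' \
              '___________ I borrow your car tonight? (can / may / could)' |
    numbers_add | tabs_add
```

Because nothing from the text is interpolated into the sed script, the choice of delimiter no longer matters for these operations.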


Re: regex string processing

Post by stemsee »

The problem with that was solved by changing the sed delimiter from '/' to '|' .... :lol:


Re: regex string processing

Post by MochiMoppel »

First line of code lacks more than "finesse". It doesn't work :roll:
echo -e "$EDITS" | for i in \( \) \{ \[ \} \] \? \& \$ \# \" \! \? \* \+ \: \@ \| \' \_ \- \. \, \; \= \^ \~ \% \/; do sed "s/$i/\\$i/g"; done > "$track"/MO
Pay attention to error messages. What are you trying to achieve here?


Re: regex string processing

Post by stemsee »

I was trying to edit the text so that those special characters would actually be escaped in the text body, as it seemed to me that the operations were influenced by the presence of those characters. I tend to guess rather than test properly, but it seemed to make a positive difference at first. In the end it was the sed command format that was the problem: I changed from sed "s///g" to sed "s|||g" etc. The delimiter is whatever character follows the s. However, I tried to use scarab and it didn't work, so maybe that has to do with single- versus two-byte characters. And '|' as the sed delimiter will be problematic when that character appears in the text.
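
Switching the delimiter only moves the problem to whichever character is chosen next. A commonly used alternative is to escape the search text and the replacement text before building the sed script. The sketch below assumes basic regular expressions (sed without -E or -r) and strings without embedded newlines; the function names are hypothetical. On the original escaping loop: every sed inside the `for` shares one standard input, so the first sed reads the whole text and the later ones receive nothing.

```shell
#!/bin/sh
# Hypothetical helpers, assuming basic regular expressions (sed without -E/-r)
# and strings without embedded newlines.

# Escape the characters that are special in a BRE, plus the "/" delimiter.
sed_escape_pattern() {
    printf '%s\n' "$1" | sed 's|[[\\/.*^$]|\\&|g'
}

# Escape the characters that are special in the replacement part.
sed_escape_replacement() {
    printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

# Replace every literal occurrence of $1 with $2 on stdin.
replace_literal() {
    sed "s/$(sed_escape_pattern "$1")/$(sed_escape_replacement "$2")/g"
}

# Example: both strings contain characters that would otherwise need care.
printf '%s\n' 'The train should arrive at 8:30. (can / may)' |
    replace_literal '(can / may)' '[may | might]'
```

With extended regular expressions the pattern helper would also have to cover `+ ? ( ) { } |`, which is one reason to stay with the basic syntax when the text is treated as a literal.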

Why does this work?

Code: Select all

string='hello how are you today'
echo $string | awk '{$((NF + 1))="*";print $6$0$6}'}'
*hello how are you today*

Re: regex string processing

Post by some1 »

stemsee wrote: Thu Aug 22, 2024 5:26 am

Why does this work?

Code: Select all

string='hello how are you today'
echo $string | awk '{$((NF + 1))="*";print $6$0$6}'}'
*hello how are you today*

It does NOT work.
Reason: Sloppiness and/or lack of awk-knowledge.

Better stay with sed in your project.

------------------------------------------------
Some awk-code to play with

Code: Select all

string='hello how are you today'

echo $string | awk '($NF=$NF "*") &&( $1="*" $1)'
##*hello how are you today*


echo $string | awk '$0="*" $0 "*"'
##*hello how are you today*

echo $string | awk 'BEGIN{OFS="~"}
{$0=OFS $0 OFS}1'
##~hello how are you today~

#-----------
echo $string | awk 'BEGIN{OFS="~"}
{$1=OFS $1;$0=$0 OFS}1'
##~hello~how~are~you~today~

exit

Re: regex string processing

Post by stemsee »

some1 wrote: Thu Aug 22, 2024 9:11 am
stemsee wrote: Thu Aug 22, 2024 5:26 am

Why does this work?

Code: Select all

string='hello how are you today'
echo $string | awk '{$((NF + 1))="*";print $6$0$6}'}'
*hello how are you today*

It does NOT work.
Reason: Sloppiness and/or lack of awk-knowledge.

Better stay with sed in your project.

Good advice... but with a re-format to $1-$5 instead of $0 it kinda works on fatdog64-902!
"# awk --version
GNU Awk 5.2.1, API 3.2, PMA Avon 8-g1, (GNU MPFR 4.2.0, GNU MP 6.2.1)
Copyright (C) 1989, 1991-2022 Free Software Foundation."

It is interesting that new fields can be created programmatically in sequence without knowing the value of NF for the input: NF updates with each addition, so the next new field is always NF + 1. Alternatively, new fields can simply be defined outright, e.g. '$6="*"'.
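
A small sketch of that behaviour, with hypothetical function names. Assigning to `$(NF + 1)` appends a field, increments NF and rebuilds `$0` with the output field separator, which is why printing the new field around `$0` yields a doubled star at the end.

```shell
#!/bin/sh
string='hello how are you today'

# Appending a field: NF grows to 6 and $0 is rebuilt with the new field.
append_star() { awk '{ $(NF + 1) = "*"; print NF ": " $0 }'; }

# Printing the new field around $0: $0 already ends in " *" at that point.
star_both() { awk '{ $(NF + 1) = "*"; print $NF $0 $NF }'; }

printf '%s\n' "$string" | append_star   # 6: hello how are you today *
printf '%s\n' "$string" | star_both     # *hello how are you today **
```

Wrapping the record without creating a field, as in `$0="*" $0 "*"`, avoids the extra star.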

I am reading this guide to begin with, trying to learn best practice: https://www.gnu.org/software/gawk/manua ... ng-Started

Your code illustrates how much better suited awk is, imo, to text formatting.

With regex '\b' word-boundary-sensitive 'look ahead/behind/around' it should not be too hard to implement soft text wrapping, expansion, justification and maybe hyphenation ([\b][\w]-[\w][\b]/-\n/). The solutions I have found so far seem sloppy (ugly results) and destructive (truncation). In my project using yad, the ;txt field has no column limit, for example 80 (portrait) or 240 (landscape), and it also wraps lines, so it is visually difficult to keep track of line lengths. When it comes to printing with 'yad --print', which has no border control function, it became necessary to re-format the text when creating a file that yad --print can operate on. I found that 'fold -s -w80' works better than other solutions, such as rich-cli, nano -S etc., which are terminal-display aware. But fold still leaves saw-toothed edges. The best solution must be to craft code that is more independent of the terminal display.
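
One hedged way to even out the saw-toothed edge is to pad the gaps between words after wrapping. The sketch below assumes its input has already been wrapped (for example by fold -s), leaves lines shorter than three quarters of the width alone as a rough guess at paragraph ends, and discards leading indentation because awk re-splits each line on whitespace. The name `justify` and the three-quarters threshold are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical filter: pad already-wrapped lines to exactly WIDTH columns by
# distributing extra spaces between words. Usage: fold -s -w 80 file | justify 80
justify() {
    awk -v width="${1:-80}" '
    {
        n = NF; total = 0
        for (i = 1; i <= n; i++) total += length($i)   # bytes in some awks
        natural = total + (n - 1)          # length with single spaces
        # Leave single words, short lines and over-long lines untouched.
        if (n < 2 || natural * 4 < width * 3 || natural >= width) { print; next }
        gaps   = n - 1
        spaces = width - total             # spaces to distribute over the gaps
        base   = int(spaces / gaps)
        extra  = spaces % gaps             # leftmost gaps get one more space
        out = ""
        for (i = 1; i < n; i++) {
            pad = base + (i <= extra ? 1 : 0)
            out = out $i sprintf("%" pad "s", "")
        }
        print out $n
    }'
}

# Example: pad an 8-character line to 10 columns.
printf '%s\n' 'aa bb cc' | justify 10
```

The extra spaces go to the leftmost gaps first, which is the simplest choice; alternating sides per line would usually look more even.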

Attachment: xscreenshot-20240823T120816.png (34.66 KiB)

Re: regex string processing

Post by superhik »

So this is all about wrapping?
Why didn't you write so in the first place? :lol:

Code: Select all

#!/bin/bash

wrap_width=80

wrap_text() {
    local text="$1"
    local length=${#text}
    local start=0

    # Preserve empty input lines (the loop below prints nothing for them)
    [ "$length" -eq 0 ] && echo

    while [ "$start" -lt "$length" ]; do
        local end=$((start + wrap_width))
        local skip=1

        if [ "$end" -lt "$length" ]; then
            # Find the last space at or before the wrap limit
            while [ "${text:end:1}" != " " ] && [ "$end" -gt "$start" ]; do
                end=$((end - 1))
            done
        fi

        # If no space found, hard-wrap at the limit without dropping a character
        if [ "$end" -eq "$start" ]; then
            end=$((start + wrap_width))
            skip=0
        fi

        # Print the wrapped line (printf is safe for text such as "-n")
        printf '%s\n' "${text:start:end-start}"

        # Skip the space the line was broken at, if there was one
        start=$((end + skip))
    done
}

# Read from stdin and wrap
while IFS= read -r line; do
    wrap_text "$line"
done

Save as wrap.sh, chmod +x, and run:

Code: Select all

echo "Your long text here" |  ./wrap.sh

P.S. Welcome back from an early retirement
