text splitting problem

MochiMoppel
Posts: 1145
Joined: Mon Jun 15, 2020 6:25 am
Location: Japan
Has thanked: 19 times
Been thanked: 377 times

text splitting problem

Post by MochiMoppel »

I'm trying to split a string of space-delimited words so that each word ends up on a separate line, but substrings enclosed in quotes or brackets should be treated as one word, even if they contain multiple words.

Example:
Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod

should be converted into
Lorem
'ipsum dolor sit'
amet
'consecur'
adipis
[elit sed do]
[eius]
mod

Is there an elegant way to achieve this? So far I've come up with a solution that seems to work, but I find it a bit clumsy:

Code: Select all

string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
for n in $string;do
    case $n in
        '['*']'|"'"*"'")                    # [eius] and 'consecur'
            echo "$n" 
            ;;
        '['*|*']'|*"'"*)
            if [[ -z $iscompound ]];then    # [elit and 'ipsum
                iscompound=1
                sub+=$n' '
            else                            # do] and sit'
                sub+=$n
                echo "$sub"
                [[ $n =~ ("'"|']') ]] && iscompound= sub=
            fi
            ;;
        *)
            [[ $iscompound ]] && sub+=$n' ' || echo "$n" # dolor or Lorem
            ;;
    esac
done

Is there a better way?

puppy_apprentice
Posts: 662
Joined: Tue Oct 06, 2020 8:43 pm
Location: land of bigos and schabowy ;)
Has thanked: 4 times
Been thanked: 108 times

Re: text splitting problem

Post by puppy_apprentice »

Specifically for this example:

Code: Select all

x="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod";echo $x | tr "'" "\n" | tr "[" "\n" | tr "]" "\n"

But this is not exactly what you want ;)

Icon has better functions for such things.
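
For comparison, tr can only swap single characters, so it cannot tell a space inside quotes from one outside. A tool that extracts whole tokens gets closer to the request. The following is only a sketch: it assumes a grep that supports -o and -E (GNU grep does, other builds may differ) and input whose quotes and brackets are balanced and not nested. The function name split_tokens is made up for illustration.

```bash
#!/bin/bash
# Sketch only. Assumptions: grep supports -o and -E; quotes and brackets
# are balanced and never nested. split_tokens is a hypothetical name.
split_tokens() {
    # Three alternatives: a '...' group, a [...] group, or a run of
    # non-space characters. With POSIX leftmost-longest matching the
    # quoted or bracketed group wins over the shorter bare word.
    printf '%s\n' "$1" | grep -oE "'[^']*'|\[[^]]*]|[^[:space:]]+"
}

split_tokens "Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
```

With the example string this should print the eight lines asked for in the first post. Because the string is passed quoted and never word-split by the shell, tokens such as [eius] are not exposed to filename expansion.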

HerrBert
Posts: 340
Joined: Mon Jul 13, 2020 6:14 pm
Location: Germany, NRW
Has thanked: 18 times
Been thanked: 114 times

Re: text splitting problem

Post by HerrBert »

Also specifically for this example:

Code: Select all

string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
i=0
for n in $string; do
	[ "${n::1}" = "'" -o "${n::1}" = "[" ] && i=1
	[ "${n: -1:1}" = "'" -o "${n: -1:1}" = "]" ] && i=0
	[ $i -eq 0 ] && echo $n || echo -n "$n "
done

[edit] shorter:

Code: Select all

string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
i=0
for n in $string; do
	[[ ${n::1} = [\[\'] ]] && i=1
	[[ ${n: -1:1} = [\]\'] ]] && i=0
	[ $i -eq 0 ] && echo $n || echo -n "$n "
done

(still learning this crazy sh*t :oops: )


Re: text splitting problem

Post by MochiMoppel »

Very clever :thumbup:

Even shorter and all lines same length :lol:

Code: Select all

string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
i=0
for n in $string; do
	[[ ${n::1}    = [[\'] ]] && ((i++))
	[[ ${n: -1:1} = []\'] ]] && ((i++))
	((i%2)) && echo -n "$n " || echo $n
done

Re: text splitting problem

Post by HerrBert »

:lol:
TBH, ((i++)) was my first attempt, but at least i thought there is no math needed

Very nice exercise :thumbup:


Re: text splitting problem

Post by MochiMoppel »

HerrBert wrote: Thu Nov 16, 2023 12:07 pm

TBH, ((i++)) was my first attempt, but at least i thought there is no math needed

Agreed! Down with Math ! :twisted:

Code: Select all

string="Lorem 'ipsum dolor sit' 'consecur' adipis [elit sed do] [eius] mod"
for n in $string; do
  [[ $n =  [[\']* ]] && i=1
  [[ $n = *[]\']  ]] && i=0
  ((i)) && echo -n "$n " || echo $n
done
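
One caveat with all of the loops so far: `for n in $string` relies on unquoted expansion, so besides word splitting the shell also performs filename expansion. A word like [eius] or * could be replaced by matching file names in the current directory. A hedged sketch of the same two-test idea wrapped in a function with globbing switched off follows. The name split_compounds is hypothetical, and the default IFS (space, tab, newline) is assumed.

```bash
#!/bin/bash
# Sketch only. split_compounds is a hypothetical wrapper; the two pattern
# tests are the ones from the loop above. Assumes the default IFS.
split_compounds() {
    local n i=0 restore_glob
    # Remember whether globbing was enabled, then disable it so that words
    # such as [eius] or * are not expanded against existing file names.
    [[ $- == *f* ]] || restore_glob=1
    set -f
    for n in $1; do                      # unquoted on purpose: split on whitespace
        [[ $n == [[\']* ]] && i=1        # word opens a compound
        [[ $n == *[]\'] ]] && i=0        # word closes a compound
        if ((i)); then
            printf '%s ' "$n"            # inside a compound: stay on the line
        else
            printf '%s\n' "$n"           # outside or just closed: end the line
        fi
    done
    [[ -n $restore_glob ]] && set +f     # put globbing back if it was on
    return 0
}

split_compounds "Lorem 'ipsum dolor sit' 'consecur' adipis [elit sed do] [eius] mod"
```

printf is used instead of echo so that a word such as -n or -e is printed rather than taken as an option.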

Re: text splitting problem

Post by puppy_apprentice »

Code: Select all

string="Lorem 'ipsum dolor sit' 'consecur' adipis [elit sed do] [eius] mod *hello world*"
for n in $string
do
  [[ $n =  [$1]* ]] && i=1
  [[ $n = *[$2]  ]] && i=0
  ((i)) && echo -n "$n " || echo $n
done

test.sh \*[\' ]\*\'


Re: text splitting problem

Post by MochiMoppel »

Math is back :welcome:

@HerrBert Still based on your concept I made the snippet more versatile. The [...] and '...' compounds may now have preceding and/or trailing strings. This makes it possible to filter regex patterns (which after all is the purpose of my exercise).

Code: Select all

string="abc *[\ ]*  ^[0-9].* $'a b c' $'\n'"
for n in $string; do
  [[ $n = *[[\'\]]*     ]] && ((i++)) # matches ^[0-9].* or $'a or ]*
  [[ $n = *[[\']*[]\']* ]] && i=0     # matches ^[0-9].* but not  $'a or ]*
  ((i%2)) && echo -n "$n " || echo $n
done

Output:

Code: Select all

abc
*[\ ]*
^[0-9].*
$'a b c'
$'\n'

Re: text splitting problem

Post by puppy_apprentice »

Code: Select all

arr_of_strings=(Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod)
echo ${arr_of_strings[@]}
for (( i=0; i<${#arr_of_strings[@]}; i++ ))   # "<", not "<=": indices run from 0 to count-1
do
	echo ${arr_of_strings[i]}
done

Code: Select all

Lorem ipsum dolor sit amet consecur adipis [elit sed do] [eius] mod
Lorem
ipsum dolor sit
amet
consecur
adipis
[elit
sed
do]
[eius]
mod

Code: Select all

arr_of_strings=(Lorem \'ipsum dolor sit\' amet \'consecur\' adipis [elit sed do] [eius] mod *asterisk check*)
echo ${arr_of_strings[@]}
for (( i=0; i<${#arr_of_strings[@]}; i++ ))   # "<", not "<=": indices run from 0 to count-1
do
	echo ${arr_of_strings[i]}
done

Code: Select all

Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod *asterisk check*
Lorem
'ipsum
dolor
sit'
amet
'consecur'
adipis
[elit
sed
do]
[eius]
mod
*asterisk
check*

Arrays can be used only in specific situations.
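
The array literal is parsed by the shell itself, which strips the quotes and knows nothing about brackets as grouping characters. If the goal is to end up with the tokens in an array, one option is to do the splitting first and fill the array afterwards. Below is a minimal pure-Bash sketch using `[[ =~ ]]` and BASH_REMATCH. The helper name split_into_array and the global array name words are made up, and balanced, non-nested quotes and brackets are assumed.

```bash
#!/bin/bash
# Sketch only. split_into_array is a hypothetical helper that fills the
# global array "words". Assumes balanced, non-nested quotes and brackets.
split_into_array() {
    # Group 1: one token (a '...' group, a [...] group, or a bare word).
    # Group 2: whatever follows that token.
    local re="^[[:space:]]*('[^']*'|\[[^]]*]|[^[:space:]]+)(.*)"
    local rest=$1
    words=()
    while [[ $rest =~ $re ]]; do
        words+=("${BASH_REMATCH[1]}")    # keep the token, quotes included
        rest=${BASH_REMATCH[2]}          # continue with the remainder
    done
}

split_into_array "Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
printf '%s\n' "${words[@]}"
```

The regex is kept in a variable and expanded unquoted on the right-hand side of =~, which is the usual way to stop Bash from treating it as a literal string.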


Re: text splitting problem

Post by MochiMoppel »

puppy_apprentice wrote: Sat Dec 02, 2023 8:48 pm

Arrays can be used only in specific situations.

at least not here :mrgreen:

pp4mnklinux
Posts: 854
Joined: Wed Aug 19, 2020 5:43 pm
Location: Edinburgh
Has thanked: 535 times
Been thanked: 234 times

Re: text splitting problem

Post by pp4mnklinux »

Only a suggestion using the 'awk' command:

Code: Select all

string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"

echo "$string" | awk -v RS="[ \t\n]+" '{gsub(/^'"'"'(.*)'"'"'$/, "\1", $1); print $1}'

As I said, this is only a suggestion, but if I understood you correctly, this may help you as a general solution:

Code: Select all

#!/bin/bash

split_string() {
    local input_string="$1"
    local result=""
    local current_word=""

    while IFS= read -rn1 char; do
        case "$char" in
            ' '|$'\n'|$'\t')  # Space, newline, or tab
                if [[ -n $current_word ]]; then
                    result+="$current_word"$'\n'
                    current_word=""
                fi
                ;;
            "'")
                in_quote=true
                current_word+="$char"
                ;;
            "[" | "]")
                in_bracket=true
                current_word+="$char"
                ;;
            *)
                current_word+="$char"
                ;;
        esac
    done <<< "$input_string"

    if [[ -n $current_word ]]; then
        result+="$current_word"$'\n'
    fi

    echo "$result"
}


~

F96CE_XFCE_FUSILLI ====> https://puppyxfcefusilli.wordpress.com/


Re: text splitting problem

Post by MochiMoppel »

pp4mnklinux wrote: Tue Dec 05, 2023 6:39 am

if I understood you correctly, this may help you as a general solution

Thanks for trying to help. It obviously fails to meet the requirements but it adds to the list of approaches that do not work. Learning from mistakes is almost as much fun as finding a working solution - at least that's what I try to believe when I again end up in a dead-end street of the coding maze :lol:
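
The character-by-character idea itself is workable. The function above sets in_quote and in_bracket but never consults them, so every space still ends a word. A hedged sketch of the same approach with the state actually used is shown here. The name split_chars is hypothetical, and it assumes that a compound opened by ' ends at the next ' and one opened by [ ends at the next ], with no nesting.

```bash
#!/bin/bash
# Sketch only. split_chars is a hypothetical name. Assumes no nesting:
# a compound opened by ' ends at the next ', one opened by [ at the next ].
split_chars() {
    local input=$1 word='' closer='' char i
    for ((i = 0; i < ${#input}; i++)); do
        char=${input:i:1}
        if [[ -n $closer ]]; then
            # Inside a compound: keep every character, spaces included,
            # and leave the compound when its closing character shows up.
            word+=$char
            [[ $char == "$closer" ]] && closer=''
        elif [[ $char == "'" ]]; then
            closer="'"; word+=$char
        elif [[ $char == '[' ]]; then
            closer=']'; word+=$char
        elif [[ $char == [[:space:]] ]]; then
            # Whitespace outside a compound ends the current word.
            [[ -n $word ]] && printf '%s\n' "$word"
            word=''
        else
            word+=$char
        fi
    done
    [[ -n $word ]] && printf '%s\n' "$word"   # flush the last word
    return 0
}

split_chars "Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
```

Indexing the string with ${input:i:1} sidesteps the IFS and newline handling that `read -n1` needs, at the cost of being slow on long inputs.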

greengeek
Posts: 1244
Joined: Thu Jul 16, 2020 11:06 pm
Has thanked: 372 times
Been thanked: 148 times

Re: text splitting problem

Post by greengeek »

Here is what Google Bard suggested:

Code: Select all

#!/bin/bash

# Check if input file is provided
if [ -z "$1" ]; then
  echo "Please provide the input file path as an argument."
  exit 1
fi

# Define the text file path
input_file="$1"

# Define output filename with ".processed" extension
output_file="${input_file}.processed"

# Initialize variables
current_buffer=""
within_brackets=false
within_quotes=false

# Process each character in the input file
while IFS= read -r -n1 char; do
  # Check if within brackets or quotes
  if [[ "$char" == "[" ]]; then
    within_brackets=true
  elif [[ "$char" == "]" ]]; then
    within_brackets=false
  elif [[ "$char" == "\"" ]]; then
    within_quotes=not $within_quotes
  fi

  # Check if character is space and not within brackets or quotes
  if [[ "$char" == " " && ! "$within_brackets" && ! "$within_quotes" ]]; then
    # Add current buffer to output file with newline
    echo "$current_buffer" >> "$output_file"
    # Reset current buffer
    current_buffer=""
  else
    # Append character to current buffer
    current_buffer="$current_buffer$char"
  fi
done < "$input_file"

# Ensure last buffer is written to output file
if [[ ! -z "$current_buffer" ]]; then
  echo "$current_buffer" >> "$output_file"
fi

echo "Processed file saved to: $output_file"

Doesn't work.
No idea how Google got so wealthy...
8-)
(Maybe i am as successful telling Bard what to do as i am telling my wife what to do... Skillset lacking..)

Here is what I asked Bard (maybe I misunderstood the question)

Please write a bash script to process a text file and convert whitespace to linefeed except within brackets or single quotes or double quotes


Re: text splitting problem

Post by MochiMoppel »

@greengeek
:lol:
I asked Bard exactly the same question. For me he came up with a different suggestion:

Code: Select all

#!/bin/bash

# Check if a character is inside a quote
is_inside_quote () {
  local char="$1"
  local in_single_quote="false"
  local in_double_quote="false"

  # Check for single quote
  if [[ "$char" == "'" ]]; then
    in_single_quote="$in_single_quote"
    in_double_quote="false"
  elif [[ "$in_single_quote" == "false" && "$char" == '"' ]]; then
    in_double_quote="$in_double_quote"
    in_single_quote="false"
  fi

  # Check for bracket
  if [[ "$char" == '[' ]]; then
    in_bracket="true"
  elif [[ "$in_bracket" == "true" && "$char" == ']' ]]; then
    in_bracket="false"
  fi

  # Return true if inside any quote or bracket
  if [[ "$in_single_quote" == "true" || "$in_double_quote" == "true" || "$in_bracket" == "true" ]]; then
    return 0
  else
    return 1
  fi
}

# Input file path
input_file="$1"

# Output file path (optional)
output_file="${2:-$input_file.out}"

# Check if input file exists
if [[ ! -f "$input_file" ]]; then
  echo "Error: Input file '$input_file' does not exist."
  exit 1
fi

# Open output file for writing
exec > "$output_file"

# Process each character in the input file
while read -r -n 1 char; do
  if [[ "$char" == " " || "$char" == "\t" ]]; then
    if ! is_inside_quote "$char"; then
      echo
    fi
  else
    echo -n "$char"
  fi
done < "$input_file"

echo

exit 0

Needless to say that it doesn't work either. Even worse than your Bard because he fails to insert newline characters.
But the comments are nice.
Interestingly, there are no syntax errors in either attempt, and in your case there are some funny mistakes. If within_quotes=false then he might expect that within_quotes=not $within_quotes (syntactically correct!) results in within_quotes=true. That's not how Bash works but it's how humans think. And this means that he can hardly have copied this stuff from a serious code source.

I also checked his "Draft 2" and "Draft 3" alternatives. Wouldn't even start because of multiple syntax errors :lol:
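
For anyone wondering what `within_quotes=not $within_quotes` really does: Bash reads it as "VAR=value command", i.e. it runs the command false with within_quotes=not placed in that command's environment only. The small demonstration below is a sketch. The variable names ran_ok, looks_false and in_quotes are made up for illustration.

```bash
#!/bin/bash
# Demonstration sketch; ran_ok, looks_false and in_quotes are made-up names.
within_quotes=false

# Parsed as "VAR=value command": runs the builtin `false` with
# within_quotes=not in its environment only. The shell variable is untouched.
within_quotes=not $within_quotes && ran_ok=yes || ran_ok=no
echo "within_quotes=$within_quotes ran_ok=$ran_ok"   # within_quotes=false ran_ok=no

# [[ ! "$var" ]] only asks whether the string is empty, and the string
# "false" is not empty, so this test never succeeds.
[[ ! "$within_quotes" ]] && looks_false=yes || looks_false=no
echo "looks_false=$looks_false"                      # looks_false=no

# One working toggle: keep the flag as 0/1 and flip it arithmetically.
in_quotes=0
in_quotes=$(( ! in_quotes ))    # now 1
in_quotes=$(( ! in_quotes ))    # back to 0
echo "in_quotes=$in_quotes"                          # in_quotes=0
```

The second test is why the first Bard script never splits anything: `! "$within_brackets"` is false for both "true" and "false", so the branch that writes a line is never taken. Likewise "\t" in the second script is a literal backslash followed by t; a tab would be written $'\t'.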
