text splitting problem
Posted: Thu Nov 16, 2023 3:38 am
by MochiMoppel
I'm trying to split a string of space-delimited words so that each word ends up on a separate line, but substrings enclosed in quotes or brackets should be treated as one word, even if they contain multiple words.
Example:
Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod
should be converted into
Lorem
'ipsum dolor sit'
amet
'consecur'
adipis
[elit sed do]
[eius]
mod
Is there an elegant way to achieve this? So far I've come up with a solution that seems to work, but I find it a bit clumsy:
Code: Select all
string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
for n in $string;do
case $n in
'['*']'|"'"*"'") # [eius] and 'consecur'
echo "$n"
;;
'['*|*']'|*"'"*)
if [[ -z $iscompound ]];then # [elit and 'ipsum
iscompound=1
sub+=$n' '
else # do] and sit'
sub+=$n
echo "$sub"
[[ $n =~ ("'"|']') ]] && iscompound= sub=
fi
;;
*)
[[ $iscompound ]] && sub+=$n' ' || echo "$n" # dolor or Lorem
;;
esac
done
Is there a better way?
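One regex-based take on the same task, as a minimal sketch rather than a definitive answer. It assumes a grep that supports -o and -E (GNU grep does) and that quotes and brackets are balanced and not nested. The function name split_words is made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: print every token of $1 on its own line.
# A token is '...' or [...] or, failing that, a run of non-blank characters.
split_words() {
    # The quoted and bracketed alternatives are listed before the
    # plain-word fallback so that a whole group is matched as one token.
    grep -oE "'[^']*'|\[[^]]*\]|[^[:space:]]+" <<< "$1"
}

string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
split_words "$string"
```

Because the string is passed quoted and never word-split by the shell, no pathname expansion can interfere with tokens that contain * or [.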
Re: text splitting problem
Posted: Thu Nov 16, 2023 8:33 am
by puppy_apprentice
Specifically for this example:
Code: Select all
x="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod";echo $x | tr "'" "\n" | tr "[" "\n" | tr "]" "\n"
But this is not exactly what you want ;)
Icon has better functions for such things.
Re: text splitting problem
Posted: Thu Nov 16, 2023 10:35 am
by HerrBert
Also specifically for this example:
Code: Select all
string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
i=0
for n in $string; do
[ "${n::1}" = "'" -o "${n::1}" = "[" ] && i=1
[ "${n: -1:1}" = "'" -o "${n: -1:1}" = "]" ] && i=0
[ $i -eq 0 ] && echo $n || echo -n "$n "
done
[edit] shorter:
Code: Select all
string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
i=0
for n in $string; do
[[ ${n::1} = [\[\'] ]] && i=1
[[ ${n: -1:1} = [\]\'] ]] && i=0
[ $i -eq 0 ] && echo $n || echo -n "$n "
done
(still learning this crazy sh*t)
Re: text splitting problem
Posted: Thu Nov 16, 2023 11:32 am
by MochiMoppel
Very clever :thumbup:
Even shorter and all lines same length :lol:
Code: Select all
string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
i=0
for n in $string; do
[[ ${n::1} = [[\'] ]] && ((i++))
[[ ${n: -1:1} = []\'] ]] && ((i++))
((i%2)) && echo -n "$n " || echo $n
done
Re: text splitting problem
Posted: Thu Nov 16, 2023 12:07 pm
by HerrBert
TBH, ((i++)) was my first attempt, but at least I thought there is no math needed
Very nice exercise :thumbup:
Re: text splitting problem
Posted: Thu Nov 16, 2023 12:21 pm
by MochiMoppel
HerrBert wrote (Thu Nov 16, 2023 12:07 pm):
"TBH, ((i++)) was my first attempt, but at least I thought there is no math needed"
Agreed! Down with Math! :twisted:
Code: Select all
string="Lorem 'ipsum dolor sit' 'consecur' adipis [elit sed do] [eius] mod"
for n in $string; do
[[ $n = [[\']* ]] && i=1
[[ $n = *[]\'] ]] && i=0
((i)) && echo -n "$n " || echo $n
done
Re: text splitting problem
Posted: Thu Nov 16, 2023 3:20 pm
by puppy_apprentice
Code: Select all
string="Lorem 'ipsum dolor sit' 'consecur' adipis [elit sed do] [eius] mod *hello world*"
for n in $string
do
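[[ $n = [$1]* ]] && i=1 # word starts with one of the characters passed as $1
[[ $n = *[$2] ]] && i=0 # word ends with one of the characters passed as $2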
((i)) && echo -n "$n " || echo $n
done
Re: text splitting problem
Posted: Fri Nov 17, 2023 8:36 am
by MochiMoppel
Math is back :welcome:
@HerrBert Still based on your concept I made the snippet more versatile. The [...] and '...' compounds may now have preceding and/or trailing strings. This makes it possible to filter regex patterns (which after all is the purpose of my exercise).
Code: Select all
string="abc *[\ ]* ^[0-9].* $'a b c' $'\n'"
for n in $string; do
[[ $n = *[[\'\]]* ]] && ((i++)) # matches ^[0-9].* or $'a or ]*
[[ $n = *[[\']*[]\']* ]] && i=0 # matches ^[0-9].* but not $'a or ]*
((i%2)) && echo -n "$n " || echo $n
done
Output:
Code: Select all
abc
*[\ ]*
^[0-9].*
$'a b c'
$'\n'
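One caveat with looping over an unquoted $string is that the shell also applies pathname expansion, so a token such as ]* could be replaced by matching filenames in the current directory. The sketch below keeps the same two pattern tests and the same odd/even counter but runs them in a subshell function with globbing switched off. The function name split_compounds is hypothetical, and printf is used instead of echo only to keep the output predictable.

```bash
#!/bin/bash
# Hypothetical wrapper around the counter logic shown above.
# The ( ... ) body runs in a subshell, so set -f does not leak to the caller.
split_compounds() (
    set -f                      # no pathname expansion while splitting $1
    i=0
    for n in $1; do
        # word contains [ or ' or ] : count it
        if [[ $n = *[[\'\]]* ]]; then i=$((i + 1)); fi
        # word contains an opener followed by a closer: complete on its own
        if [[ $n = *[[\']*[]\']* ]]; then i=0; fi
        if ((i % 2)); then
            printf '%s ' "$n"   # inside a compound: stay on the same line
        else
            printf '%s\n' "$n"  # outside or just closed: end the line
        fi
    done
)

split_compounds "Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
```

Like the original snippet, this joins the parts of a compound with a single space, so runs of several blanks inside quotes or brackets are not preserved.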
Re: text splitting problem
Posted: Sat Dec 02, 2023 8:48 pm
by puppy_apprentice
Code: Select all
arr_of_strings=(Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod)
echo "${arr_of_strings[@]}"
# valid indexes run from 0 to count-1; <= would print one extra empty line
for (( i=0; i<${#arr_of_strings[@]}; i++ ))
do
echo "${arr_of_strings[i]}"
done
Code: Select all
Lorem ipsum dolor sit amet consecur adipis [elit sed do] [eius] mod
Lorem
ipsum dolor sit
amet
consecur
adipis
[elit
sed
do]
[eius]
mod
Code: Select all
arr_of_strings=(Lorem \'ipsum dolor sit\' amet \'consecur\' adipis [elit sed do] [eius] mod *asterisk check*)
echo "${arr_of_strings[@]}"
# valid indexes run from 0 to count-1; <= would print one extra empty line
for (( i=0; i<${#arr_of_strings[@]}; i++ ))
do
echo "${arr_of_strings[i]}"
done
Code: Select all
Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod *asterisk check*
Lorem
'ipsum
dolor
sit'
amet
'consecur'
adipis
[elit
sed
do]
[eius]
mod
*asterisk
check*
Arrays can be used only in specific situations.
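The reason can be shown in a few lines. This is a minimal sketch with made-up variable names (literal, from_var): quotes typed directly in an array assignment are shell syntax and group words, while quotes stored inside a variable are ordinary characters and group nothing.

```bash
#!/bin/bash
# Quotes written in the script itself are syntax: they group and are removed.
literal=(Lorem 'ipsum dolor sit' amet)

# Quotes that arrive inside a variable are plain data: no grouping happens.
string="Lorem 'ipsum dolor sit' amet"
set -f                  # keep pathname expansion out of the picture
from_var=($string)
set +f

echo "${#literal[@]}"   # 3
echo "${#from_var[@]}"  # 5
printf '%s\n' "${from_var[@]}"
```

So an array helps only when the words are written literally in the script, not when the string comes from a variable, a file or user input.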
Re: text splitting problem
Posted: Tue Dec 05, 2023 5:57 am
by MochiMoppel
puppy_apprentice wrote (Sat Dec 02, 2023 8:48 pm):
"Arrays can be used only in specific situations."
at least not here :mrgreen:
Re: text splitting problem
Posted: Tue Dec 05, 2023 6:39 am
by pp4mnklinux
Only a suggestion using the 'awk' command:
Code: Select all
string="Lorem 'ipsum dolor sit' amet 'consecur' adipis [elit sed do] [eius] mod"
echo "$string" | awk -v RS="[ \t\n]+" '{gsub(/^'"'"'(.*)'"'"'$/, "\1", $1); print $1}'
As I said, this is only a suggestion, but if I understood you correctly, this may help you as a general solution:
Code: Select all
#!/bin/bash
split_string() {
local input_string="$1"
local result=""
local current_word=""
while IFS= read -rn1 char; do
case "$char" in
' '|$'\n'|$'\t') # Space, newline, or tab
if [[ -n $current_word ]]; then
result+="$current_word"$'\n'
current_word=""
fi
;;
"'")
in_quote=true
current_word+="$char"
;;
"[" | "]")
in_bracket=true
current_word+="$char"
;;
*)
current_word+="$char"
;;
esac
done <<< "$input_string"
if [[ -n $current_word ]]; then
result+="$current_word"$'\n'
fi
echo "$result"
}
Re: text splitting problem
Posted: Thu Dec 07, 2023 7:39 am
by MochiMoppel
pp4mnklinux wrote (Tue Dec 05, 2023 6:39 am):
"if I understood you correctly, this may help you as a general solution"
Thanks for trying to help. It obviously fails to meet the requirements but it adds to the list of approaches that do not work. Learning from mistakes is almost as much fun as finding a working solution - at least that's what I try to believe when I again end up in a dead-end street of the coding maze :lol:
Re: text splitting problem
Posted: Fri Dec 08, 2023 8:03 am
by greengeek
Here is what Google Bard suggested:
Code: Select all
#!/bin/bash
# Check if input file is provided
if [ -z "$1" ]; then
echo "Please provide the input file path as an argument."
exit 1
fi
# Define the text file path
input_file="$1"
# Define output filename with ".processed" extension
output_file="${input_file}.processed"
# Initialize variables
current_buffer=""
within_brackets=false
within_quotes=false
# Process each character in the input file
while IFS= read -r -n1 char; do
# Check if within brackets or quotes
if [[ "$char" == "[" ]]; then
within_brackets=true
elif [[ "$char" == "]" ]]; then
within_brackets=false
elif [[ "$char" == "\"" ]]; then
within_quotes=not $within_quotes
fi
# Check if character is space and not within brackets or quotes
if [[ "$char" == " " && ! "$within_brackets" && ! "$within_quotes" ]]; then
# Add current buffer to output file with newline
echo "$current_buffer" >> "$output_file"
# Reset current buffer
current_buffer=""
else
# Append character to current buffer
current_buffer="$current_buffer$char"
fi
done < "$input_file"
# Ensure last buffer is written to output file
if [[ ! -z "$current_buffer" ]]; then
echo "$current_buffer" >> "$output_file"
fi
echo "Processed file saved to: $output_file"
Doesn't work.
No idea how Google got so wealthy...
8-)
(Maybe I am as successful telling Bard what to do as I am telling my wife what to do... Skillset lacking..)
Here is what I asked Bard (maybe I misunderstood the question):
Please write a bash script to process a text file and convert whitespace to linefeed except within brackets or single quotes or double quotes
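For comparison, here is a minimal sketch of what that prompt describes, written as a filter from stdin to stdout. It is an illustration under assumptions, not a tested general solution: the function name split_outside_groups is made up, every ' or " is treated as opening a quote (so an apostrophe inside a word would confuse it), and a run of several blanks produces empty lines.

```bash
#!/bin/bash
# Hypothetical filter: blanks become newlines unless inside [...], '...' or "...".
split_outside_groups() {
    local char quote="" depth=0
    # IFS= keeps blanks intact; -n1 reads one character at a time
    while IFS= read -r -n1 char; do
        # read -n1 returns an empty string when it hits a newline
        if [[ -z $char ]]; then char=$'\n'; fi
        if [[ -n $quote ]]; then
            # inside quotes: copy everything, leave on the matching quote
            if [[ $char == "$quote" ]]; then quote=""; fi
            printf '%s' "$char"
        elif [[ $char == "'" || $char == '"' ]]; then
            quote=$char
            printf '%s' "$char"
        elif [[ $char == "[" ]]; then
            depth=$((depth + 1))
            printf '%s' "$char"
        elif [[ $char == "]" && $depth -gt 0 ]]; then
            depth=$((depth - 1))
            printf '%s' "$char"
        elif [[ $depth -eq 0 && ( $char == " " || $char == $'\t' ) ]]; then
            printf '\n'         # blank outside any group becomes a line break
        else
            printf '%s' "$char"
        fi
    done
}

split_outside_groups <<< "Lorem 'ipsum dolor sit' [elit sed do] \"a b\" mod"
```

To process a file as in the prompt, the function can be fed with input redirection, for example split_outside_groups < input.txt > output.txt (both filenames are placeholders).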
Re: text splitting problem
Posted: Mon Dec 11, 2023 7:11 am
by MochiMoppel
@greengeek
I asked Bard exactly the same question. For me he came up with a different suggestion:
Code: Select all
#!/bin/bash
# Check if a character is inside a quote
is_inside_quote () {
local char="$1"
local in_single_quote="false"
local in_double_quote="false"
# Check for single quote
if [[ "$char" == "'" ]]; then
in_single_quote="$in_single_quote"
in_double_quote="false"
elif [[ "$in_single_quote" == "false" && "$char" == '"' ]]; then
in_double_quote="$in_double_quote"
in_single_quote="false"
fi
# Check for bracket
if [[ "$char" == '[' ]]; then
in_bracket="true"
elif [[ "$in_bracket" == "true" && "$char" == ']' ]]; then
in_bracket="false"
fi
# Return true if inside any quote or bracket
if [[ "$in_single_quote" == "true" || "$in_double_quote" == "true" || "$in_bracket" == "true" ]]; then
return 0
else
return 1
fi
}
# Input file path
input_file="$1"
# Output file path (optional)
output_file="${2:-$input_file.out}"
# Check if input file exists
if [[ ! -f "$input_file" ]]; then
echo "Error: Input file '$input_file' does not exist."
exit 1
fi
# Open output file for writing
exec > "$output_file"
# Process each character in the input file
while read -r -n 1 char; do
if [[ "$char" == " " || "$char" == "\t" ]]; then
if ! is_inside_quote "$char"; then
echo
fi
else
echo -n "$char"
fi
done < "$input_file"
echo
exit 0
Needless to say that it doesn't work either. Even worse than your Bard because he fails to insert newline characters.
But the comments are nice.
Interestingly, there are no syntax errors in either attempt, and in your case there are some funny mistakes. If within_quotes=false, he might expect that within_quotes=not $within_quotes (syntactically correct!) results in within_quotes=true. That's not how Bash works, but it's how humans think. And this means that he can hardly have copied this stuff from a serious code source.
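What Bash actually does with that line can be checked with a short sketch. The helper name toggle and the variables status and negated are made up for the demonstration.

```bash
#!/bin/bash
within_quotes=false
status=0

# Bash reads this as: run the command named by $within_quotes (here: false)
# with within_quotes=not set only in that command's environment.
within_quotes=not $within_quotes || status=$?
echo "value: $within_quotes, exit status: $status"

# [[ ! "$var" ]] only asks whether the string is empty; "false" is not empty,
# so a test like [[ ... && ! "$within_quotes" ]] can never succeed.
if [[ ! "$within_quotes" ]]; then
    negated=yes
else
    negated=no
fi
echo "negation succeeded: $negated"

# One way to flip a true/false string flag explicitly.
toggle() {
    if [[ $1 == true ]]; then echo false; else echo true; fi
}
within_quotes=$(toggle "$within_quotes")
echo "after toggle: $within_quotes"
```

The variable keeps its old value after the "not" line, which would explain why that script never leaves or enters its quoted state the way its comments suggest.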
I also checked his "Draft 2" and "Draft 3" alternatives. Wouldn't even start because of multiple syntax errors :lol: