
Parsing how question

Posted: Mon Nov 04, 2024 6:36 pm
by fredx181

Trying to get some results in parsing (with bash) something like this:

Code: Select all

    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true,
--
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true,
--

What I would like as output is only the html_url lines of the entries that have "fork": false, like this:

Code: Select all

"html_url": "https://github.com/whoamI/repo1",
"html_url": "https://github.com/whoamI/repo3",

I tried a few approaches with a while read loop, but can't figure out how to do it.


Re: Parsing how question

Posted: Tue Nov 05, 2024 12:44 am
by MochiMoppel
fredx181 wrote: Mon Nov 04, 2024 6:36 pm

Trying to get some results in parsing (with bash)

You mean solely with bash shell commands?
OK, but then let's add some fun and speed and also make it compatible with ash and even dash:

Code: Select all

#! /bin/ash
OLD='
    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true,
--
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true,
--'

LF="
"
while : ; do
	NEW=${OLD%$LF*description*false*}
	[ "$NEW" = "$OLD" ] && break || OLD=$NEW
	HTM=\"htm${NEW##*htm}$LF$HTM
done
echo "$HTM"

This would also work with bash/ash/dash but uses sed:

Code: Select all

echo $OLD | sed  's/false,/&\n/g' | sed -rn '/false,$/ s/(.*htm)(.*\/[^,]*)(.*)/\"htm\2,/p'

Requires bash, uses tac:

Code: Select all

echo "$OLD" | tac |
while read -r LINE; do
	[[ $LINE = *false* ]] && FOUND=1
	[[ $FOUND$LINE = 1*htm* ]] && echo "$LINE" && FOUND=
done

Note that this writes resulting lines in reverse order. Not an elegant solution anyway :thumbdown:
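
For comparison, here is a minimal sketch of a forward-reading loop that avoids tac and so keeps the original order. It is only one possible approach, not necessarily what anyone in this thread had in mind: it remembers the most recent html_url line and prints it when a "fork": false line follows. The function name forks_false and the variables URL and RESULT are made up for illustration, and it assumes html_url always comes before fork within an entry, as in the sample data.

```shell
#!/bin/sh
# Sample input, copied from the first post.
OLD='
    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true,
--
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true,
--'

# forks_false is a hypothetical helper name, not an existing command.
# It reads entries on stdin and prints the html_url line of each entry
# whose "fork" value is false.
forks_false() {
	URL=
	# read without IFS= also strips the leading indentation of each line
	while read -r LINE; do
		case $LINE in
			*'"html_url"'*)    URL=$LINE ;;   # remember the latest URL line
			*'"fork": false'*) [ -n "$URL" ] && printf '%s\n' "$URL" ;;
		esac
	done
}

RESULT=$(printf '%s\n' "$OLD" | forks_false)
printf '%s\n' "$RESULT"
```

Since it only uses case and read -r, this should run in bash, ash and dash alike.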


Re: Parsing how question

Posted: Tue Nov 05, 2024 1:07 pm
by fredx181

Many thanks ! @MochiMoppel Works very well.
I wouldn't have come up with such compact code in a hundred years! :lol:


Re: Parsing how question

Posted: Tue Nov 05, 2024 5:33 pm
by puppy_apprentice

I like grep:

Code: Select all

grep -B2 "\"fork\": false," data.txt | grep "html_url"

data.txt:

Code: Select all

    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true,
--
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true,
--

Re: Parsing how question

Posted: Tue Nov 05, 2024 11:54 pm
by Burunduk
MochiMoppel wrote: Tue Nov 05, 2024 12:44 am

This would also work with bash/ash/dash but uses sed:

Code: Select all

echo $OLD | sed  's/false,/&\n/g' | sed -rn '/false,$/ s/(.*htm)(.*\/[^,]*)(.*)/\"htm\2,/p'

Just a note: this snippet works because the echo command outputs the whole thing as a single line.
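
A small sketch to illustrate the point, using a shortened copy of the sample data (the variable names DATA, UNQUOTED and QUOTED are only for this example). Without quotes the shell splits the value into words and echo joins them with single spaces, so the newlines are lost. That is why the first sed in the quoted snippet has to re-insert a newline after each "false,".

```shell
#!/bin/sh
# Shortened sample: one entry plus its separator line.
DATA='    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false,
--'

# Unquoted: word splitting collapses everything into a single line.
# The $(( )) wrapper only trims padding that some wc versions add.
UNQUOTED=$(( $(echo $DATA | wc -l) ))

# Quoted: the embedded newlines survive.
QUOTED=$(( $(echo "$DATA" | wc -l) ))

echo "unquoted: $UNQUOTED line(s), quoted: $QUOTED line(s)"
```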

For a multi-line input this sed command could be used:

Code: Select all

echo "$OLD" | sed -n '/html_url/h;/false,/{g;p}'
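
In case the one-liner looks cryptic, here is the same idea written out as a commented multi-line sed script. Treat it as a sketch: the stricter "fork": false, pattern and the extra s command that trims the indentation are additions of mine to match the requested output, and DATA and RESULT are just example variable names.

```shell
#!/bin/sh
# Sample input, copied from the first post.
DATA='    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true,
--
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true,
--'

RESULT=$(printf '%s\n' "$DATA" | sed -n '
# h: copy each html_url line into the hold space, replacing the previous one.
/html_url/h
# On a "fork": false, line: g fetches the held line into the pattern space,
# s trims the leading indentation, p prints the result.
/"fork": false,/{
g
s/^[[:space:]]*//
p
}
')
printf '%s\n' "$RESULT"
```

With -n nothing is printed automatically, so only the lines reaching p show up.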

However, sed is not a JSON parser.

As this is a "parsing how" question, maybe jq is a better answer.

For example, if you have something like this:

Code: Select all

bat repos.json
[
  {
    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false
  },
  {
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true
  },
  {
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false
  },
  {
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true
  }
]

you can run

Code: Select all

jq '.[] | select(.fork==false) | .html_url' repos.json

to get

Code: Select all

"https://github.com/whoamI/repo1"
"https://github.com/whoamI/repo3"