Parsing how question


Post by fredx181 »

Trying to get some results in parsing (with bash) something like this:

Code: Select all

    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true,
--
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true,
--

What I would like as the outcome is (only from the blocks with "fork": false,):

Code: Select all

"html_url": "https://github.com/whoamI/repo1",
"html_url": "https://github.com/whoamI/repo3",

I tried a few approaches with a while read loop, but can't figure out how to do it.

Re: Parsing how question

Post by MochiMoppel »

fredx181 wrote: Mon Nov 04, 2024 6:36 pm

Trying to get some results in parsing (with bash)

You mean solely with bash shell commands?
OK, but then let's increase fun and speed and make it also compatible with ash and even dash:

Code: Select all

#! /bin/ash
OLD='
    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true,
--
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true,
--'

LF="
"
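while : ; do
	# Strip the shortest tail that starts at a newline and contains "description" followed by "false"
	NEW=${OLD%$LF*description*false*}
	# Nothing stripped means no "fork": false block is left
	[ "$NEW" = "$OLD" ] && break || OLD=$NEW
	# NEW now ends with the matching html_url line; prepend that line to the result
	HTM=\"htm${NEW##*htm}$LF$HTM
done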
echo "$HTM"
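
For readers unfamiliar with the two expansions the loop relies on: `${VAR%pattern}` removes the shortest matching suffix and `${VAR##pattern}` removes the longest matching prefix. Both are POSIX, which is why the script runs in ash and dash. Here is a minimal sketch on a single record; the names REC, HEAD and TAIL are made up for illustration and are not part of the script above.

```shell
#!/bin/sh
# Hypothetical one-record sample, not part of the original script
LF="
"
REC='    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false,'

# ${REC%pattern}: drop the shortest suffix matching pattern.
# Here that is everything from the newline before "description" to the end.
HEAD=${REC%$LF*description*false*}

# ${HEAD##pattern}: drop the longest prefix matching pattern.
# Here that is everything up to and including the last "htm".
TAIL=${HEAD##*htm}

# Put the removed "htm" back in front, as the loop does
echo "\"htm$TAIL"
```

The suffix pattern needs a "description" line between the URL and the "fork" line, so the trick depends on the field order shown in the sample data.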

This would also work with bash/ash/dash but uses sed:

Code: Select all

echo $OLD | sed  's/false,/&\n/g' | sed -rn '/false,$/ s/(.*htm)(.*\/[^,]*)(.*)/\"htm\2,/p'

Requires bash, uses tac:

Code: Select all

echo "$OLD" | tac |
while read -r LINE; do
	[[ $LINE = *false* ]] && FOUND=1
	[[ $FOUND$LINE = 1*htm* ]] && echo "$LINE" && FOUND=
done

Note that this prints the resulting lines in reverse order. Not an elegant solution anyway :thumbdown:
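
If the reversed order matters, one possible variation (a sketch, not the code from this post) drops tac and instead remembers the most recent html_url line, printing it when a "fork": false line follows. It uses only `case` and `read`, so it should not need bash. The function name `unforked_urls` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: print the html_url line of every block whose "fork" is false,
# in input order. unforked_urls is a made-up name; it reads from stdin.
unforked_urls() {
	URL=
	while read -r LINE || [ -n "$LINE" ]; do
		case $LINE in
			# remember the latest URL line
			*'"html_url"'*)    URL=$LINE ;;
			# non-fork: emit the remembered line, if there is one
			*'"fork": false'*) [ -z "$URL" ] || printf '%s\n' "$URL" ;;
			# block separator: forget the remembered line
			--)                URL= ;;
		esac
	done
}
```

It could be called as `echo "$OLD" | unforked_urls`, assuming $OLD still holds the untouched sample text.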

Re: Parsing how question

Post by fredx181 »

Many thanks, @MochiMoppel! Works very well.
I wouldn't have come up with such compact code in a hundred years! :lol:

Re: Parsing how question

Post by puppy_apprentice »

I like grep:

Code: Select all

grep -B2 "\"fork\": false," data.txt | grep "html_url"

data.txt:

Code: Select all

    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true,
--
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false,
--
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true,
--
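
A small extension of the grep idea, as a sketch: `-B2` prints each matching line plus the two lines before it, so the pipeline assumes html_url always sits exactly two lines above the "fork" line. Note that `-B` is a common extension (GNU, BSD, busybox) rather than POSIX grep. If only the bare URL is wanted, `cut` can pick the fourth field delimited by double quotes. The function name `fork_false_urls` is hypothetical.

```shell
#!/bin/sh
# Sketch: same grep pipeline, then keep only the URL
# (the 4th field when the line is split on double quotes).
# fork_false_urls is a hypothetical helper; $1 is the data file.
fork_false_urls() {
	grep -B2 '"fork": false,' "$1" | grep '"html_url"' | cut -d'"' -f4
}
```

Usage would be `fork_false_urls data.txt`.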

Re: Parsing how question

Post by Burunduk »

MochiMoppel wrote: Tue Nov 05, 2024 12:44 am

This would also work with bash/ash/dash but uses sed:

Code: Select all

echo $OLD | sed  's/false,/&\n/g' | sed -rn '/false,$/ s/(.*htm)(.*\/[^,]*)(.*)/\"htm\2,/p'

Just a note: this snippet works because $OLD is unquoted, so the echo command outputs the whole thing as a single line.
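
The difference is easy to see on a tiny sample. This is only an illustration; SAMPLE, FLAT and KEPT are made-up names.

```shell
#!/bin/sh
# Hypothetical two-line sample to show the effect of quoting
SAMPLE='    "html_url": "https://github.com/whoamI/repo1",
    "fork": false,'

# Unquoted: the shell splits the value into words and echo joins them
# with single spaces, so newlines and indentation disappear.
FLAT=$(echo $SAMPLE)

# Quoted: the value is passed to echo as one argument, newlines intact.
KEPT=$(echo "$SAMPLE")

echo "$FLAT"
echo "$KEPT"
```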

For a multi-line input this sed command could be used:

Code: Select all

echo "$OLD" | sed -n '/html_url/h;/false,/{g;p}'
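
Spelled out one command per line, the same program might look like the sketch below. The wrapper name `hold_and_print` is made up; the sed commands are the ones from the one-liner.

```shell
#!/bin/sh
# Sketch: the same sed program, one command per line with comments.
# hold_and_print is a hypothetical wrapper name; it filters stdin.
hold_and_print() {
	sed -n '
# h: copy the current html_url line into the hold space
/html_url/h
# on a line containing false, g fetches the held line and p prints it
/false,/{
g
p
}
'
}
```

It could be used as `echo "$OLD" | hold_and_print`. Unlike the read loops, sed keeps the leading indentation of the printed lines.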

However, sed is not a JSON parser.

As this is a "parsing how" question, maybe jq is a better answer.

For example, if you have something like this:

Code: Select all

cat repos.json
[
  {
    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false
  },
  {
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true
  },
  {
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false
  },
  {
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true
  }
]

you can run

Code: Select all

jq '.[] | select(.fork==false) | .html_url' repos.json

to get

Code: Select all

"https://github.com/whoamI/repo1"
"https://github.com/whoamI/repo3"
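
If the URLs are going to be fed to another command, the surrounding JSON quotes usually get in the way. jq's `-r` (raw output) option prints strings without them. A small sketch, where the function name `plain_urls` is hypothetical and the file is assumed to have the same shape as repos.json:

```shell
#!/bin/sh
# Sketch: print the URLs of non-fork entries without the JSON quotes.
# plain_urls is a hypothetical helper; $1 is a JSON file shaped like repos.json.
plain_urls() {
	jq -r '.[] | select(.fork == false) | .html_url' "$1"
}
```

For example: `plain_urls repos.json | while read -r URL; do echo "found $URL"; done`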