Parsing how question
Posted: Mon Nov 04, 2024 6:36 pm
by fredx181
Trying to get some results in parsing (with bash) something like this:
Code: Select all
"html_url": "https://github.com/whoamI/repo1",
"description": "null",
"fork": false,
--
"html_url": "https://github.com/whoamI/repo2",
"description": "blah",
"fork": true,
--
"html_url": "https://github.com/whoamI/repo3",
"description": null,
"fork": false,
--
"html_url": "https://github.com/whoamI/repo4",
"description": "null",
"fork": true,
--
What I would like the outcome to be is (only from the blocks where "fork": false,):
Code: Select all
"html_url": "https://github.com/whoamI/repo1",
"html_url": "https://github.com/whoamI/repo3",
Tried some ways with a while read loop, but can't figure out how.
Re: Parsing how question
Posted: Tue Nov 05, 2024 12:44 am
by MochiMoppel
fredx181 wrote: Mon Nov 04, 2024 6:36 pm
Trying to get some results in parsing (with bash)
You mean solely with bash shell commands?
OK, but then let's increase fun and speed and make it also compatible with ash and even dash:
Code: Select all
#! /bin/ash
OLD='
"html_url": "https://github.com/whoamI/repo1",
"description": "null",
"fork": false,
--
"html_url": "https://github.com/whoamI/repo2",
"description": "blah",
"fork": true,
--
"html_url": "https://github.com/whoamI/repo3",
"description": null,
"fork": false,
--
"html_url": "https://github.com/whoamI/repo4",
"description": "null",
"fork": true,
--'
LF="
"
while : ; do
NEW=${OLD%$LF*description*false*}
[ "$NEW" = "$OLD" ] && break || OLD=$NEW
HTM=\"htm${NEW##*htm}$LF$HTM
done
echo "$HTM"
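For anyone puzzled by the two expansions in that loop, here is a minimal sketch on a single block. The names S, HEAD and URL are made up for illustration and are not part of the script above. It only relies on the POSIX rules that `${var%pattern}` strips the shortest matching suffix and `${var##pattern}` strips the longest matching prefix.

```shell
#!/bin/sh
# Illustrative only: S, HEAD and URL are hypothetical names.
LF="
"
S='"html_url": "https://github.com/whoamI/repo1",
"description": "null",
"fork": false,
--'

# % removes the shortest suffix matching the pattern. The match has to start
# at a newline and contain "description" followed by "false", so it starts
# at the newline right before the description line.
HEAD=${S%$LF*description*false*}

# ## removes the longest prefix matching the pattern, i.e. everything up to
# and including the last "htm", which is then put back by hand.
URL=\"htm${HEAD##*htm}

echo "$URL"
```

In the full script the same two steps repeat from the end of the data towards the start, which is why each new result is prepended to $HTM.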
This would also work with bash/ash/dash but uses sed:
Code: Select all
echo $OLD | sed 's/false,/&\n/g' | sed -rn '/false,$/ s/(.*htm)(.*\/[^,]*)(.*)/\"htm\2,/p'
Requires bash, uses tac:
Code: Select all
echo "$OLD" | tac |
while read -r LINE; do
[[ $LINE = *false* ]] && FOUND=1
[[ $FOUND$LINE = 1*htm* ]] && echo "$LINE" && FOUND=
done
Note that this writes the resulting lines in reverse order. Not an elegant solution anyway.
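If the original order matters, a plain while read loop can remember the most recent html_url line and print it once the matching fork line shows up. This is only a sketch under the assumption that every block lists html_url before fork, as in the sample data. The function name nonfork_urls is made up for this example, and it should run in bash, ash and dash alike.

```shell
#!/bin/sh
# Hypothetical helper, not from the posts above: reads the blocks on stdin and
# prints the html_url line of every block whose fork line says false.
nonfork_urls() {
    url=
    while IFS= read -r line; do
        case $line in
            *'"html_url"'*)    url=$line ;;   # remember the latest URL line
            *'"fork": false'*) if [ -n "$url" ]; then printf '%s\n' "$url"; fi ;;
            --)                url= ;;        # block separator: forget the URL
        esac
    done
}

# Usage with the first three sample blocks as a here-document:
nonfork_urls <<'EOF'
"html_url": "https://github.com/whoamI/repo1",
"description": "null",
"fork": false,
--
"html_url": "https://github.com/whoamI/repo2",
"description": "blah",
"fork": true,
--
"html_url": "https://github.com/whoamI/repo3",
"description": null,
"fork": false,
--
EOF
```

Because the input is read top to bottom, no tac is needed and the lines come out in input order.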
Re: Parsing how question
Posted: Tue Nov 05, 2024 1:07 pm
by fredx181
Many thanks, @MochiMoppel! Works very well.
I wouldn't have come up with such compact code in a hundred years!
Re: Parsing how question
Posted: Tue Nov 05, 2024 5:33 pm
by puppy_apprentice
I like grep:
Code: Select all
grep -B2 "\"fork\": false," data.txt | grep "html_url"
data.txt:
Code: Select all
"html_url": "https://github.com/whoamI/repo1",
"description": "null",
"fork": false,
--
"html_url": "https://github.com/whoamI/repo2",
"description": "blah",
"fork": true,
--
"html_url": "https://github.com/whoamI/repo3",
"description": null,
"fork": false,
--
"html_url": "https://github.com/whoamI/repo4",
"description": "null",
"fork": true,
--
Re: Parsing how question
Posted: Tue Nov 05, 2024 11:54 pm
by Burunduk
MochiMoppel wrote: Tue Nov 05, 2024 12:44 am
This would also work with bash/ash/dash but uses sed:
Code: Select all
echo $OLD | sed 's/false,/&\n/g' | sed -rn '/false,$/ s/(.*htm)(.*\/[^,]*)(.*)/\"htm\2,/p'
Just a note: this snippet works because the echo command outputs the whole thing as a single line.
For a multi-line input this sed command could be used:
Code: Select all
echo "$OLD" | sed -n '/html_url/h;/false,/{g;p}'
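The same command spelled out with comments, as a sketch of what the hold space is doing. The function name nonfork_urls_sed is made up for this example; the sed commands themselves are unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner above; reads the blocks on stdin.
nonfork_urls_sed() {
    sed -n '
        # Copy every html_url line into the hold space, replacing the old one.
        /html_url/h
        # On a line containing "false," fetch the held line back and print it.
        /false,/{
            g
            p
        }
    '
}

# Usage with two sample blocks as a here-document:
nonfork_urls_sed <<'EOF'
"html_url": "https://github.com/whoamI/repo1",
"description": "null",
"fork": false,
--
"html_url": "https://github.com/whoamI/repo2",
"description": "blah",
"fork": true,
--
EOF
```

One caveat: /false,/ matches anywhere on a line, so a description that happens to contain "false," would also trigger a print.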
However, sed is not a json parser.
As this is a "parsing how" question, maybe jq is a better answer.
For example, if you have something like this:
Code: Select all
cat repos.json
[
  {
    "html_url": "https://github.com/whoamI/repo1",
    "description": "null",
    "fork": false
  },
  {
    "html_url": "https://github.com/whoamI/repo2",
    "description": "blah",
    "fork": true
  },
  {
    "html_url": "https://github.com/whoamI/repo3",
    "description": null,
    "fork": false
  },
  {
    "html_url": "https://github.com/whoamI/repo4",
    "description": "null",
    "fork": true
  }
]
you can run
Code: Select all
jq '.[] | select(.fork==false) | .html_url' repos.json
to get
Code: Select all
"https://github.com/whoamI/repo1"
"https://github.com/whoamI/repo3"