AWK, variables and variable persistence

For discussions about programming, and for programming questions and advice


Moderator: Forum moderators

cobaka
Posts: 595
Joined: Thu Jul 16, 2020 6:04 am
Location: Central Coast, NSW - au
Has thanked: 100 times
Been thanked: 71 times

AWK, variables and variable persistence

Post by cobaka »

Woof woof:

Hello Puppians! The topic here is about AWK and persistence of variables; not Puppy at all.
At the moment, the code below does nothing except illustrate a problem I don't understand.
I suspect my lack of knowledge involves local variables vs global variables, or (in this case) variables within the 'BEGIN' part of an AWK command vs variables within the main command BODY.

Here is my code.


vPA=/mnt/home/bNgo/transcript/indexing/    # The name of the folder containing my work-file
vINFIL=x_sample.txt                         # The name of my work-file.  This text file contains 7 records, each with 4 fields.
IX_KEY="blank"

awk   'BEGIN { $IX_KEY="void"; print "index keyword is: ", $IX_KEY;
	          print "re-ordering fields in raw ix file" ;\
	          printf  $IX_KEY;\
	          print "\n - - - - - - - - - - - - - - - -"}  # <<-  eoBEGIN  -<<<
# main follows 	          
       { printf "\n" $IX_KEY " Rec. nr " NR " ==> " $0 " - "; $IX_KEY=$3; printf $IX_KEY;}
# eo-main
      END { print "\n  "NR, " records processed  ", "\n --> " $IX_KEY; } '  $vPA$vINFIL
Here is the result.
# . ye-tic.awk
index keyword is: void
re-ordering fields in raw ix file
void
- - - - - - - - - - - - - - - -

157 : cucumber 1 Rec. nr 1 ==> 157 : cucumber 1 - cucumber
160 : cucumber 2 Rec. nr 2 ==> 160 : cucumber 2 - cucumber
174 : tomato 3 Rec. nr 3 ==> 174 : tomato 3 - tomato
191 : tomato 4 Rec. nr 4 ==> 191 : tomato 4 - tomato
191 : tomato 5 Rec. nr 5 ==> 191 : tomato 5 - tomato
27 : lettuce 6 Rec. nr 6 ==> 27 : lettuce 6 - lettuce
-1 : end 7 Rec. nr 7 ==> -1 : end 7 - end
7 records processed
--> end
Some words of explanation. The file 'x_sample.txt' contains 4 fields per record. For every record, the items before "Rec. nr" (in the printout) are the 4 fields: a number, a colon (:), a vegetable and the record number (one to seven). The appearance of these four fields before "Rec. nr" illustrates the problem nicely, because the value of IX_KEY here MUST be null: the printout shows every field in the record. I repeat: IX_KEY (therefore) has the value $0. I expected the first word in the first line to be 'void', and after that some vegetable. Immediately above the dashed line IX_KEY has the value "void". Again, at the end of the line IX_KEY has the value from $3, the vegetable. It has that value because I assigned it in the command: $IX_KEY=$3;

What is going on here? I print the value of IX_KEY three times. At the beginning it is 'void' - as expected. Then (in the main loop) it is 'null' or interpreted as null and hence $0. Then, at the end of the line it has the string value of $3, but .... at the beginning of the following line - it is again $0. Hmmm .... I don't understand this!!

I must add: The code itself is in a script: ye-tic.awk. The script is in /root/my-applications/bin and I invoke the script as you see above: ie # . ye-tic.awk

Clearly I don't understand something about variable persistence (or perhaps I have two variables with the same name). I don't understand this. I read the manual about variables, and the answer is probably in there, but after 8 hours of de-bugging, it's time to call on the rest of the pack for help! Sorry that the presentation here does not line up in well-ordered columns. Woof!

TIA

cobaka

собака --> this is Russian --> a dog
"c" -- say "s" - as in "see" or "scent" or "sob".


Re: AWK, variables and variable persistence

Post by cobaka »

Woof to all!
After mowing the lawn and thinking a great deal I decided to try something strictly 'not done'.
So far, everything I read set out the structure of an awk command as:

awk ' BEGIN { setup actions}
pattern {action};
pattern {action};
and so on ....
END {closing actions} ' file reference
.....................................^ note the single quote here and at the beginning.

Enclosing the arguments for AWK inside single quotes allows the command parameters to be passed to (or thru) the shell to AWK. Generally no single quotes should be used 'inside' the opening and closing quote symbols. However, by enclosing the variable " ' $IX_KEY ' " inside double and single quotes, the AWK command operates in a completely different manner. I do not understand this completely yet, but I believe this is a clue to understanding.

cobaka.


step
Posts: 576
Joined: Thu Aug 13, 2020 9:55 am
Has thanked: 63 times
Been thanked: 213 times

Re: AWK, variables and variable persistence

Post by step »

Hi @cobaka, disclaimer: I didn't read the fine details of your two posts, so my reply could be inadequate; if so, my apologies. I leave it to your judgement.

There's a high-level view of your first script that needs some clarification. Your script isn't an awk script. It's a shell script that calls (runs) awk. As such, your script is subject to shell syntax rules. A relevant rule is string syntax: a shell string can be delimited by single quotes. Keep that in mind as rule sh-1, our name for it. Good. Your shell script defines some variables, to which another shell syntax rule applies: $variable_name gets the value stored in variable_name. That's rule sh-2 for us. Then your shell script runs awk and provides awk with the required awk script as a shell string (rule sh-1). Within the shell string ONLY awk syntax applies. In other words, rule sh-2 is meaningless to awk. I'm saying that to point out that getting the value of any shell variable by syntax rule sh-2, $variable_name, is meaningless to awk and will not yield the expected value. Your awk script indeed references some shell variables by rule sh-2. That won't work.
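To make that concrete, here is a minimal sketch of what awk does with `$IX_KEY` once the shell has handed over the single-quoted string untouched. In awk, `$` followed by an expression is a field reference. `IX_KEY` is an awk variable that was never set, so it counts as 0, and `$IX_KEY` means `$0`, the whole record. The input line is copied from the output in the first post; the variable `out` exists only for this illustration.

```shell
#!/bin/sh
IX_KEY="blank"   # shell variable; awk never sees it

out=$(printf '157 : cucumber 1\n' | awk '{
    # IX_KEY is an unset awk variable, so it evaluates to 0 here
    # and $IX_KEY is the same as $0 (the whole record).
    print "before: " $IX_KEY
    $IX_KEY = $3        # overwrites $0 with the third field
    print "after:  " $IX_KEY
}')
printf '%s\n' "$out"
# prints:
# before: 157 : cucumber 1
# after:  cucumber
```

This would also account for the output in the first post: every new record resets `$0`, so the start of each line shows the full record again, while the end of the line shows `$3` because `$0` was just overwritten with it.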

You pondered the problem and discovered that by playing with single quotes you can get the value from the shell into awk. Hopefully, my clarification above helps you understand that when you play with single quotes you're playing with rule sh-1 entirely within the shell scope. Awk is unaware of that. Especially when dealing with strings, you have to keep track of which scope (shell's or awk's) each quote you add or remove applies to.

Now, two templates you could use to pass shell VALUES from shell scope to awk scope. Note I wrote values and not variables. There is no syntactic way you can pass shell variable semantics (setting values) between a shell instance and a called awk instance.

Template one (I use this most of the time)

# Shell scope
sh_varname="value"

awk -v awk_varname="$sh_varname" '#awk script inside single-quoted shell string
BEGIN { # BEGIN is optional
	print awk_varname    # will print "value" (without double quotes)
}
	{ print awk_varname } # so will this
END { print awk_varname} # so will this
# ... more awk rules go here
# this ends the awk script'    input_filename
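Important: awk_varname receives its value as a string. If that string looks like a number, awk still lets you use it as a number in comparisons and arithmetic. Also note that awk interprets backslash escape sequences in the value.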
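As a rough illustration of template one applied to the original question, the sketch below passes the shell value in with `-v` and then uses the awk variable without a `$`. The three input lines are copied from the output in the first post; the names `ix_key` and `out` are made up for this example.

```shell
#!/bin/sh
IX_KEY="void"    # shell scope

out=$(printf '157 : cucumber 1\n160 : cucumber 2\n174 : tomato 3\n' |
awk -v ix_key="$IX_KEY" '
BEGIN { print "index keyword is: " ix_key }
      { print ix_key " Rec. nr " NR " ==> " $0   # value left by the previous record
        ix_key = $3 }                            # persists into the next record
END   { print NR " records processed --> " ix_key }')
printf '%s\n' "$out"
# prints:
# index keyword is: void
# void Rec. nr 1 ==> 157 : cucumber 1
# cucumber Rec. nr 2 ==> 160 : cucumber 2
# cucumber Rec. nr 3 ==> 174 : tomato 3
# 3 records processed --> tomato
```

An awk variable written as a plain name keeps its value from BEGIN through every record to END, which is the persistence the first post was looking for.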

Template two (I reserve this template for special cases) - read the last line first


# Shell scope
awk '#awk script inside single-quoted shell string
BEGIN { # BEGIN is optional
	print awk_varname    # will print AN EMPTY LINE because awk_varname has no value here
}
{
	print awk_varname  # will print "value"
}
END { print awk_varname } # so will this
# ... more awk rules go here
# this ends the awk script'  awk_varname="value"  input_filename
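Important: as with template one, awk_varname receives its value as a string (a number-like string can still be used as a number). The assignment takes effect only after the BEGIN clause has finished running, just before awk starts reading input_filename.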
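A small runnable sketch of template two, showing when the assignment takes effect. The temporary file, its two lines, and the names `v`, `tmp` and `out` are assumptions made only for this demonstration.

```shell
#!/bin/sh
tmp=$(mktemp)                      # throwaway input file for the demo
printf 'one\ntwo\n' > "$tmp"

out=$(awk '
BEGIN { print "BEGIN: [" v "]" }          # v is still empty here
      { print "main:  [" v "] " $0 }      # v was assigned before the file is read
END   { print "END:   [" v "]" }' v="value" "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
# prints:
# BEGIN: []
# main:  [value] one
# main:  [value] two
# END:   [value]
```

Because the assignment is an ordinary operand, it can also be placed between two input files to change the value from one file to the next.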

A final note: if you decide to take the route outlined in your second post, I suggest you get in the habit of double-quoting your shell variables, e.g.:


awk 'awk script'"$shell_variable"'more awk script'
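Spelled out as a sketch, with a made-up value that contains a space: the shell glues the three adjacent quoted pieces into a single argument, and awk only ever sees the finished program text. The name `out` is used only for this illustration.

```shell
#!/bin/sh
shell_variable="two words"   # hypothetical value containing a space

# The shell joins three adjacent pieces into ONE argument for awk:
#   'BEGIN { print "'   then   "$shell_variable"   then   '" }'
# so the program text awk receives is:  BEGIN { print "two words" }
out=$(awk 'BEGIN { print "'"$shell_variable"'" }')
printf '%s\n' "$out"
# prints:
# two words
```

Without the double quotes around `$shell_variable`, the shell would split the value at the space and awk would receive a broken program. Keep in mind that with this route the value becomes part of the awk source code, so a value that itself contains a double quote would end the awk string early.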