awk, curly braces and semicolons - function in the cmnd line.

cobaka
Posts: 521
Joined: Thu Jul 16, 2020 6:04 am
Location: Central Coast, NSW - au
Has thanked: 87 times
Been thanked: 49 times

awk, curly braces and semicolons - function in the cmnd line.

Post by cobaka

A big 'woof' to all

I wish to understand awk. (At least to some degree of competence).

I asked DuckDuckGo whether awk was a command or a program. It appears to be both.
With that trivia out of the way, I proceed to my "Q".

First, I must explain that $FIPA and $INFIL are shell variables which, when concatenated, give the path to my data file. To reiterate: the file reference is legit.

Now consider this single-line command:

Code:

gawk '1 {print NR, $3}' "$FIPA$INFIL"
# This works. '1' evaluates to true, so the implied "if" is true,
# and for each line in the file the line number and third field are printed.

Next, consider the command:

Code:

gawk 'if(1) {print NR, $3}' "$FIPA$INFIL"
# This does not work:
gawk: cmd. line:1: if(1) {print NR, $3}
gawk: cmd. line:1: ^ syntax error

Adding curly braces solves the problem:

Code:

gawk '{if(1) {print NR, $3}}' "$FIPA$INFIL"
1
2 Holland
3 Mexico
4 France
5 Spain
6 Italy
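In the output above, line 1 shows only its line number. Presumably the first line of the data file has fewer than three fields, so $3 is empty there. A small sketch, using made-up data in place of the real file and plain awk (gawk should behave the same for this), shows how a pattern such as NF >= 3 skips such lines:

```shell
#!/bin/sh
# Made-up sample data standing in for the real data file.
data='Country list
1 NL Holland
2 MX Mexico
3 FR France'

# Every line: line number and field 3 (field 3 is empty on line 1).
printf '%s\n' "$data" | awk '{print NR, $3}'

# Only lines with at least three fields (NF = number of fields).
printf '%s\n' "$data" | awk 'NF >= 3 {print NR, $3}'
```

The second command prints 2 Holland, 3 Mexico and 4 France; line 1 is skipped.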

I read this info while I tried to understand the syntax:
"Curly braces in awk enclose the action part of the pattern {action} pair. As you don't supply any pattern, the result is FALSE and nothing is acted. Without the braces, it's a pattern which on second encounter is TRUE, so the default action (print $0) is executed."

From: https://www.unix.com/shell-programming- ... eeded.html
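The quoted explanation appears to be about a program along the lines of the one below. The original thread's program is not shown here, so this is a guess for illustration (the array name "seen" is made up). With braces, seen[$0]++ is only a statement inside an action, so nothing is printed. Without braces the same expression is a pattern: the post-increment yields the old count, which is 0 (false) the first time a line appears and nonzero (true) on the second and later encounters, so the default action prints the repeats:

```shell
#!/bin/sh
# Made-up input with one repeated line.
input='a
b
a
a'

# With braces: an action with no pattern. It runs on every line,
# but it only counts lines, so nothing is printed.
printf '%s\n' "$input" | awk '{ seen[$0]++ }'

# Without braces: the same expression is a pattern. The default
# action (print the line) runs only for repeats, so "a" prints twice.
printf '%s\n' "$input" | awk 'seen[$0]++'
```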

I'm using gawk, from uPupBB. Maybe the syntax (in awk) is different from the paragraph above?
At any rate, I fail to understand the function (or purpose) of curly braces and (perhaps even) the semicolon separator. Can anyone give me a link of elucidation so I may be wise in the ways of awk?

Собака
Last edited by cobaka on Mon Sep 21, 2020 7:46 am, edited 1 time in total.

собака --> this is Russian --> an old dog
"so-baka" (not "co", as in coast or crib).

garnet
Posts: 64
Joined: Tue Aug 04, 2020 2:21 pm
Location: Alexandria
Has thanked: 6 times
Been thanked: 11 times

Re: akw, curly braces and semicolons - function in the cmnd line.

Post by garnet

The link above is for shell scripting.

Awk is serious stuff. It is a completely different language and has different syntax. Forget what you learnt about shell scripting and look at this instead: https://www.grymoire.com/Unix/Awk.html

Or read it from the original authors of awk itself: https://doc.lagout.org/programmation/AW ... nighan.pdf (213 pages). You'll end up learning how to write a compiler using awk :lol:

By the way you may want to edit the title. It's not "akw", it's "awk".

Hope that helps ^_^

cobaka
Posts: 521
Joined: Thu Jul 16, 2020 6:04 am
Location: Central Coast, NSW - au
Has thanked: 87 times
Been thanked: 49 times

Re: awk, curly braces and semicolons - function in the cmnd line.

Post by cobaka

Hello @garnet

First, thank you for correcting my spelling. I had a bad case of 'trigger' finger. 'a' and 'k' are on different hands, and so it's easier to type a-k than a-w. But akw is now awk.

You are correct. awk is difficult to learn because the syntax was written by people who understood (intuitively) the internal structure of command interpreters. They took short-cuts and assumed everyone knew (and thought) as they did. More than that, many people who post helpful notes on the 'net' have a working but incomplete knowledge of the command. Here is a comment that is quite true but incomplete:
"Curly braces in awk enclose the action part of the pattern {action} pair. As you don't supply any pattern, the result is FALSE and nothing is acted. Without the braces, it's a pattern which on second encounter is TRUE, so the default action (print $0) is executed."
The above was helpful, but not helpful enough to let me write a complete program in awk. I'm on the way, but not over the finish line yet; I think I might be there in a day or so. I have learned that curly braces are needed to delimit the actions of the BEGIN, main, and END parts of an awk program. Starting with a properly constructed 'shell' (I mean the basic components of an awk program, with the curly braces correctly positioned), the basic 'structure' works.
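A minimal skeleton of that BEGIN / main / END structure might look like this (made-up input, plain awk; the counter "n" is just an illustrative name). Each part has its action in curly braces, and statements sharing one pair of braces on a single line are separated by semicolons:

```shell
#!/bin/sh
printf '%s\n' 'x y Holland' 'x y Mexico' 'x y France' |
awk '
BEGIN { print "start"; n = 0 }   # runs once, before any input is read
      { n++; print NR, $3 }      # main rule: no pattern, so runs on every line
END   { print "lines:", n }      # runs once, after the last line
'
```

This prints "start", then 1 Holland, 2 Mexico, 3 France, and finally "lines: 3".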

I got there by starting with a blank script file and progressively adding to the command and (at the same time) reading from the net. Then I added more (perhaps BEGIN/END or 'if ... else') and so on. I retained each working example by adding the comment character at the beginning of the line. I also copy and re-use the working model so my sample program becomes larger with more working examples. I can always retrieve the working example (by un-commenting each line) when the following stage does not work.
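The same step-by-step approach also works with the awk program kept in a file of its own and run with awk -f. In such a file, # starts a comment (handy for keeping an earlier working version), and a newline ends a statement just as a semicolon does. In this sketch a temporary file stands in for the script file, and the data is made up:

```shell
#!/bin/sh
# Write a small awk program to a temporary file.
prog=$(mktemp)
cat > "$prog" <<'EOF'
# Earlier working version, kept but disabled:
# { print NR, $3 }

# Current version: newlines separate the statements, so no semicolons.
{
    count++
    print NR, $3
}
END { print "lines:", count }
EOF

printf '%s\n' 'x y Holland' 'x y Mexico' | awk -f "$prog"
rm -f "$prog"
```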

I'm making progress.
Got the document you mentioned.
Will read it soon.
Will read about writing a compiler with great interest.
Wrote one of my own design in 68HC11 assembly language.
Took me 6 weeks to complete and another 3 to test.

Tnx and best wishes from Australia
собака


Keef
Posts: 250
Joined: Tue Dec 03, 2019 8:05 pm
Has thanked: 3 times
Been thanked: 67 times

Re: awk, curly braces and semicolons - function in the cmnd line.

Post by Keef

You could also try here:
https://learnxinyminutes.com/docs/awk/
Got some more useful links at the bottom too.
step
Posts: 516
Joined: Thu Aug 13, 2020 9:55 am
Has thanked: 50 times
Been thanked: 184 times

Re: awk, curly braces and semicolons - function in the cmnd line.

Post by step

@cobaka since you are using gawk, I would recommend reading the comprehensive gawk user's manual. Be aware that there are many awk implementations (gawk is one) and they differ in substance and detail. I like gawk the best, version 4.2.0 or higher. To understand why you got that error message, keep in mind that a (g)awk script (often called a "program") consists of pairs. Each pair is a pattern followed by a command (the manual calls it an "action"). Only the command goes inside curly braces. A pattern can be omitted. So can a command. So all of the following are models for valid scripts:

Code:

pattern { commands }
pattern
            { commands }
When you omit the command you imply that the command is "{print}".
When you omit the pattern you imply that its corresponding command is to act on all input lines.
Otherwise {commands} applies to all and only the lines that match pattern.
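The three forms can be tried on a few made-up lines; plain awk is used here, and gawk should give the same results:

```shell
#!/bin/sh
# Made-up input: a word and a number on each line.
input='apple 1
banana 2
cherry 3'

# pattern { commands }: the command runs only on lines matching the pattern.
printf '%s\n' "$input" | awk '/an/ { print $2 }'   # prints 2

# pattern alone: the implied command is { print }.
printf '%s\n' "$input" | awk '/an/'                # prints banana 2

# { commands } alone: the command runs on every line.
printf '%s\n' "$input" | awk '{ print $2 }'        # prints 1, 2 and 3
```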

You wrote this awk script (simplified):

Code:

if(1) { commands }
In the user's manual you will read that "if(1)" can't be a pattern. That's why you got the error message. Your original line didn't follow the pattern+command model.

You fixed it - if I remember correctly - by moving "if(1)" inside the curly braces, where "if(1)" belongs. At this point your script followed the pattern+command model, specifically the one that omits the pattern. Recall that omitting the pattern means applying the command to all input lines, which is what you wanted to do in the very first place by writing "if(1)" where a pattern is expected. I hope that by now you can understand why the fixed script behaves the way you originally expected.
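Side by side, on made-up lines: the command that fails, the fix with "if" inside the braces, and the equivalent form with 1 as the pattern:

```shell
#!/bin/sh
# Made-up input with a country in field 3.
input='x y Holland
x y Mexico'

# Syntax error: "if" is a statement, so it cannot stand where a pattern goes.
# printf '%s\n' "$input" | awk 'if (1) { print NR, $3 }'

# Fix: the if statement lives inside the action, which has no pattern.
printf '%s\n' "$input" | awk '{ if (1) print NR, $3 }'

# Same result with 1 used as the pattern instead.
printf '%s\n' "$input" | awk '1 { print NR, $3 }'
```

Both working commands print 1 Holland and 2 Mexico.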

Let's review the first script you presented, the one that worked as is. You had "1" instead of "if(1)". 1 is a valid pattern*: the kind of pattern that always matches, meaning "is true". Other patterns that always match include /.*/, 43 (or any other nonzero number), "string" (or any other non-empty string), and many others. One is good enough. So that script meant "for each input line, apply the command inside the curly braces". That's it.
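A quick check that each of those always-true patterns, given with no command, prints every line of some made-up input:

```shell
#!/bin/sh
# Made-up two-line input.
input='one
two'

# Each pattern below is true for every line, so each command prints both lines.
printf '%s\n' "$input" | awk '1'
printf '%s\n' "$input" | awk '43'
printf '%s\n' "$input" | awk '"string"'
printf '%s\n' "$input" | awk '/.*/'
```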

* The reason 1 is a valid pattern is that by "pattern" awk means a "pattern expression": essentially any expression, typically built from comparisons, regular expression matches, and logical operators. A line matches when the expression evaluates to a nonzero number or a non-empty string. The gawk user's manual can explain it better than I can. Good reading!
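A few pattern expressions of that kind, on made-up data: a comparison, a regular-expression match against one field, and two conditions joined with the logical operator &&. With no command, each prints the lines it matches:

```shell
#!/bin/sh
# Made-up data: a header line, then id, code and country.
input='Country
1 nl Holland
2 mx Mexico
3 es Spain'

printf '%s\n' "$input" | awk 'NR > 1'                 # comparison: skip line 1
printf '%s\n' "$input" | awk '$3 ~ /^S/'              # regex match on field 3
printf '%s\n' "$input" | awk 'NR > 1 && $1 % 2 == 1'  # logical AND: odd ids only
```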
cobaka
Posts: 521
Joined: Thu Jul 16, 2020 6:04 am
Location: Central Coast, NSW - au
Has thanked: 87 times
Been thanked: 49 times

Re: awk, curly braces and semicolons - function in the cmnd line.

Post by cobaka

Hello @step (and others)

Wow! Thank you for your concise and informative reply!
I understand the points you make generally, but need to take additional time to understand completely.
I note your point that awk exists as several versions. I will continue to use gawk in preference to awk.
I note that both awk and gawk are included with the uPupBB package. I can find where these commands are held ("which awk", etc.) and the version ("gawk --version"). I have gawk 4.1.4. I note this is earlier than gawk v4.2, the version you recommend.
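For anyone wanting to repeat that check, a sketch like the following should work in any POSIX shell. It assumes only that gawk, if installed, accepts --version (awk implementations differ here, so plain awk is not asked for its version):

```shell
#!/bin/sh
# Where each command lives, or "not found".
for cmd in awk gawk; do
    printf '%s: %s\n' "$cmd" "$(command -v "$cmd" || echo 'not found')"
done

# gawk reports its version on the first line of --version output.
if command -v gawk >/dev/null 2>&1; then
    gawk --version | head -n 1
fi
```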

I find the observation about omitted patterns, and why if(1) fails, very useful. I used the conditional test 'if(1)' to check my understanding of 'what's going on here' with patterns.
I thought (since gawk would evaluate an expression) that I might use if(1) as a 'pattern' to return 'true' or if(0) to return 'false'. That would clarify my understanding about what's going on 'under the hood'.
(At the moment I think a pattern returns a logical 'true' or 'false' before an 'action' and, by that means, controls the action.)
I find the omission of an explicit statement of test confusing at this stage. I'm sure (when I understand completely) that will vanish. The thing about bash (and all the other parts of unix/linux) is this: the original coders knew their potatoes. The trick is to understand what they did with the potatoes that they knew.
So - I was surprised when if(1) did not work.
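That reading of patterns is right: the pattern is evaluated once for every input line, and the action runs only on lines where it comes out true (nonzero or non-empty). A short check on made-up input, where 0 and "" are false and NR % 2 alternates between true and false:

```shell
#!/bin/sh
# Made-up three-line input.
input='a
b
c'

printf '%s\n' "$input" | awk '0'                  # never true: prints nothing
printf '%s\n' "$input" | awk '""'                 # empty string is false: prints nothing
printf '%s\n' "$input" | awk '{ if (0) print }'   # same test as an if statement
printf '%s\n' "$input" | awk 'NR % 2'             # true on lines 1 and 3: prints a, c
```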

Now I'm on a different topic: Barry K identified a key requirement of a PC based OS. That is the separation of the OS code (entirely) from stored config data. This gives the most maintainable OS possible - something not available with Windows. The other facility Barry's code gives is speed and operability. (I just made that word up.) Keeping older hardware running (effectively) is a social good. Our society is in desperate need of social goods. I'll stop before this becomes a rant.

Thanks (again) for the concise answer you gave to the exact question I asked.

собака


garnet
Posts: 64
Joined: Tue Aug 04, 2020 2:21 pm
Location: Alexandria
Has thanked: 6 times
Been thanked: 11 times

Re: awk, curly braces and semicolons - function in the cmnd line.

Post by garnet

cobaka wrote: "Wrote one of my own design in 68HC11 assembly language."
Wow! You wrote a compiler on your own? Using assembly language? w(°o°)w
That's impressive!
cobaka wrote: "Tnx and best wishes from Australia"
Good luck ^_^


cobaka
Posts: 521
Joined: Thu Jul 16, 2020 6:04 am
Location: Central Coast, NSW - au
Has thanked: 87 times
Been thanked: 49 times

Re: awk, curly braces and semicolons - function in the cmnd line.

Post by cobaka

Hello @garnet
garnet wrote: "Wow! You wrote a compiler on your own? Using assembly language? w(°o°)w That's impressive!"
Thank you BUT the language was (and is) considered to be basic. At that time Charles Moore's stack-oriented FORTH had attracted a good deal of interest. Two fellows in the USA published a design for a "knock-off" of FORTH called CONVERS. Many similarities to FORTH, but also differences in the internal construction. CONVERS ran faster than FORTH and that was important when the CPU clock ran at 2MHz. (Yes, two MHz. Technology was different in 1980.)

Here is a link to the original publication put out by Tilden and Denton. Their original coding was for a PDP-11.
Link: https://duckduckgo.com/?q=Convers+progr ... =h_&ia=web
The only restriction placed on their work was that it continue to be called CONVERS, and credit given.

I took the code for CONVERS, (see above), modified it somewhat, linked it to an operating system (PTDOS by Processor Technology) and assembled it for the 8085 uP. It was used in industrial control systems for a decade or two. At that time (1979/80) programmable logic controllers were slow and expensive.

While CONVERS was/is supposedly simple, I was surprised by the bug-free nature of programs written in CONVERS. The fellows who worked for me (writing in CONVERS) delivered bug-free programs. Their programs ran reliably for years. Never missed a beat. A major source of trouble was failure to replace the back-up battery in the solid state disk system. Then, when the battery went flat, production stopped. (Bad, bad, bad, and poor maintenance practice.) We tried to figure out why programs written in CONVERS presented fewer bugs than other mid-level languages, like C or Pascal (and now AWK). Never got a satisfying answer.

Later, I added the idea of pre-emptive task interruption and priority-driven task switching. I re-wrote the code for the (very elegant) 68HC11 uP. I called this 'The Schultz CONVERS Engine' because I discussed the design with Lyndon Schultz. He was an excellent 'sharpening stone' to keep my ideas well-ordered. This version of CONVERS allocated the resources of the uP to 7 tasks. Each task was assigned a priority and (on arrival of an interrupt - hardware) the CPU would switch (or not switch) to the new task, according to a priority interrupt table. The programmer allocated the priority. In all, the task stack was seven levels deep, allowing for 7 priorities or levels of interruption. Each task ran in a separate environment; the main shared resources were the uP hardware (timers, IO and so on). If anyone "out there" wants to run 68HC11 machine control I'll give them the source code. (Un-likely, but who knows and IF 'who' knows, THEN why not ask him?)

Writing the code for the 68HC11 turned out to be more difficult than for the 8085. The debugging software available for the 8085 was comprehensive and well written. Not so much for the 68HC11. Also, writing real-time, interrupt-driven, task-switching software is quite demanding. But all that's a long time ago and (I suspect) my brain was more capable then than now.

Turning to the topic of awk, braces and language syntax (awk syntax) your pointer to the (213 page) manual has been quite helpful. Also, Yale (and others) publish very concise (and accurate) summaries of the language and I focused on using just one of these. A concise statement of syntax is a good place to start when using a language/command. Awk is really starting to work for me! Thanks for your help.

All the best fr. Australia

cobaka


6502coder
Posts: 89
Joined: Mon Jul 13, 2020 6:21 pm
Location: Western US
Has thanked: 3 times
Been thanked: 20 times

Re: awk, curly braces and semicolons - function in the cmnd line.

Post by 6502coder

Welcome to the world of AWK!

I once wrote a primitive FORTH interpreter in AWK. I think I called it "THIRD", because it wasn't quite FORTH. (That's not as bad a pun as it may seem, because Charles Moore in fact wanted to name his language FOURTH (for "fourth-generation language"), but the system he was working on at the time only allowed 5 letters for identifiers.)

I had a very brief look at the CONVERS document. I want to point out that FORTH was a language, not an implementation spec. That is, while most FORTHs were implemented as threaded interpreters, nothing in the language spec itself REQUIRED this. Even among the threaded interpreter implementations, there were variations: most were "indirect threaded," some were "direct threaded," others were "subroutine threaded", etc. So to claim that CONVERS was faster than FORTH raises the question of how the FORTH was implemented.

Regarding the "bug-free" nature of CONVERS, I would assume that, as in FORTH, most routines were very short -- Charles Moore found the idea of a program larger than a few dozen Kbytes to be totally absurd -- and thus there was less to go wrong in the first place. Also, as Jerry Pournelle once said in his Byte Magazine column (although he was actually quoting a friend of his), FORTH is basically a macro assembler language that uses the programmer as a precompiler. The two points being that a) since the programmer has to keep track of what's on the stack at all times, the language demands a mental focus that in and of itself probably tends to result in better quality code; and b) the resulting source code can be very compact, but that's because the programmer had to do more mental work writing it in the first place.

Pardon me for wandering off topic...but hey, you started it! :) ;)