AWK
AWK -
AWK reads the input file as records and fields (by default, lines and words).
Example:
a quick brown fox jumped
over a lazy dog
By default, the above file has 2 records, with the first record holding 5 fields and the second record holding 4 fields.
|<---- Fields 1-5 ---->|
a quick brown fox jumped <-- Record 1
over a lazy dog <-- Record 2
|<- Fields 1-4 ->|
This is the default behaviour of AWK. It can be changed by setting a few built-in variables.
USEFUL AWK VARIABLES:
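The default split described above can be checked with a small sketch. It feeds the two example lines to awk and prints the record number, the field count and the last field of each record. The shell variable name `out` is only an illustration.

```shell
# Feed the two example lines to awk; print record number, field count and last field.
out=$(printf 'a quick brown fox jumped\nover a lazy dog\n' |
  awk '{ print "Record " NR ": " NF " fields, last field = " $NF }')
echo "$out"
```

This should report 5 fields for record 1 (last field "jumped") and 4 fields for record 2 (last field "dog").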
FS -> Input Field Separator/delimiter (Words)
RS -> Input Record Separator (Lines)
NF -> Number of Fields in the current record
NR -> Number of Records read so far (accumulated across all input files)
FNR -> Number of Records read so far in the current file
OFS -> Output Field Separator (placed between the comma-separated arguments of print, e.g. print $1,$2)
ORS -> Output Record Separator (while printing lines, this is used to separate them)
Note:
Only fields are accessed with the sigil ($), i.e. $0, $1 etc. All other variables, built-in (NF, NR, ...) or user-defined, are accessed without it.
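As a rough illustration of how the separators work together, the sketch below sets FS and OFS in a BEGIN block. The input data and the variable name `out` are made up for the example.

```shell
# FS splits each input line on ":"; OFS joins the comma-separated print arguments with "-".
out=$(printf 'a:b:c\nd:e:f\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $3 }')
echo "$out"
```

Each output line holds the record number, the field count, and the first and third fields, joined by "-" (for example `1-3-a-c`). Note that NR and NF are written without `$`, while the fields use it.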
AWK Basic Notes:
C-like syntax (; not mandatory at the end of a line)
Dynamically typed (a value is treated as a number or a string depending on context)
It loops through the file/s implicitly
Basic structure:
BEGIN {operation} # Optional
PATTERN1 {operation1}
PATTERN2 {operation2}
END {operation} # Optional
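The structure above can be sketched with a tiny program that has one BEGIN block, one pattern and one END block. The fruit data and the threshold of 4 are assumptions chosen only for the demonstration.

```shell
out=$(printf 'apple 3\nbanana 5\ncherry 7\n' |
  awk 'BEGIN  { print "start" }             # runs once, before any input is read
       $2 > 4 { total += $2; print $1 }     # runs for every record whose 2nd field is > 4
       END    { print "total=" total }      # runs once, after the last record
      ')
echo "$out"
```

The output should be `start`, then `banana` and `cherry`, then `total=12`.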
Variables:
Fields are stored in $1, $2 ... $NF
The whole line is stored in $0
The number of fields is stored in NF
All AWK variables are global, except for variables declared in a function's argument list
Operators:
Arithmetic -> + - * / % ^ (+= -= etc)
Logical -> && || !
Comparison -> < > <= >= == !=
Ternary -> ?:
Unary -> ++ -- (unary + and - are supported as well)
Regex -> ~ !~
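A short sketch exercising a few of these operators on made-up input: the modulo and ternary operators, exponentiation, unary minus and the regex match operator. The variable names `parity` and `hit` are illustrative only.

```shell
out=$(printf '7 alpha\n10 beta\n' |
  awk '{
    parity = ($1 % 2 == 0) ? "even" : "odd"   # ternary on a comparison
    hit    = ($2 ~ /^al/)  ? "yes"  : "no"    # regex match operator
    print $1, parity, $1 ^ 2, -$1, hit        # ^ is exponentiation, - is unary minus
  }')
echo "$out"
```

For the first line this should print `7 odd 49 -7 yes`, and for the second `10 even 100 -10 no`.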
Branching statement:
Branching statements follow C-like syntax
if ( a > b )
print "A is greater than B";
else if ( b > a ) {
print "B is greater than A";
}
else
print "A equal to B";
Looping statement:
while (a < 10) {
print "A = "a;
a++;
}
for (a=0; a<10; a++) print a;
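As one possible use of a for loop on real input, the sketch below walks the fields of a line from last to first and rebuilds the line in reverse order. The names `line` and `sep` are assumptions made for the example.

```shell
out=$(echo 'a quick brown fox' |
  awk '{
    line = ""
    for (i = NF; i >= 1; i--) {          # walk the fields from last to first
        sep  = (i > 1) ? " " : ""        # no trailing space after the final word
        line = line $i sep
    }
    print line
  }')
echo "$out"
```

This should print `fox brown quick a`.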
Data types:
- Variables need not be declared
- All arrays are associative
Associative array (write):
hash["key1"] = val1;
hash["key2"] = val2;
Associative array (read):
# Check if key exists
if ("key1" in hash) print "key exists in the hash";
# Loop over keys
for (key in hash) print hash[key];
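A common use of associative arrays is counting occurrences. The following is a minimal sketch on invented input. Since the order of `for (key in array)` is not specified, it only counts the keys instead of printing them.

```shell
out=$(printf 'red\nblue\nred\ngreen\nred\n' |
  awk '{ count[$1]++ }                    # one counter per distinct first field
       END {
         if ("red" in count)     print "red", count["red"]
         if (!("pink" in count)) print "pink missing"
         n = 0
         for (k in count) n++             # iteration order is unspecified
         print "distinct", n
       }')
echo "$out"
```

The expected lines are `red 3`, `pink missing` and `distinct 3`.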
Subroutine:
Definition:
function adder (a,b,c,localvar) {
globvar = a + b + c;
localvar = 10; # localvar is in the argument list, so it is local to the function; this assignment has no effect outside it.
return globvar;
}
Calling:
ret_val = adder(var1,var2,var3);
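The scoping rule can be checked with a small sketch. The function `square_sum` and the variables `tmp` and `calls` are hypothetical names. Extra parameters after the real argument act as local variables, while anything else is global.

```shell
out=$(awk '
  # tmp and i are listed after the real parameter n, so they are local to the function
  function square_sum(n,    tmp, i) {
    for (i = 1; i <= n; i++) tmp += i * i
    calls++                      # not in the parameter list, so this is a global
    return tmp
  }
  BEGIN {
    tmp = "untouched"            # global tmp, not affected by the local one
    r = square_sum(3)
    print r, tmp, calls
  }')
echo "$out"
```

This should print `14 untouched 1`: the global `tmp` keeps its value, while the global `calls` is updated by the function.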
Regex and a few useful functions:
Search and replace:
sub(SEARCH_PATTERN,REPLACEMENT); # works on $0, replaces the first match only
sub(SEARCH_PATTERN,REPLACEMENT,VARIABLE); # works on VARIABLE
gsub(SEARCH_PATTERN,REPLACEMENT); # works on $0, replaces all matches (global modifier)
gsub(SEARCH_PATTERN,REPLACEMENT,VARIABLE); # works on VARIABLE, global modifier
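To illustrate the difference between the two functions, here is a small sketch on a made-up line. Both functions return the number of substitutions made, which the sketch stores in the illustrative variables `n1` and `n2`.

```shell
out=$(echo 'foo bar foo' |
  awk '{
    line = $0
    n1 = sub(/foo/, "X", line)     # first match only, applied to the variable line
    n2 = gsub(/foo/, "Y")          # every match, applied to $0
    print line
    print $0
    print n1, n2
  }')
echo "$out"
```

The three output lines should be `X bar foo`, `Y bar Y` and `1 2`.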
Match and split:
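match(STRING,SEARCH_PATTERN); # Returns the position where the match starts (0 if none); also sets RSTART and RLENGTH
split(VARIABLE,ARRAY,DELIMITER); # Fills ARRAY[1..n] with the pieces and returns n
length(string); # Returns the number of characters in string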
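A quick sketch of these three functions on an invented string. The names `s`, `pos`, `n` and `parts` are illustrative.

```shell
out=$(awk 'BEGIN {
    s   = "key=value=more"
    pos = match(s, /=/)            # 1-based index of the first match, 0 if none
    n   = split(s, parts, "=")     # fills parts[1..n] and returns n
    print pos, RSTART, RLENGTH
    print n, parts[1], parts[3]
    print length(s)
  }')
echo "$out"
```

This should print `4 4 1`, then `3 key more`, then `14`.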
File operation:
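# Read: plain getline takes the next record from the current input (STDIN when no file is given).
# "getline var < file" reads the next line of that file into var.
filehandle = "PATH/Filename"
getline var < filehandle
close(filehandle);
# Write: getline only reads, so use print with a redirection.
filehandle = "PATH/Filename"
print var > filehandle
close(filehandle);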
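A minimal sketch of reading one file with getline and writing another with a print redirection. The temporary directory, the file names `in.txt` and `out.txt`, and the variables `src` and `dst` are all assumptions made for the demonstration.

```shell
tmpdir=$(mktemp -d)
printf 'first\nsecond\n' > "$tmpdir/in.txt"

awk -v src="$tmpdir/in.txt" -v dst="$tmpdir/out.txt" 'BEGIN {
    # getline returns 1 when a line was read, 0 at end of file, -1 on error
    while ((getline line < src) > 0)
        print toupper(line) > dst    # the first print opens (and truncates) dst
    close(src)
    close(dst)
}'

out=$(cat "$tmpdir/out.txt")
echo "$out"
rm -rf "$tmpdir"
```

The output file should contain `FIRST` and `SECOND`. Testing the return value with `> 0` avoids an endless loop if the file cannot be opened.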
PATTERN Matching for each line:
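For every input line, each pattern is checked in order, and the operation of every matching pattern is executed.
If no variable is provided (a bare /regex/ pattern), $0 is used; to test another variable, write var ~ /regex/.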
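The sketch below, on invented log lines, shows several patterns being tested against every line: a bare regex matched against $0, a regex matched against a specific field, and a condition on NR. The counter name `e` is illustrative.

```shell
out=$(printf 'error: disk\ninfo: ok\nerror: net\n' |
  awk '/error/           { e++ }                                 # bare regex: tested against $0
       $2 ~ /^(ok|net)$/ { print "second field matched: " $0 }   # regex against field 2
       NR == 1           { print "first line: " $0 }             # plain condition
       END               { print "errors=" e }
      ')
echo "$out"
```

The third line matches both the first and the second pattern, so both operations run for it. The final line of output should be `errors=2`.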
AWK - Few tips
AWK is a text-processing utility available on GNU/Linux. It is an interpreted language and supports regular expressions. A few useful tips are below.
Examples:
1. Sum a column of numerical data
seq 1 1 100 | awk '{count=count+$1}END{print count}'
2. If else if condition
awk '{if ($1 == "false") {array[$2] = $1} else if (array[$2] != "false") array[$2] = $1} END {for (i in array) if (array[i] == "true") print i}' inputfile
3. Loop condition (same program as above; the for loop is in the END block)
awk '{if ($1 == "false") {array[$2] = $1} else if (array[$2] != "false") array[$2] = $1} END {for (i in array) if (array[i] == "true") print i}' inputfile
4. Print every line where the value of the last field is > x
awk -v x=VALUE '$NF > x' # VALUE is the numeric threshold
5. Print every line with more than x fields
awk -v x=VALUE 'NF > x' # VALUE is the field count to exceed
6. Print the total number of lines that contain "ABC"
awk '/ABC/{n++}; END {print n+0}' file
7. Substitute (find and replace) "foo" with "bar" on each line
awk '{sub(/foo/,"bar")}; 1' # replace only 1st instance
gawk '{$0=gensub(/foo/,"bar",4)}; 1' # replace only 4th instance
awk '{gsub(/foo/,"bar")}; 1' # replace ALL instances in a line
8. Substitute "foo" with "bar" ONLY for lines which contain "baz"
awk '/baz/{gsub(/foo/, "bar")}; 1'
9. Substitute "foo" with "bar" EXCEPT for lines which contain "baz"
awk '!/baz/{gsub(/foo/, "bar")}; 1'
10. Change "scarlet" or "ruby" or "puce" to "red"
awk '{gsub(/scarlet|ruby|puce/, "red")}; 1'
11. Print the content between two search strings (the matching lines themselves are included)
awk '/START_STRING/ {show=1} show; /END_STRING/ {show=0}'
12. FS (Input Field Separator), set from the command line with -F
awk -F "<Field_separator>" '<expression>' file
awk -F "/" '{print $3}' file
awk -F "/" '/search_string/ {print $2}' file
Most of the examples are picked up from here
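Two of the tips above, the START/END flag idiom and the -F option, can be tried on small made-up inputs as sketched below. The variable names `between` and `third` are illustrative.

```shell
# Tip 11: the flag is set on the START line, lines are printed while it is set,
# and it is cleared only after the END line has been printed.
between=$(printf 'a\nSTART\nb\nc\nEND\nd\n' |
  awk '/START/ {show=1} show; /END/ {show=0}')
echo "$between"

# Tip 12: with "/" as separator, a leading "/" makes $1 empty, so "top" is $2.
third=$(echo '/top/sub1/sub2' | awk -F '/' '{ print $3 }')
echo "$third"
```

The first command should print `START`, `b`, `c` and `END`. The second should print `sub1`.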
Example:
The example below shows how to handle a bit-blasted report using AWK: consecutive bits of the same signal are merged into ranges.
Code:
#!/usr/bin/awk -f
BEGIN {
    # Set field separator
    FS="[";
    # Initialise state variables (all AWK variables are global)
    msb="";
    lsb=-1;
    vecproc=0;
}
# Process vectors
NF > 1 {
    # Print the previous vector, if any, when the signal name changes
    if ( vecproc == 1 ){
        if ( sig != $1 ) {
            print sig"["msb""lsb"]";
            msb="";
            lsb=-1;
        }
    }
    # Process vector
    sig=$1;
    vec=$2;
    sub(/]/,"",vec);
    if ( lsb == -1 ) {
        # First bit of a new vector: open a range
        msb = vec":";
        lsb = vec;
    } else if ( vec == lsb-1 ) {
        # Contiguous bit: extend the current range downwards
        lsb = vec;
    } else {
        # Gap: close the current range and open a new one
        msb = msb""lsb","vec":";
        lsb = vec;
    }
    vecproc=1;
}
# Process single bit signal
NF == 1 {
    # Print the previously processed vector, if any
    if ( vecproc == 1 ) {
        print sig"["msb""lsb"]";
        msb="";
        lsb=-1;
        vecproc=0
    }
    print;
}
# Post run routine - print any processed vector
END {
    if ( vecproc == 1 ) {
        print sig"["msb""lsb"]";
    }
}
Sample Input:
/top/sub1/sub2/sub3/sig1
/top/sub1/sub2/sub3/sig2[8]
/top/sub1/sub2/sub3/sig2[7]
/top/sub1/sub2/sub3/sig2[4]
/top/sub1/sub2/sub3/sig2[2]
/top/sub1/sub2/sub3/sig2[1]
/top/sub1/sub2/sub3/sig2[0]
/top/sub1/sub2/sub3/sig3
/top/sub1/sub2/sub3/sig4[8]
/top/sub1/sub2/sub3/sig4[7]
/top/sub1/sub2/sub3/sig4[4]
/top/sub1/sub2/sub3/sig4[2]
/top/sub1/sub2/sub3/sig4[0]
/top/sub1/sub2/sub3/sig5[8]
/top/sub1/sub2/sub3/sig5[7]
/top/sub1/sub2/sub3/sig5[4]
/top/sub1/sub2/sub3/sig5[3]
/top/sub1/sub2/sub3/sig5[0]
/top/sub1/sub2/sub3/sig6[5]
Output:
/top/sub1/sub2/sub3/sig1
/top/sub1/sub2/sub3/sig2[8:7,4:4,2:0]
/top/sub1/sub2/sub3/sig3
/top/sub1/sub2/sub3/sig4[8:7,4:4,2:2,0:0]
/top/sub1/sub2/sub3/sig5[8:7,4:3,0:0]
/top/sub1/sub2/sub3/sig6[5:5]