AWK

AWK -

AWK reads its input as records and splits each record into fields. By default a record is a line and a field is a whitespace-separated word.

Example:

a quick brown fox jumped
over a lazy dog

By default, the above file has 2 records: the first record holds 5 fields and the second record holds 4 fields.

|<---- Fields 1-5 ---->|
a quick brown fox jumped   <-- Record 1
over a lazy dog            <-- Record 2
|<- Fields 1-4 ->|

This is the default behaviour of AWK. It can be changed by setting a few built-in variables.

USEFUL AWK VARIABLES:

  • FS -> Input Field Separator/delimiter (Words)

  • RS -> Input Record Separator (Lines)

  • NF -> Number of Fields in the current record

  • NR -> Number of Records read so far (it accumulates across all input files)

  • FNR -> Number of Records read so far from the file currently being processed

  • OFS -> Output Field Separator (used between the arguments of print $1,$2 etc, and when $0 is rebuilt after a field is modified)

  • ORS -> Output Record Separator (appended after everything a print statement outputs)
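
As a small sketch of how these variables interact, the snippet below splits colon-separated records with FS and joins the printed fields with OFS. The sample data and the shell variable names are made up for illustration.

```shell
# Hypothetical colon-separated sample data (not from the notes above).
input='alice:30:london
bob:25:paris'

# FS splits each record on ":", OFS joins the print arguments on output.
# NR is the record number, NF the number of fields in that record.
vars_out=$(printf '%s\n' "$input" |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $3 }')

printf '%s\n' "$vars_out"
# 1 | 3 | alice | london
# 2 | 3 | bob | paris
```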


Note:

Only fields use the sigil ($): $0 is the whole record and $1, $2 ... $NF are the individual fields. All variables, including built-in ones such as NF and NR, are accessed without it.


AWK Basic Notes:

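C-like syntax (a newline also ends a statement, so ; is not mandatory)

Dynamically typed (a value is treated as a number or a string depending on context)

It loops through the input file(s) implicitly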

Basic structure:

BEGIN {operation} # Optional

PATTERN1 {operation1}

PATTERN2 {operation2}

END {operation} # Optional
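
A minimal sketch of this structure, using seq to generate the input. The pattern and the messages are illustrative only.

```shell
# BEGIN runs once before any input is read, END once after all input.
# The middle rule runs for every record whose first field exceeds 2.
struct_out=$(seq 1 5 |
  awk 'BEGIN { print "start" }
       $1 > 2 { n++; print "match:", $1 }
       END    { print "matched", n, "records" }')

printf '%s\n' "$struct_out"
# start
# match: 3
# match: 4
# match: 5
# matched 3 records
```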


Variables:

Fields are stored in $1, $2 ... $NF

The whole line is stored in $0

The number of fields is stored in NF

All AWK variables are global, except for variables declared in a function's argument list
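
To illustrate the difference between NF and $NF, here is a short sketch run on the first sample line from the top of these notes. The variable i is an arbitrary name chosen for the example.

```shell
line='a quick brown fox jumped'

# NF is a plain variable (5 here); $NF is the field whose index is NF.
# The $ operator accepts any expression, so $(NF - 1) and $i also work.
field_out=$(printf '%s\n' "$line" |
  awk '{ i = 2; print NF; print $NF; print $(NF - 1); print $i }')

printf '%s\n' "$field_out"
# 5
# jumped
# fox
# quick
```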


Operators:

Arithmetic -> + - * / % ^ (+= -= etc)

Logical -> && || !

Comparison -> < > <= >= == !=

Ternary -> ?:

Unary -> + - ++ --

Regex -> ~ !~
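
A few of these operators in one place, as a sketch on made-up data. The names size and tag are illustrative.

```shell
ops_out=$(printf '%s\n' 'apple 3' 'banana 10' 'cherry 7' |
  awk '{
         size = ($2 >= 7) ? "big" : "small"        # ternary on a comparison
         tag  = ($1 ~ /an/) ? "has-an" : "no-an"   # regex match operator
         print $1, $2 ^ 2, $2 % 4, size, tag       # exponent and modulo
       }')

printf '%s\n' "$ops_out"
# apple 9 3 small no-an
# banana 100 2 big has-an
# cherry 49 3 big no-an
```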


Branching statement:

The branching statement follows C-like syntax

if ( a > b )
    print "A is greater than B";
else if ( b > a ) {
    print "B is greater than A";
}
else
    print "A equal to B";


Looping statement:

while (a < 10) {
    print "A = "a;
    a++;
}


for (a=0; a<10; a++) print a;
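
The branching and looping forms above can be tried without an input file by putting them in a BEGIN block. The following is only a sketch with arbitrary limits.

```shell
loop_out=$(awk 'BEGIN {
  # for loop with an if / else if / else chain inside
  for (a = 0; a < 3; a++) {
    if (a > 1)
      print a, "is greater than 1"
    else if (a < 1)
      print a, "is less than 1"
    else
      print a, "equals 1"
  }
  # a is 3 when the for loop ends; the while loop continues from there
  while (a < 5) {
    print "A = " a
    a++
  }
}')

printf '%s\n' "$loop_out"
# 0 is less than 1
# 1 equals 1
# 2 is greater than 1
# A = 3
# A = 4
```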


Data types:

- Variables need not be declared

- All arrays are associative


Associative array (write):

hash["key1"] = val1;

hash["key2"] = val2;


Associative array (read):

# Check if key exists

if ("key1" in hash) print "key exists in the hash";

# Loop over keys

for (key in hash) print hash[key];
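
A common use of associative arrays is counting. The sketch below counts words in made-up input; the order of a for-in loop is unspecified, so the result is piped through sort to make it predictable.

```shell
count_out=$(printf '%s\n' 'fox dog fox' 'dog fox cat' |
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }      # one counter per word
       END { for (word in count) print word, count[word] }' |
  sort)

printf '%s\n' "$count_out"
# cat 1
# dog 2
# fox 3
```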


Subroutine:

Definition:

function adder(a, b, c, localvar) {
    globvar = a + b + c;   # Not in the argument list, so it is global
    localvar = 10;         # Local to the function; unused here, shown only to illustrate scope
    return globvar;
}


Calling:

ret_val = adder(var1,var2,var3);
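
The scoping rule can be checked with a small sketch: a variable named in the argument list stays local, while anything else assigned inside the function is visible outside it. The names tmp and total are chosen for illustration.

```shell
func_out=$(awk '
  function adder(a, b, c,    tmp) {  # tmp is local: an extra, unused parameter
    tmp = a + b + c
    total = tmp                      # total is global: not in the parameter list
    return tmp
  }
  BEGIN {
    tmp = "untouched"                # global tmp, not affected by the call
    ret_val = adder(1, 2, 3)
    print ret_val, total, tmp
  }')

printf '%s\n' "$func_out"
# 6 6 untouched
```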


Regex and a few useful functions:


Search and replace:

sub(SEARCH_PATTERN,REPLACEMENT); # works on $0, replaces the first match only

sub(SEARCH_PATTERN,REPLACEMENT,VARIABLE); # works on VARIABLE, replaces the first match only

gsub(SEARCH_PATTERN,REPLACEMENT); # works on $0, replaces every match (global)

gsub(SEARCH_PATTERN,REPLACEMENT,VARIABLE); # works on VARIABLE, replaces every match (global)
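
Both functions modify their target in place and return the number of replacements made. A short sketch with illustrative variable names:

```shell
sub_out=$(awk 'BEGIN {
  first = "foo foo foo"
  every = "foo foo foo"
  n1 = sub(/foo/, "bar", first)    # replaces the first match only
  n2 = gsub(/foo/, "bar", every)   # replaces every match
  print n1, first
  print n2, every
}')

printf '%s\n' "$sub_out"
# 1 bar foo foo
# 3 bar bar bar
```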

Match and split:

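match(STRING,REGEX); # Returns the position where the match starts (0 if none); also sets RSTART and RLENGTH

split(STRING,ARRAY,DELIMITER); # Fills ARRAY with the pieces and returns how many there are

length(STRING); # Returns the number of characters in STRING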
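
The sketch below exercises these three functions on a made-up string. substr is used only to show the text that match located.

```shell
str_out=$(awk 'BEGIN {
  s = "path=/usr/local/bin"
  pos = match(s, /\/[a-z]+/)          # first slash followed by lowercase letters
  print pos, RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
  n = split(s, parts, "/")            # parts[1] = "path=", parts[2] = "usr", ...
  print n, parts[2], parts[n]
  print length(s)
}')

printf '%s\n' "$str_out"
# 6 6 4 /usr
# 4 usr bin
# 19
```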


File operation:

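# Without a redirection, getline reads the next record from the current input (STDIN or the files named on the command line)

# Reading from a file

filehandle = "PATH/Filename"

getline var < filehandle

close(filehandle);


# Writing to a file (getline only reads, so print is used with >)

filehandle = "PATH/Filename"

print var > filehandle

close(filehandle);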
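
A sketch of a full read/write round trip, assuming mktemp is available for a scratch file. The awk variables src and dst are hypothetical names passed in with -v.

```shell
tmpfile=$(mktemp)
printf '%s\n' 'first line' 'second line' > "$tmpfile"

io_out=$(awk -v src="$tmpfile" -v dst="$tmpfile.out" 'BEGIN {
  # getline returns 1 for each line read, 0 at end of file, -1 on error.
  while ((getline line < src) > 0)
    print toupper(line) > dst
  close(src)
  close(dst)
  # Read the freshly written file back to show the result.
  while ((getline line < dst) > 0)
    print line
  close(dst)
}')

printf '%s\n' "$io_out"
# FIRST LINE
# SECOND LINE

rm -f "$tmpfile" "$tmpfile.out"
```

Testing the return value of getline with > 0 avoids an endless loop when the file cannot be opened.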


PATTERN Matching for each line:

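Each pattern is checked against every record, in the order written, and the action of every pattern that matches is executed.

If a regex pattern is given without a variable (e.g. /regex/), it is matched against $0.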
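
A sketch showing that more than one rule can fire for the same record. The input and labels are made up.

```shell
pat_out=$(printf '%s\n' 'alpha 1' 'beta 20' 'gamma 3' |
  awk '/beta/    { print "whole-record regex:", $0 }
       $2 > 2    { print "field comparison:", $1 }
       $1 ~ /^g/ { print "field regex:", $1 }')

printf '%s\n' "$pat_out"
# whole-record regex: beta 20
# field comparison: beta
# field comparison: gamma
# field regex: gamma
```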

AWK - Few tips

AWK is a text-processing utility available on GNU/Linux and other Unix-like systems. It is an interpreted language with built-in support for regular expressions. A few useful tips are below.

Examples:

1. Sum a column of numerical data

seq 1 1 100 | awk '{count=count+$1}END{print count}'


2. If else if condition

awk '{if ($1 == "false") {array[$2] = $1} else if (array[$2] != "false") array[$2] = $1} END {for (i in array) if (array[i] == "true") print i}' inputfile


3. Loop condition (the END block of the same one-liner walks the array with for-in)

awk '{if ($1 == "false") {array[$2] = $1} else if (array[$2] != "false") array[$2] = $1} END {for (i in array) if (array[i] == "true") print i}' inputfile


4. Print every line where the value of the last field is > x (x is set with -v; 10 is only an example value)

awk -v x=10 '$NF > x' file


5. Print every line with more than x fields

awk -v x=10 'NF > x' file


6. Print the total number of lines that contain "ABC"

awk '/ABC/{n++}; END {print n+0}' file


7. Substitute (find and replace) "foo" with "bar" on each line

awk '{sub(/foo/,"bar")}; 1' # replace only 1st instance

gawk '{$0=gensub(/foo/,"bar",4)}; 1' # replace only 4th instance

awk '{gsub(/foo/,"bar")}; 1' # replace ALL instances in a line


8. Substitute "foo" with "bar" ONLY for lines which contain "baz"

awk '/baz/{gsub(/foo/, "bar")}; 1'


9. Substitute "foo" with "bar" EXCEPT for lines which contain "baz"

awk '!/baz/{gsub(/foo/, "bar")}; 1'


10. Change "scarlet" or "ruby" or "puce" to "red"

awk '{gsub(/scarlet|ruby|puce/, "red")}; 1'


11. Print the content between two search strings (the two marker lines are included)

awk '/START_STRING/ {show=1} show; /END_STRING/ {show=0}'
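
A sketch of the flag idiom on made-up input, alongside the equivalent range pattern /START_STRING/,/END_STRING/, which gives the same result here:

```shell
input='before
START_STRING
inside 1
inside 2
END_STRING
after'

# Flag idiom: show is switched on at the start marker, tested, then switched off.
flag_out=$(printf '%s\n' "$input" |
  awk '/START_STRING/ {show=1} show; /END_STRING/ {show=0}')

# Range pattern: matches from the first regex through the second, inclusive.
range_out=$(printf '%s\n' "$input" | awk '/START_STRING/,/END_STRING/')

printf '%s\n' "$flag_out"
# START_STRING
# inside 1
# inside 2
# END_STRING
```

Moving the bare show test after the END_STRING rule, or before the START_STRING rule, changes which marker lines are printed.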


12. FS (input field separator), set on the command line with -F

awk -F "<Field_separator>" '<program>' file

awk -F "/" '{print $3}' file

awk -F "/" '/search_string/ {print $2}' file
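
When a line starts with the separator, $1 is empty and the first visible component lands in $2. A sketch on two made-up paths:

```shell
paths='/top/sub1/sig1
/top/sub2/sig2'

# The leading "/" produces an empty $1, so each path has 4 fields.
fs_all=$(printf '%s\n' "$paths" | awk -F "/" '{ print NF, $2, $NF }')

# A pattern in front of the action restricts it to matching lines.
fs_one=$(printf '%s\n' "$paths" | awk -F "/" '/sub2/ { print $3 }')

printf '%s\n' "$fs_all" "$fs_one"
# 4 top sig1
# 4 top sig2
# sub2
```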

Most of the examples are picked up from here

Example:

The example below shows how to collapse a bit-blasted report (one line per bit of each vector signal) into bus ranges using AWK.


Code:

#!/usr/bin/awk -f

BEGIN {
    # Set field separator
    FS="[";

    # Initialise state variables (AWK variables are global)
    msb="";
    lsb=-1;
    vecproc=0;
}

# Process vectors
NF > 1 {
    # Print info on previous vector if it exists
    if ( vecproc == 1 ) {
        if ( sig != $1 ) {
            print sig"["msb""lsb"]";
            msb="";
            lsb=-1;
        }
    }

    # Process vector
    sig=$1;
    vec=$2;
    sub(/]/,"",vec);
    if ( lsb == -1 ) {
        # First bit of a new vector: open the first range
        msb = vec":";
        lsb = vec;
    } else if ( vec == lsb-1 ) {
        # Bit is contiguous with the current range: extend it
        lsb = vec;
    } else {
        # Gap: close the current range and open a new one
        msb = msb""lsb","vec":";
        lsb = vec;
    }
    vecproc=1;
}

# Process single bit signal
NF == 1 {
    # Print the previously processed vector, if any
    if ( vecproc == 1 ) {
        print sig"["msb""lsb"]";
        msb="";
        lsb=-1;
        vecproc=0;
    }
    print;
}

# Post run routine - print any processed vector
END {
    if ( vecproc == 1 ) {
        print sig"["msb""lsb"]";
    }
}


Sample Input:

/top/sub1/sub2/sub3/sig1

/top/sub1/sub2/sub3/sig2[8]

/top/sub1/sub2/sub3/sig2[7]

/top/sub1/sub2/sub3/sig2[4]

/top/sub1/sub2/sub3/sig2[2]

/top/sub1/sub2/sub3/sig2[1]

/top/sub1/sub2/sub3/sig2[0]

/top/sub1/sub2/sub3/sig3

/top/sub1/sub2/sub3/sig4[8]

/top/sub1/sub2/sub3/sig4[7]

/top/sub1/sub2/sub3/sig4[4]

/top/sub1/sub2/sub3/sig4[2]

/top/sub1/sub2/sub3/sig4[0]

/top/sub1/sub2/sub3/sig5[8]

/top/sub1/sub2/sub3/sig5[7]

/top/sub1/sub2/sub3/sig5[4]

/top/sub1/sub2/sub3/sig5[3]

/top/sub1/sub2/sub3/sig5[0]

/top/sub1/sub2/sub3/sig6[5]


Output:

/top/sub1/sub2/sub3/sig1

/top/sub1/sub2/sub3/sig2[8:7,4:4,2:0]

/top/sub1/sub2/sub3/sig3

/top/sub1/sub2/sub3/sig4[8:7,4:4,2:2,0:0]

/top/sub1/sub2/sub3/sig5[8:7,4:3,0:0]

/top/sub1/sub2/sub3/sig6[5:5]