site-josuah

commit 94c7d4215918d5e13fb96e323e91e9a89f4d9723
parent 23a43df23c41da48db4836afa1a61621d530ac21
Author: Josuah Demangeon <me@josuah.net>
Date:   Fri, 12 Jun 2020 20:53:18 +0200

wiki/awk: add a function to convert OUI to plain english

Diffstat:
M wiki/awk/index.md | 494 ++++++++++++++++++++++++++++++++++++++++++++-----------------------------------
1 file changed, 273 insertions(+), 221 deletions(-)

diff --git a/wiki/awk/index.md b/wiki/awk/index.md
@@ -1,6 +1,5 @@
 AWK
 ===
-
 AWK is a surprising efficient language, for both [performance][perf] and code
 efficiency. This comes with the ubiquitous array structure, and splitting the
 input in fields by default.
@@ -25,29 +24,29 @@ but many things are. I use it for multiple projects:
 Below are multiple ways of using awk for getting the best out of it. These are
 partly by myself, partly collected from what I saw in the wild.
 
-
 CSV fields with header
 ----------------------
 Instead of trying to remember the number of the column, using the name of the
 column is much easier, and permit to have new columns inserted in the .csv file
 without breaking the script.
 
-	$ cat input.txt
-	domain_name,expiry_date,creation_date,owner,account_id
-	nowhere.com,2020-03,2019-05,me,23535
-	perdu.com,2020-04,2018-03,you,23535
-	pa.st,2020-09,2014-05,them,23535
-
-	$ awk '
-		BEGIN { FS = "," }
-		NR == 1 { for (i = 1; i <= NF; i++) F[$i] = i; next }
-		$F["domain_name"] ~ /\.com$/ {
-			print $F["expiry_date"], $F["owner"], $F["domain_name"]
-		}
-	' input.txt
-	2020-03 me nowhere.com
-	2020-04 you perdu.com
-
+```
+$ cat input.txt
+domain_name,expiry_date,creation_date,owner,account_id
+nowhere.com,2020-03,2019-05,me,23535
+perdu.com,2020-04,2018-03,you,23535
+pa.st,2020-09,2014-05,them,23535
+
+$ awk '
+	BEGIN { FS = "," }
+	NR == 1 { for (i = 1; i <= NF; i++) F[$i] = i; next }
+	$F["domain_name"] ~ /\.com$/ {
+		print $F["expiry_date"], $F["owner"], $F["domain_name"]
+	}
+' input.txt
+2020-03 me nowhere.com
+2020-04 you perdu.com
+```
 
 UCL-style configuration
 -----------------------
@@ -55,71 +54,75 @@ Parsing data that is not organised with line-column is also convenient and
 efficient with awk, convenient for selecting one kind of value out of a
 configuration file:
 
-	$ cat input.txt
-	connections {
-		conn-faraway {
-			children {
-				localnet = fe80:123d:35d3::%vio1/64
-				localnet = fe80:2e46:1d23::%vio2/64
-			}
-			children {
-				localnet = fe80:546:23e4::%vio3/64
-			}
+```
+$ cat input.txt
+connections {
+	conn-faraway {
+		children {
+			localnet = fe80:123d:35d3::%vio1/64
+			localnet = fe80:2e46:1d23::%vio2/64
 		}
-		conn-veryclose {
-			children {
-				localnet = fe80:b536:243f::%vio3/64
-				localnet = fe80:34f3:23c3::%vio3/64
-				localnet = fe80:546a:343d::%vio3/64
-			}
+		children {
+			localnet = fe80:546:23e4::%vio3/64
 		}
 	}
-
-	$ awk '
-		$2 == "{" { F[lv++] = $1 }
-		$1 == "}" { delete F[--lv] }
-		F[0] == "connections" && F[2] == "children" && $1 == "localnet" {
-			print F[1], $3
+	conn-veryclose {
+		children {
+			localnet = fe80:b536:243f::%vio3/64
+			localnet = fe80:34f3:23c3::%vio3/64
+			localnet = fe80:546a:343d::%vio3/64
 		}
-	' input.txt
-	conn-faraway fe80:123d:35d3::%vio1/64
-	conn-faraway fe80:2e46:1d23::%vio2/64
-	conn-faraway fe80:546:23e4::%vio3/64
-	conn-veryclose fe80:b536:243f::%vio3/64
-	conn-veryclose fe80:34f3:23c3::%vio3/64
-	conn-veryclose fe80:546a:343d::%vio3/64
-
+	}
+}
+```
+
+```
+$ awk '
+	$2 == "{" { F[lv++] = $1 }
+	$1 == "}" { delete F[--lv] }
+	F[0] == "connections" && F[2] == "children" && $1 == "localnet" {
+		print F[1], $3
+	}
+' input.txt
+conn-faraway fe80:123d:35d3::%vio1/64
+conn-faraway fe80:2e46:1d23::%vio2/64
+conn-faraway fe80:546:23e4::%vio3/64
+conn-veryclose fe80:b536:243f::%vio3/64
+conn-veryclose fe80:34f3:23c3::%vio3/64
+conn-veryclose fe80:546a:343d::%vio3/64
+```
 
 Key-Value splitter
 ------------------
 Parsing key-value pairs can be mapped rather directly to an awk array, for
 instance, to extract an abstract out of a basic iCal file:
 
-	$ cat input.txt
-	BEGIN:VEVENT
-	METHOD:PUBLISH
-	UID:9189@FOSDEM20@fosdem.org
-	TZID:Europe-Brussels
-	DTSTART:20200201T170000
-	DTEND:20200201T175000
-	SUMMARY:State of the Onion
-	DESCRIPTION:Building usable free software to fight surveillance and censorship.
-	CLASS:PUBLIC
-	STATUS:CONFIRMED
-	CATEGORIES:Internet
-	LOCATION:Janson
-	END:VEVENT
-	$ awk '
-		BEGIN { FS = ":" }
-		{ F[$1] = $2 }
-		$1 == "END" {
-			print F["SUMMARY"] " - " F["DESCRIPTION"]
-			print F["DTSTART"], "(" F["TZID"] ")"
-		}
-	' input.txt
-	State of the Onion - Building usable free software to fight surveillance and censorship.
-	20200201T170000 (Europe-Brussels)
-
+```
+$ cat input.txt
+BEGIN:VEVENT
+METHOD:PUBLISH
+UID:9189@FOSDEM20@fosdem.org
+TZID:Europe-Brussels
+DTSTART:20200201T170000
+DTEND:20200201T175000
+SUMMARY:State of the Onion
+DESCRIPTION:Building usable free software to fight surveillance and censorship.
+CLASS:PUBLIC
+STATUS:CONFIRMED
+CATEGORIES:Internet
+LOCATION:Janson
+END:VEVENT
+$ awk '
+	BEGIN { FS = ":" }
+	{ F[$1] = $2 }
+	$1 == "END" {
+		print F["SUMMARY"] " - " F["DESCRIPTION"]
+		print F["DTSTART"], "(" F["TZID"] ")"
+	}
+' input.txt
+State of the Onion - Building usable free software to fight surveillance and censorship.
+20200201T170000 (Europe-Brussels)
+```
 
 Edit variables passed to functions
 ----------------------------------
@@ -127,13 +130,16 @@ For languages that support references, pointers, or objects, it is possible to
 edit the variable passed to a function, so that the variable also gets edited
 in the function that called it.
 
-	void increment(int *i) { (*i)++; }
+```
+void increment(int *i) { (*i)++; }
+```
 
 Awk does not support changing integers or strings, but supports editing the
 fields of an array:
 
-	function increment_first(arr) { arr[1]++ }
-
+```
+function increment_first(arr) { arr[1]++ }
+```
 
 Local variables in functions
 ----------------------------
@@ -143,91 +149,97 @@ each local variable we need.
 
 Functions can be called with fewer arguments than they have.
 
-	$ awk '
-		function concat3(arg1, arg2, arg3,
-			local1)
-		{
-			local1 = arg1 arg2 arg3
-			return local1
-		}
-
-		BEGIN {
-			local1 = 1
-			print(concat3("a", "w", "k"))
-			print(local1)
-		}
-	'
-	awk
-	1
+```
+$ awk '
+	function concat3(arg1, arg2, arg3,
+		local1)
+	{
+		local1 = arg1 arg2 arg3
+		return local1
+	}
+
+	BEGIN {
+		local1 = 1
+		print(concat3("a", "w", "k"))
+		print(local1)
+	}
+'
+awk
+1
+```
 
 I learned this with the [jj] project.
 
 [jj]: https://github.com/aaronNGi/jj/
 
-
 A sort() function
 -----------------
 A very convenient feature lacking to awk is support for sorting members of an
 array. Is possible to implement sort() in awk (this is a quicksort):
 
-	function swap(array, a, b,
-		tmp)
-	{
-		tmp = array[a]
-		array[a] = array[b]
-		array[b] = tmp
-	}
-
-	function sort(array, beg, end)
-	{
-		if (beg >= end)  # end recursion
-			return
-
-		a = beg + 1  # 1st is the pivot, so +1
-		b = end
-		while (a < b) {
-			while (a < b && array[a] <= array[beg])  # beg: skip lesser
-				a++
-			while (a < b && array[b] > array[beg])  # end: skip greater
-				b--
-			swap(array, a, b)  # found 2 misplaced
-		}
-
-		if (array[beg] > array[a])  # put the pivot back
-			swap(array, beg, a)
-
-		sort(array, beg, a - 1)  # sort lower half
-		sort(array, a, end)  # sort higher half
+```
+function swap(array, a, b,
+	tmp)
+{
+	tmp = array[a]
+	array[a] = array[b]
+	array[b] = tmp
+}
+
+function sort(array, beg, end)
+{
+	if (beg >= end)  # end recursion
+		return
+
+	a = beg + 1  # 1st is the pivot, so +1
+	b = end
+	while (a < b) {
+		while (a < b && array[a] <= array[beg])  # beg: skip lesser
+			a++
+		while (a < b && array[b] > array[beg])  # end: skip greater
+			b--
+		swap(array, a, b)  # found 2 misplaced
 	}
 
+	if (array[beg] > array[a])  # put the pivot back
+		swap(array, beg, a)
+
+	sort(array, beg, a - 1)  # sort lower half
+	sort(array, a, end)  # sort higher half
+}
+```
+
 This sorts the array values using integers keys: `array[1]`, `array[2]`, ...
 
 It sorts from `array[beg]` to `array[end]` included, so you can choose your
 array indices starting at 0 or 1, or sort just a part of the array.
 
 Example usage: with the both function above:
 
-	{
-		LINES[NR] = $0
-	}
-
-	END {
-		sort(LINES, 1, NR)
-		for (i = 1; i <= NR; i++)
-			print(LINES[i])
-	}
+```
+{
+	LINES[NR] = $0
+}
 
-Performance is far from terrible!
+END {
+	sort(LINES, 1, NR)
+	for (i = 1; i <= NR; i++)
+		print(LINES[i])
+}
+```
 
-	$ od -An /dev/urandom | head -n 1000000 | time ./test.awk >/dev/null
-	real 0m 19.23s
-	user 0m 17.90s
-	sys 0m 0.12s
+Performance is far from terrible!
 
-	$ od -An /dev/urandom | head -n 1000000 | time sort >/dev/null
-	real 0m 4.39s
-	user 0m 3.00s
-	sys 0m 0.10s
+```
+$ od -An /dev/urandom | head -n 1000000 | time ./test.awk >/dev/null
+real 0m 19.23s
+user 0m 17.90s
+sys 0m 0.12s
+$ od -An /dev/urandom | head -n 1000000 | time sort >/dev/null
+real 0m 4.39s
+user 0m 3.00s
+sys 0m 0.10s
+```
 
 A gmtime() function
 -------------------
@@ -237,114 +249,154 @@ fields year, mon, mday, hour, min, sec (2020-04-19T15:15:58Z):
 
 [tf]: https://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html
 
-	function isleap(year)
-	{
-		return (year % 4 == 0) && (year % 100 != 0) || (year % 400 == 0)
+```
+function isleap(year)
+{
+	return (year % 4 == 0) && (year % 100 != 0) || (year % 400 == 0)
+}
+
+function mdays(mon, year)
+{
+	return (mon == 2) ? (28 + isleap(year)) : (30 + (mon + (mon > 7)) % 2)
+}
+
+function gmtime(sec, tm)
+{
+	tm["year"] = 1970
+	while (sec >= (s = 86400 * (365 + isleap(tm["year"])))) {
+		tm["year"]++
+		sec -= s
 	}
-
-	function mdays(mon, year)
-	{
-		return (mon == 2) ? (28 + isleap(year)) : (30 + (mon + (mon > 7)) % 2)
+
+	tm["mon"] = 1
+	while (sec >= (s = 86400 * mdays(tm["mon"], tm["year"]))) {
+		tm["mon"]++
+		sec -= s
 	}
-
-	function gmtime(sec, tm)
-	{
-		tm["year"] = 1970
-		while (sec >= (s = 86400 * (365 + isleap(tm["year"])))) {
-			tm["year"]++
-			sec -= s
-		}
-
-		tm["mon"] = 1
-		while (sec >= (s = 86400 * mdays(tm["mon"], tm["year"]))) {
-			tm["mon"]++
-			sec -= s
-		}
-
-		tm["mday"] = 1
-		while (sec >= (s = 86400)) {
-			tm["mday"]++
-			sec -= s
-		}
-
-		tm["hour"] = 0
-		while (sec >= 3600) {
-			tm["hour"]++
-			sec -= 3600
-		}
-
-		tm["min"] = 0
-		while (sec >= 60) {
-			tm["min"]++
-			sec -= 60
-		}
-
-		tm["sec"] = sec
+
+	tm["mday"] = 1
+	while (sec >= (s = 86400)) {
+		tm["mday"]++
+		sec -= s
+	}
+
+	tm["hour"] = 0
+	while (sec >= 3600) {
+		tm["hour"]++
+		sec -= 3600
+	}
+
+	tm["min"] = 0
+	while (sec >= 60) {
+		tm["min"]++
+		sec -= 60
 	}
 
+	tm["sec"] = sec
+}
+```
+
 The tm array will be filled with field names following the [[gmtime]] function
 as you can see above.
 
 [gmtime]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/gmtime.html
 
-
 A localtime() function
 ----------------------
 For printing functions in the user's favorite timezone, gmtime's time needs to
 be shifted. This can also be done in standard awk by calling the date(1)
 command:
 
-	function localtime(sec, tm,
-		tz, h, m)
-	{
-		if (!TZOFFSET) {
-			"date +%z" | getline tz
-			close("date +%z")
-			h = substr(tz, 2, 2)
-			m = substr(tz, 4, 2)
-			TZOFFSET = substr(date, 1, 1) (h * 3600 + m * 60)
-		}
-		return gmtime(sec + TZOFFSET, tm)
+```
+function localtime(sec, tm,
+	tz, h, m)
+{
+	if (!TZOFFSET) {
+		"date +%z" | getline tz
+		close("date +%z")
+		h = substr(tz, 2, 2)
+		m = substr(tz, 4, 2)
+		TZOFFSET = substr(date, 1, 1) (h * 3600 + m * 60)
 	}
+	return gmtime(sec + TZOFFSET, tm)
+}
+```
 
 Note that date(1) will only be called the first time localtime() is called, and
 the TZOFFSET global variable will be used for the next calls.
 
-
 A mktime() function
 -------------------
 Complementary function to gmtime is mktime for converting a `tm[]` array back
 to an integer representation. This is useful for parsing time values back to an
 unix timestamp:
 
-	function isleap(year)
-	{
-		return (year % 4 == 0) && (year % 100 != 0) || (year % 400 == 0)
-	}
-
-	function mdays(mon, year)
-	{
-		return (mon == 2) ? (28 + isleap(year)) : (30 + (mon + (mon > 7)) % 2)
-	}
-
-	function mktime(tm,
-		sec, mon, day)
-	{
-		sec = tm["sec"] + tm["min"] * 60 + tm["hour"] * 3600
-
-		day = tm["mday"] - 1
-
-		for (mon = tm["mon"] - 1; mon > 0; mon--)
-			day = day + mdays(mon, tm["year"])
-
-		# constants: x * 365 + x / 400 - x / 100 + x / 4
-		day = day + int(tm["year"] / 400) * 146097
-		day = day + int(tm["year"] % 400 / 100) * 36524
-		day = day + int(tm["year"] % 100 / 4) * 1461
-		day = day + int(tm["year"] % 4 / 1) * 365
-
-		return sec + (day - 719527) * 86400
-	}
+```
+function isleap(year)
+{
+	return (year % 4 == 0) && (year % 100 != 0) || (year % 400 == 0)
+}
+
+function mdays(mon, year)
+{
+	return (mon == 2) ? (28 + isleap(year)) : (30 + (mon + (mon > 7)) % 2)
+}
+
+function mktime(tm,
+	sec, mon, day)
+{
+	sec = tm["sec"] + tm["min"] * 60 + tm["hour"] * 3600
+
+	day = tm["mday"] - 1
+
+	for (mon = tm["mon"] - 1; mon > 0; mon--)
+		day = day + mdays(mon, tm["year"])
+
+	# constants: x * 365 + x / 400 - x / 100 + x / 4
+	day = day + int(tm["year"] / 400) * 146097
+	day = day + int(tm["year"] % 400 / 100) * 36524
+	day = day + int(tm["year"] % 100 / 4) * 1461
+	day = day + int(tm["year"] % 4 / 1) * 365
+
+	return sec + (day - 719527) * 86400
+}
+```
 
 All the following fields of `tm[]` must be defined: "year", "mon", "mday",
 "hour", "min", "sec".
+
+Convert MAC address to brand name
+---------------------------------
+[MAC addresses](https://en.wikipedia.org/wiki/MAC_address) are composed by a
+leading Organization Unique Identifier (OUI) of 3 bytes and a trailing 3-byte
+number, unique for that OUI.
+
+Each vendor has its own OUI, so each OUI maps to a vendor. With the reference
+list the [IEEE](https://ieee.org/) publishes, it is possible to convert MAC
+address OUI digits to a human-readable name:
+
+```
+function oui_table(path,
+	url)
+{
+	url = "http://standards-oui.ieee.org/oui/oui.txt"
+	if (system("test -f '" path "'") > 0)
+		if (system("curl -L -o '" path "' " url) != 0)
+			return -1
+	while (getline <path)
+		if ($2 " " $3 == "(base 16)")
+			OUI[$1] = substr($0, 22)
+	return 0
+}
+```
+
+Then a global `OUI` array does the MAC addresss to vendor name mapping:
+
+```
+BEGIN {
+	if (oui_table("/var/tmp/oui.txt") < 0)
+		exit(1)
+
+	print(OUI[toupper("84a991")])
+}
+```
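
Not part of this commit: the `OUI` array added above is indexed by six upper-case hex digits (the first field of the "(base 16)" lines), while MAC addresses usually come with separators such as `84:a9:91:12:34:56`. A minimal sketch of the conversion could look like the following, assuming the address starts with the OUI bytes. The names `mac_to_oui` and `oui_key` are hypothetical and only used for illustration.

```shell
# Hypothetical helper, not from the commit: reduce a MAC address to its OUI key.
mac_to_oui() {
	printf '%s\n' "$1" | awk '
		function oui_key(mac)
		{
			gsub(/[^0-9A-Fa-f]/, "", mac)      # drop ":", "-" or "." separators
			return toupper(substr(mac, 1, 6))  # keep the first 3 bytes, upper case
		}
		{ print oui_key($0) }
	'
}

mac_to_oui "84:a9:91:12:34:56"   # prints 84A991
```

Inside an awk script, the `oui_key()` function alone would be enough: `OUI[oui_key(mac)]` would then replace the hard-coded `OUI[toupper("84a991")]` lookup.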