site-josuah

/usr/josuah

commit 133d7f3d19aea51075621464e766ca62d3a0a11e
parent e2c4566dcd6d549190a9bba6548dca10c1011382
Author: Josuah Demangeon <me@josuah.net>
Date:   Fri, 17 Apr 2020 16:09:18 +0200

add a page about awk

Diffstat:
M head.html         |   5 ++++-
M index.md          |   6 +++---
D notwiki.z0.is     |  13 -------------
A wiki/awk/index.md | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 155 insertions(+), 17 deletions(-)

diff --git a/head.html b/head.html
@@ -1,7 +1,10 @@
 <!DOCTYPE html>
 <meta charset="UTF-8"/>
-<style> body { max-width:80ch; margin:auto; padding:5em 5ch; } </style>
+<style>
+body { max-width:80ch; margin:auto; padding:5em 5ch; }
+pre { margin-left:4ch; }
+</style>
 <title>josuah.net</title>
 <a href="/">[ josuah.net ]</a>
diff --git a/index.md b/index.md
@@ -2,16 +2,16 @@
 Welcome to my publication tool, [[josuah.net]]. This is the home of:
- * The [[notwiki]] project, a documentation tool that you currently see at work.
+ * The [[NotWiki]] project, a documentation tool that you currently see at work.
  * Some [[qmail]]'s internals documentation (WIP) gathered through the [[notqmail]] project
- * Some [[Ascii]] Art
+ * Some [[ASCII]] Art
  * A few fancy articles on my [[blog]]
-[notwiki]: //notwiki.z0.is
+[notwiki]: //code.z0.is/notwiki/
 [qmail]: https://cr.yp.to/qmail.html
 [ascii]: /ascii/
 [blog]: /blog/
diff --git a/notwiki.z0.is b/notwiki.z0.is
@@ -1,13 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
-  "DTD/xhtml-transitional.dtd">
-<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
-  <head>
-    <title>gopher redirect</title>
-
-    <meta http-equiv="Refresh" content="1;url=//notwiki.z0.is" />
-  </head>
-  <body>
-    This page is for redirecting you to: <a href="//notwiki.z0.is">//notwiki.z0.is</a>.
-  </body>
-</html>
diff --git a/wiki/awk/index.md b/wiki/awk/index.md
@@ -0,0 +1,148 @@
+ AWK
+=====
+
+AWK is a surprising efficient language, for both [performance][perf] and code
+efficiency. This comes with the ubiquitous array structure, and splitting the
+input in fields by default.
+
+Not everything is parsed efficiently with AWK, Type-Length-Value for instance,
+but many things are. I use it for multiple projects:
+
+ * [[NotWiki]], featuring a (not)markdown [[parser]] that does two passes on
+   to easen-up the parsing,
+
+ * [[ics2txt]], a basic iCal to TSV or plain text converter (two directions),
+
+ * [[jj]] by aaronNGi, a daemon with an awk engine to project that turns raw
+   IRC protocol into easily readable split log files
+
+[perf]: https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
+[notwiki]: //code.z0.is/notwiki/
+[parser]: //code.z0.is/git/notwiki/files/
+[jj]: //josuah.net/wiki/jj/
+
+Local variables in functions
+----------------------------
+By default, all awk variables are global, which is inconvenient for writing
+functions. The solution is to add an extra function argument at the end for
+each local variable we need.
+
+Functions can be called with fewer arguments than they have.
+
+    $ awk '
+    function concat3(arg1, arg2, arg3,
+        loc)
+    {
+        loc = arg1 arg2 arg3
+        return loc
+    }
+
+    BEGIN {
+        loc = 1
+        print(concat3("a", "w", "k"))
+        print(loc)
+    }
+    '
+    awk
+    1
+
+I learned this with the [jj] project.
+
+[jj]: https://github.com/aaronNGi/jj/
+
+
+CSV fields with header
+----------------------
+Instead of trying to remember the number of the column, using the name of the
+column is much easier, and permit to have new columns inserted in the .csv file
+without breaking the script.
+
+    $ cat input.txt
+    domain_name,expiry_date,creation_date,owner,account_id
+    nowhere.com,2020-03,2019-05,me,23535
+    perdu.com,2020-04,2018-03,you,23535
+    pa.st,2020-09,2014-05,them,23535
+
+    $ awk '
+    BEGIN { FS = "," }
+    NR == 1 { for (i = 1; i <= NF; i++) F[$i] = i; next }
+    $F["domain_name"] ~ /\.com$/ {
+        print $F["expiry_date"], $F["owner"], $F["domain_name"]
+    }
+    '
+    2020-03 me nowhere.com
+    2020-04 you perdu.com
+
+
+UCL-style configuration
+-----------------------
+Parsing data that is not organised with line-column is also convenient and
+efficient with awk, convenient for selecting one kind of value out of a
+configuration file:
+
+    $ cat input.txt
+    connections {
+        conn-faraway {
+            children {
+                localnet = fe80:123d:35d3::%vio1/64
+                localnet = fe80:2e46:1d23::%vio2/64
+            }
+            children {
+                localnet = fe80:546:23e4::%vio3/64
+            }
+        }
+        conn-veryclose {
+            children {
+                localnet = fe80:b536:243f::%vio3/64
+                localnet = fe80:34f3:23c3::%vio3/64
+                localnet = fe80:546a:343d::%vio3/64
+            }
+        }
+    }
+
+    $ awk '
+    $2 == "{" { F[lv++] = $1 }
+    $1 == "}" { delete F[--lv] }
+    F[0] == "connections" && F[2] == "children" && $1 == "localnet" {
+        print F[1], $3
+    }
+    ' input.txt
+    conn-faraway fe80:123d:35d3::%vio1/64
+    conn-faraway fe80:2e46:1d23::%vio2/64
+    conn-faraway fe80:546:23e4::%vio3/64
+    conn-veryclose fe80:b536:243f::%vio3/64
+    conn-veryclose fe80:34f3:23c3::%vio3/64
+    conn-veryclose fe80:546a:343d::%vio3/64
+
+
+Key-Value splitter
+------------------
+Parsing key-value pairs can be mapped rather directly to an awk array,
+for instance, to extract an abstract out of a basic iCal file:
+
+    $ cat input.txt
+    BEGIN:VEVENT
+    METHOD:PUBLISH
+    UID:9189@FOSDEM20@fosdem.org
+    TZID:Europe-Brussels
+    DTSTART:20200201T170000
+    DTEND:20200201T175000
+    SUMMARY:State of the Onion
+    DESCRIPTION:Building usable free software to fight surveillance and censorship.
+    CLASS:PUBLIC
+    STATUS:CONFIRMED
+    CATEGORIES:Internet
+    LOCATION:Janson
+    END:VEVENT
+
+    $ awk '
+    BEGIN { FS = ":" }
+    { F[$1] = $2 }
+    $1 == "END" {
+        print F["SUMMARY"] " - " F["DESCRIPTION"]
+        print F["DTSTART"], "(" F["TZID"] ")"
+    }
+    '
+    State of the Onion - Building usable free software to fight surveillance and censorship.
+    20200201T170000 (Europe-Brussels)
+