"Awk is so small that you can be productive in half an hour. It's so concise tha...

silentbicycle · on Sept 29, 2010

The major benefit with awk is that it runs as a pattern recognizing/processing filter by default, so it handles certain common problems in very little code, and fits particularly well in Unix shell pipelines. I'm also a big fan of structuring code in terms of pattern-matching. (I wrote an Erlang-style pattern matching library for Lua, btw: http://github.com/silentbicycle/tamale/ )

I write a lot of little awk scripts, but if they grow past ~5 lines, they usually get rewritten in Lua. (Perhaps eventually with inner loops in C.) Even so, awk is simple and useful enough that it's worth knowing.

chaostheory · on Sept 29, 2010

"The major benefit with awk is that it runs as a pattern recognizing/processing filter by default"

Doesn't every language have regular expressions built in now? Again, I still fail to see the point of writing it in Awk when you can write something small and fast in a more powerful and modern language.

silentbicycle · on Sept 29, 2010

I mean something different than regular expressions: I'm talking about how the whole program is structured around "pattern -> action; other pattern -> other action; ...", with special event patterns for BEGIN, END, etc. That pattern-based dispatch is the top level of the language, rather than function definitions. (Those came later.) As the man page says, it's "pattern-directed".

It's a higher-level approach than typical scripting languages, and that's why it can be so concise - the model makes a lot of unpacking and looping implicit. It's a DSL for stream-processing problems that are easily phrased as "count these", "transform this into that", etc.
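To make the "pattern -> action" structure concrete, here is a rough Python sketch of what awk does implicitly: split each input line into fields, test every pattern, and run the matching actions, with BEGIN and END hooks around the loop. All names here (`run_rules`, `RULES`, the sample log lines) are made up for illustration and are not how awk itself is implemented.

```python
import re

def run_rules(lines, rules, begin=None, end=None):
    """Apply (pattern, action) rules to every line, awk-style."""
    state = {}
    if begin is not None:
        begin(state)                      # like awk's BEGIN block
    for line in lines:
        fields = line.split()             # like awk's $1, $2, ...
        for pattern, action in rules:     # every matching rule fires
            if pattern(line, fields):
                action(state, line, fields)
    if end is not None:
        end(state)                        # like awk's END block
    return state

def init(state):
    state["errors"] = 0
    state["bytes"] = 0

def is_error(line, fields):
    return re.search(r"ERROR", line) is not None

def count_error(state, line, fields):
    state["errors"] += 1

def has_size(line, fields):
    # Assumed input shape: the last field is a byte count when numeric.
    return len(fields) >= 2 and fields[-1].isdigit()

def add_size(state, line, fields):
    state["bytes"] += int(fields[-1])

def summarize(state):
    state["summary"] = f"errors={state['errors']} bytes={state['bytes']}"

RULES = [(is_error, count_error), (has_size, add_size)]

sample = ["GET /index 200 512", "ERROR timeout", "GET /about 200 128"]
result = run_rules(sample, RULES, begin=init, end=summarize)
print(result["summary"])
```

Everything in `run_rules` is what awk gives you for free; only the rules themselves would appear in the awk program, which is where the concision comes from.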

Are you familiar with Prolog? It uses a similar approach, but can match on whole trees (and other complex, nested data structures), not just a list of $N string/numeric tokens. Also, it supports backtracking - at any point, if it reaches a dead end, it can back up arbitrarily and try a different approach. Sometimes slow, but very handy for prototyping.

I agree that using a language other than awk makes sense after a few lines, but it's still a sweet spot for 1-5ish line programs. Since awk itself is small enough that a two page cheat sheet is sufficient, it's worth keeping around. Perl (for example) has many nooks and crannies I forget about if I don't use it frequently.

swift · on Sept 29, 2010

Anyone who hasn't tried a general purpose language with pattern-based dispatch (usually referred to in practice as "pattern matching") should really do themselves a favor and try one; it's one of the most useful language features around. Now that I've become used to it, it's a bit unpleasant for me to use languages that don't have it. It's a very convenient way to structure code.

The parent post mentions Prolog, which is a good example, but there are several others worth trying that frequently come up on HN: Scala, Haskell, F#, and OCaml spring to mind.

silentbicycle · on Sept 29, 2010

Yes! Anybody who knows me in person is probably tired of hearing about how good pattern matching is by now. :) I definitely know what you mean about missing it in languages without it; that's why I've been working on tamale.

I can't speak for Scala, but the PM in Haskell and OCaml is a bit different since it's informed by the static typing. When patterns have variant types (i.e., x is either Foo, Bar, or Baz * int), it also checks for complete coverage. Same general concept, different flavor. Also very useful.

I mentioned Prolog in particular because its emphasis on unification and backtracking makes it the most pattern-matching-centric programming language I've seen. Where other languages have pattern matching, it almost is pattern matching.

Also, there are well-known ways to compile pattern specifications into efficient decision trees, so while it's a very expressive abstraction, it's not necessarily an expensive one. If they're being constructed at runtime (as they are in my Lua library), you can generally get a big improvement by just indexing on the patterns' first fields and doing linear search thereafter.
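As a rough illustration of the "index on the first field, then search linearly" idea, here is a small Python sketch. It is not tamale's actual implementation or API; `Dispatcher`, `ANY`, and the tuple-shaped patterns are assumptions made up for this example.

```python
from collections import defaultdict

ANY = object()  # hypothetical wildcard marker: matches any single field

class Dispatcher:
    """Match tuples against patterns, bucketed by each pattern's first field."""

    def __init__(self):
        self._by_head = defaultdict(list)  # first field -> [(order, pattern, action)]
        self._wild = []                    # patterns whose first field is ANY
        self._count = 0                    # insertion counter, preserves rule order

    def add(self, pattern, action):
        if not pattern:
            raise ValueError("pattern must have at least one field")
        entry = (self._count, pattern, action)
        self._count += 1
        if pattern[0] is ANY:
            self._wild.append(entry)
        else:
            self._by_head[pattern[0]].append(entry)

    def match(self, value):
        if not value:
            return None
        # Only patterns that could match on the first field are candidates;
        # the rest of the search is linear, in the order rules were added.
        candidates = sorted(
            self._by_head.get(value[0], []) + self._wild,
            key=lambda entry: entry[0],
        )
        for _, pattern, action in candidates:
            if len(pattern) == len(value) and all(
                p is ANY or p == v for p, v in zip(pattern, value)
            ):
                return action(value)
        return None  # no pattern matched

d = Dispatcher()
d.add(("add", ANY, ANY), lambda v: v[1] + v[2])
d.add(("neg", ANY), lambda v: -v[1])
d.add((ANY, ANY), lambda v: "pair")

print(d.match(("add", 2, 3)))
print(d.match(("neg", 4)))
print(d.match(("other", 1)))
```

With many rules that start with distinct literal tags, most of them are never examined for a given input, which is where the improvement over a plain linear scan comes from.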