Use awk to filter log files on values ‘greater than’ or ‘less than’ a threshold in Linux


Mattias Geniar, December 31, 2015


Assume the following scenario: you want to tail the log files of a web server, but you only want to see the requests whose response size exceeds 10MB.

Normally, you’d tail your logs like this.

$ tail -f access.log
... "GET /path/to/files.css HTTP/1.1" 200 145848 "" "Mozilla"
... "GET /path/to/files.js HTTP/1.1" 200 195848 "" "Mozilla"
... "GET /path/to/big-movie-file.webm HTTP/1.1" 200 11146409 "" "Mozilla"

You can use awk to show only the log lines where a particular value exceeds a threshold. In the sample above (shortened for readability), the response size in bytes is the 6th field of each line, right after the HTTP 200 status code.

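To clarify: by default awk splits each line on whitespace (spaces and tabs) and numbers the resulting fields $1, $2, $3, … In the shortened sample, the leading ... is $1, "GET is $2, the path is $3, HTTP/1.1" is $4, the status code 200 is $5 and the size is $6.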
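To see the split for yourself, a small sketch like the following prints every field next to its number. The sample line is the first shortened entry from the log above; the variable names `line` and `fields` are only illustrative.

```shell
# Sample line copied from the shortened log above (the leading "..." counts as a field).
line='... "GET /path/to/files.css HTTP/1.1" 200 145848 "" "Mozilla"'

# NF is the number of fields on the current line; $i is the i-th field.
fields=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) print "$" i " = " $i }')
printf '%s\n' "$fields"
```

Note that the quotes are not treated specially: the request is split into three separate fields, which is why the size ends up in $6 here.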


So to filter your logs, you can have awk compare field $6 against a threshold, like this:

$ tail -f access.log | awk '$6 > 10000000'
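The same pattern can be tried without a live log. This sketch feeds the three shortened sample lines through awk, once with "greater than" and once with "less than". The variable names and the 150000 threshold are made up for illustration.

```shell
# The three shortened sample lines, one request per line.
sample='... "GET /path/to/files.css HTTP/1.1" 200 145848 "" "Mozilla"
... "GET /path/to/files.js HTTP/1.1" 200 195848 "" "Mozilla"
... "GET /path/to/big-movie-file.webm HTTP/1.1" 200 11146409 "" "Mozilla"'

# Greater than: keep only responses larger than 10,000,000 bytes.
big=$(printf '%s\n' "$sample" | awk '$6 > 10000000')

# Less than: keep only responses smaller than 150,000 bytes.
small=$(printf '%s\n' "$sample" | awk '$6 < 150000')

printf 'big:\n%s\nsmall:\n%s\n' "$big" "$small"
```

With this input, only the .webm request passes the first filter and only the .css request passes the second. An awk program that consists of just a condition prints every line for which the condition is true.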

To break it down:

  • $6: the 6th field (whitespace-separated by default) of each line that tail outputs
  • > 10000000: the value must exceed 10,000,000. Log files express this value in bytes, so this is 10MB in decimal units. If you want 10MiB instead, 1024 * 1024 * 10 gives 10485760
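If you would rather not hard-code the number, the threshold can be computed by the shell and handed to awk with -v. This is a sketch under assumptions: the names `min_bytes` and `min` and the paths /a.bin and /b.bin are hypothetical, and field $6 only matches the shortened sample format used in this post. In a full access log the size will usually sit in a different field.

```shell
# 1024 * 1024 * 10 bytes, computed with shell arithmetic.
min_bytes=$((10 * 1024 * 1024))

# Two illustrative lines: one just above 10,000,000 bytes, one above 10485760 bytes.
sample='... "GET /a.bin HTTP/1.1" 200 10000001 "" "Mozilla"
... "GET /b.bin HTTP/1.1" 200 11146409 "" "Mozilla"'

# -v min=... passes the shell value into awk as the variable "min".
# Adding 0 forces a numeric comparison even if a field is not a clean number.
matches=$(printf '%s\n' "$sample" | awk -v min="$min_bytes" '$6 + 0 > min + 0')
printf '%s\n' "$matches"
```

Only the /b.bin line is printed, because 10000001 is below the computed threshold. For a live log, the same awk call can sit behind tail -f instead of printf.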

awk is a very powerful tool; even understanding just the basics, like these comparison operators, can get you a long way!
