Gentoo Forums
AWK & services log files: Two functions for IPv6 addresses
Gentoo Forums Forum Index > Documentation, Tips & Tricks
truc
Advocate


Joined: 25 Jul 2005
Posts: 3199

Posted: Tue Jun 21, 2011 10:25 am    Post subject: AWK & services log files: Two functions for IPv6 address

I regularly use awk to generate quick statistics for the firewall, the nameserver and the proxy (e.g. kern.log, daemon.log, squid3/access.log).


Recently I've started to play with IPv6, and realized that IPv6 addresses are logged in their extended format by ip6tables, but in their compressed format by other services. This is quite annoying when you have to switch between the two formats by hand, and the compressed format is much more readable than the extended one.

The two awk functions below should do the conversion for you. You can include them in your own awk scripts by loading utils.awk with an extra -f option, as shown below.

utils.awk:
function extended2compressedIP (ip,      lastPos, lastLength, maxLength, maxLength_start, maxLength_stop, nb, i, elt) {
   # IPv4: no compressed format
   if (ip !~ /:/)
      return ip

   # else IPv6
   lastPos = -1
   maxLength = 0
   maxLength_start = -1
   maxLength_stop = -1
   nb = split(ip, elt, ":")

   # strip the leading 0s of each group and find the longest run of
   # "0" groups (if any); the loop goes one step past the last group
   # so that a trailing run (AAAA:BBB:..:DDD:0:0:...) is closed too
   for (i=1; i<=nb+1; i++) {
      if (i <= nb) {
         sub(/^0+/, "", elt[i])
         sub(/^$/, "0", elt[i])
      }
      if (i <= nb && "0" == elt[i]) {
         if (-1 == lastPos)
            lastPos = i
      } else if (-1 != lastPos) {
         lastLength = i - lastPos
         # remember the longest run; the first one wins on a tie
         if (maxLength < lastLength) {
            maxLength = lastLength
            maxLength_start = lastPos
            maxLength_stop = i - 1
         }
         lastPos = -1
      }
   }

   # note: a run made of a single "0" group is shortened to "::" as well
   ip = ""
   for (i=1; i<=nb; i++) {
      if (maxLength_start == i) {
         # leading 0
         if (1 == i)
            ip = ":"
         i = maxLength_stop
      } else {
         ip = ip elt[i]
      }
      if (nb != i || maxLength_stop == i)
         ip = ip ":"
   }

   return ip
}

function compressed2extendedIP (ip,      i, j, len, elt, nb, missing) {
   # IPv4: no compressed format
   if (ip !~ /:/)
      return ip

   # else IPv6
   missing = -1

   nb = split(ip, elt, ":")

   ip = ""
   for (i=1; i<=nb; i++) {
      if (0 == length(elt[i])) {
         # there should be 8 groups (of 16 bits each, 2 bytes, 4 characters)
         if (0 != missing)
            missing = 8 - nb
         elt[i] = "0000"
         for (j=1; j<=missing; j++)
            elt[i] = elt[i] ":0000"
         # this has to be done only once: any further empty field
         # produced by "::" becomes a single "0000" group
         missing = 0
      } else {
         # 4 characters per group
         len = length(elt[i])
         for (j=len ; j<4; j++)
            elt[i] = "0" elt[i]
      }

      ip = sprintf("%s%s%s", ip, (i>1 ? ":" : "" ), elt[i])
   }
   return ip
}


For example (--source is a GNU awk option):
Code:
< ~/ipv6 awk -f utils.awk --source '{ print $0, "->", extended2compressedIP($0) }'
2001:03ee:bd04:0054:04b9:0000:0000:d47a -> 2001:3ee:bd04:54:4b9::d47a
2a01:c916:0000:0004:0000:0000:0000:0036 -> 2a01:c916:0:4::36
192.168.54.122 -> 192.168.54.122
2001:03ee:bd04:0054:021b:fcff:feec:5a3c -> 2001:3ee:bd04:54:21b:fcff:feec:5a3c
ff02:0000:0000:0000:0000:0001:ff0a:e435 -> ff02::1:ff0a:e435
0000:0000:0000:0000:0000:0000:0000:0000 -> ::
fe80:0000:0000:0000:adad:7a8c:dad7:999b -> fe80::adad:7a8c:dad7:999b

and now the other way around:
Code:
< ~/ipv6 awk -f utils.awk --source '{ print $0, "->", extended2compressedIP($0) }' | awk -f utils.awk --source '{ print $0, "->", compressed2extendedIP($NF) }'
2001:03ee:bd04:0054:04b9:0000:0000:d47a -> 2001:3ee:bd04:54:4b9::d47a -> 2001:03ee:bd04:0054:04b9:0000:0000:d47a
2a01:c916:0000:0004:0000:0000:0000:0036 -> 2a01:c916:0:4::36 -> 2a01:c916:0000:0004:0000:0000:0000:0036
192.168.54.122 -> 192.168.54.122 -> 192.168.54.122
2001:03ee:bd04:0054:021b:fcff:feec:5a3c -> 2001:3ee:bd04:54:21b:fcff:feec:5a3c -> 2001:03ee:bd04:0054:021b:fcff:feec:5a3c
ff02:0000:0000:0000:0000:0001:ff0a:e435 -> ff02::1:ff0a:e435 -> ff02:0000:0000:0000:0000:0001:ff0a:e435
0000:0000:0000:0000:0000:0000:0000:0000 -> :: -> 0000:0000:0000:0000:0000:0000:0000:0000


Please let me know if there are corner cases where these functions fail to do their job!
_________________
The End of the Internet!


Last edited by truc on Tue Jun 21, 2011 2:10 pm; edited 2 times in total
truc
Advocate


Joined: 25 Jul 2005
Posts: 3199

Posted: Tue Jun 21, 2011 10:45 am

Now consider this really simple awk script, which prints only the requests from a given IP address in the squid3 access log (default log format, where the client address is the third field):
selIP.awk:
BEGIN {
   ip=extended2compressedIP(ip)
}
($3 == ip) {
   print
}



Here is how you can use it:
Code:
</var/log/squid3/access.log awk -f utils.awk -f selIP.awk -v ip=2001:03ee:bd04:0054:021b:fcff:feec:5a3c


But it also accepts a compressed IPv6 address:
Code:
</var/log/squid3/access.log awk -f utils.awk -f selIP.awk -v ip=2001:3ee:bd04:54:21b:fcff:feec:5a3c


And of course, it also works with an IPv4 address:
Code:
</var/log/squid3/access.log awk -f utils.awk -f selIP.awk -v ip=192.168.54.21


It's up to you to convert the input address to the right format (the one used in the log file) once, in the BEGIN block, rather than converting a field on every line, so your awk script is as efficient as possible.
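The same idea works the other way around for a log that uses the extended format, such as the ip6tables messages in kern.log: expand the address given on the command line once, then compare it with the SRC= token. Below is a minimal sketch, assuming the log writes the source address as a SRC=<address> token in extended form. The shell function sel_src and the awk function expand are hypothetical names; expand is a compact re-implementation of the compressed2extendedIP idea so that the snippet runs without utils.awk.

```shell
# sel_src ADDRESS [FILE...]: print the log lines whose SRC= token
# matches ADDRESS (IPv4, or IPv6 in compressed or extended form).
# Assumes the log writes IPv6 addresses in the extended format.
sel_src() {
  addr=$1
  shift
  awk -v ip="$addr" '
    # expand a compressed IPv6 address to 8 groups of 4 hex digits
    function expand(ip,    nb, elt, i, j, n, g, out) {
      if (ip !~ /:/)
        return ip                       # IPv4: nothing to expand
      nb = split(ip, elt, ":")
      n = 0                             # number of explicit groups
      for (i = 1; i <= nb; i++)
        if (elt[i] != "")
          n++
      out = ""
      for (i = 1; i <= nb; i++) {
        if (elt[i] == "") {
          # first empty field of "::": emit the missing zero groups
          for (j = n; j < 8; j++)
            out = out (out == "" ? "" : ":") "0000"
          n = 8                         # later empty fields add nothing
        } else {
          g = tolower(elt[i])
          while (length(g) < 4)
            g = "0" g
          out = out (out == "" ? "" : ":") g
        }
      }
      return out
    }
    # convert the address given on the command line only once
    BEGIN { want = "SRC=" expand(ip) }
    {
      for (i = 1; i <= NF; i++)
        if ($i == want) { print; next }
    }' "$@"
}
```

For instance, `sel_src fe80::adad:7a8c:dad7:999b /var/log/kern.log` would print the lines whose source is fe80:0000:0000:0000:adad:7a8c:dad7:999b. Scanning the fields avoids depending on the exact position of SRC= on the line.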

Feel free to comment!
khayyam
Watchman


Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Thu Jun 07, 2012 11:29 pm

Thanks truc ...

I'm using awk more and more myself, having got hooked sometime last year. There should be a banner someplace that says "awk does more than '{print $1}'", as I often see the likes of "* | grep | sed 's/foo/ba/g' | awk '{print $1}'" when in most cases this could have been handled entirely by awk '/foo/{gsub(/foo/,"ba");print}' <(input).
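A small sketch of that point, with made-up input: both functions below keep the lines containing foo, replace foo with ba and print the first field. The names with_pipeline and with_awk are only for this illustration, and the single awk call prints $1 (rather than the whole line) so that its output matches the pipeline's.

```shell
# three processes: filter, substitute, then print the first field
with_pipeline() {
  grep foo | sed 's/foo/ba/g' | awk '{ print $1 }'
}

# the same work done by one awk process: the /foo/ pattern replaces grep,
# gsub() replaces sed, and print $1 is the original awk step
with_awk() {
  awk '/foo/ { gsub(/foo/, "ba"); print $1 }'
}
```

Feeding both the lines "foo one", "bar two" and "xfoo three" gives "ba" and "xba" in each case, with one process instead of three.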

If you're parsing really large log files, and know a little about which fields contain what, then awk makes pulling out specific data a snap, e.g.:

Code:
awk '$23=="DPT=25"' /var/log/iptables.log
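Matching on $23 assumes every line has the same layout. In netfilter LOG lines the number of tokens before DPT= can vary (a DF token only shows up when that flag is set, and IPv6 lines carry different header fields than IPv4 ones), so a slightly more defensive sketch looks for the token itself. The function name sel_dpt is made up for this example.

```shell
# sel_dpt PORT [FILE...]: print the iptables log lines that contain a
# DPT=PORT token, wherever that token sits on the line.
sel_dpt() {
  port=$1
  shift
  awk -v tok="DPT=$port" '{
    # scan every field instead of trusting a fixed position such as $23
    for (i = 1; i <= NF; i++)
      if ($i == tok) { print; next }
  }' "$@"
}
```

Usage would be `sel_dpt 25 /var/log/iptables.log`; since the comparison is on whole fields, DPT=250 and SPT=25 do not match.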

One point though: these are not "functions", but "program files" ... not to be facetious.

I [:heart:] awk.

best ... khayyam
truc
Advocate


Joined: 25 Jul 2005
Posts: 3199

Posted: Sun Jun 10, 2012 2:40 pm

khayyam wrote:
Thanks truc ...

I'm using awk more and more myself, having got hooked sometime last year. There should be a banner someplace that says "awk does more than '{print $1}'", as I often see the likes of "* | grep | sed 's/foo/ba/g' | awk '{print $1}'" when in most cases this could have been handled entirely by awk '/foo/{gsub(/foo/,"ba");print}' <(input).

True, but watch out when using some of the nice GNU awk features: they are not POSIX, and thus not available everywhere!
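As an example of such an extension, the --source option used in the first post is specific to GNU awk. POSIX awk does accept several -f options and treats their concatenation, in order, as the program, so a portable sketch is to write the inline program to a temporary file first. The wrapper name awk_with_lib is hypothetical, and the sketch assumes a mktemp utility is available.

```shell
# awk_with_lib LIBFILE PROGRAM [FILE...]
# Portable stand-in for: awk -f LIBFILE --source PROGRAM [FILE...]
awk_with_lib() {
  lib=$1
  prog_file=$(mktemp) || return 1
  printf '%s\n' "$2" > "$prog_file"
  shift 2
  # POSIX awk concatenates every -f file, in order, into one program
  awk -f "$lib" -f "$prog_file" "$@"
  status=$?
  rm -f "$prog_file"
  return "$status"
}
```

With the functions from the first post, the first example would become `awk_with_lib utils.awk '{ print $0, "->", extended2compressedIP($0) }' < ~/ipv6`.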

Quote:
If you're parsing really large log files, and know a little about which fields contain what, then awk makes pulling out specific data a snap, e.g.:

Code:
awk '$23=="DPT=25"' /var/log/iptables.log


I've written many awk programs to generate statistics (iptables/ip6tables, dnsmasq, squid3 and a few others), and I'm even quite proud of the results! But honestly, now that I'm learning Perl, I realize that what's written and said everywhere is true: Perl is good at (among a great many other things) parsing text files.

Awk is often installed by default, but so is perl. I'm now using perl in my one-liners when I used to use awk. I still think it's important to know how to use awk (e.g. to avoid those grep | sed where a single awk call would have done it). But if you have some time to kill, you know what you can do! ;)


Quote:
One point though: these are not "functions", but "program files" ... not to be facetious.


Well, I actually share these two functions to be used within _your_ program files :lol:


BTW, the extended2compressedIP function is not correct on one point: http://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-07#section-4.2.2
Code:
4.2.2.  Handling One 16 Bit 0 Field

   The symbol "::" MUST NOT be used to shorten just one 16 bit 0 field.
   For example, the representation 2001:db8:0:1:1:1:1:1 is correct, but
   2001:db8::1:1:1:1:1 is not correct.
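For reference, here is a sketch of a compression that also honours this rule. It follows the text-representation rules of the draft quoted above (later published as RFC 5952): leading zeros are dropped in every group, only the longest run of two or more "0" groups is replaced by "::" (the first one on a tie), and hex digits are lowercased. compress6 is a hypothetical name, the code is not the extended2compressedIP function from the first post, and it assumes a valid, fully written 8-group address without an embedded IPv4 part.

```shell
# compress6 ADDRESS: print a fully written (8-group) IPv6 address in the
# RFC 5952 text form; IPv4 addresses are printed unchanged.
compress6() {
  printf '%s\n' "$1" | awk '
    function compress(ip,    nb, elt, i, run, best, bstart, out) {
      if (ip !~ /:/)
        return ip                       # IPv4: left untouched
      nb = split(tolower(ip), elt, ":") # 4.3: lowercase hex digits
      best = 0; bstart = 0; run = 0
      for (i = 1; i <= nb; i++) {
        sub(/^0+/, "", elt[i])          # 4.1: no leading zeros
        if (elt[i] == "")
          elt[i] = "0"
        if (elt[i] == "0") {
          run++
          # a strict > keeps the first of two equally long runs (4.2.3)
          if (run > best) { best = run; bstart = i - run + 1 }
        } else
          run = 0
      }
      if (best < 2)
        bstart = 0                      # 4.2.2: a lone 0 group stays as is
      out = ""
      for (i = 1; i <= nb; i++) {
        if (i == bstart) {
          out = out "::"
          i += best - 1                 # skip the rest of the zero run
        } else
          out = out ((out == "" || out ~ /:$/) ? "" : ":") elt[i]
      }
      return out
    }
    { print compress($0) }'
}
```

With this version, 2001:0db8:0000:0001:0001:0001:0001:0001 becomes 2001:db8:0:1:1:1:1:1, as the draft requires, while 2001:0db8:0000:0000:0001:0000:0000:0001 becomes 2001:db8::1:0:0:1.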

I did not take the time to correct this, since it did not look like it gathered a lot of interest out there :wink:

khayyam wrote:
I [:heart:] awk.


That'd be sed for me! I love how twisted sed programs can be!
khayyam
Watchman


Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

Posted: Mon Jun 11, 2012 1:40 pm

hey ...

truc wrote:
[...]Awk is often installed by default, but so is perl. I'm now using perl in my one-liners when I used to use awk. I still think it's important to know how to use awk (e.g. to avoid those grep | sed where a single awk call would have done it). But if you have some time to kill, you know what you can do! ;)


I've been put off by perl somewhat, and though I'd agree it outdoes awk, sed, ed and sh for text handling, I often get the feeling that it lacks something like transparency (for want of a better word). I was attempting to learn it some years back, but I quickly became frustrated when approaching other people's code: it was like staring into a dark recess. This could be seen as an advantage, and no doubt perl is "flexible" in terms of how it can be wielded, but I never got the sense that I was making any headway. This was no doubt exacerbated by the fact that I was under a heavy workload at that time; I should probably revisit it and see if I fare better without 12hr workdays.

truc wrote:
Quote:
One point though: these are not "functions", but "program files" ... not to be facetious.

Well, I actually share these two functions to be used within _your_ program files :lol:


yes, OK, but they are more like programs, and I was merely pointing out that in awk parlance the term "program file" is used. Anyhow, it's semantics: "module", "function", "library" ... take your pick.

best ... and thanks again for the {functions,*}

khay