Efficiency of group-by counting on a text file
The log file is over a gigabyte, in the following format:
06-Apr-2011 10:59:30.069 queries: client 59.151.213.106#1027: query: xinmdaengwang.com IN SOA -
06-Apr-2011 10:59:30.070 queries: client 219.141.157.71#19803: query: cnlemdai.com IN MX -ED
06-Apr-2011 10:59:30.070 queries: client 72.14.202.82#45226: query: 0531-7963334.11004.com.cn IN A -
06-Apr-2011 10:59:30.072 queries: client 221.232.247.211#43678: query: www.jfmfy.com IN A -ED
06-Apr-2011 10:59:30.079 queries: client 222.221.0.11#60340: query: panyu.gdoodjob.cn IN A -EDC
06-Apr-2011 10:59:30.082 queries: client 202.96.136.239#50156: query: jotodyda.com IN MX -
06-Apr-2011 10:59:30.083 queries: client 61.128.114.166#55841: query: ksbdm.cjdrc.com.cn IN A -
06-Apr-2011 10:59:30.087 queries: client 61.235.70.98#50278: query: www.hzqdmd.net IN AAAA -ED
06-Apr-2011 10:59:30.091 queries: client 59.151.213.106#55959: query: zhengsdgutang.com IN IXFR -
06-Apr-2011 10:59:30.098 queries: client 61.140.11.169#45639: query: mail.feidsyunjx.com IN A -
....
Doing the counting with shell commands is time-consuming and inefficient. I'm not sure whether C or Perl would be faster — advice from the experts would be appreciated.
# awk 'BEGIN{FS="[ #]"} /06-Apr-2011 /{a[$5]++} END{for(j in a) print j "\t" a[j]}' log.log | sort -rn -k2 | head -10
59.151.213.106 21792
59.151.213.112 6241
218.85.157.74 944
218.85.157.67 914
159.226.202.9 757
159.226.202.7 634
159.226.202.18 521
1.202.214.5 399
61.140.11.214 395
220.181.125.132 391