Efficiency of grouped counting over a text file

flyinweb 2011-04-07 11:57:20
The log file is over a gigabyte in size, and its format is as follows:
06-Apr-2011 10:59:30.069 queries: client 59.151.213.106#1027: query: xinmdaengwang.com IN SOA -
06-Apr-2011 10:59:30.070 queries: client 219.141.157.71#19803: query: cnlemdai.com IN MX -ED
06-Apr-2011 10:59:30.070 queries: client 72.14.202.82#45226: query: 0531-7963334.11004.com.cn IN A -
06-Apr-2011 10:59:30.072 queries: client 221.232.247.211#43678: query: www.jfmfy.com IN A -ED
06-Apr-2011 10:59:30.079 queries: client 222.221.0.11#60340: query: panyu.gdoodjob.cn IN A -EDC
06-Apr-2011 10:59:30.082 queries: client 202.96.136.239#50156: query: jotodyda.com IN MX -
06-Apr-2011 10:59:30.083 queries: client 61.128.114.166#55841: query: ksbdm.cjdrc.com.cn IN A -
06-Apr-2011 10:59:30.087 queries: client 61.235.70.98#50278: query: www.hzqdmd.net IN AAAA -ED
06-Apr-2011 10:59:30.091 queries: client 59.151.213.106#55959: query: zhengsdgutang.com IN IXFR -
06-Apr-2011 10:59:30.098 queries: client 61.140.11.169#45639: query: mail.feidsyunjx.com IN A -
....

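Counting with shell commands takes a long time and is not efficient. I'm not sure whether C or Perl would be faster for this; I'd appreciate any pointers from the experts here.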
# awk -F'[ #]' '/06-Apr-2011 /{a[$5]++} END{for(j in a) print j "\t" a[j]}' log.log | sort -k2,2nr | head -10
59.151.213.106 21792
59.151.213.112 6241
218.85.157.74 944
218.85.157.67 914
159.226.202.9 757
159.226.202.7 634
159.226.202.18 521
1.202.214.5 399
61.140.11.214 395
220.181.125.132 391
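
For comparison, here is a minimal sketch of the same grouping done in a single streaming pass, written in Python rather than the C or Perl asked about. It is an illustration only, not a claim that it beats awk; that would have to be measured on the real log. The names top_clients and print_top are hypothetical, and the sketch assumes the date always appears at the start of the line, as in the sample above (the awk command matches it anywhere in the line).

```python
import re
from collections import Counter

# Matches "client <ip>#<port>" as it appears in the sample log lines.
CLIENT_RE = re.compile(r"client (\d+\.\d+\.\d+\.\d+)#")


def top_clients(lines, date_prefix="06-Apr-2011 ", n=10):
    """Count queries per client IP for one day and return the n busiest.

    Assumption: the date is the first field of every line, so a cheap
    prefix check is enough to select the day.
    """
    counts = Counter()
    for line in lines:
        if not line.startswith(date_prefix):
            continue
        match = CLIENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    # most_common sorts by count, highest first.
    return counts.most_common(n)


def print_top(path, date_prefix="06-Apr-2011 ", n=10):
    """Stream the file line by line so the whole log never sits in memory."""
    with open(path, encoding="ascii", errors="replace") as log:
        for ip, count in top_clients(log, date_prefix, n):
            print(f"{ip}\t{count}")
```

Calling print_top("log.log") would print the ten busiest client addresses for that day, one per line, in the same "address, count" layout as the output above. Memory use grows with the number of distinct client addresses, not with the size of the log.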