Linux interview questions

longai123 2017-06-21 03:51:51

1.
Given a file with the following contents:
http://a.domain.com/1.html
http://b.domain.com/1.html
http://c.domain.com/1.html
http://a.domain.com/2.html
http://b.domain.com/2.html
http://a.domain.com/3.html
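Task: extract the host name (including the domain) from each URL, count how many times each host occurs, and sort by count. Shell or C may be used.
The expected result is: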
3 a.domain.com
2 b.domain.com
1 c.domain.com
[root@mail ~]# awk 'BEGIN{FS="/"}{arr[$3]++}END{for(i in arr) print
3 a.domain.com
2 b.domain.com
1 c.domain.com

awk 'BEGIN{FS="/"}{arr[$3]++}END{for(i in arr) print
cat file | sed -e ' s/http:\/\///' -e ' s/\/.*//' | sort | uniq -c | sort -rn
awk -F/ '{print $3}' file |sort -r|uniq -c|awk '{print $1"\t",$2}'
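All three of these commands work.
I am not very familiar with awk, sed, and grep. Could an expert explain them in detail?
2.
Find the line numbers of the blank lines in file: awk '{if($0~/^$/)print NR}' file
I don't quite understand this one either.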

4 replies

longai123 2017-06-24

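How does sed -e ' s/http:\/\///' -e ' s/\/.*//' strip the head and the tail? \ is the escape character, so \/\/ is just //. Why are there two more // after it? I also don't understand the second expression, the one that strips the tail.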
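
For reference, a hedged breakdown of the two expressions. A sed substitution has the form s/PATTERN/REPLACEMENT/. Because / is the delimiter here, every literal slash inside the pattern has to be escaped as \/. In s/http:\/\/// the pattern is http:\/\/ (that is, http://), and the two trailing slashes are simply the delimiters around an empty replacement. In s/\/.*// the pattern \/.* matches from the first remaining slash to the end of the line, again replaced by nothing. The sketch below uses one URL from the question and also shows the same substitutions with | as the delimiter, which avoids the escaping; the variable names are only for illustration.

```shell
#!/bin/sh
url='http://a.domain.com/1.html'

# Original form: "/" is the delimiter, so literal slashes are written "\/".
#   s / http:\/\/ / <empty> /   deletes the leading "http://"
#   s / \/.*      / <empty> /   deletes from the first remaining "/" to end of line
host_escaped=$(printf '%s\n' "$url" | sed -e 's/http:\/\///' -e 's/\/.*//')

# Same substitutions with "|" as the delimiter, so no escaping is needed.
host_piped=$(printf '%s\n' "$url" | sed -e 's|http://||' -e 's|/.*||')

echo "$host_escaped"   # a.domain.com
echo "$host_piped"     # a.domain.com
```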

ipqtjmqj 2017-06-22

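The third command looks like a combination of the first two, so it needs no separate explanation. As for finding the line numbers of the blank lines in file:

 awk '{if($0~/^$/)print NR}' file

This one is simple: $0 is the whole line, ~ is the regular-expression match operator, /^$/ is a regular expression that matches an empty line, and NR is a built-in variable holding the current record number (Number of Records), which here is the line number.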
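
A small sketch of the command above on a made-up five-line file (the file contents and variable names are assumptions for the demo). It also shows the shorter pattern-action form, which relies on awk matching a bare regular expression against $0.

```shell
#!/bin/sh
# Hypothetical sample file: lines 2 and 4 are empty.
sample=$(mktemp)
printf 'first\n\nthird\n\nfifth\n' > "$sample"

# Long form from the thread: test $0 against /^$/ and print the record number.
long_form=$(awk '{ if ($0 ~ /^$/) print NR }' "$sample")

# Equivalent pattern-action form: a bare regex pattern is matched against $0.
short_form=$(awk '/^$/ { print NR }' "$sample")

echo "$long_form"    # prints 2 and 4, one per line
echo "$short_form"   # same output
rm -f "$sample"
```

Note that /^$/ only matches lines with no characters at all; a line containing just spaces or tabs would need a pattern such as /^[[:space:]]*$/.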

ipqtjmqj 2017-06-22

The second command:

 cat file | sed -e ' s/http:\/\///' -e ' s/\/.*//' | sort | uniq -c | sort -rn

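The two sed expressions strip the head and the tail of each URL respectively. sort orders the lines, because the input to uniq must already be sorted (uniq only merges adjacent duplicate lines). The -c option of uniq prefixes each line with the number of times it occurred. The final sort takes two options, -rn: r is reverse, meaning largest first, and n is numeric, meaning the sort compares numeric values rather than ASCII codes.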
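
To make the role of each stage visible, here is a minimal sketch that runs the pipeline on the sample data from the question, once without the first sort and once with it. The temp file and variable names are assumptions, and the trailing awk only strips the padding that uniq -c adds so the output is easy to compare; it is not part of the original command.

```shell
#!/bin/sh
# Sample input copied from the question; the temp file name is arbitrary.
urls=$(mktemp)
cat > "$urls" <<'EOF'
http://a.domain.com/1.html
http://b.domain.com/1.html
http://c.domain.com/1.html
http://a.domain.com/2.html
http://b.domain.com/2.html
http://a.domain.com/3.html
EOF

# Stage 1: reduce every URL to its host name (sed can read the file directly).
hosts=$(sed -e 's/http:\/\///' -e 's/\/.*//' "$urls")

# Without sorting first, uniq only merges adjacent duplicates, so every
# host in this input is reported separately with a count of 1.
unsorted_counts=$(printf '%s\n' "$hosts" | uniq -c | awk '{ print $1, $2 }')

# Stages 2 to 4: group identical hosts, count them, order by count (largest first).
sorted_counts=$(printf '%s\n' "$hosts" | sort | uniq -c | sort -rn | awk '{ print $1, $2 }')

echo "$unsorted_counts"
echo "$sorted_counts"
rm -f "$urls"
```

The first sort only groups identical lines together; it is the final sort -rn that orders the result by count. The third command in the question has no such final sort, so its output follows the reversed host-name order rather than the counts.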

ipqtjmqj 2017-06-22

You didn't post the first command in full, did you? It should be:

awk 'BEGIN{FS="/"}{arr[$3]++}END{for(i in arr) print arr[i] " "  i}' testfile

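testfile holds the test input. The command has three parts: 1. the command name, awk; 2. the awk program, enclosed in single quotes, which can alternatively be read from an external file with the -f option; 3. the input file to process, named testfile in this example. The usage of awk is documented in man awk: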
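
As an illustration of the -f option mentioned above, the sketch below stores the same program in a separate file and compares its output with the inline version. The file name count.awk and the temp directory are assumptions made for the demo, and sort -rn is appended only to give both outputs a fixed order.

```shell
#!/bin/sh
# Hypothetical file names, created in a temp directory for the demo.
workdir=$(mktemp -d)
cat > "$workdir/testfile" <<'EOF'
http://a.domain.com/1.html
http://b.domain.com/1.html
http://c.domain.com/1.html
http://a.domain.com/2.html
http://b.domain.com/2.html
http://a.domain.com/3.html
EOF

# Part 2 of the command, stored in its own file instead of inside single quotes.
cat > "$workdir/count.awk" <<'EOF'
BEGIN { FS = "/" }
{ arr[$3]++ }
END { for (i in arr) print arr[i] " " i }
EOF

# Form 1: awk -f program-file input-file
from_file=$(awk -f "$workdir/count.awk" "$workdir/testfile" | sort -rn)

# Form 2: awk 'program-text' input-file
inline=$(awk 'BEGIN{FS="/"}{arr[$3]++}END{for(i in arr) print arr[i] " " i}' "$workdir/testfile" | sort -rn)

echo "$from_file"
rm -rf "$workdir"
```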


GAWK(1)                                                           Utility Commands                                                          GAWK(1)

NAME
       gawk - pattern scanning and processing language

SYNOPSIS
       gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
       gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

       pgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
       pgawk [ POSIX or GNU style options ] [ -- ] program-text file ...

       dgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...

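An awk program has three sections: BEGIN{}, {}, and END{}. BEGIN{} and END{} are optional; the main block in the middle is applied to every input line in turn. In this example, BEGIN sets FS, the field separator, to a slash, so each line is split into fields on /. In the main block, $3 is the third field, which is the host name. arr is a newly created array variable, similar to std::map in C++: the key is the host name and the value is its count. END prints the contents of arr.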
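
The sketch below illustrates both points: how FS="/" splits one URL into fields, and when each of the three sections runs. One caveat worth stating: awk does not specify the order in which for (... in arr) visits the keys, so sort -rn is added here (an addition, not part of the command above) to get the order shown in the question. The variable name host is used instead of i purely for readability.

```shell
#!/bin/sh
# How FS="/" splits one URL: the empty string between the two slashes is $2,
# which is why the host name ends up in $3.
fields=$(printf '%s\n' 'http://a.domain.com/1.html' |
  awk 'BEGIN { FS = "/" } { printf "1=[%s] 2=[%s] 3=[%s] 4=[%s] NF=%d\n", $1, $2, $3, $4, NF }')
echo "$fields"   # 1=[http:] 2=[] 3=[a.domain.com] 4=[1.html] NF=4

# The full program, one section per line, fed with the sample data from the question.
counts=$(printf '%s\n' \
  'http://a.domain.com/1.html' \
  'http://b.domain.com/1.html' \
  'http://c.domain.com/1.html' \
  'http://a.domain.com/2.html' \
  'http://b.domain.com/2.html' \
  'http://a.domain.com/3.html' | awk '
BEGIN { FS = "/" }          # runs once, before any input is read
      { arr[$3]++ }         # runs for every line: count by host name
END   { for (host in arr)   # runs once, after the last line
          print arr[host], host }
' | sort -rn)
echo "$counts"
```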