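Question about text processing in Perl
I need to process a text file and write every line containing the keyword Polymorphism to another file. The original file looks like this: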
Disease variants: 20001
Polymorphisms: 36549
Unclassified variants: 5717
Total: 62267
Main Swiss-Prot Seq AA Type of
gene name Entry name AC FTId pos change variant dbSNP Disease name
_________ ___________________ __________ _____ ______ ____________ __________ _____________________
A1BG A1BG_HUMAN P04217 VAR_018369 52 R -> H Polymorphism rs893184 -
A1CF A1CF_HUMAN Q9NQ94 VAR_052201 555 V -> M Polymorphism rs9073 -
A4GALT A4GAT_HUMAN Q9NPC4 VAR_014297 183 M -> K Unclassified - -
A4GALT A4GAT_HUMAN Q9NPC4 VAR_017508 187 G -> D Polymorphism rs28940572 -
A4GNT A4GCT_HUMAN Q9UNA3 VAR_022096 218 A -> D Polymorphism rs2246945 -
AAAS AAAS_HUMAN Q9NRG9 VAR_012804 15 Q -> K Disease - Achalasia-addisonianism-
AAAS AAAS_HUMAN Q9NRG9 VAR_037060 108 K -> M Polymorphism rs13330 -
AAAS AAAS_HUMAN Q9NRG9 VAR_012805 160 H -> R Disease - Achalasia-addisonianism-
There are many more lines; this is only a small excerpt. I wrote the following Perl code:
use warnings;
open IN, "<humsavar.txt";
open OUT, ">Polymorphism.txt";
select OUT;
while (<IN>) {
    $lines = <IN>;
    if ($lines =~ m/.*Polymorphism.*/i) {
        print "$lines";
    }
}
close IN;
close OUT;
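This gives me 18307 matching lines, but the correct number is 36549. I can't figure out where the problem is, so any help from the experts would be appreciated.
PS: Sorting in Excel would also solve this, but I still think Perl is the better choice, because I have other data that needs the same processing and Excel won't work for those cases.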
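
A likely cause, offered as a sketch rather than a definitive diagnosis: the `while` condition already reads one line from `<IN>` into `$_`, and the assignment from `<IN>` inside the loop body then reads the next line. Only every second line is ever tested, which fits a count of roughly half of 36549. The `/i` flag may also add a few unintended matches, such as the summary line "Polymorphisms: 36549". The version below reads each line exactly once. The sub name `copy_matching_lines`, the in-memory sample (lines copied from the excerpt above), and the pattern `\bPolymorphism\b` are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line containing the whole word
# "Polymorphism" from one filehandle to another; return the count.
sub copy_matching_lines {
    my ($in, $out) = @_;
    my $count = 0;
    # Read exactly one line per iteration. Calling <$in> again inside
    # the loop would consume (and silently skip) the following line.
    while (my $line = <$in>) {
        # Case-sensitive, with word boundaries, so the summary line
        # "Polymorphisms: 36549" is not counted (an assumption about
        # which lines are wanted).
        if ($line =~ /\bPolymorphism\b/) {
            print {$out} $line;
            $count++;
        }
    }
    return $count;
}

# Demo on a few lines taken from the excerpt in the question,
# using in-memory filehandles so no files are needed.
my $sample = join '', map { "$_\n" } (
    'Polymorphisms: 36549',
    'A1BG A1BG_HUMAN P04217 VAR_018369 52 R -> H Polymorphism rs893184 -',
    'A4GALT A4GAT_HUMAN Q9NPC4 VAR_014297 183 M -> K Unclassified - -',
    'A4GALT A4GAT_HUMAN Q9NPC4 VAR_017508 187 G -> D Polymorphism rs28940572 -',
    'AAAS AAAS_HUMAN Q9NRG9 VAR_012804 15 Q -> K Disease - Achalasia-addisonianism-',
);

my $result = '';
open my $in,  '<', \$sample or die "Cannot open in-memory input: $!";
open my $out, '>', \$result or die "Cannot open in-memory output: $!";
my $n = copy_matching_lines($in, $out);
close $in;
close $out;

print "matched $n lines\n";   # prints "matched 2 lines"
```

For the real data, the same sub should work with handles opened on the files named in the question, for example `open my $in, '<', 'humsavar.txt' or die "Cannot open humsavar.txt: $!";` and likewise `'>'` for Polymorphism.txt. Checking the return value of `open` also catches a missing or misnamed file, which the original script would only report as warnings.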