Extracting text content with Python
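I want to use a Python script to collect the path and content of every .txt file in the TIMIT speech corpus. When extracting the content, all punctuation must be removed except the apostrophe and the hyphen (so characters such as " : ! ~ ? . are dropped). The final output should look like this: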
data/train/dr1/fcjf0/Untitled/sa1 SHE HAD YOUR DARK SUIT IN GREASY WASH WATER ALL YEAR
data/train/dr1/fcjf0/Untitled/sa2 DON'T ASK ME TO CARRY AN OILY RAG LIKE THAT
data/train/dr1/fcjf0/Untitled/si1027 EVEN THEN IF SHE TOOK ONE STEP FORWARD HE COULD CATCH HER
data/train/dr1/fcjf0/Untitled/si1657 OR BORROW SOME MONEY FROM SOMEONE AND GO HOME BY BUS
data/train/dr1/fcjf0/Untitled/si648 A SAILBOAT MAY HAVE A BONE IN HER TEETH ONE MINUTE AND LIE
Could someone point me to code for this? Thanks!
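One possible approach, as a minimal sketch rather than a definitive solution: walk the corpus directory, read each .txt file, strip every character that is not a letter, digit, space, apostrophe, or hyphen, and upper-case the result. The function names `clean_transcript` and `collect_transcripts` are made up for illustration. The sketch also assumes that each TIMIT .txt file starts with two integer sample offsets before the sentence (as in `0 46797 She had ...`) and skips them; if your files do not have these, pass `skip_leading_numbers=False`.

```python
import os
import re

# Matches anything that is NOT a letter, digit, whitespace, apostrophe, or hyphen.
_PUNCT = re.compile(r"[^A-Za-z0-9\s'-]")


def clean_transcript(text, skip_leading_numbers=True):
    """Return the upper-cased sentence with unwanted punctuation removed."""
    tokens = text.split()
    # Assumption: the file begins with two integer sample offsets; drop them.
    if (skip_leading_numbers and len(tokens) >= 2
            and tokens[0].isdigit() and tokens[1].isdigit()):
        tokens = tokens[2:]
    stripped = _PUNCT.sub("", " ".join(tokens))
    # Re-split to collapse any whitespace left behind by removed symbols.
    return " ".join(stripped.upper().split())


def collect_transcripts(root):
    """Return a list of (path_without_extension, cleaned_text) pairs."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if not name.lower().endswith(".txt"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                text = clean_transcript(f.read())
            # Drop the extension and use forward slashes, as in the sample output.
            key = os.path.splitext(path)[0].replace(os.sep, "/")
            entries.append((key, text))
    return entries
```

To produce the listing, loop over `collect_transcripts("data")` (or whatever the corpus root is on your machine) and print or write `key + " " + text` for each pair.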