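I'm learning Python and, as a small exercise, I'm writing a crawler with BeautifulSoup to scrape news from the Tencent games site, collecting each item's title, link, and body text. My code and the error it produces are below. I'd appreciate it if someone could take a look.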
import urllib.request as urlrequest
from bs4 import BeautifulSoup

url = "http://games.qq.com/"
content = urlrequest.urlopen(url).read().decode('gbk', 'ignore').encode('utf-8')
soup = BeautifulSoup(content, 'html.parser')
all_top_news = soup.find_all(class_='pic_txt_list t_news_list')
with open('qq_daily_news.txt', 'w') as outputfile:
    for each_news in all_top_news:
        item_href = each_news.find('a')['href']
        item_name = each_news.find('img')['alt']
        print('{} {}'.format(item_href, item_name))
        try:
            news_content = urlrequest.urlopen(item_href).read().decode('gbk')
            soup2 = BeautifulSoup(news_content, 'html.parser')
            content = soup2.find_all(class_='Cnt-Main-Article-QQ')
            alltext = content[0].find_all('p')
            text = []
            for i in alltext:
                text.append(i.get_text())
        except:
            pass

        outputfile.write('\n{}\n{}\n{}'.format(item_href, item_name, text))
The output is as follows:
http://games.qq.com/a/20171010/000046.htm 充满了恋爱气息!王者荣耀花嫁小乔COS
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-31-65b191dc6dab> in <module>()
26 pass
27
---> 28 outputfile.write('\n{}\n{}\n{}'.format(item_href, item_name, text))
NameError: name 'text' is not defined
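
A likely reading of the traceback: something inside the try block raised before the line `text = []` was reached (for example, `decode('gbk')` hitting a byte it cannot decode, or `content[0]` on an empty `find_all` result). The bare `except: pass` then hid that error, so on the first article `text` had never been assigned by the time `outputfile.write` ran. The sketch below shows one way to avoid this, under assumptions: `collect_text`, `failing_fetch`, and `working_fetch` are hypothetical names made up for illustration, and `fetch_paragraphs` stands in for the urlopen plus BeautifulSoup steps so the example runs without network access.

```python
def collect_text(fetch_paragraphs, url):
    """Return the paragraphs for url, or an empty list if fetching fails.

    fetch_paragraphs is a hypothetical stand-in for the
    urlopen + BeautifulSoup steps in the original loop.
    """
    text = []  # bound BEFORE the try, so it exists even if the try fails early
    try:
        for paragraph in fetch_paragraphs(url):
            text.append(paragraph)
    except (OSError, UnicodeDecodeError, IndexError) as error:
        # Catch only the failures we expect, and report them instead of
        # silently passing, so the real cause stays visible.
        print('skipping {}: {!r}'.format(url, error))
    return text


def failing_fetch(url):
    # Simulates content[0] on an empty find_all() result.
    raise IndexError('list index out of range')


def working_fetch(url):
    # Simulates a page whose article paragraphs were found.
    return ['first paragraph', 'second paragraph']


if __name__ == '__main__':
    print(collect_text(failing_fetch, 'http://example.invalid/a.htm'))
    print(collect_text(working_fetch, 'http://example.invalid/b.htm'))
```

In the original loop the same idea would mean moving `text = []` above the `try:` line and replacing the bare `except:` with specific exception types. Since `text` is a list, writing `'\n'.join(text)` instead of `text` would put the paragraphs in the file as plain lines rather than as a Python list literal.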