Problem writing a web crawler in Python (Mac, Python 3.5)

leon_0907 2017-10-10 10:46:17

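I'm learning Python and, as a small exercise, I'm writing a crawler with BeautifulSoup to scrape news from a Tencent site, collecting each item's title, link, and body text. My code and the error it produces are below. I'd appreciate it if someone could take a look.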

import urllib.request as urlrequest
from bs4 import BeautifulSoup

url = "http://games.qq.com/"
content = urlrequest.urlopen(url).read().decode('gbk','ignore').encode('utf-8')
soup = BeautifulSoup(content,'html.parser')

all_top_news = soup.find_all(class_='pic_txt_list t_news_list')

with open('qq_daily_news.txt','w') as outputfile:

    for each_news in all_top_news:
        item_href = each_news.find('a')['href']
        item_name = each_news.find('img')['alt']
        print('{} {}'.format(item_href, item_name))

        try:
            news_content = urlrequest.urlopen(item_href).read().decode('gbk')
            soup2 = BeautifulSoup(news_content,'html.parser')
            content = soup2.find_all(class_= 'Cnt-Main-Article-QQ')
            alltext = content[0].find_all('p')
            text =[]
            for i in alltext:
                text.append(i.get_text())
        except:
            pass

        outputfile.write('\n{}\n{}\n{}'.format(item_href, item_name, text))

The output is as follows:

http://games.qq.com/a/20171010/000046.htm 充满了恋爱气息！王者荣耀花嫁小乔COS
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-31-65b191dc6dab> in <module>()
26 pass
27
---> 28 outputfile.write('\n{}\n{}\n{}'.format(item_href, item_name, text))

NameError: name 'text' is not defined
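
The error above can be reproduced without any network access. The sketch below is only an illustration, not the original crawler: `collect_broken`, `collect_fixed`, and `failing_fetch` are hypothetical names, and `failing_fetch` stands in for whatever step actually failed (the bare `except` hides which one it was). Inside a function the failure surfaces as `UnboundLocalError`, a subclass of `NameError`; at the top level of a notebook cell it is the plain `NameError` shown in the traceback.

```python
def collect_broken(fetch):
    """Mirror the question's structure: `text` is bound only inside `try`."""
    try:
        paragraphs = fetch()        # may raise before `text` is assigned
        text = []
        for p in paragraphs:
            text.append(p)
    except Exception:
        pass                        # the real error is silently swallowed
    return text                     # fails here if fetch() raised


def collect_fixed(fetch):
    """Bind `text` before `try`, so it exists on every code path."""
    text = []
    try:
        for p in fetch():
            text.append(p)
    except Exception as exc:
        # Report the failure instead of hiding it.
        print('skipped: {!r}'.format(exc))
    return text


def failing_fetch():
    """Hypothetical stand-in for a download or parse step that fails."""
    raise IndexError('list index out of range')
```

Calling `collect_broken(failing_fetch)` raises the name error, while `collect_fixed(failing_fetch)` returns an empty list and prints the underlying exception, which makes the real cause visible.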

oyljerry 2017-10-11

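        # Define text before the try block so that it always exists,
        # even when the download or parsing step raises an exception.
        text =[]
        try:
            news_content = urlrequest.urlopen(item_href).read().decode('gbk')
            soup2 = BeautifulSoup(news_content,'html.parser')
            content = soup2.find_all(class_= 'Cnt-Main-Article-QQ')
            alltext = content[0].find_all('p')

            for i in alltext:
                text.append(i.get_text())
        except:
            pass

        # Join the paragraphs into one string instead of writing the list itself.
        outputfile.write('\n{}\n{}\n{}'.format(item_href, item_name, "".join(text)))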
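
Besides moving `text = []` out of the `try` block, the snippet above also writes `"".join(text)` rather than `text`. A small sketch of the difference, using made-up sample paragraphs (the strings and variable names are assumptions for illustration only):

```python
# Sample data standing in for the paragraph texts collected from a page.
paragraphs = ['First paragraph.', 'Second paragraph.']

# Formatting the list directly writes its repr, brackets and quotes included.
as_list = '{}'.format(paragraphs)

# Joining with an empty string concatenates the paragraphs with no separator.
joined = ''.join(paragraphs)

# Joining with a newline keeps one paragraph per line, which may read better.
by_line = '\n'.join(paragraphs)

print(as_list)   # ['First paragraph.', 'Second paragraph.']
print(joined)    # First paragraph.Second paragraph.
print(by_line)
```

Whether to join with `''` or `'\n'` is a matter of taste; the newline variant simply keeps paragraph boundaries visible in the output file.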