Need expert guidance on a Python problem; looking forward to your answers, many thanks!

handami 2018-05-22 10:31:11
Below is some crawler code. It runs without raising any errors, but it doesn't scrape anything.
# encoding: utf-8
import requests
from lxml import etree

def getNewsURLLIST(baseURL, headers):
    x = requests.get(baseURL, headers)
    x.encoding = "utf-8"
    html = x.content

    selector = etree.HTML(html)
    contents = selector.xpath('//div[@id="content_right"]/div[@class="content_list"]/ul/li[div]')
    for eachlink in contents:
        url = eachlink.xpath('div/a/@href')[0]
        title = eachlink.xpath('div/a/text()')[0]
        ptime = eachlink.xpath('div[@class="dd_time"]/text()')[0]
        yield title, url, ptime

if __name__ == '__main__':
    urltemplate = 'http://www.chinanews.com/scroll-news/{0}/{1}{2}/news.shtml'
    testurl = urltemplate.format('2018', '5', '21')
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
    print(testurl)
    urllist = getNewsURLLIST(testurl, header)
    for title, url, ptime in urllist:
        print(title, url, ptime)
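A minimal sketch of the request step, assuming the two likely issues: the month in the URL needs zero padding (0521, not 521), and requests.get() treats a positional second argument as params rather than headers, so the User-Agent dict should be passed by keyword:

import requests

# Sketch: zero-padded month in the URL, headers passed as a keyword argument.
header = {'User-Agent': 'Mozilla/5.0'}   # shortened UA, just for the sketch
testurl = 'http://www.chinanews.com/scroll-news/2018/0521/news.shtml'
x = requests.get(testurl, headers=header)
x.encoding = 'utf-8'
print(x.status_code, len(x.content))     # quick check that the page actually came back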



7 replies
handami 2018-05-22
Take this version instead; the other one had a few mistakes, thanks.

# encoding: utf-8
import requests
from lxml import etree

def getNewsURLLIST(baseURL, headers):
    x = requests.get(baseURL, headers)
    x.encoding = "utf-8"
    html = x.content
    selector = etree.HTML(html)
    contents = selector.xpath('//div[@id="content_right"]/div[@class="content_list"]/ul/li')
    for eachlink in contents:
        url = eachlink.xpath('/div[@class="dd_bt"/a/@href')[0]
        title = eachlink.xpath('/div[@class="dd_bt"/a/text()')[0]
        ptime = eachlink.xpath('/div[@class="dd_time"]/text()')[0]
        yield title, url, ptime

# def getNewsContent(urlliast):
#     for title, url, ptime in urllist:
#         x = requests.get(url)
#         x.encoding = "utf-8"
#         html = x.contnet
#         selector = etree.HTML(html)
#         contents = selector.xpath('/div[@class="left_zw"]/p/text()')
#         news = '\r\n'.join(contents)
#         yield title, url, ptime, news

if __name__ == '__main__':
    urltemplate = 'http://www.chinanews.com/scroll-news/mil/{0}/{1}{2}/news.shtml'
    testurl = urltemplate.format('2018', '05', '21')
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
    print(testurl)
    # urllist = getNewsURLLIST(testurl, header)
    # for title, url, ptime in urllist:
    #     print(title, url, ptime)
    # newscontents = getNewsContent(urllist)
    # f = open('news.txt', 'w')
    # w = lambda x: f.write(x + u'\r\n')
    # for title, url, ptime, news in newscontents:
    #     w(u'~' * 100)
    #     w(title)
    #     w(url)
    #     w(ptime)
    #     w(news)
    # f.close()
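As for the commented-out getNewsContent part, here is a sketch under the assumption that the article body really sits in div[@class="left_zw"], as the XPath in that block suggests; it fixes the x.contnet typo and uses // so the div is found anywhere in the page:

import requests
from lxml import etree

# Hypothetical sketch of the article-body fetch; left_zw comes from the post above.
def getNewsContent(urllist, headers):
    for title, url, ptime in urllist:
        x = requests.get(url, headers=headers)    # headers passed by keyword
        x.encoding = "utf-8"
        selector = etree.HTML(x.content)          # .content, not .contnet
        paragraphs = selector.xpath('//div[@class="left_zw"]/p/text()')
        news = '\r\n'.join(paragraphs)
        yield title, url, ptime, news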
handami 2018-05-22
The URL works now, but it still doesn't scrape any content.

# encoding: utf-8
import requests
from lxml import etree

def getNewsURLLIST(baseURL, headers):
    x = requests.get(baseURL, headers)
    x.encoding = "utf-8"
    html = x.content
    selector = etree.HTML(html)
    contents = selector.xpath('//div[@id="content_right"]/div[@class="content_list"]/ul/li')
    for eachlink in contents:
        url = eachlink.xpath('/div[@class="dd_lm"/a/@href')[0]
        title = eachlink.xpath('/div[@class="dd_bt"/a/text()')[1]
        ptime = eachlink.xpath('/div[@class="dd_time"]/text()')[2]
        yield title, url, ptime

# def getNewsContent(urlliast):
#     for title, url, ptime in urllist:
#         x = requests.get(url)
#         x.encoding = "utf-8"
#         html = x.contnet
#         selector = etree.HTML(html)
#         contents = selector.xpath('/div[@class="left_zw"]/p/text()')
#         news = '\r\n'.join(contents)
#         yield title, url, ptime, news

if __name__ == '__main__':
    urltemplate = 'http://www.chinanews.com/scroll-news/mil/{0}/{1}{2}/news.shtml'
    testurl = urltemplate.format('2018', '05', '21')
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
    print(testurl)
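The likely culprits in the snippet above are the XPath strings: each attribute predicate is missing its closing ] (lxml rejects that as an invalid expression once the loop body runs), the leading / makes each path absolute instead of relative to the current li, and the [1]/[2] indices expect more matches than a single li provides. A relative-path sketch, reusing the dd_lm / dd_bt / dd_time class names from the post (which div holds the article link is my assumption):

import requests
from lxml import etree

# Relative XPath sketch: closing ] on every predicate, no leading /, index [0] only.
header = {'User-Agent': 'Mozilla/5.0'}   # shortened UA, just for the sketch
x = requests.get('http://www.chinanews.com/scroll-news/2018/0521/news.shtml', headers=header)
x.encoding = 'utf-8'
selector = etree.HTML(x.content)
for eachlink in selector.xpath('//div[@id="content_right"]/div[@class="content_list"]/ul/li[div]'):
    url = eachlink.xpath('div[@class="dd_bt"]/a/@href')[0]     # article link (assumed to be in dd_bt)
    title = eachlink.xpath('div[@class="dd_bt"]/a/text()')[0]  # article title
    ptime = eachlink.xpath('div[@class="dd_time"]/text()')[0]  # publish time
    print(title, url, ptime)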
chuifengde 2018-05-22
Change testurl = urltemplate.format('2018','5','21') to testurl = urltemplate.format('2018','05','21').
oyljerry 2018-05-22
The month needs a leading zero:

urltemplate =  'http://www.chinanews.com/scroll-news/{0}/{1:02d}{2}/news.shtml'
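One caveat with the :02d version: it only formats integers, so the year, month and day have to be passed as numbers rather than the strings used earlier, and the day can be padded the same way:

# {1:02d} zero-pads an integer month; passing the string '5' raises a formatting error.
urltemplate = 'http://www.chinanews.com/scroll-news/{0}/{1:02d}{2:02d}/news.shtml'
print(urltemplate.format(2018, 5, 21))   # -> http://www.chinanews.com/scroll-news/2018/0521/news.shtml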
handami 2018-05-22
Right, it does look like the URL was the problem. But I want to build a URL template so I can fetch the pages for different dates; how do I do that?
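One way to do that (a sketch, not the thread's final code) is to let datetime build the date part, since strftime zero-pads the month and day automatically:

from datetime import date, timedelta

# Sketch: generate the scroll-news URL for a run of consecutive dates.
def news_url(d):
    return d.strftime('http://www.chinanews.com/scroll-news/%Y/%m%d/news.shtml')

start = date(2018, 5, 18)
for offset in range(4):
    print(news_url(start + timedelta(days=offset)))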
天愚 2018-05-22

Your URL is wrong.
chuifengde 2018-05-22
Taking the code from the first reply and changing only the date part, I get the output below:
http://www.chinanews.com/scroll-news/2018/0521/news.shtml
国际 http://www.chinanews.com/world.shtml 5-21 23:58
国际 http://www.chinanews.com/world.shtml 5-21 23:56
国际 http://www.chinanews.com/world.shtml 5-21 23:52
港澳 http://www.chinanews.com/compatriot.shtml 5-21 23:51
港澳 http://www.chinanews.com/compatriot.shtml 5-21 23:51
社会 http://www.chinanews.com/society.shtml 5-21 23:01
国际 http://www.chinanews.com/world.shtml 5-21 23:00
社会 http://www.chinanews.com/society.shtml 5-21 22:50
财经 http://finance.chinanews.com/economic.shtml 5-21 22:45
国内 http://www.chinanews.com/china.shtml 5-21 22:33
国内 http://www.chinanews.com/china.shtml 5-21 22:30
体育 http://www.chinanews.com/sports.shtml 5-21 22:25
文化 http://www.chinanews.com/wenhua.shtml 5-21 22:23
财经 http://finance.chinanews.com/economic.shtml 5-21 22:02
财经 http://finance.chinanews.com/economic.shtml 5-21 22:01
