zou@zou-VirtualBox:~/qsbk$ tree
.
├── qsbk
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── qsbk_spider.py
└── scrapy.cfg
-------------------------
vi items.py

from scrapy.item import Item, Field

class TutorialItem(Item):
    # define the fields for your item here like:
    # name = Field()
    pass

class Qsbk(Item):
    title = Field()
    link = Field()
    desc = Field()
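Scrapy Items behave like dicts whose allowed keys are exactly the declared Field attributes. The sketch below illustrates that pattern in plain Python 3 (stdlib only); it is NOT the real scrapy.item implementation, just the idea behind it:

```python
# Illustrative sketch of the scrapy.item pattern: an Item is a dict
# that only accepts keys declared as Field class attributes.
# This is an assumption-level sketch, not scrapy's actual code.

class Field(dict):
    """Per-field metadata container (empty here)."""

class ItemMeta(type):
    def __new__(mcs, name, bases, attrs):
        # collect the names declared as Field on this class
        declared = {k for k, v in attrs.items() if isinstance(v, Field)}
        cls = super().__new__(mcs, name, bases, attrs)
        inherited = getattr(cls, 'fields', set())
        cls.fields = set(inherited) | declared
        return cls

class Item(dict, metaclass=ItemMeta):
    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError('%r is not a declared field' % key)
        super().__setitem__(key, value)

class Qsbk(Item):
    title = Field()
    link = Field()
    desc = Field()

item = Qsbk()
item['title'] = 'a joke title'   # OK: declared field
# item['oops'] = 1               # would raise KeyError: undeclared field
```

This is why assigning to an undeclared key on a real scrapy Item raises an error instead of silently creating it.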
-----------------------
vi qsbk/spiders/qsbk_spider.py

# Note: scrapy.spider is deprecated in Scrapy 1.0 in favour of scrapy.spiders;
# the shell session below prints the corresponding deprecation warning.
from scrapy.spider import Spider

class QsbkSpider(Spider):
    name = "qsbk"
    allowed_domains = ["qiushibaike.com"]
    start_urls = ["http://www.qiushibaike.com"]

    def parse(self, response):
        # Save the downloaded page body to a local file.
        # (The original `filename = response` was a bug: a Response
        # object is not a file path.)
        filename = "qsbk.html"
        with open(filename, 'wb') as f:
            f.write(response.body)
zou@zou-VirtualBox:~/qsbk$ scrapy shell http://www.qiushibaike.com
/home/zou/qsbk/qsbk/spiders/qsbk_spider.py:1: ScrapyDeprecationWarning: Module `scrapy.spider` is deprecated, use `scrapy.spiders` instead
from scrapy.spider import Spider
2015-12-21 00:18:30 [scrapy] INFO: Scrapy 1.0.3 started (bot: qsbk)
2015-12-21 00:18:30 [scrapy] INFO: Optional features available: ssl, http11
2015-12-21 00:18:30 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'qsbk.spiders', 'SPIDER_MODULES': ['qsbk.spiders'], 'LOGSTATS_INTERVAL': 0, 'BOT_NAME': 'qsbk'}
2015-12-21 00:18:30 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
2015-12-21 00:18:30 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-12-21 00:18:30 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-12-21 00:18:30 [scrapy] INFO: Enabled item pipelines:
2015-12-21 00:18:30 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-12-21 00:18:30 [scrapy] INFO: Spider opened
2015-12-21 00:18:30 [scrapy] DEBUG: Retrying <GET http://www.qiushibaike.com> (failed 1 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
2015-12-21 00:18:30 [scrapy] DEBUG: Retrying <GET http://www.qiushibaike.com> (failed 2 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
2015-12-21 00:18:30 [scrapy] DEBUG: Gave up retrying <GET http://www.qiushibaike.com> (failed 3 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/shell.py", line 63, in run
    shell.start(url=url)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/shell.py", line 44, in start
    self.fetch(url, spider)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/shell.py", line 87, in fetch
    reactor, self._schedule, request, spider)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionDone'>>]
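ResponseNeverReceived with ConnectionDone means the server closed the connection before sending any response, and the log shows three attempts because RetryMiddleware defaults to two retries after the initial request. With qiushibaike.com a common cause is the site dropping requests that carry Scrapy's default User-Agent; that diagnosis is an assumption here, not something the log itself proves. A typical workaround is to set a browser-like User-Agent in qsbk/settings.py:

```python
# qsbk/settings.py -- workaround sketch, assuming the site rejects
# Scrapy's default User-Agent (not confirmed by the log above).
USER_AGENT = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36')

# For reference: the three failed attempts in the log match Scrapy's
# default of RETRY_TIMES = 2 (initial request + 2 retries).
RETRY_TIMES = 2
```

After setting this, re-running `scrapy shell http://www.qiushibaike.com` is the quickest way to check whether the User-Agent was in fact the problem.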