Calling a Scrapy spider repeatedly in a loop raises twisted.internet.error.ReactorNotRestartable

pobaby 2017-07-29 02:08:19

I get the following error:
twisted.internet.error.ReactorNotRestartable

How can I run a spider repeatedly from a scheduler or inside a loop? Thanks, everyone. Waiting online...

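import time

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.cron import CronTrigger
from scrapy.crawler import CrawlerProcess, CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

# myDbConnect, TSpiderC and CSpider come from the poster's own project.


def aqi(crawler, spider):
    try:
        runner = CrawlerRunner(settings)
        db = myDbConnect()
        # Load the crawl rule for this spider from the database.
        spider = db.query(TSpiderC).filter(TSpiderC.uuid == u'1e627cd3c6ee8c540318006de209983b').one()
        # crawler.crawl(CSpider,rule=spider)
        # crawler.start()
        d = runner.crawl(CSpider, rule=spider)
        d = runner.join()
        d.addBoth(lambda _: reactor.stop())
        try:
            # A Twisted reactor cannot be run again once it has been stopped,
            # so a later call here raises ReactorNotRestartable.
            reactor.run()
        except Exception as e:
            print e
    except Exception, e:
        print e,e.message
        pass


if __name__ == '__main__':
    settings = get_project_settings()
    crawler = CrawlerProcess(settings)
    scheduler = BackgroundScheduler()
    # scheduler = TwistedScheduler()
    scheduler.daemonic=False
    # Fire the job every 20 seconds.
    cron = CronTrigger(second='*/20')
    scheduler.add_job(aqi, cron, args=[crawler, None])
    scheduler.start()
    settings = get_project_settings()
    configure_logging(settings)

    # Keep the main thread alive while the background scheduler runs.
    while True:
        time.sleep(1)
        print 'sleep..................'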


4 replies


负刀入梦里 2018-09-11


Did the OP ever solve this? How do you call a spider repeatedly?
I put the cmdline call inside close_spider, and running it that way raises:
raise error.ReactorAlreadyRunning()
twisted.internet.error.ReactorAlreadyRunning
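
One workaround that sidesteps both ReactorAlreadyRunning and ReactorNotRestartable is to launch each crawl in its own child process, so every run gets a fresh reactor and the calling process never starts one. The following is only a minimal sketch (written for Python 3), assuming the code runs from inside a Scrapy project; the helper names build_crawl_command and run_crawl, the spider name "aqi", and the project_dir parameter are made up for illustration and do not come from the thread.

```python
import subprocess
import sys


def build_crawl_command(spider_name, spider_args=None):
    """Build the argv for `python -m scrapy crawl <spider_name> -a key=value ...`."""
    cmd = [sys.executable, "-m", "scrapy", "crawl", spider_name]
    # Each spider argument is passed with Scrapy's `-a NAME=VALUE` option.
    for key, value in (spider_args or {}).items():
        cmd += ["-a", "{}={}".format(key, value)]
    return cmd


def run_crawl(spider_name, spider_args=None, project_dir=None):
    """Run one crawl in a child process and return its exit code.

    The child creates its own reactor and exits when the crawl ends,
    so the parent never has to start (or restart) a reactor itself.
    project_dir is a hypothetical parameter: the directory containing scrapy.cfg.
    """
    completed = subprocess.run(
        build_crawl_command(spider_name, spider_args),
        cwd=project_dir,
    )
    return completed.returncode


# Hypothetical usage from a scheduled job or a plain loop:
#     run_crawl("aqi", {"uuid": "1e627cd3c6ee8c540318006de209983b"})
```

The cost is one interpreter start-up per crawl, which is usually negligible next to the crawl itself. The calling code (a scheduler job, a loop, or a close_spider hook) only waits for the child to exit.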

qq_38891604 2018-02-23


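OP, were you able to set up the environment for the book 《精通python爬虫框架scrapy》? When I use vagrant it reports that there are no resources. Have you run into this problem?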

袁宝东 2017-12-20


import sys
reload(sys)
sys.setdefaultencoding('utf8')
import os
from spider import *
from xml2json import *
from twisted.internet import reactor, defer
# Added explicitly in case the wildcard imports above do not provide them.
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print "usage: python main.py config.xml"
        print "output: config.json"
        exit()
    input_path = sys.argv[1]
    pathDir = os.listdir(input_path)
    dfs = set()  # collects the Deferred of every crawl
    for input_file in pathDir:
        #input_file = sys.argv[1]
        input_file = input_path + '\\' + input_file
        file_name, file_ext = input_file.split('.')
        save_path = 'JSON\\' + input_path
        if not os.path.exists(save_path):
            os.makedirs(save_path)
        output_file = 'JSON\\' + file_name + '.json'
        xml = open(input_file, 'r').read()
        result = Xml2Json(xml).result
        configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
        runner = CrawlerRunner()
        d = runner.crawl(QuotesSpider, output=output_file, **result['config'])
        dfs.add(d)
        #d.addBoth(lambda _: reactor.stop())
    # Stop the reactor only after every crawl has finished.
    defer.DeferredList(dfs).addBoth(lambda _: reactor.stop())
    # Start the reactor once, after the loop.
    reactor.run()

Change it to match the parts marked in red: collect each crawl's Deferred in dfs, stop the reactor through a single DeferredList, and call reactor.run() only once, after the loop.