Python crawler: JSON parsing problem
I'm a beginner who has recently been learning web scraping with Python. I wrote a simple crawler that fetches data from the China Earthquake Networks Center, building each request URL by concatenating query parameters. When I run it, it keeps raising errors. I've tested the urlencode part and it's fine — the concatenated URL for every page prints correctly. But during data extraction, after the 20 records of a page have been scraped, the errors start, and the for loop never gets through the remaining pages. How can I solve this?
Error output:
Saved to Mongo  (this is printed after the last record of a page is successfully saved to the database)
Error ("'dict' object has no attribute 'loads'",)
Error ("'NoneType' object has no attribute 'loads'",)
…
Full code:
# -*- coding: utf-8 -*-
import requests
import json
from urllib.parse import urlencode
from pymongo import MongoClient
import time

base_url = 'http://www.ceic.ac.cn/ajax/search?'
headers = {
    'Host': 'www.ceic.ac.cn',
    'Referer': 'http://www.ceic.ac.cn/history',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko',
    'X-Requested-With': 'XMLHttpRequest',
}
client = MongoClient()
db = client['EarthQuake']
collection = db['EarthQuake']
max_page = 300

def get_page(page):
    params = {
        "start": "",
        "end": "",
        "jingdu1": "",
        "jingdu2": "",
        "weidu1": "",
        "weidu2": "",
        "height1": "",
        "height2": "",
        "zhenji1": "",
        "zhenji2": "",
        "page": page,
    }
    url = base_url + urlencode(params)
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            length = len(response.text) - 1
            text1 = response.text
            text = text1[1:length]
            return json.loads(text)
    except Exception as e:
        print('Error', e.args)

def parse_page(json):
    if json:
        items = json.get('shuju')
        for item in items:
            #item = item.get('shuju')
            eq = {}
            eq['DEPTH'] = item.get('EPI_DEPTH')
            eq['LAT'] = item.get('EPI_LAT')
            eq['LON'] = item.get('EPI_LON')
            eq['LOCATION'] = item.get('LOCATION_C')
            eq['TIME'] = item.get('O_TIME')
            eq['LEVEL'] = item.get('M')
            yield eq

def save_to_mongo(result):
    if collection.insert(result):
        print('Saved to Mongo')

def savedata(results):
    for result in results:
        print(result)
        save_to_mongo(result)

if __name__ == '__main__':
    for page in range(1, max_page + 1):
        json = get_page(page)
        results = parse_page(json)
        time.sleep(5)
        savedata(results)
How should I handle this situation?
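For context on what the two error messages mean: in the `__main__` loop, the assignment `json = get_page(page)` rebinds the global name `json` — which was the imported `json` module — to whatever `get_page` returned (a dict on success, `None` on failure). On the next iteration, `get_page` then calls `json.loads(...)` on that object instead of the module, which produces exactly the `'dict' object has no attribute 'loads'` / `'NoneType' object has no attribute 'loads'` errors shown. A minimal standalone sketch of the effect (not the crawler itself, just an illustration of the name shadowing):

```python
import json

data = json.loads('{"a": 1}')  # works: the name json refers to the module
json = data                    # rebinds the name json to a plain dict

try:
    json.loads('{"a": 1}')     # now looks up .loads on the dict, not the module
except AttributeError as e:
    print(e)                   # 'dict' object has no attribute 'loads'
```

Renaming the variable (e.g. `data = get_page(page)`) — or renaming the `json` parameter of `parse_page`, which shadows the module in the same way inside that function — avoids the collision.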