re fails to extract any data, help needed

weixin_46609022 2020-07-08 12:27:28
The relevant HTML I extracted from the page:
<tr onclick="location.href='/city/sz.html';" style="cursor: pointer;">
<th>1</th>
<th>
<a href="/city/sz.html" title="深圳房价行情,房价概况走势,数据分析"> 深圳</a>
</th>
<th>74,929</th>
<th class="red">+18.96%</th>
<th class="red">+2.86%</th>
</tr>, <tr onclick="location.href='/city/bj.html';" style="cursor: pointer;">
<th>2</th>
<th>
<a href="/city/bj.html" title="北京房价行情,房价概况走势,数据分析"> 北京</a>
</th>
<th>62,567</th>
<th class="green">-2.09%</th>
<th class="green">-4.76%</th>
</tr>, <tr onclick="location.href='/city/sh.html';" style="cursor: pointer;">
...... (the remaining rows follow the same pattern)

My code:
import requests
from tool import useragenttool
import bs4
import re
import openpyxl

def open_url(url):
    """Request the URL and return the response containing the page source."""
    res = requests.get(url, headers=useragenttool.get_headers())
    return res

def find_data(res):
    datas = []
    soup = bs4.BeautifulSoup(res.text, "html.parser")
    content = soup.find(class_="gb-dataListBox")
    # print(content)
    target = content.find_all("tr", style="cursor: pointer;")
    # print(target)
    target = iter(target)

    for each in target:
        # print(each.text)
        if each.text.isnumeric():
            datas.append([
                re.search(r'(.+)', next(target).text).group(1),
                re.search(r'\d.*', next(target).text).group(),
                re.search(r'\d.*', next(target).text).group(),
                re.search(r'\d.*', next(target).text).group()])
    print(datas)

    return datas


def main():
    url = "https://www.creprice.cn/rank/cityforsale.html"
    res = open_url(url)
    datas = find_data(res)


if __name__ == '__main__':
    main()

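Why is the datas list empty when print(datas) runs? I want to scrape the city, the house price, and the two percentages that follow it. I'm a beginner and can't figure this out, so any help would be appreciated.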
3 replies
AutumnSea03 2020-07-28
Could this thread be marked as resolved?
weixin_46609022 2020-07-10
Thank you so much!!
AutumnSea03 2020-07-09
import requests
import bs4
import re


def find_data():
    head = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        'Connection': 'keep-alive'}
    res = requests.get('https://www.creprice.cn/rank/cityforsale.html', headers=head)
    content = bs4.BeautifulSoup(res.text, "html.parser").find(class_="gb-dataListBox")
    # Each data row is a <tr> that carries this inline style.
    target = content.find_all("tr", style="cursor: pointer;")
    info_list = []
    for each in target:
        tmp_dic = dict()
        # City name: the first run of characters outside the Latin-1 range.
        city = re.search(r'[^\x00-\xff]+', each.text).group()
        # Price with a thousands separator, e.g. 74,929.
        price = re.search(r'\d+,\d+', each.text).group()
        # The two signed percentages, one per line of the row text.
        rate = re.findall(r'[+-]\d+.*%', each.text)
        # Keep the percentages in page order so they match the output below.
        tmp_dic[city] = [price, rate[0], rate[1]]
        info_list.append(tmp_dic)
    print(info_list)

if __name__ == '__main__':
    find_data()

Output:
[{'深圳': ['74,929', '+18.96%', '+2.86%']}, {'北京': ['62,567', '-2.09%', '-4.76%']}, {'上海': ['54,911', '+5.85%', '-0.25%']}, {'厦门': ['47,817', '+5.66%', '+0.27%']}, {'三亚': ['38,291', '+12.01%', '+3.72%']}, {'广州': ['35,934', '+6.13%', '+5.43%']}, {'杭州': ['31,487', '+4.1%', '+3.1%']}, {'南京': ['31,416', '+2.87%', '-0.24%']}, {'福州': ['26,288', '+0.55%', '+1.78%']}, {'天津': ['25,751', '+0.14%', '+1.4%']}, {'宁波': ['23,544', '+15.65%', '+0.5%']}, {'珠海': ['23,473', '+1.43%', '-0.37%']}, {'苏州': ['23,294', '+6.32%', '-1.96%']}, {'青岛': ['21,890', '+1.65%', '+0.76%']}, {'温州': ['21,777', '+7.11%', '-1.31%']}, {'丽水': ['19,428', '+7.9%', '-2.74%']}, {'武汉': ['18,942', '+4.89%', '+0.3%']}, {'东莞': ['17,921', '+11.79%', '+0.86%']}, {'金华': ['17,279', '+5.54%', '-0.69%']}, {'成都': ['16,726', '+7.34%', '+3.11%']}, {'无锡': ['16,675', '+12.46%', '+0.13%']}, {'合肥': ['16,500', '+4.93%', '-0.73%']},...., {'鹤岗': ['2,307', '-2.19%', '-2.92%']}]
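
As for why the original datas list came out empty: each item in target is a whole <tr>, so each.text is the text of the entire row (rank, city, price, and both percentages, separated by newlines). isnumeric() on that string is always False, so the append never runs. The minimal sketch below shows this with the standard library only. row_text is an assumption: it is reconstructed by hand from the first row of the HTML posted in the question, not fetched from the site, and split_row is a hypothetical helper name.

```python
import re

# Assumed value of each.text for the first <tr> in the question's HTML,
# reconstructed by hand (rank, city, price, two percentages).
row_text = "\n1\n\n 深圳\n\n74,929\n+18.96%\n+2.86%\n"

# The row text mixes digits with newlines, a city name, commas and percent
# signs, so isnumeric() is False and the original if-branch is never taken.
print(row_text.isnumeric())


def split_row(text):
    """Hypothetical helper: return the non-empty, stripped lines of a row."""
    return [part.strip() for part in text.splitlines() if part.strip()]


# Splitting the row into its cells gives one value per column.
rank, city, price, change_1, change_2 = split_row(row_text)
print(rank, city, price, change_1, change_2)

# The price can be turned into an integer once the comma is removed.
price_value = int(price.replace(",", ""))

# A stricter pattern for the percentages than '.*': optional sign, digits,
# optional decimal part, then a percent sign.
rates = re.findall(r'[+-]?\d+(?:\.\d+)?%', row_text)
print(price_value, rates)
```

With BeautifulSoup itself, iterating over each.stripped_strings or over each.find_all("th") should give the same per-cell values without any regular expressions.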
