Selenium WebDriver cannot get the full content of a web page

iwsyang 2018-01-30 05:54:30

https://www.forrent.com/apartment-community-profile/1012635

I am trying to parse a web page such as the one above. Selenium returns some of the page's content, but not all of it. For example, "Professionally Managed by: B & A Associates" appears on the page, but it is not in the variable 'content' in the script below. Any idea why that is, and how to solve it?



from time import sleep
from geocoder_helpers import normalized

import os
import urllib2
from bs4 import BeautifulSoup
import json

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from pyvirtualdisplay import Display

display = Display(visible=0, size=(800, 600))
display.start()

url = 'https://www.forrent.com/apartment-community-profile/1012635'
driver = webdriver.Firefox(executable_path='/home/yliu/repos/funnel_objects/listing_sites/geckodriver')
try:
    driver.set_page_load_timeout(20)
    driver.get(url)

    #WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, "contactHeading")))
    WebDriverWait(driver, 40)
    html = driver.page_source
    content = BeautifulSoup(html,"lxml")
    driver.quit()
    return content
except TimeoutException:
    print('time out from contact')
    return None
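
One thing worth noting about the script above: `WebDriverWait(driver, 40)` on its own only constructs a wait object and returns immediately; nothing is polled until `.until(...)` is called. The sketch below shows one possible way to wait until a known piece of text is present before reading `page_source`. The helper names `text_in_page` and `fetch_rendered_html` are hypothetical, and the sketch assumes the missing text is rendered into the top-level document (not an iframe) by JavaScript.

```python
def text_in_page(text):
    """Build a wait condition that is true once `text` appears in the DOM."""
    def condition(driver):
        # page_source reflects the DOM at the moment it is read.
        return text in driver.page_source
    return condition


def fetch_rendered_html(driver, url, text, timeout=40):
    """Load `url`; return page_source once `text` is present, else None."""
    # Imported here so text_in_page() can be used without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    try:
        # until() is what actually polls the condition, up to `timeout` seconds.
        WebDriverWait(driver, timeout).until(text_in_page(text))
    except TimeoutException:
        return None
    return driver.page_source
```

A call might look like `fetch_rendered_html(driver, url, "Professionally Managed by")`. Searching for a fragment without `&` is deliberate, since the serialized source may contain `&amp;` where the rendered page shows `&`.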

Scala没有静态 2018-09-13

Same here: I can open the browser, but the source I get back is not the version after JavaScript rendering.

CHn_Lef 2018-08-09

I can't open this page, but it is probably an iframe issue. Switching into that frame should fix it. For example, take this one:

<iframe id="jerichotabiframe_0" class="jerichotab" name="jerichotabiframe_0" src="/portal/a/index/welcome" frameborder="0" scrolling="yes" style="width: 1123px; height: 579px; border: 0px;"></iframe>

Use this statement to switch into it:

driver.switch_to.frame('jerichotabiframe_0')

If the switch sometimes fails, switch back to the default content first:

driver.switch_to.default_content()

Alternatively, work out which frame to switch to from the page source.
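
As a rough illustration of the iframe suggestion, the sketch below gathers the source of the top-level document and of every `<iframe>` on it, so the missing text can be searched for in each. The function name `collect_frame_sources` is hypothetical, and it only handles one level of frames (nested iframes are not followed).

```python
def collect_frame_sources(driver):
    """Return [top-level source, iframe 1 source, iframe 2 source, ...]."""
    driver.switch_to.default_content()
    sources = [driver.page_source]
    # "tag name" is the locator strategy string behind By.TAG_NAME.
    frames = driver.find_elements("tag name", "iframe")
    for frame in frames:
        # switch_to.frame accepts a name, an index, or a WebElement.
        driver.switch_to.frame(frame)
        sources.append(driver.page_source)
        # Always return to the top-level document before the next switch.
        driver.switch_to.default_content()
    return sources
```

Checking `any("Professionally Managed by" in s for s in collect_frame_sources(driver))` would then show whether the text lives in a frame at all.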

iwsyang 2018-01-31

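Quoting oyljerry's reply (#1):

"Could it be that the page has AJAX content that hasn't finished loading?"

I waited 40 seconds and it still hadn't finished loading. Is that really possible?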

oyljerry 2018-01-31

Quoting iwsyang's reply (#2):

"I waited 40 seconds and it still hadn't finished loading. Is that really possible?"

It could also be a network problem, with part of the content failing to load.

虾米馅煎包 2018-01-31

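If neither network speed nor your code is the problem, note that some pages only load their content once you drag the scrollbar. You could try having the browser driver do the scrolling in your crawler.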
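
The scrolling idea could be sketched as follows: scroll to the bottom repeatedly until the document height stops growing, then read `page_source`. The function name `scroll_to_bottom` and the `pause` and `max_rounds` defaults are assumptions for illustration, and whether this particular page lazy-loads on scroll has not been verified.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops changing; return the final height."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give lazy-loaded content a moment to arrive.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended after the last scroll
        last_height = new_height
    return last_height
```

After `scroll_to_bottom(driver)` returns, `driver.page_source` should include whatever content the scrolling triggered.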

oyljerry 2018-01-30

Could it be that the page has AJAX content that hasn't finished loading?
