Let's dig into the anti-scraping method on Meituan's pages

huerniu 2014-07-14 02:57:58
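I happened to notice what looks like an anti-scraping technique on a Meituan page, so I'm posting it here for everyone to analyze together: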

Page URL: http://hz.meituan.com/deal/15565584.html

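On this page, find the spot where the merchant's address is displayed. It reads as follows:
胤隆汇 | Cumulative merchant rating: 4.8 (1471 reviews) | Address: 江干区新塘路杭海路238号(森禾广场) [View map] [Transit/driving directions] | Phone: 0571-86963688/86963788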

However, if you right-click the page and view the source, the corresponding spot looks like this:
<div id="J-biz-pos" data-poi="[{"shopid":566878,"name":"\u80e4\u9686\u6c47","address":"\u6c5f\u5e72\u533a\u65b0\u5858\u8def\u676d\u6d77\u8def238\u53f7\uff08\u68ee\u79be\u5e7f\u573a\uff09","range":"\u94b1\u6c5f\u65b0\u57ce","rangeid":5455,"disid":58,"disname":"\u6c5f\u5e72\u533a","dpshopid":0,"mapurl":"","trafficinfo":"","phone":"0571-86963688\/86963788","latlng":"[30.253548,120.206872]","city":50,"url":"","poiid":6307803,"poilevel":{"avgscore":"4.8","fbcount":1471},"cityname":"\u676d\u5dde","status":0,"subwayname":"","subwaydis":0,"subwayslug":"","appointmentDay":0}]" data-reservationPhoneNumber="" class="all-biz cf" data-uix="collapse" data-params="{lazyload:true, triggerEvent:'hover', group:'.biz-info', trigger:'.biz-info__title', openClass:'biz-info--open'}"></div>


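From this you can see that the merchant's name, address, phone number and so on have been processed with json_encode(). But why does the page source show what looks like encrypted content, while the page shows plain text again once the HTML has been rendered? Can this be understood as PHP encrypting and JS decrypting? If so, copying or scraping the page directly would not yield the plain text.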

Could the experts here take a look at how exactly this is implemented?

huerniu 2014-07-15
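I took a closer look at the page code: the combo file that the page loads via JS is what does the interpreting. So it comes down to the YUI framework.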
猪崽儿0o0 2014-07-14
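This is just what the data looks like after JSON encoding; run it through a JSON decoder and you get the original text back. In addition, the page uses AJAX to output parts of its data, so it is perfectly normal that a plain scrape does not pick up the corresponding text.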
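As a small illustration of the point above, here is a minimal sketch in Python, where json.loads plays the role that json_decode() would play in PHP. The escaped values are copied from the data-poi attribute quoted in the first post, trimmed to three fields; the variable names are only for illustration.

```python
import json

# Three fields copied from the data-poi value quoted in this thread.
# A raw string keeps the \uXXXX and \/ sequences exactly as they appear in the page.
raw = r'{"name":"\u80e4\u9686\u6c47","cityname":"\u676d\u5dde","phone":"0571-86963688\/86963788"}'

# \uXXXX and \/ are ordinary JSON string escapes, so a standard decoder restores the text.
decoded = json.loads(raw)
print(decoded["name"], decoded["cityname"], decoded["phone"])

# Encoding again with the default ensure_ascii=True reproduces the same \uXXXX escapes,
# which shows that this is an encoding convention rather than encryption.
reencoded = json.dumps({"name": decoded["name"]})
print(reencoded)
```

No key or secret is involved at any step, so anyone who has the page source can recover the text this way.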
xuzuning 2014-07-14
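This has nothing to do with anti-scraping; it has to do with the UI library they use. The value of data-poi is (it is URL-encoded in the source and has already been restored by the time it reaches the JS):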
[{"shopid":566878,"name":"\u80e4\u9686\u6c47","address":"\u6c5f\u5e72\u533a\u65b0\u5858\u8def\u676d\u6d77\u8def238\u53f7\uff08\u68ee\u79be\u5e7f\u573a\uff09","range":"\u94b1\u6c5f\u65b0\u57ce","rangeid":5455,"disid":58,"disname":"\u6c5f\u5e72\u533a","dpshopid":0,"mapurl":"","trafficinfo":"","phone":"0571-86963688\/86963788","latlng":"[30.253548,120.206872]","city":50,"url":"","poiid":6307803,"poilevel":{"avgscore":"4.8","fbcount":1471},"cityname":"\u676d\u5dde","status":0,"subwayname":"","subwaydis":0,"subwayslug":"","appointmentDay":0}]
After JSON decoding it is:
Array
(
    [0] => stdClass Object
        (
            [shopid] => 566878
            [name] => 胤隆汇
            [address] => 江干区新塘路杭海路238号(森禾广场)
            [range] => 钱江新城
            [rangeid] => 5455
            [disid] => 58
            [disname] => 江干区
            [dpshopid] => 0
            [mapurl] => 
            [trafficinfo] => 
            [phone] => 0571-86963688/86963788
            [latlng] => [30.253548,120.206872]
            [city] => 50
            [url] => 
            [poiid] => 6307803
            [poilevel] => stdClass Object
                (
                    [avgscore] => 4.8
                    [fbcount] => 1471
                )

            [cityname] => 杭州
            [status] => 0
            [subwayname] => 
            [subwaydis] => 0
            [subwayslug] => 
            [appointmentDay] => 0
        )

)
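The same decoding can be sketched from a scraper's side. The following is a minimal sketch in Python, not Meituan's or YUI's actual code. It assumes the raw page writes the quotes inside data-poi as &quot; entities, which the thread does not confirm; the SAMPLE markup is an abbreviated, hand-written stand-in for the real page, and PoiExtractor and extract_poi are hypothetical names.

```python
import json
from html.parser import HTMLParser


class PoiExtractor(HTMLParser):
    """Collects the data-poi attribute of the element with id="J-biz-pos"."""

    def __init__(self):
        super().__init__()
        self.poi_json = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id") == "J-biz-pos":
            # HTMLParser has already unescaped entities such as &quot; at this point.
            self.poi_json = attr_map.get("data-poi")


def extract_poi(page_html):
    """Return the decoded list of shops, or an empty list if the element is missing."""
    parser = PoiExtractor()
    parser.feed(page_html)
    if parser.poi_json is None:
        return []
    shops = json.loads(parser.poi_json)
    for shop in shops:
        # latlng is a JSON array stored inside a string, so it needs a second decode.
        if isinstance(shop.get("latlng"), str) and shop["latlng"]:
            shop["latlng"] = json.loads(shop["latlng"])
    return shops


# Abbreviated sample. ASSUMPTION: quotes inside the attribute appear as &quot; in the raw page.
SAMPLE = (
    '<div id="J-biz-pos" data-poi="[{&quot;shopid&quot;:566878,'
    '&quot;name&quot;:&quot;\\u80e4\\u9686\\u6c47&quot;,'
    '&quot;phone&quot;:&quot;0571-86963688\\/86963788&quot;,'
    '&quot;latlng&quot;:&quot;[30.253548,120.206872]&quot;}]" class="all-biz cf"></div>'
)

shops = extract_poi(SAMPLE)
print(shops[0]["name"], shops[0]["phone"], shops[0]["latlng"])
```

Note that latlng arrives as a string that itself contains a JSON array, which is why the sketch decodes it a second time; avgscore is likewise a string ("4.8") in the original data and would need converting before any numeric use.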
