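cURL fails to fetch a web page
I'm using cURL to scrape some web pages, but there is one URL whose content I can never retrieve, and I don't know why.
My scraping code: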
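$cookie_jar = 'cookie.tmp';
$response = request("http://www.liebiao.com/luan/yiliao/33836530.html", $cookie_jar, "www.baidu.com");
echo $response;

// Fetch $url with a GET request, storing and resending cookies via $cookie_jar.
function request($url, $cookie_jar, $referer) {
    $ch = curl_init();
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => 0,   // do not include response headers in the output
        CURLOPT_NOBODY         => 0,   // do fetch the response body
        CURLOPT_PORT           => 80,
        CURLOPT_POST           => 0,   // plain GET
        CURLOPT_RETURNTRANSFER => 1,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => 1,   // follow redirects
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1',
        CURLOPT_COOKIEJAR      => $cookie_jar,   // write cookies here when the handle closes
        CURLOPT_COOKIEFILE     => $cookie_jar,   // read cookies from here
        CURLOPT_REFERER        => $referer
    );
    curl_setopt_array($ch, $options);
    $code = curl_exec($ch);   // returns false on failure, so echo prints nothing
    curl_close($ch);
    return $code;
}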
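Since `request()` returns whatever `curl_exec()` gives back, a failed transfer and an empty page look the same when echoed. A minimal sketch for telling them apart follows; `fetch_with_diagnostics` is a hypothetical helper name, not part of the original code, and the timeout values are arbitrary assumptions.

```php
<?php
// Hypothetical helper (not from the original post): fetch a URL and return
// the body together with cURL's error information and the HTTP status code.
function fetch_with_diagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects, as the original code does
        CURLOPT_CONNECTTIMEOUT => 10,    // assumed value, in seconds
        CURLOPT_TIMEOUT        => 30,    // assumed value, in seconds
    ));

    $body = curl_exec($ch);              // false when the transfer itself failed

    // Collect diagnostics; the handle is released when $ch goes out of scope.
    return array(
        'body'   => $body,
        'errno'  => curl_errno($ch),     // 0 means no cURL-level error
        'error'  => curl_error($ch),     // human-readable error message
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if no response arrived
    );
}
```

Calling this with the problem URL and printing `errno`, `error`, and `status` should show whether the request never completed (non-zero `errno`), was rejected by the server (a 4xx or 5xx `status`), or succeeded with a body that simply needs decoding.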
Here is the request recorded by a packet-capture tool:
GET /luan/yiliao/33836530.html HTTP/1.1
Host: www.liebiao.com
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8
Accept-Charset: GBK,utf-8;q=0.7,*;q=0.3
Cookie: defaultcity=2249; _referid=0; Hm_lvt_0a20d90497ff8686d88e96f187962eee=1343870597482,1343875334177,1343887021449; Hm_lpvt_0a20d90497ff8686d88e96f187962eee=1343887868779
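The captured request carries several headers that the PHP code does not send (`Accept`, `Accept-Language`, `Accept-Charset`, `Accept-Encoding`). It is not confirmed that the site checks any of them, but one way to rule this out is to send the same headers from cURL. The sketch below assumes that approach; `browser_like_options` is a hypothetical helper name. Note that it sets `CURLOPT_ENCODING` to an empty string rather than copying `gzip,deflate,sdch` literally, so that cURL only advertises encodings it can decode itself.

```php
<?php
// Hypothetical helper (not from the original post): build a cURL option
// array that mirrors the headers seen in the captured request.
function browser_like_options(string $url, string $cookieJar): array
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1',
        // Empty string: send an Accept-Encoding header listing every encoding
        // this cURL build supports, and decode the response automatically.
        CURLOPT_ENCODING       => '',
        // Headers copied from the captured request.
        CURLOPT_HTTPHEADER     => array(
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: zh-CN,zh;q=0.8',
            'Accept-Charset: GBK,utf-8;q=0.7,*;q=0.3',
            'Cache-Control: max-age=0',
        ),
        CURLOPT_COOKIEJAR      => $cookieJar,  // write cookies here
        CURLOPT_COOKIEFILE     => $cookieJar,  // read cookies from here
    );
}
```

The returned array can be passed straight to `curl_setopt_array()` in place of `$options` in `request()`. If the page then comes back, removing the headers one at a time should show which one the server depends on.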