How to crawl all the reviews of a Taobao product
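I'm working on a project that needs to crawl all the reviews of a single product on Taobao. The project is written in Java and uses the open-source HtmlUnit framework. It turns out that Taobao's pages use AJAX, which I have never worked with before, so I'm at a bit of a loss and would appreciate any help. Here is my code: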
package nankai.SpiderDemo4;
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
public class HtmlUnitSpider {
    private String urlString;

    public HtmlUnitSpider(String urlString) {
        this.urlString = urlString;
    }

    public String getUrlString() {
        return urlString;
    }

    public void setUrlString(String urlString) {
        this.urlString = urlString;
    }

    public void run() {
        // Emulate Firefox and let HtmlUnit resynchronize AJAX calls made by the page.
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_10);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setActiveXNative(false);
        webClient.getOptions().setCssEnabled(true);
        // Keep going when a script on the page throws.
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        try {
            HtmlPage page = webClient.getPage(this.getUrlString());
            // Dump the DOM as HtmlUnit sees it right after loading.
            System.out.println(page.asXml());
        } catch (FailingHttpStatusCodeException e) {
            e.printStackTrace();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Throwable e) {
            // Catch-all for anything else thrown while the page's scripts run.
            e.printStackTrace();
        }
    }
}
The output contains nothing related to the reviews. I've been told this is because Taobao lazy-loads them. How can I get around this?
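One direction I'm considering, as a rough sketch only: if the reviews really are lazy-loaded, they presumably show up in the DOM some time after getPage() returns, so the page could be re-read until a marker appears or a timeout passes. The class name LazyContentPoller, the method pollUntil, and the marker string are all hypothetical names made up for illustration, and the sketch assumes Java 8 or later. It does not depend on HtmlUnit itself.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: polls a snapshot source until it looks "ready".
final class LazyContentPoller {

    /**
     * Reads a snapshot from source every intervalMillis until ready accepts it,
     * or returns Optional.empty() once timeoutMillis has elapsed.
     */
    static Optional<String> pollUntil(Supplier<String> source,
                                      Predicate<String> ready,
                                      long timeoutMillis,
                                      long intervalMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            String snapshot = source.get();
            if (snapshot != null && ready.test(snapshot)) {
                return Optional.of(snapshot);
            }
            if (System.currentTimeMillis() >= deadline) {
                return Optional.empty();
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up early.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }
}
```

In run(), the source could be something like () -> page.asXml() and the predicate a check for whatever element wraps the reviews (I don't know the real element name, so that part is a guess). HtmlUnit's WebClient also has waitForBackgroundJavaScript(long timeoutMillis), which might be worth calling after getPage() before polling. Whether any of this is enough to trigger Taobao's review requests is exactly what I'm unsure about.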