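Web page information extraction
I wrote some Java code to extract information from a website. Why is the first character of each line missing when I read the page source?
For example, <html> is read as html>, and </div> is read as /div>.
The part of the Java code that prints the HTML is: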
URL testURL = new URL(url);
URLConnection connection = testURL.openConnection();
connection.connect();
InputStream urlStream = connection.getInputStream();
BufferedReader urlreader = new BufferedReader(new InputStreamReader(urlStream));
while(urlreader.read() > 0){
String str = urlreader.readLine();
System.out.println(str);
}
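
The likely cause is the loop condition: each call to urlreader.read() consumes one character from the stream, so the following readLine() only returns the rest of that line. A minimal sketch of one way to avoid this is to use readLine() itself as the loop test, since it returns null at end of stream. The class name PageSource, the helper readLines, the placeholder address, and the UTF-8 charset are assumptions for illustration and are not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PageSource {

    // Hypothetical helper: collects every line from the reader
    // without consuming any characters outside readLine().
    static List<String> readLines(Reader in) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        // readLine() returns null at end of stream, so it doubles as the loop test.
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address (assumption); pass the real one as the first argument.
        String url = args.length > 0 ? args[0] : "http://example.com/";
        URLConnection connection = new URL(url).openConnection();
        try (InputStream urlStream = connection.getInputStream()) {
            // UTF-8 is assumed here; the page may declare a different charset.
            Reader source = new InputStreamReader(urlStream, StandardCharsets.UTF_8);
            for (String line : readLines(source)) {
                System.out.println(line);
            }
        }
    }
}
```

With this loop no character is read before readLine(), so a line such as <html> should come back whole. The try-with-resources block also closes the stream when reading finishes.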