[Featured question] Using HTMLDocument in JDK 1.4 to parse HTML documents

RedGuest 2004-04-09 05:20:27
How is this API used? Could an expert give me an example?

-------------------------------------------
I connect to port 80 of a web server over a socket and fetch some HTML documents, but I don't know how to use this class to parse them.
5 replies
songbo_pp 2004-04-09
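2. Getting the Links in an HTML Document

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.EditorKit;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// This method takes a URI, which can be either a file (e.g. file:///c:/dir/file.html)
// or a URL (e.g. http://host.com/page.html), and returns all HREF links in the document.
public static String[] getLinks(String uriStr) {
    List result = new ArrayList();

    try {
        // Create a reader on the HTML content (uses the platform default charset)
        URL url = new URI(uriStr).toURL();
        URLConnection conn = url.openConnection();
        Reader rd = new InputStreamReader(conn.getInputStream());

        // Parse the HTML
        EditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Without this property, a <meta> charset declaration makes the parser
        // throw ChangedCharSetException (an IOException) and parsing stops early.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(rd, doc, 0);
        } finally {
            rd.close();
        }

        // Find all the A elements in the HTML document
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
        while (it.isValid()) {
            // getAttributes() is declared to return AttributeSet, so avoid
            // casting to a concrete class such as SimpleAttributeSet
            AttributeSet s = it.getAttributes();
            Object link = (s == null) ? null : s.getAttribute(HTML.Attribute.HREF);
            if (link != null) {
                // Add the link to the result list
                result.add(link.toString());
            }
            it.next();
        }
    } catch (URISyntaxException e) {
        System.err.println("Invalid URI: " + e);
    } catch (BadLocationException e) {
        System.err.println("Could not insert into document: " + e);
    } catch (IOException e) {
        // Also covers MalformedURLException, which is a subclass of IOException
        System.err.println("Could not read HTML: " + e);
    }

    // Return all found links
    return (String[]) result.toArray(new String[result.size()]);
}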
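
Since the original question starts from HTML that has already been fetched over a socket, the same iterator technique can also be applied to a string instead of a URL. The following is only a sketch under assumptions: the class name `LinksFromString` and the method `getLinksFromHtml` are hypothetical, and it uses generics, so it needs Java 5 or later rather than JDK 1.4.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class LinksFromString {

    // Hypothetical helper: returns the HREF values of all <a> elements
    // found in HTML text that is already in memory.
    public static List<String> getLinksFromHtml(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Keep the parser from aborting on a <meta> charset declaration
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException e) {
            throw new IllegalStateException("Could not read HTML", e);
        } catch (BadLocationException e) {
            throw new IllegalStateException("Could not insert into document", e);
        }

        // Walk all leaf elements that carry the A tag and collect their HREFs
        List<String> links = new ArrayList<String>();
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object href = (attrs == null) ? null : attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body><a href=\"a.html\">A</a>"
                + "<p><a href=\"http://host.com/b\">B</a></p></body></html>";
        System.out.println(getLinksFromHtml(html));
    }
}
```

Relative links such as `a.html` come back exactly as written in the page; resolving them against the page address is left to the caller.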


songbo_pp 2004-04-09
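1. Getting the Text in an HTML Document

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLConnection;
import javax.swing.text.BadLocationException;
import javax.swing.text.EditorKit;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// This method takes a URI, which can be either a file (e.g. file:///c:/dir/file.html)
// or a URL (e.g. http://host.com/page.html), and returns all text in the document.
public static String getText(String uriStr) {
    final StringBuffer buf = new StringBuffer(1000);

    try {
        // Create an HTML document that appends all text to buf
        HTMLDocument doc = new HTMLDocument() {
            public HTMLEditorKit.ParserCallback getReader(int pos) {
                return new HTMLEditorKit.ParserCallback() {
                    // This method is called whenever text is encountered in the HTML file
                    public void handleText(char[] data, int pos) {
                        buf.append(data);
                        buf.append('\n');
                    }
                };
            }
        };
        // Without this property, a <meta> charset declaration makes the parser
        // throw ChangedCharSetException (an IOException) and parsing stops early.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        // Create a reader on the HTML content (uses the platform default charset)
        URL url = new URI(uriStr).toURL();
        URLConnection conn = url.openConnection();
        Reader rd = new InputStreamReader(conn.getInputStream());

        // Parse the HTML
        EditorKit kit = new HTMLEditorKit();
        try {
            kit.read(rd, doc, 0);
        } finally {
            rd.close();
        }
    } catch (URISyntaxException e) {
        System.err.println("Invalid URI: " + e);
    } catch (BadLocationException e) {
        System.err.println("Could not insert into document: " + e);
    } catch (IOException e) {
        // Also covers MalformedURLException, which is a subclass of IOException
        System.err.println("Could not read HTML: " + e);
    }

    // Return the text
    return buf.toString();
}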
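
The text callback above can also be driven without subclassing `HTMLDocument`, by handing a `ParserCallback` straight to `javax.swing.text.html.parser.ParserDelegator`. This is a minimal sketch of that variant, not the original poster's method; the class name `TextFromString` and the method `extractText` are hypothetical, and it parses a string that is assumed to be already in memory.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TextFromString {

    // Hypothetical helper: collects every run of text the parser reports,
    // one run per line, mirroring the getText example.
    public static String extractText(String html) {
        final StringBuilder buf = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                buf.append(data).append('\n');
            }
        };

        try {
            // The last argument (ignoreCharSet = true) tells the parser not to
            // abort when it meets a <meta> charset declaration.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        System.out.print(extractText(html));
    }
}
```

Because no document model is built, this route is lighter when only the text is needed; the `HTMLDocument` route is the better fit when the element structure matters afterwards.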

阎罗 2004-04-09
import java.net.*;
import java.io.*;

// Fetches the page at the given address and prints it line by line.
// Note that this only downloads the HTML; it does not parse it.
public class GetHTML {
    public static void main(String args[]) {
        if (args.length < 1) {
            System.out.println("USAGE: java GetHTML httpaddress");
            System.exit(1);
        }
        String sURLAddress = args[0];
        URL url = null;
        try {
            url = new URL(sURLAddress);
        } catch (MalformedURLException e) {
            System.err.println(e.toString());
            System.exit(1);
        }
        try {
            InputStream ins = url.openStream();
            BufferedReader breader = new BufferedReader(new InputStreamReader(ins));
            try {
                String info = breader.readLine();
                while (info != null) {
                    System.out.println(info);
                    info = breader.readLine();
                }
            } finally {
                // Closing the reader also closes the underlying stream
                breader.close();
            }
        } catch (IOException e) {
            System.err.println(e.toString());
            System.exit(1);
        }
    }
}
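
When the page is fetched over a raw socket on port 80, as in the original question, the response also contains the HTTP status line and headers, which have to be removed before the HTML goes to the parser. The sketch below illustrates one way to do that and then read the page title. It rests on several assumptions: the whole response is already held in a string, the body is not chunked or compressed, and the names `RawResponse`, `bodyOf`, and `titleOf` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class RawResponse {

    // Returns everything after the blank line that ends the HTTP headers.
    // If no blank line is found, the input is returned unchanged.
    public static String bodyOf(String rawResponse) {
        int sep = rawResponse.indexOf("\r\n\r\n");
        if (sep >= 0) {
            return rawResponse.substring(sep + 4);
        }
        sep = rawResponse.indexOf("\n\n");
        return (sep >= 0) ? rawResponse.substring(sep + 2) : rawResponse;
    }

    // Returns the text inside <title>, or an empty string if there is none.
    public static String titleOf(String html) {
        final StringBuilder title = new StringBuilder();
        final boolean[] inTitle = { false };

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (HTML.Tag.TITLE.equals(t)) {
                    inTitle[0] = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (HTML.Tag.TITLE.equals(t)) {
                    inTitle[0] = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inTitle[0]) {
                    title.append(data);
                }
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
        return title.toString().trim();
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"
                + "<html><head><title>Hi</title></head><body>x</body></html>";
        System.out.println(titleOf(bodyOf(raw)));
    }
}
```

Using `URL.openStream()` or `URLConnection`, as in the GetHTML program, avoids this step because the headers are consumed by the library.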
zgpp 2004-04-09
Studying the API?
xylohouse 2004-04-09
How do you want to parse it?
