Cleaning data from an XML file

weixin_42443454 2018-10-24 12:00:04
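I have an XML file and need to extract all of its fields and values, but XPath has left me completely confused. Could someone point me in the right direction?
The file is as follows: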
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE business:PatentDocumentAndRelated SYSTEM "/DTDS/ExternalStandards/ipphdb-entities.dtd"[]>
<business:PatentDocumentAndRelated xmlns:base="http://www.sipo.gov.cn/XMLSchema/base" xmlns:business="http://www.sipo.gov.cn/XMLSchema/business" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:tbl="http://oasis-open.org/specs/soextblx" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sipo.gov.cn/XMLSchema/business /DTDS/PatentDocument/Elements/OtherElements.xsd" xsdVersion="V2.2.1" file="CN102016000986887CN00001084879860ABIAZH20180904CN00Q.XML" dateProduced="20180830" status="C" lang="zh" country="CN" docNumber="108487986" kind="A" datePublication="20180904">
<business:BibliographicData>
<business:PublicationReference dataFormat="original" sourceDB="national office" sequence="1">
<base:DocumentID>
<base:WIPOST3Code>CN</base:WIPOST3Code>
<base:DocNumber>108487986</base:DocNumber>
<base:Kind>A</base:Kind>
<base:Date>20180904</base:Date>
</base:DocumentID>
</business:PublicationReference>
<business:PublicationReference dataFormat="standard" sequence="1">
<base:DocumentID>
<base:WIPOST3Code>CN</base:WIPOST3Code>
<base:DocNumber>108487986</base:DocNumber>
<base:Kind>A</base:Kind>
<base:Date>20180904</base:Date>
</base:DocumentID>
</business:PublicationReference>
<business:PublicAvailabilityDate>
<business:GazetteReference>
<business:GazetteNumber>34-3601</business:GazetteNumber>
<base:Date>20180904</base:Date>
</business:GazetteReference>
</business:PublicAvailabilityDate>
<business:ApplicationReference applType="10" dataFormat="original" sequence="1" sourceDB="national office">
<base:DocumentID>
<base:WIPOST3Code>CN</base:WIPOST3Code>
<base:DocNumber>201610986887.5</base:DocNumber>
<base:Date>20161110</base:Date>
</base:DocumentID>
</business:ApplicationReference>
<business:ApplicationReference applType="10" dataFormat="standard" sequence="1">
<base:DocumentID>
<base:WIPOST3Code>CN</base:WIPOST3Code>
<base:DocNumber>102016000986887</base:DocNumber>
<base:Date>20161110</base:Date>
</base:DocumentID>
</business:ApplicationReference>
<business:ClassificationIPCRDetails creator="03" processingType="original">
<business:ClassificationIPCR sequence="1">
<business:IPCVersionDate>20060101</business:IPCVersionDate>
<business:Section>F</business:Section>
<business:MainClass>02</business:MainClass>
<business:Subclass>B</business:Subclass>
<business:MainGroup>45</business:MainGroup>
<business:Subgroup>00</business:Subgroup>
<business:GeneratingOffice>
<base:WIPOST3Code>CN</base:WIPOST3Code>
</business:GeneratingOffice>
<business:ClassificationDataSource>H</business:ClassificationDataSource>
<base:Text>F02B 45/00 (2006.01)</base:Text>
</business:ClassificationIPCR>
<business:ClassificationIPCR sequence="2">
<business:IPCVersionDate>20060101</business:IPCVersionDate>
<business:Section>F</business:Section>
<business:MainClass>02</business:MainClass>
<business:Subclass>B</business:Subclass>
<business:MainGroup>63</business:MainGroup>
<business:Subgroup>06</business:Subgroup>
<business:GeneratingOffice>
<base:WIPOST3Code>CN</base:WIPOST3Code>
</business:GeneratingOffice>
<business:ClassificationDataSource>H</business:ClassificationDataSource>
<base:Text>F02B 63/06 (2006.01)</base:Text>
</business:ClassificationIPCR>
</business:ClassificationIPCRDetails>
<business:InventionTitle lang="zh" dataFormat="original" sourceDB="national office" processingType="original" creator="03">一种核内爆式流体活塞二冲程发动机-水泵联合体</business:InventionTitle>
<business:Parties>
<business:ApplicantDetails representative="1">
<business:Applicant sequence="1" appType="applicant-inventor" dataFormat="original" sourceDB="national office" lang="zh" creator="03" processingType="original">
<base:AddressBook lang="zh">
<base:Name>王德斌</base:Name>
<base:Address>
<base:AddressLine>0</base:AddressLine>
<base:AddressMailCode>0</base:AddressMailCode>
<base:PostBox>0</base:PostBox>
<base:AddressRoom>0</base:AddressRoom>
<base:AddressFloor>0</base:AddressFloor>
<base:AddressBuilding>0</base:AddressBuilding>
<base:Street>0</base:Street>
<base:AddressCity>0</base:AddressCity>
<base:County>秦都区</base:County>
<base:City>咸阳市</base:City>
<base:Province>陕西省</base:Province>
<base:PostCode>712021</base:PostCode>
<base:WIPOST3Code>CN</base:WIPOST3Code>
<base:Text>712021 陕西省咸阳市秦都区宝泉路5号电建小区19号楼3-6-西</base:Text>
</base:Address>
</base:AddressBook>
<business:OrganizationCode createDate="00000000" creator="00">0000000000</business:OrganizationCode>
</business:Applicant>
</business:ApplicantDetails>
<business:InventorDetails>
<business:Inventor sequence="1" dataFormat="original" sourceDB="national office" lang="zh" publicationMark="0" creator="03" processingType="original">
<base:AddressBook lang="zh">
<base:Name>王德斌</base:Name>
</base:AddressBook>
</business:Inventor>
</business:InventorDetails>
</business:Parties>
</business:BibliographicData>
<business:Abstract dataFormat="original" lang="zh" sourceDB="national office" processingType="original" creator="03">
<base:Paragraphs num="0001">本发明公开了一种核内爆式流体活塞二冲程发动机‑水泵联合体(简称“核泵”,或“核内爆式水泵”),在地下岩石中开挖出一个空腔作为该发动机的爆发室兼作气缸,从低位容器通过进水管向爆发室注水至所需高度,通过填炮管向爆发室内介质水中安放核爆炸装置至适当位置,关闭进水阀和炮闩,打开排水阀,起爆,介质水汽化产生推力,推动流体活塞位移,将过泵水经排水管注入高位容器,然后关闭排水阀,打开排气阀,继之打开进水阀,直到爆发室内液位再次符合运行要求,完成填炮程序及相应阀体操作后,再次起爆,如此往复循环。</base:Paragraphs>
<business:AbstractFigure>
<base:Figure num="0001">
<base:Image he="528.99" wi="661" file="201610986887.TIF" imgContent="undefined" imgFormat="TIFF" />
</base:Figure>
</business:AbstractFigure>
</business:Abstract>
</business:PatentDocumentAndRelated>
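With this file, XPath usually goes wrong in two places.

1. Every element carries a namespace prefix (`business:`, `base:`). A prefixed XPath step only matches when the parser is namespace-aware and the `XPath` object has a `NamespaceContext` that maps those prefixes to the URIs declared on the root element.
2. The DOCTYPE points at `/DTDS/ExternalStandards/ipphdb-entities.dtd`, which usually does not exist on your machine. The JDK parser may fail while trying to load it.

Below is a minimal sketch that uses only the JDK's `javax.xml` APIs. The class name `PatentXPathDemo`, the map keys and the embedded excerpt are illustrative assumptions. The excerpt is trimmed from the file above and kept ASCII-only so the source compiles under any default encoding. Chinese text nodes are read the same way, for example `//business:InventionTitle` or `//business:Applicant/base:AddressBook/base:Name`.

Caveat: if the real files use entity references declared in that DTD, switching DTD loading off will not be enough. In that case you would probably have to supply a local copy of the DTD, for example through an `EntityResolver`.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PatentXPathDemo {

    // ASCII-only excerpt of the question's file (Chinese text nodes omitted).
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>"
        + "<!DOCTYPE business:PatentDocumentAndRelated SYSTEM \"/DTDS/ExternalStandards/ipphdb-entities.dtd\"[]>"
        + "<business:PatentDocumentAndRelated"
        + " xmlns:base=\"http://www.sipo.gov.cn/XMLSchema/base\""
        + " xmlns:business=\"http://www.sipo.gov.cn/XMLSchema/business\""
        + " docNumber=\"108487986\" kind=\"A\"><business:BibliographicData>"
        + "<business:ApplicationReference applType=\"10\" dataFormat=\"original\"><base:DocumentID>"
        + "<base:DocNumber>201610986887.5</base:DocNumber><base:Date>20161110</base:Date>"
        + "</base:DocumentID></business:ApplicationReference>"
        + "<business:ApplicationReference applType=\"10\" dataFormat=\"standard\"><base:DocumentID>"
        + "<base:DocNumber>102016000986887</base:DocNumber><base:Date>20161110</base:Date>"
        + "</base:DocumentID></business:ApplicationReference>"
        + "<business:ClassificationIPCRDetails>"
        + "<business:ClassificationIPCR sequence=\"1\"><base:Text>F02B 45/00 (2006.01)</base:Text></business:ClassificationIPCR>"
        + "<business:ClassificationIPCR sequence=\"2\"><base:Text>F02B 63/06 (2006.01)</base:Text></business:ClassificationIPCR>"
        + "</business:ClassificationIPCRDetails></business:BibliographicData>"
        + "</business:PatentDocumentAndRelated>";

    // Binds the prefixes used in the XPath strings below to the URIs declared on the root element.
    static final NamespaceContext SIPO_NS = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if ("business".equals(prefix)) return "http://www.sipo.gov.cn/XMLSchema/business";
            if ("base".equals(prefix)) return "http://www.sipo.gov.cn/XMLSchema/base";
            return XMLConstants.NULL_NS_URI;
        }
        @Override
        public String getPrefix(String namespaceURI) { return null; }
        @Override
        public Iterator<String> getPrefixes(String namespaceURI) { return Collections.emptyIterator(); }
    };

    static Document parse(InputSource src) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, prefixed XPath steps never match
            // Do not try to fetch /DTDS/ExternalStandards/ipphdb-entities.dtd
            // (a Xerces feature understood by the JDK's built-in parser).
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            return dbf.newDocumentBuilder().parse(src);
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    static Map<String, String> extract(Document doc) {
        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(SIPO_NS);
        Map<String, String> out = new LinkedHashMap<>();
        try {
            // Attributes of the root element (unprefixed attributes are in no namespace).
            out.put("docNumber", xp.evaluate("/business:PatentDocumentAndRelated/@docNumber", doc));
            out.put("kind", xp.evaluate("/business:PatentDocumentAndRelated/@kind", doc));
            // Two ApplicationReference elements exist; the predicate keeps the "original" one.
            String app = "//business:ApplicationReference[@dataFormat='original']/base:DocumentID/";
            out.put("applicationNumber", xp.evaluate(app + "base:DocNumber", doc));
            out.put("applicationDate", xp.evaluate(app + "base:Date", doc));
            // Repeating element: ask for a NODESET and loop over every match.
            NodeList ipc = (NodeList) xp.evaluate(
                "//business:ClassificationIPCR/base:Text", doc, XPathConstants.NODESET);
            List<String> codes = new ArrayList<>();
            for (int i = 0; i < ipc.getLength(); i++) {
                codes.add(ipc.item(i).getTextContent().trim());
            }
            out.put("ipc", String.join("; ", codes));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // First argument: path to a real .XML file; otherwise the embedded excerpt is used.
        InputSource src = args.length > 0
            ? new InputSource(new File(args[0]).toURI().toString())
            : new InputSource(new StringReader(SAMPLE));
        extract(parse(src)).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

`xp.evaluate(expr, doc)` returns the string value of the first match in document order. That suits single-valued fields. Repeating elements such as `ClassificationIPCR` have to be requested as `XPathConstants.NODESET` and iterated.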
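If "all fields" really means every element and attribute, writing one XPath per field gets tedious. A different approach is to skip XPath for the extraction and walk the DOM recursively instead, recording a path/value pair for:

- every attribute;
- every element that has non-blank direct text.

The paths reuse the prefixes as written in the file and add `[n]` when an element has same-named siblings, so each key reads like an XPath location path. The class and method names (`XmlFieldDump`, `dump`, `step`) are hypothetical, and the embedded excerpt is again an ASCII-only trim of the file above. `xmlns:*` declarations are skipped because they are namespace bookkeeping rather than data. The same DTD caveat applies.

```java
import java.io.File;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlFieldDump {

    // Tiny ASCII-only excerpt of the question's file, used when no path is given.
    static final String SAMPLE =
        "<!DOCTYPE business:PatentDocumentAndRelated SYSTEM \"/DTDS/ExternalStandards/ipphdb-entities.dtd\"[]>"
        + "<business:PatentDocumentAndRelated"
        + " xmlns:base=\"http://www.sipo.gov.cn/XMLSchema/base\""
        + " xmlns:business=\"http://www.sipo.gov.cn/XMLSchema/business\" docNumber=\"108487986\">"
        + "<business:ClassificationIPCRDetails>"
        + "<business:ClassificationIPCR sequence=\"1\"><base:Text>F02B 45/00 (2006.01)</base:Text></business:ClassificationIPCR>"
        + "<business:ClassificationIPCR sequence=\"2\"><base:Text>F02B 63/06 (2006.01)</base:Text></business:ClassificationIPCR>"
        + "</business:ClassificationIPCRDetails></business:PatentDocumentAndRelated>";

    static Element parseRoot(InputSource src) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // Skip the external DTD named in the DOCTYPE.
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            return dbf.newDocumentBuilder().parse(src).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
    }

    /** Collects path -> value for every attribute and every element with direct text. */
    static Map<String, String> dump(Element root) {
        Map<String, String> out = new LinkedHashMap<>();
        walk(root, "/" + root.getNodeName(), out);
        return out;
    }

    private static void walk(Element e, String path, Map<String, String> out) {
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            // Skip xmlns:* declarations: they are namespace bookkeeping, not data.
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) continue;
            out.put(path + "/@" + a.getName(), a.getValue());
        }
        // Direct text of this element (whitespace-only text is ignored).
        StringBuilder text = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            short t = c.getNodeType();
            if (t == Node.TEXT_NODE || t == Node.CDATA_SECTION_NODE) text.append(c.getNodeValue());
        }
        String value = text.toString().trim();
        if (!value.isEmpty()) out.put(path, value);
        // Recurse into child elements.
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                Element child = (Element) c;
                walk(child, path + "/" + step(child), out);
            }
        }
    }

    // Qualified name of the element, plus [n] when it has same-named siblings.
    private static String step(Element e) {
        int index = 0, total = 0;
        for (Node s = e.getParentNode().getFirstChild(); s != null; s = s.getNextSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(e.getNodeName())) {
                total++;
                if (s == e) index = total;
            }
        }
        return total > 1 ? e.getNodeName() + "[" + index + "]" : e.getNodeName();
    }

    public static void main(String[] args) {
        InputSource src = args.length > 0
            ? new InputSource(new File(args[0]).toURI().toString())
            : new InputSource(new StringReader(SAMPLE));
        dump(parseRoot(src)).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

Run against the real file, this should list every populated field, including the Chinese title, names, address and abstract text. Because it emits everything, you can then decide which paths are worth keeping.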
Community: Java 2 Standard Edition