A question about parsing PDF files with PDFBox; could the experts please advise?
刘涛 (username: yetaodiao)
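My PDF file is only moderately large (40 MB), yet parsing it throws an out-of-memory error. I split the PDF into pieces of 1 MB each and load them one at a time in a for loop, but that also runs out of memory. How can I solve this?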
// "file" is the path of the directory that holds the split PDF pieces
File[] files = new File(file).listFiles();
for (File f : files)
{
    System.out.println(f.getName());
    FileInputStream fis = new FileInputStream(f);
    // Load the PDF document into memory
    PDDocument document = PDDocument.load(fis);
    // System.out.println(stripper.getText(document));
    document.close();
    fis.close();
}
Exception in thread "main" org.apache.pdfbox.exceptions.WrappedIOException
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1036)
at com.trends.pdfbox.Test2.geText(Test2.java:33)
at com.trends.pdfbox.Test2.main(Test2.java:13)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:131)
at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:448)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
... 4 more
I set the memory to 2000M and it still runs out of memory in exactly the same way.
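One thing that may be worth ruling out first is whether the 2000M setting actually reached the JVM that runs the test. The sketch below assumes the limit was set with something like -Xmx2000m; the class name HeapCheck and its helper methods are made up for illustration and are not part of PDFBox. It only prints the maximum heap the running JVM reports, using the standard Runtime.maxMemory() call.

```java
// Hypothetical helper (not part of PDFBox): reports the heap limit of the running JVM.
class HeapCheck {

    /** Converts a byte count to whole megabytes (1 MB = 1024 * 1024 bytes). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Maximum amount of heap the running JVM will attempt to use, in MB. */
    static long maxHeapMegabytes() {
        return toMegabytes(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        // Run this with the same launch settings as the failing program.
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

If the printed value is far below 2000 MB when launched the same way as Test2, the option is probably not being applied (for example, it was set in a different run configuration). If it is close to 2000 MB, the setting took effect and the cause lies elsewhere, possibly in how the parser buffers stream data for these particular files.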