PDFminer3k: error when converting PDF to TXT: pdfminer.pdfparser.PDFSyntaxError: Invalid object number

weixin_43350191 2019-07-02 11:39:46

PDFminer3k raises an error when converting a PDF to TXT. Could anyone suggest a solution?

"C:\Program Files\Python37\python.exe" D:/PYTHON/PythonWS/0702/0702.py
WARNING:root:Wrong type: <PDFStream(3): raw=278, {'Type': /Metadata, 'Subtype': /XML, 'Length': 278, 'Filter': /FlateDecode}> required: <class 'dict'>
WARNING:root:Cannot locate objid=221
Mark
Traceback (most recent call last):
  File "C:\Program Files\Python37\lib\site-packages\pdfminer\pdfparser.py", line 377, in _getobj
    obj = objs[i]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/PYTHON/PythonWS/0702/0702.py", line 51, in <module>
    readPDF(path, toPath)
  File "D:/PYTHON/PythonWS/0702/0702.py", line 39, in readPDF
    for page in pdfFile.get_pages():
  File "C:\Program Files\Python37\lib\site-packages\pdfminer\pdfparser.py", line 568, in get_pages
    for (pageid,tree) in search(self.catalog['Pages'], self.catalog):
  File "C:\Program Files\Python37\lib\site-packages\pdfminer\pdfparser.py", line 552, in search
    tree = dict_value(obj, strict=True).copy()
  File "C:\Program Files\Python37\lib\site-packages\pdfminer\pdftypes.py", line 92, in typecheck_value
    x = resolve1(x)
  File "C:\Program Files\Python37\lib\site-packages\pdfminer\pdftypes.py", line 58, in resolve1
    x = x.resolve()
  File "C:\Program Files\Python37\lib\site-packages\pdfminer\pdftypes.py", line 47, in resolve
    return self.doc.getobj(self.objid)
  File "C:\Program Files\Python37\lib\site-packages\pdfminer\pdfparser.py", line 532, in getobj
    result = self._getobj(objid)
  File "C:\Program Files\Python37\lib\site-packages\pdfminer\pdfparser.py", line 379, in _getobj
    raise PDFSyntaxError('Invalid object number: objid=%r' % (objid))
pdfminer.pdfparser.PDFSyntaxError: Invalid object number: objid=2

Process finished with exit code 1
