How do I save an LTImage (i.e., an image) when reading a PDF file with PDFMiner in Python 3?

weixin_38065391 2019-02-20 09:43:56
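I'm parsing a PDF with PDFMiner in Python. One of the layout types is LTFigure, and I already know that LTImage objects can be extracted from an LTFigure. My question: how do I save an LTImage, i.e. the image itself, to a file?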
4 replies
qissme 2020-03-21
How do you extract LTImage objects from an LTFigure?
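For readers with the same question, here is a minimal sketch of one way to collect the images nested inside a figure. It assumes that layout containers such as LTPage and LTFigure can be iterated directly to reach their children (which would also explain the "no attribute objs" error mentioned elsewhere in this thread), and that an LTImage exposes its data through a `stream` attribute. The function name `iter_images` is hypothetical, and the duck-typed check stands in for `isinstance(obj, LTImage)` so that the sketch runs without PDFMiner installed.

```python
def iter_images(layout_obj):
    """Yield every image-like object nested under layout_obj, depth first."""
    # Assumption: an image is any object with a non-None `stream` attribute.
    # With PDFMiner imported, isinstance(layout_obj, LTImage) is the clearer test.
    if getattr(layout_obj, "stream", None) is not None:
        yield layout_obj
        return
    try:
        # Assumption: containers (pages, figures, text boxes) are iterable.
        children = iter(layout_obj)
    except TypeError:
        return  # leaf object such as a single character or a line segment
    for child in children:
        yield from iter_images(child)
```

Called on a page layout, this walks through figures nested at any depth, so images wrapped in several LTFigure levels are still found.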
weixin_38092709 2019-05-24
I've given up. Thanks, everyone.
weixin_38082682 2019-03-20
import os
import sys
from binascii import b2a_hex

from pdfminer.layout import LTFigure, LTImage, LTTextBox, LTTextLine


def parse_lt_objs(lt_objs, page_number, images_folder, text=None):
    """Iterate through the list of LT* objects and capture the text or image data contained in each."""
    text_content = []
    for lt_obj in lt_objs:
        if isinstance(lt_obj, (LTTextBox, LTTextLine)):
            # text
            text_content.append(lt_obj.get_text())
        elif isinstance(lt_obj, LTImage):
            # text_content.append('<img src=tt" />')
            # an image, so save it to the designated folder, and note its place in the text
            saved_file = save_image(lt_obj, page_number, images_folder)
            if saved_file:
                # use html style <img /> tag to mark the position of the image within the text
                text_content.append('<img src="' + os.path.join(images_folder, saved_file) + '" />')
            else:
                print("Error saving image on page", page_number, repr(lt_obj), file=sys.stderr)
        elif isinstance(lt_obj, LTFigure):
            # LTFigure objects are containers for other LT* objects, so recurse through the children
            text_content.append('<Figure src=tt" />')
            # This line raises an error. Do you know why? It says lt_obj has no attribute objs.
            text_content.append(parse_lt_objs(lt_obj.objs, page_number, images_folder, text_content))
    return '\n'.join(text_content)


def save_image(lt_image, page_number, images_folder):
    """Try to save the image data from this LTImage object, and return the file name, if successful."""
    result = None
    if lt_image.stream:
        file_stream = lt_image.stream.get_rawdata()
        file_ext = determine_image_type(file_stream[0:4])
        if file_ext:
            file_name = ''.join([str(page_number), '_', lt_image.name, file_ext])
            if write_file(images_folder, file_name, file_stream, flags='wb'):
                result = file_name
    return result


def determine_image_type(stream_first_4_bytes):
    """Find out the image file type based on the magic number comparison of the first 4 (or 2) bytes."""
    file_type = None
    # b2a_hex returns bytes in Python 3, so compare against bytes literals
    bytes_as_hex = b2a_hex(stream_first_4_bytes)
    if bytes_as_hex.startswith(b'ffd8'):
        file_type = '.jpeg'
    elif bytes_as_hex == b'89504e47':
        file_type = '.png'
    elif bytes_as_hex == b'47494638':
        file_type = '.gif'
    elif bytes_as_hex.startswith(b'424d'):
        file_type = '.bmp'
    return file_type


def write_file(folder, filename, filedata, flags='w'):
    """Write the file data to the folder and filename combination.

    (flags: 'w' for write text, 'wb' for write binary, use 'a' instead of 'w' for append)
    """
    result = False
    if os.path.isdir(folder):
        try:
            with open(os.path.join(folder, filename), flags) as file_obj:
                file_obj.write(filedata)
            result = True
        except IOError:
            pass
    return result

According to the documentation, this should work.
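The magic-number check in the reply above can also be done directly on bytes, without the hex round trip. The following is a minimal, standard-library-only sketch of that idea; the names `MAGIC_NUMBERS`, `guess_extension` and `save_image_bytes` are hypothetical, and `data` is assumed to be whatever `lt_image.stream.get_rawdata()` returned. Note the limitation that comes with this approach: when the raw stream is not itself a complete image file (for example, compressed pixel data rather than an embedded JPEG), no signature matches and nothing is written.

```python
import os

# Leading bytes ("magic numbers") of a few common image file formats.
MAGIC_NUMBERS = (
    (b"\xff\xd8", ".jpg"),
    (b"\x89PNG", ".png"),
    (b"GIF8", ".gif"),
    (b"BM", ".bmp"),
)


def guess_extension(data):
    """Return a file extension for raw image bytes, or None if unrecognized."""
    for magic, ext in MAGIC_NUMBERS:
        if data.startswith(magic):
            return ext
    return None


def save_image_bytes(data, folder, stem):
    """Write data to folder/stem.<ext> and return the path, or None if unrecognized."""
    ext = guess_extension(data)
    if ext is None:
        return None
    os.makedirs(folder, exist_ok=True)  # create the output folder if needed
    path = os.path.join(folder, stem + ext)
    with open(path, "wb") as fh:  # binary mode: the payload is bytes
        fh.write(data)
    return path
```

A call such as `save_image_bytes(raw, "images", "1_Im0")` would mirror the page-number-plus-name scheme used in the reply; the folder and stem here are only illustrative.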
weixin_38077297 2019-03-06
See if this helps: https://www.jianshu.com/p/938763947de3
