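When I extract Chinese text with Aspose.Pdf.dll or PDFBox, the output is garbled. I understand the cause: it's an encoding issue. The fonts in this PDF use Identity-H encoding, whereas PDFs generated with Foxit and similar tools use ANSI-style encodings and extract without any problem.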
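To make the cause concrete: under Identity-H, the bytes of each text string are read two at a time as big-endian codes that are used directly as CIDs (in embedded subset fonts these are often glyph indices), so on their own they carry no Unicode meaning. An extractor can only turn them back into characters through the font's ToUnicode CMap. Below is a minimal, library-free C# sketch of that mapping step. The glyph codes and the dictionary standing in for a parsed ToUnicode CMap are hypothetical, and the sketch substitutes U+FFFD for unmapped codes; real extractors may emit arbitrary characters instead, which is what shows up as garbled text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Identity-H: read the raw string bytes in pairs (big-endian); each 2-byte
// code is taken as-is as a CID, not as a Unicode code point.
static List<int> DecodeIdentityH(byte[] raw)
{
    var cids = new List<int>();
    for (int i = 0; i + 1 < raw.Length; i += 2)
        cids.Add((raw[i] << 8) | raw[i + 1]);
    return cids;
}

// Recovering text requires a CID -> Unicode table. This dictionary is a
// hypothetical stand-in for the bfchar entries of a ToUnicode CMap.
static string MapToUnicode(List<int> cids, IReadOnlyDictionary<int, string> toUnicode)
{
    var sb = new StringBuilder();
    foreach (int cid in cids)
        sb.Append(toUnicode.TryGetValue(cid, out var s) ? s : "\uFFFD"); // unmapped code
    return sb.ToString();
}

// Hypothetical glyph codes for "中" and "文" in some embedded subset font.
byte[] raw = { 0x01, 0x2C, 0x03, 0xE8 };   // codes 0x012C and 0x03E8
var cids = DecodeIdentityH(raw);           // [300, 1000]

var withCMap = new Dictionary<int, string> { [300] = "中", [1000] = "文" };
string recovered = MapToUnicode(cids, withCMap);                     // "中文"
string garbled = MapToUnicode(cids, new Dictionary<int, string>());  // "\uFFFD\uFFFD"

Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine(recovered);
Console.WriteLine(garbled);
```

With an empty table every code falls back to U+FFFD, mirroring what happens when the PDF provides no usable ToUnicode CMap for its Identity-H fonts.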
PDFBox code:
OpenFileDialog open = new OpenFileDialog();
open.Title = "Select the PDF file to import";
open.Filter = "PDF files (*.pdf)|*.pdf";
if (open.ShowDialog() != DialogResult.OK)
    return;
string fileName = open.FileName;

// Load the document and extract the text of all pages.
PDDocument doc = PDDocument.load(fileName);
try
{
    PDFTextStripper pdfStripper = new PDFTextStripper();
    string str = pdfStripper.getText(doc);
    textBox1.Text = str;
}
finally
{
    doc.close(); // release the file handle held by PDFBox
}
Aspose.Pdf code:
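// fileDialog is assumed to be an OpenFileDialog defined on the form.
if (fileDialog.ShowDialog() != DialogResult.OK)
    return;
string file = fileDialog.FileName;

Document d = new Document(file);
TextAbsorber txt = new TextAbsorber();
// Aspose.Pdf page indices start at 1, so this extracts only the first page.
d.Pages[1].Accept(txt);
textBox1.Text = txt.Text;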
How can I fix this? A solution with either library is fine. A sample file is attached.