Convert PDF to Image and PDF to Text in C#

PDF is increasingly popular in real-world business. A PDF page can hold text, images, tables, lists, and nearly any other kind of content. Because recipients cannot easily modify it, PDF is often used for invoices and other important documents.

However, there is often a need to get content out of a PDF, or to convert a PDF document to other formats such as images, Office Word, or Office Excel documents. iDiTect provides a professional PDF editing and processing library and SDK to help C# and VB.NET developers solve this problem. This C# component supports converting PDF to images and extracting text from PDF.

To convert PDF documents to image files in C# .NET, you can use the iDiTect.Converter tool in any WinForms, console, or ASP.NET web project. It converts a PDF page to a raster image format such as JPG/JPEG, PNG, TIF/TIFF, BMP, or GIF. The DPI of the converted image can be customized as needed; a higher DPI produces a higher-quality image. To reduce the size of the output JPG, you can compress it using the API built into the library.

using System.Drawing;          // Image
using System.Drawing.Imaging;  // ImageFormat
using System.IO;               // File

PdfToImageConverter converter = new PdfToImageConverter();

converter.Load(File.ReadAllBytes("sample.pdf"));

// Default is 72. The higher the DPI, the larger the output image.
converter.DPI = 96;
// Optional: the value must be between 1 and 100. At 100 the converted image keeps
// the original quality and takes less time and memory. At 1 the converted image
// is compressed to the minimum size, which takes more time and memory.
//converter.CompressedRatio = 80;

for (int i = 0; i < converter.PageCount; i++)
{
    // The converted image keeps the original size of the PDF page.
    Image pageImage = converter.PageToImage(i);
    // To specify the converted image's width and height instead:
    //Image pageImage = converter.PageToImage(i, 100, 150);
    // Save the Image object to a local file as JPEG, TIFF, or PNG,
    // or keep it in memory for further use.
    pageImage.Save(i.ToString() + ".jpg", ImageFormat.Jpeg);
}
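
The DPI setting determines the pixel dimensions of each output image. PDF page sizes are measured in points, with 72 points per inch by default, so at 72 DPI one point maps to one pixel. The following is a minimal sketch of that arithmetic, assuming the converter scales pages in this usual way. `PdfPageSizing` and `PointsToPixels` are hypothetical helper names used for illustration and are not part of the iDiTect API.

```csharp
using System;

public static class PdfPageSizing
{
    // Default PDF user space: 72 points per inch.
    public const double PointsPerInch = 72.0;

    // Converts a page dimension in points to pixels at the given DPI,
    // rounded to the nearest whole pixel.
    public static int PointsToPixels(double points, double dpi)
    {
        if (points < 0)
            throw new ArgumentOutOfRangeException(nameof(points));
        if (dpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(dpi));

        return (int)Math.Round(points * dpi / PointsPerInch, MidpointRounding.AwayFromZero);
    }
}
```

For example, a US Letter page (612 x 792 points) rendered at 96 DPI would come out at 816 x 1056 pixels, and at 300 DPI at 2550 x 3300 pixels. This can help you choose a DPI before converting a large document.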

 

In addition, C# developers can convert an entire PDF document to a single multi-page TIFF image. The converted multi-page TIFF keeps the document structure of the original PDF file and contains the same number of pages.

PdfToImageConverter converter = new PdfToImageConverter();

// Default is 72. The higher the DPI, the larger the output image.
converter.DPI = 96;

using (Stream stream = File.OpenRead("sample.pdf"))
{
    converter.Load(stream);
    // Save the PDF as a multi-page TIFF file on disk.
    converter.DocumentToMultiPageTiff("convert.tiff");
    // Or keep the multi-page TIFF in memory for further use.
    //Image multipageTif = converter.DocumentToMultiPageTiff();
}

To extract text from PDF in C# and VB.NET, developers can use the iDiTect.Converter .NET toolkit, whose output text keeps the layout of the PDF page. Text in the PDF header and footer is recognized, and text in tables, lists, paragraphs, and other sections is also found and extracted. After converting PDF to text, you can work with the output, for example to find a target word, replace or cut text, or save it.

using System.IO;    // File
using System.Text;  // StringBuilder, Encoding

PdfToTxtConverter converter = new PdfToTxtConverter();
converter.Load(File.ReadAllBytes("sample.pdf"));

// Accumulates the text of the whole document.
StringBuilder total = new StringBuilder();

for (int i = 0; i < converter.PageCount; i++)
{
    // Extract the text of each page, keeping the original layout.
    string pageText = converter.PageToText(i);
    // Save the page text to a local file, or keep it in memory for further use.
    File.WriteAllText(i.ToString() + ".txt", pageText, Encoding.UTF8);
    // Append this page's text to the document total.
    total.Append(pageText);
}
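
Once the text is extracted, finding or replacing a target word needs only the standard .NET string and regex APIs. The sketch below is one possible approach: it works on any string, such as the page text or the accumulated document text. `ExtractedTextTools`, `CountWord`, and `ReplaceWord` are hypothetical names chosen for illustration and are not part of the iDiTect library. It assumes the target word is made of ordinary word characters (letters, digits, underscore).

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtractedTextTools
{
    // Builds a whole-word pattern so "Invoice" does not match "Invoices".
    private static string WholeWordPattern(string word)
    {
        if (string.IsNullOrEmpty(word))
            throw new ArgumentException("Word must not be empty.", nameof(word));
        return @"\b" + Regex.Escape(word) + @"\b";
    }

    // Counts case-insensitive whole-word occurrences of a word.
    public static int CountWord(string text, string word)
    {
        return Regex.Matches(text, WholeWordPattern(word), RegexOptions.IgnoreCase).Count;
    }

    // Replaces every case-insensitive whole-word occurrence of a word.
    // A MatchEvaluator is used so the replacement is inserted literally,
    // even if it contains characters such as "$".
    public static string ReplaceWord(string text, string word, string replacement)
    {
        return Regex.Replace(text, WholeWordPattern(word), _ => replacement, RegexOptions.IgnoreCase);
    }
}
```

With the loop above, you could call `ExtractedTextTools.CountWord(total.ToString(), "invoice")` to count matches across the whole document, or apply `ReplaceWord` to each page's text before writing it to a file.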
