A long while ago, when I posted the code to extract text from a PDF using iTextSharp, a VBF member asked me to write a function to extract images too. I was busy at the time and didn't dig too deep into it. Recently, while trying to find a way to extract hyperlinks from a PDF (a request from another VBF member), I also figured out how to get the images, so I thought I would post the code here to share with everyone.
Note 1: You'll need to add a reference to iTextSharp.dll in your project. If you don't already have it, you can find it by Googling "itextsharp download".
Note 2: This code was written targeting the .NET 2.0 Framework. It will still work on .NET 1.x if you replace every occurrence of "List(Of Image)" in the code with an ArrayList. On 1.x you will also have to swap the Using block for an explicit Dispose call, since the Using statement only arrived with VB 2005.
vb.net Code:
' Requires Imports System.Collections.Generic, System.Drawing and System.Windows.Forms.
Public Shared Function ExtractImages(ByVal sourcePdf As String) As List(Of Image)
    Dim imgList As New List(Of Image)
    Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Dim pdfObj As iTextSharp.text.pdf.PdfObject = Nothing
    Dim pdfStrem As iTextSharp.text.pdf.PdfStream = Nothing

    Try
        raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
        reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)

        'Walk every object in the cross-reference table and keep the image streams
        For i As Integer = 0 To reader.XrefSize - 1
            pdfObj = reader.GetPdfObject(i)
            If Not IsNothing(pdfObj) AndAlso pdfObj.IsStream() Then
                pdfStrem = DirectCast(pdfObj, iTextSharp.text.pdf.PdfStream)
                Dim subtype As iTextSharp.text.pdf.PdfObject = pdfStrem.Get(iTextSharp.text.pdf.PdfName.SUBTYPE)
                If Not IsNothing(subtype) AndAlso subtype.ToString = iTextSharp.text.pdf.PdfName.IMAGE.ToString Then
                    'Raw (still encoded) bytes of the image stream
                    Dim bytes() As Byte = iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw(CType(pdfStrem, iTextSharp.text.pdf.PRStream))
                    If Not IsNothing(bytes) Then
                        Try
                            Using memStream As New System.IO.MemoryStream(bytes)
                                Using img As Image = Image.FromStream(memStream)
                                    'Store a copy: GDI+ needs the source stream to stay open
                                    'for as long as an image created by FromStream is in use
                                    imgList.Add(New Bitmap(img))
                                End Using
                            End Using
                        Catch ex As Exception
                            'Most likely the image is in an unsupported format
                            'Do nothing
                            'You can add your own code to handle this exception if you want to
                        End Try
                    End If
                End If
            End If
        Next
    Catch ex As Exception
        MessageBox.Show(ex.Message)
    Finally
        'Close the reader even when something above failed
        If Not IsNothing(reader) Then reader.Close()
    End Try

    Return imgList
End Function
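
In case it helps, here is a rough sketch of how the function could be called to dump every extracted image to a folder as PNG files. The class name PdfImageHelper (wherever you end up pasting ExtractImages), the SaveImages method and the "image_N.png" file names are my own placeholders and not part of the code above, so adjust them to your project.

```vbnet
Imports System.Collections.Generic
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO
Imports System.Runtime.InteropServices

Public Class ExtractImagesDemo
    ' Saves every image found in sourcePdf into outputFolder as image_1.png, image_2.png, ...
    ' PdfImageHelper is a hypothetical class assumed to contain the ExtractImages function.
    Public Shared Sub SaveImages(ByVal sourcePdf As String, ByVal outputFolder As String)
        Directory.CreateDirectory(outputFolder)

        Dim images As List(Of Image) = PdfImageHelper.ExtractImages(sourcePdf)

        For i As Integer = 0 To images.Count - 1
            Dim target As String = Path.Combine(outputFolder, String.Format("image_{0}.png", i + 1))
            Try
                images(i).Save(target, ImageFormat.Png)
            Catch ex As ExternalException
                ' GDI+ raises this when it cannot write the image; skip it and carry on
            Finally
                ' Images hold unmanaged GDI+ resources, so release them when done
                images(i).Dispose()
            End Try
        Next
    End Sub
End Class
```

A call would then look like ExtractImagesDemo.SaveImages("C:\docs\input.pdf", "C:\docs\images"), where both paths are made-up examples. Images that the function could not decode are simply missing from the list, so the number of files may be lower than the number of images in the PDF.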



