[VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

**stanav** · Sep 28th, 2007, 10:11 AM

This thread was originally about extracting and merging pdf files using iTextSharp. Over time I have added a lot more code for other tasks and put it all together into a handy class called PdfManipulation. There are 2 classes, listed below (choose the one that matches the iTextSharp version you're using):

1. The original PdfManipulation.vb class is coded based on itextsharp version 4. This class is obsolete and no longer maintained.

2. The updated PdfManipulation2.vb class is for the newer iTextSharp version 5. This class also contains a lot more methods than the original one and I highly recommend it over the old one. I will update this class from time to time to fix bugs and/or add more functionality. Consider it a work in progress.

>>>> Last updated on 4/9/2012 <<<<

Please verify the version of iTextSharp you're using and download the correct class.

The current version of PdfManipulation2 class supports AES_256 encryption provided that your itextsharp.dll version is 5.1.x or higher.

Below is the list of public methods in the new PdfManipulation2 class:

vb.net Code:

'Remove all restrictions from a pdf file
    Public Shared Function RemoveRestrictions(ByVal restrictedPdf As String, Optional ByVal password As String = Nothing, Optional ByVal saveABackup As Boolean = True) As Boolean
    
    'Parse text from a specified range of pdf pages    
    Public Shared Function ParsePdfText(ByVal sourcePDF As String, _
                                  Optional ByVal fromPageNum As Integer = 0, _
                                  Optional ByVal toPageNum As Integer = 0) As String
    
    'Parse all text from a pdf
    Public Shared Function ParseAllPdfText(ByVal sourcePDF As String) As Dictionary(Of Integer, String)
    
    'Page-to-page comparison of 2 pdf files; writes the differences to a resulting text file
    Public Shared Sub ComparePdfs(ByVal pdf1 As String, ByVal pdf2 As String, _
                                  ByVal resultFile As String, _
                                  Optional ByVal fromPageNum As Integer = 0, _
                                  Optional ByVal toPageNum As Integer = 0)
   
    'Extract specified pages from a pdf to create a new pdf
    Public Shared Sub ExtractPdfPages(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)
 
    'Split a pdf into specified number of pdfs
    Public Shared Sub SplitPdfByParts(ByVal sourcePdf As String, ByVal parts As Integer, ByVal baseNameOutPdf As String)
    
    'Split a pdf into multiple pdfs each containing a specified number of pages.  
    Public Shared Sub SplitPdfByPages(ByVal sourcePdf As String, ByVal numOfPages As Integer, ByVal baseNameOutPdf As String)
    
    'Extract pages from multiple source pdfs and merge into a final pdf    
    Public Shared Sub ExtractAndMergePdfPages(ByVal sourceTable As DataTable, ByVal outPdf As String)
     
    'Set security password on an existing pdf file  
    Public Shared Sub SetSecurityPasswords(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal userPassword As String, ByVal ownerPassword As String)
     
    'Add watermark to pdf pages using an image   
    Public Shared Sub AddWatermarkImage(ByVal sourceFile As String, ByVal outputFile As String, ByVal watermarkImage As String)
    
    'Add watermark to all pdf pages using text
    Public Shared Sub AddWatermarkText(ByVal sourceFile As String, ByVal outputFile As String, ByVal watermarkText() As String, _
                                       Optional ByVal watermarkFont As iTextSharp.text.pdf.BaseFont = Nothing, _
                                       Optional ByVal watermarkFontSize As Single = 48, _
                                       Optional ByVal watermarkFontColor As iTextSharp.text.BaseColor = Nothing, _
                                       Optional ByVal watermarkFontOpacity As Single = 0.3F, _
                                       Optional ByVal watermarkRotation As Single = 45.0F)
 
    'Merge multiple pdfs into a single one.   
    Public Shared Function MergePdfFiles(ByVal pdfFiles() As String, ByVal outputPath As String, _
                                         Optional ByVal authorName As String = "", _
                                         Optional ByVal creatorName As String = "", _
                                         Optional ByVal subject As String = "", _
                                         Optional ByVal title As String = "", _
                                         Optional ByVal keywords As String = "") As Boolean
 
    'Merge multiple pdf's into one with all bookmarks preserved
    Public Shared Function MergePdfFilesWithBookmarks(ByVal sourcePdfs() As String, ByVal outputPdf As String) As Boolean
        
    'Add document outline (bookmarks) to a pdf
    Public Shared Sub AddDocumentOutline(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal outlineTable As System.Data.DataTable)
     
    'Extract urls from a pdf   
    Public Shared Function ExtractURLs(ByVal sourcePdf As String, Optional ByVal pageNumbers() As Integer = Nothing) As System.Data.DataTable
        
    'Extract images from a pdf
    Public Shared Function ExtractImages(ByVal sourcePdf As String) As List(Of Image)
     
    'Fill a form   
    Public Shared Sub FillAcroForm(ByVal sourcePdf As String, ByVal fieldData As DataRow, ByVal outputPdf As String) 
 
    Public Shared Sub FillMyForm(ByVal sourcePdf As String, ByVal fieldData As DataRow, ByVal outputPdf As String)
 
    'Add annotation
    Public Shared Sub AddTextAnnotation(ByVal sourcePdf As String, ByVal outputPdf As String)
 
    Public Shared Function GetAcroFieldData(ByVal sourcePdf As String) As Dictionary(Of String, String)
        
    Public Shared Function GetPdfSummary(ByVal sourcePdf As String) As DataTable
        
    Public Shared Function ReplacePagesWithBlank(ByVal sourcePdf As String, _
                                                 ByVal pagesToReplace As List(Of Integer), _
                                                 ByVal outPdf As String, _
                                                 Optional ByVal templatePdf As String = "") As Boolean
       
    Public Shared Function InsertPages(ByVal sourcePdf As String, _
                                       ByVal pagesToInsert As Dictionary(Of Integer, iTextSharp.text.pdf.PdfImportedPage), _
                                       ByVal outPdf As String) As Boolean
       
    Public Shared Function RemovePages(ByVal sourcePdf As String, ByVal pagesToRemove As List(Of Integer), ByVal outputPdf As String) As Boolean
     
    'A demo on how to draw various shapes in itextsharp   
    Public Shared Sub DrawShapesDemo(ByVal sourcePdf As String, ByVal outputPdf As String)
         
    Public Shared Sub AddImageToPage(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal imgPath As String, ByVal imgLocation As Point, ByVal imgSize As Size, Optional ByVal pages() As Integer = Nothing)

Any comments are welcome.
Happy coding

Stanav.

**nbrege** · Dec 14th, 2007, 11:32 AM

Stanav ... thanks for posting these code samples. They helped me on a project that I am currently working on. I would like to request that you post another sample: I need to be able to extract specified pages from multiple documents & save them to one combined PDF, i.e. take pages 3 & 7 from Doc1.pdf, 4-6 from Doc2.pdf & 1, 5 & 12 from Doc3.pdf and save them in Doc4.pdf. Is this "do-able"?

**stanav** · Dec 17th, 2007, 09:09 AM

Yes, it's doable. However, I'm on vacation right now and I do not have access to my work computer, which has all the tools needed to write code. What you can do right now is create a function that returns a hashtable or a dictionary with the file names (string) as the keys and the pages to extract (integer array) as the values. Once you have this hashtable/dictionary, you can modify the ExtractPdfPage sub so that it creates a single new pdf file and then loops through the hashtable/dictionary to extract the pages and add them to the output pdf. It's just a matter of setting up the loop so that each iteration reads one entry and extracts the pages from that file.
If you can wait until later this week when I return to work, I can try to come up with something for you in code.
Best regards,
Stanav.
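The loop described above could look roughly like the following. This is only a sketch under stated assumptions, not the method that was later added to the class: the module name, the `ExtractAndMerge` name and its parameters are made up for illustration, and it reuses the same PdfCopy calls that the other samples in this thread use. Note that a Dictionary does not guarantee the order of its entries, so use an ordered collection if the order of the source files matters.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Module ExtractAndMergeSketch
    ' Hypothetical helper: copies the listed pages of every source pdf into one output pdf.
    ' pagesByFile maps a source file path to the 1-based page numbers to take from that file.
    Public Sub ExtractAndMerge(ByVal pagesByFile As Dictionary(Of String, Integer()), ByVal outPdf As String)
        Dim doc As New Document()
        Using fs As New FileStream(outPdf, FileMode.Create)
            Dim copier As New PdfCopy(doc, fs)
            doc.Open()
            For Each entry As KeyValuePair(Of String, Integer()) In pagesByFile
                ' One reader per source file; the pages are copied before the reader is closed.
                Dim reader As New PdfReader(entry.Key)
                Try
                    For Each pageNum As Integer In entry.Value
                        copier.AddPage(copier.GetImportedPage(reader, pageNum))
                    Next
                Finally
                    reader.Close()
                End Try
            Next
            ' Closing the document finishes the output pdf.
            doc.Close()
        End Using
    End Sub
End Module
```

For the request above, the dictionary would hold "Doc1.pdf" with {3, 7}, "Doc2.pdf" with {4, 5, 6} and "Doc3.pdf" with {1, 5, 12}, and outPdf would be "Doc4.pdf".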

**nbrege** · Dec 17th, 2007, 09:19 AM

If you could post a quick code example when you get back that would help me immensely and may be of help to others trying to do the same thing. Enjoy the rest of your vacation...

**stanav** · Dec 20th, 2007, 09:26 AM

Originally Posted by nbrege

If you could post a quick code example when you get back that would help me immensely and may be of help to others trying to do the same thing. Enjoy the rest of your vacation...

I've added a method to do what you need. Since the total text is more than 1000 characters, I had to put all the code in to a class (PdfManipulation.vb) and post it as an attachment. Hope it helps.

**gaigoi113** · Jul 31st, 2008, 10:26 AM

Hi Stanav,

Do you have any code sample that will convert pdf to multipage tiff? - thanks

**stanav** · Jul 31st, 2008, 11:36 AM

Originally Posted by gaigoi113

Hi Stanav,

Do you have any code sample that will convert pdf to multipage tiff? - thanks

It's impossible to use iTextSharp to convert pdf to multipage tiff. However, you can use PDFBox to convert each pdf page to an image file (it only outputs to jpg's or png's), then merge these images into a multipage tiff.

To download PDFBox, go here:
http://www.pdfbox.org/index.html

To merge multiple images into 1 multipage tiff, check out this codeproject article:
http://www.codeproject.com/KB/GDI-pl...ipageTiff.aspx

And good luck
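For the second step (merging the page images into one multipage tiff), a minimal sketch using only System.Drawing is shown below. It is not taken from the linked article; the module and method names are made up, and it assumes the page images already exist on disk in the order they should appear.

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Imaging

Module MultipageTiffSketch
    ' Hypothetical helper: writes the given image files as the pages of one tiff file.
    Public Sub MergeImagesToTiff(ByVal imageFiles() As String, ByVal outTiff As String)
        If imageFiles Is Nothing OrElse imageFiles.Length = 0 Then
            Throw New ArgumentException("No image files supplied")
        End If

        ' Locate the built-in tiff encoder.
        Dim tiffCodec As ImageCodecInfo = Nothing
        For Each codec As ImageCodecInfo In ImageCodecInfo.GetImageEncoders()
            If codec.FormatID = ImageFormat.Tiff.Guid Then tiffCodec = codec
        Next
        If tiffCodec Is Nothing Then Throw New InvalidOperationException("No TIFF encoder is installed")

        Dim saveFlag As System.Drawing.Imaging.Encoder = System.Drawing.Imaging.Encoder.SaveFlag
        Using encParams As New EncoderParameters(1)
            Using firstPage As Image = Image.FromFile(imageFiles(0))
                ' The first image opens the multi-frame file.
                encParams.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.MultiFrame))
                firstPage.Save(outTiff, tiffCodec, encParams)

                ' Every further image is appended as a new page.
                encParams.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.FrameDimensionPage))
                For i As Integer = 1 To imageFiles.Length - 1
                    Using nextPage As Image = Image.FromFile(imageFiles(i))
                        firstPage.SaveAdd(nextPage, encParams)
                    End Using
                Next

                ' Flush finalizes the file.
                encParams.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.Flush))
                firstPage.SaveAdd(encParams)
            End Using
        End Using
    End Sub
End Module
```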

**MasterRipper** · Jul 8th, 2009, 05:13 AM

Hi all.

I know this thread is old, but I am using the iTextSharp library in this exact way.

I have a PDF with 4 pages and use this code to extract page 3 in a quick example prog I made.

However, the original PDF has text fields I can edit (AcroFields), and after extraction the 3rd page loses these fields.

Any idea(s) what I can change / do to keep these editable fields in the resulting page 3?

Thanks.

**cthai** · Mar 5th, 2010, 01:59 PM

Hi,

I'm trying to extract a single page from a multi page pdf and I'm using the code below; however, I'm getting an error that it's not recognizing <param name>. Any help would be great. Thanks.

Code:

''' <summary>
    ''' Extract a single page from source pdf to a new pdf
    ''' </summary>
    <param name="sourcePdf">"C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40.pdf"</param>
    <param name="pageNumberToExtract">"P1T1"</param>
    <param name="outPdf">"C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40a.pdf"</param>
    ''' <remarks></remarks>
    Public Shared Sub ExtractPdfPage(ByVal sourcePdf As String, ByVal pageNumberToExtract As Integer, ByVal outPdf As String)
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim doc As iTextSharp.text.Document = Nothing
        Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Try
            reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
            doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
            pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create))
            doc.Open()
            page = pdfCpy.GetImportedPage(reader, pageNumberToExtract)
            pdfCpy.AddPage(page)
            doc.Close()
            reader.Close()
        Catch ex As Exception
            Throw ex
        End Try
    End Sub

**stanav** · Mar 5th, 2010, 05:19 PM

Why are you putting your arguments in the code comments? That's not how you do it. You need to call the sub and pass in your arguments, something like this:

vb.net Code:

'Specify the path to the source pdf file
Dim sourcePdf As String = "C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40.pdf"
 
'Extract page # 2 from the above pdf file
Dim pageNumberToExtract As Integer = 2
 
'And then save it to a new pdf named 'table40_page2.pdf'
Dim outputPdf As String = "C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40_page2.pdf"
 
'Call the sub somewhere in your program, passing in the above arguments
PdfManipulation.ExtractPdfPage(sourcePdf, pageNumberToExtract, outputPdf)

**slow&steady** · Mar 24th, 2010, 04:09 PM

Stanav :

i have tried itextsharp for putting watermark on pdfs.It worked fine.

Now i am trying to edit Header on existing pdf files to desired header.

Is it possible.

if its possible then i have to try to use it on the bunch of pdf files in one single folder

Thanks for the help

Sri

**stanav** · Mar 24th, 2010, 07:44 PM

Originally Posted by slow&steady

Stanav :

i have tried itextsharp for putting watermark on pdfs.It worked fine.

Now i am trying to edit Header on existing pdf files to desired header.

Is it possible.

if its possible then i have to try to use it on the bunch of pdf files in one single folder

Thanks for the help

Sri

Yes, it's possible to add/change the header/footer of an existing pdf file and save the result to a new file. Please post your question in the VB.Net forum because it's a different subject and doesn't belong in this code bank thread.

**vijy** · Apr 6th, 2010, 02:13 AM

Hi Stanav,
is it possible to extract the PDF pages with bookmarks?

**stanav** · Apr 16th, 2010, 10:11 AM

Originally Posted by vijy

Hi Stanav,
is it possible to extract the PDF pages with bookmarks?

Yes, I THINK it is quite possible, but it would involve much more work (obviously). I gave it a shot, as seen in the code below, but frankly the method I used only works to some extent: it only preserves the 1st level bookmarks. My approach was to export the bookmarks in the original pdf to a collection, select the pages to be extracted from the reader, and use PdfStamper to copy the original pdf (now with only the selected pages) to a new pdf. Since PdfStamper automatically preserves ALL the bookmarks from the original, I had to edit the bookmark collection to remove the unused ones. This approach should work, but I don't know why it only preserves 1st level bookmarks. Some more work is needed to work that bug out, but I don't have the time right now. I will post just what I have so far.

vb.net Code:

''' <summary>
    ''' Extract pages from an existing pdf file to create a new pdf with bookmarks preserved
    ''' </summary>
    ''' <param name="sourcePdf">full path to the source pdf</param>
    ''' <param name="pageNumbersToExtract">an integer array containing the page number of the pages to be extracted</param>
    ''' <param name="outPdf">the full path to the output pdf</param>
    ''' <remarks></remarks>
    Public Shared Sub ExtractPdfPages(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)
 
        Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim outlines As System.Collections.ArrayList = Nothing
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Dim stamper As iTextSharp.text.pdf.PdfStamper = Nothing
        Dim hshTable As System.Collections.Hashtable = Nothing
        Try
            raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
            reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
            outlines = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(reader)
            reader.SelectPages(New System.Collections.ArrayList(pageNumbersToExtract))
            stamper = New iTextSharp.text.pdf.PdfStamper(reader, New IO.FileStream(outPdf, IO.FileMode.Create))
            RemoveUnusedBookmarks(outlines, pageNumbersToExtract)
            stamper.Outlines = outlines
            stamper.Close()
            reader.Close()
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
    End Sub
 
    Private Shared Sub RemoveUnusedBookmarks(ByRef bookmarks As System.Collections.ArrayList, ByVal pagesToKeep() As Integer)
        Dim bookmark As System.Collections.Hashtable = Nothing
        Dim obj As Object = Nothing
        For i As Integer = bookmarks.Count - 1 To 0 Step -1
            obj = bookmarks(i)
            If TypeOf obj Is System.Collections.ArrayList Then
                RemoveUnusedBookmarks(DirectCast(obj, System.Collections.ArrayList), pagesToKeep)
            ElseIf TypeOf obj Is System.Collections.Hashtable Then
                bookmark = DirectCast(obj, System.Collections.Hashtable)
                If bookmark.ContainsKey("Page") Then
                    Dim value As String = DirectCast(bookmark.Item("Page"), String)
                    If Not String.IsNullOrEmpty(value) Then
                        Dim parts() As String = value.Split(" "c)
                        If parts.Length > 0 Then
                            Dim pageNum As Integer = -1
                            If Integer.TryParse(parts(0), pageNum) Then
                                Dim idx As Integer = System.Array.IndexOf(pagesToKeep, pageNum)
                                If idx < 0 Then
                                    bookmarks.Remove(obj)
                                Else
                                    parts(0) = (idx + 1).ToString
                                    value = String.Join(" ", parts)
                                    bookmark.Item("Page") = value
                                End If
                            End If
                        End If
                    End If
                End If
            End If
        Next
    End Sub

Another approach I thought of was to export the original bookmarks to an XML file and edit that file. Once done, import it back into the new pdf file (which contains only the extracted pages). But like I said, I currently don't have a lot of free time to play with it. So I leave it to you to try.

Good luck.

**vijy** · Apr 21st, 2010, 04:50 AM

Thanks stanav...
yep i tried and i get...

Splitting Code:

Public Function SplitPdfFiles(ByVal iStartPage As Integer, ByVal iEndPage As Integer, ByVal sPDFPath As String) As Boolean
        Try
            Dim reader As PdfReader = New PdfReader(sPDFPath)
            reader.RemoveUnusedObjects()
            reader.ConsolidateNamedDestinations()
 
            Dim importedPage As PdfImportedPage = Nothing
            Dim currentDocument As New Document
            Dim pdfWriter As PdfSmartCopy = Nothing
 
            'Copy pages iStartPage..iEndPage; the output file is created when the first page is reached
            Dim bIsFirst As Boolean = True
            For j As Integer = iStartPage To iEndPage
                If bIsFirst Then
                    bIsFirst = False
                    currentDocument = New Document(reader.GetPageSizeWithRotation(1))
                    'sInFile and sSplitName are not declared in this snippet; presumably they are defined elsewhere in the class
                    pdfWriter = New PdfSmartCopy(currentDocument, New System.IO.FileStream(System.IO.Path.GetDirectoryName(sInFile) & "\" & sSplitName, System.IO.FileMode.Create))
                    pdfWriter.SetFullCompression()
                    ' pdfWriter.CompressionLevel = PdfStream.BEST_COMPRESSION
                    pdfWriter.PdfVersion = reader.PdfVersion
                    currentDocument.Open()
                End If
 
                importedPage = pdfWriter.GetImportedPage(reader, j)
                pdfWriter.AddPage(importedPage)
            Next
 
            Dim bookMark As ArrayList = SimpleBookmark.GetBookmark(reader)
          
            If bookMark IsNot Nothing Then
                'Drop the bookmarks that point past the last extracted page
                SimpleBookmark.EliminatePages(bookMark, New Integer() {iEndPage + 1, reader.NumberOfPages})
                If iStartPage > 1 Then
                    'Page ranges are inclusive, so stop at iStartPage - 1 to keep bookmarks on the first extracted page
                    SimpleBookmark.EliminatePages(bookMark, New Integer() {1, iStartPage - 1})
                    SimpleBookmark.ShiftPageNumbers(bookMark, -(iStartPage - 1), Nothing)
                End If
                pdfWriter.Outlines = bookMark
            End If
            currentDocument.Close()
            pdfWriter.Close()
            reader.Close()
            Return True
        Catch ex As Exception
            'Any error is swallowed here; the function just reports failure by returning False
        End Try
        Return False
    End Function

This one is working fine, and the pdf is extracted with its actual bookmarks.

This approach should work but I don't know why it only preserves 1st level bookmarks

The problem is that it preserves only the first level bookmarks. Stanav, is it possible to get at least the child bookmarks collection?
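On the child bookmarks: in the ArrayList/Hashtable structure that SimpleBookmark.GetBookmark returns in iTextSharp 4.x, the children of a bookmark are, as far as I know, stored under that bookmark's "Kids" key rather than as nested ArrayLists in the parent list. If so, a filter that only recurses when a list item is itself an ArrayList never reaches them. The sketch below shows a filter that walks "Kids" instead. It is an untested-against-real-pdfs illustration, the names are made up, and it has not been confirmed as the fix for the code posted earlier. It also removes a parent together with all of its children when the parent's own page is not kept.

```vbnet
Imports System
Imports System.Collections

Module BookmarkFilterSketch
    ' Hypothetical helper: drops bookmarks whose target page is not kept and
    ' renumbers the remaining ones to their position in pagesToKeep (1-based).
    Public Sub FilterBookmarks(ByVal bookmarks As ArrayList, ByVal pagesToKeep() As Integer)
        For i As Integer = bookmarks.Count - 1 To 0 Step -1
            Dim bookmark As Hashtable = TryCast(bookmarks(i), Hashtable)
            If bookmark Is Nothing Then Continue For

            ' Assumption: child bookmarks live under the "Kids" key of their parent.
            Dim kids As ArrayList = TryCast(bookmark("Kids"), ArrayList)
            If kids IsNot Nothing Then
                FilterBookmarks(kids, pagesToKeep)
                If kids.Count = 0 Then bookmark.Remove("Kids")
            End If

            ' The "Page" value looks like "3 Fit": page number first, then the view settings.
            Dim value As String = TryCast(bookmark("Page"), String)
            If String.IsNullOrEmpty(value) Then Continue For
            Dim parts() As String = value.Split(" "c)
            Dim pageNum As Integer
            If Not Integer.TryParse(parts(0), pageNum) Then Continue For

            Dim idx As Integer = Array.IndexOf(pagesToKeep, pageNum)
            If idx < 0 Then
                bookmarks.RemoveAt(i)
            Else
                parts(0) = (idx + 1).ToString()
                bookmark("Page") = String.Join(" ", parts)
            End If
        Next
    End Sub
End Module
```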

**selnahwy** · May 19th, 2010, 12:14 PM

Has anyone found a code example on how to convert PDF to image using iTextSharp or PDFBox?

**mpires** · Jul 27th, 2010, 06:18 AM

Hi Stanav,

First, nice work. Your example helped me a lot, but I have a question.

I'm using "SplitPdfByPages" and it is working ok, but is there any reason why the extracted pdfs end up larger than the original, which has 5 pages?

Ex.:

Original pdf with 5 pages (72KB)

I extract the 5 pages with your example code, and each page ends up at 85KB

Is there any way to compress the extracted pages? Or some reason for this?

Regards,

**prabakarank** · Aug 5th, 2010, 08:57 AM

Hi,
I have used the "SplitPdfByPages" method, but I pass a URL (http://localhost:1870/PDFWCFService/1.pdf) for splitting... It returns the following error: "Uri format is not supported".

Please give the solutions for the above problem. Please do the needful.

**stanav** · Aug 5th, 2010, 11:43 AM

Originally Posted by prabakarank

Hi,
I have used the "SplitPdfByPages" method, but I pass a URL (http://localhost:1870/PDFWCFService/1.pdf) for splitting... It returns the following error: "Uri format is not supported".

Please give the solutions for the above problem. Please do the needful.

Download the file and save it to a temp location first. After that, you can split it as usual. If you don't need the original pdf once the splitting is done, you can delete it.
To download a file from an url, you can use a WebClient or simply use
My.Computer.Network.DownloadFile(url, saveLocation).
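Put together, the download-then-split steps might look like the sketch below. The module and the `SplitFromUrl` helper are made up for illustration, and the sketch assumes the PdfManipulation2 class from the first post is part of your project. The output base name must still be a physical file path.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module DownloadThenSplitSketch
    ' Hypothetical helper: downloads the pdf to a temp file, splits it, then removes the temp file.
    Public Sub SplitFromUrl(ByVal pdfUrl As String, ByVal numOfPages As Integer, ByVal baseNameOutPdf As String)
        Dim tempPdf As String = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") & ".pdf")
        Try
            Using client As New WebClient()
                client.DownloadFile(pdfUrl, tempPdf)
            End Using
            ' Split the local copy exactly as you would any other file on disk.
            PdfManipulation2.SplitPdfByPages(tempPdf, numOfPages, baseNameOutPdf)
        Finally
            ' The downloaded copy is no longer needed once the splitting is done.
            If File.Exists(tempPdf) Then File.Delete(tempPdf)
        End Try
    End Sub
End Module
```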

**prabakarank** · Aug 6th, 2010, 02:37 AM

Hi ,
I need to pass the parameter like this ("http://localhost:1870/PDFWCFService/1.pdf",1,"http://localhost:1870/PDFWCFService/2.pdf") in the SplitPdfByPages method..
The output file in the format of URL.
It returns following error "Uri format is not supported".
Please give the solutions for the above problem. Please do the needful.

**stanav** · Aug 6th, 2010, 07:44 AM

You need to supply the physical file paths... There's no way around it because we rely on iTextSharp to do the work, and if iTextSharp doesn't support it, there's not much we can do.
However, that is not the real problem. The problem is with your methodology. While you can access (download) a file from a url, you cannot upload a file using a url. If you run the splitting task on any other PC, you will need to download the file to that PC, split it and then upload the results back. If you run the splitting task on the server that hosts your web site, you have to give it the direct physical paths and not the urls. You cannot treat a url the same as a conventional file path.

**prabakarank** · Aug 6th, 2010, 10:13 AM

Hi,
i got the below error
Unable to cast object of type 'iTextSharp.text.pdf.PdfArray' to type 'iTextSharp.text.pdf.PRIndirectReference'.

Whats the reason i got that error. How we avoid this type error. Is there any solution for this problem.

**stanav** · Aug 6th, 2010, 10:17 AM

Show the code where the error occurred...

**prabakarank** · Aug 6th, 2010, 10:21 AM

Below is the code. I converted from Vb.net to C#.

iTextSharp.text.pdf.PdfReader reader = null;
iTextSharp.text.Document doc = null;
iTextSharp.text.pdf.PdfCopy pdfCpy = null;
iTextSharp.text.pdf.PdfImportedPage page = null;
int pageCount = 0;
try
{
    reader = new iTextSharp.text.pdf.PdfReader(sourcePdf);
    pageCount = reader.NumberOfPages;
    if (pageCount < numOfPages)
    {
        // Not enough pages in source pdf to split
        return -1;
    }
    else
    {
        string ext = System.IO.Path.GetExtension(baseNameOutPdf);
        string outfile = string.Empty;
        // Number of output files: page count divided by pages per file, rounded up
        int n = Convert.ToInt32(Math.Ceiling(Convert.ToDouble(pageCount) / Convert.ToDouble(numOfPages)));
        int currentPage = 1;
        for (int i = 1; i <= n; i++)
        {
            outfile = baseNameOutPdf.Replace(ext, "_" + i + ext);
            doc = new iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage));

            pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(outfile, System.IO.FileMode.Create));
            //pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, System.Net.HttpWebRequest.Create(outfile).GetResponse().GetResponseStream());
            doc.Open();
            if (i < n)
            {
                for (int j = 1; j <= numOfPages; j++)
                {
                    page = pdfCpy.GetImportedPage(reader, currentPage);
                    pdfCpy.AddPage(page); // <-- the error happens here
                    currentPage += 1;
                }
            }
            else
            {
                // The last file takes whatever pages remain
                for (int j = currentPage; j <= pageCount; j++)
                {
                    page = pdfCpy.GetImportedPage(reader, j);
                    pdfCpy.AddPage(page);
                }
            }
            doc.Close();
        }
    }
    reader.Close();
    return 1;
}
catch (Exception ex) // inspecting ex here shows the cast error
{
    return -1;
}

Does this error happen because of this particular PDF?

**stanav** · Aug 6th, 2010, 10:54 AM

Originally Posted by prabakarank

Does this error happen because of this particular PDF?

Probably... Can you upload a copy of that particular pdf file so that I can use it to investigate further?

**prabakarank** · Aug 8th, 2010, 11:45 PM

Hi, I uploaded the pdf file. Please check the application with it.
This pdf file has 3 pages. The first page is split successfully. When the second page is split, it gives the following error: "Unable to cast object of type 'iTextSharp.text.pdf.PdfArray' to type 'iTextSharp.text.pdf.PRIndirectReference'."

Please let me know how we can solve the issue.

**vijy** · Aug 11th, 2010, 08:40 AM

I passed your pdf to the method below, and it splits all the pages correctly.

Code:

SplitPdfByParts("E:\Vijay\E-Pub RandE\ComparedEPubPDF\ComparedEPubPDF\bin\Debug\2.pdf", 3, "temp.pdf")

vb Code:

Public Shared Sub SplitPdfByParts(ByVal sourcePdf As String, ByVal parts As Integer, ByVal baseNameOutPdf As String)
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim doc As iTextSharp.text.Document = Nothing
        Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Dim pageCount As Integer = 0
        Try
            reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
            pageCount = reader.NumberOfPages
            If pageCount < parts Then
                Throw New ArgumentException("Not enough pages in source pdf to split")
            Else
                Dim n As Integer = pageCount \ parts
                Dim currentPage As Integer = 1
                Dim ext As String = IO.Path.GetExtension(baseNameOutPdf)
                Dim outfile As String = String.Empty
                For i As Integer = 1 To parts
                    outfile = baseNameOutPdf.Replace(ext, "_" & i & ext)
                    doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage))
                    pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outfile, IO.FileMode.Create))
                    doc.Open()
                    If i < parts Then
                        For j As Integer = 1 To n
                            page = pdfCpy.GetImportedPage(reader, currentPage)
                            pdfCpy.AddPage(page)
                            currentPage += 1
                        Next j
                    Else
                        For j As Integer = currentPage To pageCount
                            page = pdfCpy.GetImportedPage(reader, j)
                            pdfCpy.AddPage(page)
                        Next j
                    End If
                    doc.Close()
                Next
            End If
            reader.Close()
        Catch ex As Exception
            'Rethrow without resetting the stack trace
            Throw
        End Try
    End Sub

**prabakarank** · Aug 11th, 2010, 09:31 AM

Hi for me its not working.. Please tell me which version of iTextsharp dll u have used?
I have used "itextsharp-5.0.2-dll" .
Please check with once again whether its working or not.. please be sure that
all splitted pdf files are created.

**prabakarank** · Oct 6th, 2010, 01:56 AM

Hi..
I have one question. Is there any possible to set password for the each splitted pdf file.
Please tell me how we can do this.

**stanav** · Oct 6th, 2010, 08:17 AM

Originally Posted by prabakarank

Hi for me its not working.. Please tell me which version of iTextsharp dll u have used?
I have used "itextsharp-5.0.2-dll" .
Please check with once again whether its working or not.. please be sure that
all splitted pdf files are created.

I've uploaded the new PdfManipulation2 class which works with itextsharp 5.0.2.

**stanav** · Oct 6th, 2010, 08:59 AM

Originally Posted by prabakarank

Hi..
I have one question. Is there any possible to set password for the each splitted pdf file.
Please tell me how we can do this.

I don't know of any way to set passwords on the split pdfs on the fly. However, you can certainly do it in a 2nd pass.
1st pass: split the pdf as usual.
2nd pass: use the PdfEncryptor.Encrypt method to set the user and/or owner passwords on the newly split pdfs. You can do this in a separate method once all the splitting is done, or you can set the password on each split pdf right after it is created. The 2nd approach is preferred. It's just a few extra lines of code. If you have trouble figuring it out, let me know.
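The 2nd pass could be sketched as follows. The module and the `EncryptPdf` helper are made up for illustration; the sketch assumes iTextSharp 5.x (for the PdfWriter.ALLOW_PRINTING constant), uses 128-bit encryption, and picks "printing only" as an example permission. The secured copy is written to a different path than the one being read.

```vbnet
Imports System.IO
Imports iTextSharp.text.pdf

Module EncryptSplitOutputSketch
    ' Hypothetical helper: writes a password-protected copy of one split pdf.
    Public Sub EncryptPdf(ByVal plainPdf As String, ByVal securedPdf As String, _
                          ByVal userPassword As String, ByVal ownerPassword As String)
        Dim reader As New PdfReader(plainPdf)
        Try
            Using output As New FileStream(securedPdf, FileMode.Create)
                ' True selects 128-bit encryption; the last argument lists the allowed actions.
                PdfEncryptor.Encrypt(reader, output, True, userPassword, ownerPassword, PdfWriter.ALLOW_PRINTING)
            End Using
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```

Calling this right after each doc.Close() in the split loop, and then deleting the unprotected file, corresponds to the preferred approach of securing each part as soon as it is created.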

**nbrege** · Oct 6th, 2010, 09:10 AM

stanav ... what functions are included in your new class?

**stanav** · Oct 6th, 2010, 11:46 AM

Originally Posted by nbrege

stanav ... what functions are included in your new class?

I updated my original post to include a list of public methods in the new class.

**blofvendahl** · Oct 7th, 2010, 01:43 PM

Does the MergePdfFiles routine also merge bookmarks?

**stanav** · Oct 7th, 2010, 04:03 PM

Originally Posted by blofvendahl

Does the MergePdfFiles routine also merge bookmarks?

No, it doesn't...

**prabakarank** · Oct 8th, 2010, 12:19 AM

Hi,
I got the below error.
"PdfReader not opened with owner password"
What we have to resolve the issue??

Thanks

**prabakarank** · Oct 8th, 2010, 12:40 AM

Hi,
Can you give me the code to set password for each split pdf files.

Thanks

**stanav** · Oct 8th, 2010, 07:48 AM

Originally Posted by prabakarank

Hi,
Can you give me the code to set password for each split pdf files.

Thanks

It's already in the PdfManipulation2 class. The method is:

Code:

SetSecurityPasswords(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal userPassword As String, ByVal ownerPassword As String)

**stanav** · Oct 8th, 2010, 08:09 AM

Originally Posted by prabakarank

Hi,
I got the below error.
"PdfReader not opened with owner password"
What we have to resolve the issue??

Thanks

1. You need to know the owner password of the pdf you're working on.
2. Use the 2nd overload of the PdfReader class constructor, which allows you to supply the owner password as a byte array when you create a PdfReader object. Something like this:

Code:

Dim ownerPwd As String = "put the owner password here"
Dim pwdBytes() As Byte = System.Text.Encoding.Default.GetBytes(ownerPwd)
Dim reader As New iTextSharp.text.pdf.PdfReader(sourcePDF, pwdBytes)

The rest of the code is the same.

3. If you forget the owner password for some reason, you will have to remove all restrictions on that pdf using the RemoveRestrictions method and save the new unrestricted pdf to a temp location. You then can work on that temporary unrestricted pdf as normal. When done, delete it if you don't want to keep it.

**blofvendahl** · Oct 8th, 2010, 12:15 PM

Hey Stanav,

Which method in your class, if any, can be used to extract bookmark info from a pdf?

thanks
Brian
