[VB.NET] Extract Pages and Split Pdf Files Using iTextSharp-VBForums

Thread: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

  1. #1

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    [VB.NET] Pdf Manipulation Class Using iTextSharp

    This thread was originally about extracting and merging pdf files using iTextSharp. Over time I have added a lot more code to do other things and put it all together into a handy class called PdfManipulation. There are 2 classes, listed below (choose the one that matches the iTextSharp version you're using):

    1. The original PdfManipulation.vb class is based on itextsharp version 4. This class is obsolete and no longer maintained.

    2. The updated PdfManipulation2.vb class is for the newer itextsharp version 5. This class also contains a lot more methods than the original one, and I highly recommend it over the old one. I will update this class from time to time to fix bugs and/or add more functionality. Consider it a work in progress. >>>> Last updated on 4/9/2012 <<<<

    Please verify the version of iTextSharp you're using and download the correct class.

    The current version of PdfManipulation2 class supports AES_256 encryption provided that your itextsharp.dll version is 5.1.x or higher.

    Below is the list of public methods in the new PdfManipulation2 class
    vb.net Code:
    'Remove all restrictions from a pdf file
    Public Shared Function RemoveRestrictions(ByVal restrictedPdf As String, Optional ByVal password As String = Nothing, Optional ByVal saveABackup As Boolean = True) As Boolean

    'Parse text from a specified range of pdf pages
    Public Shared Function ParsePdfText(ByVal sourcePDF As String, _
                                  Optional ByVal fromPageNum As Integer = 0, _
                                  Optional ByVal toPageNum As Integer = 0) As String

    'Parse all text from a pdf
    Public Shared Function ParseAllPdfText(ByVal sourcePDF As String) As Dictionary(Of Integer, String)

    'Page to page comparison of 2 pdf files and write the differences to a resulting text file
    Public Shared Sub ComparePdfs(ByVal pdf1 As String, ByVal pdf2 As String, _
                                  ByVal resultFile As String, _
                                  Optional ByVal fromPageNum As Integer = 0, _
                                  Optional ByVal toPageNum As Integer = 0)

    'Extract specified pages from a pdf to create a new pdf
    Public Shared Sub ExtractPdfPages(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)

    'Split a pdf into specified number of pdfs
    Public Shared Sub SplitPdfByParts(ByVal sourcePdf As String, ByVal parts As Integer, ByVal baseNameOutPdf As String)

    'Split a pdf into multiple pdfs each containing a specified number of pages.
    Public Shared Sub SplitPdfByPages(ByVal sourcePdf As String, ByVal numOfPages As Integer, ByVal baseNameOutPdf As String)

    'Extract pages from multiple source pdfs and merge into a final pdf
    Public Shared Sub ExtractAndMergePdfPages(ByVal sourceTable As DataTable, ByVal outPdf As String)

    'Set security password on an existing pdf file
    Public Shared Sub SetSecurityPasswords(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal userPassword As String, ByVal ownerPassword As String)

    'Add watermark to pdf pages using an image
    Public Shared Sub AddWatermarkImage(ByVal sourceFile As String, ByVal outputFile As String, ByVal watermarkImage As String)

    'Add watermark to all pdf pages using text
    Public Shared Sub AddWatermarkText(ByVal sourceFile As String, ByVal outputFile As String, ByVal watermarkText() As String, _
                                       Optional ByVal watermarkFont As iTextSharp.text.pdf.BaseFont = Nothing, _
                                       Optional ByVal watermarkFontSize As Single = 48, _
                                       Optional ByVal watermarkFontColor As iTextSharp.text.BaseColor = Nothing, _
                                       Optional ByVal watermarkFontOpacity As Single = 0.3F, _
                                       Optional ByVal watermarkRotation As Single = 45.0F)

    'Merge multiple pdfs into a single one.
    Public Shared Function MergePdfFiles(ByVal pdfFiles() As String, ByVal outputPath As String, _
                                         Optional ByVal authorName As String = "", _
                                         Optional ByVal creatorName As String = "", _
                                         Optional ByVal subject As String = "", _
                                         Optional ByVal title As String = "", _
                                         Optional ByVal keywords As String = "") As Boolean

    'Merge multiple pdf's into one with all bookmarks preserved
    Public Shared Function MergePdfFilesWithBookmarks(ByVal sourcePdfs() As String, ByVal outputPdf As String) As Boolean

    'Add document outline (bookmarks) to a pdf
    Public Shared Sub AddDocumentOutline(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal outlineTable As System.Data.DataTable)

    'Extract urls from a pdf
    Public Shared Function ExtractURLs(ByVal sourcePdf As String, Optional ByVal pageNumbers() As Integer = Nothing) As System.Data.DataTable

    'Extract images from a pdf
    Public Shared Function ExtractImages(ByVal sourcePdf As String) As List(Of Image)

    'Fill a form
    Public Shared Sub FillAcroForm(ByVal sourcePdf As String, ByVal fieldData As DataRow, ByVal outputPdf As String)

    Public Shared Sub FillMyForm(ByVal sourcePdf As String, ByVal fieldData As DataRow, ByVal outputPdf As String)

    'Add annotation
    Public Shared Sub AddTextAnnotation(ByVal sourcePdf As String, ByVal outputPdf As String)

    Public Shared Function GetAcroFieldData(ByVal sourcePdf As String) As Dictionary(Of String, String)

    Public Shared Function GetPdfSummary(ByVal sourcePdf As String) As DataTable

    Public Shared Function ReplacePagesWithBlank(ByVal sourcePdf As String, _
                                                 ByVal pagesToReplace As List(Of Integer), _
                                                 ByVal outPdf As String, _
                                                 Optional ByVal templatePdf As String = "") As Boolean

    Public Shared Function InsertPages(ByVal sourcePdf As String, _
                                       ByVal pagesToInsert As Dictionary(Of Integer, iTextSharp.text.pdf.PdfImportedPage), _
                                       ByVal outPdf As String) As Boolean

    Public Shared Function RemovePages(ByVal sourcePdf As String, ByVal pagesToRemove As List(Of Integer), ByVal outputPdf As String) As Boolean

    'A demo on how to draw various shapes in itextsharp
    Public Shared Sub DrawShapesDemo(ByVal sourcePdf As String, ByVal outputPdf As String)

    Public Shared Sub AddImageToPage(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal imgPath As String, ByVal imgLocation As Point, ByVal imgSize As Size, Optional ByVal pages() As Integer = Nothing)
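
As a rough illustration of how a few of the listed methods might be called, here is a minimal sketch. It assumes the attached PdfManipulation2 class is part of your project, that page numbers are 1-based (as they are in iTextSharp itself), and that the Boolean returned by MergePdfFiles signals success. All file paths are hypothetical.

```vbnet
Imports System

Module PdfManipulation2UsageSketch
    Sub Main()
        ' Hypothetical paths, for illustration only.
        Dim source As String = "C:\Temp\report.pdf"

        ' Copy pages 3 and 7 of the source into a new pdf.
        PdfManipulation2.ExtractPdfPages(source, New Integer() {3, 7}, "C:\Temp\report_p3_p7.pdf")

        ' Split the source into several pdfs of 10 pages each.
        PdfManipulation2.SplitPdfByPages(source, 10, "C:\Temp\report_part.pdf")

        ' Merge two pdfs into one, leaving the optional metadata arguments at their defaults.
        ' Assumption: the returned Boolean indicates whether the merge succeeded.
        Dim merged As Boolean = PdfManipulation2.MergePdfFiles( _
            New String() {"C:\Temp\a.pdf", "C:\Temp\b.pdf"}, "C:\Temp\merged.pdf")
        Console.WriteLine("Merge returned: " & merged.ToString())
    End Sub
End Module
```

The exact naming of the split output files depends on the attached class; the C# port posted later in this thread suggests a suffix such as _1, _2 is inserted before the extension.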


    Any comments are welcomed.
    Happy coding
    Stanav.
    Last edited by stanav; Apr 9th, 2012 at 02:36 PM. Reason: New version of PdfManipulation2 class now supports AES-256 encryption

  2. #2
    Frenzied Member
    Join Date
    Jul 2006
    Location
    MI
    Posts
    1,597

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Stanav ... thanks for posting these code samples. They helped me on a project that I am currently working on. I would like to request that you post another sample: I need to be able to extract specified pages from multiple documents & save them to one combined PDF, i.e. take pages 3 & 7 from Doc1.pdf, 4-6 from Doc2.pdf & 1, 5 & 12 from Doc3.pdf and save them in Doc4.pdf. Is this "do-able"?
    Last edited by nbrege; Dec 14th, 2007 at 10:36 AM.

  3. #3

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Yes, it's doable. However, I'm on vacation right now and I do not have access to my work computer, which has all the tools needed to write code. What you can do right now is create a function that returns a hashtable or a dictionary with the file names (string) as the keys and the pages to extract (integer array) as the values. Once you have this hashtable/dictionary, you can modify the ExtractPdfPage sub so that it creates a single new pdf file and then loops through the hashtable/dictionary to extract the pages and add them to the output pdf. It's just a matter of setting up the loop so that in each iteration you read one entry and extract the pages from that file.
    If you can wait until later this week when I return to work, I can try to come up with something for you in code.
    Best regards,
    Stanav.
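
The dictionary approach described above could be sketched roughly as follows. This is not the code from the attached class (whose ExtractAndMergePdfPages takes a DataTable); it is a minimal illustration that reuses only the iTextSharp calls already shown in this thread (PdfReader, Document, PdfCopy, GetImportedPage, AddPage). The module and method names are hypothetical, and page numbers are assumed to be 1-based.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Public Module ExtractAndMergeSketch
    ' pagesByFile: key = full path of a source pdf, value = 1-based page numbers to take from it.
    Public Sub ExtractAndMerge(ByVal pagesByFile As Dictionary(Of String, Integer()), ByVal outPdf As String)
        Dim readers As New List(Of PdfReader)
        Dim doc As Document = Nothing
        Dim pdfCpy As PdfCopy = Nothing
        Try
            For Each entry As KeyValuePair(Of String, Integer()) In pagesByFile
                Dim reader As New PdfReader(entry.Key)
                readers.Add(reader)

                ' Create the single output document when the first source is opened.
                If doc Is Nothing Then
                    doc = New Document(reader.GetPageSizeWithRotation(1))
                    pdfCpy = New PdfCopy(doc, New FileStream(outPdf, FileMode.Create))
                    doc.Open()
                End If

                ' Append the requested pages of this source to the output.
                For Each pageNum As Integer In entry.Value
                    pdfCpy.AddPage(pdfCpy.GetImportedPage(reader, pageNum))
                Next
            Next
        Finally
            ' Close the output first, then the readers that fed it.
            If doc IsNot Nothing Then doc.Close()
            For Each r As PdfReader In readers
                r.Close()
            Next
        End Try
    End Sub
End Module
```

For the request above, the dictionary would hold Doc1.pdf with {3, 7}, Doc2.pdf with {4, 5, 6} and Doc3.pdf with {1, 5, 12}, and outPdf would be Doc4.pdf. Note that a Dictionary does not formally guarantee enumeration order, so a List of pairs may be a safer choice when the order of the source files matters.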

  4. #4
    Frenzied Member
    Join Date
    Jul 2006
    Location
    MI
    Posts
    1,597

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    If you could post a quick code example when you get back that would help me immensely and may be of help to others trying to do the same thing. Enjoy the rest of your vacation...

  5. #5

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by nbrege
    If you could post a quick code example when you get back that would help me immensely and may be of help to others trying to do the same thing. Enjoy the rest of your vacation...
    I've added a method to do what you need. Since the total text is more than 1000 characters, I had to put all the code into a class (PdfManipulation.vb) and post it as an attachment. Hope it helps.

  6. #6
    New Member
    Join Date
    Jul 2008
    Posts
    2

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi Stanav,

    Do you have any code sample that will convert pdf to multipage tiff? - thanks

  7. #7

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by gaigoi113
    Hi Stanav,

    Do you have any code sample that will convert pdf to multipage tiff? - thanks
    It's impossible to use iTextSharp to convert pdf to multipage tiff. However, you can use PDFBox to convert each pdf page to an image file (it only outputs to jpg's or png's), then merge these images into a multipage tiff.

    To download PDFBox, go here:
    http://www.pdfbox.org/index.html

    To merge multiple images into 1 multipage tiff, check out this codeproject article:
    http://www.codeproject.com/KB/GDI-pl...ipageTiff.aspx

    And good luck
    Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
    - Abraham Lincoln -

  8. #8
    New Member
    Join Date
    Jul 2009
    Posts
    2

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi all.

    I know this thread is old, but I am using the iTextSharp library in this exact way.

    I have a PDF with 4 pages and use this code to extract page 3 in a quick example prog I made.

    However, the original PDF has text fields I can edit (AcroFields), and after extraction the 3rd page loses these fields.

    Any idea what I can change or do to keep these editable fields in the resulting page 3?

    Thanks.

  9. #9
    Member
    Join Date
    Mar 2007
    Posts
    34

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi,

    I'm trying to extract a single page from a multi page pdf and I'm using the code below; however, I'm getting an error that it's not recognizing <param name>. Any help would be great. Thanks.

    Code:
    ''' <summary>
        ''' Extract a single page from source pdf to a new pdf
        ''' </summary>
        <param name="sourcePdf">"C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40.pdf"</param>
        <param name="pageNumberToExtract">"P1T1"</param>
        <param name="outPdf">"C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40a.pdf"</param>
        ''' <remarks></remarks>
        Public Shared Sub ExtractPdfPage(ByVal sourcePdf As String, ByVal pageNumberToExtract As Integer, ByVal outPdf As String)
            Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
            Dim doc As iTextSharp.text.Document = Nothing
            Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
            Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
            Try
                reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
                doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
                pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create))
                doc.Open()
                page = pdfCpy.GetImportedPage(reader, pageNumberToExtract)
                pdfCpy.AddPage(page)
                doc.Close()
                reader.Close()
            Catch ex As Exception
                Throw ex
            End Try
        End Sub

  10. #10

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Why are you putting your arguments in the code comments? That's not how you do it. You need to call the sub and pass in your arguments, something like this:
    vb.net Code:
    'Specify the path to the source pdf file
    Dim sourcePdf As String = "C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40.pdf"

    'Extract page # 2 of the above pdf file
    Dim pageNumberToExtract As Integer = 2

    'And then save it to a new pdf named 'table40_page2.pdf'
    Dim outputPdf As String = "C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40_page2.pdf"

    'Call the sub somewhere in your program, passing in the above arguments
    PdfManipulation.ExtractPdfPage(sourcePdf, pageNumberToExtract, outputPdf)

  11. #11
    New Member
    Join Date
    Oct 2009
    Posts
    5

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Stanav :

    I have tried iTextSharp for putting a watermark on pdfs. It worked fine.

    Now I am trying to change the header on existing pdf files to a desired header.

    Is it possible?

    If it's possible, then I want to use it on a bunch of pdf files in one single folder.

    Thanks for the help

    Sri

  12. #12

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by slow&steady View Post
    Stanav :

    I have tried iTextSharp for putting a watermark on pdfs. It worked fine.

    Now I am trying to change the header on existing pdf files to a desired header.

    Is it possible?

    If it's possible, then I want to use it on a bunch of pdf files in one single folder.

    Thanks for the help

    Sri
    Yes, it's possible to add/change the header/footer of an existing pdf file and save the result to a new file. Please post your question in the VB.Net forum, because it's a different subject and doesn't belong in this code bank thread.

  13. #13
    Fanatic Member vijy's Avatar
    Join Date
    May 2007
    Location
    India
    Posts
    542

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi Stanav,
    Is it possible to extract the PDF pages with bookmarks?
    Visual Studio.net 2010
    If this post is useful, rate it


  14. #14

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by vijy View Post
    Hi Stanav,
    Is it possible to extract the PDF pages with bookmarks?
    Yes, I THINK it is quite possible, but it would involve much more work (obviously). I gave it a shot, as seen in the code below, but frankly the method I used only works to some extent: it only preserves the 1st level bookmarks. My approach was to export the bookmarks in the original pdf to a collection, select the pages to be extracted from the reader, and use PdfStamper to copy the original pdf (now with only the selected pages) to a new pdf. Since PdfStamper automatically preserves ALL the bookmarks from the original, I had to edit the bookmark collection to remove the unused ones. This approach should work, but I don't know why it only preserves 1st level bookmarks. Some more work is needed to work that bug out, but I don't have the time right now. I will post just what I have so far.
    vb.net Code:
    ''' <summary>
    ''' Extract pages from an existing pdf file to create a new pdf with bookmarks preserved
    ''' </summary>
    ''' <param name="sourcePdf">full path to the source pdf</param>
    ''' <param name="pageNumbersToExtract">an integer array containing the page number of the pages to be extracted</param>
    ''' <param name="outPdf">the full path to the output pdf</param>
    ''' <remarks></remarks>
    Public Shared Sub ExtractPdfPages(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)

        Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim outlines As System.Collections.ArrayList = Nothing
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Dim stamper As iTextSharp.text.pdf.PdfStamper = Nothing
        Dim hshTable As System.Collections.Hashtable = Nothing
        Try
            raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
            reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
            outlines = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(reader)
            reader.SelectPages(New System.Collections.ArrayList(pageNumbersToExtract))
            stamper = New iTextSharp.text.pdf.PdfStamper(reader, New IO.FileStream(outPdf, IO.FileMode.Create))
            RemoveUnusedBookmarks(outlines, pageNumbersToExtract)
            stamper.Outlines = outlines
            stamper.Close()
            reader.Close()
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
    End Sub

    Private Shared Sub RemoveUnusedBookmarks(ByRef bookmarks As System.Collections.ArrayList, ByVal pagesToKeep() As Integer)
        Dim bookmark As System.Collections.Hashtable = Nothing
        Dim obj As Object = Nothing
        For i As Integer = bookmarks.Count - 1 To 0 Step -1
            obj = bookmarks(i)
            If TypeOf obj Is System.Collections.ArrayList Then
                RemoveUnusedBookmarks(DirectCast(obj, System.Collections.ArrayList), pagesToKeep)
            ElseIf TypeOf obj Is System.Collections.Hashtable Then
                bookmark = DirectCast(obj, System.Collections.Hashtable)
                If bookmark.ContainsKey("Page") Then
                    Dim value As String = DirectCast(bookmark.Item("Page"), String)
                    If Not String.IsNullOrEmpty(value) Then
                        Dim parts() As String = value.Split(" "c)
                        If parts.Length > 0 Then
                            Dim pageNum As Integer = -1
                            If Integer.TryParse(parts(0), pageNum) Then
                                Dim idx As Integer = System.Array.IndexOf(pagesToKeep, pageNum)
                                If idx < 0 Then
                                    bookmarks.Remove(obj)
                                Else
                                    parts(0) = (idx + 1).ToString
                                    value = String.Join(" ", parts)
                                    bookmark.Item("Page") = value
                                End If
                            End If
                        End If
                    End If
                End If
            End If
        Next
    End Sub
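
One possible cause of the 1st-level-only behaviour is that nested entries are never visited. As far as I know, SimpleBookmark stores child bookmarks inside each parent Hashtable under a "Kids" key (as an ArrayList) rather than as ArrayList items in the list itself, so the TypeOf ... Is ArrayList branch above would never fire. The sketch below prunes recursively under that assumption. It is untested against real pdfs, the names PruneBookmarks and RemapPage are hypothetical, and it works on plain ArrayList/Hashtable data, so it matches the collection types used in the post above rather than the generic collections of later iTextSharp 5 releases.

```vbnet
Imports System
Imports System.Collections

Public Module BookmarkPruningSketch
    ' Removes bookmarks that point at dropped pages and renumbers the rest.
    ' Assumption: children live under the "Kids" key of each bookmark Hashtable.
    Public Sub PruneBookmarks(ByVal bookmarks As ArrayList, ByVal pagesToKeep() As Integer)
        If bookmarks Is Nothing Then Return
        For i As Integer = bookmarks.Count - 1 To 0 Step -1
            Dim bookmark As Hashtable = TryCast(bookmarks(i), Hashtable)
            If bookmark Is Nothing Then Continue For

            ' Prune the children first so surviving ones are already renumbered.
            Dim kids As ArrayList = TryCast(bookmark("Kids"), ArrayList)
            If kids IsNot Nothing Then
                PruneBookmarks(kids, pagesToKeep)
                If kids.Count = 0 Then bookmark.Remove("Kids")
            End If

            If Not RemapPage(bookmark, pagesToKeep) Then
                ' The parent's page was dropped: remove it, but promote any
                ' surviving children to the parent's level instead of losing them.
                bookmarks.RemoveAt(i)
                If kids IsNot Nothing AndAlso kids.Count > 0 Then
                    bookmarks.InsertRange(i, kids)
                End If
            End If
        Next
    End Sub

    ' Returns False when the bookmark targets a page that is not kept.
    ' Bookmarks without a parsable "Page" entry are left untouched.
    Private Function RemapPage(ByVal bookmark As Hashtable, ByVal pagesToKeep() As Integer) As Boolean
        Dim value As String = TryCast(bookmark("Page"), String)
        If String.IsNullOrEmpty(value) Then Return True

        Dim parts() As String = value.Split(" "c)
        Dim pageNum As Integer
        If Not Integer.TryParse(parts(0), pageNum) Then Return True

        Dim idx As Integer = Array.IndexOf(pagesToKeep, pageNum)
        If idx < 0 Then Return False

        ' New page number = 1-based position of the old page in pagesToKeep.
        parts(0) = (idx + 1).ToString()
        bookmark("Page") = String.Join(" ", parts)
        Return True
    End Function
End Module
```

To try it, the call to RemoveUnusedBookmarks(outlines, pageNumbersToExtract) in ExtractPdfPages would be swapped for PruneBookmarks(outlines, pageNumbersToExtract). Promoting orphaned children is only one design choice; keeping the parent as a heading without a destination is another.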

    Another approach I thought of was to export the original bookmarks to an XML file and edit that file. Once done, import it back into the new pdf file (which contains only the extracted pages). But like I said, I currently do not have a lot of free time to play with it, so I leave it to you to try.

    Good luck.

  15. #15
    Fanatic Member vijy's Avatar
    Join Date
    May 2007
    Location
    India
    Posts
    542

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Thanks stanav...
    Yep, I tried, and this is what I got...

    Splitting Code:
    Public Function SplitPdfFiles(ByVal iStartPage As String, ByVal iEndPage As String, ByVal sPDFPath As String) As Boolean
        Try
            'Variables to hold the split file information

            Dim reader As PdfReader = New PdfReader(sPDFPath)
            reader.RemoveUnusedObjects()
            reader.ConsolidateNamedDestinations()

            Dim importedPage As PdfImportedPage = Nothing
            Dim currentDocument As New Document
            Dim pdfWriter As PdfSmartCopy = Nothing

            Dim bIsFirst As Boolean = True
            For j As Integer = iStartPage To iEndPage
                If bIsFirst Then
                    bIsFirst = False
                    currentDocument = New Document(reader.GetPageSizeWithRotation(1))
                    'Note: sInFile and sSplitName are not declared in this snippet;
                    'they are presumably fields defined elsewhere in the poster's class.
                    pdfWriter = New PdfSmartCopy(currentDocument, New System.IO.FileStream(System.IO.Path.GetDirectoryName(sInFile) & "\" & sSplitName, System.IO.FileMode.Create))
                    pdfWriter.SetFullCompression()
                    ' pdfWriter.CompressionLevel = PdfStream.BEST_COMPRESSION
                    pdfWriter.PdfVersion = reader.PdfVersion
                    currentDocument.Open()
                End If

                importedPage = pdfWriter.GetImportedPage(reader, j)
                pdfWriter.AddPage(importedPage)
            Next

            Dim bookMark As New ArrayList
            bookMark = SimpleBookmark.GetBookmark(reader)

            If bookMark IsNot Nothing Then
                SimpleBookmark.EliminatePages(bookMark, New Integer() {iEndPage + 1, reader.NumberOfPages})
                If iStartPage > 1 Then
                    SimpleBookmark.EliminatePages(bookMark, New Integer() {1, iStartPage})
                    SimpleBookmark.ShiftPageNumbers(bookMark, -(iStartPage - 1), Nothing)
                End If
                pdfWriter.Outlines = bookMark
            End If
            currentDocument.Close()
            pdfWriter.Close()
            Return True
        Catch ex As Exception
        End Try
        Return False
    End Function

    This one is working fine, and the pdf is extracted with its actual bookmarks.

    You wrote: "This approach should work but I don't know why it only preserves 1st level bookmarks"
    The problem is that it only preserves the first level bookmarks. Stanav, is it possible to get at least the child bookmarks collection?


  16. #16
    New Member
    Join Date
    May 2010
    Posts
    1

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Has anyone found a code example on how to convert PDF to image using iTextSharp or PDFBox?

  17. #17
    New Member
    Join Date
    Feb 2009
    Posts
    5

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi Stanav,

    First, nice work. Your example helped me a lot, but I have a question.

    I'm using "SplitPdfByPages" and it is working OK, but is there any reason why each extracted pdf ends up larger than the original, which has 5 pages?


    Ex.:

    Original pdf with 5 pages (72KB)

    I extract the 5 pages with your example code, and each page ends up at 85KB.

    Is there any way to compress the extracted pages, or is there some reason for this?


    Regards,

  18. #18
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi,
    I have used the "SplitPdfByPages" method, but when I pass a URL (http://localhost:1870/PDFWCFService/1.pdf) for splitting, it returns the following error: "Uri format is not supported".

    Please give a solution for the above problem. Please do the needful.

  19. #19

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by prabakarank View Post
    Hi,
    I have used the "SplitPdfByPages" method, but when I pass a URL (http://localhost:1870/PDFWCFService/1.pdf) for splitting, it returns the following error: "Uri format is not supported".

    Please give a solution for the above problem. Please do the needful.
    Download the file and save it to a temp location first. After that, you can split it as usual. If you don't need the original pdf once the splitting is done, you can delete it.
    To download a file from a url, you can use a WebClient or simply use
    My.Computer.Network.DownloadFile(url, saveLocation).
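
The download-then-split steps above might look roughly like this. It is a minimal sketch that assumes the attached PdfManipulation2 class is in your project; the module and method names (DownloadThenSplitSketch, SplitFromUrl) are hypothetical, and WebClient is used instead of My.Computer.Network so the code also works outside a Windows Forms project.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Public Module DownloadThenSplitSketch
    ' Downloads the pdf at url to a temp file, splits it, then deletes the temp copy.
    ' baseNameOutPdf must be a local file path, not a url.
    Public Sub SplitFromUrl(ByVal url As String, ByVal numOfPages As Integer, ByVal baseNameOutPdf As String)
        Dim tempPdf As String = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") & ".pdf")
        Try
            Using client As New WebClient()
                client.DownloadFile(url, tempPdf)
            End Using

            ' Split the local copy as usual.
            PdfManipulation2.SplitPdfByPages(tempPdf, numOfPages, baseNameOutPdf)
        Finally
            ' The downloaded original is no longer needed after splitting.
            If File.Exists(tempPdf) Then File.Delete(tempPdf)
        End Try
    End Sub
End Module
```

A call such as SplitFromUrl("http://localhost:1870/PDFWCFService/1.pdf", 1, "C:\Temp\out.pdf") would then write the pieces to the local disk (the output path here is made up for the example).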

  20. #20
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi ,
    I need to pass the parameters like this: ("http://localhost:1870/PDFWCFService/1.pdf", 1, "http://localhost:1870/PDFWCFService/2.pdf") to the SplitPdfByPages method.
    The output file is also given as a URL.
    It returns the following error: "Uri format is not supported".
    Please give a solution for the above problem. Please do the needful.

  21. #21

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    You need to supply the physical file paths... There's no way around it, because we rely on iTextSharp to do the work, and if iTextSharp doesn't support it, there's not much we can do.
    However, that is not the real problem. The problem is with your methodology. While you can access (download) a file from a url, you cannot upload a file using a url. If you are going to run the splitting task on any PC, you will need to download the file to that local PC, split it and then upload the results back. If you are going to run the splitting task on the server that hosts your web site, you have to give it the direct physical paths and not the urls. You cannot treat a url the same as a conventional file path.

  22. #22
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi,
    I got the error below:
    Unable to cast object of type 'iTextSharp.text.pdf.PdfArray' to type 'iTextSharp.text.pdf.PRIndirectReference'.

    What's the reason I got that error? How can we avoid this type of error? Is there any solution for this problem?

  23. #23

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Show the code where the error occurred...

  24. #24
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Below is the code. I converted it from VB.NET to C#.

    iTextSharp.text.pdf.PdfReader reader = null;
    iTextSharp.text.Document doc = null;
    iTextSharp.text.pdf.PdfCopy pdfCpy = null;
    iTextSharp.text.pdf.PdfImportedPage page = null;
    int pageCount = 0;
    try
    {
        reader = new iTextSharp.text.pdf.PdfReader(sourcePdf);
        pageCount = reader.NumberOfPages;
        if (pageCount < numOfPages)
        {
            return -1;
            throw new ArgumentException("Not enough pages in source pdf to split"); // unreachable after the return above
        }
        else
        {
            string ext = System.IO.Path.GetExtension(baseNameOutPdf);
            string outfile = string.Empty;
            int n = Convert.ToInt32(Math.Ceiling(Convert.ToDouble(pageCount) / Convert.ToDouble(numOfPages)));
            int currentPage = 1;
            for (int i = 1; i <= n; i++)
            {
                outfile = baseNameOutPdf.Replace(ext, "_" + i + ext);
                doc = new iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage));

                pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(outfile, System.IO.FileMode.Create));
                //pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, System.Net.HttpWebRequest.Create(outfile).GetResponse().GetResponseStream());
                doc.Open();
                if (i < n)
                {
                    for (int j = 1; j <= numOfPages; j++)
                    {
                        page = pdfCpy.GetImportedPage(reader, currentPage);
                        pdfCpy.AddPage(page); // <-- the error happens here
                        currentPage += 1;
                    }
                }
                else
                {
                    for (int j = currentPage; j <= pageCount; j++)
                    {
                        page = pdfCpy.GetImportedPage(reader, j);
                        pdfCpy.AddPage(page);
                    }
                }
                doc.Close();
            }
        }
        reader.Close();
        return 1;
    }
    catch (Exception ex) // the exception caught here carries that error
    {
        return -1;
        throw ex; // unreachable after the return above
    }



    Does this error happen because of this particular PDF?

  25. #25

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by prabakarank View Post
    Does this error happen because of this particular PDF?
    Probably... Can you upload a copy of that particular pdf file so that I can use it to investigate further?

  26. #26
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi, I uploaded the pdf file. Please check the application with it.
    This pdf file has 3 pages. The first page is split successfully. When the second page is split, it gives the following error: "Unable to cast object of type 'iTextSharp.text.pdf.PdfArray' to type 'iTextSharp.text.pdf.PRIndirectReference'."

    How can we solve this issue?
    Attached Images
    • File Type: pdf 2.pdf (174.6 KB, 691 views)

  27. #27
    Fanatic Member vijy's Avatar
    Join Date
    May 2007
    Location
    India
    Posts
    542

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    I passed your pdf to the method below, and it split all pages correctly.
    Code:
    SplitPdfByParts("E:\Vijay\E-Pub RandE\ComparedEPubPDF\ComparedEPubPDF\bin\Debug\2.pdf", 3, "temp.pdf")
    vb Code:
    Public Shared Sub SplitPdfByParts(ByVal sourcePdf As String, ByVal parts As Integer, ByVal baseNameOutPdf As String)
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim doc As iTextSharp.text.Document = Nothing
        Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
        Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
        Dim pageCount As Integer = 0
        Try
            reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
            pageCount = reader.NumberOfPages
            If pageCount < parts Then
                Throw New ArgumentException("Not enough pages in source pdf to split")
            Else
                ' Every part except the last gets n pages; the last part takes the remainder.
                Dim n As Integer = pageCount \ parts
                Dim currentPage As Integer = 1
                Dim ext As String = IO.Path.GetExtension(baseNameOutPdf)
                Dim outfile As String = String.Empty
                For i As Integer = 1 To parts
                    outfile = baseNameOutPdf.Replace(ext, "_" & i & ext)
                    doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage))
                    pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outfile, IO.FileMode.Create))
                    doc.Open()
                    If i < parts Then
                        For j As Integer = 1 To n
                            page = pdfCpy.GetImportedPage(reader, currentPage)
                            pdfCpy.AddPage(page)
                            currentPage += 1
                        Next j
                    Else
                        For j As Integer = currentPage To pageCount
                            page = pdfCpy.GetImportedPage(reader, j)
                            pdfCpy.AddPage(page)
                        Next j
                    End If
                    doc.Close()
                Next
            End If
        Finally
            ' Close the reader even when an exception is thrown; the exception still reaches the caller.
            If reader IsNot Nothing Then reader.Close()
        End Try
    End Sub
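    The page arithmetic used by SplitPdfByParts above can be checked without iTextSharp. The following is only a sketch: the module name SplitRangesSketch and the function GetPartRanges are made up for illustration and are not part of any posted class. It mirrors the loop above, where every part except the last gets pageCount \ parts pages and the last part takes whatever remains.

```vbnet
Imports System
Imports System.Collections.Generic

Module SplitRangesSketch
    ' Hypothetical helper: returns the 1-based {first, last} page range of
    ' each part, using the same arithmetic as the SplitPdfByParts loop.
    Public Function GetPartRanges(ByVal pageCount As Integer, ByVal parts As Integer) As List(Of Integer())
        If parts < 1 OrElse pageCount < parts Then
            Throw New ArgumentException("Not enough pages in source pdf to split")
        End If
        Dim ranges As New List(Of Integer())
        Dim n As Integer = pageCount \ parts   ' integer division
        Dim currentPage As Integer = 1
        For i As Integer = 1 To parts
            ' The last part runs to the end of the document.
            Dim lastPage As Integer = If(i < parts, currentPage + n - 1, pageCount)
            ranges.Add(New Integer() {currentPage, lastPage})
            currentPage = lastPage + 1
        Next
        Return ranges
    End Function

    Sub Main()
        ' A 10-page pdf split into 3 parts prints 1-3, 4-6 and 7-10.
        For Each r As Integer() In GetPartRanges(10, 3)
            Console.WriteLine("{0}-{1}", r(0), r(1))
        Next
    End Sub
End Module
```

    For the 3-page 2.pdf split into 3 parts, the same arithmetic gives one page per output file.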
    Visual Studio.net 2010
    If this post is useful, rate it


  28. #28
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi, it's not working for me. Please tell me which version of the iTextSharp dll you used.
    I used "itextsharp-5.0.2-dll".
    Please check once again whether it works, and make sure that
    all the split pdf files are created.

  29. #29
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi..
    I have one question. Is it possible to set a password for each split pdf file?
    Please tell me how we can do this.

  30. #30

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by prabakarank View Post
    Hi, it's not working for me. Please tell me which version of the iTextSharp dll you used.
    I used "itextsharp-5.0.2-dll".
    Please check once again whether it works, and make sure that
    all the split pdf files are created.
    I've uploaded the new PdfManipulation2 class which works with itextsharp 5.0.2.

  31. #31

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by prabakarank View Post
    Hi..
    I have one question. Is it possible to set a password for each split pdf file?
    Please tell me how we can do this.
    I don't know of any way to set passwords on the split pdfs on the fly. However, you can certainly do it in a 2nd pass.
    1st pass: split the pdf as usual.
    2nd pass: use the PdfEncryptor.Encrypt method to set the user and/or owner passwords on those newly split pdfs. You can do this in a separate method after the splitting is done, or you can set the password on each split pdf right after it is created. The 2nd approach is preferred. It's just a few extra lines of code. If you have trouble figuring it out, let me know.
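    A minimal sketch of the 2nd pass, assuming iTextSharp 5.x is referenced. The module name EncryptSplitSketch and the helper EncryptPdf are hypothetical names used only for illustration, and the permission flag is an arbitrary choice. The input and output paths must be different files, because the reader keeps the input open while the encrypted copy is written.

```vbnet
Imports System.IO
Imports iTextSharp.text.pdf

Module EncryptSplitSketch
    ' Hypothetical helper (not part of PdfManipulation2): writes an encrypted
    ' copy of one already-split pdf with the given user and owner passwords.
    Public Sub EncryptPdf(ByVal inputPdf As String, ByVal outputPdf As String, _
                          ByVal userPassword As String, ByVal ownerPassword As String)
        Dim reader As New PdfReader(inputPdf)
        Try
            Using output As New FileStream(outputPdf, FileMode.Create)
                ' True selects 128-bit encryption. ALLOW_PRINTING is only an
                ' example; choose the permissions your files need.
                PdfEncryptor.Encrypt(reader, output, True, userPassword, ownerPassword, _
                                     PdfWriter.ALLOW_PRINTING)
            End Using
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```

    To follow the preferred approach, such a helper could be called inside the split loop right after doc.Close(), passing the file that was just written as inputPdf and a new file name as outputPdf.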

  32. #32
    Frenzied Member
    Join Date
    Jul 2006
    Location
    MI
    Posts
    1,597

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    stanav ... what functions are included in your new class?

  33. #33

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by nbrege View Post
    stanav ... what functions are included in your new class?
    I updated my original post to include a list of public methods in the new class.

  34. #34
    New Member
    Join Date
    Oct 2010
    Posts
    3

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Does the MergePdfFiles routine also merge bookmarks?

  35. #35

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by blofvendahl View Post
    Does the MergePdfFiles routine also merge bookmarks?
    No, it doesn't...

  36. #36
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi,
    I got the error below:
    "PdfReader not opened with owner password"
    What do we have to do to resolve the issue?

    Thanks

  37. #37
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi,
    Can you give me the code to set a password for each split pdf file?

    Thanks

  38. #38

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by prabakarank View Post
    Hi,
    Can you give me the code to set a password for each split pdf file?

    Thanks
    It's already in the PdfManipulation2 class. The method is:
    Code:
    SetSecurityPasswords(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal userPassword As String, ByVal ownerPassword As String)

  39. #39

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by prabakarank View Post
    Hi,
    I got the error below:
    "PdfReader not opened with owner password"
    What do we have to do to resolve the issue?

    Thanks
    1. You need to know the owner password of the pdf you're working on.
    2. Use the 2nd overload of the PdfReader class constructor, which allows you to supply the owner password as a byte array when you create a PdfReader object. Something like this:
    Code:
    Dim ownerPwd As String = "put the owner password here"
    Dim pwdBytes() As Byte = System.Text.Encoding.Default.GetBytes(ownerPwd)
    Dim reader As New iTextSharp.text.pdf.PdfReader(sourcePDF, pwdBytes)
    The rest of the code is the same.

    3. If you have forgotten the owner password, you will have to remove all restrictions on that pdf using the RemoveRestrictions method and save the new unrestricted pdf to a temp location. You can then work on that temporary unrestricted pdf as normal. When done, delete it if you don't want to keep it.
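    To confirm that a password really is the owner password before splitting, the reader can be asked whether it was opened with full permissions. This is only a sketch: the module OwnerPasswordSketch and the function OpensWithFullPermissions are hypothetical names, and it assumes the IsOpenedWithFullPermissions property of PdfReader is available in your iTextSharp version.

```vbnet
Imports System.Text
Imports iTextSharp.text.pdf

Module OwnerPasswordSketch
    ' Hypothetical helper: opens the pdf with the supplied password and
    ' reports whether iTextSharp treats the reader as fully permitted.
    Public Function OpensWithFullPermissions(ByVal sourcePdf As String, _
                                             ByVal ownerPwd As String) As Boolean
        Dim pwdBytes() As Byte = Encoding.Default.GetBytes(ownerPwd)
        Dim reader As New PdfReader(sourcePdf, pwdBytes)
        Try
            ' False here means operations such as PdfCopy are likely to fail
            ' with "PdfReader not opened with owner password".
            Return reader.IsOpenedWithFullPermissions
        Finally
            reader.Close()
        End Try
    End Function
End Module
```

    If the pdf also has a user password and the supplied password matches neither one, the PdfReader constructor itself is expected to throw instead of returning.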
    Last edited by stanav; Oct 8th, 2010 at 08:13 AM.

  40. #40
    New Member
    Join Date
    Oct 2010
    Posts
    3

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hey Stanav,

    Which method in your class, if any, can be used to extract bookmark info from a pdf?

    thanks
    Brian
