VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)-VBForums

Thread: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

  1. #1

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    Hello all,
    I was recently working on a job assignment dealing with pdf files. My company produces hundreds of daily reports in pdf format, each for a specific division/sub-company. Some top executives want to look at a single report that covers all divisions/sub-companies instead of looking at each one separately, so my job is to merge those reports into a single pdf file with bookmarks for easy navigation. Originally, I used the Acrobat COM object approach, but management didn't want to spend $ to buy a full version of Adobe Acrobat for every PC that runs my program, so I had to rewrite it without relying on Acrobat. I then found the open source PDFBox package, which can be downloaded here... Once you have downloaded the package and unzipped it to a directory on your local machine, add the following references to your project:
    Code:
    IKVM.GNU.Classpath
    IKVM.Runtime
    PDFBox-0.7.3
    To make a long story short, here are the steps I followed:
    1. Create a list of pdf files to be merged.
    2. Merge those pdf files into a temp file. The merge order follows the order of the items in the list.
    3. Create a data table to hold bookmark data. Each data row contains the bookmark title and the page number it points to.
    4. Open the merged temp file, insert bookmarks using the info from the bookmark data table, then save it to a new file.
    5. If everything succeeds, delete the temp file.
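    The only real arithmetic in these steps is in step 3: a bookmark must point at the first page of its source file inside the merged document, and that page's zero-based index is the sum of the page counts of all files merged before it. Here is a minimal, PDFBox-free sketch of just that calculation. The module and function names are hypothetical, and the page counts are assumed to be known already (the code below gets them with PDFBox).

```vb
Imports System.Collections.Generic

' Hypothetical sketch: compute where each source file starts in the merged pdf.
' A bookmark for file i points at the zero-based index of its first page,
' which is the total number of pages of all files merged before it.
Public Module BookmarkOffsetSketch

    ' pageCounts(i) = number of pages in the i-th file, in merge order.
    ' Returns the zero-based start page of each file in the merged output.
    Public Function ComputeStartPages(ByVal pageCounts As IList(Of Integer)) As List(Of Integer)
        Dim starts As New List(Of Integer)(pageCounts.Count)
        Dim runningTotal As Integer = 0
        For Each count As Integer In pageCounts
            ' This file begins right where the previous files ended
            starts.Add(runningTotal)
            runningTotal += count
        Next
        Return starts
    End Function

End Module
```

    For page counts of 3, 5 and 2, ComputeStartPages returns 0, 3 and 8, the same values CreateBookmarkDataTable accumulates in its running pageNumber variable.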

    Code of interest:
    vb Code:
    Imports System.IO
    Imports System.Data
    Imports System.Collections.Generic
    Imports org.pdfbox.pdmodel
    Imports org.pdfbox.util
    Imports org.pdfbox.pdmodel.interactive.documentnavigation.destination
    Imports org.pdfbox.pdmodel.interactive.documentnavigation.outline

        'WriteToLog is a logging helper from the attached project (not shown here).

        Private Function MergePdfFiles(ByVal pdfFileList As List(Of String), _
                                       ByVal outputFileFullName As String) As Boolean
            Dim result As Boolean = False
            Dim pdfMerger As PDFMergerUtility = Nothing
            Dim fileCount As Integer = pdfFileList.Count
            'Only merge when there is more than one file
            If fileCount > 1 Then
                Try
                    'Instantiate an instance of Pdf Merger Utility
                    pdfMerger = New PDFMergerUtility()
                    With pdfMerger
                        'Set output destination
                        .setDestinationFileName(outputFileFullName)
                        'Loop thru the file list and add each file as a source, in list order
                        For i As Integer = 0 To fileCount - 1 Step 1
                            .addSource(pdfFileList(i))
                        Next i
                        'Merge the documents
                        .mergeDocuments()
                        result = True
                    End With
                Catch ex As Exception
                    WriteToLog("MergePDFFile(" & outputFileFullName & "): " & ex.Message)
                    Return False
                End Try
            End If
            Return result
        End Function

        Private Function CreateBookmarkDataTable(ByVal pdfFileList As List(Of String)) As DataTable
            Dim bookmarkData As New DataTable
            Dim row As DataRow = Nothing
            Dim bookmarkTitle As String = String.Empty
            'Zero-based index of the current file's first page in the merged document
            Dim pageNumber As Integer = 0
            Try
                bookmarkData.Columns.Add("BookmarkTitle", GetType(String))
                bookmarkData.Columns.Add("PageNumber", GetType(Integer))
                Dim count As Integer = pdfFileList.Count
                If count > 0 Then
                    For i As Integer = 0 To count - 1 Step 1
                        'Use the file name (without extension) as the bookmark title
                        bookmarkTitle = Path.GetFileNameWithoutExtension(pdfFileList(i))
                        row = bookmarkData.NewRow()
                        row.Item("BookmarkTitle") = bookmarkTitle
                        row.Item("PageNumber") = pageNumber
                        bookmarkData.Rows.Add(row)
                        'The next file starts right after this file's pages.
                        'Note: GetPageCount returns 0 for an unreadable file, which shifts later bookmarks.
                        pageNumber += GetPageCount(pdfFileList(i))
                    Next
                End If
            Catch ex As Exception
                WriteToLog("CreateBookmarkDataTable(): " & ex.Message)
                Return Nothing
            End Try
            Return bookmarkData
        End Function

        Private Function GetPageCount(ByVal pdfFile As String) As Integer
            Dim pageCount As Integer
            Dim pdfDoc As PDDocument = Nothing
            Try
                pdfDoc = PDDocument.load(pdfFile)
                pageCount = pdfDoc.getNumberOfPages()
            Catch ex As Exception
                WriteToLog("GetPageCount(" & pdfFile & "): " & ex.Message)
                Return 0
            Finally
                'Always release the document
                If pdfDoc IsNot Nothing Then
                    pdfDoc.close()
                End If
            End Try
            Return pageCount
        End Function

        Private Function AddBookMarks(ByVal pdfFile As String, _
                                      ByVal bookmarkTable As DataTable) As Boolean
            Dim result As Boolean = False
            Dim PdfDoc As PDDocument = Nothing
            Dim outFile As String = String.Empty
            'CreateBookmarkDataTable returns Nothing on error, so check before reading Rows
            If bookmarkTable Is Nothing Then
                WriteToLog("Can't add bookmarks to <" & pdfFile & "> because BookmarkTable is Nothing.")
                Return False
            End If
            Dim rowCount As Integer = bookmarkTable.Rows.Count
            Try
                If rowCount > 0 Then
                    'Set the output file full path (the temp file path without "temp_")
                    outFile = pdfFile.Replace("temp_", "")
                    'Load the input pdf file
                    PdfDoc = PDDocument.load(pdfFile)
                    If Not PdfDoc.isEncrypted() Then
                        'Create new document outline and assign it to the pdf document
                        Dim outline As PDDocumentOutline = New PDDocumentOutline()
                        PdfDoc.getDocumentCatalog().setDocumentOutline(outline)

                        'Create the top-level outline item that holds all bookmarks
                        Dim pagesOutline As PDOutlineItem = New PDOutlineItem()
                        pagesOutline.setTitle("All Pages")
                        outline.appendChild(pagesOutline)

                        'Get the list of pages in the document (a java.util.List from IKVM)
                        Dim pages As java.util.List = PdfDoc.getDocumentCatalog().getAllPages()

                        Dim i, pageNumber As Integer
                        Dim row As DataRow = Nothing
                        Dim bookmarkTitle As String = String.Empty
                        'Loop thru the bookmark datatable and add bookmarks to the document accordingly
                        For i = 0 To rowCount - 1 Step 1
                            'Read the row's data
                            row = bookmarkTable.Rows(i)
                            pageNumber = CInt(row.Item("PageNumber"))
                            bookmarkTitle = CStr(row.Item("BookmarkTitle"))
                            'Get the page at pageNumber (zero-based) from the pages list
                            Dim page As PDPage = CType(pages.get(pageNumber), PDPage)
                            Dim dest As PDPageFitWidthDestination = New PDPageFitWidthDestination()
                            dest.setPage(page)
                            'Then point a bookmark at it
                            Dim bookmark As PDOutlineItem = New PDOutlineItem()
                            bookmark.setDestination(dest)
                            bookmark.setTitle(bookmarkTitle)
                            'Add this bookmark to the document's outline
                            pagesOutline.appendChild(bookmark)
                        Next i
                        'Expand the bookmark tree
                        pagesOutline.openNode()
                        outline.openNode()
                        'Save the document to a file
                        PdfDoc.save(outFile)
                        result = True
                    Else
                        WriteToLog("Can't add bookmarks to <" & pdfFile & "> because the document is encrypted.")
                    End If
                Else
                    WriteToLog("Can't add bookmarks to <" & pdfFile & "> because BookmarkTable has no data.")
                End If
            Catch ex As Exception
                WriteToLog("AddBookmarks(" & pdfFile & "): " & ex.Message)
                Return False
            Finally
                If PdfDoc IsNot Nothing Then
                    PdfDoc.close()
                End If
            End Try
            Return result
        End Function
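    One thing to watch in AddBookMarks: outFile = pdfFile.Replace("temp_", "") removes every "temp_" in the full path, folder names included, and if the file name contains no "temp_" at all, the output path is the same as the input path. Below is a sketch of a stricter alternative, assuming the temp file name always starts with "temp_". GetFinalOutputPath is a hypothetical helper, not part of the attached project.

```vb
Imports System
Imports System.IO

' Hypothetical helper: derive the final output path from the temp file path
' by removing a leading "temp_" prefix from the file name only.
Public Module OutputPathSketch

    Public Function GetFinalOutputPath(ByVal tempFilePath As String) As String
        Const TempPrefix As String = "temp_"
        Dim folder As String = Path.GetDirectoryName(tempFilePath)
        Dim fileName As String = Path.GetFileName(tempFilePath)
        If Not fileName.StartsWith(TempPrefix, StringComparison.OrdinalIgnoreCase) Then
            ' Refuse instead of returning the input path, which would mean
            ' saving over the document that is still open.
            Throw New ArgumentException("Expected a file name starting with '" & TempPrefix & "'.", "tempFilePath")
        End If
        ' Folder names are left alone, even if they contain "temp_"
        Return Path.Combine(folder, fileName.Substring(TempPrefix.Length))
    End Function

End Module
```

    With this helper, the outFile line in AddBookMarks could become outFile = GetFinalOutputPath(pdfFile), and step 5 is then a File.Delete on the temp file once AddBookMarks returns True.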

    The full source code is attached (it's a console application)
    Attached Files
    Last edited by stanav; Jun 26th, 2007 at 08:33 AM.

  2. #2
    Hyperactive Member
    Join Date
    Mar 2006
    Posts
    413

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It

    This looks like it's going to save a lot of time.

    Thanks!!!!!!!!!!!
    Visual Studio .NET 2005/.NET Framework 2.0

  3. #3
    New Member
    Join Date
    May 2007
    Posts
    1

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It

    I tried implementing this in VB.Net console application but got the following error when running the application. The error occurred at the mergeDocuments call.

    Error: destination PDF is encrypted, can't append encrypted PDF documents.

    I used LinkedLists instead of List and modified the code to work for this collection type.

    Could you tell me what may be going wrong. Do I need to give rights to some user/group on the source/destination folders?

    Thanks.

  4. #4
    New Member
    Join Date
    May 2007
    Posts
    1

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It

    Hi,
    I get an exception (NullReferenceException: Object reference not set to an instance of an object) at mergeDocuments() of the PDFMergerUtility. Here is my code:

    Imports System.IO
    Imports org.pdfbox.pdmodel
    Imports org.pdfbox.util
    Imports org.pdfbox.pdmodel.interactive.documentnavigation.destination
    Imports org.pdfbox.pdmodel.interactive.documentnavigation.outline
    Imports java.util

    Module Module1

        Sub Main()

            'Create a pdf file list and add files to it
            Dim pdfList(2) As String
            pdfList(0) = "C:\reports\pdfFile1.pdf"
            pdfList(1) = "C:\reports\pdfFile2.pdf"

            Dim outFile As String = "C:\MergedPdf\temp_myMergedPdf.pdf"

            'Try to merge the pdf files
            If MergePdfFiles(pdfList, outFile) Then
                Console.WriteLine(" The files were merged!")
                Console.ReadLine()

            End If
        End Sub

        Private Function MergePdfFiles(ByVal pdfFileList As Array, _
                                       ByVal outputFileFullName As String)
            Dim result As Boolean = False

            Dim fileCount As Integer = 2
            If fileCount > 1 Then
                Try
                    'Instantiate an instance of Pdf Merger Utility
                    Dim pdfMerger As New PDFMergerUtility
                    With pdfMerger
                        'Set output destination
                        .setDestinationFileName(outputFileFullName)
                        'Looping thru the file list and add source to the merger
                        For i As Integer = 0 To fileCount - 1 Step 1
                            .addSource(pdfFileList(i))
                        Next i
                        'Merge the documents
                        pdfMerger.mergeDocuments()
                        result = True
                    End With
                Catch ex As Exception
                End Try
            End If
            Return result
        End Function

    Now here's the catch: when I converted this application to VB.NET 2.0 and ran it on another system, the above code worked!

    Where did I go wrong??

  5. #5

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It

    Quote Originally Posted by tzmjoseph
    I tried implementing this in VB.Net console application but got the following error when running the application. The error occurred at the mergeDocuments call.

    Error: destination PDF is encrypted, can't append encrypted PDF documents.

    I used LinkedLists instead of List and modified the code to work for this collection type.

    Could you tell me what may be going wrong. Do I need to give rights to some user/group on the source/destination folders?

    Thanks.
    The error itself explains it all: one of your pdf files appears to be encrypted or password protected, and PDFBox can't merge it.
    As for file access permissions, it's just the standard stuff: the account that runs the code needs read permission on the input files and write permission on the folder where the output file is written. If the input files and the output file are in the same folder, that account needs both read and write permission on that folder.
    Last edited by stanav; Jun 11th, 2007 at 02:51 PM.
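    To catch this before calling mergeDocuments, you could open each input with PDFBox first and skip anything that is encrypted or won't load. A minimal sketch, assuming the same PDFBox 0.7.3 references as the first post (PDDocument.load, isEncrypted and close are the calls used there); the module and function names are hypothetical, and skipped files are simply reported on the console.

```vb
Imports System
Imports System.Collections.Generic
Imports org.pdfbox.pdmodel

' Hypothetical sketch: keep only the input files that PDFBox can open
' and that are not encrypted, so the merger never sees an encrypted file.
Public Module EncryptionCheckSketch

    Public Function GetUnencryptedFiles(ByVal pdfFiles As IEnumerable(Of String)) As List(Of String)
        Dim usable As New List(Of String)()
        For Each pdfPath As String In pdfFiles
            Dim doc As PDDocument = Nothing
            Try
                doc = PDDocument.load(pdfPath)
                If Not doc.isEncrypted() Then
                    usable.Add(pdfPath)
                Else
                    Console.WriteLine("Skipping encrypted file: " & pdfPath)
                End If
            Catch ex As Exception
                ' A file that can't be loaded would also break the merge
                Console.WriteLine("Skipping unreadable file: " & pdfPath & " (" & ex.Message & ")")
            Finally
                If doc IsNot Nothing Then doc.close()
            End Try
        Next
        Return usable
    End Function

End Module
```

    The filtered list can then go to MergePdfFiles as before; keep in mind that any skipped file also won't get a bookmark if the same list is used for CreateBookmarkDataTable.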

  6. #6
    Fanatic Member vijy
    Join Date
    May 2007
    Location
    India
    Posts
    542

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    Hi Stanav,
    I am using PDFBox to merge a list of pdf files.
    I added all the references you mentioned.
    I am getting an error in

    PdfMerger.MergeDocuments()

    Error::: Expected an integer type, actual='BC 3 s# \ C# o &} 42 +5C \C '


    Here is the code:

    Code:
        Private Sub frmMergingPdf_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            'Create a pdf file list and add files to it
            Dim pdfList As New List(Of String)
            pdfList.Add("d:\file1.pdf")
            pdfList.Add("d:\file2.pdf")
            Dim outFile As String = "d:\Pdf.pdf"
            MergePdfFiles(pdfList, outFile)
        End Sub
    
        Private Function MergePdfFiles(ByVal pdfFileList As List(Of String), ByVal outputFileFullName As String)
            Dim result As Boolean = False
            Dim fileCount As Integer = 2
            If fileCount > 1 Then
                Try
                    'Instantiate an instance of Pdf Merger Utility
                    Dim pdfMerger As New PDFMergerUtility
                    With pdfMerger
                        .setDestinationFileName(outputFileFullName)
                        For i As Integer = 0 To fileCount - 1 Step 1
                            .addSource(pdfFileList(i))
                        Next i
                        .mergeDocuments() 'Here am getting that error
                        result = True
                    End With
                Catch ex As Exception
                End Try
            End If
            Return result
        End Function
    Visual Studio.net 2010
    If this post is useful, rate it


  7. #7

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    @viji: Sometimes PDFBox runs into internal errors that I can't fix, such as the one you're having: it's thrown inside mergeDocuments(), which is part of the PDFMergerUtility class, so we have no control over it. A better alternative is iTextSharp. It's faster and more reliable.

  8. #8
    New Member
    Join Date
    Sep 2008
    Posts
    3

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    This is a great tool, can anyone show me how to link this to an excel spreadsheet that has rows of pdf files to be merged.

    I am new to VB but pretty experienced in programming, so even getting me started in the right direction would be a big help.

    Thanks,

    Will

  9. #9

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    You'd use ADO.NET to read the data from your Excel file. There are plenty of examples of that on this website; just search for something like "Excel ADO.Net" and you should get some hits. Once you've read the data from your xls file into your program, it's just a matter of building a list of files to be merged and passing it to the merge function.
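    A minimal sketch of that approach, assuming an .xls workbook whose sheet is named "Sheet1", whose first row holds column headers, and whose pdf paths are in a column headed "FilePath" (all three are placeholders; adjust them to your file). It uses OleDb with the Jet provider, which only runs in 32-bit processes; newer .xlsx files need the ACE provider instead. The module and function names are hypothetical.

```vb
Imports System.Collections.Generic
Imports System.Data.OleDb

' Hypothetical sketch: read pdf paths from an Excel sheet with ADO.NET (OleDb).
' Assumed layout: sheet "Sheet1", header row, paths in a column named "FilePath".
Public Module ExcelListSketch

    Public Function BuildExcelConnectionString(ByVal xlsPath As String) As String
        ' HDR=YES tells the provider that the first row contains column names
        Return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & xlsPath & _
               ";Extended Properties=""Excel 8.0;HDR=YES"""
    End Function

    Public Function ReadPdfListFromExcel(ByVal xlsPath As String) As List(Of String)
        Dim pdfList As New List(Of String)()
        Using conn As New OleDbConnection(BuildExcelConnectionString(xlsPath))
            conn.Open()
            ' A worksheet is addressed as a table named [SheetName$]
            Using cmd As New OleDbCommand("SELECT [FilePath] FROM [Sheet1$]", conn)
                Using reader As OleDbDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        ' Skip blank cells so empty rows don't end up in the merge list
                        If Not reader.IsDBNull(0) Then pdfList.Add(reader.GetValue(0).ToString())
                    End While
                End Using
            End Using
        End Using
        Return pdfList
    End Function

End Module
```

    The returned list can be passed straight to MergePdfFiles from the first post, which only merges when the list holds more than one file.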
    Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
    - Abraham Lincoln -

  10. #10
    MS SQL Powerposter szlamany
    Join Date
    Mar 2004
    Location
    CT
    Posts
    15,754

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    PDFBox seems like an interesting product...

    Do you use it to initially create the PDF documents that you talk about in the first post here?

    What else can the PDFBox product do?

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  11. #11

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    Quote Originally Posted by szlamany
    PDFBox seems like an interesting product...

    Do you use it to initially create the PDF documents that you talk about in the first post here?

    What else can the PDFBox product do?
    PDFBox is used mainly for creating and manipulating pdf files on the fly. It's a pretty good product. However, I like iText/iTextSharp better because it is faster and doesn't add another 16MB of dependencies to my application as PDFBox does.

  12. #12
    New Member
    Join Date
    Sep 2008
    Posts
    3

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    I am having a problem using the code you show above. When I merge two copies of the same file I have no problem, but if I try to merge two different pdf files, PDFBox throws an exception and only the temp file is created.

    The exception that is thrown is COSVisitorException.

    What's the deal? Why can I merge two copies of the same file but run into problems when I try to merge different files?

  13. #13

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,238

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    That exception is thrown by PDFBox itself, not by my code. My recommendation is to use iTextSharp instead since I find iTextSharp is faster and more reliable for creating and manipulating pdf files. Also the iTextSharp's footprint is a lot smaller than PDFBox. I myself have stopped using PDFBox, and also converted all of my programs that use PDFBox to use iTextSharp.

    Search this forum. I do have a thread or two on manipulating pdf files using iTextSharp.
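    For anyone who wants to try the switch, here is a minimal sketch of a plain merge with iTextSharp's PdfCopy class. It assumes iTextSharp 5.x (the PdfCopy, PdfReader, GetImportedPage and FreeReader calls below); other versions may differ. It does not add bookmarks, and the module and function names are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

' Hypothetical sketch (assumes iTextSharp 5.x): concatenate pdf files with PdfCopy,
' copying every page of each source file in list order.
Public Module ITextSharpMergeSketch

    Public Sub MergeWithITextSharp(ByVal sourceFiles As IList(Of String), ByVal outputFile As String)
        ' A document with no pages can't be closed, so reject an empty list up front
        If sourceFiles Is Nothing OrElse sourceFiles.Count = 0 Then
            Throw New ArgumentException("At least one source file is required.", "sourceFiles")
        End If
        Dim doc As New Document()
        Using output As New FileStream(outputFile, FileMode.Create)
            Dim copy As New PdfCopy(doc, output)
            doc.Open()
            For Each sourceFile As String In sourceFiles
                Dim reader As New PdfReader(sourceFile)
                ' iTextSharp page numbers are 1-based
                For pageNum As Integer = 1 To reader.NumberOfPages
                    copy.AddPage(copy.GetImportedPage(reader, pageNum))
                Next
                ' Write out what was copied from this reader, then release it
                copy.FreeReader(reader)
                reader.Close()
            Next
            doc.Close()   ' also closes the output stream
        End Using
    End Sub

End Module
```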

  14. #14
    New Member
    Join Date
    Dec 2008
    Posts
    1

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    I wanted to thank you soooo much for this post!! I have been searching for weeks on how to merge an unknown number of pdf files and your code led me right down that path.

    Thanks again

  15. #15
    New Member
    Join Date
    Mar 2009
    Posts
    1

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    This works perfectly on my local machine but when I moved the executable file and the .dll's to our production server the merged PDF has an error message when I open it. It says, "Could not find the XObject named 'XIPLAYER0'." Does anyone know why?

    Any help or guidance I can get, would be appreciated.
    Thanks!

  16. #16
    New Member
    Join Date
    Apr 2009
    Posts
    1

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    I am adding the script in a script task in SSIS, and I don't know how to add references to the following in my project:

    IKVM.GNU.Classpath
    IKVM.Runtime
    PDFBox-0.7.3
    Thanks,
    Arch

  17. #17
    New Member
    Join Date
    Sep 2009
    Posts
    8

    Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)

    This is great!
