VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)
Hello all,
I was recently working on a job assignment dealing with pdf files. My company produces hundreds of daily reports in pdf format, where each report is for a specific division/sub-company. Some top executives want to look at a single report that contains all divisions/sub-companies instead of looking at each one separately, so my job was to merge those reports into a single pdf file with bookmarks for easy navigation. Originally, I used the Acrobat COM object approach, but management didn't want to pay for a full version of Adobe Acrobat on every PC that runs my program, so I had to rewrite it without relying on Acrobat. I then found the open source PDFBox package, which can be downloaded here... Once you have downloaded the package and unzipped it to a directory on your local machine, you need to add the following references to your project:
Code:
IKVM.GNU.Classpath
IKVM.Runtime
PDFBox-0.7.3
To make a long story short, here are the steps I followed:
1. Create a list of pdf files to be merged.
2. Merge those pdf files into a temp file. The merging order will follow the order of the items in the list.
3. Create a data table to hold bookmark data. Each datarow contains the bookmark title and the page number it points to.
4. Open the merged temp file and insert bookmarks into it using info from the bookmark data table, then save it to a new file.
5. If all of that succeeds, delete the temp file.
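Step 3 can be sketched roughly as follows. This is a minimal sketch and not the original author's code: the `BuildBookmarkTable` function, the column names `Title` and `PageNumber`, and the idea that each report's page count is already known are all assumptions made for illustration. It also assumes page numbers are 1-based and that each bookmark points to the first page of its report in the merged file.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module BookmarkTableSketch

    ' Hypothetical helper: builds the bookmark table described in step 3.
    ' titles(i) is the bookmark text for the i-th report and pageCounts(i)
    ' is the number of pages in that report, both in merge order.
    Public Function BuildBookmarkTable(ByVal titles As List(Of String), _
                                       ByVal pageCounts As List(Of Integer)) As DataTable
        If titles.Count <> pageCounts.Count Then
            Throw New ArgumentException("titles and pageCounts must have the same length.")
        End If

        Dim table As New DataTable("Bookmarks")
        table.Columns.Add("Title", GetType(String))
        table.Columns.Add("PageNumber", GetType(Integer))

        ' The first report starts on page 1 of the merged file (1-based, an assumption).
        Dim nextStartPage As Integer = 1
        For i As Integer = 0 To titles.Count - 1
            table.Rows.Add(titles(i), nextStartPage)
            ' The next report starts right after the last page of this one.
            nextStartPage += pageCounts(i)
        Next

        Return table
    End Function

    Sub Main()
        Dim titles As New List(Of String)(New String() {"Division A", "Division B", "Division C"})
        Dim pageCounts As New List(Of Integer)(New Integer() {3, 5, 2})

        ' Prints pages 1, 4 and 9 for the three sample reports.
        For Each row As DataRow In BuildBookmarkTable(titles, pageCounts).Rows
            Console.WriteLine("{0} -> page {1}", row("Title"), row("PageNumber"))
        Next
    End Sub

End Module
```

Because the merge follows the order of the file list, the running total of page counts gives the starting page of each report, which is all the bookmark step needs.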
Code of interest:
vb Code:
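Private Function MergePdfFiles(ByVal pdfFileList As List(Of String), _
                               ByVal outputFileFullName As String) As Boolean
    Dim result As Boolean = False
    Dim pdfMerger As PDFMergerUtility = Nothing
    Dim fileCount As Integer = pdfFileList.Count
    If fileCount > 1 Then
        Try
            'Instantiate an instance of Pdf Merger Utility
            pdfMerger = New PDFMergerUtility()
            With pdfMerger
                'Set output destination
                .setDestinationFileName(outputFileFullName)
                'Loop through the file list and add each source to the merger
                '(the listing is cut off here; the rest follows the copies quoted in the replies below)
                For i As Integer = 0 To fileCount - 1
                    .addSource(pdfFileList(i))
                Next i
                .mergeDocuments()
                result = True
            End With
        Catch ex As Exception
            'Merge failed; result stays False
        End Try
    End If
    Return result
End Function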
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It
I tried implementing this in VB.Net console application but got the following error when running the application. The error occurred at the mergeDocuments call.
Error: destination PDF is encrypted, can't append encrypted PDF documents.
I used LinkedLists instead of List and modified the code to work for this collection type.
Could you tell me what may be going wrong? Do I need to give rights to some user/group on the source/destination folders?
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It
Hi,
I get an exception (NullReferenceException: object reference not set to an instance of an object) at mergeDocuments() of the PDFMergerUtility. Here is my code:
    'Create a pdf file list and add files to it
    Dim pdfList(2) As String
    pdfList(0) = "C:\reports\pdfFile1.pdf"
    pdfList(1) = "C:\reports\pdfFile2.pdf"
    Dim outFile As String = "C:\MergedPdf\temp_myMergedPdf.pdf"
    'Try to merge the pdf files
    If MergePdfFiles(pdfList, outFile) Then
        Console.WriteLine(" The files were merged!")
        Console.ReadLine()
    End If
End Sub
Private Function MergePdfFiles(ByVal pdfFileList As Array, _
                               ByVal outputFileFullName As String)
    Dim result As Boolean = False
    Dim fileCount As Integer = 2
    If fileCount > 1 Then
        Try
            'Instantiate an instance of Pdf Merger Utility
            Dim pdfMerger As New PDFMergerUtility
            With pdfMerger
                'Set output destination
                .setDestinationFileName(outputFileFullName)
                'Looping thru the file list and add source to the merger
                For i As Integer = 0 To fileCount - 1 Step 1
                    .addSource(pdfFileList(i))
                Next i
                'Merge the documents
                pdfMerger.mergeDocuments()
                result = True
            End With
        Catch ex As Exception
        End Try
    End If
    Return result
End Function
Now here's the catch: when I converted this application to VB.NET 2.0 running on another system, the above code worked!
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It
Originally Posted by tzmjoseph
I tried implementing this in VB.Net console application but got the following error when running the application. The error occurred at the mergeDocuments call.
Error: destination PDF is encrypted, can't append encrypted PDF documents.
I used LinkedLists instead of List and modified the code to work for this collection type.
Could you tell me what may be going wrong? Do I need to give rights to some user/group on the source/destination folders?
Thanks.
The error itself explains it all... It appears that one of your pdf files is either encrypted or password protected, and PDFBox can't read that file.
As for file access permissions, it should be just the standard stuff. That is, the account that runs the code needs read permission on each input file and write permission on the folder where the output file is written... If the input files and the output file reside in the same folder, then the account running the code needs both read and write permission on that folder.
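Both points can be checked before calling the merger. The following is a minimal sketch under stated assumptions, not something from the PDFBox API: `LooksEncrypted` and `CheckSources` are hypothetical helpers, and the encryption test is only a heuristic that looks for the `/Encrypt` entry an encrypted PDF carries in its trailer. It can give a false positive if that text happens to appear elsewhere in the file, so treat a hit as a hint about which file to inspect.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module PreflightSketch

    ' Heuristic, not a real PDF parse: an encrypted PDF has an /Encrypt
    ' entry in its trailer, so look for that token in the raw bytes.
    Public Function LooksEncrypted(ByVal pdfBytes As Byte()) As Boolean
        ' Latin-1 maps each byte to one character, so binary data survives the conversion.
        Dim raw As String = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes)
        Return raw.Contains("/Encrypt")
    End Function

    ' Hypothetical helper: returns one message per problem file.
    ' An empty list means no problems were found by these checks.
    Public Function CheckSources(ByVal pdfFileList As List(Of String)) As List(Of String)
        Dim problems As New List(Of String)
        For Each pdfPath As String In pdfFileList
            Try
                ' ReadAllBytes throws if the file is missing or this account cannot read it.
                If LooksEncrypted(File.ReadAllBytes(pdfPath)) Then
                    problems.Add(pdfPath & ": appears to be encrypted")
                End If
            Catch ex As Exception
                problems.Add(pdfPath & ": " & ex.Message)
            End Try
        Next
        Return problems
    End Function

    ' Usage: pass the source pdf paths as command-line arguments.
    Sub Main(ByVal args As String())
        For Each problem As String In CheckSources(New List(Of String)(args))
            Console.WriteLine(problem)
        Next
    End Sub

End Module
```

The sketch reads each file fully into memory, which is fine for ordinary reports but worth revisiting for very large files.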
Last edited by stanav; Jun 11th, 2007 at 02:51 PM.
Private Sub frmMergingPdf_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    'Create a pdf file list and add files to it
    Dim pdfList As New List(Of String)
    pdfList.Add("d:\file1.pdf")
    pdfList.Add("d:\file2.pdf")
    Dim outFile As String = "d:\Pdf.pdf"
    MergePdfFiles(pdfList, outFile)
End Sub
Private Function MergePdfFiles(ByVal pdfFileList As List(Of String), ByVal outputFileFullName As String)
    Dim result As Boolean = False
    Dim fileCount As Integer = 2
    If fileCount > 1 Then
        Try
            'Instantiate an instance of Pdf Merger Utility
            Dim pdfMerger As New PDFMergerUtility
            With pdfMerger
                .setDestinationFileName(outputFileFullName)
                For i As Integer = 0 To fileCount - 1 Step 1
                    .addSource(pdfFileList(i))
                Next i
                .mergeDocuments() 'Here am getting that error
                result = True
            End With
        Catch ex As Exception
        End Try
    End If
    Return result
End Function
Visual Studio.net 2010
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)
@viji: Sometimes PDFBox encounters internal errors that I can't fix (such as the one you're having; mergeDocuments() is a public member of the PDFMergerUtility class and we have no control over it). A better alternative is to use iTextSharp. I find it faster and more reliable.
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)
You'd use ADO.Net to read the data from your Excel file. There are plenty of examples of that on this website. Just search for something like "Excel ADO.Net" and you should get some hits. Once you've read the data from your xls file into your program, it's just a matter of building a list of files to be merged and passing it to the merge function.
Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it. - Abraham Lincoln -
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)
Originally Posted by szlamany
PDFBox seems like an interesting product...
Do you use it to initially create the PDF documents that you talk about in the first post here?
What else can the PDFBox product do?
PDFBox is used mainly for creating and manipulating pdf files on the fly. It's a pretty good product. However, I like iText/iTextSharp better because it is faster and doesn't add another 16MB of dependencies to my application as PDFBox does.
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)
I am having a problem using the code you show above. When I merge two copies of the same file I have no problem, but if I try to merge two different pdf files, PDFBox throws an exception and only the temp file is created.
The exception that is thrown is COSVisitorException.
Why can I merge two copies of the same file but not two different files?
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)
That exception is thrown by PDFBox itself, not by my code. My recommendation is to use iTextSharp instead, since I find it faster and more reliable for creating and manipulating pdf files. iTextSharp's footprint is also a lot smaller than PDFBox's. I myself have stopped using PDFBox and have converted all of my programs that used PDFBox to use iTextSharp.
Search this forum. I do have a thread or two on manipulating pdf files using iTextSharp.
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)
I wanted to thank you soooo much for this post!! I have been searching for weeks on how to merge an unknown number of pdf files and your code led me right down that path.
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)
This works perfectly on my local machine but when I moved the executable file and the .dll's to our production server the merged PDF has an error message when I open it. It says, "Could not find the XObject named 'XIPLAYER0'." Does anyone know why?
Any help or guidance I can get, would be appreciated.
Thanks!
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)
Hello,
I have a requirement to split an existing pdf file into multiple pdf files. Say one big pdf file contains 10 bills: I have to extract each bill, which may span multiple pages, into a separate pdf file.
I have information on which page each bill starts and which page bill ends.
The best friend of any programmer is a search engine
"Don't wish it was easier, wish you were better. Don't wish for less problems, wish for more skills. Don't wish for less challenges, wish for more wisdom" (J. Rohn)
“They did not know it was impossible so they did it” (Mark Twain)
Re: VB.Net - Merge Pdf Files and Add Bookmarks to It (Using PDFBox)
Originally Posted by KipoyRavena
Hi how to do it on visual basic 6.0?
Short answer: you don't.
Slightly longer answer: You might be able to if you build the .NET version with the "Make COM Visible" (or something like that) option turned on. Then you might (might being the operative word) be able to reference it in VB6. But... that just feels like working with a house of cards.