This thread was originally about extracting and merging PDF files using iTextSharp. Over time, however, I have added a lot more code to do other things and put it all together into a handy class called PdfManipulation. There are 2 classes, listed below (choose the one that matches the iTextSharp version you're using):
1. The original PdfManipulation.vb class is based on iTextSharp version 4. This class is obsolete and no longer maintained.
2. The updated PdfManipulation2.vb class is for the newer iTextSharp version 5. This class also contains a lot more methods than the original one, and I highly recommend it over the old one. I will update this class from time to time to fix bugs and/or add more functionality. Consider it a work in progress. >>>> Last updated on 4/9/2012 <<<<
Please verify the version of iTextSharp you're using and download the correct class.
The current version of PdfManipulation2 class supports AES_256 encryption provided that your itextsharp.dll version is 5.1.x or higher.
Below is the list of public methods in the new PdfManipulation2 class:
vb.net Code:
'Remove all restrictions from a pdf file
Public Shared Function RemoveRestrictions(ByVal restrictedPdf As String, Optional ByVal password As String = Nothing, Optional ByVal saveABackup As Boolean = True) As Boolean
'Parse text from a specified range of pdf pages
Public Shared Function ParsePdfText(ByVal sourcePDF As String, _
Optional ByVal fromPageNum As Integer = 0, _
Optional ByVal toPageNum As Integer = 0) As String
'Parse all text from a pdf
Public Shared Function ParseAllPdfText(ByVal sourcePDF As String) As Dictionary(Of Integer, String)
'Page-by-page comparison of 2 pdf files, writing the differences to a resulting text file
Public Shared Sub ComparePdfs(ByVal pdf1 As String, ByVal pdf2 As String, _
ByVal resultFile As String, _
Optional ByVal fromPageNum As Integer = 0, _
Optional ByVal toPageNum As Integer = 0)
'Extract specified pages from a pdf to create a new pdf
Public Shared Sub ExtractPdfPages(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)
'Split a pdf into specified number of pdfs
Public Shared Sub SplitPdfByParts(ByVal sourcePdf As String, ByVal parts As Integer, ByVal baseNameOutPdf As String)
'Split a pdf into multiple pdfs each containing a specified number of pages.
Public Shared Sub SplitPdfByPages(ByVal sourcePdf As String, ByVal numOfPages As Integer, ByVal baseNameOutPdf As String)
'Extract pages from multiple source pdfs and merge into a final pdf
Public Shared Sub ExtractAndMergePdfPages(ByVal sourceTable As DataTable, ByVal outPdf As String)
'Set security password on an existing pdf file
Public Shared Sub SetSecurityPasswords(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal userPassword As String, ByVal ownerPassword As String)
'Add watermark to pdf pages using an image
Public Shared Sub AddWatermarkImage(ByVal sourceFile As String, ByVal outputFile As String, ByVal watermarkImage As String)
'Add watermark to all pdf pages using text
Public Shared Sub AddWatermarkText(ByVal sourceFile As String, ByVal outputFile As String, ByVal watermarkText() As String, _
Optional ByVal watermarkFont As iTextSharp.text.pdf.BaseFont = Nothing, _
Optional ByVal watermarkFontSize As Single = 48, _
Optional ByVal watermarkFontColor As iTextSharp.text.BaseColor = Nothing, _
Optional ByVal watermarkFontOpacity As Single = 0.3F, _
Optional ByVal watermarkRotation As Single = 45.0F)
'Merge multiple pdfs into a single one.
Public Shared Function MergePdfFiles(ByVal pdfFiles() As String, ByVal outputPath As String, _
Optional ByVal authorName As String = "", _
Optional ByVal creatorName As String = "", _
Optional ByVal subject As String = "", _
Optional ByVal title As String = "", _
Optional ByVal keywords As String = "") As Boolean
'Merge multiple pdf's into one with all bookmarks preserved
Public Shared Function MergePdfFilesWithBookmarks(ByVal sourcePdfs() As String, ByVal outputPdf As String) As Boolean
'Add document outline (bookmarks) to a pdf
Public Shared Sub AddDocumentOutline(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal outlineTable As System.Data.DataTable)
'Extract urls from a pdf
Public Shared Function ExtractURLs(ByVal sourcePdf As String, Optional ByVal pageNumbers() As Integer = Nothing) As System.Data.DataTable
'Extract images from a pdf
Public Shared Function ExtractImages(ByVal sourcePdf As String) As List(Of Image)
'Fill a form
Public Shared Sub FillAcroForm(ByVal sourcePdf As String, ByVal fieldData As DataRow, ByVal outputPdf As String)
Public Shared Sub FillMyForm(ByVal sourcePdf As String, ByVal fieldData As DataRow, ByVal outputPdf As String)
'Add annotation
Public Shared Sub AddTextAnnotation(ByVal sourcePdf As String, ByVal outputPdf As String)
Public Shared Function GetAcroFieldData(ByVal sourcePdf As String) As Dictionary(Of String, String)
Public Shared Function GetPdfSummary(ByVal sourcePdf As String) As DataTable
Public Shared Function ReplacePagesWithBlank(ByVal sourcePdf As String, _
ByVal pagesToReplace As List(Of Integer), _
ByVal outPdf As String, _
Optional ByVal templatePdf As String = "") As Boolean
Public Shared Function InsertPages(ByVal sourcePdf As String, _
ByVal pagesToInsert As Dictionary(Of Integer, iTextSharp.text.pdf.PdfImportedPage), _
ByVal outPdf As String) As Boolean
Public Shared Function RemovePages(ByVal sourcePdf As String, ByVal pagesToRemove As List(Of Integer), ByVal outputPdf As String) As Boolean
'A demo on how to draw various shapes in itextsharp
Public Shared Sub DrawShapesDemo(ByVal sourcePdf As String, ByVal outputPdf As String)
Public Shared Sub AddImageToPage(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal imgPath As String, ByVal imgLocation As Point, ByVal imgSize As Size, Optional ByVal pages() As Integer = Nothing)
Any comments are welcome.
Happy coding
Stanav.
Last edited by stanav; Apr 9th, 2012 at 02:36 PM.
Reason: New version of PdfManipulation2 class now supports AES-256 encryption
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Stanav ... thanks for posting these code samples. They helped me on a project that I am currently working on. I would like to request that you post another sample: I need to be able to extract specified pages from multiple documents and save them to one combined PDF, i.e. take pages 3 & 7 from Doc1.pdf, 4-6 from Doc2.pdf, and 1, 5 & 12 from Doc3.pdf and save them in Doc4.pdf. Is this "do-able"?
Last edited by nbrege; Dec 14th, 2007 at 11:36 AM.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Yes, it's doable. However, I'm on vacation right now and do not have access to my work computer, which has all the tools I need to write code. What you can do right now is create a function that returns a hashtable or a dictionary with the file names (string) as the keys and the pages to extract (integer array) as the values. Once you have this hashtable/dictionary, you can modify the ExtractPdfPage sub so that it creates a single new pdf file and then loops through the hashtable/dictionary to extract the pages and add them to the output pdf. It's just a matter of setting up the loop so that each iteration reads one entry and extracts the pages from that file.
If you can wait until later this week when I return to work, I can try to come up with something for you in code.
Best regards,
Stanav.
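The loop described in the reply above can be sketched roughly as follows. This is only a minimal illustration, not the method that was later added to the attached class: the module name MergeSketch and the sub name ExtractAndMerge are made up for the example, and it uses only the PdfReader, Document and PdfCopy calls that already appear in the ExtractPdfPage code elsewhere in this thread. Note that a Dictionary holds each file name only once and does not formally guarantee enumeration order, so use an ordered list of entries if the output order matters.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Public Module MergeSketch
    ' Hypothetical helper: copies the requested pages of every source pdf
    ' into a single output pdf. Keys are source file paths, values are the
    ' 1-based page numbers to take from that file.
    Public Sub ExtractAndMerge(ByVal pagesByFile As Dictionary(Of String, Integer()), _
                               ByVal outPdf As String)
        Dim readers As New List(Of PdfReader)
        Dim doc As New Document()
        Try
            Dim pdfCpy As New PdfCopy(doc, New FileStream(outPdf, FileMode.Create))
            doc.Open()
            For Each entry As KeyValuePair(Of String, Integer()) In pagesByFile
                Dim reader As New PdfReader(entry.Key)
                readers.Add(reader)
                For Each pageNum As Integer In entry.Value
                    ' Import the page from this source and append it to the output
                    pdfCpy.AddPage(pdfCpy.GetImportedPage(reader, pageNum))
                Next
            Next
            ' Closing the document also flushes and closes the output stream
            doc.Close()
        Finally
            ' Readers are kept open until the output document has been closed
            For Each r As PdfReader In readers
                r.Close()
            Next
        End Try
    End Sub
End Module
```

For the request above, the dictionary would contain Doc1.pdf with {3, 7}, Doc2.pdf with {4, 5, 6} and Doc3.pdf with {1, 5, 12}, with Doc4.pdf as the output path.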
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
If you could post a quick code example when you get back that would help me immensely and may be of help to others trying to do the same thing. Enjoy the rest of your vacation...
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by nbrege
If you could post a quick code example when you get back that would help me immensely and may be of help to others trying to do the same thing. Enjoy the rest of your vacation...
I've added a method to do what you need. Since the total text is more than 1000 characters, I had to put all the code into a class (PdfManipulation.vb) and post it as an attachment. Hope it helps.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by gaigoi113
Hi Stanav,
Do you have any code sample that will convert pdf to multipage tiff? - thanks
It's impossible to use iTextSharp to convert pdf to multipage tiff. However, you can use PDFBox to convert each pdf page to an image file (it only outputs to jpg's or png's), then merge these images into a multipage tiff.
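The second half of that suggestion (merging the per-page images into one multipage tiff) can be done with System.Drawing alone. The sketch below assumes the page images have already been produced by some other tool and are passed in page order; the module and sub names are invented for the example, and it has not been tested against PDFBox output specifically.

```vbnet
Imports System
Imports System.Drawing
Imports System.Drawing.Imaging

Public Module TiffSketch
    ' Hypothetical helper: writes the given image files, in order, as the
    ' pages of a single multipage tiff.
    Public Sub MergeImagesToTiff(ByVal imageFiles() As String, ByVal outTiff As String)
        If imageFiles Is Nothing OrElse imageFiles.Length = 0 Then
            Throw New ArgumentException("No image files supplied")
        End If

        ' Locate the built-in TIFF encoder
        Dim tiffCodec As ImageCodecInfo = Nothing
        For Each codec As ImageCodecInfo In ImageCodecInfo.GetImageEncoders()
            If codec.MimeType = "image/tiff" Then tiffCodec = codec
        Next
        If tiffCodec Is Nothing Then
            Throw New InvalidOperationException("TIFF encoder not found")
        End If

        Dim saveFlag As System.Drawing.Imaging.Encoder = System.Drawing.Imaging.Encoder.SaveFlag
        Using encParams As New EncoderParameters(1)
            Using firstPage As Image = Image.FromFile(imageFiles(0))
                ' The first frame creates the file in multi-frame mode
                encParams.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.MultiFrame))
                firstPage.Save(outTiff, tiffCodec, encParams)

                ' Every further image is appended as a new page
                encParams.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.FrameDimensionPage))
                For i As Integer = 1 To imageFiles.Length - 1
                    Using nextPage As Image = Image.FromFile(imageFiles(i))
                        firstPage.SaveAdd(nextPage, encParams)
                    End Using
                Next

                ' Flush finalizes the multipage file
                encParams.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.Flush))
                firstPage.SaveAdd(encParams)
            End Using
        End Using
    End Sub
End Module
```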
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi,
I'm trying to extract a single page from a multi page pdf and I'm using the code below; however, I'm getting an error that it's not recognizing <param name>. Any help would be great. Thanks.
Code:
''' <summary>
''' Extract a single page from source pdf to a new pdf
''' </summary>
<param name="sourcePdf">"C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40.pdf"</param>
<param name="pageNumberToExtract">"P1T1"</param>
<param name="outPdf">"C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40a.pdf"</param>
''' <remarks></remarks>
Public Shared Sub ExtractPdfPage(ByVal sourcePdf As String, ByVal pageNumberToExtract As Integer, ByVal outPdf As String)
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Dim doc As iTextSharp.text.Document = Nothing
    Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
    Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
    Try
        reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
        doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
        pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create))
        doc.Open()
        page = pdfCpy.GetImportedPage(reader, pageNumberToExtract)
        pdfCpy.AddPage(page)
        doc.Close()
        reader.Close()
    Catch ex As Exception
        Throw ex
    End Try
End Sub
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Why are you putting your arguments in the code comments? That's not how you do it. You need to call the sub and pass in your arguments, something like this:
vb.net Code:
'Specify the path to the source pdf file
Dim sourcePdf As String = "C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40.pdf"
'Extract page # 2 from the above pdf file
Dim pageNumberToExtract As Integer = 2
'And then save it to a new pdf named 'table40_page2.pdf'
Dim outputPdf As String = "C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40_page2.pdf"
'Call the sub somewhere in your program, passing in the above arguments
PdfManipulation.ExtractPdfPage(sourcePdf, pageNumberToExtract, outputPdf)
Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it. - Abraham Lincoln -
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by slow&steady
Stanav :
I have tried iTextSharp for putting a watermark on pdfs. It worked fine.
Now I am trying to change the header on existing pdf files to a desired header.
Is that possible?
If it's possible, then I want to use it on a bunch of pdf files in one single folder.
Thanks for the help
Sri
Yes, it's possible to add/change the header/footer of an existing pdf file and save the result to a new file. Please post your question in the VB.Net forum because it's a different subject and doesn't belong in this code bank thread.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by vijy
Hi Stanav,
Is it possible to extract the PDF pages with bookmarks?
Yes, I THINK it is quite possible, but it would involve much more work (obviously). I gave it a shot, as seen in the code below, but frankly the method I was using only works to some extent: it only preserves the 1st-level bookmarks. My approach was to export the bookmarks in the original pdf to a collection, select the pages to be extracted from the reader, and use PdfStamper to copy the original pdf (now with only the selected pages) to a new pdf. Since PdfStamper automatically preserves ALL the bookmarks from the original, I had to edit the bookmark collection to remove the unused ones. This approach should work, but I don't know why it only preserves 1st-level bookmarks. Some more work is needed to work that bug out, but I don't have the time right now. I will post just what I have so far.
vb.net Code:
''' <summary>
''' Extract pages from an existing pdf file to create a new pdf with bookmarks preserved
''' </summary>
''' <param name="sourcePdf">full path to the source pdf</param>
''' <param name="pageNumbersToExtract">an integer array containing the page numbers of the pages to be extracted</param>
''' <param name="outPdf">the full path to the output pdf</param>
''' <remarks></remarks>
Public Shared Sub ExtractPdfPages(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)
    Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Dim outlines As System.Collections.ArrayList = Nothing
    Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
    Dim stamper As iTextSharp.text.pdf.PdfStamper = Nothing
    Dim hshTable As System.Collections.Hashtable = Nothing
    Try
        raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
        reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
        ' ... (part of the posted listing is missing here) ...
                    Dim value As String = DirectCast(bookmark.Item("Page"), String)
                    If Not String.IsNullOrEmpty(value) Then
                        Dim parts() As String = value.Split(" "c)
                        If parts.Length > 0 Then
                            Dim pageNum As Integer = -1
                            If Integer.TryParse(parts(0), pageNum) Then
                                Dim idx As Integer = System.Array.IndexOf(pagesToKeep, pageNum)
                                If idx < 0 Then
                                    'The bookmark points to a page that was not kept
                                    bookmarks.Remove(obj)
                                Else
                                    'Renumber the bookmark to the page's new position
                                    parts(0) = (idx + 1).ToString
                                    value = String.Join(" ", parts)
                                    bookmark.Item("Page") = value
                                End If
                            End If
                        End If
                    End If
                End If
            End If
        Next
End Sub
Another approach I thought of was to export the original bookmarks to an XML file and edit that file. Once done, import it back into the new pdf file (which contains only the extracted pages). But like I said, I currently do not have a lot of free time to play with it, so I leave it to you to try.
Good luck.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi Stanav,
First, nice work; your example helped me a lot. But I have a question:
I'm using "SplitPdfByPages" and it is working OK, but is there any reason why the extracted pdfs end up larger than the original, which has 5 pages?
Ex.:
Original pdf with 5 pages (72KB)
I extract the 5 pages with your example code, and each page ends up at 85KB
Is there any way to compress the extracted pages? Or some reason for this?
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi,
I have used the "SplitPdfByPages" method, but I pass a URL (http://localhost:1870/PDFWCFService/1.pdf) for splitting. It returns the following error: "Uri format is not supported".
Please give the solutions for the above problem. Please do the needful.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by prabakarank
Hi,
I have used the "SplitPdfByPages" method, but I pass a URL (http://localhost:1870/PDFWCFService/1.pdf) for splitting. It returns the following error: "Uri format is not supported".
Please give the solutions for the above problem. Please do the needful.
Download the file and save it to a temp location first. After that, you can split it as usual. If you don't need the original pdf once the splitting is done, you can delete it.
To download a file from a URL, you can use a WebClient or simply use
My.Computer.Network.DownloadFile(url, saveLocation).
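Put together, the download-then-split steps might look like the sketch below. It assumes the PdfManipulation2 class from the first post is part of the project and calls its SplitPdfByPages method with the signature listed there; the module name, the sub name SplitPdfFromUrl and the temp file naming are invented for the example. Deleting the temp file assumes SplitPdfByPages has released it by the time it returns.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Public Module SplitFromUrlSketch
    ' Hypothetical helper: downloads a pdf from a URL to a temp file,
    ' splits the local copy, then removes the temp file.
    Public Sub SplitPdfFromUrl(ByVal url As String, ByVal numOfPages As Integer, _
                               ByVal baseNameOutPdf As String)
        ' Unique temp file name so parallel calls do not collide
        Dim tempPdf As String = Path.Combine(Path.GetTempPath(), _
                                             Guid.NewGuid().ToString("N") & ".pdf")
        Try
            Using client As New WebClient()
                client.DownloadFile(url, tempPdf)
            End Using
            ' baseNameOutPdf must be a physical file path, not a URL
            PdfManipulation2.SplitPdfByPages(tempPdf, numOfPages, baseNameOutPdf)
        Finally
            If File.Exists(tempPdf) Then File.Delete(tempPdf)
        End Try
    End Sub
End Module
```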
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi ,
I need to pass parameters like this ("http://localhost:1870/PDFWCFService/1.pdf",1,"http://localhost:1870/PDFWCFService/2.pdf") to the SplitPdfByPages method.
The output file is also given as a URL.
It returns the following error: "Uri format is not supported".
Please give the solutions for the above problem. Please do the needful.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
You need to supply the physical file paths... There's no way around it because we rely on iTextSharp to do the work, and if iTextSharp doesn't support it, there's not much we can do about it.
However, that is not the real problem. The problem is with your methodology. While you can access (download) a file from a URL, you cannot upload a file using a URL. If you are going to run the splitting task on any PC, you will need to download the file to the local PC, split it and then upload the results back. If you're going to run the splitting task on the server that hosts your web site, you have to give it the direct physical paths and not the URLs. You cannot treat a URL the same as a conventional file path.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Below is the code. I converted it from VB.NET to C#.
iTextSharp.text.pdf.PdfReader reader = null;
iTextSharp.text.Document doc = null;
iTextSharp.text.pdf.PdfCopy pdfCpy = null;
iTextSharp.text.pdf.PdfImportedPage page = null;
int pageCount = 0;
try
{
    reader = new iTextSharp.text.pdf.PdfReader(sourcePdf);
    pageCount = reader.NumberOfPages;
    if (pageCount < numOfPages)
    {
        return -1;
        throw new ArgumentException("Not enough pages in source pdf to split");
    }
    else
    {
        string ext = System.IO.Path.GetExtension(baseNameOutPdf);
        string outfile = string.Empty;
        int n = Convert.ToInt32(Math.Ceiling(Convert.ToDouble(pageCount) / Convert.ToDouble(numOfPages)));
        int currentPage = 1;
        for (int i = 1; i <= n; i++)
        {
            outfile = baseNameOutPdf.Replace(ext, "_" + i + ext);
            doc = new iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage));
            //pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(outfile, System.IO.FileMode.Create));
            pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, new System.IO.FileStream(outfile, System.IO.FileMode.Create));
            //pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, System.Net.HttpWebRequest.Create(outfile).GetResponse().GetResponseStream());
            doc.Open();
            if (i < n)
            {
                for (int j = 1; j <= numOfPages; j++)
                {
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi, I uploaded the pdf file. Please check the application with this PDF file.
It is a 3-page pdf file. The first page is split successfully. When the second page is split, it gives the following error: "Unable to cast object of type 'iTextSharp.text.pdf.PdfArray' to type 'iTextSharp.text.pdf.PRIndirectReference'."
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi, for me it's not working. Please tell me which version of the iTextSharp dll you used.
I have used "itextsharp-5.0.2-dll".
Please check once again whether it's working or not, and please make sure that
all the split pdf files are created.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by prabakarank
Hi, for me it's not working. Please tell me which version of the iTextSharp dll you used.
I have used "itextsharp-5.0.2-dll".
Please check once again whether it's working or not, and please make sure that
all the split pdf files are created.
I've uploaded the new PdfManipulation2 class which works with itextsharp 5.0.2.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by prabakarank
Hi..
I have one question. Is it possible to set a password for each split pdf file?
Please tell me how we can do this.
I don't know of any way to set passwords on the split pdfs on the fly. However, you can certainly do it in a 2nd pass.
1st pass: split the pdf as usual.
2nd pass: use the PdfEncryptor.Encrypt method to set the user and/or owner passwords on those newly split pdfs. You can do this in a separate method after the splitting is done, or you can set the password on each split pdf right after it is created. The 2nd approach is preferred. It's just a few extra lines of code. If you have trouble figuring it out, let me know.
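A minimal sketch of that 2nd pass is shown below, assuming iTextSharp 5.x. The module name, the sub name EncryptInPlace and the ".tmp" suffix are invented for the example, and the permission flag (printing only) is just one possible choice. The PdfEncryptor.Encrypt overload used here takes a reader, an output stream, a 128-bit strength flag, the two passwords and the permissions; please verify it against the dll version you have. The SetSecurityPasswords method listed in the first post is another way to get the same result.

```vbnet
Imports System.IO
Imports iTextSharp.text.pdf

Public Module EncryptSketch
    ' Hypothetical helper: password-protects an existing pdf by writing an
    ' encrypted copy next to it and then replacing the original file.
    Public Sub EncryptInPlace(ByVal pdfPath As String, ByVal userPassword As String, _
                              ByVal ownerPassword As String)
        Dim tempPath As String = pdfPath & ".tmp"
        Dim reader As New PdfReader(pdfPath)
        Try
            Using fs As New FileStream(tempPath, FileMode.Create)
                ' True requests 128-bit encryption; only printing is allowed here
                PdfEncryptor.Encrypt(reader, fs, True, userPassword, ownerPassword, _
                                     PdfWriter.ALLOW_PRINTING)
            End Using
        Finally
            reader.Close()
        End Try
        ' Swap the encrypted copy in for the unencrypted original
        File.Delete(pdfPath)
        File.Move(tempPath, pdfPath)
    End Sub
End Module
```

Calling EncryptInPlace on each output file right after it is written corresponds to the preferred variant described above.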
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by prabakarank
Hi,
I got the below error.
"PdfReader not opened with owner password"
What do we have to do to resolve the issue?
Thanks
1. You need to know the owner password of the pdf you're working on.
2. Use the 2nd overload of the PdfReader class constructor, which allows you to supply the owner password as a byte array when you create a PdfReader object. Something like this:
Code:
Dim ownerPwd As String = "put the owner password here"
Dim pwdBytes() As Byte = System.Text.Encoding.Default.GetBytes(ownerPwd)
Dim reader As New iTextSharp.text.pdf.PdfReader(sourcePDF, pwdBytes)
The rest of the code is the same.
3. If you forget the owner password for some reason, you will have to remove all restrictions on that pdf using the RemoveRestrictions method and save the new unrestricted pdf to a temp location. You then can work on that temporary unrestricted pdf as normal. When done, delete it if you don't want to keep it.
Last edited by stanav; Oct 8th, 2010 at 08:13 AM.