Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by blofvendahl
Hey Stanav,
Which method in your class, if any, can be used to extract bookmark info from a pdf?
thanks
Brian
You can use the SimpleBookmark class to extract all the bookmarks in a pdf and, if you want to, export them to an XML file. Here's how you do it:
Code:
Public Shared Function ExportBookmarksToXML(ByVal sourcePdf As String, ByVal outputXML As String) As Boolean
    Dim result As Boolean = False
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Try
        reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
        'GetBookmark returns Nothing when the pdf has no bookmarks
        Dim bookmarks As System.Collections.Generic.List(Of System.Collections.Generic.Dictionary(Of String, Object)) = SimpleBookmark.GetBookmark(reader)
        If bookmarks IsNot Nothing Then
            Using outFile As New IO.StreamWriter(outputXML)
                'onlyASCII = True escapes every non-ASCII character in the output
                SimpleBookmark.ExportToXML(bookmarks, outFile, "ISO8859-1", True)
            End Using
            result = True
        End If
    Catch ex As Exception
        Throw New ApplicationException(ex.Message, ex)
    Finally
        'Close the reader even if the export fails
        If reader IsNot Nothing Then reader.Close()
    End Try
    Return result
End Function
I'm also working on a method to merge pdf files with all bookmarks preserved. However, it works only with bookmarks that use a page number as the destination. Bookmarks that use a named destination get broken by the merge (that is, you still see all the bookmarks, but they don't go to a destination when clicked). That's why I'm not posting the solution yet.
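For anyone who needs more than the XML export: as far as I know, each entry returned by SimpleBookmark.GetBookmark is a dictionary with "Title" and "Page" keys, and nested bookmarks sit under a "Kids" key that holds another list of the same shape. Below is a minimal sketch of walking that tree, assuming that layout. FlattenBookmarks and the hand-built sample tree are made up for illustration, so the sketch runs without iTextSharp.

```vbnet
Imports System
Imports System.Collections.Generic

Module BookmarkWalkSketch
    ' Flattens a bookmark tree into "level|title|page" entries.
    ' Assumption: entries use the "Title", "Page" and "Kids" keys.
    Public Function FlattenBookmarks(ByVal bookmarks As IEnumerable(Of Dictionary(Of String, Object)), Optional ByVal level As Integer = 1) As List(Of String)
        Dim entries As New List(Of String)
        If bookmarks Is Nothing Then Return entries
        For Each bm As Dictionary(Of String, Object) In bookmarks
            Dim title As String = If(bm.ContainsKey("Title"), CStr(bm("Title")), "")
            Dim page As String = If(bm.ContainsKey("Page"), CStr(bm("Page")), "")
            ' "Page" looks like "5 Fit" or "5 XYZ 0 792 0"; keep only the number
            Dim pageNumber As String = page.Split(" "c)(0)
            entries.Add(level.ToString() & "|" & title & "|" & pageNumber)
            If bm.ContainsKey("Kids") Then
                ' Child bookmarks are a nested list of the same dictionaries
                Dim kids As IEnumerable(Of Dictionary(Of String, Object)) = TryCast(bm("Kids"), IEnumerable(Of Dictionary(Of String, Object)))
                entries.AddRange(FlattenBookmarks(kids, level + 1))
            End If
        Next
        Return entries
    End Function

    Sub Main()
        ' Hand-built stand-in for what SimpleBookmark.GetBookmark(reader) returns
        Dim subSection As New Dictionary(Of String, Object)
        subSection("Title") = "Section 1.1.1"
        subSection("Page") = "7 Fit"

        Dim section As New Dictionary(Of String, Object)
        section("Title") = "Section 1.1"
        section("Page") = "3 XYZ 0 792 0"
        section("Kids") = New List(Of Dictionary(Of String, Object)) From {subSection}

        Dim chapter As New Dictionary(Of String, Object)
        chapter("Title") = "Chapter 1"
        chapter("Page") = "1 Fit"
        chapter("Kids") = New List(Of Dictionary(Of String, Object)) From {section}

        Dim tree As New List(Of Dictionary(Of String, Object)) From {chapter}
        For Each entry As String In FlattenBookmarks(tree)
            Console.WriteLine(entry)
        Next
    End Sub
End Module
```

With a real pdf you would pass the list returned by SimpleBookmark.GetBookmark(reader) instead of the sample tree. The level number and page number of each entry are then enough to work out where a bookmark's page range starts and ends.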
Last edited by stanav; Oct 12th, 2010 at 07:45 AM.
Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it. - Abraham Lincoln -
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi Stanav,
I would like to know how I can put 10 .jpg images into one pdf, and what code I could use for it. I am using ASP.NET on the server side.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Your question is not related to the current thread at all. Please make a new post in VB.Net forum. Make sure you describe the question clearly too.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Thanks Stanav. The SimpleBookmark solution worked great. I'll keep checking back for your method to merge PDFs with all their bookmarks. That'll really come in handy.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
This is the updated method for merging pdf files with all the bookmarks preserved. It is also available in the PdfManipulation2 class.
Code:
Public Shared Function MergePdfFilesWithBookmarks(ByVal sourcePdfs() As String, ByVal outputPdf As String) As Boolean
    Dim result As Boolean = False
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Dim pdfDoc As iTextSharp.text.Document = Nothing 'the output pdf document
    Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
    Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
    Dim pageCount As Integer = 0 'number of pages in the current pdf
    Dim totalPages As Integer = 0 'number of pages so far in the merged pdf
    Dim bookmarks As New System.Collections.Generic.List(Of System.Collections.Generic.Dictionary(Of String, Object))
    Dim tempBookmarks As System.Collections.Generic.List(Of System.Collections.Generic.Dictionary(Of String, Object)) = Nothing
    ' Must have more than 1 source pdf to merge
    If sourcePdfs.Length > 1 Then
        Try
            For i As Integer = 0 To sourcePdfs.GetUpperBound(0)
                reader = New iTextSharp.text.pdf.PdfReader(sourcePdfs(i))
                'Replace local named destinations with the actual destinations so the links survive the merge
                reader.ConsolidateNamedDestinations()
                pageCount = reader.NumberOfPages
                tempBookmarks = SimpleBookmark.GetBookmark(reader)
                If i = 0 Then
                    'The first pdf sets the page size and creates the output document
                    pdfDoc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
                    pdfCpy = New iTextSharp.text.pdf.PdfCopy(pdfDoc, New System.IO.FileStream(outputPdf, IO.FileMode.Create))
                    pdfDoc.Open()
                ElseIf tempBookmarks IsNot Nothing Then
                    'Shift this pdf's bookmarks past the pages already merged
                    SimpleBookmark.ShiftPageNumbers(tempBookmarks, totalPages, Nothing)
                End If
                totalPages += pageCount
                If tempBookmarks IsNot Nothing Then
                    bookmarks.AddRange(tempBookmarks)
                End If
                For n As Integer = 1 To pageCount
                    page = pdfCpy.GetImportedPage(reader, n)
                    pdfCpy.AddPage(page)
                Next
                reader.Close()
            Next
            'Only write an outline when at least one source pdf had bookmarks
            If bookmarks.Count > 0 Then
                pdfCpy.Outlines = bookmarks
            End If
            pdfDoc.Close()
            result = True
        Catch ex As Exception
            Throw New ApplicationException(ex.Message, ex)
        End Try
    End If
    Return result
End Function
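To make the page shifting in the merge easier to follow, here is a rough sketch of what shifting does to a bookmark list. It is an illustration only, not iTextSharp's implementation: ShiftPages is a made-up helper, and it assumes each bookmark keeps its target in a "Page" value such as "3 Fit" and its children under "Kids". Bookmarks from the second pdf must point past the pages of the first, so a bookmark to page 3 becomes a bookmark to page 13 when the first pdf has 10 pages.

```vbnet
Imports System
Imports System.Collections.Generic

Module ShiftSketch
    ' Illustration only: adds pageShift to the page number at the start of
    ' every "Page" value ("3 Fit" -> "13 Fit") and recurses into "Kids".
    Public Sub ShiftPages(ByVal bookmarks As IEnumerable(Of Dictionary(Of String, Object)), ByVal pageShift As Integer)
        If bookmarks Is Nothing Then Return
        For Each bm As Dictionary(Of String, Object) In bookmarks
            If bm.ContainsKey("Page") Then
                Dim page As String = CStr(bm("Page")).Trim()
                Dim idx As Integer = page.IndexOf(" "c)
                ' Split "3 XYZ 0 792 0" into the number and the rest
                Dim numberPart As String = If(idx < 0, page, page.Substring(0, idx))
                Dim rest As String = If(idx < 0, "", page.Substring(idx))
                Dim pageNumber As Integer
                If Integer.TryParse(numberPart, pageNumber) Then
                    bm("Page") = (pageNumber + pageShift).ToString() & rest
                End If
            End If
            If bm.ContainsKey("Kids") Then
                ShiftPages(TryCast(bm("Kids"), IEnumerable(Of Dictionary(Of String, Object))), pageShift)
            End If
        Next
    End Sub

    Sub Main()
        ' Bookmarks of the second pdf; the first pdf is assumed to have 10 pages
        Dim child As New Dictionary(Of String, Object)
        child("Title") = "Details"
        child("Page") = "5 XYZ 0 792 0"

        Dim parent As New Dictionary(Of String, Object)
        parent("Title") = "Overview"
        parent("Page") = "3 Fit"
        parent("Kids") = New List(Of Dictionary(Of String, Object)) From {child}

        Dim secondPdfBookmarks As New List(Of Dictionary(Of String, Object)) From {parent}
        ShiftPages(secondPdfBookmarks, 10)

        Console.WriteLine(CStr(parent("Page")))
        Console.WriteLine(CStr(child("Page")))
    End Sub
End Module
```

A shift like this can only work on destinations that carry a page number, which is why the merge calls ConsolidateNamedDestinations on each reader before reading its bookmarks.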
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Dear Team,
I split the PDF successfully with your code. However, the original PDF file contains some hyperlinks, and they are removed when the PDF is split. Is there any way to keep the hyperlinks in the split PDFs?
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Can you upload a sample pdf so that I can test it out myself? I'm not promising anything, but if I have a sample file and figure out what the problem is, I may be able to find a solution for you.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by prabakarank
Hi..
I have attached a sample PDF for your reference. It contains two hyperlinks.
If we split that PDF, the hyperlinks are removed.
The sample pdf you uploaded has only 1 page with no hyperlinks. It also appears to me that this is a scanned pdf (one that is created by scanning a document through a scanner). You cannot do much with this kind of pdf file.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi,
In this PDF, there is a web address at the bottom of the page (www.craneyhill.com). If you click it, it opens the craneyhill site.
There is also a logo on the right side of the page (CRANNEY HILL KENNEL). If you click that logo, it opens the site as well. Please check this.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi.
I use the code below to check whether any link annotation is present.
In my previous post I attached 2.pdf. When it is parsed, the page contains two Link annotations.
After splitting, it has no hyperlinks (the Link annotations do not persist). Please help me, it is urgent.
PdfReader reader = new PdfReader(sourcePdf);
FileStream fs = new FileStream(outputPdf, System.IO.FileMode.Open, System.IO.FileAccess.Write);
PdfStamper stamper = new PdfStamper(reader, fs);
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Your sample pdf has only 1 page... How am I supposed to test splitting it? The only option for me to test splitting this file is to use the SplitByPages method and specify the number of pages to split = 1. The hyperlinks work fine after splitting. For further testing, you need to provide me a sample file with more than 1 page.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Stanav,
Thanks for posting this! I have a question if you have time...
In your Public Shared Function InsertPages, you mention "To create the pagesToInsert dictionary, you can use the iTextSharp.text.pdf.PdfCopy class to open an existing pdf file and call the GetImportedPage method".
I am new to this, can you please provide an example of that.
I have been trying to do it but I keep getting an error that my PDF is in use.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
OK... Suppose you have a pdf file named "pdf1" that you want to insert some pages into. Those pages are in another pdf file called "pdf2". So you need to get the pages you want from pdf2, add them to a dictionary and then call the InsertPages method to insert these pages from pdf2 into pdf1.
1. Let's say you need pages 2, 3 and 5 from pdf2, to be inserted as pages 6, 9 and 4 in pdf1. So the 1st thing you need to do is create that dictionary:
Code:
'Create the dictionary
Dim pdf2 As String = "path to your pdf2 file here"
Dim reader2 As New iTextSharp.text.pdf.PdfReader(pdf2)
Dim doc2 As New iTextSharp.text.Document(reader2.GetPageSizeWithRotation(1))
Dim pdfCpy As New iTextSharp.text.pdf.PdfCopy(doc2, New IO.MemoryStream())
Dim pageDict As New Dictionary(Of Integer, iTextSharp.text.pdf.PdfImportedPage)
'Get pages 2, 3 and 5 from pdf2 and add them to the dictionary with keys 6, 9 and 4
pageDict.Add(6, pdfCpy.GetImportedPage(reader2, 2))
pageDict.Add(9, pdfCpy.GetImportedPage(reader2, 3))
pageDict.Add(4, pdfCpy.GetImportedPage(reader2, 5))
'Insert those pages into pdf1
Dim pdf1 As String = "path to your pdf1 here"
Dim output As String = "path to the output pdf here"
PdfManipulation.InsertPages(pdf1, pageDict, output)
'Release pdf2 once the pages have been inserted
reader2.Close()
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi stanav,
Thanks.
I want to upload a new file. I don't want to split it, but I do want to set a password for it.
How can I achieve this?
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by moti barski
Can it add pictures into an existing pdf? If so, a walkthrough please.
If you had read the original post (post#1), you should have seen the list of available methods the PdfManipulation2 class has. Among those methods, you should have spotted the AddImageToPage method which is probably what you need.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Do I need to download iTextSharp to use the pdf classes?
Also:
Public Shared Sub AddImageToPage(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal imgPath As String, ByVal imgLocation As Point, ByVal imgSize As Size, Optional ByVal pages() As Integer = Nothing)
Can you give an example (implementation)?
Last edited by moti barski; Mar 30th, 2011 at 01:27 PM.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
The thread title has the phrase " Using iTextSharp", so I think that should already answer your question. However, I just want to confirm it again: yes, you will need to download iTextSharp and reference itextsharp.dll in your project to use the code.
As for giving an example on calling a method, you simply call the method and pass in the required arguments. That's it.
What are the required arguments? Anything that is not optional.
1. sourcePdf: the full path to the source pdf file (the one that you want to add pictures to)
2. outputPdf: The full path to save the output pdf (pictures added pdf)
3. imgPath: The full path to the image (picture) file you want to use to add to the source pdf.
4. imgLocation: the (x, y) coordinate on the page where the picture should be placed - passed in as Point.
5. imgSize: the size the image will be scaled to - passed in as Size.
6. pages(): optional - the array of page numbers to add the image to. If omitted, the image will be added to every page in the pdf.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi Stanav,
In my application, I want to split a pdf. If I upload a pdf of 10 pages with a file size of 10 MB, the combined size of the pdfs after splitting comes to more than 20 MB. Is it possible to reduce the file size of each pdf?
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Originally Posted by tbird888
Hello Stanav,
I was wondering if there is a way to look at an image inside a PDF and pull its width and height in pixels in iTextSharp?
You can try extracting the images and getting the Width and Height properties from them. In the PdfManipulation2 class, there is a function to extract images from a pdf.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Stanav, thank you for the reply. I'm using the ExtractImages function to pull the images like you suggested and am encountering an error with this line:
Dim img As Drawing.Image = Drawing.Image.FromStream(memStream)
The error that is shown during debugging is this:
Item = Argument not specified for parameter 'key' of 'Public Default Property Item(key As Object) As Object'.
Do you have any thoughts on what could be causing the error? Just above the Try...Catch block where the error occurred, the Byte array is being populated so something is breaking down inside the Using block.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
I have large PDF files, 14000 pages or so. iTextSharp fails in
public PdfDictionary() : base(DICTIONARY) {
hashmap = new Dictionary<PdfName, PdfObject>();
}
The error I get is "An unhandled exception of type 'System.StackOverflowException' occurred in itextsharp.dll".
I know this is not your code, but can you tell me where to get a resolution? It only occurs on PDFs with large page counts.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
What are you trying to do with those huge pdf files? Is it possible to split them into multiple smaller ones 1st then process these?
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
This company sends us reports as PDFs. I can't control the number of pages they put in a report. I have attempted to use PdfManipulation2.vb, which uses iTextSharp, to extract them to single pages. Everything fails with the large files, including splitting them. Stack overflow, every time. Where do I go from here?
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
So you're splitting the original pdf into multiple 1 page pdf's, is that correct? If it is, then you should not have any problem, just need to change the way you read the original pdf. This should do it:
Code:
Public Sub SplitPdfByPages(ByVal sourcePdf As String, ByVal numOfPages As Integer, ByVal baseNameOutPdf As String)
    Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Dim doc As iTextSharp.text.Document = Nothing
    Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
    Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
    Dim pageCount As Integer = 0
    Try
        'Reading through a RandomAccessFileOrArray puts the reader in partial mode:
        'parts of the pdf are read only as they are needed
        raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
        reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
        pageCount = reader.NumberOfPages
        If pageCount < numOfPages Then
            Throw New ArgumentException("Not enough pages in source pdf to split")
        Else
            Dim ext As String = IO.Path.GetExtension(baseNameOutPdf)
            'Cut the extension off the end only; String.Replace throws when the extension is empty
            Dim stem As String = baseNameOutPdf.Substring(0, baseNameOutPdf.Length - ext.Length)
            If ext.Length = 0 Then ext = ".pdf"
            Dim outfile As String = String.Empty
            Dim n As Integer = CInt(Math.Ceiling(pageCount / numOfPages))
            Dim currentPage As Integer = 1
            For i As Integer = 1 To n
                outfile = stem & "_" & i & ext
                doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage))
                pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outfile, IO.FileMode.Create))
                doc.Open()
                'Copy up to numOfPages pages; the last file gets whatever remains
                Dim lastPage As Integer = Math.Min(currentPage + numOfPages - 1, pageCount)
                While currentPage <= lastPage
                    page = pdfCpy.GetImportedPage(reader, currentPage)
                    pdfCpy.AddPage(page)
                    currentPage += 1
                End While
                doc.Close()
            Next
        End If
    Finally
        'Close the reader even when an error occurs; exceptions keep their original stack trace
        If reader IsNot Nothing Then reader.Close()
    End Try
End Sub
And you would call the sub like this:
Code:
Dim sourcePdf As String = "path to your huge pdf file here"
Dim numOfPage As Integer = 1 '< 1 page per output pdf
Dim baseName As String = "Splitted.pdf" '< the base file name for output pdf's; they are saved as Splitted_1.pdf, Splitted_2.pdf, ...
SplitPdfByPages(sourcePdf, numOfPage, baseName)
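If it helps to see which pages end up in which output file before running the split on a huge pdf, the arithmetic can be checked on its own. The sketch below involves no iTextSharp. PlanChunks and OutputName are hypothetical helpers written for this illustration, and the ".pdf" fallback for a base name without an extension is my own assumption.

```vbnet
Imports System
Imports System.Collections.Generic

Module SplitPlanSketch
    ' Returns {firstPage, lastPage} (1-based, inclusive) for each output file.
    Public Function PlanChunks(ByVal pageCount As Integer, ByVal pagesPerFile As Integer) As List(Of Integer())
        If pagesPerFile < 1 Then Throw New ArgumentOutOfRangeException("pagesPerFile")
        Dim chunks As New List(Of Integer())
        Dim firstPage As Integer = 1
        While firstPage <= pageCount
            ' The last chunk may be shorter than pagesPerFile
            Dim lastPage As Integer = Math.Min(firstPage + pagesPerFile - 1, pageCount)
            chunks.Add(New Integer() {firstPage, lastPage})
            firstPage = lastPage + 1
        End While
        Return chunks
    End Function

    ' Builds "name_index.ext" from a base name such as "Splitted.pdf".
    Public Function OutputName(ByVal baseName As String, ByVal index As Integer) As String
        Dim ext As String = IO.Path.GetExtension(baseName)
        Dim stem As String = baseName.Substring(0, baseName.Length - ext.Length)
        ' Assumption: fall back to ".pdf" when the base name has no extension
        If ext.Length = 0 Then ext = ".pdf"
        Return stem & "_" & index.ToString() & ext
    End Function

    Sub Main()
        Dim chunks As List(Of Integer()) = PlanChunks(10, 4)
        For i As Integer = 0 To chunks.Count - 1
            Console.WriteLine(OutputName("Splitted.pdf", i + 1) & ": pages " & chunks(i)(0).ToString() & " to " & chunks(i)(1).ToString())
        Next
    End Sub
End Module
```

For a 10 page pdf split 4 pages at a time, this plans three files covering pages 1 to 4, 5 to 8 and 9 to 10.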
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi Stanav,
Your code helps me a lot in my application. I am having a doubt in PDF hyperlinks. Please clarify me.
I have used your sample for splitting my PDF file. Consider a PDF file named sample.pdf containing 72 pages. Some pages in sample.pdf have hyperlinks that navigate to other pages. E.g. page 4 has three hyperlinks which, when clicked, navigate to pages 24, 27 and 28. Nearly 12 pages have hyperlinks like this. Following your code, I split this PDF into 72 separate files, saved as 1.pdf, 2.pdf ... 72.pdf. So when a hyperlink in 4.pdf is clicked, I need it to navigate to 24.pdf, 27.pdf or 28.pdf. How can I set the hyperlinks in 4.pdf so that they navigate to the corresponding pdf files? That is, I need to edit the destination of each hyperlink so that it points to another destination, or else place some text (e.g. pageTo:24) in the URL of the link. Please help me.
NOTE: The links in the PDF hold a PRIndirectReference for linking within the pages. I have attached a sample PDF file (4.pdf). Please provide sample code for this.
Thank you,
Ashok
Last edited by ashok.arumugam; Jul 6th, 2011 at 05:02 AM.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Hi Stanav,
Can you please help me extract URLs from a PRIndirectReference? The ExtractURLs function is able to fetch the URL from a link created with the Anchor class, but I am not able to extract the URL or link from a PRIndirectReference. I need to get the reference link and edit it to navigate to another location. E.g. in the attached 4.pdf there are 5 links (on the text 18, 68, 17, 48, 52), each navigating to the matching page: page 18, page 68 and so on. 4.pdf was split from sample.pdf, which holds 72 pages. Now that I have split that PDF into 72 separate PDF files, I need to navigate from one file to another, so that clicking 18, 68, 17... navigates to 18.pdf, 68.pdf, 17.pdf. Please help me with code to edit those links that use a PRIndirectReference.
I have been trying to get the starting and ending page numbers for the lowest level bookmarks that exist, and then use them to extract those pages to a new PDF. I am able to get to the 1st level information, but I cannot figure out how to dive deeper into the bookmark to get the kids.
i.e.
1st level bookmark 1
    page number
    2nd level bookmark 1
        page number - 1st starting pdf extract page ** this is what I cannot figure out how to get to
        3rd level bookmark 1
            page number - 1st ending pdf extract page, also the 2nd starting pdf extract page ** this is what I cannot figure out how to get to
1st level bookmark 2
    page number - 2nd ending pdf extract page
Last edited by kakahappns; Jul 22nd, 2011 at 05:59 AM.
Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp
Stanav,
Have you had any experience with the new PDF Portfolios (also known as Portable Collections)? I've been having trouble extracting more than the first page using iTextSharp (only recognizes the first page). Any thoughts?