Thread: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

  1. #41

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    Quote Originally Posted by blofvendahl View Post
    Hey Stanav,

    Which method in your class, if any, can be used to extract bookmark info from a pdf?

    thanks
    Brian
    You can use the SimpleBookmark class to extract all the bookmarks in a PDF and, if you want, export them to an XML file. Here's how:
    Code:
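     Public Shared Function ExportBookmarksToXML(ByVal sourcePdf As String, ByVal outputXML As String) As Boolean
            Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
            Try
                reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
                ' GetBookmark returns Nothing when the pdf has no bookmarks
                Dim bookmarks As System.Collections.Generic.IList(Of System.Collections.Generic.Dictionary(Of String, Object)) = SimpleBookmark.GetBookmark(reader)
                If bookmarks Is Nothing Then Return False
                Using outFile As New IO.StreamWriter(outputXML)
                    ' onlyASCII = True escapes non-ASCII characters, so the output is valid for the declared encoding
                    SimpleBookmark.ExportToXML(bookmarks, outFile, "ISO8859-1", True)
                End Using
                Return True
            Catch ex As Exception
                Throw New ApplicationException(ex.Message, ex)
            Finally
                ' Release the source file whether or not the export succeeded
                If reader IsNot Nothing Then reader.Close()
            End Try
        End Function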
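    If you need the bookmark info in your own code rather than in an XML file, you can also walk the list returned by SimpleBookmark.GetBookmark directly. The sketch below assumes the iTextSharp 5.x layout, in which each bookmark is a Dictionary(Of String, Object) with a "Title" entry, an optional "Page" entry whose first token is the page number (for example "3 XYZ 0 792 0"), and child bookmarks stored under "Kids". The module and function names are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module BookmarkDump
    ' Hypothetical helper: flattens a SimpleBookmark-style outline into lines such as
    ' "Chapter 1 -> page 3", indenting two spaces per nesting level.
    Public Function ListBookmarks(ByVal bookmarks As IList(Of Dictionary(Of String, Object)),
                                  Optional ByVal depth As Integer = 0) As List(Of String)
        Dim lines As New List(Of String)
        If bookmarks Is Nothing Then Return lines
        For Each bm As Dictionary(Of String, Object) In bookmarks
            Dim title As String = If(bm.ContainsKey("Title"), CStr(bm("Title")), "")
            Dim line As String = New String(" "c, depth * 2) & title
            ' "Page" is only present for bookmarks that go to a page; its first token is the page number
            If bm.ContainsKey("Page") Then
                line &= " -> page " & CStr(bm("Page")).Split(" "c)(0)
            End If
            lines.Add(line)
            ' Child bookmarks, if any, are stored under "Kids"
            If bm.ContainsKey("Kids") Then
                lines.AddRange(ListBookmarks(TryCast(bm("Kids"), IList(Of Dictionary(Of String, Object))), depth + 1))
            End If
        Next
        Return lines
    End Function
End Module
```

    Usage: ListBookmarks(SimpleBookmark.GetBookmark(reader)) returns one indented line per bookmark, which you can print, log or display in a TreeView.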
    I'm also working on a method to merge PDF files with all bookmarks preserved. However, so far it works only with bookmarks that use a page number as their destination. Bookmarks that use named destinations break after merging: you still see all the bookmarks, but clicking one doesn't take you to its destination. That's why I'm not posting the solution yet.
    Last edited by stanav; Oct 12th, 2010 at 07:45 AM.
    Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
    - Abraham Lincoln -

  2. #42
    New Member
    Join Date
    Oct 2010
    Posts
    1

    Hi Stanav,

    I would like to know how I can put 10 .jpg images into one PDF, and what code I could use for that. I am using ASP.NET on the server side.


    Regards,

    Staind

  3. #43

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    Your question is not related to this thread at all. Please start a new thread in the VB.NET forum, and make sure you describe the question clearly.

  4. #44
    New Member
    Join Date
    Oct 2010
    Posts
    3

    Thanks, Stanav. The SimpleBookmark solution worked great. I'll keep checking back for your method to merge PDFs with all their bookmarks. That'll really come in handy.

  5. #45

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

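    This is the updated method for merging PDF files with all the bookmarks preserved. It is also available in the PdfManipulation2 class. Calling ConsolidateNamedDestinations on each reader replaces named destinations with explicit page destinations, so bookmarks that use named destinations keep working after the merge.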
    Code:
     Public Shared Function MergePdfFilesWithBookmarks(ByVal sourcePdfs() As String, ByVal outputPdf As String) As Boolean
            Dim result As Boolean = False
            Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
            Dim pdfDoc As iTextSharp.text.Document = Nothing    'the output pdf document
            Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
            Dim pageCount As Integer = 0    'number of pages in the current pdf
            Dim totalPages As Integer = 0   'number of pages so far in the merged pdf
            'bookmarks of all source files, with page numbers already shifted to their merged positions
            Dim bookmarks As New System.Collections.Generic.List(Of System.Collections.Generic.Dictionary(Of String, Object))
            Dim tempBookmarks As System.Collections.Generic.IList(Of System.Collections.Generic.Dictionary(Of String, Object)) = Nothing
            ' Must have more than 1 source pdf's to merge
            If sourcePdfs.Length > 1 Then
                Try
                    For i As Integer = 0 To sourcePdfs.GetUpperBound(0)
                        reader = New iTextSharp.text.pdf.PdfReader(sourcePdfs(i))
                        ' Replace named destinations with explicit page destinations so bookmarks survive the merge
                        reader.ConsolidateNamedDestinations()
                        pageCount = reader.NumberOfPages
                        tempBookmarks = SimpleBookmark.GetBookmark(reader)
                        If i = 0 Then
                            pdfDoc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
                            pdfCpy = New iTextSharp.text.pdf.PdfCopy(pdfDoc, New System.IO.FileStream(outputPdf, IO.FileMode.Create))
                            pdfDoc.Open()
                        ElseIf tempBookmarks IsNot Nothing Then
                            ' Offset this file's bookmarks by the number of pages already in the output
                            SimpleBookmark.ShiftPageNumbers(tempBookmarks, totalPages, Nothing)
                        End If
                        totalPages += pageCount
                        If tempBookmarks IsNot Nothing Then
                            bookmarks.AddRange(tempBookmarks)
                        End If
                        For n As Integer = 1 To pageCount
                            pdfCpy.AddPage(pdfCpy.GetImportedPage(reader, n))
                        Next
                        reader.Close()
                        reader = Nothing
                    Next
                    pdfCpy.Outlines = bookmarks
                    pdfDoc.Close()
                    result = True
                Catch ex As Exception
                    Throw New ApplicationException(ex.Message, ex)
                Finally
                    ' Release the current source file if an error interrupted the loop
                    If reader IsNot Nothing Then reader.Close()
                End Try
            End If
            Return result
        End Function
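    Why ShiftPageNumbers is given totalPages: a bookmark that points to page 2 of the third source file must point to page 2 plus the page counts of the first two files in the merged output. Below is a simplified, hypothetical sketch of that adjustment on the same Dictionary(Of String, Object) bookmark structure. It is not iTextSharp's own implementation, and it assumes that local bookmarks carry "Action" = "GoTo" and a "Page" value that starts with the page number (e.g. "2 XYZ 0 792 0").

```vbnet
Imports System
Imports System.Collections.Generic

Module BookmarkShiftSketch
    ' Hypothetical illustration: adds offset to the page number of every local (GoTo) bookmark,
    ' recursing into "Kids". The dictionaries are modified in place.
    Public Sub ShiftPages(ByVal bookmarks As IList(Of Dictionary(Of String, Object)), ByVal offset As Integer)
        If bookmarks Is Nothing Then Return
        For Each bm As Dictionary(Of String, Object) In bookmarks
            Dim action As Object = Nothing
            bm.TryGetValue("Action", action)
            ' Only local GoTo bookmarks carry a page number in this file
            If "GoTo".Equals(action) AndAlso bm.ContainsKey("Page") Then
                Dim parts() As String = CStr(bm("Page")).Split(" "c)
                parts(0) = (Integer.Parse(parts(0)) + offset).ToString()
                bm("Page") = String.Join(" ", parts)
            End If
            If bm.ContainsKey("Kids") Then
                ShiftPages(TryCast(bm("Kids"), IList(Of Dictionary(Of String, Object))), offset)
            End If
        Next
    End Sub
End Module
```

    With source files of 3, 5 and 2 pages, the merge loop leaves the first file's bookmarks unshifted and passes offsets of 3 and 8 for the second and third files.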

  6. #46
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Dear Team,

    I split the PDF successfully with your code. The original PDF file contains some hyperlinks, but after splitting, the hyperlinks are gone. Is there any way to keep the hyperlinks in the split PDFs?

    Please help me regarding this issue.

  7. #47

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    Can you upload a sample PDF so that I can test it myself? I'm not promising anything, but with a sample file I can at least try to figure out what the problem is and whether there is a solution.

  8. #48
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Hi..

    I have attached a sample PDF for your reference. It contains two hyperlinks.

    When we split that PDF, the hyperlinks are removed.
    Attached Images
    • File Type: pdf 2.pdf (176.1 KB, 578 views)

  9. #49

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    The sample PDF you uploaded has only one page and no hyperlinks. It also appears to be a scanned PDF (one created by scanning a paper document), and you cannot do much with that kind of PDF file.

  10. #50
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Hi,
    In this PDF, there is a link at the bottom of the page (www.craneyhill.com). If you click it, it opens the Craneyhill site.

    Also, on the right side of the page there is a logo (CRANNEY HILL KENNEL). Clicking that logo also opens the site. Please check this.

  11. #51
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Hi.

    I use the code below to check whether any link annotations are present.
    In my previous post I attached 2.pdf. When it is parsed, that page contains two
    Link annotations. After splitting, it has no hyperlinks (the Link annotations do not persist). Please help me, it is urgent.

    using System;
    using iTextSharp.text.pdf;

    // Read-only check: counts the Link annotations on each page of sourcePdf.
    // A PdfStamper is not needed just to inspect annotations.
    PdfReader reader = new PdfReader(sourcePdf);
    try
    {
        for (int n = 1; n <= reader.NumberOfPages; n++)
        {
            PdfArray annots = reader.GetPageN(n).GetAsArray(PdfName.ANNOTS);
            if (annots == null) continue;
            int links = 0;
            for (int i = 0; i < annots.Size; i++)
            {
                // GetAsDict resolves indirect references, so direct and indirect entries both work
                PdfDictionary annot = annots.GetAsDict(i);
                if (annot != null && PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE)))
                    links++;
            }
            Console.WriteLine("Page {0}: {1} link annotation(s)", n, links);
        }
    }
    finally
    {
        reader.Close();
    }
    Last edited by prabakarank; Jan 28th, 2011 at 08:45 AM.

  12. #52

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    Your sample PDF has only one page, so how am I supposed to test splitting it? The only way I can test splitting this file is to use the SplitByPages method with the number of pages per split set to 1, and the hyperlinks work fine after that split. For further testing, you need to provide a sample file with more than one page.

  13. #53
    New Member
    Join Date
    Feb 2011
    Posts
    1

    Stanav,

    Thanks for posting this! I have a question if you have time...

    In your Public Shared Function InsertPages, you mention "To create the pagesToInsert dictionary, you can use the iTextSharp.text.pdf.PdfCopy class to open an existing pdf file and call the GetImportedPage method".

    I am new to this; can you please provide an example?
    I have been trying to do it but I keep getting an error that my PDF is in use.

    Thanks,
    Brian

  14. #54

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    OK... Suppose you have a PDF file named "pdf1" that you want to insert some pages into. Those pages are in another PDF file called "pdf2". So you need to get the pages you need from pdf2, add them to a dictionary, and then call the InsertPages method to insert them into pdf1.
    1. Let's say you need pages 2, 3 and 5 from pdf2, to be inserted as pages 6, 9 and 4 in pdf1. The first thing you need is to create that dictionary:
    Code:
    ' Create the dictionary
    Dim pdf2 As String = "path to your pdf2 file here"
    Dim reader2 As New iTextSharp.text.pdf.PdfReader(pdf2)
    Dim doc2 As New iTextSharp.text.Document(reader2.GetPageSizeWithRotation(1))
    ' This PdfCopy writes to a throwaway MemoryStream; it is only used to import the pages
    Dim pdfCpy As New iTextSharp.text.pdf.PdfCopy(doc2, New IO.MemoryStream())
    Dim pageDict As New System.Collections.Generic.Dictionary(Of Integer, iTextSharp.text.pdf.PdfImportedPage)
    ' Get pages 2, 3 and 5 from pdf2; the keys 6, 9 and 4 are the page numbers they will have in pdf1
    pageDict.Add(6, pdfCpy.GetImportedPage(reader2, 2))
    pageDict.Add(9, pdfCpy.GetImportedPage(reader2, 3))
    pageDict.Add(4, pdfCpy.GetImportedPage(reader2, 5))

    ' Insert those pages into pdf1 (use a different path than pdf1 for the output)
    Dim pdf1 As String = "path to your pdf1 here"
    Dim output As String = "path to the output pdf here"
    PdfManipulation.InsertPages(pdf1, pageDict, output)
    ' Close reader2 when you are done, to release the pdf2 file
    reader2.Close()

  15. #55
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Hi,

    My file is password protected, but I don't know the password. I want to split that password-protected PDF.

    How do I achieve this?

  16. #56

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    This is against the forum's AUP and thus we should not discuss it here. Why can't you get the password from the creator of that pdf?

  17. #57
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Hi stanav,
    Thanks.
    I want to upload a new file. I don't want to split it, but I do want to set a password on it.
    How can I achieve this?
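    For a file you are entitled to protect, iTextSharp can write a password-protected copy using PdfStamper. A minimal sketch, assuming iTextSharp 5.x and ASCII passwords; the module and sub names are hypothetical, and allowing only printing with AES-128 encryption is an illustrative choice rather than a requirement.

```vbnet
Imports System.IO
Imports System.Text
Imports iTextSharp.text.pdf

Module PdfPasswordSketch
    ' Hypothetical helper: writes a password-protected copy of sourcePdf to outputPdf.
    ' userPassword is needed to open the file; ownerPassword is needed to change its permissions.
    Public Sub SetPdfPassword(ByVal sourcePdf As String, ByVal outputPdf As String,
                              ByVal userPassword As String, ByVal ownerPassword As String)
        Dim reader As New PdfReader(sourcePdf)
        Try
            Using fs As New FileStream(outputPdf, FileMode.Create)
                Dim stamper As New PdfStamper(reader, fs)
                ' Encryption has to be configured before the stamper is closed
                stamper.SetEncryption(Encoding.ASCII.GetBytes(userPassword),
                                      Encoding.ASCII.GetBytes(ownerPassword),
                                      PdfWriter.ALLOW_PRINTING,
                                      PdfWriter.ENCRYPTION_AES_128)
                ' Closing the stamper writes the encrypted file (and closes fs)
                stamper.Close()
            End Using
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```

    To allow more than printing, combine permission flags in the SetEncryption call, for example PdfWriter.ALLOW_PRINTING Or PdfWriter.ALLOW_COPY.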

  18. #58
    Banned
    Join Date
    Mar 2009
    Posts
    764

    Can it add pictures to an existing PDF? If so, a walkthrough please.

  19. #59

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    If you had read the original post (post #1), you would have seen the list of methods available in the PdfManipulation2 class. Among those methods, you should have spotted the AddImageToPage method, which is probably what you need.

  20. #60
    Banned
    Join Date
    Mar 2009
    Posts
    764

    Do I need to download iTextSharp to use the PDF classes?
    Also:
    Public Shared Sub AddImageToPage(ByVal sourcePdf As String, ByVal outputPdf As String, ByVal imgPath As String, ByVal imgLocation As Point, ByVal imgSize As Size, Optional ByVal pages() As Integer = Nothing)
    Can you give an example (implementation)?
    Last edited by moti barski; Mar 30th, 2011 at 01:27 PM.

  21. #61

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    The thread title contains the phrase "Using iTextSharp", so I think that already answers your question. But to confirm: yes, you need to download iTextSharp and reference itextsharp.dll in your project to use the code.

    As for an example of calling the method: you simply call it and pass in the required arguments. That's it.
    What are the required arguments? Anything that is not optional:
    1. sourcePdf: the full path to the source pdf file (the one you want to add pictures to).
    2. outputPdf: the full path where the output pdf (the pdf with the pictures added) is saved.
    3. imgPath: the full path to the image (picture) file you want to add to the source pdf.
    4. imgLocation: the (x, y) coordinate on the page where the picture should be placed, passed in as a Point.
    5. imgSize: the size the image will be scaled to, passed in as a Size.
    6. pages(): optional, the array of page numbers to add the image to. If omitted, the image is added to every page in the pdf.
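    PDF coordinates and sizes are measured in points (72 points = 1 inch), so it can help to convert from inches when building the imgLocation and imgSize arguments. A minimal sketch; the helper names are hypothetical, and it assumes AddImageToPage passes these values to iTextSharp unchanged, where (0, 0) is the bottom-left corner of the page.

```vbnet
Imports System
Imports System.Drawing

Module PdfUnitsSketch
    ' PDF user space is measured in points: 72 points = 1 inch.
    Public Const PointsPerInch As Double = 72.0

    ' Hypothetical helper: converts inches to whole PDF points.
    Public Function InchesToPoints(ByVal inches As Double) As Integer
        Return CInt(Math.Round(inches * PointsPerInch))
    End Function

    ' Hypothetical helpers that build the Point and Size arguments from inches.
    Public Function LocationFromInches(ByVal x As Double, ByVal y As Double) As Point
        Return New Point(InchesToPoints(x), InchesToPoints(y))
    End Function

    Public Function SizeFromInches(ByVal width As Double, ByVal height As Double) As Size
        Return New Size(InchesToPoints(width), InchesToPoints(height))
    End Function
End Module
```

    For example, PdfManipulation2.AddImageToPage("input.pdf", "output.pdf", "logo.jpg", LocationFromInches(1, 1), SizeFromInches(2, 1), New Integer() {1, 3}) would, under the bottom-left-origin assumption, place a 2 x 1 inch image one inch from the left and bottom edges of pages 1 and 3.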

  22. #62
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Hi Stanav,
    In my application, I want to split PDFs. If I upload a 10-page PDF with a file size of 10 MB and split it, the combined size of the resulting PDFs is above 20 MB. Is it possible to reduce the file size of each PDF?
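    Some growth is expected: fonts and images that pages share in the original file have to be written again into every split file that uses them, so the split files together are usually larger than the original. Within each output file, though, PdfSmartCopy writes identical streams only once, and SetFullCompression() stores objects in compressed object streams (PDF 1.5); the code quoted in post #76 uses both. A minimal sketch, assuming iTextSharp 5.x; the names are hypothetical and the savings depend on the file.

```vbnet
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Module SmartSplitSketch
    ' Hypothetical helper: copies pages firstPage..lastPage of sourcePdf into outputPdf,
    ' using PdfSmartCopy so identical resources inside the output are stored only once.
    Public Sub ExtractRangeCompact(ByVal sourcePdf As String, ByVal outputPdf As String,
                                   ByVal firstPage As Integer, ByVal lastPage As Integer)
        Dim reader As New PdfReader(sourcePdf)
        Try
            Dim doc As New Document(reader.GetPageSizeWithRotation(firstPage))
            Dim copy As New PdfSmartCopy(doc, New FileStream(outputPdf, FileMode.Create))
            ' Compress object and cross-reference streams (PDF 1.5 style)
            copy.SetFullCompression()
            doc.Open()
            For p As Integer = firstPage To lastPage
                copy.AddPage(copy.GetImportedPage(reader, p))
            Next
            ' Closing the document writes the file and closes the FileStream
            doc.Close()
        Finally
            reader.Close()
        End Try
    End Sub
End Module
```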

    Please let me know.

    Thanks in advance

  23. #63
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Hi Stanav,

    Please give me your Skype ID so I can contact you about the PDF split.

    Thanks in advance.

  24. #64
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Hi, Stanav,

    Is it possible to read the annotations from a PDF using iTextSharp? Please advise.

    Thanks in advance

  25. #65
    New Member
    Join Date
    May 2011
    Posts
    3

    Hello Stanav,

    I was wondering if there is a way to look at an image inside a PDF and pull its width and height in pixels in iTextSharp?

  26. #66

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    You can try extracting the images and reading their Width and Height properties. The PdfManipulation2 class has a function to extract images from a PDF.
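    Unless an image is stored in the PDF as a complete JPEG (or another format GDI+ understands), the bytes pulled out of the PDF are not an image file that Drawing.Image.FromStream can load. If only the pixel dimensions are needed, they can be read from the /Width and /Height entries of each image XObject without decoding anything. A minimal sketch, assuming iTextSharp 5.x; the names are hypothetical, and images nested inside form XObjects are not visited.

```vbnet
Imports System.Collections.Generic
Imports iTextSharp.text.pdf

Module ImageSizeSketch
    ' Hypothetical helper: returns "name: W x H px" for each image XObject drawn directly on a page.
    ' Sizes come from the /Width and /Height entries, so the image data is never decoded.
    Public Function GetImageSizes(ByVal pdfPath As String, ByVal pageNum As Integer) As List(Of String)
        Dim sizes As New List(Of String)
        Dim reader As New PdfReader(pdfPath)
        Try
            Dim resources As PdfDictionary = reader.GetPageN(pageNum).GetAsDict(PdfName.RESOURCES)
            If resources Is Nothing Then Return sizes
            Dim xobjects As PdfDictionary = resources.GetAsDict(PdfName.XOBJECT)
            If xobjects Is Nothing Then Return sizes
            For Each key As PdfName In xobjects.Keys
                ' XObjects are streams; GetAsStream resolves the indirect reference
                Dim xobj As PdfStream = xobjects.GetAsStream(key)
                If xobj Is Nothing OrElse Not PdfName.IMAGE.Equals(xobj.GetAsName(PdfName.SUBTYPE)) Then Continue For
                Dim w As PdfNumber = xobj.GetAsNumber(PdfName.WIDTH)
                Dim h As PdfNumber = xobj.GetAsNumber(PdfName.HEIGHT)
                If w IsNot Nothing AndAlso h IsNot Nothing Then
                    sizes.Add(key.ToString() & ": " & w.IntValue & " x " & h.IntValue & " px")
                End If
            Next
        Finally
            reader.Close()
        End Try
        Return sizes
    End Function
End Module
```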

  27. #67
    New Member
    Join Date
    May 2011
    Posts
    3

    Stanav, thank you for the reply. I'm using the ExtractImages function to pull the images like you suggested and am encountering an error with this line:

    Dim img As Drawing.Image = Drawing.Image.FromStream(memStream)

    The error that is shown during debugging is this:
    Item = Argument not specified for parameter 'key' of 'Public Default Property Item(key As Object) As Object'.

    Do you have any thoughts on what could be causing the error? Just above the Try...Catch block where the error occurred, the Byte array is being populated so something is breaking down inside the Using block.

  28. #68
    Junior Member
    Join Date
    Aug 2010
    Posts
    19

    Hi Stanav,

    I need to convert a single-page PDF into an image using iTextSharp. Is this possible? If so, please give me sample code.

    Thanks

  29. #69

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

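    No, it's not possible with iTextSharp. iTextSharp can create and manipulate PDF files, but it has no rendering engine to draw a page as an image.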

  30. #70
    New Member
    Join Date
    May 2011
    Posts
    2

    Large PDF Failing

    I have large PDF files, 14,000 pages or so. iTextSharp fails in:
    public PdfDictionary() : base(DICTIONARY) {
    hashmap = new Dictionary<PdfName,PdfObject>();
    }

    The error I get is "An Unhandled Exception of type 'System.StackOverflowException' Occuring in itextsharp.dll"

    I know this is not your code, but can you tell me where to find a fix? It only occurs on PDFs with large page counts.

  31. #71

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    What are you trying to do with those huge PDF files? Is it possible to split them into multiple smaller ones first and then process those?

  32. #72
    New Member
    Join Date
    May 2011
    Posts
    2

    A company sends us reports in PDF format, and I can't control the number of pages they put in a report. I have tried to use PdfManipulation2.vb, which uses iTextSharp, to extract them to single pages. Everything fails with the large files, including splitting them: stack overflow every time. Where do I go from here?

  33. #73

    Thread Starter
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

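    So you're splitting the original PDF into multiple one-page PDFs, is that correct? If so, you should not have any problem; you just need to change the way you read the original PDF. Opening it through a RandomAccessFileOrArray makes PdfReader work in partial mode, reading objects as needed instead of loading the whole file up front. This should do it: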
    Code:
     Public Sub SplitPdfByPages(ByVal sourcePdf As String, ByVal numOfPages As Integer, ByVal baseNameOutPdf As String)
            Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
            Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
            Dim doc As iTextSharp.text.Document = Nothing
            Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
            Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
            Dim pageCount As Integer = 0

            Try
                ' Partial mode: only the xref table is loaded up front, other objects are read as needed
                raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
                reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
                pageCount = reader.NumberOfPages
                If pageCount < numOfPages Then
                    Throw New ArgumentException("Not enough pages in source pdf to split")
                End If
                ' Output files are named <folder>\<baseName>_<i><ext>; ".pdf" is used if baseNameOutPdf has no extension
                Dim folder As String = IO.Path.GetDirectoryName(baseNameOutPdf)
                Dim baseName As String = IO.Path.GetFileNameWithoutExtension(baseNameOutPdf)
                Dim ext As String = IO.Path.GetExtension(baseNameOutPdf)
                If ext = String.Empty Then ext = ".pdf"
                Dim n As Integer = CInt(Math.Ceiling(pageCount / numOfPages))
                Dim currentPage As Integer = 1
                For i As Integer = 1 To n
                    Dim outfile As String = IO.Path.Combine(folder, baseName & "_" & i & ext)
                    doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage))
                    pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outfile, IO.FileMode.Create))
                    doc.Open()
                    ' Each file gets numOfPages pages; the last one gets whatever is left
                    Dim lastPage As Integer = Math.Min(currentPage + numOfPages - 1, pageCount)
                    For j As Integer = currentPage To lastPage
                        page = pdfCpy.GetImportedPage(reader, j)
                        pdfCpy.AddPage(page)
                    Next j
                    currentPage = lastPage + 1
                    doc.Close()
                Next
            Finally
                ' Release the source file even if splitting failed; exceptions propagate unchanged
                If reader IsNot Nothing Then
                    reader.Close()
                ElseIf raf IsNot Nothing Then
                    raf.Close()
                End If
            End Try
        End Sub
    And you would call the sub like this:
    Code:
    Dim sourcePdf As String = "path to your huge pdf file here"
    Dim numOfPage As Integer = 1              '< 1 page per output pdf
    Dim baseName As String = "Splitted.pdf"   '< the base file name for output pdf's: Splitted_1.pdf, Splitted_2.pdf, ...
    SplitPdfByPages(sourcePdf, numOfPage, baseName)
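    The page arithmetic in the split loop can be checked on its own. A small sketch with a hypothetical helper name that lists the page range each output file receives: every file gets numOfPages pages and the last one takes the remainder, so 10 pages split 3 per file gives 1-3, 4-6, 7-9 and 10-10.

```vbnet
Imports System
Imports System.Collections.Generic

Module SplitPlanSketch
    ' Hypothetical helper: returns the (first, last) page range of each output file when
    ' pageCount pages are split into files of pagesPerFile pages; the last file gets the rest.
    Public Function SplitRanges(ByVal pageCount As Integer, ByVal pagesPerFile As Integer) As List(Of Tuple(Of Integer, Integer))
        If pagesPerFile < 1 Then Throw New ArgumentOutOfRangeException("pagesPerFile")
        Dim ranges As New List(Of Tuple(Of Integer, Integer))
        Dim first As Integer = 1
        While first <= pageCount
            Dim last As Integer = Math.Min(first + pagesPerFile - 1, pageCount)
            ranges.Add(Tuple.Create(first, last))
            first = last + 1
        End While
        Return ranges
    End Function
End Module
```

    With one page per file, as in the call above, a 14,000-page report produces 14,000 single-page ranges.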

  34. #74
    New Member
    Join Date
    Jul 2011
    Posts
    3

    Hi Stanav,

    Your code helps me a lot in my application. I have a question about PDF hyperlinks; please help me clarify it.
    I have used your sample to split my PDF file. Consider a PDF file named sample.pdf containing 72 pages. Some of its pages have hyperlinks that navigate to other pages. For example, page 4 has three hyperlinks which, when clicked, navigate to pages 24, 27 and 28, and nearly 12 pages have hyperlinks like this. Following your code, I split this PDF into 72 separate files named 1.pdf, 2.pdf, ..., 72.pdf. So in 4.pdf, clicking those hyperlinks should open 24.pdf, 27.pdf and 28.pdf. How can I set the hyperlinks in 4.pdf so that they navigate to the corresponding PDF files? That is, I need to change the destination of each hyperlink to point to another file, or else put some text (e.g. pageTo:24) in the URL of the link. Please help me.

    NOTE: Each link in the PDF holds a PRIndirectReference for linking to other pages. I have attached a sample PDF file (4.pdf). Please provide sample code for this.
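    Rewriting each link needs two things: the page it targets in the original file, and the split file that now holds that page. The second part is plain arithmetic. A sketch with a hypothetical helper name, assuming the split files are named 1.pdf, 2.pdf, ... and each holds pagesPerFile consecutive pages (1 in this case, so page 24 becomes page 1 of 24.pdf).

```vbnet
Imports System

Module SplitTargetSketch
    ' Hypothetical helper: maps a page of the original (unsplit) PDF to the split file that
    ' now contains it and to the page number inside that file.
    ' Assumes split files are named 1.pdf, 2.pdf, ... and each holds pagesPerFile consecutive pages.
    Public Function LocateSplitPage(ByVal originalPage As Integer, ByVal pagesPerFile As Integer) As Tuple(Of String, Integer)
        If originalPage < 1 OrElse pagesPerFile < 1 Then Throw New ArgumentOutOfRangeException()
        Dim fileIndex As Integer = (originalPage - 1) \ pagesPerFile + 1
        Dim pageInFile As Integer = (originalPage - 1) Mod pagesPerFile + 1
        Return Tuple.Create(fileIndex & ".pdf", pageInFile)
    End Function
End Module
```

    The resulting file name and page could then be used to build a remote go-to action, for example with iTextSharp's PdfAction(filename, page) constructor, which creates a GoToR action.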

    Thank you,
    Ashok
    Attached Images
    • File Type: pdf 4.pdf (143.2 KB, 413 views)
    Last edited by ashok.arumugam; Jul 6th, 2011 at 05:02 AM.

  35. #75
    New Member
    Join Date
    Jul 2011
    Posts
    3

    Hi Stanav,
    Can you please help me extract URLs from a PRIndirectReference? The ExtractURLs function can fetch the URL from a link created with the Anchor class, but I cannot extract the URLs or links that use a PRIndirectReference. I need to get the link reference and edit it so that it navigates to another location. For example, in the attached 4.pdf there are 5 links (on the text 18, 68, 17, 48 and 52), each navigating to page 18, page 68 and so on. 4.pdf was split from sample.pdf, which has 72 pages. Once I split that PDF into 72 separate PDF files, I need to navigate from one file to another, so that clicking 18, 68, 17... opens 18.pdf, 68.pdf, 17.pdf. Please help me with code to edit the links that use a PRIndirectReference.
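    To find where an internal link points: its destination is an explicit destination array whose first element is an indirect reference (the PRIndirectReference) to the target page dictionary, so comparing its object number with reader.GetPageOrigRef(i) for every page gives the page number. A minimal sketch, assuming iTextSharp 5.x and that it runs on the original, unsplit PDF, where those page references still resolve; the names are hypothetical. Links that use named destinations are skipped unless reader.ConsolidateNamedDestinations() is called first, which replaces them with explicit destinations.

```vbnet
Imports System.Collections.Generic
Imports iTextSharp.text.pdf

Module LinkTargetSketch
    ' Hypothetical helper: returns the target page number of each internal Link annotation on pageNum.
    Public Function GetLinkTargetPages(ByVal reader As PdfReader, ByVal pageNum As Integer) As List(Of Integer)
        ' Map each page dictionary's object number to its page number
        Dim pageByObjNum As New Dictionary(Of Integer, Integer)
        For i As Integer = 1 To reader.NumberOfPages
            pageByObjNum(reader.GetPageOrigRef(i).Number) = i
        Next

        Dim targets As New List(Of Integer)
        Dim annots As PdfArray = reader.GetPageN(pageNum).GetAsArray(PdfName.ANNOTS)
        If annots Is Nothing Then Return targets
        For i As Integer = 0 To annots.Size - 1
            Dim annot As PdfDictionary = annots.GetAsDict(i)
            If annot Is Nothing OrElse Not PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE)) Then Continue For
            ' The destination is either in /Dest or in a GoTo action's /D entry
            Dim dest As PdfArray = annot.GetAsArray(PdfName.DEST)
            If dest Is Nothing Then
                Dim action As PdfDictionary = annot.GetAsDict(PdfName.A)
                If action IsNot Nothing AndAlso PdfName.GOTO.Equals(action.GetAsName(PdfName.S)) Then
                    dest = action.GetAsArray(PdfName.D)
                End If
            End If
            If dest Is Nothing OrElse dest.Size = 0 Then Continue For
            ' First element of an explicit destination: indirect reference to the target page
            Dim pageRef As PdfIndirectReference = dest.GetAsIndirectObject(0)
            If pageRef IsNot Nothing AndAlso pageByObjNum.ContainsKey(pageRef.Number) Then
                targets.Add(pageByObjNum(pageRef.Number))
            End If
        Next
        Return targets
    End Function
End Module
```

    Calling GetLinkTargetPages(New PdfReader("sample.pdf"), 4) lists the pages that page 4 links to, which can then be mapped to the split file names.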

    Thank you,
    Ashok
    Attached Images
    • File Type: pdf 4.pdf (143.2 KB, 379 views)

  36. #76
    New Member
    Join Date
    Jul 2011
    Posts
    2

    Quote Originally Posted by vijy View Post
    Thanks, stanav...
    Yep, I tried it and got this:

    Splitting Code:
    Imports System.Collections.Generic
    Imports iTextSharp.text
    Imports iTextSharp.text.pdf

    ' Copies pages iStartPage..iEndPage of sPDFPath into sOutputPath and keeps the matching bookmarks.
    ' sOutputPath replaces the undeclared sInFile/sSplitName variables of the original.
    Public Function SplitPdfFiles(ByVal iStartPage As Integer, ByVal iEndPage As Integer, ByVal sPDFPath As String, ByVal sOutputPath As String) As Boolean
        Dim reader As PdfReader = Nothing
        Try
            reader = New PdfReader(sPDFPath)
            reader.RemoveUnusedObjects()
            reader.ConsolidateNamedDestinations()

            ' Create the output document once, before the copy loop
            Dim currentDocument As New Document(reader.GetPageSizeWithRotation(iStartPage))
            Dim pdfWriter As New PdfSmartCopy(currentDocument, New System.IO.FileStream(sOutputPath, System.IO.FileMode.Create))
            pdfWriter.SetFullCompression()
            pdfWriter.PdfVersion = reader.PdfVersion
            currentDocument.Open()

            For j As Integer = iStartPage To iEndPage
                Dim importedPage As PdfImportedPage = pdfWriter.GetImportedPage(reader, j)
                pdfWriter.AddPage(importedPage)
            Next

            ' iTextSharp 5.x returns a list of dictionaries (4.x used an ArrayList of Hashtables)
            Dim bookMark As IList(Of Dictionary(Of String, Object)) = SimpleBookmark.GetBookmark(reader)
            If bookMark IsNot Nothing Then
                ' Page ranges are inclusive {from, to} pairs
                If iEndPage < reader.NumberOfPages Then
                    SimpleBookmark.EliminatePages(bookMark, New Integer() {iEndPage + 1, reader.NumberOfPages})
                End If
                If iStartPage > 1 Then
                    ' Stop at iStartPage - 1 so bookmarks on the first extracted page survive
                    SimpleBookmark.EliminatePages(bookMark, New Integer() {1, iStartPage - 1})
                    SimpleBookmark.ShiftPageNumbers(bookMark, -(iStartPage - 1), Nothing)
                End If
                ' Outlines must be set before the document is closed
                pdfWriter.Outlines = bookMark
            End If

            currentDocument.Close() ' also closes pdfWriter and the output stream
            Return True
        Catch ex As Exception
            ' Errors are swallowed as in the original; log ex here to find out why a split failed
            Return False
        Finally
            If reader IsNot Nothing Then reader.Close()
        End Try
    End Function

    This one works fine, and the extracted PDF keeps the actual bookmarks.


    The problem is that it preserves only the first-level bookmarks. Stanav, is it possible to get at least the collection of child bookmarks?
    Stanav, is there an update on the question about child bookmarks?
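    On the child bookmarks: each entry that SimpleBookmark.GetBookmark returns is a dictionary with a "Title", usually a "Page" string such as "5 Fit" (page number first), and, when the entry has children, a "Kids" key holding another list of the same shape. Reaching every level is therefore a recursive walk. A minimal sketch, assuming the iTextSharp 5.x types (List/Dictionary rather than the ArrayList/Hashtable of 4.x); BookmarkWalker is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic

Module BookmarkWalker
    ' Recursively prints a bookmark tree in the shape returned by iTextSharp 5.x
    ' SimpleBookmark.GetBookmark, indenting two spaces per level.
    Public Sub PrintBookmarks(ByVal items As IList(Of Dictionary(Of String, Object)), ByVal depth As Integer)
        If items Is Nothing Then Return
        For Each bm As Dictionary(Of String, Object) In items
            Dim page As String = "-"
            If bm.ContainsKey("Page") Then
                ' The first token of "Page" is the page number
                page = CStr(bm("Page")).Trim().Split(" "c)(0)
            End If
            Console.WriteLine("{0}{1} (page {2})", New String(" "c, depth * 2), CStr(bm("Title")), page)
            ' Children, if any, are stored under "Kids"
            If bm.ContainsKey("Kids") Then
                PrintBookmarks(CType(bm("Kids"), IList(Of Dictionary(Of String, Object))), depth + 1)
            End If
        Next
    End Sub
End Module
```

    Typical call: PrintBookmarks(SimpleBookmark.GetBookmark(reader), 0), where reader is an open PdfReader.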

  37. #77
    Fanatic Member vijy's Avatar
    Join Date
    May 2007
    Location
    India
    Posts
    548

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by blofvendahl View Post
    Hey Stanav,

    Which method in your class, if any, can be used to extract bookmark info from a pdf?

    thanks
    Brian
    iTextSharp.text.pdf.SimpleBookmark.GetBookmark(reader)

    see post #14:

    It will help.
    Visual Studio.net 2010
    If this post is useful, rate it


  38. #78
    New Member
    Join Date
    Jul 2011
    Posts
    2

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by vijy View Post
    iTextSharp.text.pdf.SimpleBookmark.GetBookmark(reader)

    see post #14:

    It will help.
    vijy,

    I have been trying to get the starting and ending page numbers of the lowest-level bookmarks, and then use them to extract those pages to a new PDF. I can get the first-level information, but I cannot figure out how to go deeper into a bookmark to get its kids.

    For example:
    1st level bookmark 1
        page number
        2nd level bookmark 1
            page number - 1st starting pdf extract page ** this is what I cannot figure out how to get to
            3rd level bookmark 1
                page number - 1st ending pdf extract page, also the 2nd starting pdf extract page ** this is what I cannot figure out how to get to
    1st level bookmark 2
        page number - 2nd ending pdf extract page
    Last edited by kakahappns; Jul 22nd, 2011 at 05:59 AM.
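    The child entries live under the "Kids" key of each bookmark dictionary, so reaching the lowest level means recursing until an entry has no kids. This sketch collects the leaf bookmarks in outline order and turns their start pages into page ranges. It assumes the iTextSharp 5.x bookmark shape, and the rule for where a range ends (one page before the next leaf begins, with the last leaf running to the final page) is an assumption about what you want; LeafBookmarkRanges and LeafRanges are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic

Module LeafBookmarkRanges
    ' Collects (title, start page) for every leaf bookmark, i.e. one without "Kids",
    ' in outline order. Entries without a "Page" key (e.g. named destinations) are skipped.
    Private Sub CollectLeaves(ByVal items As IList(Of Dictionary(Of String, Object)),
                              ByVal leaves As List(Of Tuple(Of String, Integer)))
        If items Is Nothing Then Return
        For Each bm As Dictionary(Of String, Object) In items
            Dim kids As IList(Of Dictionary(Of String, Object)) = Nothing
            If bm.ContainsKey("Kids") Then kids = CType(bm("Kids"), IList(Of Dictionary(Of String, Object)))
            If kids IsNot Nothing AndAlso kids.Count > 0 Then
                CollectLeaves(kids, leaves)
            ElseIf bm.ContainsKey("Page") Then
                ' "Page" looks like "12 Fit"; the first token is the page number
                Dim page As Integer = Integer.Parse(CStr(bm("Page")).Trim().Split(" "c)(0))
                leaves.Add(Tuple.Create(CStr(bm("Title")), page))
            End If
        Next
    End Sub

    ' Returns (title, startPage, endPage) for each leaf. Assumption: a leaf's range ends
    ' one page before the next leaf starts, and the last leaf runs to totalPages.
    Public Function LeafRanges(ByVal items As IList(Of Dictionary(Of String, Object)),
                               ByVal totalPages As Integer) As List(Of Tuple(Of String, Integer, Integer))
        Dim leaves As New List(Of Tuple(Of String, Integer))
        CollectLeaves(items, leaves)
        Dim ranges As New List(Of Tuple(Of String, Integer, Integer))
        For i As Integer = 0 To leaves.Count - 1
            Dim startPage As Integer = leaves(i).Item2
            Dim endPage As Integer = If(i < leaves.Count - 1, leaves(i + 1).Item2 - 1, totalPages)
            ' Two leaves on the same page still get a one-page range
            If endPage < startPage Then endPage = startPage
            ranges.Add(Tuple.Create(leaves(i).Item1, startPage, endPage))
        Next
        Return ranges
    End Function
End Module
```

    Typical call: LeafRanges(SimpleBookmark.GetBookmark(reader), reader.NumberOfPages). Each (start, end) pair can then drive a page-range split such as the SplitPdfFiles function quoted earlier in the thread.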

  39. #79
    New Member
    Join Date
    May 2011
    Posts
    3

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Stanav,
    Have you had any experience with the new PDF Portfolios (also known as Portable Collections)? I've been having trouble extracting more than the first page with iTextSharp; it only recognizes the first page. Any thoughts?
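    A likely explanation, offered as a hypothesis: in a PDF Portfolio the page that PdfReader sees is usually only a cover page, and the member documents are stored as embedded files in the catalog's /Names /EmbeddedFiles name tree. Extracting them to disk gives ordinary PDFs that the splitting code in this thread can work on. A minimal sketch, assuming iTextSharp 5.x, an existing output folder, and a flat name tree (a single /Names array with no /Kids); ExtractPortfolioFiles is a hypothetical name.

```vbnet
Imports System.IO
Imports iTextSharp.text.pdf

Module PortfolioExtractor
    ' Hypothetical helper: saves each file embedded in a PDF Portfolio to outputDir.
    ' Assumes iTextSharp 5.x and a flat /EmbeddedFiles name tree (no /Kids).
    Public Sub ExtractPortfolioFiles(ByVal portfolioPdf As String, ByVal outputDir As String)
        Dim reader As New PdfReader(portfolioPdf)
        Dim names As PdfDictionary = reader.Catalog.GetAsDict(PdfName.NAMES)
        Dim embedded As PdfDictionary = If(names Is Nothing, Nothing, names.GetAsDict(PdfName.EMBEDDEDFILES))
        Dim fileSpecs As PdfArray = If(embedded Is Nothing, Nothing, embedded.GetAsArray(PdfName.NAMES))
        If fileSpecs IsNot Nothing Then
            ' The /Names array alternates: name string, file specification dictionary
            For i As Integer = 0 To fileSpecs.Size - 2 Step 2
                Dim fileSpec As PdfDictionary = fileSpecs.GetAsDict(i + 1)
                If fileSpec Is Nothing Then Continue For
                Dim ef As PdfDictionary = fileSpec.GetAsDict(PdfName.EF)
                ' Prefer the Unicode file name (/UF), fall back to /F
                Dim fileName As PdfString = fileSpec.GetAsString(PdfName.UF)
                If fileName Is Nothing Then fileName = fileSpec.GetAsString(PdfName.F)
                If ef Is Nothing OrElse fileName Is Nothing Then Continue For

                ' /EF /F references the stream that holds the embedded file's bytes
                Dim stream As PRStream = TryCast(PdfReader.GetPdfObject(ef.Get(PdfName.F)), PRStream)
                If stream Is Nothing Then Continue For
                Dim bytes As Byte() = PdfReader.GetStreamBytes(stream)
                File.WriteAllBytes(Path.Combine(outputDir, Path.GetFileName(fileName.ToUnicodeString())), bytes)
            Next
        End If
        reader.Close()
    End Sub
End Module
```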

  40. #80
    New Member
    Join Date
    Sep 2006
    Posts
    8

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi Stanav,
    Maybe a stupid question, but...
    Can I use the same code in a web application? I think iTextSharp does not work with relative paths.
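    Probably yes; the library calls are the same. The catch is that a relative path passed to PdfReader or FileStream is resolved against the process's current directory, which under IIS is usually not the site's folder. Converting the path with Server.MapPath first avoids that. A minimal sketch, assuming ASP.NET (System.Web); PdfPaths and the ~/App_Data/ location are hypothetical.

```vbnet
Imports System.Web

Public Class PdfPaths
    ' Converts an app-relative virtual path (e.g. "~/App_Data/sample.pdf") into an
    ' absolute file system path that PdfReader and FileStream can open.
    ' Must be called while a request is being processed (HttpContext.Current is set).
    Public Shared Function ToPhysicalPath(ByVal virtualPath As String) As String
        Return HttpContext.Current.Server.MapPath(virtualPath)
    End Function
End Class
```

    For example, New PdfReader(PdfPaths.ToPhysicalPath("~/App_Data/sample.pdf")). Output files also need a folder that the application pool identity is allowed to write to.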
