[VB.NET] Extract Pages and Split Pdf Files Using iTextSharp - Page 4-VBForums

Thread: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

  1. #121
    New Member
    Join Date
    Oct 2012
    Posts
    3

    removing restrictions

    I know this thread is a little old, but I am hoping to get some help!

    I need to remove the restrictions from a PDF that is automatically generated by one of our systems. The software generating the PDF also generates a random 7-character owner password, which changes for each file. The user password is blank.

    I need to be able to change the file's metadata so that our PDF store can index the documents appropriately.

    If I use the restrictions remover in PdfManipulation2.vb (a great bit of code, by the way), it does not remove the owner password, but it does change all the listed permissions to allowed (apart from page extraction).

    When I use my code to change the metadata, I get the exception "PdfReader not opened with owner password".

    My reading of the code in PdfManipulation2.vb is that it should create a new PDF with the contents of the old one, but with no encryption and no restrictions. Have I got this wrong? Can anyone advise a better way of doing this?

    Thanks

  2. #122

    Thread Starter
    PowerPoster (stanav)
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290

    Re: removing restrictions

    Quote Originally Posted by doug_ecg
    ...
    My reading of the code in PdfManipulation2.vb is that it should create a new PDF with the contents of the old one, but with no encryption and no restrictions. Have I got this wrong? Can anyone advise a better way of doing this?

    Thanks
    Yes, you've got the wrong idea. Removing restrictions means just that: it won't remove the passwords.
    Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
    - Abraham Lincoln -

  3. #123
    New Member
    Join Date
    Oct 2012
    Posts
    3

    Re: removing restrictions

    Quote Originally Posted by stanav
    Yes, you've got the wrong idea. Removing restrictions means just that: it won't remove the passwords.
    Ah, indeed!

    Do you by any chance know of a way to alter the metadata without using the owner password?

    With the restrictions removed, is it possible to copy the contents to a new PDF?

    Thanks for your assistance!

    B

  4. #124

    Thread Starter
    PowerPoster (stanav)
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Print it to a new PDF using a PDF print driver such as CutePDF. The printed version of the file (read: a new copy) won't have any passwords.

  5. #125
    New Member
    Join Date
    Oct 2012
    Posts
    3

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Thanks, that does work, but I was rather hoping to handle it all within my application.

    Essentially, I need to alter the document, add new metadata, and then save it again. Rather irritatingly, the company that produces the other piece of kit that generates the PDFs appears to use a random owner password (and doesn't wish to change its software just so we can index the PDFs in ours).

    Does anyone know of a way to achieve this? If not, I guess I shall have to get CutePDF, capture the created file, and then work with that.

  6. #126

    Thread Starter
    PowerPoster (stanav)
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by doug_ecg
    Thanks, that does work, but I was rather hoping to handle it all within my application.

    Essentially, I need to alter the document, add new metadata, and then save it again. Rather irritatingly, the company that produces the other piece of kit that generates the PDFs appears to use a random owner password (and doesn't wish to change its software just so we can index the PDFs in ours).

    Does anyone know of a way to achieve this? If not, I guess I shall have to get CutePDF, capture the created file, and then work with that.
    The custom version of PDF Writer (read: the paid version) lets you bypass the "Save As" dialog, so you can print to PDF silently from your app.
    More info can be found on their website:
    http://www.cutepdf.com/Solutions/pdfwriter.asp

    Truly, if this is for business, the one-time $500 price tag of the "Custom PDF Writer with programmatic access" package is justifiable.

  7. #127
    Frenzied Member
    Join Date
    Jul 2006
    Location
    MI
    Posts
    1,818

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Stanav, your AddWatermarkText function requires both a source file and a destination file. Is it possible to rewrite it to require only one file? In the current function, using the same file as both source and destination results in an error. I just want to specify a file name and the watermark text and have the function add the text to that file. Is this possible?

  8. #128

    Thread Starter
    PowerPoster (stanav)
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    You have the source code, so you can see how it works. If it doesn't meet your needs, feel free to modify it however you want.
    Whenever you edit a PDF file, the result has to be saved as a new file, for two reasons: 1. PDF files are not designed to be edited in place. 2. The source PDF file is open (since you're reading from it while editing), so the file is locked, and you can't delete it until it is closed. Knowing this, it should be fairly straightforward to modify the existing code to do what you want: instead of passing in an output file path, declare that variable locally and generate a random temporary file name for it. When you're done adding the watermarks and have closed the original file, delete the original and move the temp file into its place.
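    A minimal sketch of that temp-file approach, in VB.NET using only System.IO. EditInPlace and its editAction parameter are hypothetical names, not part of PdfManipulation2.vb; editAction stands in for any routine that takes a source path and a destination path (such as AddWatermarkText) and is assumed to close the source file before it returns.

```vbnet
Imports System
Imports System.IO

Module InPlaceEditSketch
    ''' <summary>
    ''' Edits a file "in place" by writing the result to a temporary file in the
    ''' same folder, then deleting the original and moving the temp file over it.
    ''' </summary>
    ''' <param name="filePath">The file to edit.</param>
    ''' <param name="editAction">Hypothetical callback: (sourcePath, destinationPath).
    ''' It must close the source before returning, otherwise File.Delete fails.</param>
    Public Sub EditInPlace(ByVal filePath As String, ByVal editAction As Action(Of String, String))
        Dim folder As String = Path.GetDirectoryName(Path.GetFullPath(filePath))
        ' Random temp name in the same folder, so the final move stays on one volume.
        Dim tempPath As String = Path.Combine(folder, Path.GetRandomFileName())

        Try
            editAction(filePath, tempPath)
        Catch
            ' The edit failed: discard any partial output and leave the original untouched.
            If File.Exists(tempPath) Then File.Delete(tempPath)
            Throw
        End Try

        ' The edit succeeded and the source is closed: replace the original.
        File.Delete(filePath)
        File.Move(tempPath, filePath)
    End Sub
End Module
```

    For the watermark case, the call might look like EditInPlace(path, Sub(src, dst) AddWatermarkText(src, dst, "DRAFT")), assuming AddWatermarkText accepts the source path, the output path and the text in that order; adjust it to the real parameter list. The original is only deleted after the edit has succeeded, so a failed edit leaves it as it was.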

  9. #129
    New Member
    Join Date
    Dec 2013
    Posts
    1

    Re: [VB.NET] Pdf Manipulation Class Using iTextSharp

    Stanav,

    This has been a great help in my project. I'm just running into a couple of small errors and wondered if you could point me in the right direction to fix them.

    Error 1 Value of type '1-dimensional array of Byte' cannot be converted to 'iTextSharp.text.pdf.RandomAccessFileOrArray'. C:\VB.Net\PDFMerge_Window\PDFMerge_Console\PdfManipulation2.vb 138 65 PDFMerge_Window

    Error 2 'MessageBox' is not declared. It may be inaccessible due to its protection level. C:\VB.Net\PDFMerge_Window\PDFMerge_Console\PdfManipulation2.vb 179 13 PDFMerge_Window

  10. #130
    New Member
    Join Date
    Jun 2015
    Posts
    2

    Re: [VB.NET] Pdf Manipulation Class Using iTextSharp

    When I searched for tutorials on extracting PDF pages using iText, I found a link that reprints this topic here: http://zh.scribd.com/doc/208204720/V...BForums#scribd
    All the posts there seem to require downloading before they can be displayed.

  11. #131
    Member
    Join Date
    Jan 2014
    Posts
    62

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Hi @stanav,

    Thank you so much for your wonderful work.

    I have an Excel file with two columns: Name and Email. I also have a number of PDF files in a folder.

    Is it possible to pick a PDF based on the Name column and insert the email address on that PDF's first page?

    It would be very helpful, as it would save me a huge manual task.

    Thank you so much in advance.

  12. #132
    New Member
    Join Date
    Apr 2017
    Posts
    1

    Re: [VB.NET] Extract Pages and Split Pdf Files Using iTextSharp

    Quote Originally Posted by stanav
    Yes, I THINK it is quite possible, but it would involve much more work (obviously). I gave it a shot, as seen in the code below, but frankly, the method I was using only works to some extent: it only preserves the 1st-level bookmarks. My approach was to export the bookmarks in the original PDF to a collection, select the pages to be extracted from the reader, and use PdfStamper to copy the original PDF (now with only the selected pages) to a new PDF. Since PdfStamper automatically preserves ALL the bookmarks from the original, I had to edit the bookmark collection to remove the unused ones. This approach should work, but I don't know why it only preserves 1st-level bookmarks. Some more work is needed to sort out that bug, but I don't have the time right now, so I will post just what I have so far.
    vb.net Code:
    ''' <summary>
    ''' Extract pages from an existing pdf file to create a new pdf with bookmarks preserved
    ''' </summary>
    ''' <param name="sourcePdf">full path to the source pdf</param>
    ''' <param name="pageNumbersToExtract">an integer array containing the page number of the pages to be extracted</param>
    ''' <param name="outPdf">the full path to the output pdf</param>
    ''' <remarks></remarks>
    Public Shared Sub ExtractPdfPages(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)

        Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Dim outlines As System.Collections.ArrayList = Nothing
        Dim stamper As iTextSharp.text.pdf.PdfStamper = Nothing
        Try
            raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
            reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
            outlines = iTextSharp.text.pdf.SimpleBookmark.GetBookmark(reader)
            reader.SelectPages(New System.Collections.ArrayList(pageNumbersToExtract))
            stamper = New iTextSharp.text.pdf.PdfStamper(reader, New IO.FileStream(outPdf, IO.FileMode.Create))
            RemoveUnusedBookmarks(outlines, pageNumbersToExtract)
            stamper.Outlines = outlines
            stamper.Close()
            reader.Close()
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
    End Sub

    Private Shared Sub RemoveUnusedBookmarks(ByRef bookmarks As System.Collections.ArrayList, ByVal pagesToKeep() As Integer)
        Dim bookmark As System.Collections.Hashtable = Nothing
        Dim obj As Object = Nothing
        For i As Integer = bookmarks.Count - 1 To 0 Step -1
            obj = bookmarks(i)
            If TypeOf obj Is System.Collections.ArrayList Then
                RemoveUnusedBookmarks(DirectCast(obj, System.Collections.ArrayList), pagesToKeep)
            ElseIf TypeOf obj Is System.Collections.Hashtable Then
                bookmark = DirectCast(obj, System.Collections.Hashtable)
                If bookmark.ContainsKey("Page") Then
                    Dim value As String = DirectCast(bookmark.Item("Page"), String)
                    If Not String.IsNullOrEmpty(value) Then
                        Dim parts() As String = value.Split(" "c)
                        If parts.Length > 0 Then
                            Dim pageNum As Integer = -1
                            If Integer.TryParse(parts(0), pageNum) Then
                                Dim idx As Integer = System.Array.IndexOf(pagesToKeep, pageNum)
                                If idx < 0 Then
                                    bookmarks.Remove(obj)
                                Else
                                    parts(0) = (idx + 1).ToString
                                    value = String.Join(" ", parts)
                                    bookmark.Item("Page") = value
                                End If
                            End If
                        End If
                    End If
                End If
            End If
        Next
    End Sub
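    A possible cause of the first-level-only behavior, offered as an assumption rather than a verified diagnosis: in the ArrayList/Hashtable structure returned by SimpleBookmark.GetBookmark, child bookmarks are stored under the "Kids" key of their parent's Hashtable, not as items of the list being walked, so the TypeOf obj Is ArrayList branch in RemoveUnusedBookmarks never recurses into them and their "Page" entries keep the old page numbers. PruneBookmarks below is a hypothetical replacement that works on the same structure and also descends into "Kids". It has no iTextSharp dependency; newer iTextSharp releases return generic lists and dictionaries instead, in which case the casts would need to change.

```vbnet
Imports System
Imports System.Collections

Module BookmarkPruneSketch
    ''' <summary>
    ''' Drops bookmarks that point at pages which were not extracted and renumbers
    ''' the rest. Unlike RemoveUnusedBookmarks, it also walks each entry's "Kids" list.
    ''' </summary>
    Public Sub PruneBookmarks(ByVal bookmarks As ArrayList, ByVal pagesToKeep() As Integer)
        ' Walk backwards so RemoveAt does not shift entries still to be visited.
        For i As Integer = bookmarks.Count - 1 To 0 Step -1
            Dim bookmark As Hashtable = TryCast(bookmarks(i), Hashtable)
            If bookmark Is Nothing Then Continue For

            ' "Page" looks like "5 XYZ 0 792 0": the page number comes first.
            Dim value As String = TryCast(bookmark("Page"), String)
            If Not String.IsNullOrEmpty(value) Then
                Dim parts() As String = value.Split(" "c)
                Dim pageNum As Integer
                If Integer.TryParse(parts(0), pageNum) Then
                    Dim idx As Integer = Array.IndexOf(pagesToKeep, pageNum)
                    If idx < 0 Then
                        ' Target page was not extracted: drop this bookmark and its children.
                        bookmarks.RemoveAt(i)
                        Continue For
                    End If
                    ' Old page number -> 1-based position in the new document.
                    parts(0) = (idx + 1).ToString()
                    bookmark("Page") = String.Join(" ", parts)
                End If
            End If

            ' Nested bookmarks live under "Kids", not as items of this list.
            Dim kids As ArrayList = TryCast(bookmark("Kids"), ArrayList)
            If kids IsNot Nothing Then PruneBookmarks(kids, pagesToKeep)
        Next
    End Sub
End Module
```

    Design choice: a bookmark whose own target page was not extracted is dropped together with its children. If a chapter's first page is excluded but some of its sub-section pages are kept, a variant could promote the surviving children one level up instead.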

    Another approach I thought of was to export the original bookmarks to an XML file and edit that file. Once done, import it back into the new PDF file (which contains only the extracted pages). But like I said, I don't currently have a lot of free time to play with it, so I'll leave it to you to try.

    Good luck.

    THANK YOU, Stanav! You got me further than I had gotten in two days.

    I tweaked it a bit to return actual page numbers, so I can build a call that has iTextSharp recompile the PDF with only the pages needed, based on what is found on each page.

    Code:
    Public Shared Function SearchTextFromPdf(ByVal sourcePdf As String, ByVal searchPhrase As String) As Integer
        Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing

        Try
            raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
            reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)

            For i As Integer = 1 To reader.NumberOfPages()
                Dim pageText As String = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, i)

                ' Return the number of the first page that contains the phrase.
                If pageText.Contains(searchPhrase) Then
                    Return i
                End If
            Next
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        Finally
            ' Runs even on the early Return above, so the reader is always closed.
            If reader IsNot Nothing Then reader.Close()
        End Try
        ' 0 means the phrase was not found (page numbers start at 1).
        Return 0
    End Function
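    Since the goal is to feed the matching pages into ExtractPdfPages, a variant that collects every matching page, rather than stopping at the first one, may be more useful. A minimal sketch, assuming the page-text lookup is passed in as a delegate so the matching logic does not depend on iTextSharp; FindPagesContaining is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic

Module PageSearchSketch
    ''' <summary>
    ''' Returns every 1-based page number whose text contains searchPhrase,
    ''' in ascending order. getPageText is a hypothetical callback that returns
    ''' the text of one page.
    ''' </summary>
    Public Function FindPagesContaining(ByVal pageCount As Integer,
                                        ByVal getPageText As Func(Of Integer, String),
                                        ByVal searchPhrase As String) As List(Of Integer)
        Dim found As New List(Of Integer)()
        For pageNumber As Integer = 1 To pageCount
            Dim pageText As String = getPageText(pageNumber)
            ' Same case-sensitive String.Contains test as the post above,
            ' but the loop keeps going instead of returning at the first hit.
            If pageText IsNot Nothing AndAlso pageText.Contains(searchPhrase) Then
                found.Add(pageNumber)
            End If
        Next
        Return found
    End Function
End Module
```

    With iTextSharp, the call could be FindPagesContaining(reader.NumberOfPages, Function(n) iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, n), searchPhrase); after closing that reader, the result's ToArray() can be passed as pageNumbersToExtract to ExtractPdfPages, which opens the source file again itself.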
