
Thread: Itextsharp search word in multiple PDF then isolate the PDF document

  1. #1

    Thread Starter
    Junior Member pinokio
    Join Date
    Dec 2012
    Location
    USA
    Posts
    27

    This is my first post. I have already used iTextSharp's PdfReader to find a word in multiple PDF documents. Now I want to copy the PDF that contains the word into a new PDF. How can I download Manipulatepdf2.vb? I think the class includes a method to do what I am trying to do. Thank you all.

  2. #2
    PowerPoster stanav
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290


    From what you wrote, it seems like you want to merge 2 or more pdf documents into a single pdf. To do this, you can use either the MergePdfFiles or MergePdfFilesWithBookmarks method found in PdfManipulation2 class. You can download that class here:
    http://www.vbforums.com/showthread.p...ing-iTextSharp
    Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
    - Abraham Lincoln -

  3. #3

    Thread Starter
    Junior Member pinokio

    Actually, I do not want to merge. I found and downloaded the PdfManipulation2 class. Here is what I want. I used PdfReader to search for a word in a multi-document PDF; in other words, the file contains many individual PDFs, each 1 to 3 pages long. Now let's say I searched for "Dave Jones" and found it on page 5. How do I pass that page number (or document) to PdfManipulation2.ExtractPdfPage("sss.pdf", pagenumber, outputpdf)? Thank you.

  4. #4

    Thread Starter
    Junior Member pinokio

    I tried this from your prior post, but it is not working. Thank you.

    'Specify the path to the source pdf file
    Dim sourcePdf As String = "C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40.pdf"

    'Extract page # 2 of the above pdf file
    Dim pageNumberToExtract As Integer = 2

    'And then save it to a new pdf named 'table40_page2.pdf'
    Dim outputPdf As String = "C:\Documents and Settings\rch\Desktop\psm2010\venteps\out\table40_page2.pdf"

    'Call the sub somewhere in your program, passing in the above arguments
    PdfManipulation.ExtractPdfPage(sourcePdf, pageNumberToExtract, outputPdf)

  5. #5
    PowerPoster stanav

    1. To know which page your search term was found on, you need to search page by page. That is, loop through the PDF pages and search each one. If the term is found, record that page number (i.e. add it to a list) for later use. Once you are out of the loop, check whether the list contains anything. If it does, loop through the list and extract those pages.
    2. "It's not working" isn't very informative. It's like going to a doctor and saying "I'm sick" without describing any of the symptoms... You need to tell me what happened and/or what didn't happen when you ran that code.

  6. #6

    Thread Starter
    Junior Member pinokio

    Stanav, thank you very much for your help. I think I put the error message in a different post. The message was something like "the item has already been created". The PdfCopy line creates the file, and the AddPage(page) line is where it fails.

    Please check this post.
    http://www.vbforums.com/showthread.p...textsharp+page

  7. #7
    PowerPoster stanav

    Try using a different output name or deleting the existing file. You can't have 2 files with the same name in the same directory.

  8. #8

    Thread Starter
    Junior Member pinokio

    I have the search word "Hello" on page 15 of a 30-page PDF document, which is made up of 10 separate PDF documents. When I run this function, it finds the word on the first read, when i = 1, and sets sOut = "Hello and the rest of the information on the page". What am I doing wrong?

    BTW, I also use separate input and output directories, with different file names.

    Public Shared Function GetTextFromPDF(PdfFileName As String) As String
        Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
        Dim sOut As String
        Dim _pageNumber As Integer
        Dim i As Integer
        sOut = "Hello"
        Dim x As Integer = 1
        For i = 1 To oReader.NumberOfPages
            Dim its As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
            sOut &= iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, i, its)
        Next
        Return sOut
    End Function

  9. #9
    PowerPoster stanav

    Try something like this:
    vb.net Code:
    ''' <summary>
    ''' Simple page-by-page text search of a PDF file; returns a list of the page numbers where a match was found.
    ''' </summary>
    ''' <param name="sourcePdf">the full path to the pdf file to be searched</param>
    ''' <param name="searchPhrase">the string to search for</param>
    ''' <param name="caseSensitive">True for a case-sensitive search (default is False)</param>
    ''' <returns>List(Of Integer) containing the numbers of the pages that contain one or more matches</returns>
    ''' <remarks></remarks>
    Public Shared Function SearchTextFromPdf(ByVal sourcePdf As String, ByVal searchPhrase As String, Optional ByVal caseSensitive As Boolean = False) As List(Of Integer)
        Dim foundList As New List(Of Integer)
        Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
        Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
        Try
            raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
            reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
            If caseSensitive = False Then
                searchPhrase = searchPhrase.ToLower()
            End If
            For i As Integer = 1 To reader.NumberOfPages()
                Dim pageText As String = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, i)
                If caseSensitive = False Then
                    pageText = pageText.ToLower()
                End If
                If pageText.Contains(searchPhrase) Then
                    foundList.Add(i)
                End If
            Next
            reader.Close()
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
        Return foundList
    End Function
    After you've got the list of page numbers where the search matched, simply loop through it and call the ExtractPdfPage method of the PdfManipulation2 class.
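    A minimal sketch of that loop, assuming SearchTextFromPdf (above) and PdfManipulation2.ExtractPdfPage(sourcePdf, pageNumberToExtract, outPdf) are both in your project. The helper name ExtractFoundPages and the "_pageN.pdf" naming scheme are hypothetical; each matching page is written to its own file, so no two extractions try to reuse the same output name.

    ```vbnet
    ' Hypothetical helper (not part of PdfManipulation2). Place it in the same class as
    ' SearchTextFromPdf; it assumes PdfManipulation2.ExtractPdfPage is available and
    ' that outFolder already exists.
    Public Shared Function ExtractFoundPages(ByVal sourcePdf As String, ByVal outFolder As String, ByVal searchPhrase As String) As List(Of String)
        Dim createdFiles As New List(Of String)
        Dim baseName As String = IO.Path.GetFileNameWithoutExtension(sourcePdf)
        For Each pageNumber As Integer In SearchTextFromPdf(sourcePdf, searchPhrase)
            ' One distinct output file per matching page, e.g. table40_page2.pdf
            Dim outPdf As String = IO.Path.Combine(outFolder, baseName & "_page" & pageNumber.ToString() & ".pdf")
            PdfManipulation2.ExtractPdfPage(sourcePdf, pageNumber, outPdf)
            createdFiles.Add(outPdf)
        Next
        Return createdFiles
    End Function
    ```

    Usage would be something like Dim files = ExtractFoundPages("C:\in\table40.pdf", "C:\out", "Dave Jones"), which returns the paths of the files it wrote (an empty list if nothing matched).
    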

  10. #10

    Thread Starter
    Junior Member pinokio

    Thanks for this code; however, I still get the error "An item with the same key has already been added." in PdfManipulation2.ExtractPdfPage:

    pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create))
    doc.Open()
    page = pdfCpy.GetImportedPage(reader, pageNumberToExtract)
    pdfCpy.AddPage(page) ' <-- ERROR occurs here.

    Thank you.

  11. #11

    Thread Starter
    Junior Member pinokio

    Hey Stanav, could you shed some light on how to retrieve a 3-page document from a PDF file that combines multiple PDFs? I search for a word and find it on page 5, but page 5 is the first page of a 3-page document, and I want to copy all 3 pages to the output folder. Currently, I have combined methods from PdfManipulation2, but it has become a mess. Thanks for your help.

  12. #12
    PowerPoster stanav

    How can you tell how many pages to grab after each search match? There has to be a rule of some sort. Computers are not human, and if the commands/rules aren't clear, they can't be executed reliably.
    As for achieving the task you are working on, I've given you all the relevant code needed to get it done. It's now just a matter of using it - modify it when necessary - to make it work the way you want. Programming is a lot more than copying and pasting.
    In order for me to provide further help, you need to:
    1. Upload a sample pdf file that I can use to test with
    2. State clearly what you want to do with the pdf
    3. State any rules/patterns that must be obeyed...

    I don't promise anything, but if I can spare some time, I'll give it a try.
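    As one illustration of such a rule: if every document in the combined file has the same length (as in the file described later in this thread, 30 customer letters of 3 pages each), the document containing a matched page can be computed with integer division. This is a hypothetical helper, not part of PdfManipulation2, and it does not apply when the documents vary in length.

    ```vbnet
    Imports System

    ' Hypothetical helper, not part of PdfManipulation2. Assumes every document
    ' in the combined PDF has exactly pagesPerDocument pages.
    Public Module PageRanges
        ' Returns (firstPage, lastPage) of the document that contains foundPage (1-based).
        Public Function GetDocumentPageRange(ByVal foundPage As Integer, ByVal pagesPerDocument As Integer, ByVal totalPages As Integer) As Tuple(Of Integer, Integer)
            If pagesPerDocument < 1 Then Throw New ArgumentOutOfRangeException("pagesPerDocument")
            If foundPage < 1 OrElse foundPage > totalPages Then Throw New ArgumentOutOfRangeException("foundPage")
            ' Integer division (\) gives the 0-based index of the document
            Dim documentIndex As Integer = (foundPage - 1) \ pagesPerDocument
            Dim firstPage As Integer = documentIndex * pagesPerDocument + 1
            ' Clamp in case the file ends with a shorter, incomplete document
            Dim lastPage As Integer = Math.Min(firstPage + pagesPerDocument - 1, totalPages)
            Return Tuple.Create(firstPage, lastPage)
        End Function
    End Module
    ```

    With 30 letters of 3 pages (90 pages), GetDocumentPageRange(22, 3, 90) gives pages 22 to 24, while GetDocumentPageRange(5, 3, 90) gives pages 4 to 6. That second result is why the rule has to match the real layout: if page 5 really starts a document, a fixed 3-page rule is the wrong rule for that file.
    
    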

  13. #13

    Thread Starter
    Junior Member pinokio

    Hey Stanav, have no fear. Thank you for the help. I figured out how to accomplish what I want but I still have that annoying error.

    pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create))
    doc.Open()
    page = pdfCpy.GetImportedPage(reader, pageNumberToExtract)
    pdfCpy.AddPage(page) ' <-- ERROR occurs here.

    Please advise on what is wrong here, and I will take it from there.

  14. #14
    PowerPoster stanav

    I can't tell you what's wrong until I have a chance to examine it myself, and that's why I asked you to upload a test file and provide the information I need to run the test. The line you pointed out is entirely iTextSharp code, so I suspect you're doing something wrong in your own code rather than hitting a bug in iTextSharp.

  15. #15

    Thread Starter
    Junior Member pinokio

    Because of restrictions on the data, I cannot send the actual file, but this is what I am doing. Thanks for your patience.

    I call this method with these parameters:

    PdfManipulation2.ExtractPdfPage("c:\documents\xyz.PDF", 22, "c:\single_XYZPdf\single.PDF")
    "c:\documents\xyz.PDF" contains 30 customer letters (each 3 pages long).
    "c:\single_XYZPdf\single.PDF" will contain page 22. (I will write code to loop over the 3 pages starting at page 22 and output them to single.PDF.)

    This code is from PdfManipulation2.ExtractPdfPage:
    pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create)) ' <-- This line creates the PDF.
    doc.Open()
    page = pdfCpy.GetImportedPage(reader, pageNumberToExtract)
    pdfCpy.AddPage(page) ' <-- ERROR occurs here.

  16. #16
    PowerPoster stanav

    As far as I can see, I can't replicate the error... The code works as intended each and every time I run it. For testing purposes, use a different PDF file and extract a random page from it. Does that work?

  17. #17

    Thread Starter
    Junior Member pinokio

    Hello Stanav, I tried it with the attached PDF:

    Private Sub btnSearch_Click(sender As System.Object, e As System.EventArgs) Handles btnSearch.Click
        Dim sourcepdf As String = "C:\HH\diabeteslbs.pdf"
        PdfManipulation2.ExtractPdfPage(sourcepdf, 4, "c:\HO\Page_4.pdf")
        MessageBox.Show("Done!")
    End Sub

    I still got the message "An item with the same key has already been added."
    Thanks for your help.

  18. #18
    PowerPoster stanav

    I still can't replicate the error using the sample file you uploaded. It has to be something in your project, or the PdfManipulation2.ExtractPdfPage code has been modified.
    Can you compare the code you have with this one? If yours is different, then you know why it didn't work.
    Code:
     Public Overloads Shared Sub ExtractPdfPage(ByVal sourcePdf As String, ByVal pageNumberToExtract As Integer, ByVal outPdf As String)
            Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
            Dim doc As iTextSharp.text.Document = Nothing
            Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
            Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
            Try
                reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
                doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
                pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create))
                doc.Open()
                page = pdfCpy.GetImportedPage(reader, pageNumberToExtract)
                pdfCpy.AddPage(page)
                doc.Close()
                reader.Close()
            Catch ex As Exception
                Throw ex
            End Try
        End Sub
    Alternatively, you can re-download the PdfManipulation2 class and start a fresh project to test the function. If it works, and I'm pretty sure that it will, you have your conclusion...

  19. #19

    Thread Starter
    Junior Member pinokio

    Hey Stanav, please advise on the difference between the two snippets. I re-downloaded PdfManipulation2.vb. You can see that the "Throw ex" line was hit when I ran the code again.

    Thank you

  20. #20

    Thread Starter
    Junior Member pinokio

    Please check this attachment; you can see the error message. Thank you

  21. #21
    PowerPoster stanav

    The 2 code snippets are the same, except for the 3 lines that are commented out, which is OK. I have no idea why you keep getting that error while I don't... Are you using the right version of iTextSharp? It should be 5.2.1.0 or newer.
    For testing purposes, can you start a new project and test the function again?

  22. #22

    Thread Starter
    Junior Member pinokio

    Hello again. I am on Windows 7, using itextsharp-all-5.3.4 in a brand-new VS2010 Windows Forms project.
    Thank you

  23. #23
    PowerPoster stanav

    Ah... The new itextsharp version 5.3.4 seems to be the culprit. I tested using that new version and sure enough, I got the same error as you did.
    Use this 5.2.1 version below and you should be good to go....
    https://dl.dropbox.com/u/20581085/itextsharp_5.2.1.zip

  24. #24

    Thread Starter
    Junior Member pinokio

    Okay, I knew you were smarter than me. Thank you for sticking with me.
    Now I am going back to my original project which is
    1. Search a PDF document for a word (label).
    2. Once it is found, take that page number as the starting page and copy all the pages that contain the label into a separate PDF.
    For example:
    I have a 30-page PDF that actually contains 10 customer invoices. Each invoice has a unique label. I search for the label "Rome34". When I find it, I want to select all the pages that have "Rome34" and create a separate PDF from them.

    Thanks again.

  25. #25
    PowerPoster stanav

    Copy and paste this function into the PdfManipulation2 class:
    Code:
    Public Shared Function FindAndExtract(ByVal sourcePdf As String, ByVal outPdf As String, ByVal searchPhrase As String, Optional ByVal caseSensitive As Boolean = False) As Boolean
            Dim result As Boolean = False
            Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
            Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
            Dim doc As iTextSharp.text.Document = Nothing
            Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
            Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
            Try
                raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
                reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
                If caseSensitive = False Then
                    searchPhrase = searchPhrase.ToLower()
                End If
                For i As Integer = 1 To reader.NumberOfPages()
                    Dim pageText As String = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, i)
                    If caseSensitive = False Then
                        pageText = pageText.ToLower()
                    End If
                    If pageText.Contains(searchPhrase) Then
                        If doc Is Nothing Then
                            doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
                            pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create))
                            doc.Open()
                        End If
                        page = pdfCpy.GetImportedPage(reader, i)
                        pdfCpy.AddPage(page)
                    End If
                Next
                If doc IsNot Nothing Then
                    doc.Close()
                    result = True
                End If
                reader.Close()
            Catch ex As Exception
                Throw ex
            End Try
            Return result
        End Function
    Usage example:
    Code:
     Private Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click
            Dim searchText As String = "4 oz. regular soda"
            Dim result As Boolean = PdfManipulation2.FindAndExtract("d:\test1.pdf", "d:\test1_extracted.pdf", searchText)
            MessageBox.Show(result.ToString)
        End Sub

  26. #26

    Thread Starter
    Junior Member pinokio

    Hey Stanav, you did it! Now I have exactly what I was looking for in guidance and solution. Thanks again for the help and Happy Holidays.

  27. #27

    Thread Starter
    Junior Member pinokio

    Hey Stanav, I have folder X with 20 PDF files. I want to merge them into one PDF and output it to folder Y.
    Can I use the wildcard *.pdf for SourceTable instead of listing all the PDF file names in an array?
    pdfManipulation.ExtractAndMergePdfPages(SourceTable, outPdf)
    Thanks

  28. #28
    PowerPoster stanav

    You need to use this method:
    Code:
     'Merge multiple pdfs into a single one.  
        Public Shared Function MergePdfFiles(ByVal pdfFiles() As String, ByVal outputPath As String, _
                                             Optional ByVal authorName As String = "", _
                                             Optional ByVal creatorName As String = "", _
                                             Optional ByVal subject As String = "", _
                                             Optional ByVal title As String = "", _
                                             Optional ByVal keywords As String = "") As Boolean
    As you can see in the function signature, it takes an array of PDF files and merges them into a single output PDF file. And yes, you can use System.IO.Directory.GetFiles(folderPath, "*.pdf") to get the PDF files and feed that array to the function.
    The method you mentioned, pdfManipulation.ExtractAndMergePdfPages(SourceTable, outPdf), extracts selected pages from each PDF and merges them into a single PDF. For example, it can take pages 1, 3, 7 from A.pdf, pages 9, 11, 30 from B.pdf, and pages 2, 8, 11 from C.pdf, and merge them into a new PDF. Since those parameters are fairly complex, it's easier to build a DataTable to feed the function. However, you don't need that method, since MergePdfFiles does exactly what you want.
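    A minimal usage sketch for that, assuming MergePdfFiles has the signature shown above and returns True on success. The folder paths, the output name merged.pdf and the Button1 handler are placeholders. Writing the output to a different folder from the inputs keeps the merged file from being picked up by GetFiles on a later run.

    ```vbnet
    ' Sketch of a button handler; paths, file name and Button1 are placeholders.
    Private Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click
        Dim sourceFolder As String = "C:\X"
        Dim outputPdf As String = IO.Path.Combine("C:\Y", "merged.pdf")

        ' Directory.GetFiles does not guarantee any order, so sort the names
        ' to control the order of the documents in the merged file
        Dim pdfFiles() As String = IO.Directory.GetFiles(sourceFolder, "*.pdf")
        Array.Sort(pdfFiles, StringComparer.OrdinalIgnoreCase)

        Dim merged As Boolean = PdfManipulation2.MergePdfFiles(pdfFiles, outputPdf)
        MessageBox.Show(merged.ToString())
    End Sub
    ```
    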

  29. #29

    Thread Starter
    Junior Member pinokio

    Many Thanks! I worked beautifully.

  30. #30

    Thread Starter
    Junior Member pinokio

    Oops! I meant it worked beautifully. I have a question. How fast do you think the extract method will retrieve a tagged PDF (meaning one with a unique identifier) from a one-million-page PDF? Also, do you know of any way to calculate the disk size of a one-million-page PDF document?

    Thanks

  31. #31
    PowerPoster stanav

    I haven't worked with any tagged PDFs, so I can't be sure about this, but if you're asking about using the FindAndExtract method above, I'd say it won't make much difference compared to untagged PDFs. All 1 million pages are still looped through one by one. As for how long a 1-million-page PDF will take, you'll have to time it yourself; I don't have anything that large. The largest PDF file I've ever worked on was around 30k pages, and iTextSharp handled it without any problems.
    How much disk space does a 1-million-page PDF take? It's a tricky question, because far too many variables are involved in creating a PDF page: images, embedded resources, layers... just to name a few. And no, I don't know of any way to calculate or estimate the final disk size of a PDF before it is created.
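    To get the timing figure suggested above, one option is to wrap a single FindAndExtract call (post #25) in a Stopwatch. A rough sketch, where the file paths, the "Rome34" label and the Button1 handler are placeholders. Timing a smaller file first and dividing by its page count gives a per-page figure, but extrapolating to a million pages assumes the time grows roughly linearly with page count, which is only an approximation since text extraction cost depends on page content.

    ```vbnet
    ' Sketch: time one FindAndExtract run. Paths, label and Button1 are placeholders.
    Private Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click
        Dim sw As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew()
        Dim found As Boolean = PdfManipulation2.FindAndExtract("d:\big.pdf", "d:\big_extracted.pdf", "Rome34")
        sw.Stop()
        MessageBox.Show(String.Format("Found: {0}  Elapsed: {1:N1} seconds", found, sw.Elapsed.TotalSeconds))
    End Sub
    ```
    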

  32. #32

    Thread Starter
    Junior Member pinokio

    Hi Stanav, hope you are well. I am using PdfManipulation2 and everything is fine. I just want to know how to move the bookmark (the BLUE text) from the top of the page to the bottom, preferably into the footer section, during merging. I don't want to replace the current footer, but to insert the bookmark into the two-line footer at the bottom. If that is too difficult, then at the end of the document, before the footer. Thanks

  33. #33
    PowerPoster stanav

    As you can see in the code, a paragraph is added to page 1 of each PDF file before the original PDF page is copied over. That paragraph is what makes the bookmark. If you want the bookmark to be at the bottom of the page, just add the paragraph after you add the copied page to the new document. That is, change the inner While loop to this:
    Code:
    While i < pageCount
        i += 1
        'Get the input page size
        pdfDoc.SetPageSize(reader.GetPageSizeWithRotation(i))
        'Create a new page on the output document
        pdfDoc.NewPage()
        'Now we get the imported page
        page = writer.GetImportedPage(reader, i)
        'Read the imported page's rotation
        rotation = reader.GetPageRotation(i)
        'Then add the imported page to the PdfContentByte object as a template based on the page's rotation
        If rotation = 90 Then
            cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(i).Height)
        ElseIf rotation = 270 Then
            cb.AddTemplate(page, 0, 1.0F, -1.0F, 0, reader.GetPageSizeWithRotation(i).Width + 60, -30)
        Else
            cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0)
        End If
        'If it is the 1st page, we add bookmarks to the page
        If i = 1 Then
            'First create a paragraph using the filename as the heading
            Dim para As New iTextSharp.text.Paragraph(IO.Path.GetFileName(fileName).ToUpper(), bookmarkFont)
            'Then create a chapter from the above paragraph
            Dim chpter As New iTextSharp.text.Chapter(para, f + 1)
            'Finally add the chapter to the document
            pdfDoc.Add(chpter)
        End If
    End While
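    One caveat, as far as I understand iTextSharp 5: Document.Add places a paragraph at the document's current text position, which on a fresh page starts at the top margin, and drawing with cb.AddTemplate does not move that position. If the heading still lands at the top after moving the Add call, an alternative is to draw the label at an absolute position with ColumnText.ShowTextAligned and create the bookmark entry with a PdfOutline instead of a Chapter. The helper below is a hypothetical sketch, not part of PdfManipulation2; the 36 pt left offset and the yFromBottom value are assumptions you would tune so the text sits in (or just above) your existing footer.

    ```vbnet
    ' Hypothetical helper, not part of PdfManipulation2: draws label near the bottom of
    ' the current page and adds a bookmark (outline) entry for that page, as an
    ' alternative to adding a Chapter. cb is the writer's DirectContent.
    Private Shared Sub AddBottomBookmark(ByVal cb As iTextSharp.text.pdf.PdfContentByte, _
                                         ByVal label As String, _
                                         ByVal font As iTextSharp.text.Font, _
                                         ByVal pageSize As iTextSharp.text.Rectangle, _
                                         ByVal yFromBottom As Single)
        ' Absolute position: 36 pt in from the left edge, yFromBottom pt up from the bottom edge
        iTextSharp.text.pdf.ColumnText.ShowTextAligned(cb, iTextSharp.text.Element.ALIGN_LEFT, _
            New iTextSharp.text.Phrase(label, font), pageSize.Left + 36, pageSize.Bottom + yFromBottom, 0)
        ' Bookmark entry; the destination has no page of its own, so it is expected
        ' to point at the current page. FITH = fit page width, scrolled to pageSize.Top.
        Dim dest As New iTextSharp.text.pdf.PdfDestination(iTextSharp.text.pdf.PdfDestination.FITH, pageSize.Top)
        Dim outline As New iTextSharp.text.pdf.PdfOutline(cb.RootOutline, dest, label)
    End Sub
    ```

    In the loop above, the body of the If i = 1 block would then become a single call such as AddBottomBookmark(cb, IO.Path.GetFileName(fileName).ToUpper(), bookmarkFont, reader.GetPageSizeWithRotation(i), 20), replacing the Paragraph/Chapter lines.
    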

  34. #34

    Thread Starter
    Junior Member pinokio

    Hey Stanav, I'm back. I hope all is well with you.
    Issue: I have all the PDFs in a folder. The question is how to programmatically (VB.NET) open the folder and print all the PDFs stored there as individual documents.

    Thank you.

  35. #35
    PowerPoster stanav

    You would get a list of all the pdf files in that folder and then loop through the list printing 1 at a time.
    1. To get the pdfs in a folder, you can use System.IO.Directory.GetFiles method.
    2. To print a pdf file using the default application and printer, you start a process, set the verb to "print" and pass in the filepath as the argument. Search the forum and you will find examples.
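    A minimal sketch of those two steps, assuming the application registered for .pdf files (e.g. Adobe Reader) supports the "print" verb. The module and method names are hypothetical. Some PDF readers stay open after printing, so the sketch waits with a timeout rather than indefinitely before moving to the next file.

    ```vbnet
    Imports System.Diagnostics
    Imports System.IO

    ' Sketch, not from PdfManipulation2: print every PDF in a folder, one at a time,
    ' through the "print" verb of the application registered for .pdf files.
    Public Module PdfPrinting
        Public Sub PrintAllPdfs(ByVal folderPath As String)
            For Each pdfPath As String In Directory.GetFiles(folderPath, "*.pdf")
                Dim psi As New ProcessStartInfo(pdfPath)
                psi.Verb = "print"                          ' only works if the PDF app registers this verb
                psi.UseShellExecute = True                  ' verbs require the shell
                psi.WindowStyle = ProcessWindowStyle.Hidden ' a request; the app may ignore it
                Using p As Process = Process.Start(psi)
                    ' Wait up to 30 s (an arbitrary choice) before the next file
                    If p IsNot Nothing Then p.WaitForExit(30000)
                End Using
            Next
        End Sub
    End Module
    ```

    Calling PdfPrinting.PrintAllPdfs("C:\Y") would then send each PDF in that folder to the default printer.
    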

  36. #36

    Thread Starter
    Junior Member pinokio

    Hello Stanav,
    I am trying to shrink the size of the PDF files in a folder. Basically, I want to compress each page by 80% or more without affecting the contents. I checked out the ResizePage function in PDFManipulation2, but I am not sure it will do what I need. Ideally, I would like to set the resolution to 72 dpi and reduce the pixel count. Any ideas will be appreciated.

  37. #37
    New Member bhendo54
    Join Date
    Oct 2013
    Posts
    2


    Hello Stanav,

    I have just downloaded your PdfManipulation2 and it is great. However in the functions that use "token = New iTextSharp.text.pdf.PRTokeniser(pageBytes)" I get:
    Error 1 Value of type '1-dimensional array of Byte' cannot be converted to 'iTextSharp.text.pdf.RandomAccessFileOrArray'.

    Also getting these warnings:
    Warning 2 'Public Sub New(raf As iTextSharp.text.pdf.RandomAccessFileOrArray, ownerPassword() As Byte)' is obsolete: 'Use the constructor that takes a RandomAccessFileOrArray'.

    Am I doing something wrong? I am using "VB Express 2012"
    Thanks for your help
    Brad

  38. #38
    PowerPoster stanav

    iTextSharp has evolved quite a bit since the last version that I worked on... So to answer your question, I'd need to know 2 things:
    1. What version of iTextSharp are you using?
    2. What exactly is it that you're trying to do?

    I've been extremely busy, and since I haven't needed newer versions of iTextSharp, it's unlikely that I will update the PdfManipulation2 class any time soon. If you're using an iTextSharp version newer than 5.2.1, I'd suggest downloading 5.2.1 and trying again. Most of the time that will resolve the issues.

  39. #39
    New Member

    Thanks Stanav

  40. #40

    Thread Starter
    Junior Member pinokio

    Hello Stanav,
    I am back after searching all of PDFManipulation2. Here is the situation. Every time I convert an MS Word 2007 document to PDF, the PDF is shrunk to about 90% of the Word document's size. Is there any way to send printer commands to keep the PDF at 100%?
    Thank you.
