Thread: [RESOLVED] PDF compression

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    160

    [RESOLVED] PDF compression

    Hi

    I realise this may not be the most appropriate place to ask this, so apologies - but at least I might get a point in the right direction?

    I've been using the iTextSharp library in a project to split a PDF into separate sheets, rename the parts, and then re-assemble them. Some of the PDFs I've been running are 3000-5000 pages long, with graphics and text.

    The split/join process works well enough, but the re-joined PDF is inflated. With an original of 57 MB, the output is 880 MB. That's with a file of 3300 or so pages.

    Acrobat DC can sort this out just by opening the PDF and saving it as a new file - it reports that it 'consolidates duplicate page backgrounds' and 'consolidates duplicate fonts' as it saves, and by doing that I get the file size down to pretty much the original 57 MB.

    Does anyone have a good means of doing this sort of compression from VB? The ways I've tried so far don't make much of a saving on the file size.

    I have no budget for this, so I'd need to be able to do this without buying the Acrobat SDK!

    Thanks!


    UPDATE - I just checked the fonts in the joined doc and there are 6412 of them! I reckon all but 6 or 7 are superfluous!
    Last edited by Precision999; Sep 14th, 2018 at 06:09 AM.
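
    The font count mentioned in the update can also be checked from code. The following is only a minimal sketch, assuming iTextSharp 5.x; the module and function names (PdfFontAudit, CountFontObjects) are made up for illustration and are not part of the library. It walks the cross-reference table and counts indirect dictionaries whose /Type is /Font. Composite fonts contribute more than one such dictionary, and fonts stored as inline dictionaries are not counted, so treat the number as a rough indicator rather than an exact match for what Acrobat reports.

```vbnet
Imports iTextSharp.text.pdf

Public Module PdfFontAudit
    'Hypothetical helper (not from the thread): counts indirect objects
    'that are dictionaries with /Type /Font.
    Public Function CountFontObjects(ByVal pdfPath As String) As Integer
        Dim reader As New PdfReader(pdfPath)
        Try
            Dim count As Integer = 0
            'Object 0 is the head of the free list, so start at 1
            For i As Integer = 1 To reader.XrefSize - 1
                Dim obj As PdfObject = reader.GetPdfObject(i)
                'Skip free entries and anything that is not a plain dictionary
                If obj Is Nothing OrElse Not obj.IsDictionary() Then Continue For
                Dim dict As PdfDictionary = CType(obj, PdfDictionary)
                If PdfName.FONT.Equals(dict.GetAsName(PdfName.TYPE)) Then count += 1
            Next
            Return count
        Finally
            'Release the file handle even if parsing fails part-way
            reader.Close()
        End Try
    End Function
End Module
```

    Running it on the source PDF and on the re-joined PDF gives a quick before/after comparison without opening either file in Acrobat.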

  2. #2
    jmcilhinney
    Super Moderator
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: PDF compression

    Presumably you are creating all these duplicate fonts and page backgrounds. If we could see what you're doing then we might be able to see what's wrong with it.

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    160

    Re: PDF compression

    Quote Originally Posted by jmcilhinney
    Presumably you are creating all these duplicate fonts and page backgrounds. If we could see what you're doing then we might be able to see what's wrong with it.

    Hi - thanks for replying.

    I'm using Stanav's post here

    Specifically, it's the SplitPdfByPages and MergePdfFiles functions. I've not modified them much, just added a progress bar.

    Maybe I should have just replied to that thread, but with it being an old one I thought that might not be such a good idea.

    I've been doing some googling and it looks like it's the surplus fonts that are causing the problem rather than the backgrounds. Each split part of the PDF has its own copy of the font, so maybe the solution lies in the merge function.


    Code:
    Public Shared Sub SplitPdfByPages(ByVal sourcePdf As String, ByVal numOfPages As Integer, ByVal baseNameOutPdf As String)
            Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
            Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
            Dim doc As iTextSharp.text.Document = Nothing
            Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
            Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
            Dim pageCount As Integer = 0
    
    
    
            Try
                raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
                reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
                pageCount = reader.NumberOfPages
                If pageCount < numOfPages Then
                    Throw New ArgumentException("Not enough pages in source pdf to split")
                Else
                    Dim ext As String = IO.Path.GetExtension(baseNameOutPdf)
                    Dim outfile As String = String.Empty
                    Dim n As Integer = CInt(Math.Ceiling(pageCount / numOfPages))
    
    
                    Form1.ProgressBar1.Value = 1
                    Form1.ProgressBar1.Maximum = n + 1
    
    
                    Dim currentPage As Integer = 1
    
                    Form1.jobqty = n
    
                    For i As Integer = 1 To n
    
                        outfile = baseNameOutPdf.Replace(ext, "_Part" & i & ext)
                        doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(currentPage))
                        pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outfile, IO.FileMode.Create))
                        doc.Open()
                        If i < n Then
                            For j As Integer = 1 To numOfPages
                                page = pdfCpy.GetImportedPage(reader, currentPage)
                                pdfCpy.AddPage(page)
                                currentPage += 1
                            Next j
                        Else
                            For j As Integer = currentPage To pageCount
                                page = pdfCpy.GetImportedPage(reader, j)
                                pdfCpy.AddPage(page)
                            Next j
                        End If
    
                        Application.DoEvents()
    
                        Form1.ProgressBar1.Value += 1
    
                        doc.Close()
                    Next
                End If
                reader.Close()
    
    
            Catch ex As Exception
                'Rethrow with a bare Throw so the original stack trace is preserved
                Throw
            End Try
    
    
    
        End Sub

    Code:
    Public Shared Function MergePdfFiles(ByVal pdfFiles() As String, ByVal outputPath As String, _
                                             Optional ByVal authorName As String = "", _
                                             Optional ByVal creatorName As String = "", _
                                             Optional ByVal subject As String = "", _
                                             Optional ByVal title As String = "", _
                                             Optional ByVal keywords As String = "") As Boolean
            Dim result As Boolean = False
            Dim pdfCount As Integer = 0     'total input pdf file count
            Dim f As Integer = 0            'pointer to current input pdf file
            Dim fileName As String = String.Empty   'current input pdf filename
            Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
            Dim pageCount As Integer = 0    'current input pdf page count
            Dim pdfDoc As iTextSharp.text.Document = Nothing    'the output pdf document
            Dim writer As iTextSharp.text.pdf.PdfWriter = Nothing
            Dim cb As iTextSharp.text.pdf.PdfContentByte = Nothing
            'Declare a variable to hold the imported pages
            Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
            Dim rotation As Integer = 0
            'Declare a font to use for the bookmarks
            Dim bookmarkFont As iTextSharp.text.Font = iTextSharp.text.FontFactory.GetFont(iTextSharp.text.FontFactory.HELVETICA, _
                                                                      12, iTextSharp.text.Font.BOLD, iTextSharp.text.BaseColor.BLUE)
    
            Try
                'pdfCount = pdfFiles.Length
                pdfCount = pdfFiles.Length - 1
                If pdfCount > 1 Then
                    'Open the 1st pdf using a PdfReader object
                    fileName = pdfFiles(f)
                    reader = New iTextSharp.text.pdf.PdfReader(fileName)
                    'Get page count
                    pageCount = reader.NumberOfPages
                    'Instantiate a new pdf document and set its margins. This will be the output pdf.
                    'NOTE: bookmarks will be added at the 1st page of every original pdf file using its filename. The
                    'bookmark will be placed at the upper left hand corner of the document. So you'll need to adjust
                    'the margin left and margin top values such that the bookmark won't overlay the merged pdf page. The
                    'unit used is "points" (72 points = 1 inch), thus in this example, the bookmarks' location is at 1/4 inch from
                    'left and 1/4 inch from top of the page.
                    pdfDoc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1), 18, 18, 18, 18)
                    'Instantiate a PdfWriter that listens to the pdf document
                    writer = iTextSharp.text.pdf.PdfWriter.GetInstance(pdfDoc, New System.IO.FileStream(outputPath, IO.FileMode.Create))
                    'Set metadata and open the document
                    With pdfDoc
                        .AddAuthor(authorName)
                        .AddCreationDate()
                        .AddCreator(creatorName)
                        .AddProducer()
                        .AddSubject(subject)
                        .AddTitle(title)
                        .AddKeywords(keywords)
                        .Open()
                    End With
                    'Instantiate a PdfContentByte object
                    cb = writer.DirectContent
                    'Now loop thru the input pdfs
    
                    Form1.ProgressBar4.Value = 1
                    Form1.ProgressBar4.Maximum = pdfCount + 1
    
    
                    While f < pdfCount
    
    
                        'Declare a page counter variable
                        Dim i As Integer = 0
                        'Loop thru the current input pdf's pages starting at page 1
                        While i < pageCount
                            i += 1
                            'Get the input page size
                            pdfDoc.SetPageSize(reader.GetPageSizeWithRotation(i))
                            'Create a new page on the output document
                            pdfDoc.NewPage()
    
                            'If it is the 1st page, we add bookmarks to the page
                            If i = 1 Then
                                'First create a paragraph using the filename as the heading
                                'Dim para As New iTextSharp.text.Paragraph(IO.Path.GetFileName(fileName).ToUpper(), bookmarkFont)
                                'Then create a chapter from the above paragraph
                                'Dim chpter As New iTextSharp.text.Chapter(para, f + 1)
                                'Finally add the chapter to the document
                                'pdfDoc.Add(chpter)
                            End If
                            'Now we get the imported page
                            page = writer.GetImportedPage(reader, i)
                            'Read the imported page's rotation
                            rotation = reader.GetPageRotation(i)
                            'Then add the imported page to the PdfContentByte object as a template based on the page's rotation
                            If rotation = 90 Then
                                cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(i).Height)
                            ElseIf rotation = 270 Then
                                cb.AddTemplate(page, 0, 1.0F, -1.0F, 0, reader.GetPageSizeWithRotation(i).Width + 60, -30)
                            Else
                                cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0)
                            End If
                        End While
                        'Increment f and read the next input pdf file
                        f += 1
                        If f < pdfCount Then
                            fileName = pdfFiles(f)
                            reader = New iTextSharp.text.pdf.PdfReader(fileName)
                            pageCount = reader.NumberOfPages
                        End If
    
                        Form1.ProgressBar4.Value += 1
    
                    End While
                    'When all done, we close the document so that the pdfwriter object can write it to the output file
                    pdfDoc.Close()
                    result = True
                End If
            Catch ex As Exception
                'Rethrow as-is; wrapping in a new Exception would lose the original type and stack trace
                Throw
            End Try
            Return result
        End Function

  5. #5

    Thread Starter
    Addicted Member
    Join Date
    Sep 2018
    Posts
    160

    Re: PDF compression

    Hi

    I just found this link to using PdfSmartCopy - the template query is not relevant to me, but the code seems to work fine!

    I imagine the best thing to do really would be to re-do the merge function above to use PdfSmartCopy instead, but this code runs pretty quickly, so it will do me for now and I'll just run it after the merge. It reduced the fonts down to 70 - still too many, but much improved.
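
    For reference, a merge built on PdfSmartCopy might look roughly like the sketch below. This is not the code from the link mentioned above. It is an illustration that assumes iTextSharp 5.x, and the module and function names (PdfSmartMerge, MergeWithSmartCopy) are hypothetical. PdfSmartCopy keeps a hash of each stream it writes and reuses an existing object when an identical one turns up, which is why identical embedded fonts can collapse to a single copy. Fonts that were subsetted differently are not identical, so some duplicates are likely to remain. The sketch leaves out the metadata, progress bar and bookmark handling of MergePdfFiles.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Public Module PdfSmartMerge
    'Hypothetical helper (not from the thread): concatenates the input files
    'into outputPath, letting PdfSmartCopy share identical streams.
    Public Sub MergeWithSmartCopy(ByVal inputPaths As IEnumerable(Of String), ByVal outputPath As String)
        Dim doc As New Document()
        Using fs As New FileStream(outputPath, FileMode.Create)
            Dim copy As New PdfSmartCopy(doc, fs)
            doc.Open()
            For Each inputPath As String In inputPaths
                Dim reader As New PdfReader(inputPath)
                Try
                    'Pages are copied as pages, not redrawn as templates
                    For pageNum As Integer = 1 To reader.NumberOfPages
                        copy.AddPage(copy.GetImportedPage(reader, pageNum))
                    Next
                    'Flush what was imported from this reader so it can be released
                    copy.FreeReader(reader)
                Finally
                    reader.Close()
                End Try
            Next
            'Closing the document writes the trailer and finishes the file
            doc.Close()
        End Using
    End Sub
End Module
```

    Passing the list of split parts would replace the merge step, while passing a single-element list containing an already merged file gives the "run it after the merge" clean-up pass described above. The trade-off is extra memory and time for the stream comparison, which may matter on files of several thousand pages.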
