[RESOLVED] Extracting and merging pages with iText
Dear All,
iText version: 5.3.4
I am using the PdfManipulation2.vb supplied by Stanav.
Code:
''' <summary>
''' Extract selected pages from a source pdf to a new pdf
''' </summary>
''' <param name="sourcePdf">the full path to the source pdf</param>
''' <param name="pageNumbersToExtract">the page numbers to extract (i.e {1, 3, 5, 6})</param>
''' <param name="outPdf">The full path for the output pdf</param>
''' <remarks>The output pdf will contain the extracted pages in the order of the page numbers listed
''' in the pageNumbersToExtract parameter.</remarks>
Public Overloads Shared Sub ExtractPdfPage(ByVal sourcePdf As String, ByVal pageNumbersToExtract As Integer(), ByVal outPdf As String)
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Dim doc As iTextSharp.text.Document = Nothing
    Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
    Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
    Try
        reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
        'Size the new document after the first page of the source
        doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
        pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create))
        doc.Open()
        'Copy the requested pages in the order given
        For Each pageNum As Integer In pageNumbersToExtract
            page = pdfCpy.GetImportedPage(reader, pageNum)
            pdfCpy.AddPage(page)
        Next
        doc.Close()
        reader.Close()
    Catch ex As Exception
        'Rethrow with a bare Throw so the original stack trace is preserved
        Throw
    End Try
End Sub
My problem is when I try and use it I get "Expected expression" at the page numbers {1,3,5,6}. :(
Code:
PdfManipulation2.ExtractPdfPage("C:\Test.pdf",{1,3,5,6}, "C:\TestExtractMerge.pdf")
Is it a syntax problem? Any ideas would be great!
Thank you in advance!
Glen
Re: Extracting and merging pages with iText
I figured it out myself:
Code:
Private Sub Button5_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button5.Click
    Dim fname As String
    fname = "C:\Test.pdf"
    'create table to hold reports
    Dim dtMerge As New DataTable
    dtMerge.Columns.Add("column 0", GetType(String))
    dtMerge.Columns.Add("column 1", GetType(String))
    'add report to table, using 0 to merge all pages
    dtMerge.Rows.Add(New String() {fname, TextBox2.Text})
    'e.g. TextBox2.Text = "1,3,5,6"
    'call pdf merge
    PdfManipulation.ExtractAndMergePdfPages(dtMerge, "C:\TestExtractMerge.pdf")
End Sub
Using the PdfManipulation.vb supplied by Stanav.
Code:
''' <summary>
''' Extract pages from multiple pdf files and merge them into
''' a single pdf
''' </summary>
''' <param name="sourceTable">the datatable containing the source pdf paths and the pages to extract
''' from each of them. This datatable should have 2 datacolumns of type String. The 1st column (column 0)
''' is for the file (full) path while the 2nd column (column 1) is for the list of pages to extract from
''' the source pdf in column 0. This list is a string of integer values separated by commas
''' (ex: "1, 3, 2, 5 , 8, 7, 9") </param>
''' <param name="outPdf">the path to save the output pdf</param>
''' <remarks>the pdf pages are extracted and merged in the order listed in the source datatable.
''' That is, source pdf files are merged from the top row down, and pages are merged
''' in the order listed in the csv string</remarks>
Public Shared Sub ExtractAndMergePdfPages(ByVal sourceTable As DataTable, ByVal outPdf As String)
    Dim rowCount As Integer = sourceTable.Rows.Count
    Dim sourcePdf As String = String.Empty
    Dim pageNumbersToExtract() As Integer = Nothing
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Dim doc As iTextSharp.text.Document = Nothing
    Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
    Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
    Select Case rowCount
        Case 0 'Nothing to extract and merge
            Exit Sub
        Case 1 'only 1 source pdf
            sourcePdf = CStr(sourceTable.Rows(0).Item(0))
            pageNumbersToExtract = ConvertToIntegerArray(CStr(sourceTable.Rows(0).Item(1)))
            ExtractPdfPage(sourcePdf, pageNumbersToExtract, outPdf)
        Case Else 'multiple source pdf's
            Try
                'The first row creates the output document
                sourcePdf = CStr(sourceTable.Rows(0).Item(0))
                pageNumbersToExtract = ConvertToIntegerArray(CStr(sourceTable.Rows(0).Item(1)))
                reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
                doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
                pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outPdf, IO.FileMode.Create))
                doc.Open()
                For Each pageNum As Integer In pageNumbersToExtract
                    page = pdfCpy.GetImportedPage(reader, pageNum)
                    pdfCpy.AddPage(page)
                Next
                reader.Close()
                'The remaining rows are appended to the same output document
                For i As Integer = 1 To rowCount - 1
                    sourcePdf = CStr(sourceTable.Rows(i).Item(0))
                    pageNumbersToExtract = ConvertToIntegerArray(CStr(sourceTable.Rows(i).Item(1)))
                    reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
                    doc.SetPageSize(reader.GetPageSizeWithRotation(1))
                    For Each pageNum As Integer In pageNumbersToExtract
                        page = pdfCpy.GetImportedPage(reader, pageNum)
                        pdfCpy.AddPage(page)
                    Next
                    reader.Close()
                Next
                doc.Close()
            Catch ex As Exception
                'Rethrow with a bare Throw so the original stack trace is preserved
                Throw
            End Try
    End Select
End Sub
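The ConvertToIntegerArray helper that ExtractAndMergePdfPages calls is not shown in this thread. For anyone following along without the full PdfManipulation.vb, here is a minimal sketch of what such a helper might look like. The module name PageListParser and the body are my own assumptions, not Stanav's actual code; it only handles the comma-separated format described in the doc comment above, and it does not implement any "0 means all pages" convention.

```vb
Imports System
Imports System.Collections.Generic

Public Module PageListParser
    'Hypothetical stand-in for the ConvertToIntegerArray helper (not the original code).
    'Turns a string such as "1, 3, 2, 5 , 8, 7, 9" into an Integer array,
    'keeping the order in which the numbers were listed.
    Public Function ConvertToIntegerArray(ByVal csv As String) As Integer()
        Dim result As New List(Of Integer)
        If String.IsNullOrEmpty(csv) Then Return result.ToArray()

        For Each token As String In csv.Split(","c)
            Dim trimmed As String = token.Trim()
            'Skip empty entries such as the gap in "1,,3"
            If trimmed.Length = 0 Then Continue For

            Dim pageNum As Integer
            If Not Integer.TryParse(trimmed, pageNum) Then
                Throw New FormatException("Invalid page number: '" & trimmed & "'")
            End If
            result.Add(pageNum)
        Next

        Return result.ToArray()
    End Function
End Module
```

With a helper like this, the text from TextBox2 ("1,3,5,6") ends up as the same Integer() that ExtractPdfPage expects. Failing fast on a bad entry is a design choice in this sketch; the real helper may behave differently.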
Re: Extracting and merging pages with iText
Quote:
Originally Posted by Glenvn
Code:
PdfManipulation2.ExtractPdfPage("C:\Test.pdf",{1,3,5,6}, "C:\TestExtractMerge.pdf")
Is it a syntax problem? Any ideas would be great!
Thank you in advance!
Glen
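You're correct, it's a syntax error. You need to declare the array explicitly: a bare {1,3,5,6} is not accepted as an expression by VB compilers older than VB 2010, which is why you get the error at that argument. That line should be like this: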
Code:
Dim myPages() As Integer = New Integer() {1, 3, 5, 6}
PdfManipulation2.ExtractPdfPage("C:\Test.pdf", myPages, "C:\TestExtractMerge.pdf")
If you prefer to cram everything into one line, then:
Code:
PdfManipulation2.ExtractPdfPage("C:\Test.pdf", New Integer() {1,3,5,6}, "C:\TestExtractMerge.pdf")