How can I count the number of pages in a PDF document?
(Without using the Acrobat SDK, if possible.)
You might be able to do it using some other PDF program like CutePDF or PDF995. But other than using a third-party program or SDK, I don't think it's possible. Maybe try printing it to a file (.tiff) and reading it back in, or something like that?
Give this a try:
Open the file in binary mode and count the number of /Page tags in the file.
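A minimal VB.NET sketch of the /Page-counting idea, assuming the page objects appear as plain text in the file. The module and function names are made up for illustration. It reads the raw bytes as Latin-1 (one Char per byte) and counts "/Type /Page" or "/Type/Page" while skipping "/Type /Pages". Treat the result as an estimate: pages stored inside compressed object streams (PDF 1.5 and later) are invisible to it, and incrementally updated files can still contain old copies of page objects that would be counted twice.

```vb
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module PdfPageObjectCounter
    ' Read the file as raw bytes and decode them as ISO-8859-1 (Latin-1), so each
    ' byte becomes exactly one Char and binary stream data cannot shift positions.
    Public Function ReadPdfAsText(fileName As String) As String
        Dim bytes As Byte() = File.ReadAllBytes(fileName)
        Return Encoding.GetEncoding("iso-8859-1").GetString(bytes)
    End Function

    ' Count page objects: "/Type /Page" or "/Type/Page", but not "/Type /Pages".
    ' The \b after "Page" fails when the next character is "s", which rules out
    ' the Pages (tree node) dictionaries.
    Public Function CountPageObjects(pdfText As String) As Integer
        Return Regex.Matches(pdfText, "/Type\s*/Page\b").Count
    End Function
End Module
```

Usage: `MsgBox("# Pages = " & CountPageObjects(ReadPdfAsText(FileName)))`, where FileName is the path to the PDF.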
Edit: I posted this same reply a few minutes ago, but my page didn't refresh because of a network problem. If my previous message was held for moderation, please delete this one.
I googled for a PDF-to-TIFF converter but couldn't find any free libraries. Can you help?
If we use the SDK, does the user need the full version of Acrobat to run the application, or can they run it with only Acrobat Reader?
Very good answer, iPrank. I was looking at the SDK instead, and also saw that there are COM components for this that don't show you source code. :)
If I create the app using SDK 7.0, can the user run it with a version of Acrobat Reader older than 7.0?
Or you can open the PDF in WordPad and see that the fourth line is "/Pages 2 0 R", which tells you the number of pages in the document, but it seems to be off by 2 as far as I can see. It was an 18-page document in Acrobat, but the tag shows 20? Maybe 1 for a header and 1 for a footer, or something like that?
I think that's something else. I have '25' for a 2-page document here. Damn those Adobers and their proprietary formatting. :D
That's nice.
For a 518-page PDF file opened in WordPad, I saw "/N 518" on the 10th line.
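The /N value is most likely the page count stored in the linearization dictionary that linearized ("fast web view") files carry near the start of the file. Files that are not linearized have no such dictionary, which may explain why other posters found no /N at all. A sketch that reads it, assuming the dictionary lies within the first 1024 bytes as the specification requires for linearized files; the module and function names are hypothetical, and -1 means no linearization dictionary was found.

```vb
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module LinearizedPageCount
    ' Read up to the first 1024 bytes, one Char per byte (Latin-1).
    Public Function ReadHeader(fileName As String) As String
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            Dim buffer(1023) As Byte
            Dim n As Integer = fs.Read(buffer, 0, buffer.Length)
            Return Encoding.GetEncoding("iso-8859-1").GetString(buffer, 0, n)
        End Using
    End Function

    ' Returns the /N entry of the linearization dictionary, or -1 if absent.
    ' [^>]*? keeps the search inside the dictionary that starts at /Linearized.
    Public Function GetLinearizedPageCount(headerText As String) As Integer
        Dim m As Match = Regex.Match(headerText, "/Linearized[^>]*?/N\s+(\d+)")
        If m.Success Then Return Integer.Parse(m.Groups(1).Value)
        Return -1
    End Function
End Module
```

Usage: `GetLinearizedPageCount(ReadHeader(FileName))`, falling back to another method when it returns -1.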
Are we on different versions of Acrobat? I don't have a /Pages 2 or an /N 2 in my PDF file when viewed in WordPad, or Notepad for that matter.
I tried other PDFs and got weird results. A PDF with 5 pages shows /Pages 509. Another doesn't show any at all. Another shows it on a different line, etc.
I think it would depend on the version of Acrobat the document was created with, not the viewer?
Another 29-page PDF had no /N, but it does have this:
<<
/Type /Pages
/Kids [ 5 0 R etc...]
/Count 29
>>
I think this has to do with the version in which the PDFs were created.
%PDF-1.3
^^ might be something.
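If the version matters, it can be read from the header line. A small sketch, assuming the header sits at the very start of the text (some files have a few leading junk bytes, which this does not handle); the names are hypothetical. Note that from PDF 1.4 on, the document catalog may also carry a /Version entry that overrides the header.

```vb
Imports System.Text.RegularExpressions

Module PdfVersion
    ' Returns the version from a header such as "%PDF-1.3" ("1.3"),
    ' or Nothing when the text does not start with a PDF header.
    Public Function GetHeaderVersion(headerText As String) As String
        Dim m As Match = Regex.Match(headerText, "^%PDF-(\d+\.\d+)")
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function
End Module
```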
I guess /Pages and /Page are completely different things. What does the PDF specification say?
I have tried documents with different numbers of pages, and every time the count of /Page tags matches the total number of pages.
Search for /N or /Count, depending on the version.
That's not reliable logic, so let's go back to the SDK.
I have tested with PDF v1.2, v1.3, v1.4, v1.5.
In all versions, the count of /Page tags is the same as the total number of pages.
Yep! That would be the best choice. :D
Quote:
Originally Posted by jain_mj
From the PDF documentation:
6.3 Pages tree
The pages of a document are accessible through a tree of nodes known as the Pages tree. This tree defines the ordering of the pages in the document.
To optimize the performance of viewer applications, the Acrobat Distiller program and Acrobat PDF Writer construct balanced trees. (For further information on balanced trees, see reference [15] in the Bibliography on page 506.) The tree structure allows applications to quickly open a document containing thousands of pages using only limited memory. Applications should accept any sort of tree structure as long as the nodes of the tree contain the keys described in Table 6.4. The simplest structure consists of a single Pages node that references all the page objects directly.
Note: The structure of the Pages tree for a document is unrelated to the content of the document. In a PDF file for a book, for example, there is no guarantee that a chapter is represented by a single node in the Pages tree. Applications that consume or produce PDF files are not required to preserve the existing structure of the Pages tree.
The root and all interior nodes of the Pages tree are dictionaries, whose minimum contents are shown in Table 6.4.
[Adobe Systems Inc., Chapter 6: Document Structure, March 11, 1999]
Table 6.4 Pages attributes
Key     Type        Semantics
Type    name        (Required) Object type. Always Pages.
Kids    array       (Required) List of indirect references to the immediate children of this Pages node.
Count   integer     (Required) Specifies the number of leaf nodes (imageable pages) under this node. The leaf nodes do not have to be immediately below this node in the tree, but can be several levels deeper in the tree.
Parent  dictionary  (Required; must be indirect reference) Pages object that is the immediate ancestor of this Pages object. The root Pages object has no Parent.
The following illustrates the Pages object for a document with three pages.
Appendix A contains an example showing the Pages tree for a document containing 62 pages.
Example 6.3 Pages tree for a document containing three pages
2 0 obj
<<
/Type /Pages
/Kids [4 0 R 10 0 R 24 0 R]
/Count 3
>>
endobj
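The /Count rule in Table 6.4 counts leaves anywhere below a node, not just its immediate Kids. A toy in-memory model (a hypothetical class, not a PDF parser) makes the recursion explicit: a /Page leaf counts as 1 and a /Pages node sums its children, so the root's /Count is the total number of pages.

```vb
Imports System.Collections.Generic

Module PagesTreeModel
    ' Hypothetical stand-in for a Pages tree node: IsPage = True marks a /Page
    ' leaf; otherwise the node is a /Pages node whose children are in Kids.
    Public Class PageNode
        Public IsPage As Boolean
        Public Kids As New List(Of PageNode)
    End Class

    ' The value /Count should hold for a node: every leaf page beneath it,
    ' however many levels down.
    Public Function CountLeaves(node As PageNode) As Integer
        If node.IsPage Then Return 1
        Dim total As Integer = 0
        For Each kid As PageNode In node.Kids
            total += CountLeaves(kid)
        Next
        Return total
    End Function
End Module
```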
It seems that for version 1.3, this is the identifying format...
%PDF-1.3
/Count 18
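Reading /Count out of Pages dictionaries like the ones above can be sketched with two regular expressions. Since the root node covers every leaf, the largest /Count among the /Type /Pages dictionaries should be the page count. Only Pages dictionaries are searched, so /Count entries in other dictionaries (outlines, for example) are ignored. This assumes the Pages dictionaries are uncompressed and contain no nested << >> (an inherited /Resources dictionary written inline would break the match); the names are hypothetical.

```vb
Imports System.Text.RegularExpressions

Module PagesCountReader
    ' A dictionary containing "/Type /Pages" (or "/Type/Pages") with no nested << >>.
    Private ReadOnly PagesDict As New Regex("<<[^<>]*/Type\s*/Pages\b[^<>]*>>")
    Private ReadOnly CountKey As New Regex("/Count\s+(\d+)")

    ' Largest /Count over all Pages dictionaries; 0 if none is found.
    Public Function MaxPagesCount(pdfText As String) As Integer
        Dim best As Integer = 0
        For Each dict As Match In PagesDict.Matches(pdfText)
            Dim c As Match = CountKey.Match(dict.Value)
            If c.Success Then best = Math.Max(best, Integer.Parse(c.Groups(1).Value))
        Next
        Return best
    End Function
End Module
```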
That makes sense, as it looks like "/Page" just marks the start of a new page, and the /Parent tag identifies the owning /Pages node, as in a tree view.
%PDF-1.3
1 0 obj
<< /Type /Catalog
/Pages 2 0 R
'...
<< /Type /Page
/Parent 2 0 R
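The catalog excerpt suggests a more direct route than scanning: follow the trailer's /Root to the /Catalog, read its /Pages reference (2 0 R above), and take /Count from that object, which is the root of the Pages tree. A sketch assuming these objects are stored uncompressed and that the last definition of an object in the file is the current one (later definitions usually come from incremental updates); the names are hypothetical, and -1 means some step failed.

```vb
Imports System.Text.RegularExpressions

Module RootPagesCount
    ' Body of the last "num gen obj ... endobj" in the text, or Nothing.
    ' (?<!\d) stops "2 0 obj" from matching inside "12 0 obj".
    Private Function ObjectBody(pdfText As String, num As String, gen As String) As String
        Dim pattern As String = "(?<!\d)" & num & "\s+" & gen & "\s+obj\b(.*?)endobj"
        Dim found As MatchCollection = Regex.Matches(pdfText, pattern, RegexOptions.Singleline)
        If found.Count = 0 Then Return Nothing
        Return found(found.Count - 1).Groups(1).Value
    End Function

    ' Trailer /Root -> Catalog /Pages reference -> root Pages node /Count.
    Public Function GetRootPageCount(pdfText As String) As Integer
        Dim roots As MatchCollection = Regex.Matches(pdfText, "/Root\s+(\d+)\s+(\d+)\s+R")
        If roots.Count = 0 Then Return -1
        Dim root As Match = roots(roots.Count - 1)
        Dim catalog As String = ObjectBody(pdfText, root.Groups(1).Value, root.Groups(2).Value)
        If catalog Is Nothing Then Return -1
        Dim pagesRef As Match = Regex.Match(catalog, "/Pages\s+(\d+)\s+(\d+)\s+R")
        If Not pagesRef.Success Then Return -1
        Dim pagesNode As String = ObjectBody(pdfText, pagesRef.Groups(1).Value, pagesRef.Groups(2).Value)
        If pagesNode Is Nothing Then Return -1
        Dim count As Match = Regex.Match(pagesNode, "/Count\s+(\d+)")
        If Not count.Success Then Return -1
        Return Integer.Parse(count.Groups(1).Value)
    End Function
End Module
```

Unlike taking the largest /Count, this reads exactly one number: the root node's.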
If the SDK is not easy to use, I will use this method.
Quote:
I have tested with PDF v1.2, v1.3, v1.4, v1.5.
In all versions, the count of /Page tags is the same as the total number of pages.
Since you want to avoid the SDK, you might as well do this. It'll also save you the hassle of having to include the SDK class libraries when you deploy your application.
I will test "/Page" counting soon. For the time being, marking this [RESOLVED].
The code is ready. Suggestions for improving it are welcome.
VB Code:
Dim SR As New StreamReader(FileName)
Dim PDFData As String = SR.ReadToEnd
Dim PageCount As Int16 = 0

''----------------------------------------------------------------------
''Slow but Simple method : Good for small files
''
'Dim c As Int16
'While PDFData.IndexOf("/Type /Page") <> -1
'    'Should be "/Page" and not "/Pages". So check for "s"
'    If PDFData.Substring(PDFData.IndexOf("/Type /Page") + 11, 1) <> "s" Then
'        c += 1
'    End If
'    PDFData = PDFData.Substring(PDFData.IndexOf("/Type /Page") + 11)
'End While
'PageCount = c
''----------------------------------------------------------------------

''----------------------------------------------------------------------
''Faster but a bit lengthy : Good for large files
''
''Temp Variables
Dim TypePagesIndex As Integer
Dim StartIndex As Integer 'Starting index of the Pages Object
Dim EndIndex As Integer 'Ending index of the Pages Object
Dim CountIndex As Int16 'Starting index of "/Count"
Dim chars() As Char = {"/", ">"}
Dim tmp As String
Dim CountEndIndex As Int16 'Index of next "/" after "/Count"

While PDFData.IndexOf("/Type /Pages") <> -1
    'Get an Object of type 'Pages' from PDF file
    TypePagesIndex = PDFData.IndexOf("/Type /Pages")
    tmp = PDFData.Substring(0, TypePagesIndex)
    StartIndex = tmp.LastIndexOf("<<")
    tmp = PDFData.Substring(TypePagesIndex)
    EndIndex = TypePagesIndex + tmp.IndexOf(">>") + 1
    tmp = PDFData.Substring(StartIndex, EndIndex - StartIndex + 1)
    'Now tmp="<< /Kids, /Count etc >>"
    'the pagecount is just after "/Count " in tmp
    CountIndex = tmp.IndexOf("/Count")
    CountIndex += 7 'Move index to the end of "/Count "
    tmp = tmp.Substring(CountIndex)
    'now tmp="Pagecount ....>>"
    'Pagecount is followed by a newline like char and then "/" or ">>"
    CountEndIndex = tmp.IndexOfAny(chars) - 1
    tmp = tmp.Substring(0, CountEndIndex)
    'Get the PageCount
    If PageCount < Val(tmp) Then PageCount = Val(tmp)
    PDFData = PDFData.Substring(EndIndex + 1)
End While
''----------------------------------------------------------------------
MsgBox("# Pages = " & PageCount)
For the complete PDF specification:
http://www.wotsit.org/download.asp?f=pdfspec
Code edit
VB Code:
'Needs Imports System.IO
Dim PDFData As String
Using SR As New StreamReader(FileName) 'Using closes the file when done
    PDFData = SR.ReadToEnd()
End Using
Dim PageCount As Integer = 0 'This version used PageCount without declaring it

''----------------------------------------------------------------------
''Slow but Simple method : Good for small files
''
'Dim c As Int16
'While PDFData.IndexOf("/Type /Page") <> -1
'    'Should be "/Page" and not "/Pages". So check for "s"
'    If PDFData.Substring(PDFData.IndexOf("/Type /Page") + 11, 1) <> "s" Then
'        c += 1
'    End If
'    PDFData = PDFData.Substring(PDFData.IndexOf("/Type /Page") + 11)
'End While
'PageCount = c
''----------------------------------------------------------------------

''----------------------------------------------------------------------
''Faster but a bit lengthy : Good for large files
''
''Temp Variables
Dim TypePagesIndex As Integer
Dim StartIndex As Integer 'Starting index of the Pages Object
Dim EndIndex As Integer 'Ending index of the Pages Object
Dim CountIndex As Integer 'Starting index of "/Count"
Dim chars() As Char = {"/"c, ">"c}
Dim tmp As String
Dim tmpIndex1, tmpIndex2 As Integer
Dim CountEndIndex As Integer 'Index of next "/" or ">" after "/Count"
Do
    'Get an Object of type 'Pages' from PDF file
    'It can be "/Type /Pages" or "/Type/Pages"
    tmpIndex1 = PDFData.IndexOf("/Type /Pages")
    tmpIndex2 = PDFData.IndexOf("/Type/Pages")
    'Different possibilities of 2 indices: take whichever occurs first
    If tmpIndex1 > -1 And tmpIndex1 < tmpIndex2 Then
        TypePagesIndex = tmpIndex1
    ElseIf tmpIndex2 > -1 And tmpIndex2 < tmpIndex1 Then
        TypePagesIndex = tmpIndex2
    ElseIf tmpIndex1 = -1 And tmpIndex2 > -1 Then
        TypePagesIndex = tmpIndex2
    ElseIf tmpIndex2 = -1 And tmpIndex1 > -1 Then
        TypePagesIndex = tmpIndex1
    Else 'tmpIndex1 = -1 And tmpIndex2 = -1
        Exit Do
    End If
    tmp = PDFData.Substring(0, TypePagesIndex)
    StartIndex = tmp.LastIndexOf("<<")
    tmp = PDFData.Substring(TypePagesIndex)
    EndIndex = TypePagesIndex + tmp.IndexOf(">>") + 1
    tmp = PDFData.Substring(StartIndex, EndIndex - StartIndex + 1)
    'Now tmp="<< /Kids, /Count etc >>"
    'the pagecount is just after "/Count " in tmp
    CountIndex = tmp.IndexOf("/Count")
    If CountIndex > -1 Then 'Skip a dictionary where "/Count" was not found
        CountIndex += 7 'Move index to the end of "/Count "
        tmp = tmp.Substring(CountIndex)
        'now tmp="Pagecount ....>>"
        'Pagecount is followed by a newline like char and then "/" or ">>"
        CountEndIndex = tmp.IndexOfAny(chars)
        tmp = tmp.Substring(0, CountEndIndex)
        'Keep the largest /Count: the root Pages node covers every page
        If PageCount < Val(tmp) Then PageCount = CInt(Val(tmp))
    End If
    PDFData = PDFData.Substring(EndIndex + 1)
Loop
''----------------------------------------------------------------------
MsgBox("# Pages = " & PageCount)
This PDF page counter saved me so much time. I needed to know the page count for a project, and it worked perfectly. https://pdfwordcounter.io/pdf-page-counter/
Your answer is about 19 years late. lol
If you have some code you would like to share there is a Code Bank Forum on this site. You should post it there.