[RESOLVED] I've got new problem can someone please shed some light on this for me
I'm downloading files from a web site. They're PDF files, and for some reason the web site is appending the HTML page source code to the end of each PDF file.
So the PDF files will not open. I can open a PDF file in UltraEdit, search it for "EOF.....<!DOCTYPE html PUBLIC", cut everything after the EOF and add 0A to the end, and the PDF file is fixed.
My problem is that all of the PDF files on this web site are doing this.
How can I code something that starts at the end of the PDF file and scans backwards, finds the offending code's address, saves from the beginning of the file to that address, and adds 0A to the end of it?
Or whichever way would be better, beginning to end or end to beginning.
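A minimal sketch of the backwards scan, assuming the real end of the PDF is the last %%EOF marker in the file and that the appended HTML never contains that marker itself. Scanning from the end is the better direction here: a PDF that has been updated incrementally can contain several %%EOF markers, and only the last one ends the document. The module and function names (PdfTrimmer, FindLastEof, TrimAfterEof, FixFile) are hypothetical, made up for this example.

```vbnet
Imports System
Imports System.IO
Imports System.Text

' Sketch: trims anything appended after the last %%EOF marker of a PDF.
' Assumption: the appended junk (the HTML page) does not itself contain "%%EOF".
Public Module PdfTrimmer

    ' The ASCII marker "%%EOF" (25 25 45 4F 46 in hex)
    Private ReadOnly EofMarker As Byte() = Encoding.ASCII.GetBytes("%%EOF")

    ' Returns the index of the last occurrence of EofMarker, or -1 if absent.
    Public Function FindLastEof(data As Byte()) As Integer
        ' Walk backwards so the first match is the marker nearest the end
        For start As Integer = data.Length - EofMarker.Length To 0 Step -1
            Dim matched As Boolean = True
            For j As Integer = 0 To EofMarker.Length - 1
                If data(start + j) <> EofMarker(j) Then
                    matched = False
                    Exit For
                End If
            Next
            If matched Then Return start
        Next
        Return -1
    End Function

    ' Keeps everything up to and including "%%EOF", then appends one 0x0A.
    ' Returns Nothing when no marker is found so the caller can skip the file.
    Public Function TrimAfterEof(data As Byte()) As Byte()
        Dim pos As Integer = FindLastEof(data)
        If pos < 0 Then Return Nothing
        Dim keep As Integer = pos + EofMarker.Length
        Dim result(keep) As Byte              ' upper bound keep => keep + 1 elements
        Array.Copy(data, result, keep)
        result(keep) = CByte(&HA)             ' the trailing line feed (0A)
        Return result
    End Function

    ' Rewrites the file in place; returns False (file untouched) if no marker.
    Public Function FixFile(path As String) As Boolean
        Dim fixedBytes As Byte() = TrimAfterEof(File.ReadAllBytes(path))
        If fixedBytes Is Nothing Then Return False
        File.WriteAllBytes(path, fixedBytes)
        Return True
    End Function

End Module
```

To handle every file from the site, FixFile could be called for each path returned by Directory.GetFiles(folder, "*.pdf"), where folder is your download directory.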
Re: I've got new problem can someone please shed some light on this for me
Ultimately it depends on how you're downloading it, too. That would be the first thing I'd look at: make sure that when you're downloading it, you're getting the PDF and just the PDF. It seems a bit odd that you're getting extra stuff.
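One way to act on that advice is to check the downloaded bytes before saving them. This is a hypothetical helper (the PdfCheck module and LooksLikeCleanPdf name are made up for illustration); it assumes a clean file begins with %PDF- and that its last non-whitespace bytes are the %%EOF marker, as the PDF format expects.

```vbnet
Imports System
Imports System.Text

' Hypothetical sanity check on downloaded bytes.
' Assumption: a clean PDF starts with "%PDF-" and ends with "%%EOF",
' optionally followed only by whitespace.
Public Module PdfCheck

    ' True if marker appears in data starting exactly at offset.
    Private Function MatchesAt(data As Byte(), marker As Byte(), offset As Integer) As Boolean
        If offset < 0 OrElse offset + marker.Length > data.Length Then Return False
        For i As Integer = 0 To marker.Length - 1
            If data(offset + i) <> marker(i) Then Return False
        Next
        Return True
    End Function

    Public Function LooksLikeCleanPdf(data As Byte()) As Boolean
        Dim header As Byte() = Encoding.ASCII.GetBytes("%PDF-")
        Dim eofMarker As Byte() = Encoding.ASCII.GetBytes("%%EOF")
        If Not MatchesAt(data, header, 0) Then Return False
        ' Skip trailing whitespace: CR, LF, space, tab, NUL
        Dim last As Integer = data.Length - 1
        While last >= 0 AndAlso (data(last) = 13 OrElse data(last) = 10 OrElse data(last) = 32 OrElse data(last) = 9 OrElse data(last) = 0)
            last -= 1
        End While
        ' The marker must end exactly at the last non-whitespace byte
        Return MatchesAt(data, eofMarker, last - eofMarker.Length + 1)
    End Function

End Module
```

If LooksLikeCleanPdf(File.ReadAllBytes(path)) returns False for a freshly downloaded file, something besides the PDF came down with it.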
Re: I've got new problem can someone please shed some light on this for me
Hi, no. The way it's downloading is apparently a bug with the web site. The file extensions are correct, the XML files use the same type of link and those files are fine, as are the ZIP files, but the PDFs for some reason are getting the HTML code attached to the end. I know it's odd; it's the first time I've ever run across this type of issue.
As for making the HTML a string, I don't think that will work. It would mean reading the whole PDF as a string, and PDFs, like any file that isn't plain text, contain non-printable bytes, so reading it as a string won't work.
Re: I've got new problem can someone please shed some light on this for me
This may work: open the PDF as a text file and chop off what you don't need. Convert the remaining string to bytes and then write it back to a file with a .pdf extension using a BinaryWriter. If you provide a sample PDF file, I'll see what I can do.
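A sketch of the text-file idea, under one stated assumption: the string route only works if the encoding maps every byte to exactly one character and back. ISO-8859-1 (Latin-1) does that for all 256 byte values, so the non-printable bytes mentioned above survive the round trip, whereas an ASCII or UTF-8 reader would replace some of them. The module and function names (TextRoundTrip, ChopAfterEof) are made up for illustration, and the cut point (the last %%EOF plus a single 0A) follows the manual UltraEdit fix from the first post.

```vbnet
Imports System
Imports System.Text

' Sketch: "open the PDF as text, chop, convert back to bytes".
' Latin-1 (ISO-8859-1) maps byte N to Char N for all 256 values,
' so string indices equal byte offsets and no data is lost.
Public Module TextRoundTrip

    Private ReadOnly Latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
    Private Const Marker As String = "%%EOF"

    ' Returns the bytes up to and including the last "%%EOF" plus one
    ' line feed (0A), or Nothing if the marker is not present.
    Public Function ChopAfterEof(data As Byte()) As Byte()
        Dim text As String = Latin1.GetString(data)
        Dim pos As Integer = text.LastIndexOf(Marker, StringComparison.Ordinal)
        If pos < 0 Then Return Nothing
        Dim kept As String = text.Substring(0, pos + Marker.Length) & Convert.ToChar(10)
        Return Latin1.GetBytes(kept)
    End Function

End Module
```

The result can then be written out with File.WriteAllBytes, or with a BinaryWriter as suggested above.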
Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it. - Abraham Lincoln -
Re: I've got new problem can someone please shed some light on this for me
The attached file gives a good idea of what's going on.
Well, not really, as my PDF reader simply reads this as a PDF file without any apparent difficulty or any extraneous matter displayed. As the guys said, without some information on the method of download (weirdly, we like code better than vague descriptions!) and, if possible, the site in question, there's really not a whole lot we can do!
As the 6-dimensional mathematics professor said to the brain surgeon, "It ain't Rocket Science!"
Reviews: "dunfiddlin likes his DataTables" - jmcilhinney
Please be aware that whilst I will read private messages (one day!) I am unlikely to reply to anything that does not contain offers of cash, fame or marriage!
Re: I've got new problem can someone please shed some light on this for me
If you open that file in a hex editor, you will see quite clearly that from the end of the file upwards there is nothing but HTML code.
Not the whole file, mind you, but at some point you find the beginning of the HTML, and just before that you will find the actual end of the PDF: the EOF marker (written as %%EOF in the file; "EOF" is 454F46 in hex). Removing everything after that EOF and making the last byte 0A fixes the file. I'm not sure how you got it to open in Acrobat, as I've tried Reader and the full version of 7. Maybe a newer Acrobat, like 11, will open it, but I'm not installing something I don't need just to open a file that, once trimmed to the correct length, opens just fine in the version I have installed.
Re: I've got new problem can someone please shed some light on this for me
I use Foxit Reader (free, and free from Adobe bloat), but the file also displays perfectly well in Universal Viewer and Internet Explorer.
Re: I've got new problem can someone please shed some light on this for me
If you open the sample PDF file you uploaded in Notepad, you'll see that it uses a cross-reference (xref) table. The xref points to the second half of the file, which is the HTML source code of a web page. To get rid of the HTML, you can open the file in a PDF reader and then save it. The act of opening and saving seems to rebuild those cross-references and drops the HTML. Based on these findings, I've come up with a solution for you: use iTextSharp to open the original file and then use PdfCopy to save a copy of it, which will be in proper PDF format. After that, you can delete the original and rename the newly created file to the old name (optional).
Here is the code for making a copy of the PDF file using iTextSharp:
Code:
Public Shared Sub FixPdf(ByVal sourcePdf As String)
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Dim doc As iTextSharp.text.Document = Nothing
    Dim pdfCpy As iTextSharp.text.pdf.PdfCopy = Nothing
    Dim page As iTextSharp.text.pdf.PdfImportedPage = Nothing
    Dim pageCount As Integer = 0
    Dim ext As String = IO.Path.GetExtension(sourcePdf)
    Dim fileName As String = IO.Path.GetFileNameWithoutExtension(sourcePdf)
    ' The repaired copy is written next to the original as <name>_fixed<ext>
    Dim outfile As String = IO.Path.Combine(IO.Path.GetDirectoryName(sourcePdf), String.Format("{0}_fixed{1}", fileName, ext))
    Try
        reader = New iTextSharp.text.pdf.PdfReader(sourcePdf)
        pageCount = reader.NumberOfPages
        doc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1))
        ' PdfCopy closes this FileStream when the document is closed
        pdfCpy = New iTextSharp.text.pdf.PdfCopy(doc, New IO.FileStream(outfile, IO.FileMode.Create))
        doc.Open()
        ' Import every page; the copy gets a freshly written cross-reference table
        For i As Integer = 1 To pageCount
            page = pdfCpy.GetImportedPage(reader, i)
            pdfCpy.AddPage(page)
        Next
    Finally
        ' No Catch: exceptions propagate with their stack trace intact
        ' ("Throw ex" would reset it). Always release the file handles.
        If doc IsNot Nothing AndAlso doc.IsOpen() Then doc.Close()
        If reader IsNot Nothing Then reader.Close()
    End Try
    'Delete the original and rename the new pdf. This is optional, of course...
    'IO.File.Delete(sourcePdf)
    'IO.File.Move(outfile, sourcePdf)
End Sub