Thread: Recommendation for "printing" PDF to remove all metadata?

  1. #1

    Thread Starter
    Fanatic Member
    Join Date
    Jul 2017
    Posts
    760

    Recommendation for "printing" PDF to remove all metadata?

    I find myself needing to remove all metadata from PDFs before passing them on, for example when I forward a PDF that I received from person A (a co-worker) to person B (a customer).
    I really don't like how much metadata is in PDFs and how difficult it is to remove, so I developed the habit of printing each PDF and then scanning it to get rid of all metadata.

    Can anybody recommend a simpler way to do that than printing them out physically?
    Losing the ability to extract text from it is not a problem; I only need the visual representation of the PDF contents.

    Also, I don't want to use any 3rd party tools anymore except RC6. In fact, Olaf's work is one of the few that I trust without seeing the source code.

    Thank you for any recommendation.

  2. #2
    Hyperactive Member -Franky-'s Avatar
    Join Date
    Dec 2022
    Location
    Bremen Germany
    Posts
    475

    Re: Recommendation for "printing" PDF to remove all metadata?

    You could try deleting the metadata using the IPropertyStore interface. IPropertyStore::SetValue with VT_EMPTY deletes a value from the property store.

  3. #3
    PowerPoster
    Join Date
    Jul 2010
    Location
    NYC
    Posts
    7,653

    Re: Recommendation for "printing" PDF to remove all metadata?

    Quote: Originally Posted by -Franky-
    You could try deleting the metadata using the IPropertyStore interface. IPropertyStore::SetValue with VT_EMPTY deletes a value from the property store.
    Wouldn't that require installing a 3rd party property handler shell extension? AFAIK there's nothing built in for PDF, at least through Win10. See if you can do it in Explorer. If you can, then you have a decent chance of being able to do it through IPropertyStore. It's guaranteed if you use twinBASIC: not all property handlers install 32-bit versions for VB6 these days, and if there's only a 64-bit one that won't load in an out-of-process server, you'd need a 64-bit exe.

    Using only RC6+native VB is going to be a big problem for any kind of PDF handling beyond basic 'Print to PDF'. If you can't trust Microsoft Office or Google's open source pdfium, it's going to mean manually writing a pdf parser yourself in all likelihood.
    Last edited by fafalone; Jan 2nd, 2025 at 10:39 AM.

  4. #4

    Thread Starter
    Fanatic Member
    Join Date
    Jul 2017
    Posts
    760

    Re: Recommendation for "printing" PDF to remove all metadata?

    I am under the impression (please correct me if I am wrong) that PDF is going to stay a standard.
    Lawyers, tax counsellors, etc. all use PDF.
    In fact, I was hoping that somebody in the community would ultimately create a PDF library in TB or in VB6 so that we don't have to rely on third party tools anymore.
    But obviously this is not the case yet.

  5. #5
    PowerPoster
    Join Date
    Jul 2010
    Location
    NYC
    Posts
    7,653

    Re: Recommendation for "printing" PDF to remove all metadata?

    Google's pdfium is open source and easily used from VB6, VBA, and twinBASIC; so is xpdf. Writing a PDF library is a massive undertaking, so I don't know why anyone wouldn't use well-established open source tools, or why anyone would devote 6+ months of full-time work for free to duplicate those efforts, winding up with a codebase so complex that 99% of VB6 users couldn't modify it anyway. tB supports static linking, so you could probably eliminate the .dll if you wanted. But at least it's a flat DLL, so there's not even ActiveX registration hell.

  6. #6

    Thread Starter
    Fanatic Member
    Join Date
    Jul 2017
    Posts
    760

    Re: Recommendation for "printing" PDF to remove all metadata?

    Could you lend me a hand by telling me if there is a library, or a wrapper around a library, that I could use from within VB6 and that would do what I need, even if it reduced the PDF contents to pure images?
    I took a look at Olaf Schmidt's PDF image extractor, but I think I misunderstood its purpose. It does extract images, but it does not convert the contents to images. This is what I would actually need.

  7. #7
    PowerPoster
    Join Date
    Jul 2010
    Location
    NYC
    Posts
    7,653

    Re: Recommendation for "printing" PDF to remove all metadata?

    The OrdoPdfReader control displays pages as images with pdfium.

    My gPdfMerge works with pdfium text functions to perform searches.

    Stripping metadata without just going through image rendering or text extraction seems to be such a pain that only Adobe's commercial closed source crap can do it. So I don't know who'd even be able to make such a tool in any language. If you install Adobe's stuff, it comes with property handlers, so you could remove the metadata easily through the Windows shell.
    Last edited by fafalone; Jan 8th, 2025 at 08:26 PM.

  8. #8

    Thread Starter
    Fanatic Member
    Join Date
    Jul 2017
    Posts
    760

    Re: Recommendation for "printing" PDF to remove all metadata?

    Thank you.
    I took a look at Olaf's solution again.
    I was so stupid (aka under time pressure) that I didn't realize it has everything I need.

  9. #9
    PowerPoster
    Join Date
    Jul 2010
    Location
    NYC
    Posts
    7,653

    Re: Recommendation for "printing" PDF to remove all metadata?

    The irony of not wanting 3rd party dependencies only to use *both* closed source RC6 comprised of *several* dlls PLUS pdfium instead of a complete open source solution of just pdfium with the pdf page to image code in the ordo pdf project....

  10. #10
    Hyperactive Member -Franky-'s Avatar
    Join Date
    Dec 2022
    Location
    Bremen Germany
    Posts
    475

    Re: Recommendation for "printing" PDF to remove all metadata?

    Starting with Win10, you can use WinRT to render the pages of a PDF into images. These images can then be combined into a PDF using a PDF printer, for example.

  11. #11

    Thread Starter
    Fanatic Member
    Join Date
    Jul 2017
    Posts
    760

    Re: Recommendation for "printing" PDF to remove all metadata?

    I didn't express myself clearly. I meant to say that I want to be able to trust the end result.
    Using Olaf's project I can convert the PDF pages to images and then put these together as PDF pages again. That is what I needed.
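
As a rough illustration of the "put the images together as PDF pages again" step, here is a minimal Python sketch using only the standard library. It is not Olaf's code and not how RC6 or pdfium do it; `build_image_pdf` is a hypothetical helper name, and the raw RGB pixel data is assumed to come from whatever renderer produced the page images. The point is that the output contains only a catalog, a page tree, and one image per page. The code never writes a document information dictionary or an XMP stream.

```python
import zlib


def build_image_pdf(pages):
    """Build a PDF from raw images (hypothetical helper, for illustration).

    pages: list of (width_px, height_px, rgb_bytes), one tuple per page,
    where rgb_bytes holds 8-bit RGB pixels, rows from top to bottom.
    Each pixel is mapped to one PDF point (i.e. 72 dpi).
    """
    objects = []  # object bodies; PDF object number = list index + 1

    def add(body):
        objects.append(body)
        return len(objects)

    catalog_num = add(b"")  # filled in once the page tree is known
    pages_num = add(b"")
    kids = []
    for width, height, rgb in pages:
        if len(rgb) != width * height * 3:
            raise ValueError("rgb_bytes must hold width * height * 3 bytes")
        data = zlib.compress(rgb)
        image_num = add(
            b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace /DeviceRGB /BitsPerComponent 8 "
            b"/Filter /FlateDecode /Length %d >>\nstream\n"
            % (width, height, len(data))
            + data + b"\nendstream"
        )
        # Scale the unit square to the page size and paint the image.
        content = b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (width, height)
        content_num = add(
            b"<< /Length %d >>\nstream\n" % len(content)
            + content + b"\nendstream"
        )
        page_num = add(
            b"<< /Type /Page /Parent %d 0 R /MediaBox [0 0 %d %d] "
            b"/Resources << /XObject << /Im0 %d 0 R >> >> "
            b"/Contents %d 0 R >>"
            % (pages_num, width, height, image_num, content_num)
        )
        kids.append(b"%d 0 R" % page_num)

    objects[catalog_num - 1] = b"<< /Type /Catalog /Pages %d 0 R >>" % pages_num
    objects[pages_num - 1] = b"<< /Type /Pages /Count %d /Kids [%s] >>" % (
        len(kids), b" ".join(kids))

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    # The trailer deliberately has no /Info entry.
    out += b"trailer\n<< /Size %d /Root %d 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, catalog_num, xref_pos)
    return bytes(out)
```

For example, `build_image_pdf([(2, 2, bytes([255, 0, 0] * 4))])` returns the bytes of a one-page PDF showing a 2x2 red image. Anything the renderer has already drawn into the pixels (visible names, stamps, dates) of course stays visible.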

    I did realize that Windows has the "Print to PDF" option, but as I didn't have a clue what it does under the hood or whether the result still contains metadata, I didn't want to use it. In fact, Print to PDF gave me inconsistent results with editable fields: sometimes they showed up in the printed PDF and sometimes they didn't, and sometimes they were still editable and sometimes they weren't. As I needed images only, I ditched this approach.
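
One way to get a first impression of whether an output file "still contains metadata" is to scan its raw bytes for the PDF names that usually introduce it. The following Python sketch does only that. The marker list and the function name `find_metadata_markers` are my own choices for illustration, and the check is a heuristic: objects stored inside compressed object streams (PDF 1.5 and later) are invisible to a raw byte scan, so an empty result does not prove a file is clean.

```python
import re

# Names that commonly introduce metadata or interactive content in a PDF.
# This list is an assumption for illustration, not an exhaustive catalogue.
MARKERS = {
    b"/Info": "document information dictionary (author, producer, dates)",
    b"/Metadata": "XMP metadata stream",
    b"/AcroForm": "interactive form fields",
}


def find_metadata_markers(pdf_bytes):
    """Return the marker names that appear in the raw bytes of a PDF."""
    found = []
    for name in MARKERS:
        # Match the name only when it is not the prefix of a longer name,
        # so that e.g. "/Information" does not count as "/Info".
        if re.search(re.escape(name) + rb"(?![A-Za-z0-9])", pdf_bytes):
            found.append(name.decode("ascii"))
    return found
```

Typical use would be `find_metadata_markers(open("out.pdf", "rb").read())`, where `out.pdf` is a placeholder file name. A hit is a reason to look closer; a miss is only weak evidence.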

  12. #12
    PowerPoster
    Join Date
    Jul 2010
    Location
    NYC
    Posts
    7,653

    Re: Recommendation for "printing" PDF to remove all metadata?

    Quote: Originally Posted by tmighty2
    I didn't express myself clearly. I meant to say that I want to be able to trust the end result.
    Using Olaf's project I can convert the PDF pages to images and then put these together as PDF pages again. That is what I needed.

    I did realize that Windows has the "Print to PDF" option, but as I didn't have a clue what it does under the hood or whether the result still contains metadata, I didn't want to use it. In fact, Print to PDF gave me inconsistent results with editable fields: sometimes they showed up in the printed PDF and sometimes they didn't, and sometimes they were still editable and sometimes they weren't. As I needed images only, I ditched this approach.
    Your original post said you didn't want 3rd party tools, but RC6 is just wrapping pdfium, so you're still using it, just adding a bunch of other closed source 3rd party DLLs instead of just calling FPDF_RenderPage in the pdfium dll yourself.
