
Thread: Best package to handle PDF files (read its content and split pages)

  1. #1

    Thread Starter
    Member
    Join Date
    Jul 2018
    Posts
    42

    Best package to handle PDF files (read its content and split pages)

    Hi everyone,

    One of the next steps in my project is reading and handling PDF files.

    In my previous version, developed in VBA inside an Excel file, I used Adobe's own library (which requires having Acrobat Pro installed), but now that I'm using Visual Studio and C# I can't use it, for various reasons.

    So, I need a package that allows me to:

    1.- Read the PDF content (for now only text, no images). I need to read the text in a way that tells me which page each piece of text is on.

    2.- Once I've read the text, in some cases that's all, but in others I'll have to split the PDF into two or more PDFs, depending on the text I find inside.

    3.- I'll use it in a multithreaded scenario. Of course, each thread will handle a different PDF file.

    4.- If possible, it should be free to use in the program I'm developing, which I want to sell when it's finished.

    To illustrate point 2: in some cases I'll leave a 5-page file as it is, in others I'll have to split it into 5 PDFs of one page each, and in others I'll need one new file with pages 1-2, another with page 3 and a third with pages 4-5 (it'll depend on what I find inside).
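
    The splitting rule described above can be worked out before any PDF library is involved. The sketch below is only an illustration, not part of any package mentioned in this thread: the class name PageRangePlanner and the method Plan are hypothetical, and it assumes the text analysis has already produced the 1-based page numbers at which a new output file should begin. It uses value tuples, so it assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns "a new file starts on these pages" into page ranges.
public static class PageRangePlanner
{
    // Returns inclusive, 1-based (First, Last) ranges covering pages 1..pageCount.
    // startPages holds the pages where a new output PDF should begin;
    // page 1 always starts a range, and out-of-range values are ignored.
    public static List<(int First, int Last)> Plan(int pageCount, IEnumerable<int> startPages)
    {
        if (pageCount < 1)
            throw new ArgumentOutOfRangeException(nameof(pageCount));

        // Sort and de-duplicate the start pages, keeping only valid ones.
        var starts = new SortedSet<int>(startPages.Where(p => p >= 1 && p <= pageCount));
        starts.Add(1);
        var ordered = starts.ToList();

        var ranges = new List<(int First, int Last)>();
        for (int i = 0; i < ordered.Count; i++)
        {
            // A range ends just before the next start page, or at the last page.
            int last = (i + 1 < ordered.Count) ? ordered[i + 1] - 1 : pageCount;
            ranges.Add((ordered[i], last));
        }
        return ranges;
    }
}
```

    With a 5-page file, Plan(5, new int[0]) gives the single range 1-5 (leave the file as it is), Plan(5, new[] { 2, 3, 4, 5 }) gives five one-page ranges, and Plan(5, new[] { 3, 4 }) gives 1-2, 3 and 4-5. Each range can then be handed to whichever library ends up copying the pages into a new file.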

    Recently I had to look for a way to read and write Excel files and, after a lot of searching and testing, I settled on NPOI, a package available on NuGet. I even wrote a post explaining how it works, so it can help others in my situation:

    http://www.vbforums.com/showthread.p...mmies-like-me)

    Now I've started searching for a PDF package by typing "PDF" into NuGet, and this is what I get:



    It seems Spire.PDF could be OK, but I'd love to hear some advice from you.

    Thanks in advance

  2. #2
    .NUT jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    102,999

    Re: Best package to handle PDF files (read its content and split pages)

    I can't claim to have used any of them but I do spend a fair bit of time on forums and the one that is mentioned more often than any other is iTextSharp, so that is likely to be the one you can get the most help with.

  3. #3

    Thread Starter
    Member
    Join Date
    Jul 2018
    Posts
    42

    Re: Best package to handle PDF files (read its content and split pages)

    Quote: Originally posted by jmcilhinney
    I can't claim to have used any of them but I do spend a fair bit of time on forums and the one that is mentioned more often than any other is iTextSharp, so that is likely to be the one you can get the most help with.
    Hi jmcilhinney,

    English is not my first language and I don't know exactly how to translate the expression I want to use to answer you, but I'll use one I think is clear:

    Your words are sacred to me.

    I'll look for information on iTextSharp and will consider it my first option.

    Many thanks!!

  4. #4
    Superbly Moderated NeedSomeAnswers's Avatar
    Join Date
    Jun 2002
    Location
    Manchester uk
    Posts
    2,570

    Re: Best package to handle PDF files (read its content and split pages)

    You need to be aware that iTextSharp changed its licensing.

    It now uses the AGPL licence, which means that it can be used for free on condition that you also release the source code of your project for free under the same licence.

    If you're not planning on doing that then you will probably want to look at another option.

    PDFsharp, for instance, is published under the MIT licence and can be used for commercial projects. For more info see here
    Please Mark your Thread "Resolved", if the query is solved & Rate those who have helped you



  5. #5
    .NUT jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    102,999

    Re: Best package to handle PDF files (read its content and split pages)

    Quote: Originally posted by NeedSomeAnswers
    You need to be aware that iTextSharp changed its licensing.

    It now uses the AGPL licence, which means that it can be used for free on condition that you also release the source code of your project for free under the same licence.
    I don't think that's the case. I think what that licence means is that, if you create a modified version of iTextSharp, you are obliged to make the source for that modified version freely available. It doesn't mean that an application that uses an unmodified version of iTextSharp must have its source code made public.

  6. #6

    Thread Starter
    Member
    Join Date
    Jul 2018
    Posts
    42

    Re: Best package to handle PDF files (read its content and split pages)

    Many thanks to both of you, NeedSomeAnswers and jmcilhinney.

    I really don't mind using one or the other, but of course I need a library/package that lets me do what I described in the first post (I suppose both do).

    To sum up, I only need to get the text inside the PDF (preferably in the same order it appears in the document), and I need to know where each new page starts. Then, once I've analysed the text, I need to be able to split the PDF into new PDF files with the appropriate page ranges.

    Of course I'd be grateful if it's easy to implement and costs me fewer headaches and less time to get working xD

    But... yes, my intention is to sell the program without publishing the source code; what's more, I will use some tool to protect it from piracy. I'm putting a lot of my time and effort into it and I'd like to turn it into my new job, or at least have it complement the one I have now.

    If in doubt, to avoid future problems I'd give PDFsharp a try, unless you tell me it's much more difficult to use/implement than iTextSharp or doesn't meet my needs.

    Regards and thanks again.

  7. #7
    .NUT jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    102,999

    Re: Best package to handle PDF files (read its content and split pages)

    Quote: Originally posted by jmcilhinney
    I don't think that that's the case. I think that what that license means is that, if you create a modified version of iTextSharp then you are obliged to make the source for that modified version freely available. It doesn't mean that an application that uses an unmodified version of iTextSharp must have its source code made public.
    Actually, I think I may have been wrong about that.

  8. #8
    Superbly Moderated NeedSomeAnswers's Avatar
    Join Date
    Jun 2002
    Location
    Manchester uk
    Posts
    2,570

    Re: Best package to handle PDF files (read its content and split pages)

    Quote: Originally posted by jmcilhinney
    Actually, I think I may have been wrong about that.

    Yeah, I am pretty sure you were (that must be a rare occurrence). Under the AGPL licence, if you use iTextSharp you basically have to be doing an open source project.

    You can purchase iTextSharp for commercial use, but that comes under a different licence.


