
Thread: Reading data from pdf file using itext is returning weird data

  1. #1

    Thread Starter | New Member | Join Date: Mar 2015 | Posts: 3

    Reading data from pdf file using itext is returning weird data

    I have some basic code that attempts to read text from a PDF file using iTextSharp, but the text it returns is unreadable. I tried converting it to UTF-8, with no success.

    Here is the code I am using:

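    ' Extract the text of page 1 with iTextSharp (sourcePdf is the path to the PDF file)
    Dim reader As New iTextSharp.text.pdf.PdfReader(sourcePdf)
    Dim strategy As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy()
    Dim currpage As Integer = 1
    Dim currentPageText As String = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, currpage, strategy)
    reader.Close()

    ' Round-trip the extracted text through UTF-8: encode to bytes, then decode again
    Dim utf8Encoding As New System.Text.UTF8Encoding(True)
    Dim encodedString() As Byte
    encodedString = utf8Encoding.GetBytes(currentPageText)
    Dim currtext As String = utf8Encoding.GetString(encodedString)
    MsgBox(currtext)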


    Attached is what the variable currtext contains after retrieving the data. What do I need to do to see it as plain text?

  2. #2
    techgnome | PowerPoster | Join Date: May 2002 | Posts: 32,939

    Re: Reading data from pdf file using itext is returning weird data

    Then it's probably not UTF-8...

    But this doesn't make sense to me:
    encodedString = utf8Encoding.GetBytes(currentPageText)
    Dim currtext = utf8Encoding.GetString(encodedString)

    You're taking a string, getting its bytes in UTF-8, then passing those bytes right back and decoding them as UTF-8. If the text was readable going in, it would come out exactly the same way it went in (currtext = currentPageText). Since it's coming out all jacked, that leads me to believe the encoding isn't what you think it is.
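To illustrate the round-trip point, here is a minimal, self-contained VB.NET sketch (the module and function names are hypothetical). Encoding a .NET string to UTF-8 bytes and decoding those bytes with the same encoding gives back the identical string for any well-formed input, whether it is plain ASCII, accented text, or control characters that look like garbage. The one exception is a string containing unpaired surrogate characters, which get replaced with U+FFFD. So if currtext looks garbled, currentPageText was already garbled before the conversion.

```vbnet
Imports System
Imports System.Text

Module Utf8RoundTripDemo
    ' Encodes a string to UTF-8 bytes and immediately decodes those bytes back,
    ' mirroring the GetBytes/GetString pair in the original post.
    Function RoundTrip(text As String) As String
        Dim utf8 As New UTF8Encoding(True)
        Dim bytes() As Byte = utf8.GetBytes(text)
        Return utf8.GetString(bytes)
    End Function

    Sub Main()
        ' Plain text, accented text ("café") and control characters all survive unchanged.
        Dim samples() As String = {"plain ASCII", "caf" & ChrW(&HE9), ChrW(&H3) & ChrW(&H11) & ChrW(&H2A)}
        For Each sample As String In samples
            ' Prints True for each sample: the round trip neither fixes nor breaks anything.
            Console.WriteLine(RoundTrip(sample) = sample)
        Next
    End Sub
End Module
```

In other words, the UTF-8 step in the original code is a no-op and can simply be dropped while investigating.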

    What is currentPageText initially? Are you SURE it's UTF-8?
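One way to answer "what is currentPageText initially?" without relying on how MsgBox renders it is to dump the raw character codes. The sketch below is a hypothetical helper (DescribeChars is not part of iTextSharp or .NET) that lists every character as a U+XXXX code point. Readable English text shows up as familiar values such as U+0041 for "A"; if the codes are already odd before any conversion, no Encoding call afterwards will turn them back into the original text.

```vbnet
Imports System
Imports System.Text

Module CodePointDump
    ' Hypothetical diagnostic helper: formats each character of a string
    ' as a U+XXXX code point, separated by spaces.
    Function DescribeChars(text As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In text
            If sb.Length > 0 Then sb.Append(" "c)
            sb.Append("U+").Append(Convert.ToInt32(c).ToString("X4"))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' In the real program you would pass currentPageText here instead.
        Console.WriteLine(DescribeChars("Hi!"))   ' prints U+0048 U+0069 U+0021
    End Sub
End Module
```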

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  3. #3

    Thread Starter | New Member | Join Date: Mar 2015 | Posts: 3

    Re: Reading data from pdf file using itext is returning weird data

    currentPageText is what you see in the attachment. I tried the UTF-8 conversion just to see whether it would do anything, and it doesn't. So I'm not sure where to go from here.

  4. #4
    techgnome | PowerPoster | Join Date: May 2002 | Posts: 32,939

    Re: Reading data from pdf file using itext is returning weird data

    Try this:
    Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(yourString))

    Source: http://stackoverflow.com/questions/5...scii-in-vb-net

    -tg

  5. #5

    Thread Starter | New Member | Join Date: Mar 2015 | Posts: 3

    Re: Reading data from pdf file using itext is returning weird data

    No luck, unfortunately. Btw, I converted the code to this (I think the command on that site was in C):

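    ' GetString is an instance method, so call it through the shared Encoding.ASCII property
    currtext = System.Text.Encoding.ASCII.GetString(utf8Encoding.GetBytes(currentPageText))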
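For reference, a minimal sketch of what the Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(...)) suggestion actually does (module and function names are hypothetical). ASCII-only text passes through unchanged, and any byte outside the 0-127 range is replaced by the ASCII decoder's default fallback character "?". The conversion can therefore only remove information; it cannot recover readable text from characters that were already wrong.

```vbnet
Imports System
Imports System.Text

Module AsciiDecodeDemo
    ' Mirrors the suggested call: encode as UTF-8, then decode those bytes as ASCII.
    Function ToAsciiViaUtf8(text As String) As String
        Return Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(text))
    End Function

    Sub Main()
        ' ASCII-only input comes back unchanged.
        Console.WriteLine(ToAsciiViaUtf8("PDF text"))
        ' "café": the UTF-8 bytes of "é" are above 127, so the ASCII decoder
        ' substitutes "?" for them instead of producing the original character.
        Console.WriteLine(ToAsciiViaUtf8("caf" & ChrW(&HE9)))
    End Sub
End Module
```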
