
Thread: Parsing HTML encoded content! How to convert?



    JohnPotier (Thread Starter, Member)
    Join Date: Sep 2007
    Location: Norway
    Posts: 42


    Hi all,

    (Really? No one knows of a simple way to handle these encoded characters? Surprises me!)

    Is there a simple way to convert HTML-encoded characters ("&#260;", "&quot;") to their printable equivalents?

    Background:
    In my program I parse HTML manually, looking for all kinds of attributes in our online product catalog. I have no way to get at the data before it is cached on the web server, so I can't go back to my source data; I have to inspect it after the business servers have mixed our data with data from other providers. The format and layout are under my control, so manual parsing is OK, since I'm well aware of any changes to the structure :-)

    My challenge, however, is that our data is mixed with data from providers all over the world, so I encounter encoded characters from the entire Unicode range. I'm tired of constantly expanding my fixString() function with yet another case, like:

    Code:
    html = Replace(html, "&#260;", ChrW$(260)) ' Ą, sounds similar to "awn"
    As far as I've found, the same character can arrive in at least four variants:
    Code:
    Named:   &quot;
    Hex:     &#x104;
    Decimal: &#29;
    Strange: "À"   (probably because VB6 controls have limited support for displaying international characters)
    So, again: is there a simpler way to convert all of these encoded characters to their printable equivalents?

    At the very least I would expect to find a community-built function that converts every encoded character... I haven't found one yet! Until then, I'll keep expanding my fixString() :-)
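    A minimal sketch of such a function, assuming VB6 and its native UTF-16 strings: the decimal and hex forms carry the code point itself, so they can be decoded arithmetically instead of with one Replace per character, and code points above U+FFFF are turned into a surrogate pair. The names DecodeHtmlEntities and TryResolveRef are hypothetical, only a handful of named references are included (the full HTML5 list has more than 2,000 entries), and unknown or malformed references are left untouched.

    ```vb
    Option Explicit

    ' Hypothetical helper (not an existing library): decodes &#NNN; / &#xHHH;
    ' numeric references and a few named ones. Unknown references are kept as-is.
    Public Function DecodeHtmlEntities(ByVal s As String) As String
        Dim out As String, rep As String
        Dim p As Long, amp As Long, semi As Long

        p = 1
        Do
            amp = InStr(p, s, "&")
            If amp = 0 Then Exit Do
            semi = InStr(amp + 1, s, ";")
            If semi = 0 Then Exit Do

            If TryResolveRef(Mid$(s, amp + 1, semi - amp - 1), rep) Then
                ' Copy the text before the reference, then its decoded form
                out = out & Mid$(s, p, amp - p) & rep
                p = semi + 1
            Else
                ' Not a reference we know: keep the "&" literally and rescan after it
                out = out & Mid$(s, p, amp - p + 1)
                p = amp + 1
            End If
        Loop
        DecodeHtmlEntities = out & Mid$(s, p)
    End Function

    ' Resolves the text between "&" and ";". Returns False if it is not understood.
    Private Function TryResolveRef(ByVal ref As String, ByRef result As String) As Boolean
        Dim cp As Long, i As Long, d As Long
        Dim digits As String, isHex As Boolean

        If Left$(ref, 1) <> "#" Then
            ' Named references (case-sensitive); extend this list as needed
            Select Case ref
                Case "quot": result = Chr$(34)
                Case "amp":  result = "&"
                Case "lt":   result = "<"
                Case "gt":   result = ">"
                Case "apos": result = "'"
                Case "nbsp": result = ChrW$(160)
                Case Else:   Exit Function
            End Select
            TryResolveRef = True
            Exit Function
        End If

        ' Numeric: "#260" (decimal) or "#x104" / "#X104" (hex)
        isHex = (LCase$(Mid$(ref, 2, 1)) = "x")
        If isHex Then digits = Mid$(ref, 3) Else digits = Mid$(ref, 2)
        If Len(digits) = 0 Or Len(digits) > 7 Then Exit Function

        ' Parse digit by digit (avoids Val's 16-bit quirk with "&H" strings)
        For i = 1 To Len(digits)
            If isHex Then
                d = InStr(1, "0123456789ABCDEF", UCase$(Mid$(digits, i, 1))) - 1
                If d < 0 Then Exit Function
                cp = cp * 16 + d
            Else
                d = InStr(1, "0123456789", Mid$(digits, i, 1)) - 1
                If d < 0 Then Exit Function
                cp = cp * 10 + d
            End If
        Next i

        ' Reject values that are not valid Unicode scalar values
        If cp < 1 Or cp > &H10FFFF& Then Exit Function
        If cp >= &HD800& And cp <= &HDFFF& Then Exit Function

        If cp <= &HFFFF& Then
            result = ChrW$(cp)
        Else
            ' Outside the BMP: VB6 strings are UTF-16, so emit a surrogate pair
            cp = cp - &H10000
            result = ChrW$(&HD800& + (cp \ &H400&)) & ChrW$(&HDC00& + (cp And &H3FF&))
        End If
        TryResolveRef = True
    End Function
    ```

    For example, DecodeHtmlEntities("&#x104;") and DecodeHtmlEntities("&#260;") both return ChrW$(260), which is Ą. The result is still a UTF-16 string, so a standard (ANSI) VB6 TextBox or Label may show such characters as "?" even though the decoded data itself is correct.
    
    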

    I will store the data I need in UTF-8 text files for further processing.
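    For that last step, one way to get UTF-8 files out of VB6 without adding a project reference is ADO's Stream object through late binding. This is a sketch that assumes ADO is installed (it normally ships with Windows 2000 and later); SaveUtf8 is a hypothetical name. With Charset = "utf-8", ADO writes a byte-order mark (EF BB BF) at the start of the file, and whether that matters depends on the tool that reads the files later.

    ```vb
    ' Hypothetical helper: writes a VB6 (UTF-16) string to a UTF-8 file via ADO.
    ' Late-bound, so no reference to the ADO type library is required.
    Public Sub SaveUtf8(ByVal path As String, ByVal text As String)
        Const adTypeText As Long = 2
        Const adSaveCreateOverWrite As Long = 2
        Dim stm As Object

        Set stm = CreateObject("ADODB.Stream")
        stm.Type = adTypeText
        stm.Charset = "utf-8"          ' ADO also emits a UTF-8 BOM with this charset
        stm.Open
        stm.WriteText text
        stm.SaveToFile path, adSaveCreateOverWrite
        stm.Close
    End Sub
    ```
    
    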
    Last edited by JohnPotier; Jan 27th, 2024 at 01:13 PM.
