I am downloading pages from the web using a variety of methods, including this:
Code:
Dim URLString = "http://www.example.com/page.htm"
Dim MyWebClient As Net.WebClient = New Net.WebClient()
Dim HTML As String = MyWebClient.DownloadString(URLString)
No matter which method is used for downloading, quite often the HTML contains characters that are encoded somehow, such as: Ã© and Âº and â€˜
If I save the text to a file, e.g.:
Code:
My.Computer.FileSystem.WriteAllText(filePath, HTML, False, System.Text.Encoding.Default)
...some of the issues are dealt with (e.g. Ã© becomes é, Âº becomes º, and â€˜ becomes ‘)
Is there a way I can do this conversion without saving to a file and reloading? (or hard-coding conversions as I find them!)
I've tried several things with no luck, including this:
Code:
Dim encodedBytes As Byte() = System.Text.UTF8Encoding.UTF8.GetBytes(HTML)
Dim decodedString As String = System.Text.UTF8Encoding.UTF8.GetString(encodedBytes)
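For reference, the attempt above encodes and decodes with the same encoding, so it hands back the string unchanged. The symptoms look like UTF-8 bytes being decoded as Windows-1252, so a rough sketch of an in-memory fix might look like the following. This assumes the page really is UTF-8 and that the wrong decoding was code page 1252; `FixMojibake` and `DownloadAsUtf8` are made-up helper names, and on .NET Core / .NET 5+ code page 1252 is only available after registering `CodePagesEncodingProvider`.

```vbnet
Imports System.Net
Imports System.Text

Module MojibakeSketch
    ' Hypothetical helper: undo "UTF-8 bytes decoded as Windows-1252".
    ' Assumption: the garbled string came from exactly that mistake.
    Function FixMojibake(garbled As String) As String
        Dim wrongEncoding As Encoding = Encoding.GetEncoding(1252)
        ' Recover the original bytes by re-encoding with the wrong encoding...
        Dim rawBytes As Byte() = wrongEncoding.GetBytes(garbled)
        ' ...then decode those bytes as what they really are.
        Return Encoding.UTF8.GetString(rawBytes)
    End Function

    ' Hypothetical helper: avoid the problem at download time by telling
    ' WebClient which encoding to use. Assumption: the page is UTF-8.
    Function DownloadAsUtf8(url As String) As String
        Using client As New WebClient()
            client.Encoding = Encoding.UTF8
            Return client.DownloadString(url)
        End Using
    End Function
End Module
```

Setting `WebClient.Encoding` before the download is probably the cleaner of the two, since nothing gets garbled in the first place. The re-decoding helper is only useful for text that has already been decoded wrongly.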
If possible, I'd also like to convert accented characters and similar to their 'simple' equivalents (e.g. instead of é and Ø I'd like to get e and O).
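One approach I'm considering for the accent part, as an untested sketch: decompose the string with Unicode normalization (form D) and drop the combining marks. `StripAccents` is a made-up name. Letters such as Ø have no canonical decomposition, so they would still need an explicit replacement. The two `Replace` calls below are only an example of that and are not a complete mapping.

```vbnet
Imports System.Globalization
Imports System.Text

Module AccentSketch
    ' Hypothetical helper: remove diacritics by decomposing each character
    ' and discarding the combining (non-spacing) marks.
    Function StripAccents(source As String) As String
        ' FormD splits "é" into "e" followed by a combining acute accent.
        Dim decomposed As String = source.Normalize(NormalizationForm.FormD)
        Dim result As New StringBuilder()
        For Each c As Char In decomposed
            If CharUnicodeInfo.GetUnicodeCategory(c) <> UnicodeCategory.NonSpacingMark Then
                result.Append(c)
            End If
        Next
        ' Characters like Ø do not decompose, so map them by hand
        ' (illustrative only, not an exhaustive list).
        Return result.ToString().
            Normalize(NormalizationForm.FormC).
            Replace("Ø"c, "O"c).
            Replace("ø"c, "o"c)
    End Function
End Module
```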