
Thread: Discovering non-ascii codes in text files

  1. #1

    Thread Starter
    New Member
    Join Date
    Jun 2021
    Posts
    2

    Discovering non-ascii codes in text files

    I am currently generating Word and PDF files from the contents of a wide range of standard TXT files. The process works well, but some codes in the text files turn into unknown characters in the generated Word file. These are mainly the " and ' characters, but there are others as well.
    I really want to be able to replace these unknown codes with their equivalents in Word. I am currently using Regex to replace American spellings, but I would be grateful if anybody could point me to a way of handling these characters at the TXT source so that they produce meaningful characters in the resulting Word file.
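    As a starting point for finding these codes, here is a minimal sketch (written in Python purely for illustration, although the thread is about VB.NET) that lists every character above U+007F, the top of the ASCII range, together with its line, column, code point and Unicode name. The function name `find_non_ascii` is hypothetical, and the sketch assumes the file has already been decoded into a string using the correct encoding.

```python
import unicodedata

def find_non_ascii(text):
    """Return (line, column, char, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:  # ASCII only covers U+0000..U+007F
                hits.append((line_no, col, ch, f"U+{ord(ch):04X}",
                             unicodedata.name(ch, "<unnamed>")))
    return hits

if __name__ == "__main__":
    # Sample text containing a curly apostrophe and curly double quotes
    sample = "It\u2019s a \u201cquoted\u201d word\nplain ASCII line"
    for hit in find_non_ascii(sample):
        print(hit)
```

    Reporting the Unicode name rather than just the number makes it easier to decide, character by character, what each one should become in the Word output.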

    Thanks

  2. #2
    Frenzied Member
    Join Date
    Feb 2003
    Posts
    1,807

    Re: Discovering non-ascii codes in text files

    Welcome to the forum.

    So you're converting text files to Word files and need help? What exactly do you mean by Word files: *.doc or *.docx? What method are you using? What is the relevant code?

  3. #3

    Thread Starter
    New Member
    Join Date
    Jun 2021
    Posts
    2

    Re: Discovering non-ascii codes in text files

    Thank you for a very prompt response.

    I am using VB.NET with Office Interop. Most of the work is in dealing with the range of forms in which the source text is provided. I have no control over what this might be, so I need to work with whatever I'm given to get a satisfactory result. As I said, the result is very good except for a few characters that display as "?" in the Word ".docx" file. I recognised some of them as the apostrophe and the quote, so I was able to use Regex to replace these as the source is processed.

    I would like to make sure that all the other problem characters are corrected too, but I'm not sure that I can. I suspect they belong to the Unicode set. I'm about to open a particular file that I know generates the unknown character, but I thought this might be a good place to ask while I continue my research.
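    Characters turning into "?" is consistent with the source file being read in one encoding when it was saved in another. A minimal sketch, assuming the source files are either UTF-8 or Windows-1252 (the usual "ANSI" code page on English and Western-European Windows systems): try a strict UTF-8 decode first and fall back to Windows-1252, where bytes 0x91 to 0x94 are the curly quotes. The names `decode_text`, `read_source` and `ASCII_PUNCTUATION` are hypothetical, and the code is Python for illustration only; in VB.NET the corresponding step would be passing an explicit `Encoding` to `File.ReadAllText`.

```python
def decode_text(raw: bytes) -> str:
    """Decode file bytes, trying UTF-8 first and falling back to Windows-1252.

    Assumption: every source file is either UTF-8 or Windows-1252.
    """
    try:
        # "utf-8-sig" also strips a leading byte-order mark if one is present
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # In cp1252, bytes 0x91-0x94 map to U+2018, U+2019, U+201C, U+201D;
        # the few undefined bytes become U+FFFD instead of raising an error
        return raw.decode("cp1252", errors="replace")

# Optional: map typographic punctuation to plain ASCII equivalents
ASCII_PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201C": '"', "\u201D": '"',   # double curly quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
})

def read_source(path: str, to_ascii: bool = False) -> str:
    """Read a source TXT file and optionally flatten typographic punctuation."""
    with open(path, "rb") as f:
        text = decode_text(f.read())
    return text.translate(ASCII_PUNCTUATION) if to_ascii else text
```

    The ASCII mapping is optional: once the text is decoded correctly, the strings handed to Word should already contain the real curly quotes, so the translation table is only needed if straight quotes are preferred in the output.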

  4. #4
    Frenzied Member
    Join Date
    Feb 2003
    Posts
    1,807

    Re: Discovering non-ascii codes in text files

    Could you post the code that does the actual conversion?

  5. #5
    Fanatic Member Delaney
    Join Date
    Nov 2019
    Location
    Paris, France
    Posts
    845

    Re: Discovering non-ascii codes in text files

    In which language are the texts?
    ' and " are ASCII (codes 0x27 and 0x22 in hexadecimal, i.e. 39 and 34 decimal), so it is not just an ASCII problem. Maybe it is a UTF encoding problem.
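    The arithmetic behind this is easy to check. A short sketch (Python, for illustration only) showing that the straight quotes sit inside ASCII while the curly apostrophe does not, plus two ways an encoding mismatch garbles it: UTF-8 bytes read as Windows-1252 become "â€™", and forcing the character into ASCII with replacement turns it into "?". Which of these actually happens in the poster's pipeline is an assumption; the thread does not show the code.

```python
# Straight quotes are ASCII; the codes quoted above are hexadecimal
single_quote = ord("'")            # 39 decimal == 0x27
double_quote = ord('"')            # 34 decimal == 0x22

# The typographic (curly) apostrophe lies outside ASCII (0x00-0x7F)
curly_apostrophe = ord("\u2019")   # 8217 decimal == 0x2019

# Mismatch 1: UTF-8 bytes decoded as Windows-1252 produce "mojibake"
utf8_bytes = "\u2019".encode("utf-8")    # b'\xe2\x80\x99'
mojibake = utf8_bytes.decode("cp1252")   # 'â€™'

# Mismatch 2: encoding to ASCII with errors="replace" substitutes "?"
as_ascii = "It\u2019s".encode("ascii", errors="replace")   # b'It?s'

print(hex(single_quote), hex(double_quote), hex(curly_apostrophe))
print(utf8_bytes, mojibake, as_ascii)
```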
    Last edited by Delaney; Jun 14th, 2021 at 08:48 AM.
    The best friend of any programmer is a search engine
    "Don't wish it was easier, wish you were better. Don't wish for less problems, wish for more skills. Don't wish for less challenges, wish for more wisdom" (J. Rohn)
    “They did not know it was impossible so they did it” (Mark Twain)

  6. #6
    eXtreme Programmer .paul.
    Join Date
    May 2007
    Location
    Chelmsford UK
    Posts
    25,479

    Re: Discovering non-ascii codes in text files

    They’re not non-ASCII codes. They are unrecognised characters, or characters that have no text representation…
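    One way to tell these cases apart is the Unicode general category of each character. A minimal sketch using Python's `unicodedata` module (for illustration; the `classify` helper and the `INVISIBLE` table are hypothetical, not from the thread): categories Cc, Cf, Cn, Co and Cs cover control, format, unassigned, private-use and surrogate code points, none of which usually has an ordinary visible glyph of its own.

```python
import unicodedata

# General categories whose characters usually have no visible glyph of their own
INVISIBLE = {
    "Cc": "control character",
    "Cf": "format character",
    "Cn": "unassigned code point",
    "Co": "private-use character",
    "Cs": "surrogate",
}

def classify(ch: str) -> str:
    """Describe one character by code point, general category and Unicode name."""
    cat = unicodedata.category(ch)
    name = unicodedata.name(ch, "<no name>")  # control characters have no name
    kind = INVISIBLE.get(cat, "graphic or space")
    return f"U+{ord(ch):04X} {cat} {kind}: {name}"

if __name__ == "__main__":
    for ch in ("\x1a", "\u200b", "\u2019"):
        print(classify(ch))
```

    For example, `classify("\x1a")` reports a control character (0x1A is the old DOS end-of-file marker that sometimes survives at the end of legacy text files), whereas the curly apostrophe comes back as category Pf with a proper name, i.e. a real character that is simply outside ASCII.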
