Discovering non-ascii codes in text files

**jonel** · Jun 14th, 2021, 05:22 AM

I am currently generating Word and PDF files based on the contents of a wide range of standard TXT files. The process is working well but some codes in the text files translate to unknown characters in the generated Word file. These characters are mainly the " and ', but there are others as well.
I really want to be able to replace these unknown codes with their equivalent in Word. I am currently using Regex to replace American spellings but I would be grateful if anybody could point a way to handle these characters at the TXT source so that they produce meaningful characters in the resulting Word file.

Thanks

**Peter Swinkels** · Jun 14th, 2021, 07:43 AM

Welcome to the forum.

And you're converting text files to word files and need help? To Word? What does that mean? *.doc or *.docx files? What method are you using? What is the relevant code?

**jonel** · Jun 14th, 2021, 07:55 AM

Thank you for a very prompt response.

I am using VB.NET with Office Interop. The main work is in sorting out the range of forms in which the text is provided at source. I have no control over what this might be, so I need to act on what I'm given in order to get a satisfactory result. As I said, the result is very good except for those few characters that display in the Word ".docx" file as "?". Some I recognised as being the apostrophe and the quote, so I was able to use Regex to replace these as they are processed from the source. I would like to ensure that all the other characters that are causing me an issue are corrected, but I'm not sure that I can. I've got a feeling that they may belong to the Unicode set and I'm about to open a particular file that I know generates the unknown character, but I thought this might be a place to ask while I continue my research.
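
For anyone doing the same research, a minimal sketch of one way to find out which codes a file actually contains: read the raw bytes and count every value above 0x7F. It is written in Python rather than VB.NET purely to keep it short, and the names `non_ascii_bytes`, `report` and the sample bytes are made up for illustration. The sample assumes the file was saved as Windows-1252, which is only a guess about the source files.

```python
from collections import Counter


def non_ascii_bytes(data: bytes) -> Counter:
    """Count every byte value above 0x7F in the raw file contents."""
    return Counter(b for b in data if b > 0x7F)


def report(data: bytes) -> list:
    """Return one 'hex value x count' line per distinct non-ASCII byte."""
    counts = non_ascii_bytes(data)
    return [f"0x{value:02X} x{count}" for value, count in sorted(counts.items())]


# Hypothetical sample: curly quotes as a Windows-1252 editor would save them.
# For a real file, use: data = open(path, "rb").read()
sample = b"It\x92s a \x93quoted\x94 word"

for line in report(sample):
    print(line)
```

Bytes in the 0x91 to 0x94 range would point towards Windows-1252 curly quotes, while byte sequences starting with 0xE2 0x80 would suggest the same characters stored as UTF-8.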

**Peter Swinkels** · Jun 14th, 2021, 07:57 AM

Could you post the code that does the actual conversion?

**Delaney** · Jun 14th, 2021, 08:41 AM

In which language are the texts?
' and " are ASCII (hex codes 27 and 22 respectively), so it is not just an ASCII problem. Maybe a UTF encoding problem.
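
A rough sketch of that idea, in Python for brevity rather than VB.NET: the straight quotes really are ASCII, but "smart" quotes are not, and decoding their bytes with the wrong encoding is one way to end up with placeholder characters. The helper name `decode_text`, the sample bytes and the choice of Windows-1252 as the fallback are all assumptions for illustration, not something confirmed about the original files.

```python
# Straight quotes are plain ASCII: hex 27 and hex 22.
assert ord("'") == 0x27 and ord('"') == 0x22

# Hypothetical sample: curly quotes as saved by a Windows-1252 editor.
raw = b"\x93It\x92s\x94"

# Decoding as ASCII cannot represent these bytes, so each one becomes
# the replacement character U+FFFD (often displayed as "?").
as_ascii = raw.decode("ascii", errors="replace")

# Decoding with the encoding the file was actually written in recovers
# the real characters: U+201C, U+2019 and U+201D.
as_cp1252 = raw.decode("cp1252")


def decode_text(data: bytes) -> str:
    """Try strict UTF-8 first, then fall back to Windows-1252 (assumed)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # A few byte values are undefined in cp1252; replace them
        # instead of raising.
        return data.decode("cp1252", errors="replace")


print([hex(ord(c)) for c in decode_text(raw) if ord(c) > 0x7F])
```

Trying UTF-8 first is a common heuristic because text saved in a single-byte code page rarely happens to be valid UTF-8, but it is still a guess and can misfire on short or unusual inputs.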

**.paul.** · Jun 15th, 2021, 07:08 PM

They’re not non-ascii codes. They are unrecognised characters or characters that have no text representation…
