Differences in Filestream between VBnet and MS access?

**smktec** · May 25th, 2016, 08:26 AM

Hi,
I use a .csv file to inform a forwarding agency for pick up of packes. This worked from ms access without any problems. The code I use is:

Code:

 Const ForReading = 1, ForWriting = 2, ForAppending = 8
 Dim fs As Object, DHLDatei As Object
 Set fs = CreateObject("Scripting.FileSystemObject")

 Set DPDDatei = fs.OpenTextFile(DirectoryandFilename, ForAppending, True, TristateFalse)
 DPDDatei.write "Data"
 DPDDatei.close

in vbnet I use:

Code:

 Dim fs As FileStream = File.Create(path)
 array = System.Text.Encoding.Unicode.GetBytes("Data") 
 fs.Write(array, 0, array.Length)
 fs.close

for any unknown reason the vbnet version runs into errors trying to be read by the external company.

When I load the file as .csv file into Excel and save it afterwards as .csv file it works. The .csv file from vbnet also looks different in the preview of the windows explorer. Any ideas what the reason is and how I can fix it.

**dday9** · May 25th, 2016, 08:40 AM

Whenever I create a CSV file I use the following:

Code:

IO.File.WriteAllText("myfile.csv", String.Join(",", array))

And I've never had any of the issues you're mentioning.

**Inferrd** · May 25th, 2016, 08:52 AM

Based on the arguments you are supplying to the OpenTextFile method in this line:

Code:

Set DPDDatei = fs.OpenTextFile(DirectoryandFilename, ForAppending, True, TristateFalse)

I'd say the equivalent would be:

Code:

 IO.File.AppendAllText(DirectoryandFilename, "Data", System.Text.ASCIIEncoding.ASCII)

**Sitten Spynne** · May 25th, 2016, 10:33 AM

I'm suspicious you need to tell them the VB.NET version is outputting UTF-16, so they can open it and read it likewise. I'd be willing to bet the Access/VBA version is writing it using your ANSI codepage.

You could 'solve' the problem by using Encoding.Ascii instead of Encoding.Unicode, but that'd just be rewarding the other people for using 1980s methodologies, and your life will get very complicated if you ever have to transmit data with Asian scripts.

**smktec** · May 25th, 2016, 11:52 AM

I have tried all available codings and the various methods suggested here.
Importing the files back into Excel or wordpad shows now difference of the contents.
However, if I imported the file from VBnet into word pad and save it without any editing using UTF-8, the resulting output file will not show any import problems at the other company. So, there must be a difference between the file output by VBnet (using UTF-8) and the file saved by wordpad using UTF-8. Whatever the difference is, it can not be seen in wordpad, word or excel.
Any ideas? It would help if I had an output method from vbnet that does the job.

**Sitten Spynne** · May 25th, 2016, 02:42 PM

That is exactly what I said, with a slightly different encoding.

ASCII was the first text encoding, invented by white men in a country that speaks English around the 1970s. It uses one byte per character, but only 7 bits, leaving room for 128 characters. They discovered that Europe existed shortly after inventing ASCII, and put a few accented characters in the mix, but ultimately had to introduce "code pages" that defined how the numbers 128-255 would map to letters in different countries. Then someone discovered that Chinese people could read and use computers, and their language had thousands of ideograms! So ASCII sometimes supports more than one byte per character, and sometimes not, and it all depends on the code page your machine is using, so sometimes the file looks like garbage even when it has the right values.

Smarter people in the 1990s decided "What if we just used many bytes from the start?" They made Unicode. The most common version is UTF-16, it always uses at least two bytes.

But this made white people in European/United States countries angry, because their languages only ever need one byte. And they'd been using ASCII with ANSI code pages for a few decades and didn't want to change it. So they invented a Unicode encoding called UTF-8, that maps the values 0-128 exactly like ASCII, then uses up to 4 bytes per character to represent other scripts.

No smart person has used ASCII since 1990 unless they had a special reason. That's why Visual Basic and friends defaulted to ASCII: they weren't designed for people who want to think. .NET defaults to UTF-16, which it calls Unicode for some reason, and does so because it was intended for smart people to stop using ASCII unless they had a good reason.

Your Access code is exporting either UTF-8 or ASCII, probably ASCII because Access is not a tool for people that want smart solutions. Your initial .NET code is exporting UTF-16. The program that is reading your data wants UTF-8 or ASCII. It's hard to tell. UTF-8 can start with some optional bytes that say "I am UTF-8" called a "Byte Order Mark", or BOM. You should rewrite your .NET code to output UTF-8 (use Encoding.UTF8 instead of Encoding.Unicode) and see if that fixes things. If it doesn't, stop opening text in a text editor. Open a working file and a 'broken' file in a hex editor. Tell me what the first six bytes in each file are. Then, like magic, I'll tell you what's wrong.

**Shaggy Hiker** · May 25th, 2016, 05:39 PM

Your bitterness is showing through. There are lots of reasons for the decisions that were made that have nothing to do with the intelligence, or race, of the actors involved (perhaps the nationality, but not the race). Looking back with the wisdom of dirt cheap, plentiful, low-energy, memory, it's easy to be critical about choices made in the past. Many such stories are used to ridicule people in the past, such as the supposed Bill Gates quote about nobody ever needing more than X amount of memory (which was never said by anybody, especially not Gates, as far as anybody can determine).

The bottom line is that people can't see the future and make decisions based on their appreciation of their situation at the present. ASCII wasn't a bad thing. My backyard is paved in ASCII, and I wouldn't have been able to say nearly as much had I used unicode. ASCII doesn't make anybody stupid, and wasn't created by people who didn't know that either China or Europe existed. So, how about leaving off the biases and just solve the problem. It appears to be encoding, and that's all that needs to be said.

**Inferrd** · May 26th, 2016, 04:19 AM

As Shaggy Hiker says, it's an encoding problem.

What confuses me the most about your post is you say you save the file from WordPad using UTF-8, and that solves the problem. I've looked in the past and never been able to find an option in WordPad to specify UTF-8. Are you sure this is actually what you are doing?

Anyway...
In your call to OpenTextFile you pass TristateFalse as an argument. If you look at the documentation for the OpenTextFile method you'll find it says

The format argument can have any of the following settings:

Constant	Value	Description
TristateUseDefault	2	Opens the file using the system default.
TristateTrue	1	Opens the file as Unicode.
TristateFalse	0	Opens the file as ASCII.

and that's why I chose System.Text.ASCIIEncoding.ASCII as the encoding in my earlier answer.

However, the encoding that is actually being used seems to be the encoding for the System's default non-Unicode code page. So an equivalent to your original code would more likely be:

Code:

IO.File.AppendAllText(DirectoryandFilename, "Data", System.Text.Encoding.Default)

I hope you will forgive me for taking Microsoft documentation at face value.

Note that there is a big problem with relying on the system's default code page: different PC's can and often do use different defaults. You say that the external company can read the file that is generated by your VBA code. That would suggest that they have the same default settings that you are using, but that's more by luck in this case than by design.

Ideally, you should either find out what encoding the external company is using when it reads your file and use that exact same encoding to create the file in the first place, or specify an encoding of your choice and tell the external company they must use your chosen encoding when they read the file. The latter option would likely involve the external company rewriting their software, so may not go down too well with them.

**smktec** · May 26th, 2016, 04:32 AM

The hint with the hex editor helped.
The file requires ANSI code. This is not an option in VB. Then I found out in the doc that ANSI is used if .ecoding.default is used. Now it works.
I then tried to analyze the various versions and this is a kind of strange to me. Maybe somebody can explain it.
The problem came from the German character 'ß' and the Byte order mark:

HTML Code:

Source          encoding     BOM           'ß' (Hex)        works 
VBnet            utf-8           -               c3 9F             no
VBnet            default         -              DF                 yes
VBnet            Ascii            -              3F                  no 

Access                            -              DF                 yes
Windows Editor Ansi           -              DF                 yes
Windows Editor Ascii   EF BB BF           c3 9F             yes

To me this is strange, because ASCII gives different results in the Windows Editor and VBnet.

thanks for everybodies contribution.

**dbasnett** · May 26th, 2016, 06:08 AM

Originally Posted by Sitten Spynne

...ASCII was the first text encoding, invented by white men in a country that speaks English around the 1970s...

It was actually the 60's and was based on the work of a French telegraph engineer from the late 1800's???. When I started in this business computers didn't talk to each other by any other means than punched cards. I guess it would have been possible for someone to send a deck of cards overseas so long as humidity and handling were up to spec.

The IT business is evolutionary with a no technology left behind mentality. Want a really high paying job, learn COBOL.

For you to question the intelligence and motives of those that preceded you is shortsighted.

**Sitten Spynne** · May 26th, 2016, 08:17 AM

Originally Posted by dbasnett

It was actually the 60's and was based on the work of a French telegraph engineer from the late 1800's???. When I started in this business computers didn't talk to each other by any other means than punched cards. I guess it would have been possible for someone to send a deck of cards overseas so long as humidity and handling were up to spec.

The IT business is evolutionary with a no technology left behind mentality. Want a really high paying job, learn COBOL.

For you to question the intelligence and motives of those that preceded you is shortsighted.

It's a snarky journey through the hindsight trail. They're not gods, they're men, and they made mistakes from which we can learn.

And I already have a high-paying job without COBOL, thanks.

Here's your more accurate progression:

ASCII was made because, at the time, getting text sent around at all was a major accomplishment. Supporting Asian scripts would've been impractical.
ANSI was made because the technical limitations were lifted. It was designed to work OK if everyone involved in a transaction spoke the same language, or had protocols to specify which code pages to use.
Unicode was made because we later realized it was stupid to make the assumptions ANSI required.

So dry. So stuffy. So glossy over the Eurocentrism of ASCII/ANSI solutions. It's not like Unicode is without its faults. While real human languages were still waiting to be encoded, the consortium went on to approve the Emoji code points. So some human out there can't use a computer to type in his native language because I can say �� to globalization.

And that emoji ("PILE OF POO"), doesn't even render on this forum. At least 20 years after Unicode's adoption. Because it's using an ANSI codepage that doesn't support emoji. Or, in the PHP that drives this forum, it gets accidentally converted to ASCII/ANSI at some point, and doesn't convert back properly. It's a mistake we could've quit making in the 90s.

smktec:

"VB .NET can't do ANSI" is false. But you have to work harder to use ANSI. "ANSI" is not an encoding by itself. You have to know the code page you are using. Do you know which code page you need to use? If you do, I can demonstrate that VB, indeed, can use dead technologies to transfer text between computers. It's not enough to say "I want this one German character". There might be many code pages that render that character at different code points. This is part of why ASCII/ANSI are considered poor technologies today.

The "different text" between Windows Editors and VB .NET is because your chart is wrong. You list as "ASCII" a file that has the BOM EF BB BF. That BOM signifies UTF-8 encoding. This is why I asked for the first six bytes of the file, that reveals BOMs.

**Shaggy Hiker** · May 26th, 2016, 09:25 AM

Originally Posted by Sitten Spynne

It's a snarky journey through the hindsight trail. They're not gods, they're men, and they made mistakes from which we can learn.

Perhaps a bit TOO snarky. Every person who does what appears best at the time will end up doing something that those that come after them will think was misguided. If we all heap derision from the lofty perch of hindsight, we'll have plenty heaped on us, as well. Unless you like being a fool, a bit of consideration for those who came before is warranted.

**Sitten Spynne** · May 26th, 2016, 09:38 AM

smktec, let's play the game where I guess the code page, if this doesn't work you'll at least see what you need to do. The German character 'ß' is represented in several ANSI code pages. You're going to have to pick one.

One option is Windows-1252, which is generally the default for a US English Windows version. In this codepage, the character has a code point of 223 decimal, or 0xDF hexadecimal.

Another option is Windows-1250, which is more typically used with Central and Eastern European scripts. It turns out the German-specific characters have identical code points in 1250 and 1252, 'ß' is represented with 0xDF.

There's also page 437, which was the original IBM/MS-DOS default character set. It also encodes the character as 0xDF.

This matches with your chart: your claim is that 'default' VB .NET encoding (whatever that is), outputs 0xDF, and so does Access and an editor configured to output ANSI. It's curious you claim VB .NET using 'ASCII' outputs 0x3F. I cannot find a codepage that outputs 0x3F for 'ß', but there's hundreds so I also can't claim I've done an exhaustive search.

Anyway, I'm going to assume Windows-1252 or Windows-1250 are in play. What matters isn't what your machine is using, what matters is what machine the client is using. If that machine/company is in the US, I'm almost positive their machine is set to use Windows-1252. If they are in Europe, they might be configured to use 1250. For 'ß' in particular, it doesn't matter, but there might be other characters that are mismatched between the two. So you might have to try both.

Anyway. If you use Encoding.Ascii without any other qualifications, you get some default system character set that is determined by at least two different settings, and Windows Compatibility code, etc. etc. If you want to use another code page, you have to create an Encoding that uses that code page. You do this with the Shared method Encoding.GetEncoding(), which takes an integer representing the code page:

Code:

Dim windows1252 As Encoding = Encoding.GetEncoding(1252)

From there, you use your encoding as-is. I made a program to demonstrate a few and help show what's going wrong:

Code:

Imports System.Text

Module Module1

    Sub Main()
        Dim theCharacter As Char = "ß"c

        Dim encodings As New List(Of Encoding)()
        encodings.Add(Encoding.GetEncoding(1252))
        encodings.Add(Encoding.GetEncoding(1250))
        encodings.Add(Encoding.ASCII)
        encodings.Add(Encoding.Unicode)
        encodings.Add(Encoding.Default)
        encodings.Add(Encoding.UTF8)

        For Each e In encodings
            PrintExplanationLine(theCharacter, e)
        Next

    End Sub

    Private Sub PrintExplanationLine(ByVal theCharacter As Char, ByVal encoding As Encoding)
        Dim charBytes = encoding.GetBytes(theCharacter)
        Console.Write("{0}/{1}: ", encoding.EncodingName, encoding.CodePage)
        For Each byteValue In charBytes
            Console.Write("0x{0:X2} ", byteValue)
        Next
        Console.WriteLine()
    End Sub

End Module

The output on my machine is:

Code:

Western European (Windows)/1252: 0xDF
Central European (Windows)/1250: 0xDF
US-ASCII/20127: 0x3F
Unicode/1200: 0xDF 0x00
Western European (Windows)/1252: 0xDF
Unicode (UTF-8)/65001: 0xC3 0x9F

It's clear from this that Encoding.Default is not the same as Encoding.ASCII, and uses some bizarre code page I didn't even find. But it uses 0x3F as well!

So figure out which code page you need, create the proper encoding, then use that encoding to get the bytes. You could also configure the FileStream to use that encoding and bypass the requirement to manually create the Byte array, but let's try one step at a time.

If your client were using UTF-8, which was a standard in 1993, or UTF-16, which was a standard in 1996, it wouldn't be quite so tricky. But I've got clients like this too, and they're likely to keep using ASCII/ANSI for the next 30 years too, and I'll have to train my children to clean up the mess.

**smktec** · May 26th, 2016, 12:02 PM

thanks for this elaborated study.
Before I started this I had no idea in what mess I would step into. Obviously this company has run into such problems with some clients as well, because they have sent out an Excel tamplate and suggest their customers to fill in the data, save it as .csv to make it work.
I am glad that I at least know where to look for when I run into such problems next time (hoping that this will never happen

)

**Sitten Spynne** · May 26th, 2016, 12:46 PM

I guess the less offensive way to think about it, since my humor was not well-received, is like this:

If you have the choice, you should use UTF-8 or UTF-16 for transmitting and receiving text. UTF-8 is compatible with a small bit of ASCII, but both encodings might use more than one byte per character. In return for this 'cost', you can support characters from almost every language in the world, including dumb emoji characters you probably have no need to support. When you use UTF-8 or UTF-16, you don't have to worry about what language the computers in your network speak: they support everything equally, because they were designed for a world with global networking.

But many hardware devices, older programming environments, or wayward developers insist on using ASCII with ANSI code pages. When this is the case, it is only easy if every computer on the network is set to the same language. If not all computers are set to the same language:

You have to update all of them to use the same language, OR
You have to make your program understand which language to use with each computer, OR
You have to make "This is the code page I am using" part of your protocol, which means all receivers listen for that and adjust, OR
You have to make "This is the code page I expect" part of your protocol, which means all senders listen for that and adjust.

This is the best compromise that exists, because the designers of ANSI code pages did not adequately predict that computer networks would globalize.

If you don't use code pages at all, the only safe characters to use have ASCII values 0-127, and include the English alphabet, some punctuation, and about 30 non-printable characters like "Carriage Return" or "Bell". This is a compromise you have to make because when ASCII was adopted, it was easier to measure computer memory in dollars than kilobytes and the bulk of computer research was being done by English-speaking United States and European scientists.

So be careful when picking an encoding, and understand that there's really no such thing as "just ASCII" in the real world. When someone says "ASCII", they always mean some specific ANSI code page, and if they speak English it's probably Windows-1252. Other languages have specific code pages you have to look up, and sometimes there are multiple variants from which to choose.

Thread: Differences in Filestream between VBnet and MS access?

Thread Tools

Display

Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Re: Differences in Filestream between VBnet and MS access?

Posting Permissions