Please note that this is not a response I got by communicating with the server myself, so I can't change "Accept-Encoding" in the header, in order to get the data in a different format.
I just put the string in a text file and used the gzip.exe command-line tool, but I get the message that the file is encrypted. Apparently gzip does not support encryption, so I have no clue how this file can be encrypted. I'm using the latest version, 1.2.4.
Code:
gzip: test.txt is encrypted -- get newer version of gzip
Text is a byref parameter, so the result will be placed in the string you pass to the Subs, alternatively you could make them functions.
This code requires zlib.dll.
A good default compression level is 6. Higher values give better compression but are slower (up to 9); lower values give worse compression but are faster. The level is only needed for encoding a stream.
In order to decompress the stream you need to know its original length, so be sure to include that.
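For readers without the VB6 wrapper at hand, here is a rough Python sketch of the same idea. It is not the code from this post: the helper names are made up for illustration. Python's zlib binding sizes its own output buffer, so the original length is only kept here as a sanity check, whereas the C-level uncompress() call in zlib.dll needs a destination buffer size up front, which is why the length has to travel with the data.

```python
import zlib

def compress_with_length(data, level=6):
    """Compress with zlib and return (compressed bytes, original length)."""
    return zlib.compress(data, level), len(data)

def decompress_with_length(blob, original_length):
    """Decompress and verify the result against the recorded length."""
    out = zlib.decompress(blob)
    if len(out) != original_length:
        raise ValueError("decompressed length does not match recorded length")
    return out

payload = b"hello gzip " * 20          # 220 bytes of sample data
blob, n = compress_with_length(payload)  # level 6 is the default used here
restored = decompress_with_length(blob, n)
```

Note that this produces a zlib stream, not a gzip stream; the two use different headers and trailers around the same Deflate data.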
Last edited by FireXtol; May 1st, 2010 at 08:26 PM.
Thank you FireXtol, but I'm not looking for a way to compress and decompress data myself.
The HTTP response is something I got from my browser (not by communicating with the server myself) and my browser can decompress and read the data just fine, because it replies to the server again after receiving it. I'd like to know how my browser decompresses the data.
I don't think I'll be able to figure this out. The data starts with 31 (1F) and 139 (8B), followed by 8 (08), which means 'Deflate' compression, as described in the RFC. But that's as far as I get.
If gzip.exe can't decompress it and shows the message "gzip: test.txt is encrypted -- get newer version of gzip", then I have no idea how I'm able to do it. Perhaps I'm using the wrong commandline parameters?
Yes, it's GZip. The HTTP header says "Content-Encoding: gzip".
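What a browser does with such a response can be sketched roughly as follows, in Python rather than VB6. This is only an illustration under assumptions: decode_http_body is a hypothetical helper, the response must be available as raw, unmodified bytes, and chunked transfer encoding is not handled.

```python
import gzip

def decode_http_body(raw_response):
    """Split a raw HTTP response and gunzip the body if the header says so."""
    head, sep, body = raw_response.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")
    headers = {}
    for line in head.split(b"\r\n")[1:]:      # skip the status line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip().lower()
    if headers.get(b"content-encoding") == b"gzip":
        return gzip.decompress(body)
    return body

# Build a small sample response to exercise the helper.
sample_body = gzip.compress(b"<html>hello</html>")
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Encoding: gzip\r\n"
          b"Content-Length: " + str(len(sample_body)).encode() + b"\r\n"
          b"\r\n" + sample_body)
decoded = decode_http_body(sample)
```

The body has to stay binary from the socket to the decompressor; any text-mode step in between can alter bytes.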
On this page posted by FireXtol, it says the following.
Member header and trailer
ID1 (IDentification 1)
ID2 (IDentification 2)
These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 (0x8b, \213), to identify the file as being in gzip format.
The data starts with 31 (1F) and 139 (8B), so it's GZip compression.
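The check described above is easy to express in code. A minimal Python sketch (looks_like_gzip is a name invented for this example):

```python
GZIP_MAGIC = b"\x1f\x8b"   # ID1 = 31, ID2 = 139
CM_DEFLATE = 8             # compression method byte for Deflate

def looks_like_gzip(data):
    """Return True if data starts with the gzip magic and the Deflate method."""
    return len(data) >= 3 and data[:2] == GZIP_MAGIC and data[2] == CM_DEFLATE
```

This only identifies the format; it says nothing about whether the rest of the stream is intact.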
If I remove the first character from the data and run the GZip commandline tool, then it returns:
Thank you. Unfortunately, I get an "Automation Error - The system cannot find the file specified" message when I click on one of the buttons in the example project.
I looked at the article below with the VB6 sample code and near the bottom at the "Running the Sample" chapter it says:
1. Download and install the .NET Framework 2.0 Software Development Kit (if you have Visual Studio .NET 2005 or Visual Basic Express installed, you can skip this step).
But the .NET Framework 2.0 Software Development Kit is 354MB. That's a little too much for me just to test it out. I'm also not sure (if it works) how to install all this. My app is supposed to run on other machines as well.
Yes, indeed, the stream is encrypted (flag: 0x20, aka bit 5), and apparently the RFC doesn't have any information about this. Nor did any source code I found (except to notify the user that it's not supported).
It also appears you don't need to convert from UTF-8->UTF-16. But the encoding seems funky.
But it kind of looks corrupted (WinRAR reports "CRC failed, file corrupt")... or something.
1f8b is the magic header, 08 the compression method, 20 indicates encrypted, then the next four 20's would be the datetime, another 20 is the XFL (should be a 2 or 4, but it's &H20 or 32), and then 03 is the OS (Unix). Apparently the compressed data stream should start now. The last 8 bytes consist of two longs, one for the crc32 and one for the original size, which is 248 but is stored as F8202020 = 538,976,504.
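The fixed 10-byte header layout described above (ID1, ID2, CM, FLG, 4-byte MTIME, XFL, OS) can be unpacked like this. It is a Python sketch with an invented helper name; the two sample headers are reconstructed from the byte values quoted in this post, not copied from the actual response.

```python
import struct

def parse_gzip_header(data):
    """Unpack the fixed 10-byte gzip member header (little-endian fields)."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    return {
        "magic_ok": (id1, id2) == (0x1F, 0x8B),
        "cm": cm,
        "flg": flg,
        # Bit 5 is reserved in RFC 1952; the gzip tool reports it as "encrypted".
        "encrypted_bit": bool(flg & 0x20),
        "mtime": mtime,
        "xfl": xfl,
        "os": os_code,
    }

# Header as described in the post: every NUL has become a space (&H20).
damaged = parse_gzip_header(bytes.fromhex("1f8b08 20 20202020 20 03"))
# The same header with the &H20 bytes turned back into NULs.
clean = parse_gzip_header(bytes.fromhex("1f8b08 00 00000000 00 03"))
```

Seeing FLG, MTIME and XFL all equal to &H20 at once is a strong hint that the bytes were altered in transit rather than that the stream is really encrypted.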
Good luck.
Last edited by FireXtol; May 2nd, 2010 at 12:42 PM.
If you replace the "2027" hex at position &h37 (according to my hex editor) with "0027", fixing the error caused by null-to-space conversion, then the stream decodes just great with WinRAR (no errors!). I also replaced all 20's with 00's, besides the encryption flag and the other 20's within the compressed stream (so WinRAR reads the original size correctly). Without the encryption flag set, it does exactly what zlib does: it outputs a bunch of spaces.
No clue how to automate this... as it's inconsistent with the data provided.
Fixing the error is not really difficult. The "2027" hex sequence is not the same in every response, but every response contains a "20" hex byte, just not at the same position. The only thing that needs to be done is to search from byte 15 onward and replace the first occurrence of "20" with "00".
Now the only question is, what does Winrar do to display the data correctly?
Last edited by Chris001; May 2nd, 2010 at 01:28 PM.
Never mind, it's not encrypted; it just needs to not have the nulls converted to spaces. There are still two spaces in the stream; the middle one needed correcting.
Here's a corrected version of the stream which works no problem in gzip.
Yeah, I wouldn't rely on the duuh.gz. But what you said should work, if your code matches your statement.
The problem seems to be your "minus 10 bytes..." file. It still has the erroneous "2027" hex. There's only one occurrence of this, and it needs to be "0027" in hex. Then the adler32() matches the compressstring strData, 6 adler32. I used WinRAR's valid output to re-encode the stream (level 6 compression), to ensure the adler32 matched.
But the 20F6 and 2039 hex bytes need to be spaces. Just the space in the 'middle' of the stream("2027"->"0027") needs to be a null. And, again, I'm not sure how this should be handled, as to me it appears inconsistent. The bytes around the space don't seem to signify any reason for that particular space to be converted to a null. Perhaps a raw Winsock connection would get the data properly.
You'll also need to change:
MsgBox decompressString(strData, 248)
To use the last 4 bytes of the gzip stream (which should have the &H20's also converted to nulls), instead of 248.
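Reading that size field is a one-liner once the stream is held as bytes. A Python sketch, with gzip_original_size as a made-up helper name; the field is a little-endian 32-bit value holding the uncompressed length modulo 2^32:

```python
import gzip
import struct

def gzip_original_size(stream):
    """Return ISIZE, the little-endian uint32 in the last 4 bytes of a gzip member."""
    if len(stream) < 18:                 # 10-byte header plus 8-byte trailer
        raise ValueError("too short to be a gzip member")
    (isize,) = struct.unpack("<I", stream[-4:])
    return isize

size_ok = gzip_original_size(gzip.compress(b"x" * 248))
# Trailer with NULs turned into spaces, as in the damaged sample (F8 20 20 20).
size_damaged = gzip_original_size(b"\x00" * 14 + bytes.fromhex("f8202020"))
```

With intact data the field reads 248; with the spaces left in place it reads 538,976,504, which matches the value quoted earlier in the thread.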
Last edited by FireXtol; May 2nd, 2010 at 11:14 PM.
The problem here is that response.txt was at some point handled with a regular text editor and/or copied to the clipboard in text mode. This left invalid bytes in the binary data. There is no point in correcting this manually; it is better to get new, raw, unmodified response data. In the future it makes no sense to try guessing how to change &H20 -> &H00, now that we have a corrected sample to work with.
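The effect of that kind of text-mode damage can be reproduced in a few lines. This is a Python sketch under the assumption that the damage is exactly "every NUL byte becomes a space"; the sample data and the try_decompress helper are invented for the demonstration.

```python
import gzip
import zlib

original = b"example response body " * 10       # 220 bytes
good = gzip.compress(original)

# Simulate a text editor or clipboard turning every NUL into a space.
damaged = good.replace(b"\x00", b"\x20")

def try_decompress(blob):
    """Return the decompressed bytes, or None if the stream is rejected."""
    try:
        return gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        return None

result_good = try_decompress(good)
result_damaged = try_decompress(damaged)
```

The damaged copy is rejected because at least the size field in the trailer no longer matches. Since genuine &H20 bytes and converted NULs look identical afterwards, the replacement cannot be undone reliably, which is why fetching the raw bytes again is the better fix.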
Anyway, here is a more complete project that uses GZIP.DLL to do the decompression. I have corrected the response.txt. Also note that I'm filling the string as a direct byte copy by reading the file into a byte array. If you get the HTTP response as a string in VB, it may be better to change that to a byte array to avoid unnecessary data conversion (ANSI <-> Unicode).
WARNING! This attachment contains GZIP.DLL – this means you download and execute the code of that library at your own risk. My own antivirus didn't say anything about it, but it is a random download off the web. I'm only providing it here as a convenience.
FireXtol, sorry I wasted your time with the invalid data in the response.txt file. I wasn't aware that copying the data to a text file would corrupt it.
It expects binary data. You're apparently providing a regular VB6 string that holds binary data as characters (most often 1 byte becomes one character, i.e. 2 bytes, but this varies depending on locale).