Thread: [RESOLVED] Decompress GZip

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    [RESOLVED] Decompress GZip

    Hi, I have the following response from a server. The data is GZip compressed, but I can't find a way to decompress it.

    I tried this code from pscode.com, but it doesn't work (returns -1). Does it have anything to do with "charset=UTF-8" ?

    http://www.planet-source-code.com/vb...64920&lngWId=1

    Please note that this is not a response I got by communicating with the server myself, so I can't change "Accept-Encoding" in the header, in order to get the data in a different format.


    Code:
    HTTP/1.1 200 OK
    Date: Sat, 01 May 2010 15:46:27 GMT
    Server: Apache/2.2.14 (EL)
    X-Powered-By: PHP/5.2.11
    Set-Cookie: PHPSESSID=47400e3a200b9703a339fbe7c5aeda99; expires=Sat, 08-May-2010 15:46:27 GMT; path=/; domain=.host.com
    Set-Cookie: PHPSESSID=47400e3a200b9703a339fbe7c5aeda99; expires=Sat, 08-May-2010 15:46:27 GMT; path=/; domain=.host.com
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Pragma: no-cache
    Vary: Accept-Encoding
    Content-Encoding: gzip
    Content-Length: 206
    Connection: close
    Content-Type: text/html; charset=UTF-8
    
    ‹      =ŽÁNÃ0DÿeÏQ娦Æ9S$č"îk{ÒDMëjTBUþ· ö8ûÞÎÞh 'u7*(eÌgêÈ:«k¥‚wʰ1¾pñ‰‘Ø{j*,×1âòçhÕ*Õêm[—A9»3‡	‰ºY¬
    	Ê2Í÷®eXªe¬Q©Öë8á3ñ¸výGï™|zÇw
    “cÛ'kµÛ8zÀý#ûúäí…ºç†ÆKÅsíÂæ 9_Q–ã&æ*ëáGãRø
    Attached Files
    Last edited by Chris001; May 1st, 2010 at 11:38 AM.

  2. #2
    Next Of Kin baja_yu's Avatar
    Join Date
    Aug 2002
    Location
    /dev/root
    Posts
    5,989

    Re: Decompress GZip

    Isn't GZip (http://www.gzip.org/) a command line tool? Why not get it and simply shell/shellexecute it to do the unpacking for you?

    You can also search the forums for 'ShellAndWait', which will pause operation until the outside exe has finished processing/unpacking.

  3. #3

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Decompress GZip

    Thanks.

    I just put the string in a text file and used the gzip.exe command-line tool, but I get the message that the file is encrypted. Apparently gzip does not support encryption, so I have no clue how this file can be encrypted. I'm using the latest version, 1.2.4.

    Code:
    gzip: test.txt is encrypted -- get newer version of gzip

  4. #4
    Fanatic Member FireXtol's Avatar
    Join Date
    Apr 2010
    Posts
    874

    Re: Decompress GZip

    Some code for this:

    vb Code:
    ' API declarations for the zlib library (requires zlib.dll)
    Public Declare Function compress2 Lib "zlib.dll" (dest As Any, destLen As Any, src As Any, ByVal srcLen As Long, ByVal level As Long) As Long
    Public Declare Function uncompress Lib "zlib.dll" (dest As Any, destLen As Any, src As Any, ByVal srcLen As Long) As Long

    ' Decompresses Text in place. OriginalSize is the uncompressed length.
    Public Sub decompressString(ByRef Text As String, ByVal OriginalSize As Long)
        Dim cmpSize As Long, strBuff As String

        ' Output buffer with some headroom
        strBuff = Space(OriginalSize + (OriginalSize * 0.01) + 12)
        cmpSize = Len(strBuff)
        ' On success cmpSize receives the number of bytes written.
        ' If uncompress fails (non-zero return), strBuff can come back as nothing but spaces.
        uncompress ByVal strBuff, cmpSize, ByVal Text, Len(Text)
        Text = Left$(strBuff, cmpSize)
    End Sub

    ' Compresses Text in place. CompressionLevel ranges from 1 (fastest) to 9 (smallest).
    Public Sub compressString(ByRef Text As String, ByVal CompressionLevel As Long)
        Dim orgSize As Long, cmpSize As Long, strBuff As String

        orgSize = Len(Text)
        ' Worst case output is slightly larger than the input
        strBuff = Space(orgSize + (orgSize * 0.01) + 12)
        cmpSize = Len(strBuff)
        compress2 ByVal strBuff, cmpSize, ByVal Text, Len(Text), CompressionLevel
        Text = Left$(strBuff, cmpSize)
    End Sub

    Text is a ByRef parameter, so the result is placed in the string you pass to the Subs. Alternatively, you could make them Functions.

    This code requires zlib.dll.

    A good default compression level is 6. Higher means better compression (and slower, up to 9); lower means worse compression (and faster). The level is only needed when encoding a stream.

    In order to decompress the stream you need to know its original length, so be sure to include that.
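
    For comparison, here is a minimal sketch of the same zlib round trip, written in Python rather than VB6 only so it runs without zlib.dll. It also illustrates a point that matters for this thread: a body sent with "Content-Encoding: gzip" is a gzip member (RFC 1952), not a zlib stream (RFC 1950), so a plain zlib decoder rejects it unless told to expect the gzip wrapper. The sample text and the helper names inflate_gzip and plain_zlib_accepts are made up for illustration.

```python
import gzip
import zlib

text = b'{"result": "example payload"}' * 4   # made-up sample data

# zlib format (RFC 1950): what compress2/uncompress in zlib.dll read and write.
packed = zlib.compress(text, 6)               # level 6, the default suggested above
restored = zlib.decompress(packed)            # Python sizes the output buffer itself

# gzip format (RFC 1952): what "Content-Encoding: gzip" delivers.
gz_body = gzip.compress(text)


def inflate_gzip(body: bytes) -> bytes:
    """Decompress a gzip member; wbits = 16 + MAX_WBITS selects the gzip wrapper."""
    return zlib.decompress(body, 16 + zlib.MAX_WBITS)


def plain_zlib_accepts(body: bytes) -> bool:
    """Return True if the default (zlib-format) decoder can read the data."""
    try:
        zlib.decompress(body)
        return True
    except zlib.error:
        return False
```

    Here plain_zlib_accepts(gz_body) is False while inflate_gzip(gz_body) returns the original text, which suggests that the uncompress call above cannot be fed a gzip body as-is.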
    Last edited by FireXtol; May 1st, 2010 at 08:26 PM.

  5. #5

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Decompress GZip

    Thank you FireXtol, but I'm not looking for a way to compress and decompress data myself.

    The HTTP response is something I got from my browser (not by communicating with the server myself) and my browser can decompress and read the data just fine, because it replies to the server again after receiving it. I'd like to know how my browser decompresses the data.

  6. #6
    Fanatic Member FireXtol's Avatar
    Join Date
    Apr 2010
    Posts
    874

    Re: Decompress GZip

    How? It's likely going to convert UTF-8 to a binary or UTF-16 equivalent (something like this would be good for VB: http://msdn.microsoft.com/en-us/libr...(v=VS.85).aspx), and then inflate it, as described here: http://www.gzip.org/zlib/rfc-gzip.html#file-format

    More or less.
    Last edited by FireXtol; May 2nd, 2010 at 08:01 AM.

  7. #7

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Decompress GZip

    Thank you, FireXtol.

    I don't think I'll be able to figure this out. The data starts with 31 (1F) and 139 (8B), followed by 8 (08), which means 'Deflate' compression, as described in the RFC. But that's as far as I get.

    If gzip.exe can't decompress it and shows the message "gzip: test.txt is encrypted -- get newer version of gzip", then I have no idea how I could do it myself. Perhaps I'm using the wrong command-line parameters?

    Code:
    gzip -dc test.txt > new.txt
    pause

  8. #8
    Next Of Kin baja_yu's Avatar
    Join Date
    Aug 2002
    Location
    /dev/root
    Posts
    5,989

    Re: Decompress GZip

    Are you sure the compression is GZip? Maybe it's some other method.

  9. #9

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Decompress GZip

    Yes, it's GZip. The HTTP header says "Content-Encoding: gzip".

    On this page posted by FireXtol, it says the following:

    Member header and trailer

    ID1 (IDentification 1)
    ID2 (IDentification 2)
    These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 (0x8b, \213), to identify the file as being in gzip format.

    The data starts with 31 (1F) and 139 (8B), so it's GZip compression.

    If I remove the first character from the data and run the GZip commandline tool, then it returns:

    Code:
    gzip: test.txt: not in gzip format

  10. #10
    Hyperactive Member mbutler755's Avatar
    Join Date
    May 2008
    Location
    Peoria, AZ
    Posts
    417

    Re: Decompress GZip

    I always use SharpZipLib for compression/decompression. It works great and can handle many formats. Take a look at it here: http://www.icsharpcode.net/OpenSourc.../Download.aspx
    Regards,

    Matt Butler, MBA, BSIT/SE, MCBP
    Owner, Intense IT, LLC

  11. #11

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Decompress GZip

    Thank you. Unfortunately, I get an "Automation Error - The system cannot find the file specified" message when I click on one of the buttons in the example project.

    I looked at the article below with the VB6 sample code, and near the bottom, in the "Running the Sample" chapter, it says:

    1. Download and install the .NET Framework 2.0 Software Development Kit (if you have Visual Studio .NET 2005 or Visual Basic Express installed, you can skip this step).
    But the .NET Framework 2.0 Software Development Kit is 354MB. That's a little too much for me just to test it out. I'm also not sure (if it works) how to install all this. My app is supposed to run on other machines as well.

    http://msdn.microsoft.com/en-us/libr...06(vs.71).aspx

  12. #12
    Fanatic Member FireXtol's Avatar
    Join Date
    Apr 2010
    Posts
    874

    Re: Decompress GZip

    Yes, indeed, the stream is flagged as encrypted (flag: 0x20, aka bit 5), and apparently the RFC doesn't have any information about this. Nor did any source code I found (except to notify the user that it's not supported).

    It also appears you don't need to convert from UTF-8 to UTF-16. But the encoding seems funky.

    Winrar produces something half-ass usable, perhaps:
    Code:
    {"h"7reb   i0e0033Td:"47400e3a200b9703a339fbe7c5aeda99",i0erviceVer033Td:"201001261",iprefetchEnabled":true},"result   iuSecsd:"343000000",iFileTokeTd:"2gbfRf",i0treamKeyd:"d7a4fd4427634bac9ee7",i0treamServerID":8,"ipd:"0tream27b.grooveshark.com"}}
    But it kind of looks corrupted (Winrar reports "CRC failed, file corrupt")... or something.

    1f 8b is the magic header, 08 is the compression method, and 20 is the flag byte (indicating encrypted). The next four 20s would be the datetime, then another 20 for XFL (should be 2 or 4, but it's &H20, i.e. 32), and then 03 for the OS (Unix). The compressed data stream should start right after that. The last 8 bytes consist of two longs, one for the CRC32 and one for the original size, which is 248 but is stored as F8 20 20 20 = 538,976,504.
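
    The byte layout described above can be sketched as follows, in Python for brevity. This is only an illustration under assumptions: it handles the fixed 10-byte header and 8-byte trailer from RFC 1952 and ignores optional header fields, and parse_gzip_frame and the 248-byte sample payload are made up. The last two lines reproduce the size arithmetic, reading the same four trailer bytes once with NULs intact and once with the NULs turned into spaces.

```python
import gzip
import struct
import zlib


def parse_gzip_frame(data: bytes) -> dict:
    """Split a gzip member into its fixed header fields and trailer (RFC 1952)."""
    if len(data) < 18:
        raise ValueError("too short for a gzip header plus trailer")
    # Header: ID1, ID2, CM, FLG, MTIME (4 bytes, little-endian), XFL, OS.
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not in gzip format")
    # Trailer: CRC32 and ISIZE (original length mod 2**32), both little-endian.
    crc32, isize = struct.unpack("<II", data[-8:])
    return {"cm": cm, "flg": flg, "mtime": mtime, "xfl": xfl,
            "os": os_code, "crc32": crc32, "isize": isize}


sample = b"x" * 248                       # made-up payload of the same length
frame = parse_gzip_frame(gzip.compress(sample, mtime=0))
crc_matches = frame["crc32"] == zlib.crc32(sample)

# The same four length bytes, intact and with NULs turned into spaces.
intact_size = struct.unpack("<I", bytes([0xF8, 0x00, 0x00, 0x00]))[0]
damaged_size = struct.unpack("<I", bytes([0xF8, 0x20, 0x20, 0x20]))[0]
```

    intact_size comes out as 248 and damaged_size as 538976504, matching the figures quoted above.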

    Good luck.
    Last edited by FireXtol; May 2nd, 2010 at 12:42 PM.

  13. #13
    Fanatic Member FireXtol's Avatar
    Join Date
    Apr 2010
    Posts
    874

    Re: Decompress GZip

    If you replace the "2027" hex at position &H37 (according to my hex editor) with "0027", that fixes the error caused by null-to-space conversion, and the stream then decodes just fine with Winrar (no errors!). I also replaced all the other 20s with 00s, except for the encryption flag and the 20s within the compressed stream (so Winrar reads the original size correctly). Without the encryption flag set, it does exactly what zlib does: it outputs a bunch of spaces.

    No clue how to automate this... as it's inconsistent with the data provided.

    Code:
    {"header":{"session":"47400e3a200b9703a339fbe7c5aeda99","serviceVersion":"201001261","prefetchEnabled":true},"result":{"uSecs":"343000000","FileToken":"2gbfRf","streamKey":"d7a4fd4427634bac9ee7","streamServerID":8,"ip":"stream27b.grooveshark.com"}}
    Last edited by FireXtol; May 2nd, 2010 at 12:52 PM.

  14. #14

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Decompress GZip

    That's exactly what I need! Thank you!

    Fixing the error is not really difficult. The "2027" hex sequence is not the same in every response, but every response contains a "20" byte somewhere, just not at the same position. All that needs to be done is to search from byte 15 onward and replace the first occurrence of "20" with "00".

    Now the only question is, what does Winrar do to display the data correctly?
    Last edited by Chris001; May 2nd, 2010 at 01:28 PM.

  15. #15
    Fanatic Member FireXtol's Avatar
    Join Date
    Apr 2010
    Posts
    874

    Re: Decompress GZip

    Never mind, it's not encrypted; the nulls just must not be converted to spaces. There are still two genuine spaces in the stream; only the middle one needed correcting.

    Here's a corrected version of the stream which works no problem in gzip.
    Attached Files

  16. #16

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Decompress GZip

    It works great with the GZip commandline tool, but now I need to find some code to decompress it internally.

    I tried your code above in post #4 and it returns 262 spaces.

    Code:
    Private Sub Command1_Click()
        Dim strData As String
        
        strData = RichTextBox1.Text
        decompressString strData, 248
        RichTextBox1.Text = strData
    End Sub
    The code in this post doesn't work for me either. It returns the exact same data.

  17. #17
    Fanatic Member FireXtol's Avatar
    Join Date
    Apr 2010
    Posts
    874

    Re: Decompress GZip

    A zlib stream has 2 extra bytes before the deflate data (compression info and flags) and 4 after (Adler-32).

    http://www.ietf.org/rfc/rfc1950.txt

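    Or basically: zlibData = Chr$(&H78) & Chr$(&H9C) & strData & Adler32(plainText)

    Where strData is the compressed stream taken out of the gzip stream, and plainText is the uncompressed data (RFC 1950 computes the Adler-32 over the uncompressed bytes).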

    I don't see customizing the flags having any real effect, so using those two characters should probably work for most any stream.

    The Adler32 was simple:

    vb Code:
    ' Needed by B10To256 below
    Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (Destination As Any, Source As Any, ByVal Length As Long)

    ' Converts a number to a string of raw bytes, most significant byte first
    ' (byte order is reversed when DoFlip is True).
    Private Function B10To256(ByVal initVal As Double, ByVal Character_Return As Long, Optional ByVal DoFlip As Boolean) As String
        Dim lng As Long
        Dim bA(3) As Byte

        ' Map unsigned 32-bit values above 2^31 - 1 onto a signed Long
        If initVal > 2147483647 Then lng = initVal - 4294967296# Else lng = initVal
        CopyMemory bA(0), lng, 4
        If DoFlip Then
            B10To256 = StrReverse(Right$(StrReverse(StrConv(bA, vbUnicode)), Character_Return))
        Else
            B10To256 = Right$(StrReverse(StrConv(bA, vbUnicode)), Character_Return)
        End If
    End Function

    ' Adler-32 checksum (RFC 1950), returned as 4 bytes, most significant first
    Private Function Adler32(ByVal strIn As String) As String
        Dim bString() As Byte
        Dim A As Long, B As Long, X As Long

        bString = StrConv(strIn, vbFromUnicode)
        A = 1
        For X = 0 To UBound(bString)
            A = (A + bString(X)) Mod 65521
            B = (B + A) Mod 65521
        Next X
        ' The checksum is B * 65536 + A; use a Double because it can exceed a Long
        Adler32 = B10To256(CDbl(B) * 65536# + A, 4)
    End Function
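
    As a cross-check, here is a short Python sketch of the same idea. It is only a sketch under assumptions: the sample text is made up, and the fixed 10-byte/8-byte offsets hold only when the gzip FLG byte is 0 (no optional header fields). It shows two routes: inflating the raw deflate data directly, and rebuilding a zlib stream as 78 9C + deflate data + Adler-32. Note that RFC 1950 defines the trailing Adler-32 over the uncompressed data.

```python
import gzip
import struct
import zlib


def adler32(data: bytes) -> int:
    """Adler-32 as defined in RFC 1950: A and B sums modulo 65521."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % 65521
        b = (b + a) % 65521
    return b * 65536 + a


plain = b'{"header": {"session": "example"}}'   # made-up sample text
gz = gzip.compress(plain, mtime=0)

# With FLG = 0 the header is exactly 10 bytes and the trailer 8 bytes.
assert gz[3] == 0, "optional header fields present; fixed offsets do not apply"
raw_deflate = gz[10:-8]

# Route 1: inflate the raw deflate data directly (negative wbits = no wrapper).
direct = zlib.decompress(raw_deflate, -zlib.MAX_WBITS)

# Route 2: rebuild a zlib stream: 78 9C + deflate data + big-endian Adler-32
# of the *uncompressed* data.
wrapped = b"\x78\x9c" + raw_deflate + struct.pack(">I", adler32(plain))
via_wrapper = zlib.decompress(wrapped)
```

    Route 2 needs the plain text up front to compute the checksum, so route 1 is probably the more practical one. In zlib's C API the equivalent is inflateInit2 with a negative windowBits value, assuming the zlib.dll in use exports it.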
    Last edited by FireXtol; May 2nd, 2010 at 03:49 PM.

  18. #18

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Decompress GZip

    From your "duuh.gz" file I removed the first 10 bytes and the last 8 bytes from the data to get the stream.

    What I get back is the beginning of the corrupted data from Winrar that you showed in post #12.
    It also returns -3 (Z_DATA_ERROR).

    Code:
    {"h"7reb

    <project removed>
    Last edited by Chris001; May 3rd, 2010 at 08:34 AM.

  19. #19
    Fanatic Member FireXtol's Avatar
    Join Date
    Apr 2010
    Posts
    874

    Re: Decompress GZip

    Yeah, I wouldn't rely on duuh.gz. But what you said should work, if your code matches your statement.

    The problem seems to be your "minus 10 bytes..." file. It still has the erroneous "2027" hex. There's only one occurrence of this, and it needs to be "0027" in hex. Then the adler32() matches the Adler-32 produced by compressString strData, 6. I used Winrar's valid output to re-encode the stream (level 6 compression) to make sure the Adler-32 matched.

    But the 20F6 and 2039 hex bytes need to stay as spaces. Only the space in the 'middle' of the stream ("2027" -> "0027") needs to be a null. And, again, I'm not sure how this should be handled, as to me it appears inconsistent. The bytes around the space don't seem to give any reason for that particular space to be converted to a null. Perhaps a raw Winsock connection would get the data properly.

    You'll also need to change:
    MsgBox decompressString(strData, 248)

    To use the last 4 bytes of the gzip stream(which should have the &H20's also converted to nulls), instead of 248.
    Last edited by FireXtol; May 2nd, 2010 at 11:14 PM.

  20. #20
    VB6, XHTML & CSS hobbyist Merri's Avatar
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Decompress GZip

    The problem here is that response.txt was at some point handled with a regular text editor and/or copied in text mode to the clipboard. This resulted in invalid data in the binary data. There is no point in correcting this manually; it would be better to get a new raw, unmodified response. Going forward it makes no sense to try guessing how to change &H20 -> &H00, now that we have a corrected sample to work with.
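
    A small Python sketch of what this kind of damage does, under the assumption (taken from the posts above) that the text-mode copy turned every NUL byte into a space. The payload and the helper name survives are made up for illustration.

```python
import gzip
import zlib


def survives(blob: bytes) -> bool:
    """Return True if the blob still decompresses as a valid gzip member."""
    try:
        gzip.decompress(blob)
        return True
    except (OSError, EOFError, zlib.error):
        return False


original = gzip.compress(b'{"ip": "stream.example.com"}', mtime=0)

# What a text-mode copy effectively did to the captured response:
# every NUL byte became a space.
damaged = original.replace(b"\x00", b"\x20")

# The FLG byte (offset 3) was 0x00 and now reads 0x20, the value that
# gzip.exe reported as "encrypted" earlier in this thread.
flag_before, flag_after = original[3], damaged[3]

# Going back is guesswork: turning every space into NUL would also hit
# any bytes that were genuinely 0x20 in the compressed data.
genuine_spaces = original.count(b"\x20")
```

    The damaged copy fails to decompress, and nothing in the bytes themselves says which 0x20 values used to be NULs, which is why a fresh binary capture is the safer route.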


    Anyway, here is a more complete project that uses GZIP.DLL to do the decompression. I have corrected the response.txt. Also note that I'm filling the string as a direct byte copy via a byte array file read; if you get the HTTP response as a string in VB, it may be better to change that to a byte array to reduce unnecessary data conversion (ANSI <-> Unicode).

    WARNING! This attachment contains GZIP.DLL – this means you download and execute the code of that library at your own risk. My own antivirus didn't say anything about it, but it is a random download off the web. I'm only providing it here as a convenience.
    Attached Files
    Last edited by Merri; May 2nd, 2010 at 11:45 PM.

  21. #21
    Fanatic Member FireXtol's Avatar
    Join Date
    Apr 2010
    Posts
    874

    Re: Decompress GZip

    Jotti's scanners all report "Nothing found" for the DLL in Merri's zip.

    Beautiful code, Merri.

    I knew something was funky about that data. But I really ought to use GZIP for a project I'm working on, and had fun figuring it out.

    Good luck, Chris.

  22. #22

    Thread Starter
    Frenzied Member
    Join Date
    Nov 2005
    Posts
    1,834

    Re: Decompress GZip

    Perfect, thank you very much Merri

    VirusTotal scanners report it clean too.


    FireXtol, sorry I wasted your time with the invalid data in the response.txt file. I wasn't aware that copying the data to a text file would corrupt it

  23. #23
    Frenzied Member
    Join Date
    Dec 2007
    Posts
    1,072

    Re: [RESOLVED] Decompress GZip

    Merri I'm using your code... After I get the HTML contents (including headers) from winsock, I split the headers and the page body into an array:
    Code:
    BuffSplit = Split(Buffer, vbNewLine & vbNewLine, 2)
    Then I call this:
    Code:
    Debug.Print "BeforeLen: " & Len(BuffSplit(1))
    BuffSplit(1) = GZip(BuffSplit(1))
    Debug.Print "AfterLen: " & Len(BuffSplit(1))
    Here's what I get:
    BeforeLen: 8888
    AfterLen: 0
    I directly copied & pasted the GZip property/function. Please help!

  24. #24
    VB6, XHTML & CSS hobbyist Merri's Avatar
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: [RESOLVED] Decompress GZip

    It expects binary data. You're apparently providing a regular VB6 string that holds binary data as characters (most often 1 byte becomes one character, i.e. 2 bytes, but this varies depending on locale).

    BuffSplit(1) = GZip(StrConv(BuffSplit(1), vbFromUnicode))
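
    The same principle, sketched in Python: keep the response as bytes from the socket all the way to the decompressor, and split headers from body on the blank line. This is a simplified illustration, not a full HTTP parser; split_response, decoded_body and the sample response are made up, and repeated headers such as Set-Cookie simply overwrite each other in the dictionary.

```python
import gzip


def split_response(raw: bytes):
    """Split a raw HTTP response into (header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:                      # lines[0] is the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body


def decoded_body(raw: bytes) -> bytes:
    """Return the body, gunzipped when Content-Encoding says gzip."""
    headers, body = split_response(raw)
    if headers.get("content-encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body


# Made-up response assembled purely as bytes; the body never becomes a string.
payload = gzip.compress(b"<html>hello</html>")
response = b"".join([
    b"HTTP/1.1 200 OK\r\n",
    b"Content-Encoding: gzip\r\n",
    b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n",
    b"\r\n",
    payload,
])
```

    Only the header block is decoded to text here; the body stays binary, which is the same effect StrConv(..., vbFromUnicode) restores in the VB6 call above.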

  25. #25
    Frenzied Member
    Join Date
    Dec 2007
    Posts
    1,072

    Re: [RESOLVED] Decompress GZip

    Works like a freaking charm. Merri you are so good at what you do!
