Thread: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

  1. #41

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Dry Bone View Post
    Regarding the original task which is

    This may come as a stunning surprise, but the VB DIY function is about 6.5 times faster than atof in a compiled (and optimized) exe!
    Code:
    Function ArrToDouble(a() As Byte) As Double
    Dim n As Double
    Dim dec As Long
    Dim mul As Long
    Dim i As Long
    Dim s As String
    i = LBound(a)
    Do While a(i)
        Select Case a(i)
        Case 46: If mul Then Exit Do Else mul = 1
        Case Is < 48: Exit Do
        Case Is > 57: Exit Do
        Case Else: n = n * 10 + a(i) - 48: If mul Then mul = mul * 10
        End Select
        i = i + 1
    Loop
    If mul Then ArrToDouble = n / mul Else ArrToDouble = n
    End Function
    Also, this function exits when a non-numeric character is encountered, so no need to replace commas with nulls.
    When I get into my office, I'll benchmark it. That's interesting. I was thinking I might implement some "atoi" and "atol" today, but maybe that's going the wrong direction.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  2. #42
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Quote Originally Posted by Elroy View Post
    UDT padding is a bit like wizardry. But I'll try and outline it.

    The basic rule is: Any particular "item" in it will have a size. So, any/all items must start on a memory address that's a multiple of their size. Also, any full UDT (in VB6) must start on a 4-byte boundary, and must also take a multiple of 4-bytes. So, a full UDT that's 23 bytes will always actually take 24 bytes.

    And that UDT above, if that padding wasn't at the beginning, it would insert (pad) two bytes after "Minute" so that the 4 byte Single for Seconds could start on a multiple-of-four memory address. I thought it was better to pad at the beginning, rather than have that somewhat arbitrary padding in the middle. And also, the whole thing needed to be a multiple-of-four bytes long, so it was getting padded regardless, and I just took control of it.
    So, if I am getting it right, getting the UDT down to 20 bytes would work and could remove the padding, or if it went down to 18 bytes it would still need the 2 byte padding. Best solution there, if I discard the exchange data, is to keep 2 bytes of padding beforehand...or, if I kept it, no padding...but no point keeping it if I won't use it, and it doesn't compress well nor is there any real pattern (or use) to it.
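
    A minimal sketch of the padding rule described above, for anyone who wants to check it on their own machine. The two Type layouts and the ShowUdtSizes name are hypothetical (they are not the UDT used in this thread); the only thing relied on is that Len reports the size of the declared members while LenB reports the in-memory size including padding.

    ```vb
    Option Explicit

    ' Hypothetical record layouts for illustration only; not the UDT from this thread.
    Private Type TickImplicit
        Hour As Byte
        Minute As Byte
        Seconds As Single   ' 4-byte item, so it should start on a multiple-of-4 offset
    End Type

    Private Type TickExplicit
        Pad As Integer      ' 2 bytes of padding placed deliberately at the front
        Hour As Byte
        Minute As Byte
        Seconds As Single
    End Type

    Public Sub ShowUdtSizes()
        Dim a As TickImplicit
        Dim b As TickExplicit

        ' Len counts only the declared members (the size as written to a file);
        ' LenB reports the in-memory size, including any padding between members.
        Debug.Print "Implicit: Len ="; Len(a); " LenB ="; LenB(a)
        Debug.Print "Explicit: Len ="; Len(b); " LenB ="; LenB(b)
    End Sub
    ```

    If the rule holds as described, the two LenB values should match while the Len values differ by the two bytes of explicit padding. That makes it a quick way to confirm where padding ends up before relying on CopyMemory to move the UDT around.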

    Quote Originally Posted by Elroy View Post
    Also, I need to re-read The Trick's post to see if I can eke a bit more speed out of this thing, possibly taking control of the SafeArray locking.
    His posts go above my head most of the time (if not all of the time)...I read them (or try to) but I just don't know enough about advanced VB6 coding to understand most of it :-)

    Quote Originally Posted by Elroy View Post
    Also, I was thinking about this thing as I was waking up this morning. If I continue to be motivated (after reading some morning news), I might put some "file read windowing" in so it can handle any sized file.
    I was also thinking about it once I woke up to the code...I don't think there is any real need for file read windowing, given that I already have that covered in my code (which I would be happy to share).

    Part 1 (the date pointer indexer) iterates through the 2.5GB of data in a matter of seconds, checking as it goes whether the data has reached a new date. When it finds a new date, it stores the offset in the file where that block of data begins. The resulting list of pointers stores the start and end of each block (though obviously we have the start of the next one so don't really need the end...it just looked nicer), the block size (again not really needed as you can glean it from the start points) and the date for that block. I could probably adapt this to instead create "working blocks" for the loading code to use, giving definitive pointer data for a complete block of X number of lines (I think 1m lines would be doable, to be honest, but 100k at a time would also work). I'm aware of the HugeTextFile class, which does exactly this (it allows you to iterate through the file for specific lines, and can skip X lines), but it didn't feel fast enough for me, whereas my implementation takes a few seconds to do what I previously needed and could probably be adapted to work with either a number of lines or block sizes.

    Part 2 is integral to the actual CSV processing code, though it doesn't have to be...this part performs the loading of further data and keeps track of where in the file I am so I know when to stop.

    With your code (above) adapted to work slightly differently with passed data, and with a slightly different block loading mechanism put in place, I could probably fairly easily block load the data Xmb at a time then pass it to your code to process into the array, and when it's done just grab the next bit of data and repeat the process. Essentially this allows the app to process the data in smaller stages rather than having to take the whole file at once.

    All I really need to do in order to get this working with your code is to adapt my date pointer indexer to become a block marker (marking out in an array where in the file I should load data from, and how much), which isn't particularly challenging...all your code then needs to do is take that input (which, as it uses a public byte array, is already done) and an array offset (so if the last bit of data went into UDT(100) it would store 100 as an index, then the next time it writes to array it would know to write it to 101)
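
    A rough sketch of the block-loading idea above, under some loud assumptions: it uses VB6's native Open/Get, which is limited to Long file positions (so under 2GB, meaning the 2.5GB file in this thread would need a large-file class instead), and the names CountBlocks, nBlockSize and ParseBlock are hypothetical. Each block is trimmed back to its last line feed so no CSV row is split across two blocks.

    ```vb
    Option Explicit

    ' Hypothetical sketch: walks a file in blocks that always end on a line feed.
    ' Returns the number of blocks visited. Native file I/O only, so files < 2GB.
    Public Function CountBlocks(ByVal sPath As String, ByVal nBlockSize As Long) As Long
        Dim hFile As Integer
        Dim nFileLen As Long
        Dim nPos As Long        ' 1-based file position, as Get expects
        Dim nRead As Long       ' bytes read into the buffer this pass
        Dim nUse As Long        ' bytes of the buffer that form complete lines
        Dim bb() As Byte

        If nBlockSize < 1 Then Exit Function

        hFile = FreeFile
        Open sPath For Binary Access Read As #hFile
        nFileLen = LOF(hFile)
        nPos = 1

        Do While nPos <= nFileLen
            nRead = nFileLen - nPos + 1
            If nRead > nBlockSize Then nRead = nBlockSize
            ReDim bb(0 To nRead - 1)
            Get #hFile, nPos, bb

            nUse = nRead
            If nPos + nRead - 1 < nFileLen Then
                ' Not the final block: back up to the last line feed (byte 10).
                Do While nUse > 0
                    If bb(nUse - 1) = 10 Then Exit Do
                    nUse = nUse - 1
                Loop
                ' No line feed at all means one line is longer than the block;
                ' pass the whole buffer on rather than stalling.
                If nUse = 0 Then nUse = nRead
            End If

            ' Hand bb(0 To nUse - 1) to the parser here, e.g. a hypothetical
            ' ParseBlock bb, nUse, nNextUdtIndex

            CountBlocks = CountBlocks + 1
            nPos = nPos + nUse   ' the next block starts right after the last full line
        Loop

        Close #hFile
    End Function
    ```

    The running UDT index mentioned above would simply be carried between calls to the parser, so each block appends where the previous one stopped.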

    Quote Originally Posted by Elroy View Post
    Also, I thought I'd write it all out to a new MDB database (managed by the DAO, as that's what I'm quickest with to get things done, and I also don't have any SQL servers installed on my new box and really don't want to).
    I use local DB files with my project too, don't worry :-)

    But I don't write the naked data to DB...all I currently need is for it to be stored in a public array (or, in your case now, UDT) from which I can pull the data out with CopyMemory and compress the binary to write it as a BLOB into my database.

    Quote Originally Posted by Elroy View Post
    I'll read the above again, but I might also wrap some of wqweto's ZIP work into it if I can understand your situation.

    If possible, maybe just attach one of these ASCII ZIPped files to a post here. Or IDK, maybe share it on Google Drive and send me an IM with the link. This thing should be fairly easy to just fully automate, and just let it crunch on files all day long, and, in my conceptualization, convert each ZIP file into an MDB file. (People will "ding" me for using MDB files, but hey ho. If I process each of your ZIP files separately, the MDB files get nowhere near their 2GB limit, being binary.)
    I don't see a need for putting in an extra step to the process, not when I can take the next steps that I need from the moment the data is completed for each day. I'll happily share the resulting code (or at least a cut-down version of it that gives an example of what the app does) which will load the CSV then parse out 5 files per day into the DB with the names explaining what they are. The files will be LZMA compressed (thanks to RC6) before being added, so the resulting DB is considerably smaller than the original ZIPs (especially considering the ZIPs were ASCII data with plenty of bloat). If you got the app converting into an MDB I would then have to pull from the MDB to do the previously described process when I could just pull from the UDT directly.

    I ALSO have some suggestions by Schmidt to check out, though it looks like they might not pan out as well unless they can handle multi-gigabyte files (his CSV parser loads from file)...I will have to put all these code ideas into a DLL and set the whole thing up to run that way so I can test them compiled and see which works best :-)

  3. #43

    Thread Starter
    PowerPoster Elroy's Avatar

    Quote Originally Posted by SmUX2k View Post
    So, if I am getting it right, getting the UDT down to 20 bytes would work and could remove the padding, or if it went down to 18 bytes it would still need the 2 byte padding. Best solution there, if I discard the exchange data, is to keep 2 bytes of padding beforehand...or, if I kept it, no padding...but no point keeping it if I won't use it, and it doesn't compress well nor is there any real pattern (or use) to it.
    Yeah, you've got the gist of it. If you "take control" of the padding (as I did), you'll at least know what it is (two spaces in my case). If you don't take control, it's probably zeroes, but I'm not sure about that one. It may just be memory garbage.

    -----------

    I'll re-read the rest of your post a bit later.

  4. #44
    Fanatic Member

    Quote Originally Posted by Schmidt View Post
    If you look at the code I've posted - it should be clear how to use it...:
    0) place the CsvCallback-Function in your Form- or Class-module
    1) create an instance: Set CSV = New_c.CSV
    2) call the parsing-method: CSV.ParseFile <SomeCSVFilePath>
    3) place appropriate Code inside the Callback: Select Case ColNr ... to handle each Cell-Value
    Yes...hence, "as usual HAD no idea" :-P




    Quote Originally Posted by Schmidt View Post
    Yes to both...

    It is the fastest thing out there, to handle your use-case -
    and it needs the least amount of user-code, to handle it...

    E.g. you don't have to "manually handle smaller ByteArray-chunks" -
    the callback already gives you a proper ByteArray, along with Offset and Length for each parsed "Column-Cell".
    Where were you when I needed you...before writing all this code! Before Elroy did all his code too (wasted time if your CSV parser does it quicker...though of course we all learned new things along the way, so not a waste) :-P

    As I mentioned above to Elroy, I will have to put these options (along with my own originals) into a DLL and compare the compiled speeds in IDE to see how fast each method performs the required task.

  5. #45

    Thread Starter
    PowerPoster Elroy's Avatar

    Quote Originally Posted by SmUX2k View Post
    As I mentioned above to Elroy, I will have to put these options (along with my own originals) into a DLL and compare the compiled speeds in IDE to see how fast each method performs the required task.
    Hey, it'll be interesting to see some speed comparisons between native VB6 code and RC#. I'm not an RC# fan and have never used it, but if you wind up going that direction, you're not hurting my feelings. I'm just having fun with this stuff.

  6. #46
    Fanatic Member

    Quote Originally Posted by Schmidt View Post
    If you want to speed this up by about factor 8 (~1sec, when native-compiled, all options checked) -
    you can use the superfast CSV-Parser-Class of the RC6 (which requires much less code as well)...
    Just to confirm...with a little bit of tweaking (integer isn't large enough to store volume data in some cases, and I divide the result by 100 as it is always in multiples of 100...plus it didn't work out of the box and I had to fiddle with a few things) it managed to process the entire 2.5GB file in 70s (70,661.02ms) with a total of 53,577,397 rows processed...that's 1m10s compared to 3m47s (and compared to 11m+ for my own code :-) )...with my app compiled, 41s (41,153.91ms)...though, as I always say, this is purely just a test of the processing part of the code...the data isn't processed further (compressed, saved etc).

    Also, let me clearly point out something about MY 11m+...that didn't include the parsing of the data out into the array...so this CSV Parser does the entire job.

    There are still things I need to include, like recognition that the day has changed, which would trigger the arrays being processed and the pointers being reset so they can be used again...not exactly rocket science though, that bit :-)

  7. #47
    Fanatic Member

    Quote Originally Posted by Elroy View Post
    Hey, it'll be interesting to see some speed comparisons between native VB6 code and RC#. I'm not an RC# fan and have never used it, but if you wind up going that direction, you're not hurting my feelings. I'm just having fun with this stuff.
    Schmidt definitely beat you on the speed front...1m10s and 41s when my app is compiled (compared to 3m47s for yours)...understandable though, it was a bit of code you cobbled together in a little bit of time while he's been working on that (or, more specifically, the entire project) for ages :-)

    RC is nice for certain things, though I use it far less than I would like to...I've mentioned before to Schmidt that he REALLY needs to document the vast array of functions it provides so that people have an easy reference to look at when using it. I'm not the only one who thinks it, and the entire RC6 project revitalises VB6 to give it a whole new lease of life for programmers who are looking for features without having to mess around coding them. As an example, the LZMA compression is essentially 2 lines of code (one assigning the .Crypt, one doing the actual compression) and decompression is just as easy. If you want to (for instance) implement LZMA compression into your app and don't want to mess around with your own implementation of it, it's easy to reference the DLL and just use it with two lines of code (or 1 if you assign the .crypt elsewhere globally because you plan to use compression often)

  8. #48

    Thread Starter
    PowerPoster Elroy's Avatar

    Quote Originally Posted by The trick View Post
    When you pass an array item to an API (VarPtr), it locks the array (SafeArrayLock), calls the API (VarPtr), then unlocks the array (SafeArrayUnlock). If you need to increase performance, you could use a pointer variable and pass that instead. You could get the pointer to the first element of the array and then add the index to this pointer, which is faster:
    Ahhh, I just took the time to think this through (between watching F1 FP1).

    So, since it's a byte array, I can just get-and-save VarPtr(bbFile(0)), and then just add offsets to that saved value. Makes perfect sense. I'll make that change, and re-post what I've done so far.

  9. #49

    Thread Starter
    PowerPoster Elroy's Avatar

    Ok, saving the VarPtr(bbFile(0)) helped a bit:

    Compiled, fully optimized, using that Big_File.txt I posted with the above ZIP:
    Before saving VarPtr.
    Time Taken: 1.160156 seconds.

    After saving VarPtr.
    Time Taken: 1.066406 seconds.
    I'm going to speed test Dry Bone's approach versus an "atof" call before I post more code though. I'd like to either use all "roll-your-own" or all calls to the "msvcrt" to get this done, rather than a hodge-podge like I have now.

  10. #50

    Thread Starter
    PowerPoster Elroy's Avatar

    Ok, I'm convinced. VB6 is faster than "atof". Here's my test code (with results compiled and fully optimized). I did more optimizing of Dry Bone's ideas as well.

    Code:
    
    Option Explicit
    
    Private Declare Function atof CDecl Lib "msvcrt" (ByVal psz As Long) As Double
    
    Private Sub Form_Load()
    
        ' For testing, create a small byte array
        ' with an AsciiZ float in the middle of it.
        Dim s As String
        s = "asdf" & vbNullChar & "123.45" & vbNullChar & "qwer" & vbNullChar
        Dim bb() As Byte
        bb = StrConv(s, vbFromUnicode) ' We've now got three AsciiZ strings in our byte array.
        ' Offset 5 points at the 123.45 AsciiZ number.
        '
        ' Save our VarPtr(bb(0))
        Dim iVarPtr As Long
        iVarPtr = VarPtr(bb(0))
    
    
        Dim d As Double
        Dim nStart As Single
        Dim nStop As Single
        Dim i As Long
    
    
        nStart = Timer
        For i = 1& To 20000000
            AsciiBytesToDouble bb, 5&, d
        Next
        nStop = Timer
        Me.Text1.Text = CStr(nStop - nStart)
    
    
        nStart = Timer
        For i = 1& To 20000000
            d = atof(iVarPtr + 5&)
        Next
        nStop = Timer
        Me.Text2.Text = CStr(nStop - nStart)
    
    
    
    
    
    End Sub
    
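    ' Converts the AsciiZ number starting at bb(pStart) into dOut.
    ' Assumes only digits with at most one ".", terminated by a null byte;
    ' signs, exponents and any other characters are not handled.
    Private Sub AsciiBytesToDouble(bb() As Byte, ByVal pStart As Long, dOut As Double)
        Dim mantissa As Double
        Dim fraction As Long    ' Power of ten for the digits after the "." (0 until a "." is seen).
        '                         Being a Long, it overflows past 9 fractional digits.
        dOut = 0#
        Do While bb(pStart)     ' Loop until the terminating null.
            Select Case bb(pStart)
            Case 46:    fraction = 1&   ' "."
            Case Else:  mantissa = mantissa * 10# + CDbl(bb(pStart) - 48): If fraction Then fraction = fraction * 10&
            End Select
            pStart = pStart + 1
        Loop
        If fraction Then dOut = mantissa / CDbl(fraction) Else dOut = mantissa
    End Sub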
    
    
    And here are the results:

    20000000 calls to AsciiBytesToDouble:
    0.3320313 seconds.

    20000000 calls to atof (using saved VarPtr):
    2.25 seconds.
    So ~7X speedup over atof.

    I'll work this into my Big_File code, and re-test it.

  11. #51
    Fanatic Member

    Quote Originally Posted by Elroy View Post
    (snipped)

    So ~7X speedup over atof.

    I'll work this into my Big_File code, and re-test it.
    7x speedup, eh? 3m47s magically becomes 32.4s, narrowly beating Schmidt's 41s WHEN COMPILED and 70s in the IDE...of course, not holding you to that estimation (it's closer to 6.77x, but who's being picky :-) )

  12. #52

    Thread Starter
    PowerPoster Elroy's Avatar

    Ok, using the Big_Test.txt file, I'm now down to: (Compiled & fully optimized.)

    Time Taken: 0.4648438 seconds.
    That's compared to the "1.066406 seconds" from above post #49.

    I'm guessing that we're getting down to where most of our time is file I/O and not the byte array massaging.

    --------------------

    I'm now going to do some tweaking on the UDT, as per SmUX2k's specifications, and also just write it all directly into an MDB file. I could probably do away with the UDT, as it'll be directly written, but I think I'll leave that step anyway.

  13. #53
    Fanatic Member

    Quote Originally Posted by Elroy View Post
    I'm guessing that we're getting down to where most of our time is file I/O and not the byte array massaging.
    https://www.maketecheasier.com/setup-ram-disk-windows/ ...it will direct you to https://sourceforge.net/projects/imdisk-toolkit/ where you can get the app to create the ram drive, and even a 1GB ram drive should be enough to hugely improve IO performance. I have mine set to Z: so it's obvious it's the ram drive and I can quickly and easily use it. ONLY issue I may have in future is that it's only a 4GB capacity and there's many files that are larger than that, but I'll deal with that issue when the time comes :-)

    For the record, any of my tests that are with real data will be with the ram drive...I didn't think about it, but I *should* have been fair and put your test file in there too, IIRC the benchmark I did with yours was from HD :-)

  14. #54

    Thread Starter
    PowerPoster Elroy's Avatar

    Quote Originally Posted by SmUX2k View Post
    https://www.maketecheasier.com/setup-ram-disk-windows/ ...it will direct you to https://sourceforge.net/projects/imdisk-toolkit/ where you can get the app to create the ram drive, and even a 1GB ram drive should be enough to hugely improve IO performance. I have mine set to Z: so it's obvious it's the ram drive and I can quickly and easily use it. ONLY issue I may have in future is that it's only a 4GB capacity and there's many files that are larger than that, but I'll deal with that issue when the time comes :-)

    For the record, any of my tests that are with real data will be with the ram drive...I didn't think about it, but I *should* have been fair and put your test file in there too, IIRC the benchmark I did with yours was from HD :-)
    I'm already using pretty fast SSD (with NVMe interface), so I'm not sure a ramdrive would help much. With a ramdrive, it's still got to be copied into it at some point. But it's possibly a good idea at your side.

  15. #55

    Thread Starter
    PowerPoster Elroy's Avatar

    Here, I attached the latest of mine, before I started making changes related to "windowing the data file" and "saving to a database".

    Basically, it's got the new AsciiBytesToSingle procedure implemented, and the VBP has all the optimization flags set for compiling.

    I thought you might like it for another comparison. I'm not sure how anything can get much faster.

    ---------------

    Ohhh, I left that "atof" call in there, but it could be taken out, as it's no longer used.

  16. #56
    Fanatic Member

    Quote Originally Posted by Elroy View Post
    I'm already using pretty fast SSD (with NVMe interface), so I'm not sure a ramdrive would help much. With a ramdrive, it's still got to be copied into it at some point. But it's possibly a good idea at your side.
    I actually did it with the RAM drive, and it weirdly didn't make much difference with such a small file (to be fair, it may be cached somewhere). 8.641s compared to 8.766s from an EXTERNAL USB 4TB 2.5" drive (my Stocks drive, where everything goes until I am able to afford a nice 4TB NVMe for my mini-PC). With larger files I have definitely seen some difference in the speeds. It will also help when writing the output from a ZIP file as writing is always slower to HD than reading whereas RAM doesn't have that issue.

  17. #57

    Thread Starter
    PowerPoster Elroy's Avatar

    Make sure you've got an NVMe socket on your motherboard before you spend the money. You may need to stick with a SATA SSD drive.

  18. #58
    Fanatic Member

    Quote Originally Posted by Elroy View Post
    Make sure you've got an NVMe socket on your motherboard before you spend the money. You may need to stick with a SATA SSD drive.
    Correction: "M.2 2280 PCIe 3.0 SSD" ...https://store.minisforum.uk/products/hx90 is my PC.

    Though, to be fair, if I had the money for NVMe I would probably be going all in and building a new powerhouse PC to work from :-)

  19. #59
    PowerPoster
    Join Date
    Feb 2015
    Posts
    2,797

    Quote Originally Posted by Elroy View Post
    So ~7X speedup over atof.
    atof (like Val/CDbl) is a universal function. It supports the whole Double range:
    Code:
    Private Sub Form_Load()
        Dim b() As Byte
        
        b = StrConv("0.00000000000001" & vbNullChar, vbFromUnicode)
        
        Debug.Print atof(VarPtr(b(0)))
        'Debug.Print ArrToDouble(b)  ' // Overflow
    
        b = StrConv("-99.21" & vbNullChar, vbFromUnicode)
        
        Debug.Print atof(VarPtr(b(0)))
        Debug.Print ArrToDouble(b)  ' // Wrong
        
        b = StrConv("1E10" & vbNullChar, vbFromUnicode)
        
        Debug.Print atof(VarPtr(b(0)))
        Debug.Print ArrToDouble(b)  ' // Wrong
        
        b = StrConv("3.45E-150" & vbNullChar, vbFromUnicode)
        
        Debug.Print atof(VarPtr(b(0)))
        Debug.Print ArrToDouble(b)  ' // Wrong
    
    End Sub
    If your file contains small (or big) numbers (saved by VB6 code), it'll look like:

    8.71445834636688E-06;5.62368631362915E-17;9.49556648731232E-08
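
    For input like the cases above, the hand-rolled parser can be extended to cope with a sign and an exponent without calling atof. The following is only a sketch under stated assumptions: the name AsciiBytesToDoubleEx is hypothetical, it accepts [+|-]digits[.digits][E[+|-]digits], and it stops at the first byte that doesn't fit that pattern (null, comma, semicolon, end of array). It is not correctly rounded the way atof is, so the last digit or two may differ on some values, and an exponent with more than 9 digits would overflow the Long.

    ```vb
    Option Explicit

    ' Hypothetical helper, not from the thread. Parses an ASCII number starting
    ' at bb(pStart) and stops at the first byte that is not part of the number.
    Public Function AsciiBytesToDoubleEx(bb() As Byte, ByVal pStart As Long) As Double
        Dim mantissa As Double
        Dim divisor As Double       ' Double, so long fractions cannot overflow a Long
        Dim expo As Long
        Dim negative As Boolean
        Dim expNegative As Boolean
        Dim iEnd As Long
        Dim c As Long

        iEnd = UBound(bb)
        divisor = 1#

        ' Optional leading sign: 45 = "-", 43 = "+".
        If pStart <= iEnd Then
            If bb(pStart) = 45 Then
                negative = True
                pStart = pStart + 1
            ElseIf bb(pStart) = 43 Then
                pStart = pStart + 1
            End If
        End If

        ' Integer digits.
        Do While pStart <= iEnd
            c = bb(pStart)
            If c < 48 Or c > 57 Then Exit Do
            mantissa = mantissa * 10# + (c - 48)
            pStart = pStart + 1
        Loop

        ' Optional fraction: 46 = ".".
        If pStart <= iEnd Then
            If bb(pStart) = 46 Then
                pStart = pStart + 1
                Do While pStart <= iEnd
                    c = bb(pStart)
                    If c < 48 Or c > 57 Then Exit Do
                    mantissa = mantissa * 10# + (c - 48)
                    divisor = divisor * 10#
                    pStart = pStart + 1
                Loop
            End If
        End If

        ' Optional exponent: 69 = "E", 101 = "e".
        If pStart <= iEnd Then
            If bb(pStart) = 69 Or bb(pStart) = 101 Then
                pStart = pStart + 1
                If pStart <= iEnd Then
                    If bb(pStart) = 45 Then
                        expNegative = True
                        pStart = pStart + 1
                    ElseIf bb(pStart) = 43 Then
                        pStart = pStart + 1
                    End If
                End If
                Do While pStart <= iEnd
                    c = bb(pStart)
                    If c < 48 Or c > 57 Then Exit Do
                    expo = expo * 10 + (c - 48)
                    pStart = pStart + 1
                Loop
                If expNegative Then expo = -expo
            End If
        End If

        mantissa = mantissa / divisor
        If expo <> 0 Then mantissa = mantissa * 10# ^ expo
        If negative Then mantissa = -mantissa
        AsciiBytesToDoubleEx = mantissa
    End Function
    ```

    The extra branches cost some of the speed advantage, so for files known to hold only plain positive decimals the simpler routine is still the better fit; this version is for data that may have been written by VB6 in scientific notation.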

  20. #60
    Hyperactive Member
    Join Date
    Jul 2021
    Posts
    267

    This is why universal functions are not always good.
    For each use case, there might be a better solution dedicated to the specific requirements.
    It happens to me all the time.
    No matter how hard I try to develop a "clever does-it-all function", when it comes to a real life situation - it wouldn't fit!

  21. #61

    Thread Starter
    PowerPoster Elroy's Avatar

    I've about got a prototype up and running.

    • Extract ZIP file.
    • Process 12 TXT files.
    • Create 12 MDB files with records and fields.

    Time for lunch though.

    From alpha-testing, it looks like it's going to process a whole ZIP file (12 TXT files) in about a minute.

    I haven't started on the "file windowing" yet though, as that's going to take a little thought. And some of your files do blow up VB6 memory. But there are solutions.

    I'll be back.

  22. #62
    Fanatic Member

    Quote Originally Posted by Elroy View Post
    I haven't started on the "file windowing" yet though, as that's going to take a little thought. And some of your files do blow up VB6 memory. But there are solutions.
    https://www.vbforums.com/showthread....File-I-O-Class is what I used...took *4* seconds to fully load in a file (1MB at a time) and calculate exactly where each day ends and the next day begins...and that's with using StrConv on the byte array output! This is the output I get from my code (in immediate window):
    Code:
    21:29:24 - Started
    21:29:24       1             101884744    2021-03-01     0             101884743     101884744 
    21:29:24       2             122164934    2021-03-02     101884744     224049677     122164934 
    21:29:25       3             145589353    2021-03-03     224049678     369639030     145589353 
    21:29:25       4             219832632    2021-03-04     369639031     589471662     219832632 
    21:29:25       5             237883695    2021-03-05     589471663     827355357     237883695 
    21:29:26       6             194742596    2021-03-08     827355358     1022097953    194742596 
    21:29:26       7             121977061    2021-03-09     1022097954    1144075014    121977061 
    21:29:26       8             160873534    2021-03-10     1144075015    1304948548    160873534 
    21:29:26       9             78840585     2021-03-11     1304948549    1383789133    78840585 
    21:29:26       10            93401787     2021-03-12     1383789134    1477190920    93401787 
    21:29:26       11            77502242     2021-03-15     1477190921    1554693162    77502242 
    21:29:26       12            103138946    2021-03-16     1554693163    1657832108    103138946 
    21:29:27       13            117197095    2021-03-17     1657832109    1775029203    117197095 
    21:29:27       14            112431124    2021-03-18     1775029204    1887460327    112431124 
    21:29:27       15            106238659    2021-03-19     1887460328    1993698986    106238659 
    21:29:27       16            73350569     2021-03-22     1993698987    2067049555    73350569 
    21:29:27       17            99863779     2021-03-23     2067049556    2166913334    99863779 
    21:29:27       18            102351865    2021-03-24     2166913335    2269265199    102351865 
    21:29:27       19            133777368    2021-03-25     2269265200    2403042567    133777368 
    21:29:28       20            103652747    2021-03-26     2403042568    2506695314    103652747 
    21:29:28       21            112607831    2021-03-29     2506695315    2619303145    112607831 
    21:29:28       22            86706024     2021-03-30     2619303146    2706009169    86706024 
    21:29:28       23            75862958     2021-03-31     2706009170    2780609680    74600511 
    21:29:28 - Finished
    It's not particularly smart code, but it is loading 2.5GB of data into memory (from RAM drive, as I think you know by now) in fairly small blocks at ~600MB/s...with some smart management, I'm sure you could make good use of HugeBinaryFile for this :-)

    Edit: For reference, the last 3 columns are Start pointer, End pointer, Size (and I know there's a weird dupe but check the last line and you'll see size doesn't tally...there's a reason, last number is right as it's calculated from start and end pointer...not important, this isn't about my dodgy code, it's about the speed of the HBF)
    Last edited by SmUX2k; Mar 7th, 2024 at 04:43 PM.

  23. #63

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    I've got a file with these lines in it:

    2021-01-06 09:37:56.491,51.49,100,13,51.5,50000,2
    2021-01-06 09:37:56.491,51.49,100,13,51.5,50100,2
    Apparently, someone wanted a LOT of shares. So, a VB6 Integer isn't going to work for the "AskVolume".

    I'm just going to change both the volumes to Longs.

  24. #64
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    I've got a file with these lines in it:



    Apparently, someone wanted a LOT of shares. So, a VB6 Integer isn't going to work for the "AskVolume".

    I'm just going to change both the volumes to Longs.
    I dealt with this for Schmidt also...divide by 100...they're generally reported in batches of 100, so you should be able to handle up to 3,276,700 while storing the value in the Ask Volume...I have never seen that many yet (just in case, consider capping the value at that), I don't think people buy or sell that many at once...it's just that people need to be aware that the value is multiples of 100 if using that data :-)

    Storing as a long would be an alternative, though not necessarily worth it
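
    A minimal sketch of the divide-by-100 idea, assuming volumes arrive as Longs and that capping at the Integer maximum is acceptable. VolumeToLots and MAX_LOTS are hypothetical names, not from the posted code:

```vb
Option Explicit

' Hypothetical helper; the name and the capping behaviour are assumptions.
' Packs a share volume into a VB6 Integer by storing it in lots of 100.
Public Function VolumeToLots(ByVal volume As Long) As Integer
    Const MAX_LOTS As Long = 32767          ' largest value a VB6 Integer can hold
    Dim lots As Long
    lots = volume \ 100&                    ' integer division drops any odd-lot remainder
    If lots > MAX_LOTS Then lots = MAX_LOTS ' cap instead of raising an overflow error
    VolumeToLots = CInt(lots)
End Function
```

    With this, the 50100 from the sample line is stored as 501, and anything above 3,276,700 shares is clamped to 32767. Anyone reading the data back has to multiply by 100 again.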

  25. #65

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Ok, here's a prototype (attached).

    I picked on the "ko_quote_year_2021_1icfge.zip" file because none of the TXT files were too large to fit into memory. That's a problem I'm still working on.

    Here's a screen capture of the latest version of the program, after it processed that file:

    [Image: Program.png]

    Now, by far, the biggest time-consumer is writing out those MDB files. The rest (unzip, massage byte array) is all quite fast.

    Here's a clip of the folder after it laid down all the MDB files:

    [Image: Files.jpg]

    The red rectangle is around the ones the program made. The green rectangle is around the one that was processed to produce those.

    It's important to note that none of these MDB files are anywhere near the 2GB limit of MS Access MDB files, and I didn't even do any work on the UDT.

    And here's just a look at the first of those MDB files:

    [Image: Trades.jpg]

    Almost 5 million records in that thing, and that's just one of the 12 files (i.e., months).

    ------------------

    I'll see about handling bigger files. I'm tempted to just use some far-memory stuff and still read the whole TXT files into memory. You said you had 32GB so that should be plenty to process anything you get.

    ------------------

    But wait, the latest episode of Halo is out. I think it might be time to watch that.
    Last edited by Elroy; Mar 7th, 2024 at 06:43 PM.

  26. #66
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    "Time Taken: -86281.96 seconds."...that's quite a speed increase...almost a day faster than your best!...could be because I started it just before midnight (here) and it finished at 00:01 :-)

    Did it again, 118s for MDBs of 294MB
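
    The negative figure is what you would get if the elapsed time is taken from VB6's Timer function, which counts seconds since midnight and resets to 0 at 00:00 (that the program uses Timer is an assumption here). Adding one day back gives -86281.96 + 86400 = 118.04 seconds, in line with the 118s of the second run. A sketch of a rollover-safe difference, with ElapsedSeconds as a hypothetical helper name:

```vb
Option Explicit

' Hypothetical helper: elapsed seconds between two Timer readings,
' allowing for a single rollover at midnight.
' Assumes the run lasts less than 24 hours.
Public Function ElapsedSeconds(ByVal startTimer As Double, ByVal endTimer As Double) As Double
    Const SECONDS_PER_DAY As Double = 86400#
    ElapsedSeconds = endTimer - startTimer
    ' A negative difference means midnight passed between the two readings.
    If ElapsedSeconds < 0 Then ElapsedSeconds = ElapsedSeconds + SECONDS_PER_DAY
End Function
```

    Typical use would be to store Timer in a variable before the work starts and call ElapsedSeconds(startValue, Timer) at the end.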

    You mention that the MDB is the slowest part...not a major worry for me as I will jump in at around that point and re-route the output into a byte array using CopyMemory and compress it into a BLOB to be added to my database system. Everything else seems to do things fast and the way I needed it done, definitely faster than I was achieving.

    As for the bigger files, that is definitely a requirement...my first test of the app balked at the data I sent it, turns out one of the files was 700MB...but HugeBinaryFile definitely does the job. As I mentioned above, I used it to first grab day-by-day pointers and it was then able to load an entire day's worth of data into memory 1 MB at a time...in theory it could actually have loaded an ENTIRE day (you'll see from the output above that the largest "day" was ~237MB).

    I might be able to handle some of the other bits and pieces I need for optimal usefulness. The data would be output on a day by day basis, so would generally ignore the DMY data and only check to see if the day has changed or not, and I am ignoring the Exchange (bid and ask) value. I won't do anything major until you're done...or if I do, I'll make notes throughout of where I've made edits so things don't clash if there's an update. I'll look more into it tomorrow, as it's late right now (if you hadn't guessed by the 00:01)

    I've had a quick look at the code, and I'm pretty sure I could incorporate the HugeBinaryFile code into the cmdDoIt and have it iterate through the file day by day and perform the output when complete...where it is now is actually pretty useful once I have tweaked it a little...I can happily say that I can follow the logic of the code easily for the most part :-)

  27. #67

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    WOW, ok, I've got everything up and running (between watching F1 qualifying). Ran into another problem though. Trying to process that aapl_quote_year_2021_g2yon9.zip file and it actually generates an MDB file that's >2GB (which is the limit).

    As a quick-fix, I'm going to cut down on the UDT, as you've been encouraging me to do. That'll get it under the 2GB MDB limit (but not necessarily by much). This is definitely some "big data".

    I'll attach what I've got so far. It'll read a TXT file of any size, and write "most" of them to MDB just fine.

    This thing also ties up the processor thread badly (almost as if it's hung). Not sure what the best approach is to solve that one, or if it even needs solving.

  28. #68
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    WOW, ok, I've got everything up and running (between watching F1 qualifying). Ran into another problem though. Trying to process that aapl_quote_year_2021_g2yon9.zip file and it actually generates an MDB file that's >2GB (which is the limit).
    Simple answer, and what I have already done...store the file by day rather than month...so the first file it outputs should be 2021-01-04.mdb, which is 53,560k. Give me 5-10 minutes and I will post an update on MY progress with adapting the code (though obviously I will have to now move the adaptations I need to over to the new code and make sure they still work :-) )...it still has 1 month of data to get through for the entire year (which has taken a little over an hour) and I have completed the test :-P

    And on the hang, I generally put a DoEvents here and there to give the system back some time to catch up...this is one of the big problems of trying to process massive amounts of data constantly for long periods of time!
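
    A rough sketch of the "DoEvents here and there" approach, yielding once every so many rows rather than on every row. ProcessRows and YIELD_EVERY are hypothetical names, and the interval of 50000 is only an assumed starting point to tune:

```vb
Option Explicit

' Hypothetical processing loop that yields to the message queue periodically.
' Returns the number of times it yielded, purely so the behaviour can be checked.
Public Function ProcessRows(ByVal rowCount As Long) As Long
    Const YIELD_EVERY As Long = 50000   ' assumed interval; tune by experiment
    Dim i As Long
    For i = 1 To rowCount
        ' ... parse and store row i here ...
        If i Mod YIELD_EVERY = 0 Then
            DoEvents                    ' lets the form repaint and handle input
            ProcessRows = ProcessRows + 1
        End If
    Next i
End Function
```

    One thing to watch: DoEvents also lets the user click buttons while the loop is running, so it is usually worth disabling the start button for the duration to avoid re-entering the same routine.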

    Edit: Oh, and if you do want to do by day rather than month, wait until I post my update as my code already has the bits in place to do this...should be easy enough to work with, or you can use an adapted version of the code I sent earlier to you and use HBF to pull the specific day into memory...it seems to work fine for me, the largest day in that AAPL file is under 200MB.
    Last edited by SmUX2k; Mar 8th, 2024 at 01:49 PM.

  29. #69
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Update:

    I have made some slight modifications to the code, turning the Sub into a Function (which is called naked by the original Sub). This allows me to now pass a filename (ZIP or TXT) to the function and it'll skip the relevant parts (file dialog if ZIP, file dialog and zip decompress if TXT) if I decide to set up a queue automation system for processing entire folders worth of ZIPs, which I already have done in my own code. It also allows for the ability to customise or optimise the method for decompression...currently the app is set up to extract everything in the zip before continuing, but people might prefer to extract on demand and delete afterwards (I also added an option to flag whether it deletes temporary files when it's finished with them, the default being true but if you specify a TXT file it logically sets that to false...which can be overridden).

    It has Elroy's updated HBF included (I've essentially commented out the original code in the file loading, and incorporate the HBF loading code there, rather than remove it) and is set up to index the TXT file (work out where the day splits are in the file) and load a SPECIFIC day's data into a byte array rather than the whole file. This isn't going to be 100% reliable, as I know sometimes individual days will screw with this simple system, and I should really block-load the data but I don't yet know enough about the overall way things are set up in the code to allow me to do partial blocks....but it works. I also added in protection against trailing CR/LF/spaces in the data so the CSV parser gets only the data it needs. Also, I suspect now it is set up to work with 1 day's worth of data at a time, it *should* be okay...it's rare to see 1 day's worth of data being more than 100MB (though I am testing it with AAPL, and they're going as high as 200MB but usually around the 100MB mark or lower)

    I also have option buttons to allow the option to output to MDB (it outputs to a file based on the date, so 1 file per day, rather than writing everything for the month into 1 MDB), a SQL Blob (not implemented...the button is just there) or file(s) (again, not there...it would be 1 file per column per day, so 7 files for each day's output when it is done...is there REALLY any reason to output it to a CSV again? If I was just splitting the CSV into days I could have done that ages ago!).

    I'm currently stress testing it on a year's worth of data and 81 days has taken 35 minutes as I originally wrote this. 183 days took 1 hour. For the record, I am taking this timing from the write time of the MDB files. First file was done at 17:32 and last file was 18:54 so 252 days in 82 minutes! About 19.5s per day on average. The app itself reports "5042.465s" in total which suggests 20s per day.

    I need to do a little more work on the day splitter (I sent Elroy a copy of the code, though after sending it I realised there's a few things still missing) to make it a bit more robust and useful.

    Just some quality of life improvements that I would have implemented...Elroy, feel free to take these ideas and redo them your way if you want to :-)

    Obviously this was done using the original Big_File (from yesterday) so expect an updated version in a little while. Edit: A little while as in tomorrow, maybe...other things to do :-P
    Last edited by SmUX2k; Mar 8th, 2024 at 02:27 PM.

  30. #70

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Yeah, I was thinking about prompting for a "Folder" instead of a "ZIP file". Then, all the ZIP files in that folder could just be processed. An easy way to do that would be to prompt for the folder, and then load all the ZIP files into a ListBox. Then, just build a super-loop which processes them all.
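
    A minimal sketch of the folder idea, collecting the ZIP paths first and looping over them afterwards. ListZipFiles is a hypothetical name, and a Collection stands in for the ListBox:

```vb
Option Explicit

' Hypothetical helper: returns the full paths of all *.zip files in a folder.
Public Function ListZipFiles(ByVal folderPath As String) As Collection
    Dim result As New Collection
    Dim fileName As String
    ' Make sure the folder path ends with a backslash.
    If Right$(folderPath, 1) <> "\" Then folderPath = folderPath & "\"
    fileName = Dir$(folderPath & "*.zip")
    Do While Len(fileName) > 0
        result.Add folderPath & fileName
        fileName = Dir$()               ' next match for the same pattern
    Loop
    Set ListZipFiles = result
End Function
```

    Gathering the names before processing matters because Dir$ keeps a single global search state; calling Dir$ again for another purpose inside the processing loop would reset the enumeration.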

    But, I successfully processed the AAPL file. I think I might call my part of this thing done. With all we've explored, I don't think there are any additional huge gains in speed to be found ... maybe some small tweaks, but nothing that's going to be a 2X or anything like gains we've already achieved. Still 20 minutes for that huge AAPL file though.

    Here's a screen-grab of processing the AAPL file:
    [Image: ProgramAAPL.png]

    To keep the MDB files under 2GB, I removed my AutoID field (which wasn't necessary), removed "Year" and "Month". And also made "Seconds" into a byte, only keeping the integer portion. March was still very large:
    [Image: FilesAAPL.jpg]

    And here's a peek at March with a DB Viewer:
    [Image: TradesAAPL.jpg]

    This latest version is attached.

    Unless someone finds a bug, I think this one is done for me. Have fun with it.

  31. #71
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    To keep the MDB files under 2GB, I removed my AutoID field (which wasn't necessary), removed "Year" and "Month". And also made "Seconds" into a byte, only keeping the integer portion.
    That is probably a big no-no...the seconds NEEDS milliseconds in the output (sometimes there are 100s of price changes in a second, often there's more than 10)...this is why I convert HMS to a Long, as there's only 86.4m milliseconds in a 24h period...this is how I do it: "(((60& * m) + (60& * 60 * h)) * 1000) + s * 1000"...there's probably tidier ways to do it, but this was a quick algorithm :-)

    In a day of data that has JUST 1m lines in it, and only 30,600 seconds of data in an 8.5h period, you will expect an average of 33 entries per second...without milliseconds, there's no way to tell which came first unless you just trust they're in order
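
    The formula above can be wrapped up along these lines. HmsToMs is a hypothetical name, and the parameter types are assumptions (hours and minutes as Long, seconds as a Double carrying the fractional part):

```vb
Option Explicit

' Hypothetical helper: milliseconds since midnight from h, m and fractional s.
' The largest result, for 23:59:59.999, is 86,399,999, which fits a Long easily.
Public Function HmsToMs(ByVal h As Long, ByVal m As Long, ByVal s As Double) As Long
    ' The & suffixes force Long arithmetic, so no intermediate product
    ' can overflow a 16-bit Integer.
    HmsToMs = (h * 3600& + m * 60&) * 1000& + CLng(s * 1000#)
End Function
```

    For the timestamp 09:37:56.491 from the sample lines earlier in the thread, this gives 34,676,491. The Long suffixes are the part to be careful about: if the seconds were held in an Integer, a plain s * 1000 would overflow for anything above 32 seconds.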

    Edit: And there seems to be a discrepancy of some sort...my MDB output files aren't the same size as yours, though they're working with exactly the same data. I can't post full details as it hasn't processed them all, but:

    AAPL_2021_01.mdb - 1,021,504k
    AAPL_2021_02.mdb - 1,153,848k

    And it's a bit slower on my PC, it seems...will take a while to process all of them to get a fuller picture (I would post a screenshot, but of course don't have them all done)

    Edit: And it crashed on the 3rd month...guessing you've posted an old version (or I am somehow running one...redownloading and re-testing) :-P

    Edit 2: I guess you re-uploaded it in between and I got an old version...idiot me didn't notice you had posted two versions today, I don't know why I am so tired...it shouldn't be able to even look at an AAPL file so it has to have been one you added HBF to. Or perhaps I was an idiot and was running my edited version (though pretty sure I would have known, as I display the date rather than the month). It's fine now!
    Last edited by SmUX2k; Mar 8th, 2024 at 03:57 PM.

  32. #72

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by SmUX2k View Post
    That is probably a big no-no...the seconds NEEDS milliseconds in the output (sometimes there are 100s of price changes in a second, often there's more than 10)...this is why I convert HMS to a Long, as there's only 86.4m milliseconds in a 24h period...this is how I do it: "(((60& * m) + (60& * 60 * h)) * 1000) + s * 1000"...there's probably tidier ways to do it, but this was a quick algorithm :-)

    In a day of data that has JUST 1m lines in it, and only 30,600 seconds of data in an 8.5h period, you will expect an average of 33 entries per second...without milliseconds, there's no way to tell which came first unless you just trust they're in order
    Ahhh, makes sense. I'm thinking about lunch though. I suspect you can work that one out.

    Also, if I really want to look at this more, I should probably install a SQL server on my machine. That would get rid of that 2GB limit. I've got MariaDB on my NAS box, but that would require sending the data through the router and switches over on the other table in my office. And I'm not sure what the speeds are of my switches.

    I've almost got another big update ready for my VB6_Frm_To_Py program, and I'm going to work on that now. You take care, and I'll be around.

  33. #73
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    Ahhh, makes sense. I'm thinking about lunch though. I suspect you can work that one out.

    Also, if I really want to look at this more, I should probably install a SQL server on my machine. That would get rid of that 2GB limit. I've got MariaDB on my NAS box, but that would require sending the data through the router and switches over on the other table in my office. And I'm not sure what the speeds are of my switches.

    I've almost got another big update ready for my VB6_Frm_To_Py program, and I'm going to work on that now. You take care, and I'll be around.
    Thanks for all of this, it's given me stuff to think about...different ways to process things.

    I've mentioned Schmidt's RC6 before, it comes with an SQLite DB system which allows you to create local SQL DBs of any size...I am sure there are other SQLite implementations out there, or similar, and (if you're a masochist) HBF has an output mode that WRITES data as well, and, theoretically it would be possible to create your own DB system that can hold any size of data. In the long term this is probably what I will do with my BLOB system, do away with SQL entirely and instead store the output files raw with an offset stored to say where the file actually is in the DB. I went with SQL simply because it was a simpler option than writing my own system, but in the long term it seems to have worked out I was probably better off doing my own system. What held me back most wasn't the managing of the data, it was more the logistics of managing millions (potentially) of files in a large binary file and keeping their file pointers accurate, and also pulling the pointer data for a specific file when there *are* millions of files to search through. It was also the 2GB file limit, but HBF fixes that and accepts a Currency absolute seek (so not limited by the max of a Long)

    Just in case you didn't read my edits above, my file sizes for the MDBs don't match yours (though the source files are exactly the same) and it crashes at the 3rd month. It isn't a huge problem, you already know I don't plan to use that bit of the code, but thought I'd mention it in case it IS an old version

    Edit: It was somehow an old version, ignore that comment!
    Last edited by SmUX2k; Mar 8th, 2024 at 03:38 PM.

  34. #74

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Yeah, it was that 3rd month (March) of the AAPL file I was having trouble with too. But the latest version (in post #70) should have that fixed (by cutting a few things out).

    I'd hate having one file per day, as, to me, the only thing that makes any of this useful is the ability to have it in a database form where someone could do research with it. In my own mind, that's why the blob never made a great deal of sense to me.

    But hey ho, you have fun with it. It's been a fun journey for me to figure some of this stuff out.

  35. #75
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    Yeah, it was that 3rd month (March) of the AAPL file I was having trouble with too. But the latest version (in post #70) should have that fixed (by cutting a few things out).

    I'd hate having one file per day, as, to me, the only thing that makes any of this useful is the ability to have it in a database form where someone could do research with it. In my own mind, that's why the blob never made a great deal of sense to me.

    But hey ho, you have fun with it. It's been a fun journey for me to figure some of this stuff out.
    I'd love to store it 1 month (or even 1 year) per DB, but the problem is that DBs don't implement compression. I essentially need to be able to pull this data out 1 day at a time, and specific columns only, which is definitely easily doable with an SQL DB query...but it would pull 10MB of data from the database for one column of data, when if I LZMA compressed the data stream (the CopyMemory-pulled array data) it would be anything between 500k and 1MB. When 1 day of data is 80MB, you can imagine what 1 year of data would look like, 10 years of data is 10x that and 4000+ stocks worth of data just increases that to unimaginable levels. On the small scale where you're looking at 2022 data for AAPL, you don't see the problem...when you're looking at 2000-2024 data for AAPL, it's 24 years worth of data (though, to be fair, 2000-2020 is probably about as big as 2021 is...or, if not, not far off...my current earliest zip is 2010 and that's 368MB compared to the 1.5GB 2021 and 1.6GB 2022 that has December missing) and that's potentially 6000 x 80MB (24 years of 250 days is 6000 days...which is ~480GB) just for one stock.

    I say better to store it as a BLOB or similar so someone can extract the specific data they want and do the research on it on demand, rather than having the data available raw 24/7 and taking up petabytes of space :-)

  36. #76
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,454

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by SmUX2k View Post
    I'd love to store it 1 month (or even 1 year) per DB,
    but the problem is that DBs don't implement compression.
    That's not really the case...
    Most decent DB-engines can be configured for transparent compression (allowing direct queries)
    at either row, column or page-level - (for example SQLite, MySQL, PostgreSQL and even MS-SQLServer).

    The MS-Jet-Engine (*.mdb-Files) is one of the rare exceptions here, without any support for it...

    That these compression-features (at DB-level) are not often known (or used) these days,
    is due to transparent compression-support directly at FileSystem-Level -
    (most modern Linux-FS' - but also MS-NTFS allow that with roughly similar compression-ratios as the DB-engines).

    Sure, the chosen compression-algos for this kind of "live"-compression do not achieve rates as high as e.g. LZMA -
    but their write/read throughput is ~400-800MB/sec (that's faster than the write/read rate of magnet-HDs).

    So, with NTFS-Folder-based compression, you could interact and query your DB-data directly, in a decent performance -
    at the cost of having only about half of the LZMA-compression-ratios.

    Olaf

  37. #77

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Yeah, I was thinking about this too ... both the size issue and the compression issue.

    I'm still tempted to install MariaDB on my new box, we'll see. But pretty much any SQL server software is going to solve the size issue. Furthermore, it'll probably run faster than writing to MDB files, because the server software runs as a service independent of the VB6 program, so all the writing is handled in a different OS thread (and probably on a different CPU core). And with ADO, you can still interface with it directly.

    Regarding compression, an easy solution is to just turn on Windows compression on the folder where the SQL server stores its files. That probably won't be "maximum" compression, but I suspect it'd be pretty good. And also, done that way, you can still run queries (or whatever) on your table(s) in your SQL server.

    I've got to watch the F1 Saudi race today, and also clean the house. But we'll see. I'll keep thinking about installing MariaDB and taking a better look at some of this stuff.

  38. #78
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    The speed problem I have found with processing data is not the actual data processing but the actual writing of that data to an array (or variable). Schmidt's code (the CSV parser) runs pretty well but where the MAIN bottleneck occurs is when that data is moved over from the output callback into the array.

    I tested this by moving the value from Csv.ParseNumber into a temporary variable then moving it from that temporary variable into the actual array I wanted the data in. I benchmarked both commands separately.

    It took 200s (with this extra step, usually around 120s...I assume my benchmarking is going to add some time to the process...the act of observing something in motion will alter its trajectory, and all that) to process the entire 2.5GB file, according to Schmidt's timing (slightly more than the previous times, I know...not sure why)

    40s of that was taking the ParseNumber value and putting it into the temporary variable. 20s of that was moving the data from temporary variable into the array. By this logic you can assume that the actual converting of the data is taking 20s.

    I re-ran the code: ParseNumber completed in 38.6s while assigning values took 21.7s...that suggests the converting took ~17s

    I am the first to admit that my benchmarking code (using GetTickCount to get the number of ms it takes to do something) isn't mission-critical perfect, and there's 214m iterations of these (it only processes 4 values) so it isn't much time per individual write to the array...but it is indicative, if nothing else.
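
    For what it's worth, GetTickCount typically only advances every 10 to 16 ms, so timing individual statements that each take a fraction of a microsecond is below its resolution. A hedged alternative is to time a whole pass with the high-resolution counter. The sketch below uses the common VB6 trick of receiving the 64-bit counter in a Currency; SecondsBetween and TimeWholeLoop are hypothetical names and the loop body is a stand-in:

```vb
Option Explicit

Private Declare Function QueryPerformanceCounter Lib "kernel32" _
    (lpPerformanceCount As Currency) As Long
Private Declare Function QueryPerformanceFrequency Lib "kernel32" _
    (lpFrequency As Currency) As Long

' Hypothetical helper: seconds between two counter readings.
' Currency scales both values by 10000, which cancels in the division.
Public Function SecondsBetween(ByVal startCount As Currency, ByVal endCount As Currency) As Double
    Dim freq As Currency
    QueryPerformanceFrequency freq
    SecondsBetween = CDbl(endCount - startCount) / CDbl(freq)
End Function

' Times one complete loop rather than each statement inside it.
Public Sub TimeWholeLoop()
    Dim t0 As Currency, t1 As Currency
    Dim i As Long, x As Double
    QueryPerformanceCounter t0
    For i = 1 To 1000000
        x = x + i                       ' stand-in for the work being measured
    Next i
    QueryPerformanceCounter t1
    Debug.Print "Loop took "; SecondsBetween(t0, t1); " s"
End Sub
```

    Running the same pass once with the array write and once without it, and comparing the two totals, avoids putting any timing calls inside the 214m-iteration loop.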

    Later on, when I want to read this data in from the SQL, the issue is not going to be the speed of the SQL system but the writing of this data directly into the array. There will be ways to speed it up, but is it really going to be quicker than loading a BLOB from SQLite, decompressing it and writing it directly to the array using CopyMemory? This was my main reason for wanting to directly access the underlying memory of variables/arrays, to maximise the speed of loading of the data. If it was coming from SQL, a split() would probably be involved in the process if I don't iterate through each array element one by one, and we both know how godawful slow either of those options can be. I suppose if I also went the raw SQL DB route I could run comparisons on speed with what I get through CopyMemory...do I really need to though?
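
    As a sketch of what "writing it directly to the array using CopyMemory" can look like, assuming the decompressed BLOB is a Byte array holding raw little-endian Longs. BytesToLongs is a hypothetical name, and the decompression step itself is not shown:

```vb
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (Destination As Any, Source As Any, ByVal Length As Long)

' Hypothetical helper: reinterprets a byte buffer as an array of Longs
' with one block copy instead of an element-by-element loop.
' Assumes b() is dimensioned; trailing bytes that do not fill a Long are ignored.
Public Function BytesToLongs(b() As Byte) As Long()
    Dim n As Long
    Dim result() As Long
    n = (UBound(b) - LBound(b) + 1) \ 4     ' number of whole Longs in the buffer
    If n = 0 Then Exit Function
    ReDim result(0 To n - 1)
    CopyMemory result(0), b(LBound(b)), n * 4&
    BytesToLongs = result
End Function
```

    The same pattern would apply to Single or Double columns by changing the element type and the byte width.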

  39. #79

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Hi SmUX2k,

    I'm no longer sure what "array" you're talking about. The way my latest code works is to just create one UDT at a time, and then dump it into the database.

    I guess you're still building an array so you can make your "blob". Personally, I'm not at all convinced that a blob is the way to go. That just completely disables your ability to do easy queries, sorts, and analysis on the processed data, without a lot of "unpacking" before you can do that.

    I know you may not use it, but I'm playing around with using MariaDB to store all the data in one database. If I were to actually want to use something like this, that's just the way I'd do it. I'd definitely want to "easily" do things like make daily or monthly histograms for, say, average stock price, and other such things. Just thinking about doing that from a blob already gives me a headache.

    --------------

    And hey, if you're trying to work out how to move your array (UDT array, or however you're doing it) to/from RC#'s version of MySQL, you may do better to start a new forum thread, as that's getting pretty far away from the OP topic of this thread.

  40. #80
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    I'm no longer sure what "array" you're talking about. The way my latest code works is to just create one UDT at a time, and then dump it into the database.
    You yourself admitted that the slowest part of the app was the writing to DB, did you not? :-P

    You may have set it up as a UDT, but you write it to DB as StockUDT()...you can just as easily access specific "arrays" by going StockUDT().element (for example StockUDT().BuyPrice would be the same as the entire array of BuyPrice...I would refer to that element of a UDT as an array). Within MY app (set up before you did your work on the CSV parser) I have DataXX() arrays (XX being TS, BP, AP, BV, AV, BE, AE for the 7 different elements)...I haven't converted it to a UDT yet as it wasn't high on my priorities, but I eventually will do.

    Quote Originally Posted by Elroy View Post
    I guess you're still building an array so you can make your "blob". Personally, I'm not at all convinced that a blob is the way to go. That just completely disables your ability to do easy queries, sorts, and analysis on the processed data, without a lot of "unpacking" before you can do that.
    No queries to do, sorts are done by timestamp BEFORE writing to the BLOB, no analysis done beyond the current day (comparisons to other days can be done, but it isn't really vital...I have no plans to yet and will deal with that if it occurs), and unpacking IS a slightly lengthier process than just pulling from DB but the data is in the array far quicker. ALL analysis is intraday, generally intra-minute, and done on pre-generated indicator results rather than on the price data (which is used to generate the indicator results).

    Quote Originally Posted by Elroy View Post
    I know you may not use it, but I'm playing around with using MariaDB to store all the data in one database. If I were to actually want to use something like this, that's just the way I'd do it. I'd definitely want to "easily" do things like make daily or monthly histograms for, say, average stock price, and other such things. Just thinking about doing that from a blob already gives me a headache.
    I may use it, I dunno...would be worth it to compare the BLOB method to the SQL raw method, if only to put to rest for certain which one is right for my needs :-P

    And yes, though I mention the RC6 CSV Parser, your app is still there as part of my app...I'm just fiddling around with things and trying to get both the RC6 and your app's output tied into the BLOB generator so I can properly benchmark them.

    You might want to easily do daily or monthly histograms of the average stock price for a stock...I actually don't...the statistical analysis I am doing relates to split-second triggers from results of indicators (I can't easily explain it any better than that) rather than stock prices directly, and it is this out-of-the-box thinking that I hope gives me the edge over all those people basing their analysis on historical prices and estimating which ones might be due to rise soon :-P

    My analysis requires an entire day's worth of data to be loaded into memory and a simulation of the day's trading to be run repeatedly, millions (or perhaps billions) of times, with slightly different setups. The loading speed of a day's data isn't vitally important if it is only loaded once before being used millions of times, but there might be times when it is only used once after being loaded (it depends entirely on what I am doing at the time: either hunting for good settings or testing whether a setting is good). I should probably point out that it won't be the tick data that I use for this, it'll be aggregate data created from the tick data, so there won't be millions of lines of prices to simulate.

    I'm not saying SQL raw data isn't an option, as it definitely is, but I don't see it doing as well for my specific needs, where the query will essentially ALWAYS be for one specific BLOB of data. I might want all of 2022-01-02, but I will never want, for instance, everything except the data before 10am. The BLOB is essentially a pre-generated query result, nicely packaged up into an LZMA-compressed binary ready to be copied into the array with CopyMemory.

    Essentially I have an idea in my head and I am following it...I may find it is the wrong idea later on, but I'm on the track and staying there as otherwise things get too complicated and I will never finish the project (which, as you can probably tell by another post from 2022, has been going for many years already...I started it a little after Covid hit)

    Quote Originally Posted by Elroy View Post
    And hey, if you're trying to work out how to move your array (UDT array, or however you're doing it) to/from RC#'s version of MySQL, you may do better to start a new forum thread, as that's getting pretty far away from the OP topic of this thread.
    I'm not, I'm just using it as an example of the fact that populating an array one element at a time is going to be time-consuming. Getting the data from MariaDB is one thing, getting it from the query result into where I need it is an entirely different thing. There might be more efficient ways to import data from a SQL DB that I don't know about, but (let me say this again) I *don't* know SQL well enough to know these things... I know VB6 for the most part, and go with what I know :-P
