
Thread: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

  1. #1

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    This came up in another thread, and I wouldn't mind having a solution for my own VB6 library.

    Let's say we have an ASCII file with numeric data in it that we know how to parse. For example purposes, we'll just say that the following was read out of this ASCII file:

    Code:
    
        Dim bb() As Byte
        ReDim bb(5)
    
        bb(0) = Asc("9")
        bb(1) = Asc("8")
        bb(2) = Asc("7")
        bb(3) = Asc(".")
        bb(4) = Asc("5")
        bb(5) = Asc("6")
    
    And now, I'd like to convert that into a Single and/or Double.

    Sure, here's one way to do it:

    Code:
    
        Dim d As Double
    
        d = CDbl(StrConv(bb, vbUnicode))
        Debug.Print d
    
    But that's jumping through quite a few hoops. Converting the byte array to a String means:
    * Allocating space for a BSTR.
    * Taking our bb array and converting it from ASCII to 2-byte Unicode characters.
    * Only then doing the numeric conversion.

    It just seems like there should be a way to take VarPtr(bb(0)) and feed that directly into some API call that returns our floating point number. (Maybe appending a zero byte, the equivalent of vbNullChar, to the end of our bb array, to make it a C-style ASCII string.)

    Within the msvbvm60.dll library, I see functions named as follows:
    • __vbaFpCDblR4
    • __vbaFpCDblR8
    • __vbaFpCSngR4
    • __vbaFpCSngR8
    • rtcR8ValFromBstr

    It just seems that one of those might get this job done, but I can find neither documentation nor usage examples for any of those functions.

    Or maybe there's an API call in another library that'll get it done.

    Any ideas?
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  2. #2

  3. #3
    PowerPoster VanGoghGaming's Avatar
    Join Date
    Jan 2020
    Location
    Eve Online - Mining, Missions & Market Trading!
    Posts
    2,619

    Re: Converting ASCII Byte Array to Single or Double ... Fast

    And how would you call "atof" from VB6 without making a DLL that exports it?

    Code:
    Private Declare Sub atof Lib "MyDLL" (CharPtr As Any, ByVal d As Long)
    
        Dim bb() As Byte, d As Double
        ReDim bb(6)
    
        bb(0) = Asc("9")
        bb(1) = Asc("8")
        bb(2) = Asc("7")
        bb(3) = Asc(".")
        bb(4) = Asc("5")
        bb(5) = Asc("6")
        bb(6) = 0
    
        atof bb(0), VarPtr(d)
    Would that work?

  4. #4

  5. #5

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: Converting ASCII Byte Array to Single or Double ... Fast

    Ok, that's just too cool.

    Here's my implementation, keeping everything AsciiZ.

    Code:
    
    Option Explicit
    
    Private Declare Function atof CDecl Lib "msvcrt" (ByVal psz As Long) As Double
    
    
    Private Sub Form_Load()
        Dim bb() As Byte
        ReDim bb(6)
    
        bb(0) = Asc("9")
        bb(1) = Asc("8")
        bb(2) = Asc("7")
        bb(3) = Asc(".")
        bb(4) = Asc("5")
        bb(5) = Asc("6")
        ' bb(6) is left at 0 (ReDim zero-fills the array), which zero-terminates the string.
    
    
        Dim d As Double
        d = atof(VarPtr(bb(0)))
    
        Debug.Print d
    
    End Sub
    

  6. #6

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: Converting ASCII Byte Array to Single or Double ... Fast

    And here's a timing test:

    Code:
    
    Option Explicit
    
    Private Declare Function atof CDecl Lib "msvcrt" (ByVal psz As Long) As Double
    
    
    Private Sub Form_Load()
        Dim d As Double
        Dim bb() As Byte
        ReDim bb(6)
    
        bb(0) = Asc("9")
        bb(1) = Asc("8")
        bb(2) = Asc("7")
        bb(3) = Asc(".")
        bb(4) = Asc("5")
        bb(5) = Asc("6")
        ' bb(6) is left at 0 (ReDim zero-fills the array), which zero-terminates the string.
    
    
        Dim i As Long
        Dim nStart As Single, nStop As Single
    
    
        nStart = Timer
        For i = 1 To 40000000
            d = atof(VarPtr(bb(0)))
        Next
        nStop = Timer
        Debug.Print "atof: "; nStop - nStart            ' Reports 5.894531
    
    
        nStart = Timer
        For i = 1 To 40000000
            d = CDbl(StrConv(bb, vbUnicode))
        Next
        nStop = Timer
        Debug.Print "StrConv & CDbl: "; nStop - nStart  ' Reports 10.46094
    
    
    End Sub
    
    
    atof: 5.894531
    StrConv & CDbl: 10.46094
    So, almost twice as fast.

    This one's resolved.

  7. #7

  8. #8
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    So, almost twice as fast.

    This one's resolved.
    And probably handy for me...I wonder now if this would improve the speed issues further for me as well (it wouldn't speed up the split function I was working with, but could reduce overall run time for the entire process) :-)

  9. #9
    PowerPoster VanGoghGaming's Avatar
    Join Date
    Jan 2020
    Location
    Eve Online - Mining, Missions & Market Trading!
    Posts
    2,619

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    If you used zero-terminated strings in your data you wouldn't need split at all.

  10. #10
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by VanGoghGaming View Post
    If you used zero-terminated strings in your data you wouldn't need split at all.
    Not quite useful considering, in my case, I'm processing someone else's data and they've comma-delimited it like most people do with CSVs :-P

  11. #11
    PowerPoster VanGoghGaming's Avatar
    Join Date
    Jan 2020
    Location
    Eve Online - Mining, Missions & Market Trading!
    Posts
    2,619

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    You could do a global "Replace" for comma with zero (vbNullChar), anything that would help you avoid splitting.

  12. #12

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Yeah, and if it's in a byte array, just spin through the byte array and replace them with vbNullChar. That way, you've already created your AsciiZ strings without tampering with the size of the byte array at all.

    SmUX2k, whenever you get ready to bite the bullet, I'm quite confident that managing things in a byte array (and just totally staying away from Strings) will vastly improve the performance you're getting.

    -------------

    With that vbNullChar replacement, just use that pointer array (that you'd generate), and pass VarPtr(bb(PointerYouWant)) into that above atof function, and there you go ... Double returned, straight from the byte array.
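
    A minimal sketch of that lookup, assuming bb() is a zero-based byte array whose commas have already been replaced with 0, and pa() holds the starting index of each field. FieldAsDouble is a made-up helper name, and the CDecl declare is the same one used earlier in this thread:

    Code:
    Private Declare Function atof CDecl Lib "msvcrt" (ByVal psz As Long) As Double

    ' Hypothetical helper: returns field number iField as a Double.
    ' Assumes every field in bb() is followed by a 0 byte, including the last one.
    Private Function FieldAsDouble(bb() As Byte, pa() As Long, ByVal iField As Long) As Double
        ' pa(iField) is an index into bb(), so VarPtr gives the address of the field's first byte.
        FieldAsDouble = atof(VarPtr(bb(pa(iField))))
    End Function

    Calling FieldAsDouble(bb, pa, 3) would then hand the fourth field straight to atof, with no String created along the way.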

  13. #13
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by VanGoghGaming View Post
    You could do a global "Replace" for comma with zero (vbNullChar), anything that would help you avoid splitting.
    56m lines of data coming in (~2.5GB worth), each one with commas, and all being processed (using an alternative to split) in 50-80 seconds. I suspect replacing won't be the panacea you think it is going to be, as even if I did a blanket replace on the data as it is loaded in it would have to be done 2500+ times...plus I personally don't understand how zero terminating the string would help (I could test that theory if I did :-P ), I know I need to do something with that string once it is zero-terminated at the end of each value (I would guess using CopyMemory to splice it over an array, but I would think that would require adding some length data so it probably isn't that) but no idea what

  14. #14

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by SmUX2k View Post
    56m lines of data coming in (~2.5GB worth), each one with commas, and all being processed (using an alternative to split) in 50-80 seconds. I suspect replacing won't be the panacea you think it is going to be, as even if I did a blanket replace on the data as it is loaded in it would have to be done 2500+ times...plus I personally don't understand how zero terminating the string would help (I could test that theory if I did :-P ), I know I need to do something with that string once it is zero-terminated at the end of each value (I would guess using CopyMemory to splice it over an array, but I would think that would require adding some length data so it probably isn't that) but no idea what
    SmUX2k, I'm guessing that you could spin through (For i = LBound to UBound) a byte array, searching for 44 (a comma), building your pointer array when you find one, and replace the 44 with a 0, and do it all faster than a Split() would execute.

    Again, staying away from Strings is going to bring large speed improvements. VB6 can do math much faster than it can massage strings.

    ------

    And also, don't for a second believe that there's not a loop under the hood of a Split().

  15. #15
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    Yeah, and if it's in a byte array, just spin through the byte array and replace them with vbNullChar. That way, you've already created your AsciiZ strings without tampering with the size of the byte array at all.

    SmUX2k, whenever you get ready to bite the bullet, I'm quite confident that managing things in a byte array (and just totally staying away from Strings) will vastly improve the performance you're getting.

    -------------

    With that vbNullChar replacement, just use that pointer array (that you'd generate), and pass VarPtr(bb(PointerYouWant)) into that above atof function, and there you go ... Double returned, straight from the byte array.
    Aha! And there's my answer!

    To be honest, I suspect my interest is going to be piqued at some point...it's not that I don't want to "bite the bullet", it's more that getting into byte array string manipulation is a huge undertaking (more so for me than for others who have experience with it) that could take tons of trial and error to get working, and potentially weeks of tweaking and playing around to fully complete that part of the project. I have the inclination, but not the time to put into it...I wanted to be further ahead than this already :-)

  16. #16
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    SmUX2k, I'm guessing that you could spin through (For i = LBound to UBound) a byte array, searching for 44 (a comma), building your pointer array when you find one, and replace the 44 with a 0, and do it all faster than a Split() would execute.
    Isn't there an InStrB for byte arrays? I'm not against iterating through the data byte by byte, but not if there's something that does it better :-)

  17. #17
    PowerPoster VanGoghGaming's Avatar
    Join Date
    Jan 2020
    Location
    Eve Online - Mining, Missions & Market Trading!
    Posts
    2,619

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Obviously InStrB is the way to go.

    The idea is to convert those strings into numerals using TheTrick's code above. "atof" for floating point and "atoi" for integers. Both require zero-terminated strings.
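
    For reference, a sketch of both declares side by side, assuming the same CDecl setup as the atof example earlier in the thread. atoi returns a C int, which maps to a Long in VB6. The sample data, the offsets, and the name DemoAtoiAtof are made up for illustration, and StrConv is used here only to build the test bytes:

    Code:
    Private Declare Function atof CDecl Lib "msvcrt" (ByVal psz As Long) As Double
    Private Declare Function atoi CDecl Lib "msvcrt" (ByVal psz As Long) As Long

    Private Sub DemoAtoiAtof()
        ' Two zero-terminated ANSI fields packed into one byte array:
        ' "1234" occupies bytes 0-3 (terminator at 4), "987.56" starts at byte 5.
        Dim bb() As Byte
        bb = StrConv("1234" & vbNullChar & "987.56" & vbNullChar, vbFromUnicode)

        Dim n As Long, d As Double
        n = atoi(VarPtr(bb(0)))     ' integer field
        d = atof(VarPtr(bb(5)))     ' floating-point field
        Debug.Print n, d
    End Sub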

  18. #18

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    I threw together a speed-test to compare doing a Split() to building a pointer array into a Byte array:

    Code:
    
    Option Explicit
    
    Private Sub Form_Load()
    
        ' Build a string with every 10th character a comma.
        Dim s As String
        s = Space$(10000000)
        Dim i As Long
        For i = 5& To 10000000 Step 10&
            Mid$(s, i, 1) = ","
        Next
    
        Dim nStart As Single, nStop As Single
        Dim sa() As String
    
        ' Test time to do split.
        nStart = Timer
        sa = Split(s, ",")
        nStop = Timer
        Debug.Print "Split() time: "; nStop - nStart
    
        Erase sa
        s = vbNullString
    
        ' Build a byte array of 10000000 characters, with every 10th byte a 44 (comma).
        Dim ba() As Byte
        ReDim ba(9999999)
        For i = 0& To 9999999
            ba(i) = 32 ' Space, just for testing.
        Next
        For i = 4& To 9999999 Step 10&
            ba(i) = 44 ' Comma.
        Next
    
        Dim pa() As Long
        Dim pCount As Long, ptr As Long
    
    
        ' Test time to count commas, build pointer array, and replace with vbNullChar.
        nStart = Timer
        ' First, count commas.
        For i = 0& To 9999999
            If ba(i) = 44 Then pCount = pCount + 1&
        Next
        ReDim pa(pCount) ' We use full pCount because there's one more data-piece than there are commas.
        ' Now, build pointer array and replace commas with vbNullChar.
        pa(0&) = 0& ' First pointer is always 0&
        ptr = 1&
        For i = 0& To 9999999
            If ba(i) = 44 Then
                pa(ptr) = i + 1& ' Point to character just AFTER the comma.
                ptr = ptr + 1&
                ba(i) = 0
            End If
        Next
        nStop = Timer
    
        Debug.Print "Pointers from Byte Array: "; nStop - nStart
    
    
    End Sub
    
    
    
    Results:

    Split() time: 1.878906
    Pointers from Byte Array: 0.140625
    Over 13X improvement!!!

    That's just more proof that you need to stay away from Strings if you're really worried about every second you can gain.

  19. #19

  20. #20

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by The trick View Post
    You could make like this:

    Code:
        l = VarPtr(bb(0))
        For i = 1 To 40000000
            d = atof(l)
        Next
    To avoid locking/unlocking array and improve performance.
    Trick, that's not really a fair test though, as SmUX2k or I would be bouncing around in the Byte Array, fetching different pieces of data. So it's probably best to leave the VarPtr inside the loop, to be fair.
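
    One possible middle ground, sketched here under assumptions rather than timed: take the base pointer once and add each field's offset, so VarPtr stays out of the loop while every call still lands on a different field. The names ConvertAllFields and dOut are made up, bb() is assumed to be zero-based, and pa() is assumed to hold field start indexes into bb():

    Code:
    Private Declare Function atof CDecl Lib "msvcrt" (ByVal psz As Long) As Double

    ' Hypothetical helper: converts every field that pa() points at.
    ' Assumes each field (including the last) is followed by a 0 byte.
    Private Sub ConvertAllFields(bb() As Byte, pa() As Long, dOut() As Double)
        Dim pBase As Long, i As Long
        ' Taken once. Only valid while bb() is not ReDim'd or erased.
        pBase = VarPtr(bb(0))
        ReDim dOut(LBound(pa) To UBound(pa))
        For i = LBound(pa) To UBound(pa)
            ' Signed Long arithmetic: assumes the buffer sits below the 2GB line.
            dOut(i) = atof(pBase + pa(i))
        Next
    End Sub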

  21. #21
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by VanGoghGaming View Post
    Now test again using InStrB for commas!
    My worry there is that I suspect InStrB is going to convert the byte array to a string before doing the check...not saying it will, just that I suspect it might (without any actual experience or proof). It's still something that *should* be tried, of course, you only find out if it works better if you do.

    Also, Elroy, I suspect that once I do have the function working as it needs to be (wasn't in the mood to finish it today, brain's fried) and I move on to other elements of the app, I *will* play around with an alternative byte-array version of the function just to see how much better it performs. The OVERALL process (without compressing or sanitising the data first) takes about 11 minutes now, down from 12 minutes before I used Wqw's suggestion to get the 153s down to 72s. I'm fairly confident that byte arrays could get it down to 5m or less in total (and, of course, I still have the original function that uses split() throughout and takes ~20 minutes, and will have the current function that takes 11 minutes...they can all be benchmarked again with other data easy enough). As I said in the other thread, the source is a byte array so it really should stay as a byte array.

  22. #22

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by VanGoghGaming View Post
    Now test again using InStrB for commas!
    Unless I'm overlooking something, InstrB isn't going to be useful here. You've still got to convert to Strings to use InstrB, and that's where the massive slowdown is.

    Quote Originally Posted by SmUX2k View Post
    My worry there is that I suspect InStrB is going to convert the byte array to a string before doing the check...not saying it will, just that I suspect it might (without any actual experience or proof). It's still something that *should* be tried, of course, you only find out if it works better if you do.

    Also, Elroy, I suspect that once I do have the function working as it needs to be (wasn't in the mood to finish it today, brain's fried) and I move on to other elements of the app, I *will* play around with an alternative byte-array version of the function just to see how much better it performs. The OVERALL process (without compressing or sanitising the data first) takes about 11 minutes now, down from 12 minutes before I used Wqw's suggestion to get the 153s down to 72s. I'm fairly confident that byte arrays could get it down to 5m or less in total (and, of course, I still have the original function that uses split() throughout and takes ~20 minutes, and will have the current function that takes 11 minutes...they can all be benchmarked again with other data easy enough). As I said in the other thread, the source is a byte array so it really should stay as a byte array.
    Yeah, I just looked at InstrB. It still needs BSTR data, which is where the slowdown is.

  23. #23
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    Unless I'm overlooking something, InstrB isn't going to be useful here. You've still got to convert to Strings to use InstrB, and that's where the massive slowdown is.
    Which is why I asked if there was a byte array version or alternative (I assumed it would convert to string, given the name) :-)

    https://www.vbforums.com/showthread....ays-As-Strings came up in a Google search for a byte-array alternative...it might add more processing, of course, so I'm not suggesting it as an option
    Last edited by SmUX2k; Mar 6th, 2024 at 06:56 PM.

  24. #24

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    SmUX2k, do you need to keep confidential the work you've done? Or maybe share whatever UDT you wind up with?

    Just to keep procrastinating other tasks, I might possibly build a class for you that just does this stuff.

    At a minimum though, I would need a sample ASCII file, and the UDT structure you ultimately want to wind up with. I suppose I could take that snippet of data you posted elsewhere, and use the UDT I put together (and showed elsewhere), and just do it from that. I'm going to watch some TV now with my wife though. Maybe tomorrow.

    But please do tell me what the MAXIMUM byte-file-size you'll be running into. If it's less than 1GB, I'll probably just read the whole thing into memory. If it's less than 2GB, I might recommend learning how to link with LAA, and still read it all into memory (or maybe use Far-Memory to store it all). With Far-Memory, we can break the VB6 4GB memory barrier.

  25. #25
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    SmUX2k, do you need to keep confidential the work you've done? Or maybe share whatever UDT you wind up with?

    Just to keep procrastinating other tasks, I might possibly build a class for you that just does this stuff.

    At a minimum though, I would need a sample ASCII file, and the UDT structure you ultimately want to wind up with. I suppose I could take that snippet of data you posted elsewhere, and use the UDT I put together (and showed elsewhere), and just do it from that. I'm going to watch some TV now with my wife though. Maybe tomorrow.

    But please do tell me what the MAXIMUM byte-file-size you'll be running into. If it's less than 1GB, I'll probably just read the whole thing into memory. If it's less than 2GB, I might recommend learning how to link with LAA, and still read it all into memory (or maybe use Far-Memory to store it all). With Far-Memory, we can break the VB6 4GB memory barrier.
    It's not hugely "confidential" or anything, and it's not as if I am sharing anything people don't already know...it's more that my coding style is hugely sloppy (as I'm sure you've noticed many times :-) ) and it'd be hard for people to follow my code (I often tidy code up before posting it).

    I've been using HugeBinaryFile to load the data 1MB at a time (which I could probably now bring up a little, if I use byte arrays...the 1MB was more to keep memory usage to a minimum) so the size of the byte array would be no more than a little more than 1MB (there's leftover data from the previous load that is prepended to the data...presumably with CopyMemory now I am using byte arrays...though, saying that, some clever seek pointer manipulation would possibly do away with that and protect against slowdown from having to move 1MB of data into a new byte array).

    I have found that loading into memory with HBF adds maybe 20 seconds to the process with a 2.5GB file, and substantially less when it is stored on a RAM drive, so this block approach to loading the data seems to work well...it checks for a vbCrLf in the remaining data (using InStr) and if not EOF and no vbCrLf it loads in 1MB more data to add to the string.

    BTW, just posting this and realising the seek pointer manipulation is a good tweak has probably sped up the code by 30+ seconds as I am taking a ~50 byte string and appending 1MB of strconv() byte array data to it...~2500 times...not that this matters if I will use byte arrays in future, but it would also streamline the byte array process in theory :-)

    I don't use a UDT, though I should and probably will for tidiness...it's just 7 arrays (5 of which I actually use, but I'm leaving in support for the other two in case I ever choose to use them...these two are the exchange data for the bid and ask price, which I don't see any need for) and they're DIMmed at ~5m entries each (with the ability to increase that if needed, and a pointer that says exactly how many of them are used so I don't process more data than I need to when converting to bytes with CopyMemory).

    I can upload some data if you want me to, the ones I uploaded before were given self-destruct time limits (I think I chose 2 weeks, can't remember) to protect against problems from the people who own the data (unlikely to happen, but always could, as the exchanges own the data and it shouldn't be distributed without paying them) so they may or may not exist...perfectly happy to send a private message with details tomorrow (bedtime here for me) if you want to have a go, though you really don't have to and it would be a good learning exercise for me to do it myself :-)

  26. #26
    PowerPoster VanGoghGaming's Avatar
    Join Date
    Jan 2020
    Location
    Eve Online - Mining, Missions & Market Trading!
    Posts
    2,619

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    Yeah, I just looked at InstrB. It still needs BSTR data, which is where the slowdown is.
    Even so, unless you turn off array bounds checking, a dedicated API function would always search faster:

    Code:
    Private Declare Function CommaSearch Lib "shlwapi" Alias "StrStrA" (pszFirst As Any, Optional lComma As Long = 44) As Long
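
    For comparison, a bounded search can also be sketched with memchr from msvcrt, which takes an explicit byte count and so doesn't depend on the buffer being zero-terminated. This is a different function from the StrStrA declare above, it hasn't been timed against a plain VB loop, and it assumes the same CDecl setup as the atof declare in this thread. CountCommas is a made-up name:

    Code:
    Private Declare Function memchr CDecl Lib "msvcrt" (ByVal pBuf As Long, ByVal c As Long, ByVal cb As Long) As Long

    ' Hypothetical helper: counts the commas (byte value 44) in bb().
    ' Pointer comparisons use signed Longs, so this assumes addresses below 2GB.
    Private Function CountCommas(bb() As Byte) As Long
        Dim pCur As Long, pEnd As Long, pHit As Long
        pCur = VarPtr(bb(LBound(bb)))
        pEnd = pCur + (UBound(bb) - LBound(bb) + 1)     ' one past the last byte
        Do While pCur < pEnd
            pHit = memchr(pCur, 44, pEnd - pCur)        ' 0 when no more commas
            If pHit = 0 Then Exit Do
            CountCommas = CountCommas + 1
            pCur = pHit + 1                             ' resume just after the hit
        Loop
    End Function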

  27. #27

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Ok, I went ahead and knocked this out. And SmUX2k, I just made up a file from that data snippet you posted in another thread. I'm guessing (hoping) you've got some kind of standard prompting for an actual ASCII file, and I'll let you work all that out.

    Also, I didn't write anything out (to a blob or otherwise). You can work that out as well.

    My work is all attached as a ZIP file, including the sample file I used. But I'm also going to post some here (but just grab the ZIP to test).

    The file I created (Big_File.txt) has 2,044,421 lines and is 101,403,232 bytes. You said that yours were bigger, but that's as big as I wanted to go for this test. (It's in the ZIP.) And, yeah, a "rolling window" into the file, as opposed to reading the whole thing, might be needed once you start approaching 2GB files (disk size), but possibly not.

    Here's what I did:
    • Read the file, into a byte array.
    • Massage the file to have vbNullChar values for AsciiZ delimiting.
    • Build a UDT for your data.
    • Create a UDT array of the entire byte array.

    I put comments in the source code to explain everything, but, once it's in the UDT it's quite compact. In fact, you could probably delete the file's byte array once you get it into the UDT array.

    Just some other FYI. I stayed away from FUNCTION procedures because they tend to copy data around. I used SUB procedures for almost everything. They're a bit faster, and they don't copy data around nearly as much.

    The UDT takes exactly 28 bytes per record, whereas each record takes ~50 bytes sitting in the byte array, or it would take ~100 bytes if you set it into a VB6 Unicode String array. So, these UDTs are fairly compact.
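
    A quick way to check those sizes on your own machine, assuming the RecordType declaration from the BAS module in this post is in the project (ShowRecordSize is a made-up name). Len reports the size the record would occupy when written to a file with Put, while LenB reports the in-memory size including any padding, so the two can differ. Fixed-length strings in particular are held as 2-byte characters in memory:

    Code:
    Private Sub ShowRecordSize()
        Dim r As RecordType
        Debug.Print "Len  (as written by Put): "; Len(r)
        Debug.Print "LenB (in memory):         "; LenB(r)
    End Sub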

    And the only use of a BSTR VB6 string is to create the specification to the file to be read.

    I decided not to create any CLS modules, as I sensed that maybe you weren't terribly comfortable with those. It's all in two BAS modules, and a Form1 with some test code in it.

    Also, VERY IMPORTANT, you must load up The Trick's CDeclFix VB6 Add-In for this to work, as it's required to make that atof (ascii-to-float) call. Once compiled, no need to worry about it.

    Ohhh, that brings up another point. Once you get comfortable that all of this runs error free, if you check all the "Advanced Optimizations" when you compile, it'll run MUCH faster.

    But, for me, just running in the IDE, nothing took over a few seconds, and that includes doing all the file reading, parsing, and building a complete array of UDTs. I've got a pretty hot machine, but nothing was taking anywhere close to a minute.

    And, just to show some of it (but again, it's all in the ZIP), here's my Form1 test code:

    Code:
    
    Private Sub Form_Load()
        LoadBigFile
        BuildFieldPointerArray
        ValidateAndParseDateFields
    
        ' Now test to see if we can get it all as a UDT array.
    
        Dim UDTs() As RecordType
        FullArrayOfUdtRecords UDTs
    
        Debug.Print UBound(UDTs)
        Debug.Print UDTs(0).Year
        Debug.Print UDTs(UBound(UDTs)).BidPrice
    
        ' I checked more, and it all seems to be there.
        ' After we get this UDT array, we could toss our bbFile array, if we wanted.
    
        ' From here, you can stuff this UDT array into a blob or whatever you want to do with it.
        ' Because it's binary, it should be compressed fairly well.
    
    End Sub
    
    Here's the code that does the heavy lifting (in a BAS module):
    Code:
    Option Explicit
    '
    Public bbFile() As Byte
    Public pData() As Long
    '
    Public Type RecordType
        Pad             As String * 2   ' 0 These two bytes aren't used, but the pad will be there whether this is here or not.
        Year            As Integer      ' 2      \ They'll either be here, or after Minute.
        Month           As Byte         ' 4       \ Done this way, the intra-udt will be perfectly packed.
        Day             As Byte         ' 5
        Hour            As Byte         ' 6
        Minute          As Byte         ' 7
        Seconds         As Single       ' 8
        '
        BidPrice        As Single       ' 12
        BidVolume       As Integer      ' 16
        BidExchange     As Integer      ' 18
        AskPrice        As Single       ' 20
        AskVolume       As Integer      ' 24
        AskExchange     As Integer      ' 26
    End Type                            ' 28 bytes total for UDT.
    '
    
    Public Sub LoadBigFile()
        '
        ' Setup and read the file into a byte array.
        ' This should probably prompt for a file.
        Dim sFileSpec As String
        sFileSpec = App.Path & "\Big_File.txt"
        '
        ' Let's make sure we've got some kind of file before proceeding.
        If Not FileExists(sFileSpec) Then Err.Raise 53&
        '
        ' Actually open and read the file.
        Dim hFile As Long
        hFile = FreeFile
        Open sFileSpec For Binary As hFile
        Dim iFileLen As Long
        iFileLen = LOF(hFile)
        ReDim bbFile(iFileLen - 1&)
        Get hFile, , bbFile
        Close hFile
    End Sub
    
    Public Sub BuildFieldPointerArray()
        ' Make sure file is loaded into bbFile() first.
        '
        Dim i As Long
        '
        ' First, set vbCR to spaces, and vbLF to commas.
        For i = LBound(bbFile) To UBound(bbFile)
            If bbFile(i) = 13 Then bbFile(i) = 32 ' 32 = Asc(Space$(1)).
            If bbFile(i) = 10 Then bbFile(i) = 44 ' 44 = Asc(",").
        Next
        '
        ' Let's space out any trailing commas on EOF.
        ' This keeps us from creating an extra field pointer on the EOF.
        i = UBound(bbFile)
        Do '                ","                 " "
            If bbFile(i) <> 44 And bbFile(i) <> 32 Then Exit Do
            bbFile(i) = 32 ' 32 = Asc(Space$(1)).
            i = i - 1&
        Loop
        '
        ' Now we can count commas.
        Dim iCount As Long
        For i = LBound(bbFile) To UBound(bbFile)
            If bbFile(i) = 44 Then iCount = iCount + 1&
        Next
        '
        ' Create space for field pointers and loop to create them.
        ' We also set our commas to vbNullChar, so we can easily pass AsciiZ from the array.
        ReDim pData(iCount)         ' One more field than commas.
        Dim p As Long
        pData(p) = 0& ' Just our first pointer, just to show it.
        p = p + 1&
        For i = LBound(bbFile) To UBound(bbFile)
            If bbFile(i) = 44 Then  ' 44 = Asc(",").
                pData(p) = i + 1&   ' The next field starts on the next byte.
                bbFile(i) = 0       ' Creates an AsciiZ string out of our fields.
                p = p + 1&
            End If
        Next
        '
        ' There are seven fields per record, so the number of pointers should be a multiple of seven.
        ' pData is zero-based, so the pointer count is UBound(pData) + 1.
        ' Trailing commas at EOF were blanked above, so there's no extra EOF pointer to account for.
        If (UBound(pData) + 1&) Mod 7& <> 0& Then
            MsgBox "Warning!  The field count isn't a multiple of 7!!!" & vbCrLf & _
                   "And it should be, given that there are seven fields per record.", vbExclamation
        End If
        '
        ' And now, we re-zero any/all spaces (including date/time space) to make sure AsciiZ strings work the way we want.
        For i = LBound(bbFile) To UBound(bbFile)
            If bbFile(i) = 32 Then bbFile(i) = 0
        Next
        '
        ' And now, make sure last byte is a zero.
        If bbFile(UBound(bbFile)) <> 0 Then
            ReDim Preserve bbFile(UBound(bbFile) + 1&) ' This only happens when the last line wasn't terminated with a vbLF.
        End If
    End Sub
    
    Public Sub ValidateAndParseDateFields()
        ' We just do this to do a bit of validation on the data.
        ' We check to make sure a dash ("-") is present in the 5th byte of the first field of every record.
        ' This first field should look something like:  2022-01-03 04:41:27.002
        ' And that fifth character, following the year, should always be a dash.
        '
        Dim i As Long
        Dim bError As Boolean
        For i = 0& To UBound(pData) Step 7&
            If Not bError Then
                If bbFile(pData(i) + 4&) <> 45 Then ' 45 = Asc("-").
                    MsgBox "Warning!  We've got a first field of a record that doesn't look like a date." & vbCrLf & _
                           "Data may not be parsed correctly.", vbExclamation
                    bError = True
                End If
            End If
            '
            ' Set dashes, colons, & space to vbNullChar, making separate AsciiZ strings.
            bbFile(pData(i) + 4&) = 0&      ' Year before, month after.     pData(i)       points at year.
            bbFile(pData(i) + 7&) = 0&      ' Month before, day after.      pData(i) + 5&  points at month.
            bbFile(pData(i) + 10&) = 0&     ' Day before, hour after.       pData(i) + 8&  points at day.
            bbFile(pData(i) + 13&) = 0&     ' Hour before, minute after.    pData(i) + 11& points at hour.
            bbFile(pData(i) + 16&) = 0&     ' Minute before, seconds after. pData(i) + 14& points at minute.
                                            '                               pData(i) + 17& points at seconds.
        Next
    End Sub
    
    Public Function NumberOfFields() As Long
        ' Returns the total number of fields we found.
        '
        NumberOfFields = UBound(pData) + 1& ' We're zero based.
    End Function
    
    Public Function NumberOfRecords() As Long
        ' Assuming 7 fields per record, returns the total number of records.
        '
        NumberOfRecords = NumberOfFields \ 7&
    End Function
    
    Public Sub UdtForOneRecord(pRecNumber As Long, udt As RecordType)
        ' pRecNumber is ZERO based, so max should be NumberOfRecords - 1.
        ' It's a SUB to make it a bit faster.
        '
        ' Just some FYI, an ASCII line is ~50 characters (100 bytes if stored as VB6 Unicode).
        ' The returned UDT is precisely 28 bytes, so nearly 1/4th the memory that a line would take as VB6 Unicode.
        '
        Dim pField As Long
        pField = pRecNumber * 7&
        AsciiBytesToInt bbFile, pData(pField), udt.Year
        AsciiBytesToByte bbFile, pData(pField) + 5&, udt.Month
        AsciiBytesToByte bbFile, pData(pField) + 8&, udt.Day
        AsciiBytesToByte bbFile, pData(pField) + 11&, udt.Hour
        AsciiBytesToByte bbFile, pData(pField) + 14&, udt.Minute
        udt.Seconds = atof(VarPtr(bbFile(pData(pField) + 17&)))
        '
        pField = pField + 1&: udt.BidPrice = atof(VarPtr(bbFile(pData(pField))))
        pField = pField + 1&: AsciiBytesToInt bbFile, pData(pField), udt.BidVolume
        pField = pField + 1&: AsciiBytesToInt bbFile, pData(pField), udt.BidExchange
        pField = pField + 1&: udt.AskPrice = atof(VarPtr(bbFile(pData(pField))))
        pField = pField + 1&: AsciiBytesToInt bbFile, pData(pField), udt.AskVolume
        pField = pField + 1&: AsciiBytesToInt bbFile, pData(pField), udt.AskExchange
    End Sub
    
    Public Sub FullArrayOfUdtRecords(UdtArrayOut() As RecordType)
        ' This is destructive to anything already in UdtArrayOut.
        ' The returned array will be ZERO based.
        '
        ReDim UdtArrayOut(NumberOfRecords - 1&)
        Dim pRecNumber As Long
        For pRecNumber = 0& To NumberOfRecords - 1&
            UdtForOneRecord pRecNumber, UdtArrayOut(pRecNumber)
        Next
    End Sub
    And there's another module with some utility procedures in it (atof, FileExists, AsciiBytesToByte, AsciiBytesToInt).
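
    For readers without the ZIP, here is a minimal sketch of what an integer helper like AsciiBytesToInt could look like. This is NOT the version from the attachment: the parameter names, the overflow check, and the "stop at the first non-digit" behaviour are all assumptions inferred from how the procedure is called above.

    Code:
    Option Explicit

    ' Hypothetical sketch only, not the procedure shipped in the ZIP.
    ' Reads unsigned decimal digits from bb() starting at iStart and stops
    ' at the first non-digit (normally the vbNullChar ending the AsciiZ field).
    Public Sub AsciiBytesToInt(bb() As Byte, ByVal iStart As Long, iOut As Integer)
        Dim i As Long
        Dim n As Long
        '
        i = iStart
        Do While i <= UBound(bb)
            If bb(i) < 48 Or bb(i) > 57 Then Exit Do    ' 48..57 = Asc("0")..Asc("9").
            n = n * 10& + (bb(i) - 48&)
            If n > 32767& Then Err.Raise 6&             ' 6 = Overflow, value won't fit an Integer.
            i = i + 1&
        Loop
        iOut = n
    End Sub

    Because it works on an index rather than a pointer, a helper like this never needs VarPtr or an API call for the integer fields; only the floating point fields go through atof.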

    --------------

    Read the comments for more info.
    Last edited by Elroy; Mar 7th, 2024 at 12:45 AM.

  28. #28
    Frenzied Member 2kaud's Avatar
    Join Date
    May 2014
    Location
    England
    Posts
    1,169

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    To split a byte array, can VB6 use the C function strtok? It returns a pointer to the next token, delimited by any of the given char(s). The delimiter it finds is replaced by NULL, giving you null-terminated strings to use with other C functions.
    https://learn.microsoft.com/en-us/cp...?view=msvc-170

    Also consider that strtod() is often faster than atof(), if it can be used with VB6.
    https://learn.microsoft.com/en-us/cp...?view=msvc-170
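
    As a rough sketch of how that could look from VB6, assuming the CDecl declares used elsewhere in this thread work on your setup (in the IDE they need the CDecl fix; the demo data and the procedure name are made up for illustration):

    Code:
    Option Explicit

    Private Declare Function strtok CDecl Lib "msvcrt" ( _
                             ByVal pStr As Long, _
                             ByVal pDelims As Long) As Long
    Private Declare Function atof CDecl Lib "msvcrt" ( _
                             ByVal psz As Long) As Double

    Private Sub DemoStrtok()
        Dim bb() As Byte
        Dim delims() As Byte
        Dim pTok As Long
        '
        ' Both buffers must be null-terminated ANSI bytes.
        bb = StrConv("98.7,65.4,32.1" & vbNullChar, vbFromUnicode)
        delims = StrConv("," & vbNullChar, vbFromUnicode)
        '
        ' The first call gets the buffer, later calls pass 0 to continue.
        pTok = strtok(VarPtr(bb(0)), VarPtr(delims(0)))
        Do While pTok <> 0
            Debug.Print atof(pTok)      ' pTok points at a null-terminated token inside bb.
            pTok = strtok(0&, VarPtr(delims(0)))
        Loop
    End Sub

    One thing to watch: strtok treats a run of delimiters as a single separator, so empty CSV fields are skipped rather than returned, which would shift the field count on any line with a blank column.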
    All advice is offered in good faith only. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/

    C++23 Compiler: Microsoft VS2022 (17.6.5)

  29. #29
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Thanks, as always...I'll have to use it now it's done :-P

    One little comment, which might help reduce the UDT size a little...

    Code:
        Pad             As Integer      ' 0 These two bytes aren't used, but the pad will be there whether this is here or not.
        Year            As Integer      ' 2      \ They'll either be here, or after Minute.
        Month           As Byte         ' 4       \ Done this way, the intra-udt will be perfectly packed.
        Day             As Byte         ' 5
        Hour            As Byte         ' 6
        Minute          As Byte         ' 7
        Seconds         As Single       ' 8
    The Y/M/D isn't actually needed. I discard it when processing, as the data is stored WITHIN a file/blob named with the date (so, more correctly, I collect it into a variable...easily doable though). The H:M:S.ms can be converted to a Long by using DateDiff on the H:M:S (to get the number of seconds since "0:00:00"), multiplying by 1000 and adding the ms...or, if it is quicker, by doing the required maths on the H, M and S directly to get the same value.

    With YMD removed that's 4 bytes gone per entry, at the cost of 10 bytes of overall storage for using the date as the name. With HMSms turned into a Long, that's 6 bytes reduced to 4. So the datetime took up 10 bytes but really only needs 4 (a Long). An alternative, if it made any difference speed-wise, is to store the HMS DateDiff as an Integer and keep the ms in another Integer...still 4 bytes, but split into two variables. As you're probably aware, I've been working with this part of the problem for a little while now.

    The code I am using actually indexes the file and calculates where each line with a new day starts, so each processing run is for a specific day's data...and of course I can, in theory, pull the first 10 bytes of input to get the date easy enough.

    I mention this as I personally don't know why you have the padding there and also don't know if this change would affect it or not.

    Also, the Exchange data (Bid and Ask exchange values) are 1-13, so each can be stored in a Byte. That shaves another 2 bytes off the UDT, for a total of 8 bytes saved, making it 20 bytes per entry. As these values fit in 4 bits (under 16), they COULD be stored as the low/high halves of 1 byte, bringing it down to 19 bytes. They could also be discarded altogether, making the eventual UDT 18 bytes (though there is always potential for me to want the exchange data later, I see no need for it unless I wanted to exclude a specific exchange or only use the data from one specific exchange...I chose to just discard it for now).

    Volume I have always divided by 100 and stored in Integer, as I am worried that one day it might be higher than 32767 (I don't think it will ever get higher than 3276700 in such a short time, though in MY code I do set a cap on that and if it is higher it caps off at 3276700)...this value seemingly is always in multiples of 100.

    I can't really comment on any of the other code until I have had a play with it, but that is my feedback on the format of the UDT :-)
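
    The layout described above could be sketched roughly like this. All the names here (CompactRecordType, ToMsOfDay, PackExchanges, ScaleVolume) are made up for illustration, and the member order is my own choice. The time is kept as a single Long of milliseconds, because seconds-since-midnight alone runs up to 86,399 and so would overflow a signed Integer (max 32,767) for a full day.

    Code:
    Option Explicit

    ' Hypothetical compact record, date kept outside (e.g. in the blob name).
    Public Type CompactRecordType
        MsOfDay     As Long     ' 0   milliseconds since 00:00:00
        BidPrice    As Single   ' 4
        AskPrice    As Single   ' 8
        BidVolume   As Integer  ' 12  volume \ 100, capped
        AskVolume   As Integer  ' 14
        Exchanges   As Byte     ' 16  bid exchange in low nibble, ask exchange in high nibble
    End Type

    Public Function ToMsOfDay(ByVal h As Long, ByVal m As Long, ByVal s As Single) As Long
        ' Plain arithmetic instead of DateDiff: hours and minutes to ms, plus fractional seconds.
        ToMsOfDay = (h * 60& + m) * 60000& + CLng(s * 1000!)
    End Function

    Public Function PackExchanges(ByVal bid As Byte, ByVal ask As Byte) As Byte
        ' Both values must be 0..15 to fit a nibble each.
        PackExchanges = (ask And &HF) * 16 Or (bid And &HF)
    End Function

    Public Function ScaleVolume(ByVal lVolume As Long) As Integer
        ' Volume arrives in multiples of 100; cap so it always fits an Integer.
        lVolume = lVolume \ 100&
        If lVolume > 32767& Then lVolume = 32767&
        ScaleVolume = lVolume
    End Function

    The members add up to 17 bytes, but an in-memory array of these will likely still use 20 bytes per element once VB6 pads the UDT for the Long and Single members; the 19 or 18 byte figures would only apply to data written out field by field.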

  30. #30
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by 2kaud View Post
    To split a byte array, can VB6 use the C function strtok? It returns a pointer to the next token, delimited by any of the given char(s). The delimiter it finds is replaced by NULL, giving you null-terminated strings to use with other C functions.
    https://learn.microsoft.com/en-us/cp...?view=msvc-170

    Also consider that strtod() is often faster than atof(), if it can be used with VB6.
    https://learn.microsoft.com/en-us/cp...?view=msvc-170
    I think the problem with both of these, in this context at least, is that both of these require an input STRING, I don't think they will work with a byte array without converting it to a string first (which, when done 53m x5 times, adds up...it's the main reason why it takes my string-based approach 11 minutes to complete the entire 2.5GB file). I could be wrong on the byte array front, of course...if they accept byte arrays and use them as-is, they might be handy!

  31. #31
    Frenzied Member 2kaud's Avatar
    Join Date
    May 2014
    Location
    England
    Posts
    1,169

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    They require a null-terminated array of ASCII chars (bytes) (the same format that atof() requires as a param). Is this a byte-array in VB? I don't use VB.
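
    In VB6 terms that is exactly what a Byte array with a trailing zero byte gives you, so no String conversion should be needed. A minimal sketch, assuming a CDecl declare in the same style as the atof declare used in this thread (the parameter names and demo data are assumptions):

    Code:
    Option Explicit

    Private Declare Function strtod CDecl Lib "msvcrt" ( _
                             ByVal psz As Long, _
                             ByRef pEnd As Long) As Double

    Private Sub DemoStrtod()
        Dim bb() As Byte
        Dim pEnd As Long
        Dim d As Double
        '
        ' Null-terminated ANSI bytes, the same thing a parsed field in the file buffer looks like.
        bb = StrConv("987.56" & vbNullChar, vbFromUnicode)
        '
        d = strtod(VarPtr(bb(0)), pEnd)     ' pEnd receives the address just past the parsed text.
        Debug.Print d
        Debug.Print pEnd - VarPtr(bb(0))    ' Number of bytes consumed.
    End Sub

    The end pointer is the practical difference from atof: it tells you where the number stopped, so a malformed field (nothing consumed) can be detected instead of silently coming back as zero.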

  32. #32
    PowerPoster
    Join Date
    Feb 2015
    Posts
    2,797

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Elroy View Post
    Trick, that's not really a fair test though, as SmUX2k or I would be bouncing around in the Byte Array, fetching different pieces of data. So it's probably best to leave the VarPtr inside the loop, to be fair.
    When you pass an array item to an API (VarPtr), VB locks the array (SafeArrayLock), calls the API (VarPtr), then unlocks the array (SafeArrayUnlock). If you need more performance, you can pass a pointer variable instead: get the pointer to the first element of the array once, then add the index to that pointer, which is faster:

    Code:
    Option Explicit
    
    Private Declare Function memchr CDecl Lib "msvcrt" ( _
                             ByRef p As Any, _
                             ByVal b As Byte, _
                             ByVal lCount As Long) As Long
    Private Declare Function atof CDecl Lib "msvcrt" ( _
                             ByVal psz As Long) As Double
    Private Declare Function QueryPerformanceCounter Lib "kernel32" ( _
                             ByRef lpPerformanceCount As Currency) As Long
    Private Declare Function QueryPerformanceFrequency Lib "kernel32" ( _
                             ByRef lpFrequency As Currency) As Long
    
    Private Sub Parse( _
                ByRef bAnsi() As Byte, _
                ByRef pPtrs() As Long)
        Dim lCount      As Long
        Dim lRemain     As Long
        Dim pStartPos   As Long
        Dim pPrevPos    As Long
        Dim pNextPos    As Long
    
        pStartPos = VarPtr(bAnsi(0))
        pPrevPos = pStartPos
        lRemain = UBound(bAnsi) + 1
        
        Do
        
            pNextPos = memchr(ByVal pPrevPos, &H3B, lRemain)
            
            If pNextPos Then
                bAnsi(pNextPos - pStartPos) = 0
            End If
            
            If lCount Then
                If lCount > UBound(pPtrs) Then
                    ReDim Preserve pPtrs(lCount * 2 - 1)
                End If
            Else
                ReDim pPtrs(99)
            End If
            
            pPtrs(lCount) = pPrevPos
            
            lRemain = lRemain - (pNextPos - pPrevPos + 1)
            pPrevPos = pNextPos + 1
            lCount = lCount + 1
    
        Loop While pNextPos
    
        ReDim Preserve pPtrs(lCount - 1)
    
    End Sub
    
    Private Sub Form_Load()
        Dim bAnsi() As Byte
        Dim pPtrs() As Long
        Dim dVal    As Double
        Dim lIndex  As Long
        Dim cFreq   As Currency
        Dim cTime1  As Currency
        Dim cTime2  As Currency
    
    '    Dim dValue As Double
    '
    '    Open "C:\Temp\test_numbers.txt" For Output As 1
    '
    '    For lIndex = 0 To 10000000
    '
    '        dValue = Rnd * 1000000
    '
    '        Print #1, Trim$(Str(dValue)); ";";
    '
    '    Next
    
        AutoRedraw = True
        
        Open "C:\temp\test_numbers.txt" For Binary Access Read As 1
        ReDim bAnsi(LOF(1)) ' // Ensure null terminating
        Get 1, , bAnsi
        Close 1
        
        Parse bAnsi, pPtrs
        
        QueryPerformanceFrequency cFreq
        
        ' // Preloading
        atof ByVal "0"
        
        QueryPerformanceCounter cTime1
        
        For lIndex = 0 To UBound(pPtrs)
            dVal = atof(pPtrs(lIndex))
        Next
        
        QueryPerformanceCounter cTime2
        
        Print Format$((cTime2 - cTime1) / cFreq, "0.00000") & " sec. "   ' // Counter ticks / frequency = seconds
        
        ' // Change to indexes
    
        For lIndex = UBound(pPtrs) To 0 Step -1
            pPtrs(lIndex) = pPtrs(lIndex) - pPtrs(0)
        Next
        
        QueryPerformanceCounter cTime1
        
        For lIndex = 0 To UBound(pPtrs)
            dVal = atof(VarPtr(bAnsi(pPtrs(lIndex))))
        Next
        
        QueryPerformanceCounter cTime2
        
        Print Format$((cTime2 - cTime1) / cFreq, "0.00000") & " sec. "   ' // Counter ticks / frequency = seconds
        
    End Sub
    The first cycle:



    The second cycle:


  33. #33
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by SmUX2k View Post
    I can't really comment on any of the other code until I have had a play with it, but that is my feedback on the format of the UDT :-)
    Had a quick play with your example code (after working out how to implement the CDeclfix) and I also added GetTickCount so I could properly benchmark it...8.3s with your code. By my calculations, it would take a little under 4 minutes (227s, so 3m47s) with 2.5GB of data which has 56m lines of data in it...so that is pretty good compared to my 11 minutes! I of course have to implement the block loading part (shouldn't be difficult to add it into the code) so it can process the entire 2.5GB file...Not sure when I'll do it, as I usually have to be awake for a few hours before I get into coding (got 5 energetic kittens running around, who get in my way in the cutest way possible), so expect me to report back on it later.

    Oh, and for the record, you (or someone) mentioned the file feeding process. I have a large folder full of ZIP files, and each of those ZIPs is usually 12 files (1 per month) for a specific year of the stock data. I've written code that iterates through the folder getting the ZIP names, then looks inside to get the filenames. I also have a function that takes the ZIPfile/filename formatted data (essentially it's the zip file location followed by a > and the filename within the zip) and extracts that file to the RAM drive. Obviously the RAM drive is only 4GB so won't be able to hold all the files, I haven't got around to that bit yet but will store it on HD instead when I can't store to RAM. There are 50,000+ files waiting to be processed by this code, and it should be a simple process to iterate through the files.

    As you might be able to work out, 11m down to 4m means 3 months of processing becomes 1 month...I don't have a definite number on how long it would actually take to process, but I can take the filesize of each file when generating the file list and estimate based on 1GB being ~1.5 minutes (which works out about 1TB of data every 24 hours, and 1GB would take 90.8 seconds so 1.5m is close enough) which probably adds up to even less than a month (288GB of zipped files to process...safe to assume ~1:10 compression so 3TB of data...3 days) :-)
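
    The arithmetic behind that estimate, written out as a small sketch (the constant and function names are made up; the 227 seconds for 2.5GB and the ~1:10 compression guess are the figures quoted above):

    Code:
    Option Explicit

    ' 227 seconds measured for 2.5GB gives 90.8 seconds per GB.
    Private Const SECONDS_PER_GB As Double = 227# / 2.5

    Public Function EstimatedDays(ByVal dGigabytes As Double) As Double
        ' 86400 seconds in a day.
        EstimatedDays = dGigabytes * SECONDS_PER_GB / 86400#
    End Function

    Private Sub ShowEstimate()
        ' 288GB of ZIPs at an assumed 1:10 compression is about 2880GB of text.
        Debug.Print Format$(EstimatedDays(288# * 10#), "0.0") & " days"
    End Sub

    Summing the real uncompressed sizes from the ZIP directories while building the file list would replace the 1:10 guess with an actual figure.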

    Also, my PC isn't exceptionally high end...it's a mobile Ryzen 9 5900HX (the Minisforum HX90) with 8 cores/16 threads and 32GB of memory...perfect for parallel processing multiple files, I am sure you are thinking, and so am I...as long as each version of the app is pointing at a different file, as my implementation of HBF locks the file until it is finished with, I think I could maybe process everything in 12 hours...I probably won't bother, as in the long term the data will be downloaded one day at a time and those days probably won't need parallel processing, but it's there as an option.

    Oh, and I wasn't suggesting you DO the fixes I mentioned above (on the UDT), just pointing out how I did it...apart from knowing why the padding is at the start, I should be fine with everything :-)

  34. #34
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,454

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by SmUX2k View Post
    ...8.3s with your code.
    If you want to speed this up by about a factor of 8 (~1sec, when native-compiled, all options checked) -
    you can use the superfast CSV-Parser-Class of the RC6 (which requires much less code as well)...

    Code:
    Option Explicit
    
    Implements ICSVCallback
    Private Csv As cCSV
    
    Private Sub Form_Click()
      Cls
     
      New_c.Timing True
        Set Csv = New_c.Csv 'create a new (CSV)TextParser-instance
            Csv.ParseFile "C:\temp\big_file.txt", Me
            'Csv.ParseFile "C:\temp\TSLA_2022_01.txt", Me
      Print "Parsing finished after" & New_c.Timing
      
      Caption = Csv.RowsParsed & " Rows processed"
    End Sub
    
    Private Function ICSVCallback_NewValue(ByVal RowNr As Long, ByVal ColNr As Long, B() As Byte, ByVal BValStartPos As Long, ByVal BValLen As Long) As Long
      Dim Y%, M%, D%, H%, N%, S#, BidP#, BidV%, BidE%, AskP#, AskV%, AskE%
      
      Select Case ColNr
        Case 0 'slightly more extended parsing on the IsoDate-Column(0) (extracting: Y, M, D, H, N, S)
          Y = Csv.parseNumber(B, BValStartPos, 4)
          M = (B(BValStartPos + 5) - 48) * 10 + B(BValStartPos + 6) - 48
          D = (B(BValStartPos + 8) - 48) * 10 + B(BValStartPos + 9) - 48
          H = (B(BValStartPos + 11) - 48) * 10 + B(BValStartPos + 12) - 48
          N = (B(BValStartPos + 14) - 48) * 10 + B(BValStartPos + 15) - 48
          S = Csv.parseNumber(B, BValStartPos + 17, BValLen - 17)
        
        'now the 3 Bid-Columns: BidP, BidV, BidE
        Case 1: BidP = Csv.parseNumber(B, BValStartPos, BValLen)
        Case 2: BidV = Csv.parseNumber(B, BValStartPos, BValLen)
        Case 3: BidE = Csv.parseNumber(B, BValStartPos, BValLen)
        
        'and the 3 Ask-Columns: AskP, AskV, AskE
        Case 4: AskP = Csv.parseNumber(B, BValStartPos, BValLen)
        Case 5: AskV = Csv.parseNumber(B, BValStartPos, BValLen)
        Case 6: AskE = Csv.parseNumber(B, BValStartPos, BValLen)
      End Select
    End Function
    And in case, you want to import your huge file into an SQLite-FileDB first -
    (in about 3 seconds instead of 1sec, due to DB-File-Writing) - the code for that comes below:
    Code:
    Option Explicit
    
    Implements ICSVCallback
    Private Csv As cCSV, Cnn As cConnection, Cmd As cCommand
    
    Private Sub Form_Click()
      Cls
      
      Set Cmd = Nothing: Set Cnn = Nothing 'cleanup SQLite Command- and Cnn-Objects
      
      Const DBFile = "c:\temp\stock_import.db3"
      If New_c.FSO.FileExists(DBFile) Then New_c.FSO.DeleteFile DBFile 'delete the potentially already existing DB-File
      
      New_c.Timing True
      Set Cnn = New_c.Connection(DBFile, DBCreateNewFileDB)
          Cnn.Execute "Create Table T(Y Int, M Int, D Int, H Int, N Int, S Float, BidP Float, BidV Int, BidE Int, AskP Float, AskV Int, AskE Int)"
      Set Cmd = Cnn.CreateCommand("Insert Into T Values(?,?,?,?,?,?,?,?,?,?,?,?)")
      
      Cnn.BeginTrans
        Set Csv = New_c.Csv 'create a new (CSV)TextParser-instance
            Csv.ParseFile "C:\temp\big_file.txt", Me
            'Csv.ParseFile "C:\temp\TSLA_2022_01.txt", Me
      Cnn.CommitTrans
      
      Print "CSV-SQLite-Import finished after" & New_c.Timing
      
      Caption = Cnn.GetRs("Select Count(*) From T")(0) & " Records transferred into SQLite"
    End Sub
    
    Private Function ICSVCallback_NewValue(ByVal RowNr As Long, ByVal ColNr As Long, B() As Byte, ByVal BValStartPos As Long, ByVal BValLen As Long) As Long
      'the 12 Col-Names of Table T --> Y, M, D, H, N, S, BidP, BidV, BidE, AskP, AskV, AskE
      Select Case ColNr
        Case 0 'slightly more extended parsing on the IsoDate-Column(0) (extracting: Y, M, D, H, N, S)
          Cmd.SetInt32 1, Csv.parseNumber(B, BValStartPos, 4)
          Cmd.SetInt32 2, (B(BValStartPos + 5) - 48) * 10 + B(BValStartPos + 6) - 48
          Cmd.SetInt32 3, (B(BValStartPos + 8) - 48) * 10 + B(BValStartPos + 9) - 48
          Cmd.SetInt32 4, (B(BValStartPos + 11) - 48) * 10 + B(BValStartPos + 12) - 48
          Cmd.SetInt32 5, (B(BValStartPos + 14) - 48) * 10 + B(BValStartPos + 15) - 48
          Cmd.SetDouble 6, Csv.parseNumber(B, BValStartPos + 17, BValLen - 17)
        
        'now the 3 Bid-Columns: BidP, BidV, BidE
        Case 1: Cmd.SetDouble 7, Csv.parseNumber(B, BValStartPos, BValLen)
        Case 2: Cmd.SetInt32 8, Csv.parseNumber(B, BValStartPos, BValLen)
        Case 3: Cmd.SetInt32 9, Csv.parseNumber(B, BValStartPos, BValLen)
        
        'and the 3 Ask-Columns: AskP, AskV, AskE
        Case 4: Cmd.SetDouble 10, Csv.parseNumber(B, BValStartPos, BValLen)
        Case 5: Cmd.SetInt32 11, Csv.parseNumber(B, BValStartPos, BValLen)
        Case 6: Cmd.SetInt32 12, Csv.parseNumber(B, BValStartPos, BValLen)
        
        Cmd.Execute
      End Select
    End Function
    Both Code-Snippets can be pasted into a virginal VB-Form (only a reference to RC6 is needed).

    In your shoes, I'd go with the SQLite-Imports of your huge CSV-File first -
    and then perform fast queries against it, to get certain interesting "ordered SubSets" into Recordsets,
    which you can then serialize and compress (e.g. to produce "daily streams per stock").

    The SQLite-DB-File (despite importing into 12 Integer and Float-typed Columns as: Y(ear), M(onth),... a.s.o.) -
    is roughly equal in uncompressed size, to the original CSV-Files the data was imported from.

    Olaf

  35. #35

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by SmUX2k View Post
    apart from knowing why the padding is at the start, I should be fine with everything
    UDT padding is a bit like wizardry. But I'll try and outline it.

    The basic rule is: any particular "item" in it has a size, and each item must start on a memory address that's a multiple of its size (VB6 caps this at four bytes, so a Double only needs 4-byte alignment). Also, any full UDT (in VB6) must start on a 4-byte boundary, and must also take a multiple of 4 bytes. So, a full UDT that's 23 bytes will always actually take 24 bytes.

    And that UDT above, if that padding wasn't at the beginning, it would insert (pad) two bytes after "Minute" so that the 4 byte Single for Seconds could start on a multiple-of-four memory address. I thought it was better to pad at the beginning, rather than have that somewhat arbitrary padding in the middle. And also, the whole thing needed to be a multiple-of-four bytes long, so it was getting padded regardless, and I just took control of it.
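
    A quick way to see this for yourself is to compare LenB and the offset of Seconds for the two layouts. This is only a sketch with made-up type names, trimmed to the date/time part of the record. The pad is declared As Integer here, since a fixed-length String * 2 occupies four bytes in memory (two Unicode characters) rather than two.

    Code:
    Option Explicit

    Private Type WithPadType
        Pad     As Integer      ' 0  explicit pad, under our control
        Year    As Integer      ' 2
        Month   As Byte         ' 4
        Day     As Byte         ' 5
        Hour    As Byte         ' 6
        Minute  As Byte         ' 7
        Seconds As Single       ' 8
    End Type

    Private Type NoPadType
        Year    As Integer      ' 0
        Month   As Byte         ' 2
        Day     As Byte         ' 3
        Hour    As Byte         ' 4
        Minute  As Byte         ' 5
                                ' 6  the compiler should insert two hidden pad bytes here
        Seconds As Single       ' 8
    End Type

    Private Sub ShowUdtSizes()
        Dim a As WithPadType
        Dim b As NoPadType
        '
        ' In-memory sizes, padding included.
        Debug.Print LenB(a), LenB(b)
        ' Offset of Seconds from the start of each UDT.
        Debug.Print VarPtr(a.Seconds) - VarPtr(a), VarPtr(b.Seconds) - VarPtr(b)
    End Sub

    If the rule holds, both types should report the same size and the same offset for Seconds; the only difference is whether the two spare bytes sit at the front or in the middle.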

    -----------------

    Also, I need to re-read The Trick's post to see if I can eke a bit more speed out of this thing, possibly taking control of the SafeArray locking.

    -----------------

    Also, I was thinking about this thing as I was waking up this morning. If I continue to be motivated (after reading some morning news), I might put some "file read windowing" in so it can handle any sized file.

    Also, I thought I'd write it all out to a new MDB database (managed by the DAO, as that's what I'm quickest with to get things done, and I also don't have any SQL servers installed on my new box and really don't want to).

    I'll read the above again, but I might also wrap some of wqweto's ZIP work into it if I can understand your situation.

    If possible, maybe just attach one of these ASCII ZIPped files to a post here. Or IDK, maybe share it on Google Drive and send me an IM with the link. This thing should be fairly easy to just fully automate, and just let it crunch on files all day long, and, in my conceptualization, convert each ZIP file into an MDB file. (People will "ding" me for using MDB files, but hey ho. If I process each of your ZIP files separately, the MDB files get nowhere near their 2GB limit, being binary.)
    Last edited by Elroy; Mar 7th, 2024 at 09:52 AM.

  36. #36
    Fanatic Member
    Join Date
    Apr 2021
    Posts
    616

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by Schmidt View Post
    If you want to speed this up by about a factor of 8 (~1sec, when native-compiled, all options checked) -
    you can use the superfast CSV-Parser-Class of the RC6 (which requires much less code as well)...

    (code snipped)

    Both Code-Snippets can be pasted into a virginal VB-Form (only a reference to RC6 is needed).

    In your shoes, I'd go with the SQLite-Imports of your huge CSV-File first -
    and then perform fast queries against it, to get certain interesting "ordered SubSets" into Recordsets,
    which you can then serialize and compress (e.g. to produce "daily streams per stock").

    The SQLite-DB-File (despite importing into 12 Integer and Float-typed Columns as: Y(ear), M(onth),... a.s.o.) -
    is roughly equal in uncompressed size, to the original CSV-Files the data was imported from.
    I did consider the CSV parser, but as usual had no idea how it worked and there's little or no documentation :-P

    That said, I have RC6 already included in the project as I am using both SQLite and the compression functions...it may be worth trying out the CSV parser as an option...I see it has its own loading process, and have to wonder if (1) it is faster than HugeBinaryFile and (2) if it can handle files as large as 6GB? Also, if it isn't faster than HugeBinaryFile (which, I should point out, I have already compiled as a DLL so it is running at compiled speeds), is there any way to load the data in as a byte array (or string?) and send the data to the CSV parser?

    I went with my own processing as I knew exactly what output I needed...I wanted to build up an array of the values for the day's data and to output those arrays *separately* into different streams. The timestamp, for instance, for 2022-01-01 would be stored in a "file" (blob) called (with the reference filename in another column on same row of SQL DB) "2022-01-01-TS.txt". The reason for this is twofold, firstly because if I store by column rather than row it is much easier to compress because the data clearly has a pattern (each subsequent value would be incrementally larger by a tiny amount, in the case of the timestamp...and I confirm this in advance by sorting the data before saving) but also so I can load in ONLY the data I need into the arrays and minimise load times and memory overheads.
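
    To make the column-wise idea concrete, here is a minimal sketch of splitting an array of records into one array per field, ready to be written out as separate streams. The TickType record and the procedure name are made up and trimmed to two fields; the real record has more columns.

    Code:
    Option Explicit

    Private Type TickType           ' Hypothetical, trimmed-down record.
        MsOfDay  As Long
        BidPrice As Single
    End Type

    Private Sub SplitColumns(ticks() As TickType, tsOut() As Long, bidOut() As Single)
        ' Copies each field of the record array into its own array,
        ' so each column can be compressed and stored as a separate stream.
        Dim i As Long
        '
        ReDim tsOut(LBound(ticks) To UBound(ticks))
        ReDim bidOut(LBound(ticks) To UBound(ticks))
        For i = LBound(ticks) To UBound(ticks)
            tsOut(i) = ticks(i).MsOfDay
            bidOut(i) = ticks(i).BidPrice
        Next
    End Sub

    Loading only the columns a given analysis needs then means reading and decompressing just those streams.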

    I did consider importing the data into SQL to pull it out as streams, but it seemed an overly simple option and I felt there would be something more efficient.

  37. #37

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Ohhh, and just FYI, my new box isn't "bleeding edge" as that just gets too expensive. It's an Intel i7 12th Gen with a 2TB SSD, and 32GB of memory running at 4GB/s (it's capable of 6GB/s, but I need to tweak my BIOS to get that, and haven't taken the time). It also has an NVidia 4060 GPU, but that shouldn't matter for these purposes. Asus motherboard & box, as I love their stuff.

    Excluding the GPU, it doesn't sound like it's much different from yours.

  38. #38
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,454

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by SmUX2k View Post
    I did consider the CSV parser, but as usual had no idea how it worked and there's little or no documentation :-P
    If you look at the code I've posted - it should be clear how to use it:
    0) place the CsvCallback-Function in your Form- or Class-module
    1) create an instance: Set CSV = New_c.CSV
    2) call the parsing-method: CSV.ParseFile <SomeCSVFilePath>
    3) place appropriate Code inside the Callback: Select Case ColNr ... to handle each Cell-Value


    Quote Originally Posted by SmUX2k View Post
    (1) it is faster than HugeBinaryFile
    (2) if it can handle files as large as 6GB?
    Yes to both...

    It is the fastest thing out there, to handle your use-case -
    and it needs the least amount of user-code, to handle it...

    E.g. you don't have to "manually handle smaller ByteArray-chunks" -
    the callback already gives you a proper ByteArray, along with Offset and Length for each parsed "Column-Cell".
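
    As a rough sketch of what handling such a cell could look like: the function below converts a slice of a Byte array (given by an offset and a length) straight to a Double, without building a String. BytesToDouble and its parameter names are hypothetical; this is not the RC6 callback signature, only an illustration of working with a (ByteArray, Offset, Length) triple. It assumes plain ASCII digits with an optional leading "-" and at most one ".", and no exponent notation.

```vb
' Hypothetical helper, not part of RC6: parses b(Offset) .. b(Offset + Length - 1).
' Handles an optional leading "-", digits, and a single "."; stops at anything else.
Function BytesToDouble(b() As Byte, ByVal Offset As Long, ByVal Length As Long) As Double
    Dim n As Double         ' all digits accumulated as one whole number
    Dim divisor As Double   ' 0 until "." is seen, then 10 ^ (number of decimal digits)
    Dim i As Long
    Dim neg As Boolean
    If Length <= 0 Then Exit Function
    i = Offset
    If b(i) = 45 Then neg = True: i = i + 1      ' 45 = "-"
    For i = i To Offset + Length - 1
        Select Case b(i)
        Case 48 To 57                             ' "0" .. "9"
            n = n * 10 + (b(i) - 48)
            If divisor Then divisor = divisor * 10
        Case 46                                   ' "."
            If divisor Then Exit For              ' a second "." ends the number
            divisor = 1
        Case Else
            Exit For
        End Select
    Next
    If divisor Then n = n / divisor
    If neg Then n = -n
    BytesToDouble = n
End Function
```

    For example, with b = StrConv("12.5,-3.25", vbFromUnicode), BytesToDouble(b, 0, 4) gives 12.5 and BytesToDouble(b, 5, 5) gives -3.25.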

    Olaf

  39. #39

    Thread Starter
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,909

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Quote Originally Posted by SmUX2k View Post
    The Y/M/D isn't actually needed, I discard that when processing as it is stored WITHIN a file/blob named with the date (so, more correctly, I collect it into a variable...easily doable though). The H:M:S.ms can be converted to a Long using DateDiff on the H:M:S (to get a numerical value of the number of seconds since "0:00:00") multiplied by 1000 with the ms added to it...or, if it is quicker, could just as easily perform the required maths on the H, M and S to get the same value if it is faster. With YMD removed that's 4 bytes gone per entry, and it adds 10 bytes to the overall storage by using it as the name. With HMSms turned into a Long that's 6 bytes reduced to 4. The datetime took up 10 bytes but really only needs to be 4 (Long). An alternative here, if it was in any way different speed-wise, is to store the HMS datediff as an Integer and then keep the ms in another Integer...still 4 bytes, but split into two variables. As you're probably aware, I've been working with this part of the problem for a little while now.
    I'd stay away from DateDiff, as that's going to slow you down. I'll delete the Y-M-D fields though, and also make a Single out of the time (as that makes more sense to me, and it'll preserve the floating point nature of the seconds).
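
    For reference, the DateDiff-free arithmetic from the quote can be sketched like this, assuming the H, M, S and ms fields have already been parsed into Longs (TimeToMs is a hypothetical name):

```vb
' Hypothetical helper: milliseconds since midnight from already-parsed fields.
' The maximum (23:59:59.999) is 86,399,999, which fits comfortably in a Long.
Function TimeToMs(ByVal h As Long, ByVal m As Long, ByVal s As Long, ByVal ms As Long) As Long
    TimeToMs = ((h * 60 + m) * 60 + s) * 1000 + ms
End Function
```

    For example, 09:30:15.250 becomes ((9 * 60 + 30) * 60 + 15) * 1000 + 250 = 34215250. One caveat if the time is kept as seconds in a Single instead: a Single has a 24-bit mantissa, so for values above 65,536 seconds (roughly after 18:12) adjacent representable values are about 8 ms apart, and individual milliseconds can no longer be told apart. A Long in milliseconds (also 4 bytes) or a Double avoids that.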

  40. #40
    Hyperactive Member
    Join Date
    Jul 2021
    Posts
    267

    Re: [RESOLVED] Converting ASCII Byte Array to Single or Double ... Fast

    Regarding the original task which is
    Converting ASCII Byte Array to Single or Double ... Fast
    This may come as a stunning surprise, but the DIY VB function below is about 6.5 times faster than atof in a compiled (and optimized) exe!
    Code:
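    Function ArrToDouble(a() As Byte) As Double
        Dim n As Double     ' all digits accumulated as one whole number
        Dim mul As Double   ' 0 until "." is seen, then 10 ^ (decimal digits); Double so it cannot overflow a Long
        Dim i As Long
        ' Stop at the end of the array, or at the first byte that is not a digit or the first "."
        For i = LBound(a) To UBound(a)
            Select Case a(i)
            Case 46: If mul Then Exit For Else mul = 1    ' "."; a second one ends the number
            Case Is < 48: Exit For                         ' includes a null terminator
            Case Is > 57: Exit For
            Case Else: n = n * 10 + a(i) - 48: If mul Then mul = mul * 10
            End Select
        Next
        If mul Then ArrToDouble = n / mul Else ArrToDouble = n
    End Function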
    Also, this function exits when a non-numeric character is encountered, so no need to replace commas with nulls.
    Last edited by Dry Bone; Mar 7th, 2024 at 10:30 AM.
