
Thread: How to efficiently divide a string array into several arrays under certain conditions

  1. #1

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2014
    Posts
    341

    How to efficiently divide a string array into several arrays under certain conditions

    Code:
    Option Explicit
    Dim stopped As Boolean
    Private Sub Command1_Click()
        Me.Command2.Enabled = True
        Me.Command1.Enabled = False
        stopped = False
        Dim str As String, tpstr As String
        Dim strArray() As String
        Dim i As Long, j As Long, r As Long, d As Long
        Dim delimiter As String
        '------------- Get Source Text from File Named Text2.Text -------------
        Open App.Path & "\" & Text2.Text For Input As #1
        str = StrConv(InputB(LOF(1), 1), vbUnicode)
        Close #1
        '------------- Lines Per File Number defined by Text1.Text -------------
        r = Int(Text1.Text)
        '------------- Text of Each Line -------------
        strArray = Split(str, vbCrLf)
        i = UBound(strArray)
        '------------- Initialize ProgressBar -------------
        ProgressBar1.Max = i + 1
        ProgressBar1.Min = 0
        ProgressBar1.Value = 0
        '------------- File Index -------------
        d = 1
        '------------- Processing -------------
    For j = 0 To i
        DoEvents
        If stopped = True Then
            MsgBox "Dividing Not Finished!"   'stopped by the user before the end
            Exit For
        End If
        tpstr = tpstr & strArray(j) & vbCrLf      'this needs to be optimized!!! costing huge resources!!!
        ProgressBar1.Value = ProgressBar1.Value + 1
        If (j + 1) Mod r = 0 Then                 'a full block of r lines is ready
            Open App.Path & "\" & "D" & d & ".txt" For Output As #2
            Print #2, tpstr
            Close #2
            d = d + 1
            tpstr = ""
        End If
    Next

    '------------- Write any remaining lines and report completion -------------
    If Not stopped Then
        If Len(tpstr) > 0 Then
            Open App.Path & "\" & "D" & d & ".txt" For Output As #2
            Print #2, tpstr
            Close #2
        End If
        MsgBox "Dividing Finished!"
    End If
    
        Me.Command1.Enabled = True
        Me.Command2.Enabled = False
    
    End Sub
    
    Private Sub Command2_Click()
        stopped = True
    End Sub
    I know the problem is the string concatenation inside a huge loop:

    tpstr = tpstr & strArray(j) & vbCrLf 'this needs to be optimized!!! costing huge resources!!!
    Last time I asked almost the same question here, and the best answer was probably to use the Join function. But this time I need to get several arrays, as opposed to the single string that Join returns.

    The source text file is around 100 megabytes, containing more than 1,000,000 lines, and it needs to be divided into a number of files based on a predesignated "length" (lines per file) for each of them.
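    To make the goal a bit more concrete, this untested sketch (reusing the variables declared in the code above) is roughly the direction I am thinking of - one Join per output file instead of growing tpstr line by line:

    Code:
    Dim chunk() As String
    Dim k As Long, m As Long, fNum As Integer
    d = 1
    j = 0
    Do While j <= i
        k = i - j + 1                       'lines still left
        If k > r Then k = r                 'at most r lines per output file
        ReDim chunk(0 To k - 1)
        For m = 0 To k - 1
            chunk(m) = strArray(j + m)      'each line is copied exactly once
        Next
        fNum = FreeFile
        Open App.Path & "\D" & d & ".txt" For Output As #fNum
        Print #fNum, Join(chunk, vbCrLf)
        Close #fNum
        d = d + 1
        j = j + k
    Loop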

    And the form looks like this:

    [Screenshot: qq.jpg - the form]

  2. #2
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: How to efficiently divide a string array into several arrays under certain condit

    looks like something that should be done using copymemory API

    i am sure there will be many examples in this forum or google
    http://www.vbforums.com/showthread.p...ltiple-arrays-)
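    something along these lines maybe (untested sketch, the declares go at module level and WriteChunk and its parameters are just made-up names):
    Code:
    Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
        (Destination As Any, Source As Any, ByVal Length As Long)
    Private Declare Sub ZeroMemory Lib "kernel32" Alias "RtlZeroMemory" _
        (Destination As Any, ByVal Length As Long)

    'move the 4-byte BSTR pointers of Count lines into a temporary array, use it,
    'then zero the temporary slots so the same strings are not freed twice
    '(no error handling: an error between CopyMemory and ZeroMemory would double-free)
    Private Sub WriteChunk(Src() As String, ByVal StartIdx As Long, ByVal Count As Long, FileName As String)
        Dim Part() As String, fNr As Integer
        ReDim Part(0 To Count - 1)
        CopyMemory Part(0), Src(StartIdx), Count * 4&   'copy only the string pointers
        fNr = FreeFile
        Open FileName For Output As #fNr
        Print #fNr, Join(Part, vbCrLf)
        Close #fNr
        ZeroMemory Part(0), Count * 4&                  'Src keeps ownership of the strings
    End Sub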
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  3. #3
    Frenzied Member
    Join Date
    Jun 2006
    Posts
    1,098

    Re: How to efficiently divide a string array into several arrays under certain condit

    Simplest solution: Avoid the string concatenation by simply Printing each line to its new file.

    Best solution: Use Instr to find each line break, then use Mid to get each chunk from the source string.

    Code:
    Dim strSource As String, strChunk As String
    Dim StartPos As Long, Pos As Long, LineCount As Long
    Dim MaxLines As Long, FileNum As Long, ff As Integer
    ff = FreeFile
    Open App.Path & "\" & Text2.Text For Input As #ff
    strSource = Input$(LOF(ff), ff)    'input from file
    Close #ff
    MaxLines = Int(Text1.Text)
    FileNum = 1
    StartPos = 1
    Pos = InStr(1, strSource, vbCrLf)
    Do While Pos > 0
      LineCount = LineCount + 1
      If LineCount = MaxLines Then
        strChunk = Mid$(strSource, StartPos, Pos - StartPos)
        ff = FreeFile
        Open App.Path & "\D" & FileNum & ".txt" For Output As #ff
        Print #ff, strChunk;
        Close #ff
        FileNum = FileNum + 1
        LineCount = 0
        StartPos = Pos + 2
      End If
      Pos = InStr(Pos + 2, strSource, vbCrLf)
    Loop
    If StartPos <= Len(strSource) Then    'flush the final partial chunk
      ff = FreeFile
      Open App.Path & "\D" & FileNum & ".txt" For Output As #ff
      Print #ff, Mid$(strSource, StartPos);
      Close #ff
    End If
    Last edited by Logophobic; Dec 20th, 2014 at 08:19 AM. Reason: Added code

  4. #4
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: How to efficiently divide a string array into several arrays under certain condit

    A string array means an array of strings, like the lines of a paragraph. A vbCrLf is a paragraph break, and it is only needed when the whole text lives in one string.
    When we read a file line by line we use Line Input. (This is not for Unicode text, but I have a Unicode method - reading the file in binary mode behind a Line Input-like interface - in the M2000 source, so you can find it and take it from there.)
    So we feed the arrays with strings that carry no special characters on them. The real problem, then, is how to define multiple arrays at run time.
    This can be done by using a class. Why? Because a class is an object prototype: from that prototype we can create new objects, and each object gets its own members (arrays and variables) plus methods and properties. Methods are like subroutines, but they can work with the arrays and the other members of that specific object, not of the prototype. So we can have two or more objects with the same set of arrays and variables but with different array bounds (upper and lower) and different values. That difference is what makes an object something that exists, that can change, and that can die. "Die" means that no reference to the object exists any more, so the system erases it.
    So we put all our objects in a Collection - that is mainly what collections are for. Data has no meaning without a key on it, and a Collection is a structure that holds items, each consisting of one object and one key. We can refer to an object inside a collection without extracting it, simply by using Set to take a reference, and we can drop that reference again by assigning the Nothing object (Set A = Nothing). A Collection holds references to objects too, so we can take an object from a collection and remove its key: no data is moved, we just keep one reference in a variable and delete the reference held by the collection item.

    Playing with references matters, because an object can hold references in its own variables. The bad situation is a circular reference: C holds a reference to B, B holds a reference to A, and A holds a reference back to C. If no variable in our main code holds a reference to such an object, it never dies, and our application cannot terminate. Objects caught in a cycle never get their Terminate event, so setting member references to Nothing inside Terminate does not help - that event is never activated. We have to do something more drastic: give each class a method, prepared by us and called Shutdown (a meaningful name), that sets all of its internal references to Nothing, and call it before releasing the object (ourObject.Shutdown : Set ourObject = Nothing). In a circular reference this does not guarantee that the object dies immediately, but it gains the ability to die, because eventually all the other objects get their Shutdown call too, so every reference back to the first object can be set to Nothing and it can die.

    Making a class is very easy.
    To create a new instance of a class - to actually get memory for it, in simple words - you can do it in one of three ways (a short example follows below):
    1) Use Dim with New, like Dim A As New myClass.
    2) Use Dim A As myClass without creating anything yet; this only says that A is a reference of type myClass. Later in your code you set the reference: Set A = New myClass.
    3) Use the generic form Dim A As Object (you can also write Dim A As Variant, or plain Dim A, which means As Variant), and then Set A = New myClass. Note that As Object is not the same as Variant, because an Object variable can only hold references.
    In every case only a reference goes into A. The object itself lives in memory, not in any variable, not even in a variable of type myClass; variables hold only references. That is the VB way of using pointers, but instead of offsets from a pointer we use the names of properties and public variables, and better still, these can be functions. We can also make prototype classes as interfaces: any new class can implement an interface simply by implementing all of its properties and methods, so we can declare reference variables of the interface type and place in them objects of any class that implements that basic class.
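    For example (myClass here is just a placeholder name for any class module):
    Code:
    Dim A As New myClass        '(1) auto-instancing: the object is created on first use
    Dim B As myClass            '(2) a typed reference only...
    Set B = New myClass         '    ...the instance is created here
    Dim C As Object             '(3) late-bound reference (Dim C As Variant also works)
    Set C = New myClass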

    So for your arrays you can have a class that wraps one string array: a method to add an item, a property to read or write an item by index, and a Shutdown method (perhaps it just erases the array). You can keep those objects in a Collection, or, if you don't like that, make a second class built like the first one but holding objects instead of strings; in its Shutdown method you have to walk through the objects, call their Shutdown, set each one to Nothing, and when finished erase the array. Maybe an array without a class is much the same, but either way the shutdown routine needs a seat in your module.
    Writing out the classes with string arrays is easy: just add a method that takes only the index and the file number and uses Print #FileNumber, arr$(Index) inside - that puts a vbCrLf at the end for you. A rough sketch of such a class is below.
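    Something like this, as an untested sketch (a class module named, say, clsLines):
    Code:
    'clsLines - wraps one dynamic string array
    Private arr$()
    Private mCount As Long

    Public Sub AddItem(ByVal Text As String)
        If mCount = 0 Then ReDim arr$(0 To 15)
        If mCount > UBound(arr$) Then ReDim Preserve arr$(0 To mCount * 2)
        arr$(mCount) = Text
        mCount = mCount + 1
    End Sub

    Public Property Get Item(ByVal Index As Long) As String
        Item = arr$(Index)
    End Property

    Public Property Let Item(ByVal Index As Long, ByVal Text As String)
        arr$(Index) = Text
    End Property

    Public Property Get Count() As Long
        Count = mCount
    End Property

    Public Sub PrintLine(ByVal Index As Long, ByVal FileNumber As Integer)
        Print #FileNumber, arr$(Index)   'Print appends the vbCrLf for us
    End Sub

    Public Sub Shutdown()
        Erase arr$
        mCount = 0
    End Sub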
    Last edited by georgekar; Dec 20th, 2014 at 09:08 AM.

  5. #5
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,253

    Re: How to efficiently divide a string array into several arrays under certain condit

    My question would be, why you'd want to split this large file into a lot of smaller ones in the first place?

    I mean, you're eventually making a full data-copy on disk - and the reason for that is not yet clear.

    I can only guess, that it's not for Backup-purposes (since your current code is placing the
    bunch of split-up file-copies in the same Path as the larger source-file) - so is it perhaps, to
    be able to "Load and view these files more conveniently" (in a normal VB-Control or something)?

    In that case (if it's for Viewing-purposes) - and you want to offer your User a "scrollable view"
    on these large "LogFiles" - that would be quite easy to accomplish with any "virtual Listbox"
    or "virtual ListView" or "virtual Grid" - there's a few of them out there on the Web for grabbing.

    And to feed such a virtual "LogViewer-Control" the right Data (the current lines of the huge
    Source-File) in its Scroller-triggered "OwnerDraw-Event", you will only need to calculate
    the LineOffset-Positions (as well as the Line-Count) on the Source-File in question.

    And that can be done quite efficiently on a ByteArray directly (no need to convert your File into
    a String first, which would double the memory you need - and also no need to Split this VB-String).
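    (Not the Class I have in mind - but just to illustrate the principle, here's a minimal, untested
    sketch of such a Byte-level scan; it assumes a non-empty ANSI file with Lf-terminated lines
    and returns the line count plus the start offset of every line:)
    Code:
    Private Function GetLineOffsets(FileName As String, Offsets() As Long) As Long
        Dim B() As Byte, i As Long, n As Long, fNr As Integer
        fNr = FreeFile
        Open FileName For Binary Access Read As #fNr
            ReDim B(0 To LOF(fNr) - 1)
            Get #fNr, , B
        Close #fNr
        ReDim Offsets(0 To 1023)
        Offsets(0) = 0: n = 1                     'the first line starts at byte 0
        For i = 0 To UBound(B)
            If B(i) = 10 Then                     'Lf marks the end of a line
                If n > UBound(Offsets) Then ReDim Preserve Offsets(0 To n * 2)
                Offsets(n) = i + 1                'the next line starts right behind it
                n = n + 1
            End If
        Next
        ReDim Preserve Offsets(0 To n - 1)
        GetLineOffsets = n                        'count of line-start offsets
    End Function
    With such an Offsets-table, the OwnerDraw-Event of a virtual List-Control only has to fetch the
    few currently visible lines directly from the ByteArray - no Split, no String-copy of the whole file.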

    Just tested that here on a File which was 250MB in size, containing 5Mio lines - and native
    compiled such a Line-Analysis-Loop on a ByteArray took only 550msec - on your FileSize of
    ~100MB - and with more "Chars-per-Line" (~1Mio lines in the file) I expect this routine to
    return after about 200msec - so there's no need to show even a Progressbar - you just select
    this large logfile - and can view (and scroll) the results in your virtual Grid after those 200msec.

    If that matches your requirements, I could post the small Class which does this Line-analysis,
    and would play well together (as the Data-Feeder-part) with a virtual List-Control.

    Olaf
    Last edited by Schmidt; Dec 20th, 2014 at 12:51 PM.

  6. #6
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: How to efficiently divide a string array into several arrays under certain condit

    Or if you are just trying to split up a large file into smaller files of n lines each... then the "slurp'n'split" technique really falls apart on large files anyway.

    Instead maybe something more like this will improve performance:


    [Screenshot: sshot1.png]

    Test file created


    [Screenshot: sshot2.png]

    Split into files of 10,000 lines max each


    This is basically a variation on Lickety - an alternative to Slurp'n'Split.
    Attached Files
    Last edited by dilettante; Dec 23rd, 2014 at 12:53 PM. Reason: reposted corrected attachment

  7. #7
    PowerPoster SamOscarBrown's Avatar
    Join Date
    Aug 2012
    Location
    NC, USA
    Posts
    9,176

    Re: How to efficiently divide a string array into several arrays under certain condit

    "slurp'n'split"
    When I started reading this post, I could only think of one thing...only dilettante would come up with something like "slurp'n'split". But, after reading the rest, I see that you didn't actually coin this phrase, just borrowed it.

  8. #8
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,253

    Re: How to efficiently divide a string array into several arrays under certain condit

    Quote Originally Posted by SamOscarBrown View Post
    When I started reading this post, I could only think of one thing...only dilettante would come up with something like "slurp'n'split". But, after reading the rest, I see that you didn't actually coin this phrase, just borrowed it.
    Well, despite all the ranting about "slurp'n'split" (how it will even give you bad breath, and so on) -
    I still consider it the best technique for approaching Line-Handling in VB6 on smaller and mid-sized
    TextFiles (up to 100,000 lines of text it is either faster than, or not much slower than, the alternatives)
    and will continue to recommend it to Newbies, who will perhaps need years to (or will never)
    encounter its limitations (since most will probably work with a DB by then, instead of their first
    experimental "students.txt"-files I guess).

    Olaf

  9. #9
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: How to efficiently divide a string array into several arrays under certain condit

    I'm a lot more interested in whether the code is bug-free and whether it improves performance enough to justify the extra complexity.

    As far as being free of sneaky or hazardous bugs I can't be sure, though I haven't found any yet. As for performance though it seems to do quite well as far as I can tell.


    As for opinions ... mine are subject to change when somebody shows me reason to. But their blind disagreement with them has no impact whatsoever.

    This thread is all about performance issues handling a large input file. Trying to wish them away invoking the ghost of databases is pointless.

    So I'm not sure how slurp'n'split can be advocated except as a poor newbie habit to fall into that will kill you down the road. I thought that sort of thing had died everywhere except on the island of misfit toys (i.e. the dead NNTP group microsoft.public.vb.general.discussion).

  10. #10

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2014
    Posts
    341

    Re: How to efficiently divide a string array into several arrays under certain condit

    Quote Originally Posted by dilettante View Post
    Or if you are just trying to split up a large file into smaller files of n lines each... then the "slurp'n'split" technique really falls apart on large files anyway.

    This is basically a variation on Lickety - an alternative to Slurp'n'Split.
    After some small modifications, I got this. It works perfectly for me.

    [Screenshot: q.jpg]

    I tried to delve more into the code but it got a little difficult.
    I think I might need more time to eventually build something "speedy" on my own.

  11. #11
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,253

    Re: How to efficiently divide a string array into several arrays under certain condit

    Quote Originally Posted by bPrice View Post
    I think I might need more time to eventually build something "speedy" on my own.
    Still don't know, *why* you would want to split your original (large) File into a lot of smaller ones.

    Further above you mentioned, that your original Files can be 100MByte in size, having about 1Mio lines of text
    (which is roughly 100 Characters per line).

    When you split such a file up into "SubFiles" with only 200 lines, then you will end up cluttering your
    system with about 1Mio/200 = "5000 new extra-TextFiles" in the appropriate Folder.

    Keep in mind, that each File (even when it contains zero Bytes) will (usually) take away 4KB from the FileSystem
    (in the above case, with the 5000 new files, this would amount to ~20MB total).

    So, in case this is for "easier, managable viewing of a large files Text-Content", there's *much*
    faster methods to approach this.

    Olaf

  12. #12
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: How to efficiently divide a string array into several arrays under certain condit

    Maybe a small index file with seek pointers is a better way to keep indexes to pages. Then we can move the file cursor to the start byte of a page, and while reading we don't check for the end of the file but for whether we have reached the next pointer.
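    Something like this (an untested sketch in plain VB6 - Offsets() and p are hypothetical, holding the 1-based byte position where each page starts, with one extra entry pointing just past the end of the file):
    Code:
    Dim fNr As Integer, sLine As String, PageText As String
    fNr = FreeFile
    Open "big.txt" For Input As #fNr
    Seek #fNr, Offsets(p)                         'jump to the start of page p
    Do While Seek(fNr) < Offsets(p + 1) And Not EOF(fNr)
        Line Input #fNr, sLine
        PageText = PageText & sLine & vbCrLf      'one page is small, so this is cheap
    Loop
    Close #fNr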

  13. #13
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: How to efficiently divide a string array into several arrays under certain condit

    Quote Originally Posted by bPrice View Post
    After some small modifications, I got this. It works perfectly for me.
    I took a hard look and tested more "edge cases" and sure enough I found a bug. So while in there I did (and retested) some other minor tweaks, and then reposted above in post #6.

    I have no idea if the bug was the reason you had to change it, but it could be.


    This is probably the best reason to avoid such an approach: it can be tricky to get everything just right. And I'm still not certain it is correct in all cases, though it seems to be a lot closer if not error-free yet.

  14. #14
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,253

    Re: How to efficiently divide a string array into several arrays under certain condit

    Quote Originally Posted by dilettante View Post
    As for opinions ... mine are subject to change when somebody shows me reason to.
    Well, let's see - consider this simple code here:
    Code:
    Private Function GetLinesFromANSIFile(FileName As String, Optional LSep$ = vbCrLf) As String()
      With CreateObject("ADODB.Stream")
        On Error GoTo 1
          .Type = 1 'adTypeBinary
          .Open
          .LoadFromFile FileName
          GetLinesFromANSIFile = Split(StrConv(.Read, vbUnicode), LSep)
    1:    .Close
        If Err Then Err.Raise Err.Number, Err.Source, Err.Description
      End With
    End Function
    So the above is what you christened "Slurp'n'Split" (in a version which also allows Unicode filenames).

    In my book, this is a nice little Function with the following advantages:
    - it is "Newbie-friendly", because the code is short, simple and relatively easy to understand
    - its performance is decent (not much different from your "lickety-split" up to 100,000 lines of text in a given file)
    - FileHandles are closed properly when the System is not able to allocate enough memory (that's a bug in your "lickety-split" right there).

    So, in case we want to argue reasonably...

    What on earth is wrong with a function, which works decently and fast for about 99% of all TextFile-Types
    any Developer (Newbie or not) will ever encounter?
    (as e.g. *.txt, or code-files as *.frm, *.cls, *.bas, *.cs, *.c, *.cpp, *.css, *.js)

    E.g. when I take one of the largest code-files I so far encountered (the "amalgamated" sources of the SQLite-DBengine,
    compilable in a single 'go' -> on 'sqlite3.c') - this File is ~5.3MB in size - and contains ~153000 lines - and the
    above function will read it in completely - and split it happily after only 0.14seconds total.

    You might argue, that there's a whole lot more Text-Files which are *much* larger in size,
    containing many more lines than 150000 - but which ones do you have in mind?
    PostScript or PDF-Files? - there's dedicated parsers for that...
    Huge XML-files? - there's dedicated parsers for that...
    Huge (static) HTML-files? - there's dedicated parsers for that...
    Huge RTF-Files? - there's dedicated parsers for that...

    What remains is perhaps huge Log-Files - but these can easily reach sizes > 2GB, so one would need to write
    something more sophisticated in either case (VBs normal File-Functions not being able to open these monsters anyways).

    So, in case we want to stay decent, let me bring a sentence to your attention, written by
    a quite respected developer, you at least should take seriously:
    One should be: "...a lot more interested in whether the code is bug-free and whether it improves performance enough to justify the extra complexity"

    Well, I think you know, who wrote that <g> - and I simply can't find a reason, why the above
    logic shouldn't be applied to the small "Slurp'nSplit" function I posted above.

    It's simple and bug-free - and will work for 99% of all TextFile-LineSplit-scenarios one will ever encounter.

    Quote Originally Posted by dilettante View Post
    This thread is all about performance issues handling a large input file.
    Trying to wish them away invoking the ghost of databases is pointless.
    I didn't wish anything away dile - the OP so far has not posted anything about the background
    of his current scenario - there could be quite some alternatives to what you posted, once we know it
    (DBs included).

    Quote Originally Posted by dilettante View Post
    So I'm not sure how slurp'n'split can be advocated except as a poor newbie habit ...
    Gosh, here he goes again...

    Olaf

  15. #15
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: How to efficiently divide a string array into several arrays under certain condit

    For small files people should just use Line Input and be done.

  16. #16
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: How to efficiently divide a string array into several arrays under certain condit

    Not too bad, only 3 to 4 times as long to split the files using regular old line-oriented I/O. Of course that's potentially with data in the cache, so you'd have to create the file, reboot, then try to split for a more accurate comparison.
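    (Not the attached project - just the general shape of the plain line-oriented variant, as an untested sketch where SourceName and MaxLines are placeholders:)
    Code:
    Dim fIn As Integer, fOut As Integer
    Dim sLine As String, LineCount As Long, FileNum As Long
    fIn = FreeFile
    Open App.Path & "\" & SourceName For Input As #fIn
    FileNum = 1
    fOut = FreeFile
    Open App.Path & "\D" & FileNum & ".txt" For Output As #fOut
    Do Until EOF(fIn)
        Line Input #fIn, sLine
        Print #fOut, sLine
        LineCount = LineCount + 1
        If LineCount = MaxLines And Not EOF(fIn) Then
            Close #fOut
            FileNum = FileNum + 1
            fOut = FreeFile
            Open App.Path & "\D" & FileNum & ".txt" For Output As #fOut
            LineCount = 0
        End If
    Loop
    Close #fOut
    Close #fIn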
    Attached Files

  17. #17
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: How to efficiently divide a string array into several arrays under certain condit

    I think loading a whole big file is a bad approach. A big file can be read partially for processing. For viewing, if only forward movement is allowed - only new lines coming into view - then partial loading is fine too; for backward movement we need a way to hold some pages in memory. There is really only one reason to want a whole big text file in memory: to edit it, and for that we can use WordPad...

  18. #18
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: How to efficiently divide a string array into several arrays under certain condit

    Quote Originally Posted by georgekar View Post
    I think loading a whole big file is a bad approach.
    He was only trying to split a large file up into smaller files. So in that case there isn't any need to have the entire file in memory at once.


    Why he wants to do this (what he will do with the smaller files) is another question. The answer to that might suggest an entirely different solution, but file splitting was what he asked about.

  19. #19
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,253

    Re: How to efficiently divide a string array into several arrays under certain condit

    Quote Originally Posted by dilettante View Post
    For small files people should just use Line Input and be done.
    Nope - they shouldn't - bad advice actually.

    For several reasons, let me explain:

    Take again the small Slurp'N'Split function I posted already above (slightly enhanced by an additional Optional Param).

    Code:
    Private Function GetLinesFromANSIFile(FileName As String, Optional LSep$ = vbCrLf, Optional ByVal LCID&) As String()
      With CreateObject("ADODB.Stream")
        On Error GoTo 1
          .Type = 1 'adTypeBinary
          .Open
          .LoadFromFile FileName
          GetLinesFromANSIFile = Split(StrConv(.Read, vbUnicode, LCID), LSep)
    1:    .Close
        If Err Then Err.Raise Err.Number, Err.Source, Err.Description
      End With
    End Function
    Nice, short - easy to understand.

    Now compare with its counterpart, which is using your suggested Line-Input functionality:
    Code:
    Private Function GetLinesFromANSIFile2(FileName As String) As String()
    Dim i As Long, FNr As Long, Lines() As String
       ReDim Lines(0 To 15) 'pre-allocate space for 16 lines
       FNr = FreeFile
       On Error GoTo 1 'let's ensure, that the FileHandle is closed properly in case of an error
       Open FileName For Input As FNr
         Do Until EOF(FNr)
           Line Input #FNr, Lines(i)
           i = i + 1
           If i > UBound(Lines) Then ReDim Preserve Lines(0 To i + i) 're-allocate in larger chunks
         Loop
    1: Close FNr
       ReDim Preserve Lines(0 To i - 1)
       If Err Then Err.Raise Err.Number, Err.Source, Err.Description
       GetLinesFromANSIFile2 = Lines 'return the String-Array in the last line, to avoid a copy
    End Function
    So, let's compare the two functions now:

    Your suggestion will result in:
    - longer, a bit more complex code - not that easy to understand for a Newbie
    - it is about 30% slower than the Slurp'n'Split alternative (tested against a file with 100,000 lines)
    - it does not support any ANSI-Decoding other than the System-Default
    - it will choke on Unicode-Filenames
    - it will also choke on *nix line separators such as vbLf

    So, I hope you begin to understand better now, why I find nothing wrong with the Slurp'N'Split approach.

    Olaf

  20. #20
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: How to efficiently divide a string array into several arrays under certain condit

    Olaf,
    does this function - GetLinesFromANSIFile = Split(StrConv(.Read, vbUnicode, LCID), LSep) - hold twice the memory of the text, or not?

  21. #21
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,253

    Re: How to efficiently divide a string array into several arrays under certain condit

    Quote Originally Posted by georgekar View Post
    Olaf,
    does this function - GetLinesFromANSIFile = Split(StrConv(.Read, vbUnicode, LCID), LSep) - hold twice the memory of the text, or not?
    Sure, for a short moment - yes.

    Where's the problem, when the performance is nevertheless comparable with more complex solutions?

    And don't bring the argument, that it could possibly choke earlier than other approaches,
    with an "out of memory-error"... If you do, then I'd surely like to know, against which
    kind of TextFiles you will plan to run this function in the end... besides, we live in 2014 now,
    working on systems which have at least 2GB of RAM installed.

    Olaf

  22. #22
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: How to efficiently divide a string array into several arrays under certain condit

    Don't say "at least 2GB of RAM installed". In my VirtualBox I give 1 GByte to XP, and the same to Windows 7 (both can run at the same time, under Ubuntu Studio on an AMD 6100 with 4 GByte of RAM). I also have XP installed on a hard disk as my old OS, but I don't want to shut one system down to go to the other, so from one OS I get three. Also, from XP I can use printers through USB (VirtualBox passes the USB printer to XP).

    Your solution is not bad at all. ADODB.Stream is always around, and .LoadFromFile FileName takes a Unicode path - that's good. So from a technology standpoint this is what he needs. For learning purposes it is worse, because in another environment ADODB.Stream will not be there. So we should show both the universal way, like Line Input, and the technology way: the second to do the job, and the first to learn and understand what a file is (a little now, some day more and better).

  23. #23
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,253

    Re: How to efficiently divide a string array into several arrays under certain condit

    Quote Originally Posted by georgekar View Post
    Don't say "at least 2GB of RAM installed". In my VirtualBox ...
    Ok, then make that 1Gb - or even make it 500MB RAM in your VM - I still say, that the performance
    is not any different from other approaches, when we work with "normally sized TextFiles".

    Seriously, for anything above 10MB I'd like to know the FileType and the usage-scenario for this kind of "TextFile".

    Quote Originally Posted by georgekar View Post
    .LoadFromFile FileName takes a Unicode path - that's good.
    So from a technology standpoint this is what he needs.
    For learning purposes it is worse.
    I don't understand - *especially* from a viewpoint of "learning best practice",
    one should not teach or encourage the usage of these old, non-unicode aware
    VB File-Functions.

    There's several alternatives to ADODB.Stream or the Scripting.FileSystemObject -
    but these two come with Windows and are already there as COM-libs, and despite its
    older FileFunctions, VB is still quite good as a COM-wiring-tool, connecting to and
    interacting with these Libs.

    The alternative to them would be, to either use the GetShortPathNameW-workaround
    (although that wouldn't solve the problem, that VBs Line-Input couldn't be used against
    Unicode-Content) - or write your own little File- or Stream-Class (using the appropriate W-APIs) -
    but that's stuff one could teach perhaps in an "advanced class".

    Any of your teenage pupils will immediately agree, that the world is "more connected now"
    with Internet available even on most phones - Files get exchanged much more across countries
    than in the last decades - so Unicode-support is important.

    Besides, most other languages will have Stream-Objects available to work with files -
    so why not teach the ADO-Stream-Object - its usage is quite similar to using FileStreams
    in other languages (it even supports UTF8-decoding directly, with the right Property-setting).

    Olaf

  24. #24
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: How to efficiently divide a string array into several arrays under certain condit

    For M2000 I made a Line Input for Unicode (the Windows format, not all encodings), and many other things.

    Here the OPEN command uses WIDE to handle Unicode, so Line Input still works for ANSI if we use OPEN without WIDE:

    Code:
    MODULE A {
          a=1 
          a$="Американские суда находятся в международных водах."
          b=2
          open dir$+"this.txt" for wide output as f
          list f       'send list of variables in file f
          close #f
    }
    MODULE B {
          open dir$+"this.txt" for wide input as f
          while not eof(f) {
                line input #f, a$
                print a$
          }
          close f
    }

    Code:
    MODULE A {
          a$="Американские суда находятся в международных водах."
          open dir$+"uni2.txt" for wide output as k
          write #k, 1.1312,-.2e-4, a$, sqrt(3), 4**2
          close k 
    }
    MODULE B {
          open dir$+"uni2.txt" for wide input as k
          line input #k, a$
          print a$
          close k 
    }
    MODULE C {
          open dir$+"uni2.txt" for wide input as k
          input #k,  a, b,  a$, c ,e
          print a, b, a$, c, e
          close k 
    }
    Last edited by georgekar; Dec 24th, 2014 at 01:19 AM.

  25. #25
    Hyperactive Member Daniel Duta's Avatar
    Join Date
    Feb 2011
    Location
    Bucharest, Romania
    Posts
    397

    Re: How to efficiently divide a string array into several arrays under certain condit

    I just wonder if the ADODB.Stream approach could also be used to read UDT arrays stored as binary?

  26. #26
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,253

    Re: How to efficiently divide a string array into several arrays under certain condit

    Quote Originally Posted by Daniel Duta View Post
    I just wonder if the ADODB.Stream approach could also be used to read UDT arrays stored as binary?
    Everything is storable using a Stream, you just need to serialize it properly ...

    As for VBs older FileFunctions and their nice capability to save (and auto-serialize) UDTs -
    there's a thread where I pointed out a few workarounds with regards to:
    - forcing VBs Open command to accept Unicode-Filenames
    - and how to treat the String-Members of such UDTs, so that the AutoSerializing doesn't apply its Auto-ANSI-conversion
    http://www.vbforums.com/showthread.p...=1#post4782037
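    For the classic auto-serialization itself, here's a minimal hypothetical illustration (the UDT-name,
    the Caption-member and "points.dat" are made up; note that Put converts the String-member to ANSI,
    which is exactly the conversion the linked thread works around):
    Code:
    Private Type TPoint
        X As Double
        Y As Double
        Caption As String        'variable-length member, written with a 2-byte length prefix
    End Type

    Private Sub SaveAndLoadDemo()
        Dim Arr(1 To 10) As TPoint, f As Integer
        f = FreeFile
        Open App.Path & "\points.dat" For Binary As #f
            Put #f, , Arr        'the whole UDT-array is auto-serialized in one call
        Close #f
        f = FreeFile
        Open App.Path & "\points.dat" For Binary As #f
            Get #f, , Arr        'and read back the same way
        Close #f
    End Sub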

    If you're interested and have a concrete example, we could also discuss other methods, how to work with
    hierarchical data-representations in VB - and how to serialize them into ByteArrays or Streams.

    But I'd suggest to open a new thread for that.

    Olaf

  27. #27
    Frenzied Member
    Join Date
    May 2014
    Location
    Kallithea Attikis, Greece
    Posts
    1,289

    Re: How to efficiently divide a string array into several arrays under certain condit

    That would be a reply to #19: how we can use a (custom) Line Input for any kind of text, ANSI and Unicode, using our own buffer to speed up the process.

  28. #28
    Frenzied Member
    Join Date
    Nov 2010
    Posts
    1,470

    Re: How to efficiently divide a string array into several arrays under certain condit

    way too much to read...

    i would think you could simply read in the text and count the characters fetched.
    if the accumulated count is less than or equal to the divide-by number, write out to the current file;
    if greater, then make a new file, set the accumulated count to the characters just fetched, and go back and fetch some more.

    it looks like an old COBOL process.

    here to talk
