Thread: Grouping strings in HashSet to write to file

  1. #1

    Thread Starter
    New Member
    Join Date
    Aug 2014
    Posts
    10

    Grouping strings in HashSet to write to file

    In my program I deal with large quantities of strings, and I have recently begun looking for a way to select a chosen number of strings at a time, write them to a .txt file, and do the same with the next group until the end. Below is what I have figured out so far. I use a label and a ListBox to govern the amount.

    Code:
        Dim Vault As HashSet(Of String) = New HashSet(Of String)()
    
    Private Sub Counter_TextChanged(sender As Object, e As EventArgs) Handles Counter.TextChanged
            If Counter.Text = 100000 Then
                iFile = "\File(" + fileNum.ToString + ").txt"
                Dim sb As New System.Text.StringBuilder()
                For Each o As Object In ListBox1.Items
                    sb.AppendLine(o)
                Next
                System.IO.File.WriteAllText(vaultDir + iFile, sb.ToString())
                fileNum = fileNum + 1
                ListBox1.Items.Clear()
            Else
            End If
        End Sub
        Private Sub sort()
            Counter.Text = 0
            iFile = "\File(" + fileNum.ToString + ").txt"
            For Each line In Vault
                ListBox1.Items.Add(line)
                Counter.Text = Counter.Text + 1
            Next
            fileNum = fileNum + 1
            iFile = "\File(" + fileNum.ToString + ").txt"
            Dim sb As New System.Text.StringBuilder()
            For Each o As Object In ListBox1.Items
                sb.AppendLine(o)
            Next
            System.IO.File.WriteAllText(vaultDir + iFile, sb.ToString())
            fileNum = fileNum + 1
            ListBox1.Items.Clear()
        End Sub
    How can I do this more cleanly and efficiently? Please?

  2. #2
    You don't want to know.
    Join Date
    Aug 2010
    Posts
    4,578

    Re: Grouping strings in HashSet to write to file

    I don't see any code to do any kind of activity I would call "grouping", so I'm sort of confused by the question.

    What this code seems to do also confuses me. I think what happens if you call sort() is:

    1. Each string in the HashSet is added to a ListBox. Each time this happens, a text box's text is updated. This starts a little different path:
      1. If the text is "100000" then:
      2. A (potentially large) String is created.
      3. All of the text is written to a file.
      4. The items in the list box are cleared.
    2. A file name is created.
    3. A (potentially large) String is created.
    4. The string is written to the file.
    5. The ListBox is emptied.

    I get it now. You want to write 100,000 items to each file, and "the rest" to the last file.

    But you have a problem I see a lot of people have when they first start out. I call it "using controls as variables". You are using a TextBox to keep track of what is basically a loop iteration, and a ListBox to function like an array. A lot of new people get really used to "controls store information" and don't seem to realize variables are a cheaper way to store information. I often see very elaborate forms with dozens of invisible controls, and it quickly becomes unmanageable.

    You don't need to use a TextBox to store a loop counter. Use an Integer variable. You don't need to use a ListBox to store a lot of Strings. Use a String array or a List. It turns out ListBoxes are REALLY slow once you start adding a lot of items: every time a new item is added it redraws EVERY item, even the ones that aren't visible. And most professionals use the setting "Option Strict On", which would make this line not work:
    Code:
    If Counter.Text = 100000 Then
    Counter.Text is a String. 100000 is an Integer. Because they are different types, professionals prefer that VB point this out as a compiler error. We want to be forced to convert either the String to an Integer or the Integer to a String, so that we compare String to String or Integer to Integer. We make fewer mistakes if we are forced to do that.
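
    As a minimal sketch of those two conversions under Option Strict On (counterText is a hypothetical stand-in for Counter.Text, so the snippet runs as a console program without the form):

    ```vbnet
    Option Strict On

    Imports System

    Module CounterComparison
        Sub Main()
            ' Hypothetical stand-in for Counter.Text; on the real form this comes from the control.
            Dim counterText As String = "100000"

            ' Option 1: convert the String to an Integer, then compare Integer to Integer.
            ' TryParse returns False instead of throwing if the text is not a number.
            Dim count As Integer
            If Integer.TryParse(counterText, count) AndAlso count = 100000 Then
                Console.WriteLine("Integer comparison matched")
            End If

            ' Option 2: convert the Integer to a String, then compare String to String.
            If counterText = CStr(100000) Then
                Console.WriteLine("String comparison matched")
            End If
        End Sub
    End Module
    ```

    Either form compiles with Option Strict On. The Integer comparison is usually the safer choice for a counter, because it does not depend on how the number happens to be formatted as text.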

    Anyway, let me show you a different expression of your code, and explain how I reorganized it. Then I'll make it even easier.
    Code:
    Private Sub CreateFiles(ByVal input As HashSet(Of String))
        Dim fileIndex As Integer = 1
        Dim currentItems As New List(Of String)(100000)
    
        For Each item In input
            currentItems.Add(item)
    
            If currentItems.Count = 100000 Then
                SaveItems(currentItems, fileIndex)
                currentItems.Clear()
                fileIndex += 1
            End If
        Next
    
        If currentItems.Count > 0 Then
            SaveItems(currentItems, fileIndex)
        End If
    End Sub
    
    Private Sub SaveItems(ByVal items As IEnumerable(Of String), ByVal fileIndex As Integer)
        Dim fileName = String.Format("File({0}).txt", fileIndex)
        Dim filePath = Path.Combine(vaultDir, fileName)
    
        File.WriteAllLines(filePath, items)
    End Sub
    You start by calling CreateFiles, passing it the HashSet you'd like it to save. Something like:
    Code:
    CreateFiles(Vault)
    It sets a file index to 1, and initializes a new list and tells it to go ahead and expect 100,000 items.

    Each item is added to the list. If the list reaches 100,000 items, we save the list, then clear the list and update the file index. When we finish with the loop, we save any remaining items.

    To "save a file", we take a list of items, build the file name, then write each item to its own line in the file.

    If you feel adventurous, LINQ is a sort of more natural expression:
    Code:
    Private Sub CreateFiles(ByVal input As HashSet(Of String))
        Dim fileIndex As Integer = 1
        Dim skipIndex As Integer = 0
    
        While True
            Dim items = input.Skip(skipIndex).Take(100000)
            If items.Any() Then
                SaveItems(items, fileIndex)
                fileIndex += 1
                skipIndex += 100000
            Else
                Exit While
            End If
        End While
    End Sub
    Skip() will ignore items up to the index specified. Take() will take items up to the number specified. This works really well because they both handle "out of range": if you try to Skip() past the end you get "an empty set", and if you try to Take() past the end you get "as much as I could get". That's why the Any() method is used before calling SaveItems(), to make sure there are some items to save.
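
    To see that out-of-range behavior in isolation, here is a small sketch. The seven-item list and the batch size of 3 are made up for illustration, and a List is used instead of a HashSet so the order of the output is predictable:

    ```vbnet
    Option Strict On

    Imports System
    Imports System.Collections.Generic
    Imports System.Linq

    Module SkipTakeDemo
        Sub Main()
            ' Seven hypothetical items, batched three at a time.
            Dim items As New List(Of String) From {"a", "b", "c", "d", "e", "f", "g"}
            Dim batchSize As Integer = 3

            ' A full batch from the middle of the list.
            Console.WriteLine(String.Join(",", items.Skip(3).Take(batchSize)))   ' d,e,f

            ' Take() past the end returns only what is left.
            Console.WriteLine(String.Join(",", items.Skip(6).Take(batchSize)))   ' g

            ' Skip() past the end returns an empty sequence, so Any() is False.
            Console.WriteLine(items.Skip(9).Take(batchSize).Any())               ' False
        End Sub
    End Module
    ```

    One trade-off worth knowing: on a HashSet, Skip() has to walk the set from the beginning on every call, so the LINQ version does more total work than the plain loop as the number of batches grows. For very large sets the loop version is probably the better fit.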

  3. #3

    Thread Starter
    New Member
    Join Date
    Aug 2014
    Posts
    10

    Re: Grouping strings in HashSet to write to file

    This is exactly what I was looking to do. Sorry, my terminology is still developing, as are my skills. This is the next level of handling strings for me. Thank you, kind sir, for the time it took you to answer ever so clearly. Props on seeing what I was trying to do as well.
