Thread: Extracting Specific Text from a string is slow

  1. #1

    Thread Starter
    Lively Member
    Join Date
    Aug 2009
    Location
    Japan
    Posts
    87

    Extracting Specific Text from a string is slow

    I have a text file I'm trying to pull specific data from to make a report in Excel, and I'd like to know whether my code can be improved or whether I have something wrong. If it's apparent to anyone that I should be doing something else, please tell me. The code works, but it is very slow: the reports take up to 20 minutes for about 100 files, so I want to cut that time down somehow. I have attached the whole sub in a text file; it's about 900 lines long, so I don't think I should post all of it. This is one part that seems to take the longest:

    Partial code that reads the text into a string; each line is then read to find whether it holds the data or not. Inside this While loop, I check numerous lines to see what they hold and extract specific items. First I verify it's an interface by checking for "line protocol".
    Code:
    Dim str As StreamReader = File.OpenText(filefound)
                        While (str.Peek <> -1)
                            txtLine = str.ReadLine()    'Read the next line before testing it
                            If txtLine.Contains("line protocol") Then 'Found an interface

    Then a snippet that grabs a few more things; these three items are always on a single line. I then place them into Excel cells.
    Code:
    'Get MTU, BW and DLY
                            If txtLine.Contains("MTU") AndAlso txtLine.Contains("BW") AndAlso txtLine.Contains("DLY") Then
                                PosX = txtLine.IndexOf("MTU") + 4                           ' MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
                                PosY = txtLine.IndexOf("bytes,")                            '     ^   ^
                                PosY = PosY - PosX
                                MTU = txtLine.Substring(PosX, PosY).Trim()                  'Trim drops the space left before "bytes,"
                                oSheet.Cells(oRow, 6).value = MTU
    
                                PosX = txtLine.IndexOf("BW", 1) + 3                            'MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
                                PosY = txtLine.IndexOf("bit", 1)
                                PosY = PosY - PosX - 2                                      'Back up over " K" so only the number remains
                                BW = txtLine.Substring(PosX, PosY)
                                oSheet.Cells(oRow, 7).value = BW
    
                                PosX = txtLine.IndexOf("DLY") + 3
                                PosY = txtLine.IndexOf("usec")
                                DLY = txtLine.Substring(PosX, PosY - PosX).Trim()           'Trim drops the spaces around the value
    
                                oSheet.Cells(oRow, 8).value = DLY
                            End If

    Here is another snippet that grabs a single value from within this same readline loop.
    Code:
    'Get Description
                            If txtLine.Contains("Description") Then
                                'Strings are immutable: Replace and Trim return new strings, so the results must be kept
                                Desc = txtLine.Replace("Description", "").Replace(":", "").Trim()
                                oSheet.Cells(oRow, 5).Value = Desc
                            End If

    Very appreciative of any help/suggestions. Thanks.

  2. #2
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,206

    Re: Extracting Specific Text from a string is slow

    For one thing, rather than making multiple IndexOf calls to locate positions, I think I would Split() the string on spaces and grab the desired elements from the resulting array. It should be a bit faster and easier to follow.

  3. #3

    Thread Starter
    Lively Member
    Join Date
    Aug 2009
    Location
    Japan
    Posts
    87

    Re: Extracting Specific Text from a string is slow

    Thank you. I don't remember ever using Split. Do you mean something like this?

    Code:
    If txtLine.Contains("MTU") = True And txtLine.Contains("BW") = True And txtLine.Contains("DLY") = True Then
                                Dim MBD() As String
                                MBD = txtLine.Split(" ")
                                ' MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
                                '1   2    3      4  5       6     7   8  9
                                oSheet.Cells(oRow, 6).value = MBD(3)
                                oSheet.Cells(oRow, 7).value = MBD(6)
                                oSheet.Cells(oRow, 8).value = MBD(9)
                                'PosX = txtLine.IndexOf("MTU") + 4      ' MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
                                'PosY = txtLine.IndexOf("bytes,")       '     ^   ^
                                'PosY = PosY - PosX
                                'MTU = txtLine.Substring(PosX, PosY)
                                'oSheet.Cells(oRow, 6).value = MTU
    
                                'PosX = txtLine.IndexOf("BW", 1) + 3     'MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
                                'PosY = txtLine.IndexOf("bit", 1)
                                'PosY = PosY - PosX - 2
                                'BW = txtLine.Substring(PosX, PosY)
                                'oSheet.Cells(oRow, 7).value = BW
    
                                'PosX = txtLine.IndexOf("DLY") + 3
                                'PosY = txtLine.IndexOf("usec")
                                'DLY = txtLine.Substring(PosX, PosY - PosX)
    
                                'oSheet.Cells(oRow, 8).value = DLY
                            End If
    This does work, but should I do some checks to make sure the values actually are at (3), (6), and (9)? I just ran a report and they do come out okay this way, and it really cuts down on code as well. The whole array goes up to (12) and I am fairly sure the fields will always line up that way; I'm just worried I might be missing something.

  4. #4
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,206

    Re: Extracting Specific Text from a string is slow

    Yes. If there is a chance the number of fields could change or shift, you can always test the value of one or more elements to make sure the data is where you expect it to be.
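
    A minimal sketch of that kind of check, assuming the line looks like the sample in post #3. ValueAfter is a made-up helper name, not something from the attached sub. Instead of trusting fixed positions, it finds the label in the array and returns the element right after it, so a shifted or missing field gives Nothing rather than the wrong value:
    Code:
    Imports System

    Module SplitCheckSketch
        ' Hypothetical helper: returns the token that follows `label`, or Nothing if the label is absent
        Function ValueAfter(tokens() As String, label As String) As String
            Dim i As Integer = Array.IndexOf(tokens, label)
            If i >= 0 AndAlso i < tokens.Length - 1 Then Return tokens(i + 1)
            Return Nothing
        End Function

        Sub Main()
            ' Sample line copied from the thread (leading spaces assumed)
            Dim txtLine As String = "  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, "
            ' RemoveEmptyEntries drops the empty elements produced by leading or doubled spaces
            Dim MBD() As String = txtLine.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            Console.WriteLine(ValueAfter(MBD, "MTU"))   ' prints 1500
            Console.WriteLine(ValueAfter(MBD, "BW"))    ' prints 1000000
            Console.WriteLine(ValueAfter(MBD, "DLY"))   ' prints 10
        End Sub
    End Module

    In the real loop you would test the result for Nothing before writing it to the sheet.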

  5. #5

    Thread Starter
    Lively Member
    Join Date
    Aug 2009
    Location
    Japan
    Posts
    87

    Re: Extracting Specific Text from a string is slow

    Thank you for the help. This is definitely the way to go. No idea why I never used this before; it makes me wonder what else I'm missing out on. Thanks again.

  6. #6
    Hyperactive Member
    Join Date
    Apr 2011
    Location
    England
    Posts
    421

    Re: Extracting Specific Text from a string is slow

    I think your original method was a better approach using IndexOf and Substring. If you start to use Split you will begin creating more strings than you need per cycle which would likely consume more memory.
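
    If you stay with IndexOf and Substring, the repeated position arithmetic could be folded into one small function. This is only a sketch under the assumption that each value sits between a fixed start label and a fixed end label; Between is a hypothetical name. It passes StringComparison.Ordinal because IndexOf on a string argument otherwise does a culture-sensitive search, which is usually the slower of the two:
    Code:
    Imports System

    Module IndexOfSketch
        ' Hypothetical helper: text between startTag and endTag, or Nothing if either one is missing
        Function Between(line As String, startTag As String, endTag As String) As String
            Dim startPos As Integer = line.IndexOf(startTag, StringComparison.Ordinal)
            If startPos < 0 Then Return Nothing
            startPos += startTag.Length
            ' Search for the end label only after the start label
            Dim endPos As Integer = line.IndexOf(endTag, startPos, StringComparison.Ordinal)
            If endPos < 0 Then Return Nothing
            Return line.Substring(startPos, endPos - startPos).Trim()
        End Function

        Sub Main()
            Dim txtLine As String = "  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, "
            Console.WriteLine(Between(txtLine, "MTU", "bytes"))   ' prints 1500
            Console.WriteLine(Between(txtLine, "BW", "Kbit"))     ' prints 1000000
            Console.WriteLine(Between(txtLine, "DLY", "usec"))    ' prints 10
        End Sub
    End Module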

    How large are the files you are working with?

    Are you closing the StreamReader after reading each file?

    You could read your files into an array and then iterate over the lines, e.g. with System.IO.File.ReadAllLines(), although I'm not sure it would be any faster.

    Another thought would be to read the entire file and then use a Regex pattern to grab the data from each line. Regex can be a little tricky to start with but it is easier to read, usually requires less code, and can provide a performance boost in comparison to performing several operations multiple times over.
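
    A rough sketch of the Regex idea for the MTU/BW/DLY line. The pattern is my assumption based on the sample line in this thread and may need adjusting for your real files; the names are made up:
    Code:
    Imports System
    Imports System.Text.RegularExpressions

    Module RegexSketch
        ' Built once and reused for every line. \S* after "Kbit" allows a trailing comma or unit suffix.
        Private ReadOnly MtuBwDly As New Regex(
            "MTU\s+(\d+)\s+bytes,\s+BW\s+(\d+)\s+Kbit\S*\s+DLY\s+(\d+)\s+usec",
            RegexOptions.Compiled)

        Sub Main()
            Dim txtLine As String = "  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, "
            Dim m As Match = MtuBwDly.Match(txtLine)
            If m.Success Then
                Console.WriteLine(m.Groups(1).Value)   ' MTU: prints 1500
                Console.WriteLine(m.Groups(2).Value)   ' BW:  prints 1000000
                Console.WriteLine(m.Groups(3).Value)   ' DLY: prints 10
            End If
        End Sub
    End Module

    A failed match simply leaves m.Success as False, so a line with a different layout is skipped instead of throwing.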

    You could also consider using threading to separate your tasks from the User Interface. If done well, you can often find speed gains here.
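
    One possible shape for this, as a sketch only: parse the files on worker threads and collect the results, then do all the Excel work afterwards on the calling thread. It assumes .NET 4 or later. CountInterfaces is a made-up stand-in for the real parsing, and the folder is just a placeholder:
    Code:
    Imports System
    Imports System.Collections.Concurrent
    Imports System.IO
    Imports System.Threading.Tasks

    Module ParallelParseSketch
        ' Hypothetical stand-in for the real per-file parsing: counts the interface lines
        Function CountInterfaces(path As String) As Integer
            Dim count As Integer = 0
            For Each txtLine As String In File.ReadLines(path)
                If txtLine.Contains("line protocol") Then count += 1
            Next
            Return count
        End Function

        Sub Main()
            Dim files() As String = Directory.GetFiles("C:\TEMP1", "*.txt")   ' assumed folder
            Dim results As New ConcurrentDictionary(Of String, Integer)
            ' Each file is parsed on a worker thread; no Excel calls are made in here
            Parallel.ForEach(files, Sub(f As String) results(f) = CountInterfaces(f))
            ' Back on the calling thread: this is where the Excel writes would go
            For Each kv In results
                Console.WriteLine("{0}: {1}", kv.Key, kv.Value)
            Next
        End Sub
    End Module

    Whether this helps depends on where the time really goes; if most of it is spent talking to Excel, parallel parsing alone will not change much.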

    Just a few thoughts for you to consider

  7. #7
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    40,104

    Re: Extracting Specific Text from a string is slow

    String work is slow. The more you do, the slower it is, and that's pretty much just life. Over in the .NET CodeBank I posted a profiler class. You might find it useful, as it is a means for timing various parts of some working code to see where you are spending the most time. That might allow you to compare alternative approaches, but the ultimate problem is probably going to remain: String work is slow.
    My usual boring signature: Nothing

  8. #8
    Hyperactive Member
    Join Date
    Apr 2011
    Location
    England
    Posts
    421

    Re: Extracting Specific Text from a string is slow

    odd.. It decided to double up my post several minutes apart.. Sorry, and feel free to delete this.
    Last edited by JayJayson; Feb 24th, 2012 at 11:42 AM. Reason: Double Posted

  9. #9
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,206

    Re: Extracting Specific Text from a string is slow

    True, Split() will use more memory, but the amount of extra memory is trivial on a string such as this. I would think 1 call to Split() would be faster than 6 calls to IndexOf. Of course I could be wrong, as I have not tested it to verify.

    ReadAllLines will almost certainly be faster than reading one line at a time. It will of course use a lot more memory, but if there is enough memory to handle the task it is the better option from a processing speed POV.

  10. #10
    Code Monkey wild_bill's Avatar
    Join Date
    Mar 2005
    Location
    Montana
    Posts
    2,993

    Re: Extracting Specific Text from a string is slow

    I agree that multithreading should give you some decent performance gains. I am also curious how much time is spent parsing and how much is spent working with Excel. To start with, I would comment out all the Excel code and optimize your parsing code, then figure out the best way to get all that data into Excel.
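
    On the Excel side, one approach that might be worth trying is to collect the parsed values in memory and hand them to the sheet in a single assignment instead of one cell at a time. A minimal sketch, where ToGrid and the sample row are made up and the start cell E2 is an assumption (columns E to H match the 5 to 8 used in post #1):
    Code:
    Imports System
    Imports System.Collections.Generic

    Module BatchWriteSketch
        ' Copies the collected rows into a 2-D array that a range can take in one assignment
        Function ToGrid(rows As List(Of String()), colCount As Integer) As Object(,)
            Dim grid(rows.Count - 1, colCount - 1) As Object
            For r As Integer = 0 To rows.Count - 1
                For c As Integer = 0 To colCount - 1
                    grid(r, c) = rows(r)(c)
                Next
            Next
            Return grid
        End Function

        Sub Main()
            Dim rows As New List(Of String())
            rows.Add(New String() {"uplink", "1500", "1000000", "10"})   ' Desc, MTU, BW, DLY (made-up values)
            Dim grid As Object(,) = ToGrid(rows, 4)
            Console.WriteLine("{0} x {1}", grid.GetLength(0), grid.GetLength(1))   ' prints 1 x 4
            ' With the late-bound oSheet from post #1, the write would then be one call, e.g.:
            ' oSheet.Range("E2").Resize(grid.GetLength(0), grid.GetLength(1)).Value = grid
        End Sub
    End Module

    Each cell write is a separate call into Excel, so fewer, larger writes are usually cheaper, but time it both ways before committing to it.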
    That is the very essence of human beings and our very unique capability to perform complex reasoning and actually use our perception to further our understanding of things. We like to solve problems. -Kleinma

    Does your code in post #46 look like my code in #45? No, it doesn't. Therefore, wrong is how it looks. - jmcilhinney

  11. #11
    Code Monkey wild_bill's Avatar
    Join Date
    Mar 2005
    Location
    Montana
    Posts
    2,993

    Re: Extracting Specific Text from a string is slow

    Also, following Jay's suggestion of using the ReadAllLines method, I set up a little benchmark using a 2000 line file. ReadAllLines was consistently faster, so give that a try.
    Code:
    Dim sw As New Stopwatch
    
    sw.Start()
    Using Str = IO.File.OpenText("C:\TEMP1\test.txt")
        While Str.Peek <> -1
            Dim txtLine = Str.ReadLine
            Dim s = txtLine.Substring(0, 1)
        End While
    End Using
    sw.Stop()
    
    Console.WriteLine(sw.Elapsed.TotalMilliseconds)
    
    sw.Reset()
    
    sw.Start()
    For Each txtLine In IO.File.ReadAllLines("C:\TEMP1\test.txt")
        Dim s = txtLine.Substring(0, 1)
    Next
    sw.Stop()
    
    Console.WriteLine(sw.Elapsed.TotalMilliseconds)
