Extracting Specific Text from a string is slow
I have a text file I'm trying to pull specific data from to make a report in Excel, and I'd like to know whether my code can be improved or whether I have something wrong. If it's apparent to anyone that I should be doing something else, please tell me. The code works, but it is very slow: the reports take up to 20 minutes for about 100 files, so I want to cut that time down somehow. I have attached the whole sub in a text file; it's about 900 lines long, so I don't think I should post all of it. This is one part that seems to take the longest:
This partial code opens the file with a StreamReader, then reads each line into a string to find whether it holds the data or not. Inside this While loop, I check numerous lines to see what they hold and extract specific items. First I verify it's an interface by checking for "line protocol".
Code:
Dim str As StreamReader = File.OpenText(filefound)
While (str.Peek <> -1)
If txtLine.Contains("line protocol") = True Then 'Found an interface,
Then a snippet that grabs a few more things; these three items are always on a single line. Then I place them into Excel cells.
Code:
'Get MTU, BW and DLY
If txtLine.Contains("MTU") = True And txtLine.Contains("BW") = True And txtLine.Contains("DLY") = True Then
PosX = txtLine.IndexOf("MTU") + 4 ' MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
PosY = txtLine.IndexOf("bytes,") ' ^ ^
PosY = PosY - PosX
MTU = txtLine.Substring(PosX, PosY)
oSheet.Cells(oRow, 6).value = MTU
PosX = txtLine.IndexOf("BW", 1) + 3 'MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
PosY = txtLine.IndexOf("bit", 1)
PosY = PosY - PosX - 2
BW = txtLine.Substring(PosX, PosY)
oSheet.Cells(oRow, 7).value = BW
PosX = txtLine.IndexOf("DLY") + 3
PosY = txtLine.IndexOf("usec")
DLY = txtLine.Substring(PosX, PosY - PosX)
oSheet.Cells(oRow, 8).value = DLY
End If
Here is another snippet that grabs a single value from within this same readline loop.
Code:
'Get Description
If txtLine.Contains("Description") = True Then
' Strings are immutable: Trim and Replace return new strings, so their results must be assigned.
Desc = txtLine.Replace("Description", "").Replace(":", "").Trim()
oSheet.Cells(oRow, 5).Value = Desc
End If
Very appreciative of any help/suggestions. Thanks.
Re: Extracting Specific Text from a string is slow
For one thing, rather than making multiple IndexOf calls to locate positions, I think I would Split() the string on spaces and grab the desired elements from the resulting array. That should be a bit faster and easier to follow.
Re: Extracting Specific Text from a string is slow
Thank you. I don't actually ever remember using split. Do you mean something like this:
Code:
If txtLine.Contains("MTU") = True And txtLine.Contains("BW") = True And txtLine.Contains("DLY") = True Then
Dim MBD() As String
MBD = txtLine.Split(" ")
' MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
'1 2 3 4 5 6 7 8 9
oSheet.Cells(oRow, 6).value = MBD(3)
oSheet.Cells(oRow, 7).value = MBD(6)
oSheet.Cells(oRow, 8).value = MBD(9)
'PosX = txtLine.IndexOf("MTU") + 4 ' MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
'PosY = txtLine.IndexOf("bytes,") ' ^ ^
'PosY = PosY - PosX
'MTU = txtLine.Substring(PosX, PosY)
'oSheet.Cells(oRow, 6).value = MTU
'PosX = txtLine.IndexOf("BW", 1) + 3 'MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
'PosY = txtLine.IndexOf("bit", 1)
'PosY = PosY - PosX - 2
'BW = txtLine.Substring(PosX, PosY)
'oSheet.Cells(oRow, 7).value = BW
'PosX = txtLine.IndexOf("DLY") + 3
'PosY = txtLine.IndexOf("usec")
'DLY = txtLine.Substring(PosX, PosY - PosX)
'oSheet.Cells(oRow, 8).value = DLY
End If
This does work, but should I do some checks to make sure the values actually are at (3), (6), and (9)? I just ran a report and they do come out okay this way. And this really cuts down on code as well. But the whole array goes up to (12), and while I am fairly sure the elements will always line up that way, I'm worried I might be missing something.
Re: Extracting Specific Text from a string is slow
Yes. If there is a chance the number of fields could change or be shifted, you can always test the value of one or more elements to make sure the data is where you expect it to be.
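A minimal sketch of that kind of check, assuming the line always has the shape shown in the comment above. The helper name TryParseMtuLine is made up for illustration. Note that it uses StringSplitOptions.RemoveEmptyEntries, which drops the empty elements produced by leading spaces, so the indexes here differ from the (3), (6), (9) used with a plain Split(" ").

```vbnet
Imports System

Module SplitSketch
    ' Hypothetical helper: returns True and fills mtu/bw/dly only when the
    ' labels sit immediately before the values we are about to read.
    Function TryParseMtuLine(txtLine As String, ByRef mtu As String,
                             ByRef bw As String, ByRef dly As String) As Boolean
        ' RemoveEmptyEntries drops the empty strings caused by leading or
        ' repeated spaces, so the indexes do not depend on indentation.
        Dim parts = txtLine.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)

        ' Expected: MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
        ' Index:    0   1    2      3  4       5     6   7  8
        If parts.Length < 9 Then Return False
        If parts(0) <> "MTU" OrElse parts(3) <> "BW" OrElse parts(6) <> "DLY" Then Return False

        mtu = parts(1)
        bw = parts(4)
        dly = parts(7)
        Return True
    End Function

    Sub Main()
        Dim mtu As String = Nothing
        Dim bw As String = Nothing
        Dim dly As String = Nothing
        If TryParseMtuLine("  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,", mtu, bw, dly) Then
            Console.WriteLine(mtu & " " & bw & " " & dly)
        Else
            Console.WriteLine("Line did not have the expected layout")
        End If
    End Sub
End Module
```

If the labels are not where expected, the function returns False and the caller can skip or log the line instead of writing the wrong field to the sheet.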
Re: Extracting Specific Text from a string is slow
Thank you for the help. This is definitely the way to go. No idea why I never used this before; it makes me wonder what else I'm missing out on. Thanks again.
Re: Extracting Specific Text from a string is slow
I think your original method was a better approach using IndexOf and Substring. If you start to use Split you will begin creating more strings than you need per cycle which would likely consume more memory.
How large are the files you are working with?
Are you closing the StreamReader after reading each file?
You could read each file into an array and then iterate over it, e.g. with System.IO.File.ReadAllLines(), although I'm not sure it would be any faster.
Another thought would be to read the entire file and then use a Regex pattern to grab the data from each line. Regex can be a little tricky to start with but it is easier to read, usually requires less code, and can provide a performance boost in comparison to performing several operations multiple times over.
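As a rough sketch of the Regex idea, here is one pattern for the MTU/BW/DLY line. The pattern is an assumption based only on the sample line quoted earlier in the thread, and the module and field names are made up for illustration; real output may need a looser pattern.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Built once and reused for every line. The pattern assumes the layout
    ' "MTU <n> bytes, BW <n> Kbit..., DLY <n> usec" from the sample line.
    Private ReadOnly MtuLine As New Regex(
        "MTU\s+(\d+)\s+bytes,\s+BW\s+(\d+)\s+Kbit\S*,\s+DLY\s+(\d+)\s+usec",
        RegexOptions.Compiled)

    Sub Main()
        Dim txtLine = "  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,"
        Dim m = MtuLine.Match(txtLine)
        If m.Success Then
            ' Groups 1 to 3 hold the three captured numbers, in order.
            Console.WriteLine("MTU=" & m.Groups(1).Value)
            Console.WriteLine("BW=" & m.Groups(2).Value)
            Console.WriteLine("DLY=" & m.Groups(3).Value)
        End If
    End Sub
End Module
```

One Match call replaces the three Contains checks and six IndexOf calls, and a line that does not fit the pattern simply fails to match. Whether it is actually faster would need to be measured.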
You could also consider using threading to separate your tasks from the user interface. If done well, you can often find speed gains here.
Just a few thoughts for you to consider :)
Re: Extracting Specific Text from a string is slow
String work is slow. The more you do, the slower it is, and that's pretty much just life. Over in the .NET CodeBank I posted a profiler class. You might find it useful, as it is a means for timing various parts of some working code to see where you are spending the most time. That might allow you to compare alternative approaches, but the ultimate problem is probably going to remain: String work is slow.
Re: Extracting Specific Text from a string is slow
True, Split() will use more memory, but the amount of extra memory is trivial on a string such as this. I would think one call to Split() would be faster than six calls to IndexOf. Of course I could be wrong, as I have not tested it to verify.
ReadAllLines will almost certainly be faster than reading one line at a time. It will of course use a lot more memory, but if there is enough memory to handle the task it is the better option from a processing speed POV.
Re: Extracting Specific Text from a string is slow
I agree that multithreading should give you some decent performance gains. I am also curious how much time is spent parsing and how much is spent working with Excel. To start with, I would comment out all the Excel code. Optimize your parsing code, then figure out the best way to get all that data into Excel.
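One possible way to get the data into Excel, sketched under assumptions: collect the parsed values in memory first, copy them into a 2-D array, and hand that array to Excel in one assignment rather than setting each cell separately. ToGrid and the row layout (MTU, BW, DLY) are made up for illustration, and the Excel line is left as a comment because it needs the oSheet object from the original sub.

```vbnet
Imports System
Imports System.Collections.Generic

Module BatchWriteSketch
    ' Copies the parsed rows into a 2-D Object array, the shape that can be
    ' assigned to an Excel range in a single call.
    Function ToGrid(rows As List(Of String()), cols As Integer) As Object(,)
        Dim grid = New Object(rows.Count - 1, cols - 1) {}
        For r = 0 To rows.Count - 1
            For c = 0 To cols - 1
                grid(r, c) = rows(r)(c)
            Next
        Next
        Return grid
    End Function

    Sub Main()
        ' Pretend these came out of the parsing loop (MTU, BW, DLY per row).
        Dim rows As New List(Of String())
        rows.Add(New String() {"1500", "1000000", "10"})
        rows.Add(New String() {"1500", "100000", "100"})

        Dim grid = ToGrid(rows, 3)
        Console.WriteLine(grid.GetLength(0) & " rows x " & grid.GetLength(1) & " columns")

        ' Assumption: oSheet is the worksheet from the original sub and the
        ' block starts at column F, row 2. One assignment writes every cell:
        ' oSheet.Range("F2").Resize(grid.GetLength(0), grid.GetLength(1)).Value = grid
    End Sub
End Module
```

This also makes it easy to time parsing and Excel output separately, since the parsing loop no longer touches the worksheet.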
Re: Extracting Specific Text from a string is slow
Also, following Jay's suggestion of using the ReadAllLines method, I set up a little benchmark using a 2000-line file. ReadAllLines was consistently faster, so give that a try.
Code:
Dim sw As New Stopwatch
sw.Start()
Using Str = IO.File.OpenText("C:\TEMP1\test.txt")
While Str.Peek <> -1
Dim txtLine = Str.ReadLine
Dim s = txtLine.Substring(0, 1)
End While
End Using
sw.Stop()
Console.WriteLine(sw.Elapsed.TotalMilliseconds)
sw.Reset()
sw.Start()
For Each txtLine In IO.File.ReadAllLines("C:\TEMP1\test.txt")
Dim s = txtLine.Substring(0, 1)
Next
sw.Stop()
Console.WriteLine(sw.Elapsed.TotalMilliseconds)