-
May 10th, 2008, 09:56 AM
#1
Thread Starter
Lively Member
[Resolved] Fastest way to compare lines in txt files
I would like to know the fastest way to compare lines in two txt files. (Assume both files have around 1 million lines. Yes, I'm comparing registry files.)
This is how I'm planning to do it at the moment:
Code:
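' Read the full text of both files (the file names are placeholders)
Dim file1 As String = My.Computer.FileSystem.ReadAllText("1.reg")
Dim file2 As String = My.Computer.FileSystem.ReadAllText("2.reg")
' Put all text from file 1 and from file 2 into arrays
Dim array1() As String = Split(file1, vbNewLine)
Dim array2() As String = Split(file2, vbNewLine)
' Loop through both arrays and compare lines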
Last edited by goldenix; May 10th, 2008 at 12:58 PM.

M.V.B. 2008 Express Edition
-
May 10th, 2008, 10:34 AM
#2
Re: Fastest way to compare lines in txt files
First up, you wouldn't call ReadAllText and Split when you can just call File.ReadAllLines.
Secondly, for very large files that's going to be a lot of memory being used. I guess that might be OK if speed is the primary concern though.
Finally, what exactly are you trying to achieve? It would be faster to compare the raw binary data, rather than convert it to strings and compare them, if that would be possible within the constraints of your application.
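For what it's worth, a raw byte comparison could look roughly like the sketch below. It is only a minimal illustration: the module name, the function name FilesAreIdentical and the file names are made up for the example, and it only answers "are the two files identical?", not "which lines changed?".

```vbnet
Imports System
Imports System.IO

Module ByteCompareSketch
    ' Hypothetical helper: returns True when both files contain exactly the same bytes.
    Function FilesAreIdentical(ByVal pathA As String, ByVal pathB As String) As Boolean
        Dim a() As Byte = File.ReadAllBytes(pathA)
        Dim b() As Byte = File.ReadAllBytes(pathB)

        ' Files of different sizes can never be identical.
        If a.Length <> b.Length Then Return False

        ' Compare byte by byte and stop at the first difference.
        For i As Integer = 0 To a.Length - 1
            If a(i) <> b(i) Then Return False
        Next
        Return True
    End Function

    Sub Main()
        ' The file names are placeholders.
        Console.WriteLine(FilesAreIdentical("before.reg", "after.reg"))
    End Sub
End Module
```

Like ReadAllLines, ReadAllBytes loads each whole file into memory, so the memory caveat above still applies.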
-
May 10th, 2008, 11:14 AM
#3
Thread Starter
Lively Member
Re: Fastest way to compare lines in txt files
 Originally Posted by jmcilhinney
Finally, what exactly are you trying to achieve? It would be faster to compare the raw binary data, rather than convert it to strings and compare them, if that would be possible within the constraints of your application.
I pressed a "bad button" in a certain app and it screwed up something in my registry. So I'm going to emulate a clean operating system and back up its registry first, then I will screw up the registry in that operating system and compare the two registry files, before.reg and after.reg, to see which values were changed. This is why I need to compare lines.

M.V.B. 2008 Express Edition
-
May 10th, 2008, 11:17 AM
#4
Re: Fastest way to compare lines in txt files
If this is a one-off thing then who really cares about speed? You could just run it overnight if it takes a long time. There's not much point tuning code that's going to be run once. As long as it works it doesn't really matter because you'd probably spend more time tuning it than you'd save.
-
May 10th, 2008, 12:08 PM
#5
Thread Starter
Lively Member
Re: Fastest way to compare lines in txt files
 Originally Posted by jmcilhinney
If this is a one-off thing then who really cares about speed? You could just run it overnight if it takes a long time. There's not much point tuning code that's going to be run once. As long as it works it doesn't really matter because you'd probably spend more time tuning it than you'd save.
I'm still a beginner at VB, so it's good to know for the future.
Here is the code I came up with, thanks to your help, jmcilhinney. (The little problem is that both files must have the same number of lines or it will error, because one array will be bigger than the other. I'm still trying to figure out how to solve this; if I do, I'll edit the code.)
Code:
' Requires "Imports System.IO" at the top of the file.
Dim array1() As String = File.ReadAllLines("1.reg")
Dim array2() As String = File.ReadAllLines("2.reg")
' Check which file/array has more lines
Dim morelines As Integer
If UBound(array1) > UBound(array2) Then
    morelines = UBound(array1)
Else
    morelines = UBound(array2)
End If
' Loop through the arrays and compare values.
' Known problem: if the line counts differ, indexing the shorter array fails here.
For i As Integer = 0 To morelines
    If array1(i) <> array2(i) Then MsgBox("Before: " & array1(i) & vbCrLf & "After: " & array2(i))
Next
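One possible way around the "different number of lines" error is to loop only over the lines both files have and report the size mismatch separately. The sketch below assumes that approach; the module name and the variable names are made up for the example, and it writes to the console instead of showing a MsgBox per difference, which would be unworkable with a million lines.

```vbnet
Imports System
Imports System.IO

Module CompareSketch
    Sub Main()
        ' The file names are the ones used in the post above.
        Dim before() As String = File.ReadAllLines("1.reg")
        Dim after() As String = File.ReadAllLines("2.reg")

        ' Only indexes that exist in both arrays are safe to compare.
        Dim common As Integer = Math.Min(before.Length, after.Length)

        For i As Integer = 0 To common - 1
            If before(i) <> after(i) Then
                Console.WriteLine("Line " & (i + 1).ToString() & " before: " & before(i))
                Console.WriteLine("Line " & (i + 1).ToString() & " after:  " & after(i))
            End If
        Next

        ' Report the size mismatch instead of indexing past the shorter array.
        If before.Length <> after.Length Then
            Console.WriteLine("Line counts differ: " & before.Length.ToString() & " vs " & after.Length.ToString())
        End If
    End Sub
End Module
```

This only removes the out-of-range error. It still compares line N with line N, so it does not cope with inserted or deleted lines.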

M.V.B. 2008 Express Edition
-
May 10th, 2008, 09:18 PM
#6
Re: [Resolved] Fastest way to compare lines in txt files
That code is not really going to do what you want. Imagine this. Your "before" file has 1000 lines. Your "after" file has those same 1000 lines but has an extra line inserted at the very beginning. Code like that will highlight every single line as different, even though all 1000 original lines are still there and unchanged. What you want to do is far more complex than that code.
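The effect is easy to reproduce on a tiny scale. The sketch below uses made-up three-line data in place of the 1000-line files and simply counts how many positions differ when one line is inserted at the top.

```vbnet
Imports System

Module ShiftDemo
    Sub Main()
        ' Made-up data: "after" is "before" with one extra line at the very beginning.
        Dim before() As String = {"a", "b", "c"}
        Dim after() As String = {"new", "a", "b", "c"}

        Dim differences As Integer = 0
        For i As Integer = 0 To Math.Min(before.Length, after.Length) - 1
            ' Position-by-position comparison, as in the code above.
            If before(i) <> after(i) Then differences += 1
        Next

        ' Prints 3: every compared position is flagged, although no original line changed.
        Console.WriteLine(differences)
    End Sub
End Module
```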
-
May 11th, 2008, 06:24 AM
#8
Thread Starter
Lively Member
Re: [Resolved] Fastest way to compare lines in txt files
 Originally Posted by jmcilhinney
That code is not really going to do what you want. Imagine this. Your "before" file has 1000 lines. Your "after" file has those same 1000 lines but has an extra line inserted at the very beginning. Code like that will highlight every single line as different, even though all 1000 original lines are still there and unchanged. What you want to do is far more complex than that code.
Yup, it hit me too. So I came to this conclusion:
Skip all empty lines.
1) Take line 1 from after.reg and search before.reg for it.
2) If the line was found, search for line 2, etc. (but start searching from the next line, and write your search position to a log file just in case).
3) If the line was not found, write the line and its line number to a log file and continue searching for the next line.
Now we can also theoretically split both arrays into, let's say, 50 starting positions and start 50 new threads (this will finish 50 times faster but will be more complicated code).
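The three steps above could be sketched roughly as follows. This is only one reading of the plan, not a tested diff tool: the module name, the function name FindAddedLines and the sample data are made up for the example, a list stands in for the log file, and the threading idea is left out.

```vbnet
Imports System
Imports System.Collections.Generic

Module RegDiffSketch
    ' Hypothetical helper: returns the lines of "after" that were not found in
    ' "before", searching forward from the position of the previous match.
    Function FindAddedLines(ByVal before() As String, ByVal after() As String) As List(Of String)
        Dim missing As New List(Of String)
        Dim searchStart As Integer = 0

        For lineNr As Integer = 0 To after.Length - 1
            Dim line As String = after(lineNr)
            If line.Trim().Length = 0 Then Continue For   ' skip empty lines

            ' Steps 1 and 2: search before() starting at the current position.
            Dim foundAt As Integer = -1
            If searchStart < before.Length Then
                foundAt = Array.IndexOf(before, line, searchStart)
            End If

            If foundAt >= 0 Then
                searchStart = foundAt + 1   ' continue from the next line
            Else
                ' Step 3: record the line and its 1-based line number.
                missing.Add((lineNr + 1).ToString() & ": " & line)
            End If
        Next
        Return missing
    End Function

    Sub Main()
        ' Made-up sample data standing in for before.reg and after.reg.
        Dim before() As String = {"[HKEY_A]", """x""=""1""", "", "[HKEY_B]"}
        Dim after() As String = {"[HKEY_A]", """x""=""2""", "", "[HKEY_B]"}

        For Each entry As String In FindAddedLines(before, after)
            Console.WriteLine(entry)
        Next
    End Sub
End Module
```

Two caveats with this approach: a line that happens to match much further down in before.reg moves the search position past everything in between, and each failed search scans the rest of the array, so the worst case on a million lines can be slow.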

M.V.B. 2008 Express Edition
-
May 11th, 2008, 08:00 AM
#9
Re: [Resolved] Fastest way to compare lines in txt files
50 threads can only be 50 times faster than a single thread if you have 50 processor cores to execute them on. On a single-core processor you can only do one thing at a time, so multiple threads can't be any faster. In fact, it may be slower because of all the context switching required. The only reasons to use multiple threads are to maintain a responsive UI while performing some long-running operation or to make use of multiple processor cores. Very few desktop systems have four cores these days; many have two and a lot still have only one.
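If you do go multi-threaded, one way to avoid starting more threads than the machine can actually run in parallel is to ask the framework how many processors it sees. A minimal sketch, assuming the figure of 50 from the previous post as the upper limit (the module and variable names are made up):

```vbnet
Imports System

Module CoreCountSketch
    Sub Main()
        ' Environment.ProcessorCount reports the number of logical processors.
        Dim cores As Integer = Environment.ProcessorCount

        ' Starting more CPU-bound workers than processors gains nothing.
        Dim workers As Integer = Math.Min(50, cores)

        Console.WriteLine("Logical processors: " & cores.ToString())
        Console.WriteLine("Worker threads worth starting: " & workers.ToString())
    End Sub
End Module
```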