Thread: [RESOLVED] Fast replace in a large string

1. [RESOLVED] Fast replace in a large string

Hi

I have some large text data in a string.
The data comes from a file, the line separator is CRLF.
It hold several houndred lines.
Let's assume 10MB data.

From a certain line I need to get the data, then modify it.
The modified data can be shorter or longer than the unmodified data.

Up to now I split the data by CRLF, modify the line, and join it afterwards.
Easy and reliable.
And slow.
And uses more memory than wanted.

I think there should be a better method without split etc.

My idea is to find the xth CRLF and the xth+1 CRLF in the whole large string.
This is also not efficient, as I have to compare every single character to find CRLF.

In short:
I look for a better idea.

Thanks
Karl

2. Re: Fast replace in a large string

Why not read the data line by line, modify the line(s) needed, and write it to a new file on the go?
You can do this with buffered file I/O.
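
A minimal sketch of that idea, assuming an ANSI text file and VB6's native file statements. The helper name ReplaceLineInFile and its parameters are made up for illustration and do not come from the thread:

```vb
Option Explicit

' Hypothetical helper: copies sSrc to sDst line by line and replaces
' the text of line number lLineNo (1-based) with sNewText.
Public Sub ReplaceLineInFile(ByVal sSrc As String, ByVal sDst As String, _
                             ByVal lLineNo As Long, ByVal sNewText As String)
    Dim fIn As Integer, fOut As Integer
    Dim sLine As String, lCur As Long

    fIn = FreeFile
    Open sSrc For Input As #fIn
    fOut = FreeFile                     ' requested after fIn is open, so the numbers differ
    Open sDst For Output As #fOut

    Do While Not EOF(fIn)
        Line Input #fIn, sLine          ' reads one line without its CRLF
        lCur = lCur + 1
        If lCur = lLineNo Then sLine = sNewText
        Print #fOut, sLine              ' writes the line followed by CRLF
    Loop

    Close #fOut
    Close #fIn
End Sub
```

Only one line is held in memory at a time. Note that Print # always ends the last line with CRLF, even if the source file had no trailing line break.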

3. Re: Fast replace in a large string

Originally Posted by Arnoutdv
Why not read the data line by line, modify the line(s) needed, and write it to a new file on the go?
You can do this with buffered file I/O.
+1 to that

@Karl
give this a try
Code:
Option Explicit

Private Sub Command1_Click()
    Dim sPath As String
    Dim sContent As String
    sPath = "E:\Bible.txt"
    sContent = ReadTextFile(sPath, 0) ' lFormat -2 - System default, -1 - Unicode, 0 - ASCII

    sContent = Replace(sContent, "someText", "this is myNewText")

    'add more to search and replace
    'sContent = Replace(sContent, vbCrLf, " ")
    WriteTextFile sContent, sPath, 0

End Sub

Function ReadTextFile(sPath, lFormat) As String
    'open for reading (1) and return the whole file content
    With CreateObject("Scripting.FileSystemObject").OpenTextFile(sPath, 1, False, lFormat)
        If Not .AtEndOfStream Then ReadTextFile = .ReadAll
        .Close
    End With
End Function

Sub WriteTextFile(sContent, sPath, lFormat)
    'open for writing (2), creating the file if it does not exist
    With CreateObject("Scripting.FileSystemObject").OpenTextFile(sPath, 2, True, lFormat)
        .Write sContent
        .Close
    End With
End Sub

4. Re: Fast replace in a large string

The approaches are good and easy.
And my data originally comes from files.

But the data has to be changed in memory.
I don't want the file output.

I'll do some tests using temporary files.
Why not.

5. Re: Fast replace in a large string

Originally Posted by Karl77
The approaches are good and easy.
And my data originally comes from files.

But the data has to be changed in memory.
I don't want the file output.

I'll do some tests using temporary files.
Why not.
Up to now I split the data by CRLF, modify the line, and join it afterwards.
You need the data as one big string in your application?

Then read the file line by line, change what's needed, and append each line to a StringBuilder class.
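
VB6 has no built-in StringBuilder, so this suggestion implies writing one or using a library class. A minimal sketch of the usual technique, assuming a preallocated buffer that is filled with the Mid$ statement; the names SbAppend, SbToString and SbClear are hypothetical, and the code is written as a standard module rather than a class to keep it short:

```vb
Option Explicit

' Hypothetical minimal string builder (illustrative names, not from the thread).
Private m_Buf As String   ' preallocated buffer
Private m_Len As Long     ' number of characters actually used

Public Sub SbClear()
    m_Len = 0             ' keep the buffer, just reset the fill level
End Sub

Public Sub SbAppend(ByRef s As String)
    Dim need As Long, newCap As Long

    need = m_Len + Len(s)
    If need > Len(m_Buf) Then
        ' grow geometrically so the number of reallocations stays small
        newCap = Len(m_Buf) * 2
        If newCap < need Then newCap = need
        If newCap < 1024 Then newCap = 1024
        m_Buf = m_Buf & Space$(newCap - Len(m_Buf))
    End If

    ' Mid$ as a statement overwrites characters in place, no new string is created
    If Len(s) > 0 Then Mid$(m_Buf, m_Len + 1, Len(s)) = s
    m_Len = need
End Sub

Public Function SbToString() As String
    SbToString = Left$(m_Buf, m_Len)
End Function
```

Compared with repeated `s = s & line`, this avoids copying the whole accumulated string on every append; the full copy happens only when the buffer grows and once in SbToString.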

6. Re: Fast replace in a large string

Use InStr with vbBinaryCompare. It's much faster than without.

7. Re: Fast replace in a large string

I would go with what qvb6 wrote,
read the file as binary array
use instrb to find the start and end position.
save data(0 to start-1)
save replaceddata()
save data(end+1 to endfile)

8. Re: Fast replace in a large string

Hi Karl77, if you want the best answer, you'd better post specific code and example data.

Asking a good question is a skill, too.

9. Re: Fast replace in a large string

Originally Posted by baka
read the file as binary array
use instrb to find the start and end position.
You never post code for your suggestions (not even pseudo-code).
And that makes it really hard to take them seriously.

Olaf

10. Re: Fast replace in a large string

Well, Karl77 is not a newbie, I'm sure he can figure out what to do.
He will learn even more if he needs to do it himself.
But I understand what you mean. It would be different if I were suggesting a very complex function, but I'm not: this is very basic and should be known by every vb6 programmer.

11. Re: Fast replace in a large string

Originally Posted by Karl77
It hold several houndred lines.
This data sounds like a real dogs dinner.

From a certain line I need to get the data, then modify it.
The modified data can be shorter or longer than the unmodified data.

Up to now I split the data by CRLF, modify the line, and join it afterwards.
Is there just ONE line in the whole file that needs to be changed? The preceding makes it sound that way, but the subsequent discussion is assuming that lots of lines will change. There are likely improvements that can be made if the changes are just to one, or a small number of identifiable lines, out of the whole file.

12. Re: Fast replace in a large string

10 years ago, I'd have suggested you perform these String operations as actual memory operations. That is to say, find the necessary offsets for the text you're interested in changing and use memory operations like RtlMoveMemory to implement concatenations or simply change characters in place instead of having to reallocate entire buffers just to change a few characters. This kind of approach yields great performance boosts in String processing. However, this was easy when we could safely assume all text was 1 byte per character. We live in a Unicode world today and it's just not safe to treat Strings this way anymore.

However, all is not lost. There are still a few "best practices" you could employ to boost the performance of String processing, even with complicated Unicode Strings. I'd like to see the actual code you're using. Perhaps we could help you refactor it a bit to get better performance.

13. Re: Fast replace in a large string

Originally Posted by baka
...this is very basic and should be known by every vb6 programmer.
- "read(ing) the file as binary array" ... (I assume a VB-ByteArray?)
- and "use instrb to find the start and end position"

then this is not really common, and I'd like to see a concrete implementation from you

Olaf

14. Re: Fast replace in a large string

Originally Posted by Schmidt
- "read(ing) the file as binary array" ... (I assume a VB-ByteArray?)
- and "use instrb to find the start and end position"

then this is not really common, and I'd like to see a concrete implementation from you

Olaf
Actually I don't disagree with him to be honest. Reading a file as a byte array is pretty basic unless you're talking to someone who is very new to programming. I remember writing my own version of the DOS command COPY in QuickBasic when I was like 10 years old. All it did was divide the file into chunks by reading it 32767 bytes at a time into an array which would then be written to another file using Get/Put with the Binary access mode. I think it's safe to assume that any VB6 programmer past the "Hello World" stage would know how to do this.

15. Re: Fast replace in a large string

Originally Posted by Niya
Actually I don't disagree with him to be honest.
Reading a file as a byte array is pretty basic ...
Sure, the ByteArray-reading is a must -
but he also suggested the InstrB function as a vehicle to "find things within those bytes" fast(er).

I was asking for concrete code (the combination of these two suggested things) -
e.g. in the context of "linewise looping through the bytes, whilst using InstrB",
because this is not really a common task for most devs...

If you think you can write this up for me instead, then I'd appreciate it.

Olaf

16. Re: Fast replace in a large string

Originally Posted by Schmidt
Sure, the ByteArray-reading is a must -
but he also suggested the InstrB function as a vehicle to "find things within those bytes" fast(er).

I was asking for concrete code (the combination of these two suggested things) -
e.g. in the context of "linewise looping through the bytes, whilst using InstrB",
because this is not really a common task for most devs...
Fair enough.

Originally Posted by Schmidt
If you think you can write this up for me instead, then I'd appreciate it.
I'm not entirely sure it's a good idea to commit to this method. Reading the file as a byte array and using Instr to search may not be the best approach. I mean OP hasn't provided enough information. In my mind, the most important question is whether OP is dealing strictly with 1 byte per character ANSI Strings. We have to be very careful about treating Strings as byte arrays so casually if different String encodings are involved, even if it's just UTF-8. I'd prefer to hold off on providing a concrete example of anything until OP provides all the details.

OP alone knows what he is doing and perhaps baka's suggestion is exactly what he needs, we don't know. But if it's not then I'd prefer to see what code he already has so we can try to figure out why it's not performing the way he expects.

17. Re: Fast replace in a large string

Maybe a usecase for my CodeBank-Entry? --> http://www.vbforums.com/showthread.p...ght=ReplaceAny

18. Re: Fast replace in a large string

Originally Posted by qvb6
Use InStr with vbBinaryCompare. It's much faster than without.
Is that not the default compare mode?

19. Re: Fast replace in a large string

Originally Posted by Niya
OP alone knows what he is doing and perhaps baka's suggestion is exactly what he needs, we don't know. But if it's not then I'd prefer to see what code he already has so we can try to figure out why it's not performing the way he expects.
I didn't provide example data or existing code, because I don't want anyone to code for me.
I was looking for a fresh idea.
And the discussion enlightened me.

All the byte array stuff is not the solution.
The large string is already in memory.

In the large string, there is a unique describing line (a marker) before the data to be changed.
This way I know the next line # which holds the data.

Now I do it differently:
I detect the position of the marker in the whole string.
Knowing this position, I go back and forth in the string to find the CRLFs.
Knowing the positions of CRLF, a bit of Left$ and Right$ isolates the static parts.

Solved then with no major effort.
I'll post example code when double-checked and finished.

20. Re: Fast replace in a large string

Originally Posted by vbwins
Is that not the default compare mode?
You are right. I meant to say that vbBinaryCompare is faster than vbTextCompare. vbSpeed shows it to be at least 8 times faster.
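
To make the difference between the two compare modes concrete, here is a small sketch. The sample string and the Sub name CompareModeDemo are made up for illustration; it shows only the behavioural difference, not the speed difference:

```vb
Option Explicit

Public Sub CompareModeDemo()
    Dim s As String
    s = "Alpha" & vbCrLf & "BETA" & vbCrLf & "beta"

    ' Binary compare (the default unless Option Compare Text is set):
    ' case-sensitive, so only the lower-case "beta" matches
    Debug.Print InStr(1, s, "beta", vbBinaryCompare)   ' prints 14

    ' Text compare: case-insensitive, so "BETA" is found first
    Debug.Print InStr(1, s, "beta", vbTextCompare)     ' prints 8
End Sub
```

For fixed markers and CRLF searches, case-insensitive matching is not needed, so the binary mode is the natural choice.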

21. Re: Fast replace in a large string

Originally Posted by Karl77
Knowing this position, I go back and forth in the string to find the CRLFs.
Knowing the positions of CRLF, a bit of Left$ and Right$ isolates the static parts.
Is that marker at the beginning of a line? If so, you could do something like this:

pos = InStr(s, vbCrLf & "[Config]")

I used the above to make my own Unicode INI file parser, rather than relying on the OS. In the case above, I was looking for a [Config] section just after a new line, so there is no reason to search backward. VB6 has InStrRev to search back for the previous line if you need it.
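
Combining the marker search with InStr for the line breaks that follow could look roughly like this. It is a sketch under assumptions: the function name ReplaceLineAfterMarker is hypothetical, lines are assumed to be separated by CRLF, and the line to change is assumed to be the one directly after the marker line:

```vb
Option Explicit

' Hypothetical helper: replaces the line that follows the line containing sMarker.
' Returns False (and leaves s untouched) if the marker or the next line is missing.
Public Function ReplaceLineAfterMarker(ByRef s As String, ByVal sMarker As String, _
                                       ByVal sNewLine As String) As Boolean
    Dim lMark As Long, lStart As Long, lEnd As Long

    lMark = InStr(1, s, sMarker, vbBinaryCompare)
    If lMark = 0 Then Exit Function

    ' CRLF that ends the marker line
    lStart = InStr(lMark, s, vbCrLf, vbBinaryCompare)
    If lStart = 0 Then Exit Function        ' marker is on the last line
    lStart = lStart + 2                     ' first character of the target line

    ' CRLF that ends the target line (or end of string)
    lEnd = InStr(lStart, s, vbCrLf, vbBinaryCompare)
    If lEnd = 0 Then lEnd = Len(s) + 1

    ' everything before the target line + new text + everything from its CRLF on
    s = Left$(s, lStart - 1) & sNewLine & Mid$(s, lEnd)
    ReplaceLineAfterMarker = True
End Function
```

Only three InStr calls touch the large string, and the rebuild copies it once; no Split/Join array is created.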

22. Re: Fast replace in a large string

Solved.
It is really very simple.
Fast and easy.
If someone knows something significantly faster, tell me.

Code:
m = "123" & vbCrLf & "blablabla\999: time\blablabla" & vbCrLf & "changeme" & vbCrLf & "\end\"

'find the marker
Posi = InStr(m, ": time\")
If Posi = 0 Then
    Exit Function
End If

'now we have the marker position
'find the next CRLF
MLen = Len(m)
For i = Posi To MLen
    temp = Mid$(m, i, 1)
    If temp = vbLf Then
        Lpos = i
        Exit For
    End If
Next
LeftPart = Left$(m, Lpos)

'and the next one
For i = Lpos + 1 To MLen
    temp = Mid$(m, i, 1)
    If temp = vbLf Then
        Rpos = i
        Exit For
    End If
Next

If Rpos = 0 Then
    'then we are at the end of the string
    MidPart = Mid$(m, Lpos + 1, MLen)
    Nullpos = InStr(MidPart, Chr(0))
    If Nullpos > 0 Then MidPart = Left$(MidPart, Nullpos - 1)
    RightPart = vbNullString
Else
    MidPart = Mid$(m, Lpos + 1, Rpos - Lpos - 2)
    'start after the LF, because NewVal gets its own CRLF appended below
    RightPart = Mid$(m, Rpos + 1)
End If

'update the middle part
...
...
NewVal = NewVal & vbCrLf
m = LeftPart & NewVal & RightPart

23. Re: [RESOLVED] Fast replace in a large string

you can use another InStr to find CrLf, and another to find the 2nd CrLf
using:
InStr(Posi, m, vbLf)
when you got all 3 positions, you know what to do.

24. Re: [RESOLVED] Fast replace in a large string

Originally Posted by baka
you can use another InStr to find CrLf, and another to find the 2nd CrLf
using:
InStr(Posi, m, vbLf)
That is a very good idea.
Thanks for the hint!

25. Re: Fast replace in a large string

Originally Posted by Karl77
Solved.
It is really very simple.
Fast and easy.
If someone knows something significantly faster, tell me.
Uhh... besides checking every single character and throwing temp. strings around in memory?
btw: faster than what?

26. Re: [RESOLVED] Fast replace in a large string

this is about InStrB
just to explain a bit about it.
If you have a byte array, from a file, or a memory region you want to look into.

sBuffer() as byte
is where we have the data
and
Find() as byte
is the specific pattern to look for

it could be that this is not enough, maybe we know the pattern is "RZ1" and 3 unknown "???" and 3 known "YRA"
what we do is put this in a loop

Code:
lpos = InStrB(1 + lpos, sBuffer, Find, vbBinaryCompare)

and if lpos > 0 we can check more, example:
If sBuffer(lpos + 2) = Data(0) Then

this example is from a "working" program that looks into the uncompressed data from the memory using ReadProcessMemory to get the dimension of a flash/swf.
since there's no property in the flash component that can do that.

27. Re: [RESOLVED] Fast replace in a large string

Originally Posted by Karl77
Solved.
It is really very simple.
Fast and easy.
If someone knows something significantly faster, tell me.
In my opinion, if we talk about a robust solution, in terms of speed Olaf's method for parsing huge files is hard to beat in vb6.
It is based on the ICSVCallback_NewValue function, and a StringBuilder is used as a buffer that is "released" to disk each time it reaches 1 MB. I know there are many approaches, but the mechanism from this post http://www.vbforums.com/showthread.p...les&highlight= helped me to parse a file with 22 million rows (740 MB) in 18 seconds. Moreover, it is not third-party dependent, as you might think at first sight, thanks to the regfree capability included in the DirectCOM library.

28. Re: [RESOLVED] Fast replace in a large string

I think OP's pretty much nailed it. After he incorporates baka's suggestion of using Instr to find the carriage return/line feeds, I think it's safe to say his method is adequate. There are ways of optimizing further, but if OP is satisfied, it's just over-engineering at this point.

29. Re: [RESOLVED] Fast replace in a large string

Originally Posted by Niya
I think OP's pretty much nailed it. After he incorporates baka's suggestion of using Instr to find the carriage return/line feeds, I think it's safe to say his method is adequate. There are ways of optimizing further, but if OP is satisfied, it's just over-engineering at this point.
I agree to a point, but consider the subject of the thread: "Fast" replace in a large string.
I don't see how comparing each character or using InStr/InStrB can be fast, especially since the OP is throwing temp. string variables around in memory.
I've just read Daniel's link on Olaf's solution, and I don't think that we have to discuss Olaf's skills.

Yeah, the goal of programming is:
1) Make it work!
2) If 1) then make it faster without breaking the working solution
3) If 1) and 2) then Make it right! (aka Handling of "IT Layer 8"-scenarios)

At least, that's my philosophy (discussion of the order of my 3 steps notwithstanding)

30. Re: [RESOLVED] Fast replace in a large string

Originally Posted by Zvoni
consider the subject of the thread: "Fast" replace in a large string.
I don't see how comparing each character or using InStr/InStrB can be fast, especially since the OP is throwing temp. string variables around in memory.
I didn't measure exactly, but I think the solution is fast enough.
InStr is used 3 times.
The comparison of single characters (as in my snippet) is gone.
The temp$ could be avoided somehow, perhaps.
The average data amount is around 5MB (1k to 10MB) per execution.
The temp$ are cleared after execution.
No memory wasted.

My coding approaches are simple:
1) make it work reliably
2) keep it simple, readable and understandable
3) spot the bottlenecks
4) make it as fast as necessary - not as fast as possible

In my case I was at 3) and 4).
Which is solved now.

I could work further on the performance.
What for?
It won't have a major impact on the overall performance of the real app function.
Imagine this special 'fast replace' task takes 6msec.
Then I optimize it to 3msec.
Now let's execute it 20x.
I won 60msec.
Very good.
But I won't notice...

31. Re: [RESOLVED] Fast replace in a large string

This discussion helped me to get the simple idea.
ALL comments were helpful.
Also baka's short hints.
Which I followed partially.

So thanks to all of you.

32. Re: [RESOLVED] Fast replace in a large string

Originally Posted by Zvoni
I agree to a point, but consider the subject of the thread: "Fast" replace in a large string.
I don't see how comparing each character or using InStr/InStrB can be fast, especially since the OP is throwing temp. string variables around in memory.
I've just read Daniel's link on Olaf's solution, and I don't think that we have to discuss Olaf's skills.

Yeah, the goal of programming is:
1) Make it work!
2) If 1) then make it faster without breaking the working solution
3) If 1) and 2) then Make it right! (aka Handling of "IT Layer 8"-scenarios)

At least, that's my philosophy (discussion of the order of my 3 steps notwithstanding)
If he is satisfied with it, I don't see any need to optimize it further. Over-engineering and premature optimization are huge time wasters when you need to get something up and running quickly.

33. Re: Fast replace in a large string

Originally Posted by Shaggy Hiker
This data sounds like a real dogs dinner.
ROFL
houndred... stupid me...

34. Re: [RESOLVED] Fast replace in a large string

Originally Posted by baka
this is about InStrB
just to explain a bit about it.
If you have a byte array, from a file, or a memory region you want to look into.

sBuffer() as byte ... is where we have the data
Find() as byte ... is the specific pattern to look for
Glad you posted at least something which resembles a real code-example...

Originally Posted by baka
...what we do is put this in a loop
Code:
lpos = InStrB(1 + lpos, sBuffer, Find, vbBinaryCompare)
Because your line above shows exactly the naive approach I've expected.
It's wrong to write it like that... (and wrong to suggest to others, to use it like that).

See, when you use it as you suggested - in a loop - then the approach of:
- passing ByteArrays directly
- without prior String-conversion into InstrB
is a huge performance-hog.

The code-example below shows how huge the performance-gap to "normal Instr-usage with normal WChar-BStrings" can be.
(InstrB being about factor 1000 slower than normal Instr).

Code:
Option Explicit

Private Declare Function QueryPerformanceFrequency& Lib "kernel32" (x@)
Private Declare Function QueryPerformanceCounter& Lib "kernel32" (x@)

Private sLogFileInput As String, bLogFileInput() As Byte, T@

Private Sub Form_Load() 'prepare some simulated LogFile-Input in a String
  Dim i As Long, S(1 To 12000) As String, D As Date: D = Now
  For i = 1 To UBound(S)
    S(i) = D + i / 86400 & " ... some longer LogEntry-Line with some leading Timestamp... " & i
  Next
  sLogFileInput = Join(S, vbCrLf) 'the simulated Log-Content as String-Input
  bLogFileInput = StrConv(sLogFileInput, vbFromUnicode) 'and here the same content in a ByteArray
End Sub

Private Sub Form_Click()
  Print vbLf; "Input-Len: "; Format(Len(sLogFileInput) / 1024 ^ 2, "0.0MB")

  DoEvents: T = MsecTimer
  Print "ReadLines-InstrS: "; ReadLinesInstrS(sLogFileInput), Format(MsecTimer - T, "0.00msec")

  DoEvents: T = MsecTimer
  Print "ReadLines-InstrB: "; ReadLinesInstrB(bLogFileInput), Format(MsecTimer - T, "0.00msec")
End Sub

Function ReadLinesInstrS(sInput As String) As Long 'line-wise looping (returning the line-count)
  Dim Pos1 As Long, Pos2 As Long, CurLine As String
  Do While Pos2 < Len(sInput)
    Pos2 = InStr(Pos2 + 1, sInput, vbCrLf)
    If Pos2 = 0 Then Pos2 = Len(sInput) + 1

    'cut-out the current line (in case that's needed)
    'CurLine = Mid$(sInput, Pos1 + 1, Pos2 - Pos1 - 1)

    Pos1 = Pos2 + 1
    ReadLinesInstrS = ReadLinesInstrS + 1 'count the line (the function returns the line-count)
  Loop
End Function

Function ReadLinesInstrB(bInput() As Byte) As Long 'line-wise looping (returning the line-count)
  Dim Find() As Byte: Find = StrConv(vbCrLf, vbFromUnicode)
  Dim Pos1 As Long, Pos2 As Long, CurLine As String
  Do While Pos2 < UBound(bInput) + 1
    Pos2 = InStrB(Pos2 + 1, bInput, Find)
    If Pos2 = 0 Then Pos2 = UBound(bInput) + 2

    'cut-out the current line (in case that's needed)
    'CurLine = StrConv(MidB$(bInput, Pos1 + 1, Pos2 - Pos1 - 1), vbUnicode)

    Pos1 = Pos2 + 1
    ReadLinesInstrB = ReadLinesInstrB + 1
  Loop
End Function

Function MsecTimer() As Currency 'a Timing-Helper
  Dim c@, frq@
  QueryPerformanceFrequency frq
  If QueryPerformanceCounter(c) Then MsecTimer = CCur(c / frq) * 1000@
End Function

FWIW, here's the corrected version of the InstrB-approach (now being faster than the "normal Instr").

Code:
Function ReadLinesInstrB(bInput() As Byte) As Long 'line-wise looping (returning the line-count)
  Dim sInp As String: sInp = bInput 'place the Bytes in a BString (to avoid implicit conversions)
  Dim sFnd As String: sFnd = StrConv(vbCrLf, vbFromUnicode) 'same here, ANSI-content goes directly into a String
  Dim Pos1 As Long, Pos2 As Long, CurLine As String
  Do While Pos2 < LenB(sInp)
    Pos2 = InStrB(Pos2 + 1, sInp, sFnd) 'now InstrB will not have to perform implicit conversions
    If Pos2 = 0 Then Pos2 = LenB(sInp) + 1

    'cut-out the current line (in case that's needed)
    'CurLine = StrConv(MidB$(sInp, Pos1 + 1, Pos2 - Pos1 - 1), vbUnicode)

    Pos1 = Pos2 + 1
    ReadLinesInstrB = ReadLinesInstrB + 1 'count the line (the function returns the line-count)
  Loop
End Function

Olaf

35. Re: [RESOLVED] Fast replace in a large string

A suggested approach need not include the optimal means of implementing such an approach. That happens all the time. Please leave personal animus out of it.
