Make a Split function that skips quotes that contain the delimiter. Any opening quote will prevent the delimiter from splitting the string.
Syntax:
VB Code:
Public Function QuoteSplit(ByRef Expression As String, Optional ByRef Delimiter As String = " ", _
Optional ByVal Limit As Long, Optional ByVal Compare As VbCompareMethod = vbBinaryCompare) As String()
End Function
This is a friendly challenge! This means that you can come up with entirely new suggestions, improve code posted earlier on and give suggestions to others participating in the challenge.
Purpose is free: you can aim for shortness, you can aim for speed, you can aim for balancing code length and speed. Do what you like most.
Edit!
You may drop Limit and Compare if you don't want to do them.
Also fixed Delimiter code (default = " ").
Edit #2
And now fixed the Delimiter spelling in the code.
here's my not-much-code-but-slow version (i didn't bother with Limit):
VB Code:
Public Function QuoteSplit(ByRef Expression As String, Optional ByRef Delimiter As String = " ", Optional ByVal Compare As VbCompareMethod = vbBinaryCompare) As String()
Though I'm not the type to write fast code, so I can't say it's optimized for speed.... this is just how I would write this function if I would need it...
Private Sub Command1_Click()
Dim sText As String
Dim sDelimiter As String
Dim Arr() As String
Dim i As Long
sText = Text1.Text ' i.e. what you type in Text1
sDelimiter = Text2.Text ' may be two asterisks (**) in Text2
Arr = Split(sText, sDelimiter, , vbBinaryCompare)
For i = 0 To UBound(Arr)
Debug.Print Arr(i)
Next i
End Sub
The point of this is to split a string, except for parts that are in quotes.
I also figured we were supposed to do it without using the Split() function...if not then I just made it a lot harder than it needs to be.
Just woke up from 40000 seconds of sleep, but I'll do one. I guess I could go for the "absolute insane optimization" line, which I don't normally do because it isn't practical at all. I first thought about doing the very simple and minimal one, but bushmobile did it already.
Anyway, I'll set up the timings first once I'm fully awake and have eaten something.
Also fixed the Delimiter/Delimeter typo in the first post.
I'll try to write as many lines as I can for this simple task, so you can laugh at how long it is. It'll also take a few full work days to get it nice and good, a complete waste of time. The problem is that I'm unemployed at the moment, waiting to get an apartment in southern Finland before starting a new job, so I have too much spare time...
We could do an efficiency rating, i.e. number of characters of required code vs. final speed. For character counting, all comments should be ignored, as well as indenting spaces and line breaks. If someone wants to code it, you're welcome to do it.
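The character count could be sketched roughly as below. This is only an illustration under a few assumptions: the name CountCodeChars is made up here, only apostrophe comments are recognised (Rem comments are not), and tabs are treated like indenting spaces.

```vb
' Hypothetical helper: counts code characters in Source, ignoring
' apostrophe comments, leading/trailing whitespace and line breaks.
Public Function CountCodeChars(ByRef Source As String) As Long
    Dim varLine As Variant, strLine As String
    Dim i As Long, blnInString As Boolean, lngTotal As Long

    For Each varLine In Split(Replace(Source, vbCrLf, vbLf), vbLf)
        strLine = varLine
        blnInString = False
        ' cut the line at the first apostrophe outside a string literal
        For i = 1 To Len(strLine)
            Select Case Mid$(strLine, i, 1)
                Case """"
                    blnInString = Not blnInString
                Case "'"
                    If Not blnInString Then
                        strLine = Left$(strLine, i - 1)
                        Exit For
                    End If
            End Select
        Next i
        ' tabs count as indentation too, so turn them into spaces before trimming
        lngTotal = lngTotal + Len(Trim$(Replace(strLine, vbTab, " ")))
    Next varLine
    CountCodeChars = lngTotal
End Function
```

Dividing a timing result by this count would give one possible rating; how to weight the two is left open.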
The current results are rather equal in compiled code; surprisingly, the shortest code, by bushmobile, is also the fastest. In the IDE bushmobile's code wipes the floor with the others. Attached is the test project I made.
DigiRev: you have an error in your function, you'll see the results when you download the speed comparison project attached below.
Warning! More related dangerous coding ideas coming!
I guess we should talk about how the function should work in error situations. I'd recommend working like the Split function: just return a string array with one element containing the expression string.
However, there are troublesome cases when the delimiter contains quotes. My personal opinion is that if quotes are included in the delimiter, then delimiter is more important than quotes. So if a delimiter begins before a quote, then delimiter is more important and the quotes are ignored.
Does this sound like OK behavior? I guess we should look for more troublesome cases so we can make an extensive validation, so someone who needs this kind of function can see if it works for what he needs; although I believe these functions do work for the most common need: split by space, ignore spaces inside quotes. Should we only go for a space delimiter instead of allowing any delimiter?
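For reference, the common case and the suggested error fallback can be sketched as follows. This is a minimal sketch, not one of the submitted entries: the name QuoteSplitSketch is made up, Limit and Compare are dropped, and a quote character always takes priority, so it does not implement the "delimiter wins" rule proposed above for delimiters that contain quotes.

```vb
' Hypothetical sketch: splits on Delimiter except inside double quotes.
Public Function QuoteSplitSketch(ByRef Expression As String, Optional ByRef Delimiter As String = " ") As String()
    Dim strOut() As String
    Dim lngPos As Long, lngStart As Long, lngCount As Long
    Dim lngLen As Long, blnInQuote As Boolean

    lngLen = Len(Delimiter)
    ' error fallback: one element holding the whole expression
    If lngLen = 0 Or Len(Expression) < lngLen Then
        ReDim strOut(0)
        strOut(0) = Expression
        QuoteSplitSketch = strOut
        Exit Function
    End If

    ' there can never be more parts than this
    ReDim strOut(Len(Expression) \ lngLen)
    lngStart = 1
    lngPos = 1
    Do While lngPos <= Len(Expression)
        If Mid$(Expression, lngPos, 1) = """" Then
            ' every quote toggles the "inside quotes" state
            blnInQuote = Not blnInQuote
            lngPos = lngPos + 1
        ElseIf Not blnInQuote And Mid$(Expression, lngPos, lngLen) = Delimiter Then
            ' delimiter outside quotes: close the current part
            strOut(lngCount) = Mid$(Expression, lngStart, lngPos - lngStart)
            lngCount = lngCount + 1
            lngPos = lngPos + lngLen
            lngStart = lngPos
        Else
            lngPos = lngPos + 1
        End If
    Loop
    ' whatever is left is the last part (quotes are kept in the output)
    strOut(lngCount) = Mid$(Expression, lngStart)
    ReDim Preserve strOut(lngCount)
    QuoteSplitSketch = strOut
End Function
```

An unclosed quote simply protects everything up to the end of the string, which matches "any opening quote will prevent the delimiter from splitting".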
Then, should we add StripQuotes as an optional functionality? Meaning that quotes are removed from the final strings. I guess that is what is often wanted.
And then: QuoteJoin! Do the opposite, add quotes for strings that contain the delimiter in the passed string array.
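A QuoteJoin along those lines might look like the sketch below. The name QuoteJoinSketch and its signature are assumptions made for illustration; it expects an initialised String array and does not check whether an item is already quoted.

```vb
' Hypothetical sketch: joins Items with Delimiter and wraps any item
' that contains the delimiter in double quotes.
Public Function QuoteJoinSketch(ByRef Items() As String, Optional ByRef Delimiter As String = " ") As String
    Dim i As Long, strItem As String, strOut As String

    For i = LBound(Items) To UBound(Items)
        strItem = Items(i)
        ' InStr reports a match for an empty search string, so guard against it
        If LenB(Delimiter) <> 0 Then
            If InStr(strItem, Delimiter) > 0 Then strItem = """" & strItem & """"
        End If
        If i > LBound(Items) Then strOut = strOut & Delimiter
        strOut = strOut & strItem
    Next i
    QuoteJoinSketch = strOut
End Function
```

Usage would be something like filling a String array with "one", "two words", "three" and passing it in; the middle item comes back quoted.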
Ok, I now did my own function, however it is not aiming for superior speed. Instead it is more of the common "this is how I'd have done it if someone requested it". I wanted to get something ready sooner than later. Will see if I feel like starting to code the extreme speed version today or not.
Attached is an upgraded project. The difference from before is that I added generic code that switches the function under test: one piece of code runs all tests for all compatible functions. The downside is that all the testable functions must have identical syntax, so I could only make it work for bushmobile's and my own function for now. This means there is no support for Limit. These functions get an additional validation test when run.
vb Code:
Public Function Merri_QuoteSplit1(ByRef Expression As String, Optional ByRef Delimiter As String = " ", Optional ByVal Compare As VbCompareMethod = vbBinaryCompare) As String()
Const QUOTE As String = """"
Static lngEnd() As Long, lngEndUB As Long
Dim strOut() As String
Dim lngPosQ As Long, lngPosQE As Long, lngPos As Long, lngCount As Long
Dim blnInQuote As Boolean, lngLen As Long
' remember delimiter length
lngLen = Len(Delimiter)
' error detection
If (lngLen = 0) Or (LenB(Expression) < LenB(Delimiter)) Then
The extreme version would need to be done in C++ with asm =P
If VB had bitwise operators, you could make a very fast one. This is very similar to what I'm doing in C, which is making a function that parses CGI input from a submit form or apache api. name=value&name=value2
Here is a little slip of that code. It's actually pretty cool in that this algorithm can test 4 bytes at a time and only have 1 branch for each value I'm testing for. The example in this code is the amp & sign.
// A very simple Key=Value parser =P
int equal = 0x3d3d3d3d; // Mask for =
int amp = 0x26262626; // Mask for &
int teststr = 0;
int maskstr = 0;
unsigned int mask = 0;
int y, m, n;
int result = 4;
This is an example of what asm can do that C++ cannot do:
assign an int = char[3]. Note: always make sure you do this on a 4-byte boundary.
magic: buf = &str[i];
__asm push eax
__asm push edx
__asm mov eax, dword ptr buf
__asm mov edx, [eax] ; Get 4 characters out of char *buf
__asm mov teststr, edx ;Assign to int teststr.
__asm pop edx
__asm pop eax
Code:
maskstr = teststr ^ amp;
mask = (maskstr & 0x7f7f7f7f) + 0x7f7f7f7f;
mask = ~(mask | maskstr | 0x7f7f7f7f);
y = -(mask >> 16);
m = (y >> 16) & 16;
n = 16 - m;
mask = mask >> m;
y = mask - 0x100;
m = (y >> 16) & 8;
n = n + m;
mask = mask << m;
y = mask - 0x1000;
m = (y >> 16) & 4;
n = n + m;
mask = mask << m;
y = mask - 0x4000;
m = (y >> 16) & 2;
n = n + m;
mask = mask << m;
y = mask >> 14;
m = y & ~(y >> 1);
result = (n + 2 - m) >> 3;
if(result != 4)
{
// str[location of amp] = 0
str[(i + 3) - result] = 0;
m_data[key] = value; // STL MAP
key = &str[((i+3) - result)+1];
}
You could do a variation of this code in C++ and call it from VB, I guess. Maybe do 2 characters at a time from VB.NET.
Last edited by Maven; Jan 6th, 2007 at 07:21 AM.
Education is an admirable thing, but it is well to remember from time to time that nothing that is worth knowing can be taught. - Oscar Wilde
Yeah, with an algorithm that worked like this from Visual Basic .NET, about 2 bytes at a time is all you could do, but that is still 1/2 the work =P
Just make sure you're not testing for a NULL character; it'll work for everything but that. If it's NULL then you can't XOR it with a mask, which is what the first line does.
Basically what this algorithm does is read 4 bytes at a time and load them into an integer. This integer is then XORed with a 4-byte mask of the character I'm testing for. That way, if the value I'm testing for is located in the string, that byte is turned into a 00 NULL.
That's what this line does:
int amp = 0x26262626; // Mask for &
maskstr = teststr ^ amp;
The next two lines just test whether any of the 4 bytes is a NULL character.
The rest of the lines just find out exactly where that NULL byte is in the word. The only reason you do this is to avoid all branching (any kind of compare). At the end of the day, I ended up with just 1 branch per character I'm testing for in every 4-byte read; in the case of CGI input that would be 2, the amp and the equal. Which means my algorithm is doing 1/4 the amount of branching that an obvious algorithm would do, i.e. an if string[i] == '=' then.
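To show just the "XOR, then test for a zero byte" step in VB6 terms, here is a two-byte sketch. The function name PairContainsByte is made up for illustration. It stays at two bytes because a VB6 Long is signed 32-bit, so the four-byte sum &H7F7F7F7F + &H7F7F7F7F would raise an overflow error. It also packs the bytes one by one, so it demonstrates the logic rather than any speed gain.

```vb
' Hypothetical helper: True if bytLow or bytHigh equals bytTarget,
' decided with one comparison on a packed 16-bit value.
Public Function PairContainsByte(ByVal bytLow As Byte, ByVal bytHigh As Byte, ByVal bytTarget As Byte) As Boolean
    Dim lngPair As Long, lngMask As Long, lngX As Long, lngT As Long

    ' pack the two bytes, and the target repeated twice, into Longs
    lngPair = bytLow Or (bytHigh * &H100&)
    lngMask = bytTarget Or (bytTarget * &H100&)

    ' a byte equal to the target becomes 00 after the XOR
    lngX = lngPair Xor lngMask

    ' zero-byte test: bit 7 of a byte ends up set only if that byte was 00
    lngT = (lngX And &H7F7F&) + &H7F7F&
    lngT = (Not (lngT Or lngX Or &H7F7F&)) And &HFFFF&

    PairContainsByte = (lngT <> 0)
End Function
```

For example, PairContainsByte(Asc("a"), Asc("&"), 38) should come out True and PairContainsByte(Asc("a"), Asc("b"), 38) False. Locating which byte matched would still need the remaining steps from the C code.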
I must point out to all those ASM bashers out there: when it comes to bit-twiddling code, ASM is the clear winner! I couldn't get around using asm for this algorithm. To do so would have required me to load up the int 1 character at a time and shift 4 bytes each time, damn that.
I'll probably end up rewriting the entire algorithm in ASM just because C compilers are very mysterious when it comes to inserting inline asm. At the end of the day, if it's important enough for asm to be inserted, it's important enough to do in asm.
Who says that? I say that! The useless programmer who spreads disinformation and demoralization wherever he goes.
peace
Ummmm, I think I did something similar before, under the name of ParseCSV (not released though). Hang on, I'll dig out my code.
EDIT: Yes I did. I'll need to make the interface compatible, because the ','s are hardcoded in.
Last edited by Raedwulf; Jan 7th, 2007 at 02:30 AM.
Do you want to have another kind of challenge already? At least it seems nobody is interested in finding other solutions to the current one (besides Raedwulf).
I'll have to pass on this for a while, busy with exams. Since it is an open/not-too-serious challenge, I guess there shouldn't be any time limit. I'll post my solution when I get my exams over with, cheers.
I was thinking about a smart Replace: a Replace function that allows a pattern in the Find parameter using a wildcard character. I think it could be useful.
Same syntax as Replace:
VB Code:
Public Function SmartReplace(pExpression As String, _
pFind As String, _
pReplace As String, _
Optional pStart As Long = 1, _
Optional pCount As Long = -1, _
Optional pCompare As VbCompareMethod = vbBinaryCompare) As String
The character * could be used as wildcard, some examples:
Another..
pExpression = "Hi, how are you?"
pFind = "y*u"
pReplace = "they"
String Returned: "Hi, how are they?"
One more with numbers:
pExpression = "123 - 627 = 0"
pFind = "*2*"
pReplace = "5"
String Returned: "5 - 5 = 0"
This kind of smart Replace is native in Java but I never saw it in other programming languages, maybe Regex in .Net.
I know this doesn't seem very easy to do, but i think it would be useful, what do you people think?
I don't care as much about SmartReplace (and I don't see major trouble in it, unless one targets for speed), but how about MultiReplace? Many people seem to have problems with doing several replaces at once, so how about making it possible to pass string arrays as Find and Replace?
The difficulty is that it should always look for the next match in the string and replace that, ie. just doing multiple replaces in a row does not work.
So now we have SmartReplace and MultiReplace as a new challenge.
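The "always look for the next match" idea for MultiReplace can be sketched like this. It is only an assumption of how such a function might look: the name MultiReplaceSketch and the parameter names are made up, both arrays are expected to be String arrays with the same bounds, and when two search strings match at the same position the one listed first wins.

```vb
' Hypothetical sketch: replaces every FindList(i) with ReplaceList(i)
' in a single left-to-right pass over Expression.
Public Function MultiReplaceSketch(ByRef Expression As String, ByRef FindList() As String, ByRef ReplaceList() As String) As String
    Dim lngPos As Long, lngBest As Long, lngHit As Long
    Dim i As Long, lngWhich As Long, strOut As String

    lngPos = 1
    Do
        ' find the earliest match of any search string at or after lngPos
        lngBest = 0
        For i = LBound(FindList) To UBound(FindList)
            If LenB(FindList(i)) <> 0 Then
                lngHit = InStr(lngPos, Expression, FindList(i))
                If lngHit > 0 And (lngBest = 0 Or lngHit < lngBest) Then
                    lngBest = lngHit
                    lngWhich = i
                End If
            End If
        Next i
        If lngBest = 0 Then Exit Do

        ' copy the untouched text, then the replacement, and continue after the match
        strOut = strOut & Mid$(Expression, lngPos, lngBest - lngPos) & ReplaceList(lngWhich)
        lngPos = lngBest + Len(FindList(lngWhich))
    Loop
    MultiReplaceSketch = strOut & Mid$(Expression, lngPos)
End Function
```

Swapping "a" and "b" in "abba" this way should give "baab", whereas two Replace calls in a row would overwrite each other's results. Repeated string concatenation keeps the sketch short; it is not meant to be fast.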
How about functions to simulate the Unix/Linux commands (for strings)?
When I learned Unix at school, I found the functions quite useful.
You could find data by pattern, modify, replace... it was so long ago that I don't even remember the function names, I just remember that they were quite useful for manipulating strings...
So, we'll code SmartReplace as it already interests two people. At the moment I don't have the time though, but I'll see what the situation is tomorrow or later this week. I finally opened a development site; it took less time to get it somewhat running than I expected, but getting it into final polished shape will take some time... and creating all the content will surely take my time. I thought about collecting a function library and making the results of these challenges available there, if people are willing to contribute.
Also, we might want to add an escape character, like \, to allow searching for the wildcard characters. So \\ would represent a single \ character, and \* would represent a * character (and not a wildcard for any characters).
But I guess we could go for "optional", so a basic function could only support the wildcard * and the function's author could tell which wildcards his function supports when he contributes his own solution. Of course, the more it supports, the better.
* = none, or any number of characters
? = any one character
# = any one number
\ = escape character
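VB6's Like operator already understands *, ? and # with these meanings, so one way to test a whole string against such a pattern is to translate the \ escapes into Like's bracket syntax. The sketch below assumes that approach; the name ToLikePattern is made up, and it only answers "does the text match", not where a match starts or ends, which a real SmartReplace would still need.

```vb
' Hypothetical helper: converts a pattern using \ as escape character
' into a pattern for the Like operator.
Public Function ToLikePattern(ByRef Pattern As String) As String
    Dim i As Long, strChar As String, strOut As String

    i = 1
    Do While i <= Len(Pattern)
        strChar = Mid$(Pattern, i, 1)
        If strChar = "\" And i < Len(Pattern) Then
            ' escaped character: take the next one literally
            i = i + 1
            strChar = Mid$(Pattern, i, 1)
            ' Like matches its special characters literally inside brackets
            If InStr("*?#[", strChar) > 0 Then strChar = "[" & strChar & "]"
        ElseIf strChar = "[" Then
            ' an unescaped [ would start a Like character list, so neutralise it
            strChar = "[[]"
        End If
        strOut = strOut & strChar
        i = i + 1
    Loop
    ToLikePattern = strOut
End Function
```

With this, "Hi, how are you?" Like ToLikePattern("*y?u\?") should be True. Note that Like follows the module's Option Compare setting, so it does not take a Compare argument.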
Edit!
I have now written a page about the last challenge on Devve: QuoteSplit Challenge. I chose bushmobile's and CVMichael's code there besides mine, as these were the ones that worked as expected.
As far as I know, a fast version of a regular expression is readily available via some object. Never used it, but I've seen a few threads about it. So that takes the value out of doing it.
If so, define what you mean. From what I know, regexp is always defined to function the same way, ie. regexps in Linux work the same as regexp in Windows. Maybe I should bother reading Wikipedia, but I guess I feel a bit lazy now.
I couldn't get your "Merri" code to split correctly; I don't know why. bushmobile's code is fantastic in a lot of respects, but:
I have chosen a slightly different approach: let's make a piece of code that is serializable. In this way we can parse very large data without using any extra memory. We can tell how far we are, and maybe even skip data.
My code is a bit shorter than the other code. I would normally wrap it in a class; for now I have some static variables, which is not so good, but it shows what I believe is the best way to parse data.
I will not claim that I have verified it, but it works on my test strings, performs OK in the IDE and really well compiled. TQ1 returns an array, TQ2 streams the data.
Use:
Code:
Dim i as long, a as string
Do
a = QuoteSplitStream(i,......)
' i tells you how far we are in the string
loop until i=0
Code:
Public Function TOQ_QuoteSplitStream(ByRef i As Long, ByRef Expression As String, Optional ByRef Delimiter As String = " ", Optional ByVal Compare As VbCompareMethod = vbBinaryCompare) As String
Dim j As Long
Static k As Long, lE As Long, lD As Long
If i = 0 Then k = 0: lE = Len(Expression): lD = Len(Delimiter)
If k = 0 Then k = InStr(i + 1, Expression, """"): If k = 0 Then k = lE + 1
If i + 1 = k Then
j = InStr(i + 2, Expression, """")
If j = 0 Then Err.Raise vbObjectError + 1, "TOQ_QuoteSplitStream", "Unclosed quote" ' Err.Raise needs an error number first
TOQ_QuoteSplitStream = Mid$(Expression, i + 1, j - i) ' Including quotes, don't know if it makes sense to return the quotes
j = j + lD ' Because we assume that a quote is followed by a delimiter
k = 0
Else
j = InStr(i + 1, Expression, Delimiter, Compare)
If j = i + 1 Then
j = i + lD
TOQ_QuoteSplitStream = ""
Else
If j = 0 Then j = lE + 1
If k < j Then
j = k + lD
TOQ_QuoteSplitStream = Mid$(Expression, i + 1, j - i)
k = 0
Else
TOQ_QuoteSplitStream = Mid$(Expression, i + 1, j - i - 1)
j = j + lD - 1
End If
End If
End If
i = j
If i >= lE Then i = 0
End Function
Public Function TOQ_QuoteSplit(ByRef Expression As String, Optional ByRef Delimiter As String = " ", Optional ByVal Compare As VbCompareMethod = vbBinaryCompare) As String()
Const cGuess = 16
Dim i As Long, j As Long, s() As String
ReDim Preserve s(cGuess - 1)
Do
If j Mod cGuess = cGuess - 1 Then
ReDim Preserve s(j + cGuess)
End If
s(j) = TOQ_QuoteSplitStream(i, Expression, Delimiter, Compare)
j = j + 1
Loop Until i = 0
ReDim Preserve s(j - 1)
TOQ_QuoteSplit = s
End Function
TOQ4: your code works incorrectly, however your test string also triggered a bug in my code, which I've now fixed in the post where I originally submitted it.
I've left out the casing parameter, as it doesn't make much sense: case-insensitive with respect to what? It could only be the separator, and how often do you use an alphabetic character as separator?
And if you really need to, this version accepts multiple separators (and quoters and escapers), so if you really need to split on 'a', regardless of case, you can simply do this:
ok well i didn't test it at all, however, should be speedy:
Code:
Function ParseExceptQuotes(ByRef sText$, ByRef sParse$) As Variant
Dim tba() As Byte, pba() As Byte, oba() As Byte, i&, j&, bq As Boolean, s$, u&, c&, uu&, opa
tba = StrConv(sText, vbFromUnicode): pba = StrConv(sParse, vbFromUnicode)
ReDim opa(1000) As String: c = -1
u = UBound(tba): uu = UBound(pba)
For i = 0 To u
If tba(i) = 34 Then bq = Not bq
If bq Then
s = s & Chr$(tba(i))
Else
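If tba(i) = pba(0) Then
' compare the remaining delimiter bytes, staying inside tba
j = 0
If i + uu <= u Then
For j = 1 To uu
If tba(i + j) <> pba(j) Then Exit For
Next j
End If
If j > uu Then
' found delimiter (the loop ran to completion)
c = c + 1
opa(c) = s
s = ""
i = i + uu ' skip the rest of the delimiter
Else
' only a partial match: keep the character
s = s & Chr$(tba(i))
End If
Else
s = s & Chr$(tba(i))
End If
End If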
If i = u Then
opa(c + 1) = s
ReDim Preserve opa(c + 1) As String
ParseExceptQuotes = opa
End If
Next i
End Function
usage:
Code:
Private Sub Command1_Click()
Dim s$, a, i&
s = "1 2 3 4 ""5 5a"" 6 ""8 8a"" 7"
a = ParseExceptQuotes(s, " ")
For i = 0 To UBound(a)
Debug.Print a(i)
Next i
End Sub
If you're in the "Don't bark if you have a dog to do it for you" school of thought, here's an example of this function using VS2005 VB.NET's built-in Text Parsing engine.
vb Code:
Public Function QuoteSplit(ByVal parseString As String, ByVal ParamArray Delimiters() As String) As String()
Dim Results() As String
Dim StringEncoding As New ASCIIEncoding()
Using MemStream As New MemoryStream(StringEncoding.GetBytes(parseString))
Using Parser As New FileIO.TextFieldParser(MemStream)