Pattern matching for ASCII characters only
I'm trying to write a function that strips out all non-ASCII characters in a field when I select it, but I'm having trouble with the pattern matching. I've tried this (using the collation statement to remove the diacritics):
Code:
Create Function dbo.udfRemoveTroublesomeCharacters (@IncomingText VarChar(1000))
Returns VarChar(1000)
As
Begin
Select @IncomingText = @IncomingText Collate SQL_Latin1_General_CP1253_CI_AI
Declare @KeepValues As VarChar(50)
Set @KeepValues = '%[' + CHAR(32) + '-' + CHAR(126) + ']%'
While PatIndex(@KeepValues, @IncomingText) > 0
Set @IncomingText = Stuff(@IncomingText, PatIndex(@KeepValues, @IncomingText), 1, '')
Return @IncomingText
End
Go
However, when I test it using this:
Code:
Select dbo.udfRemoveTroublesomeCharacters('ars;tlkjDSFGHSDFGÉ hq4é59’87nq34[5vui$"£%^"$%^"(*345v')
It returns this: arstlkjDSFGHSDFGEhq4e59’87nq345vui£345v
As you can see, that still allows the character ’ to come through despite the fact that its ASCII code is 146. Conversely, it removes something like ; which has an ASCII code of 59.
Short of individually listing each character that's allowed, how can I get it to stick to the range I've asked for?
Thanks...
Re: Pattern matching for ASCII characters only
your function removes all characters defined in KeepValues so the variable name is a bit misleading.
this becomes obvious when you set KeepValues to [a-zA-Z] or [^a-z^A-Z] to negate. it also seems that char(32) and char(126) i.e. [ -~] is interpreted by the sql matching engine not as the ascii range 32 to 126.
Re: Pattern matching for ASCII characters only
I found a thread on msdn, that says PATINDEX supports pattern matching, but only in T_SQL patterns and not REGEX patterns. I suspect this is what you may have hit.
Here's a link to the thread.
http://social.msdn.microsoft.com/For...lar-expression
Re: Pattern matching for ASCII characters only
Ah, right - that would probably explain it. Thanks!