Results 1 to 4 of 4

Thread: Pattern matching for ASCII characters only

  1. #1

    Thread Starter
    Fanatic Member InvisibleDuncan's Avatar
    Join Date
    May 2001
    Location
    Eating jam.
    Posts
    819

    Pattern matching for ASCII characters only

    I'm trying to write a function that strips out all non-ASCII characters in a field when I select it, but I'm having trouble with the pattern matching. I've tried this (using the collation statement to remove the diacritics):

    Code:
    Create Function dbo.udfRemoveTroublesomeCharacters   (@IncomingText  VarChar(1000))
    Returns VarChar(1000)
    As
    Begin
    
        Select  @IncomingText   =   @IncomingText  Collate SQL_Latin1_General_CP1253_CI_AI
        Declare @KeepValues     As  VarChar(50)
        Set     @KeepValues =   '%[' + CHAR(32) + '-' + CHAR(126) + ']%'
        
        While PatIndex(@KeepValues, @IncomingText) > 0
            Set @IncomingText = Stuff(@IncomingText, PatIndex(@KeepValues, @IncomingText), 1, '')
        
        Return @IncomingText
    End
    
    Go
    However, when I test it using this:
    Code:
    Select dbo.udfRemoveTroublesomeCharacters('ars;tlkjDSFGHSDFGÉ hq4é59’87nq34[5vui$"£%^"$%^"(*345v')
    It returns this: arstlkjDSFGHSDFGEhq4e59’87nq345vui£345v

    As you can see, that still allows the character ’ to come through despite the fact that its ASCII code is 146. Conversely, it removes something like ; which has an ASCII code of 59.

    Short of individually listing each character that's allowed, how can I get it to stick to the range I've asked for?

    Thanks...
    Indecisiveness is the key to flexibility.

    www.mangojacks.com

  2. #2
    Frenzied Member
    Join Date
    May 2014
    Location
    Central Europe
    Posts
    1,372

    Re: Pattern matching for ASCII characters only

    your function removes all characters defined in KeepValues so the variable name is a bit misleading.

    this becomes obvious when you set KeepValues to [a-zA-Z] or [^a-z^A-Z] to negate. it also seems that char(32) and char(126) i.e. [ -~] is interpreted by the sql matching engine not as the ascii range 32 to 126.

  3. #3
    PowerPoster abhijit's Avatar
    Join Date
    Jun 1999
    Location
    Chit Chat Forum.
    Posts
    3,226

    Re: Pattern matching for ASCII characters only

    I found a thread on msdn, that says PATINDEX supports pattern matching, but only in T_SQL patterns and not REGEX patterns. I suspect this is what you may have hit.
    Here's a link to the thread.

    http://social.msdn.microsoft.com/For...lar-expression
    Everything that has a computer in will fail. Everything in your life, from a watch to a car to, you know, a radio, to an iPhone, it will fail if it has a computer in it. They should kill the people who made those things.- 'Woz'
    save a blobFileStreamDataTable To Text Filemy blog

  4. #4

    Thread Starter
    Fanatic Member InvisibleDuncan's Avatar
    Join Date
    May 2001
    Location
    Eating jam.
    Posts
    819

    Re: Pattern matching for ASCII characters only

    Ah, right - that would probably explain it. Thanks!
    Indecisiveness is the key to flexibility.

    www.mangojacks.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width