Thread: [RESOLVED] RegEx with VB.NET

  1. #1

    Thread Starter
    Fanatic Member Seraph's Avatar
    Join Date
    Jul 2007
    Posts
    959

    [RESOLVED] RegEx with VB.NET

    I am creating a program to search a specific website's HTML source code for a very specific URL string structure with two variable areas in it. I am not good with these regular expressions at all and am having trouble getting what I need.

    Here is what I have so far in VB.NET:

    Code:
    ' Requires: Imports System.Text.RegularExpressions
    Dim testNum As MatchCollection = Regex.Matches(TextBox1.Text, "<a href='series.php\?ID=[0-9]'>[^</a>]</a>")
    If testNum.Count = 0 Then
        End ' terminates the application when nothing matches
    Else
        MessageBox.Show(testNum.Count.ToString())
    End If

    Expression: <a href='series.php\?ID=[0-9]'>[^</a>]</a>

    Sample Text:

    <a href='series.php?ID=23'>Aishiteru ze Baby ( Love You Baby )</a>
    <a href='series.php?ID=230'>Akage no Anne ( Anne of Green Gables )</a>

    Directly after "ID=" there is an integer ID that varies in both value and length.
    Between the HTML anchor tags there is text of any length that can contain any characters.

    When I run my code, nothing is returned (the Count of the MatchCollection is 0).

    I know the rest of my code works, because a search for a single character returns results ("a" = 14 matches).


    So I'm totally lost and have no idea what is wrong with my expression. I'm not grasping this RegEx thing or what makes an expression invalid. I'm not sure whether the expression is simply wrong or whether it isn't compatible with VB.NET. I got it from a RegEx creator program, and I'm not even sure I was operating that properly or whether the expression it gave me is right.

    Thanks for any help provided.

    Visual Studio 2010 Professional | .NET Framework 4.0 | Windows 7

    SERYSOFT.COM :: SysPad - Folder Management Program - Please comment HERE if you find this program useful, have ideas, or know of any bugs.
    [Very useful for IT/DP departments where many folders are consistently accessed. Also contains a scratchpad window for quick access to notes.]

    [.NET and MySQL Quick Guide]

  2. #2
    Addicted Member
    Join Date
    Dec 2008
    Posts
    185

    Re: RegEx with VB.NET

    Have you tried: \d*[0-9]

    in place of: [0-9] <-- I think this means 1 digit.
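
    A minimal sketch of the difference, for anyone who wants to check it: a bare [0-9] matches exactly one digit, while \d*[0-9] and the shorter \d+ both accept one or more digits. The module and function names are made up for illustration, and the test anchors each pattern to the whole input so that partial matches don't count.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DigitQuantifierDemo
    ' Returns True only when the entire input is matched by the pattern.
    ' (Hypothetical helper; it wraps the pattern in ^(?: ... )$ anchors.)
    Function MatchesWhole(input As String, pattern As String) As Boolean
        Return Regex.IsMatch(input, "^(?:" & pattern & ")$")
    End Function

    Sub Main()
        ' IDs of different lengths, like the ones in the sample text.
        For Each id As String In {"7", "23", "230"}
            Console.WriteLine("{0}: [0-9]={1}  \d*[0-9]={2}  \d+={3}",
                              id,
                              MatchesWhole(id, "[0-9]"),
                              MatchesWhole(id, "\d*[0-9]"),
                              MatchesWhole(id, "\d+"))
        Next
    End Sub
End Module
```

    Only the single-digit ID passes the bare [0-9] check; the other two patterns accept all three IDs.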

  3. #3

    Thread Starter
    Fanatic Member Seraph's Avatar
    Join Date
    Jul 2007
    Posts
    959

    Re: RegEx with VB.NET

    Yeah, I think you're right. But that didn't work.
    I also put \d*[0-9] in parentheses and it still didn't work.
    Of course, maybe the [^</a>] part is wrong?

  4. #4
    Addicted Member
    Join Date
    Dec 2008
    Posts
    185

    Re: RegEx with VB.NET

    There might also be some other characters that need to be escaped.

    Good idea to try it on its own to just check the number without the rest of the string.

  5. #5

    Thread Starter
    Fanatic Member Seraph's Avatar
    Join Date
    Jul 2007
    Posts
    959

    Re: RegEx with VB.NET

    OK, so the number is working properly as \d*[0-9]
    I also went ahead and tried: <a href='series.php\?ID=\d*[0-9]
    and that worked as well. It is the last part of the expression that isn't working.
    Shouldn't that be correct, though? It says that everything after the first portion can be anything but "</a>", and then requires a literal "</a>" at the end of the string.

  6. #6
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    40,106

    Re: RegEx with VB.NET

    What are you expecting to do with this part: [^</a>]</a>

    I have just gotten into RegEx recently, but I read that as being either the beginning of the line, or < or / or a or >, followed by </a>. That doesn't seem right. It seems like the ^ would be optional, but those others don't seem like they should be part of the set.
    My usual boring signature: Nothing

  7. #7

    Thread Starter
    Fanatic Member Seraph's Avatar
    Join Date
    Jul 2007
    Posts
    959

    Re: RegEx with VB.NET

    Read my post above yours...lol I posted just before you did.

  8. #8
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    40,106

    Re: RegEx with VB.NET

    Ah, I learned something new: at the start of a set, the ^ means "not". OK, but the other characters are just characters in the set, right? So that would be a single character that is Not < and Not / and Not a and Not >, when you actually want Not ("</a>") (that's not RegEx syntax, by the way).
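
    A rough sketch of what that means in practice, using only the tail of the expression from the first post. The module and function names are hypothetical. Because the set has no quantifier, it consumes exactly one character, and that character must not be "<", "/", "a" or ">".

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NegatedSetDemo
    ' Tail of the original expression: one character that is not
    ' "<", "/", "a" or ">", followed by a literal "</a>".
    Const Tail As String = "'>[^</a>]</a>"

    ' Hypothetical helper: True when the tail pattern is found in the input.
    Function TailMatches(input As String) As Boolean
        Return Regex.IsMatch(input, Tail)
    End Function

    Sub Main()
        Console.WriteLine(TailMatches("'>X</a>"))             ' one allowed character
        Console.WriteLine(TailMatches("'>a</a>"))             ' "a" is in the excluded set
        Console.WriteLine(TailMatches("'>Akage no Anne</a>")) ' title longer than one character
    End Sub
End Module
```

    Only the first input matches. The second fails because "a" is one of the excluded characters, and the third fails because the set accepts a single character before the closing tag.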

  9. #9

    Thread Starter
    Fanatic Member Seraph's Avatar
    Join Date
    Jul 2007
    Posts
    959

    Re: RegEx with VB.NET

    How about this?
    <a href='series.php\?ID=\d*[0-9]'>[^(</a>)]</a>

    It doesn't work, but is it even remotely close to correct?
    Everything up until the set works properly.

  10. #10

    Thread Starter
    Fanatic Member Seraph's Avatar
    Join Date
    Jul 2007
    Posts
    959

    Re: RegEx with VB.NET

    What does a + symbol after a set mean, as in [0-9]+?

  11. #11

    Thread Starter
    Fanatic Member Seraph's Avatar
    Join Date
    Jul 2007
    Posts
    959

    Re: RegEx with VB.NET

    Got something: <a href='series.php\?ID=\d+'>[^<]+</a>

    Based on this text from Regular-Expressions.info, I don't understand how [^<]+ even works:

    Quote Originally Posted by regular-expressions.info
    If you repeat a character class by using the ?, * or + operators, you will repeat the entire character class, and not just the character that it matched. The regex [0-9]+ can match 837 as well as 222.

    If you want to repeat the matched character, rather than the class, you will need to use backreferences. ([0-9])\1+ will match 222 but not 837. When applied to the string 833337, it will match 3333 in the middle of this string. If you do not want that, you need to use lookahead and lookbehind.
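
    The quoted examples can be tried directly. This is a small sketch with made-up module and function names; it returns the first match of a pattern, or an empty string when there is none.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RepeatDemo
    ' Hypothetical helper: the text of the first match, or "" if nothing matches.
    Function FirstMatch(input As String, pattern As String) As String
        Dim m As Match = Regex.Match(input, pattern)
        Return If(m.Success, m.Value, "")
    End Function

    Sub Main()
        ' Repeating the class: each position may be a different digit.
        Console.WriteLine(FirstMatch("837", "[0-9]+"))
        ' Repeating the matched character via a backreference: needs the same digit again.
        Console.WriteLine(FirstMatch("837", "([0-9])\1+"))
        Console.WriteLine(FirstMatch("833337", "([0-9])\1+"))
    End Sub
End Module
```

    The first call returns 837, the second finds nothing, and the third returns 3333, which agrees with the quote. The same reasoning applies to [^<]+: the + repeats the whole class, so every character of the title only has to be something other than "<".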

  12. #12
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    40,106

    Re: RegEx with VB.NET

    That should match one or more Not "<" characters in a row: not < or (not <)(not <) or (not <)(not <)(not <) etc.

    That doesn't seem likely to be what you want.

  13. #13

    Thread Starter
    Fanatic Member Seraph's Avatar
    Join Date
    Jul 2007
    Posts
    959

    Re: RegEx with VB.NET

    Oh, I get it (because it does work).
    After it gets past "'>" it reaches the title text. As long as each character is not "<", it keeps matching. When it hits </a> it stops, since it encountered a "<"; it then evaluates the last part (the literal "</a>") and returns a match.

    Of course, there is always the (improbable) chance that the title contains a "<" and it will ruin the match.
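
    For reference, here is a sketch of the working expression applied to the sample text, with two changes that are my own and not from the thread: capture groups around the ID and the title, and an escaped dot in series\.php. The second pattern is an assumed alternative, not something tested against the real site: a lazy .*? stops at the first "</a>" and so tolerates a stray "<" in a title. The third sample link is invented to show that case, and all module, constant and function names are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SeriesLinkDemo
    ' The expression from post #11, plus capture groups for the ID and the title.
    Const NegatedSet As String = "<a href='series\.php\?ID=(\d+)'>([^<]+)</a>"
    ' Assumed alternative: lazy match up to the first "</a>" on the same line.
    Const Lazy As String = "<a href='series\.php\?ID=(\d+)'>(.*?)</a>"

    ' Returns one "ID=title" string for every link the pattern finds.
    Function ExtractLinks(html As String, pattern As String) As List(Of String)
        Dim results As New List(Of String)
        For Each m As Match In Regex.Matches(html, pattern)
            results.Add(m.Groups(1).Value & "=" & m.Groups(2).Value)
        Next
        Return results
    End Function

    Sub Main()
        ' First two lines are the sample text; the third is a made-up edge case.
        Dim html As String =
            "<a href='series.php?ID=23'>Aishiteru ze Baby ( Love You Baby )</a>" & Environment.NewLine &
            "<a href='series.php?ID=230'>Akage no Anne ( Anne of Green Gables )</a>" & Environment.NewLine &
            "<a href='series.php?ID=7'>A < B</a>"

        For Each pattern As String In {NegatedSet, Lazy}
            Console.WriteLine(pattern)
            For Each link As String In ExtractLinks(html, pattern)
                Console.WriteLine("  " & link)
            Next
        Next
    End Sub
End Module
```

    With the [^<]+ version, the two sample links are found and the invented "A < B" link is skipped. The lazy version finds all three. Neither is a real HTML parser, so nested tags inside a title would still need different handling.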
