Feb 8th, 2010, 12:40 PM
#1
Thread Starter
Fanatic Member
[RESOLVED] RegEx with VB.NET
I am creating a program to search a specific website's HTML source code for a very specific URL string structure with two variable areas in it. I am not good with these regular expressions at all and am having trouble getting what I need.
Here is what I have so far in VB.NET:
Code:
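' Needs "Imports System.Text.RegularExpressions" at the top of the file.
Dim testNum As MatchCollection = Regex.Matches(TextBox1.Text, "<a href='series.php\?ID=[0-9]'>[^</a>]</a>")
If testNum.Count = 0 Then
    End ' note: End terminates the whole application
Else
    MessageBox.Show(testNum.Count.ToString())
End If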
Expression: <a href='series.php\?ID=[0-9]'>[^</a>]</a>
Sample Text:
<a href='series.php?ID=23'>Aishiteru ze Baby ( Love You Baby )</a>
<a href='series.php?ID=230'>Akage no Anne ( Anne of Green Gables )</a>
The ID number directly after "ID=" is variable, both in value and in length.
Between the HTML anchor tags, there is variable text that can have any length and any characters.
I run my code and no values are being returned (The count of the Match collection is 0).
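A minimal console sketch of why the count is 0, assuming a standalone VB.NET console project instead of the form in the post (the module and helper names are illustrative only). It runs the original expression, unchanged, against two of the sample lines, and then against a one-digit ID with a one-character title.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module OriginalPatternCheck
    ' The expression from the first post, unchanged.
    Public Const OriginalPattern As String = "<a href='series.php\?ID=[0-9]'>[^</a>]</a>"

    ' Two lines of the sample text from the first post.
    Public ReadOnly Sample As String = "<a href='series.php?ID=23'>Aishiteru ze Baby ( Love You Baby )</a>" &
        Environment.NewLine & "<a href='series.php?ID=230'>Akage no Anne ( Anne of Green Gables )</a>"

    ' Hypothetical helper: number of matches of pattern in input.
    Public Function CountMatches(input As String, pattern As String) As Integer
        Return Regex.Matches(input, pattern).Count
    End Function

    Sub Main()
        ' [0-9] allows one digit after "ID=", so "23" and "230" fail at the second digit.
        Console.WriteLine(CountMatches(Sample, OriginalPattern)) ' 0
        ' One digit plus a one-character title (not <, /, a or >) does match.
        Console.WriteLine(CountMatches("<a href='series.php?ID=7'>X</a>", OriginalPattern)) ' 1
    End Sub
End Module
```

Both bracketed sets consume exactly one character each, which is why the longer IDs and titles in the real page never match.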
I know the rest of my code works, because searching for a single character does return results ("a" gives 14 matches).
So I'm totally lost. For some reason I'm just not grasping RegEx or what makes an expression invalid, and I can't tell whether my expression is wrong or simply not compatible with VB.NET. I used a RegEx builder program (which is where I got the above expression), and I'm not even sure I was operating it properly or that the expression it gave me is right.
Thanks for any help provided.
Visual Studio 2010 Professional | .NET Framework 4.0 | Windows 7
SERYSOFT.COM :: SysPad - Folder Management Program - Please comment HERE if you find this program useful, have ideas, or know of any bugs.
[Very useful for IT/DP departments where many folders are consistently accessed. Also contains a scratchpad window for quick access to notes.]
[.NET and MySQL Quick Guide]
-
Feb 8th, 2010, 12:47 PM
#2
Addicted Member
Re: RegEx with VB.NET
Have you tried \d*[0-9]
in place of [0-9]? I think [0-9] means exactly one digit.
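A sketch comparing three ways of writing the ID part, assuming the same kind of console project; FirstMatch is a hypothetical helper that returns the matched text or "(no match)".

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DigitQuantifiers
    ' Hypothetical helper: the first match of pattern in input, or "(no match)".
    Public Function FirstMatch(input As String, pattern As String) As String
        Dim m As Match = Regex.Match(input, pattern)
        If m.Success Then Return m.Value Else Return "(no match)"
    End Function

    Sub Main()
        Dim url As String = "series.php?ID=230'"
        ' [0-9] is exactly one digit, so the quote after it cannot follow "2".
        Console.WriteLine(FirstMatch(url, "ID=[0-9]'"))    ' (no match)
        ' \d* takes zero or more digits and [0-9] one more: one or more in total.
        Console.WriteLine(FirstMatch(url, "ID=\d*[0-9]'")) ' ID=230'
        ' \d+ says "one or more digits" directly.
        Console.WriteLine(FirstMatch(url, "ID=\d+'"))      ' ID=230'
    End Sub
End Module
```

\d*[0-9] and \d+ accept the same digit strings; \d+ is simply the shorter spelling. One .NET detail: \d also matches non-ASCII Unicode digits unless RegexOptions.ECMAScript is used, so [0-9]+ is the stricter choice if that matters.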
-
Feb 8th, 2010, 12:51 PM
#3
Thread Starter
Fanatic Member
Re: RegEx with VB.NET
Yeah, I think you're right. But that didn't work.
I also put \d*[0-9] in parentheses and it still didn't work.
Of course, maybe the [^</a>] part is wrong?
-
Feb 8th, 2010, 12:53 PM
#4
Addicted Member
Re: RegEx with VB.NET
There might also be some other characters that need to be escaped.
It's a good idea to test the number on its own, without the rest of the string.
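One way to follow that advice, sketched under the same console-project assumption: build the expression up one piece at a time and print whether each version still matches. The step list is illustrative; the first False points at the piece that breaks the match. Here the dot in series.php is escaped as \. so it matches only a literal period (unescaped, it matches any character, which happens to work but is looser).

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module IncrementalTesting
    Public ReadOnly Sample As String = "<a href='series.php?ID=230'>Akage no Anne ( Anne of Green Gables )</a>"

    ' Each step adds one piece to the previous pattern.
    Public ReadOnly Steps As String() = {
        "ID=\d+",
        "<a href='series\.php\?ID=\d+'>",
        "<a href='series\.php\?ID=\d+'>[^</a>]</a>"
    }

    Sub Main()
        For Each p As String In Steps
            ' Prints True, True, False: the last piece is the culprit.
            Console.WriteLine("{0,-45} {1}", p, Regex.IsMatch(Sample, p))
        Next
    End Sub
End Module
```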
-
Feb 8th, 2010, 01:00 PM
#5
Thread Starter
Fanatic Member
Re: RegEx with VB.NET
OK, so the number is working properly as \d*[0-9]
I also went ahead and tried: <a href='series.php\?ID=\d*[0-9]
and that worked as well. It is the last part of the expression that isn't working.
Shouldn't that be correct, though? I read it as saying that everything after the first portion can be anything except "</a>", followed by a literal "</a>" at the end of the string.
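A quick sketch of what [^</a>] really matches, assuming a console project: anchored with ^ and $, the set accepts a single character that is not <, /, a or >, and rejects anything longer, including the string "</a>" itself.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NegatedClassDemo
    ' [^</a>] is ONE character that is not "<", "/", "a" or ">".
    ' It does not mean "any text except the string </a>".
    Public Const ClassPattern As String = "^[^</a>]$"

    Sub Main()
        For Each s As String In {"X", "XY", "a", "/", "</a>"}
            ' Prints True for "X" only.
            Console.WriteLine("{0,-6} {1}", s, Regex.IsMatch(s, ClassPattern))
        Next
    End Sub
End Module
```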
-
Feb 8th, 2010, 01:01 PM
#6
Re: RegEx with VB.NET
What are you expecting to do with this part: [^</a>]</a>
I have just gotten into RegEx recently, but I read that as either the beginning of the line, or <, or /, or a, or >, followed by </a>. That doesn't seem right. The ^ might be optional, but those other characters don't seem like they should be part of the set.
My usual boring signature: Nothing
 
-
Feb 8th, 2010, 01:04 PM
#7
Thread Starter
Fanatic Member
Re: RegEx with VB.NET
Read my post above yours...lol I posted just before you did.
-
Feb 8th, 2010, 01:04 PM
#8
Re: RegEx with VB.NET
Ah, I learned something new: inside a set, ^ means "not" rather than the beginning of the line. But the other characters are still just individual characters in the set, so that reads as Not < and Not / and Not a and Not >, when what you actually want is Not("</a>") (that's not RegEx syntax, by the way).
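A set cannot express Not("</a>") because it only ever excludes single characters. A common regex idiom for "any text that does not contain </a>" is a negative lookahead repeated one character at a time, (?:(?!</a>).)*, sometimes called a tempered greedy token. The sketch assumes a console project, and ExtractTitle is a hypothetical helper.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NotAStringDemo
    ' (?:(?!</a>).)* takes one character at a time, as long as "</a>"
    ' does not start at that position; group 1 captures the title.
    Public Const TitlePattern As String = "'>((?:(?!</a>).)*)</a>"

    ' Hypothetical helper: the text between "'>" and "</a>", or Nothing.
    Public Function ExtractTitle(html As String) As String
        Dim m As Match = Regex.Match(html, TitlePattern)
        If m.Success Then Return m.Groups(1).Value Else Return Nothing
    End Function

    Sub Main()
        ' Prints: Aishiteru ze Baby ( Love You Baby )
        Console.WriteLine(ExtractTitle("<a href='series.php?ID=23'>Aishiteru ze Baby ( Love You Baby )</a>"))
        ' "<" and "a" inside the title are fine; only the full "</a>" ends it. Prints: a < b
        Console.WriteLine(ExtractTitle("<a href='series.php?ID=1'>a < b</a>"))
    End Sub
End Module
```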
 
-
Feb 8th, 2010, 01:11 PM
#9
Thread Starter
Fanatic Member
Re: RegEx with VB.NET
How about this?
<a href='series.php\?ID=\d*[0-9]'>[^(</a>)]</a>
It doesn't work, but is it even remotely close to correct?
Everything up until the set works properly.
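A sketch showing that parentheses lose their grouping meaning inside a set, assuming a console project: [^(</a>)] is still one character, now excluding ( and ) as well. Even with a + after it, this set could not match the sample titles, since they contain both parentheses and the letter a (as in "Baby").

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ParensInClassDemo
    ' Inside [...] parentheses are plain characters, so this is ONE character
    ' that is not "(", "<", "/", "a", ">" or ")". No grouping happens.
    Public Const ClassPattern As String = "^[^(</a>)]$"

    Sub Main()
        For Each s As String In {"X", "(", ")", "XY"}
            ' Prints True for "X" only.
            Console.WriteLine("{0,-3} {1}", s, Regex.IsMatch(s, ClassPattern))
        Next
    End Sub
End Module
```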
-
Feb 8th, 2010, 01:14 PM
#10
Thread Starter
Fanatic Member
Re: RegEx with VB.NET
What does a + symbol after a set mean, like in [0-9]+?
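A sketch of quantifiers on the ID part, assuming a console project (AllMatches is a hypothetical helper that joins every match with commas): + means one or more of the preceding item, * means zero or more, and ? means zero or one.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PlusQuantifierDemo
    ' Hypothetical helper: every match of pattern in input, joined with commas.
    Public Function AllMatches(input As String, pattern As String) As String
        Dim values As New List(Of String)()
        For Each m As Match In Regex.Matches(input, pattern)
            values.Add(m.Value)
        Next
        Return String.Join(",", values.ToArray())
    End Function

    Sub Main()
        Dim text As String = "ID=23 ID=230 ID="
        Console.WriteLine(AllMatches(text, "ID=[0-9]"))  ' ID=2,ID=2
        Console.WriteLine(AllMatches(text, "ID=[0-9]+")) ' ID=23,ID=230
        Console.WriteLine(AllMatches(text, "ID=[0-9]*")) ' ID=23,ID=230,ID=
    End Sub
End Module
```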
-
Feb 8th, 2010, 01:19 PM
#11
Thread Starter
Fanatic Member
Re: RegEx with VB.NET
Got something: <a href='series.php\?ID=\d+'>[^<]+</a>
Based on this text from Regular-Expressions.info, I don't understand how [^<]+ even works:
 Originally Posted by regular-expressions.info
If you repeat a character class by using the ?, * or + operators, you will repeat the entire character class, and not just the character that it matched. The regex [0-9]+ can match 837 as well as 222.
If you want to repeat the matched character, rather than the class, you will need to use backreferences. ([0-9])\1+ will match 222 but not 837. When applied to the string 833337, it will match 3333 in the middle of this string. If you do not want that, you need to use lookahead and lookbehind.
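The quoted passage contrasts repeating a class with repeating a captured character. A sketch of both, assuming a console project: with [0-9]+ (and likewise [^<]+), each repetition tests the next character against the class on its own, so the characters do not need to be the same.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ClassVsBackreference
    Sub Main()
        ' Repeating the class: every digit is checked against [0-9] separately.
        Console.WriteLine(Regex.Match("837", "[0-9]+").Value)              ' 837
        ' Repeating the captured digit: \1 must equal whatever ([0-9]) matched.
        Console.WriteLine(Regex.IsMatch("837", "([0-9])\1+"))              ' False
        Console.WriteLine(Regex.Match("222", "([0-9])\1+").Value)          ' 222
        Console.WriteLine(Regex.Match("833337", "([0-9])\1+").Value)       ' 3333
        ' [^<]+ repeats its class the same way, so different characters are fine.
        Console.WriteLine(Regex.Match("Love You Baby</a>", "[^<]+").Value) ' Love You Baby
    End Sub
End Module
```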
-
Feb 8th, 2010, 01:28 PM
#12
Re: RegEx with VB.NET
That should match one or more characters that are not "<": (not <), or (not <)(not <), or (not <)(not <)(not <), and so on.
That doesn't seem likely to be what you want.
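A sketch of [^<]+ applied to link text, assuming a console project; TitleOf is a hypothetical helper that returns the text captured between "'>" and "</a>". The run of non-< characters ends exactly where the closing tag begins, and unlike [^</a>] it happily accepts "/" and "a".

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NegatedRunDemo
    ' Hypothetical helper: the text between "'>" and "</a>", or Nothing.
    Public Function TitleOf(html As String) As String
        ' [^<]+ keeps consuming characters until the next "<", which here is
        ' the start of "</a>"; the literal </a> then completes the match.
        Dim m As Match = Regex.Match(html, "'>([^<]+)</a>")
        If m.Success Then Return m.Groups(1).Value Else Return Nothing
    End Function

    Sub Main()
        ' Prints: Aishiteru ze Baby ( Love You Baby )
        Console.WriteLine(TitleOf("<a href='series.php?ID=23'>Aishiteru ze Baby ( Love You Baby )</a>"))
        ' Prints: AC/DC and a band
        Console.WriteLine(TitleOf("<a href='series.php?ID=5'>AC/DC and a band</a>"))
    End Sub
End Module
```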
 
-
Feb 8th, 2010, 01:32 PM
#13
Thread Starter
Fanatic Member
Re: RegEx with VB.NET
Oh, I get it. (Because it does work)
After it gets past "'>", it is at the title text. As long as the characters are not "<", it keeps matching. When it hits </a>, it stops because it has encountered a "<", then it evaluates the last part (the literal "</a>") and returns a match.
Of course, there is always the (improbable) chance that the title contains a "<" and it will ruin the match.
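One possible way to handle that case, sketched under the assumption of a console project (LinkPattern, ExtractLinks and SampleHtml are illustrative names, and the third link is invented to contain a "<"). Named groups capture the ID and the title, and the lazy .*? stops at the first "</a>" instead of rejecting every "<". The dot in series.php is escaped so it matches only a literal period.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SeriesLinkExtractor
    ' (?<id>\d+) and (?<title>.*?) are named groups; .*? is lazy, so it takes
    ' as few characters as possible before the first "</a>".
    Public Const LinkPattern As String = "<a href='series\.php\?ID=(?<id>\d+)'>(?<title>.*?)</a>"

    Public ReadOnly SampleHtml As String = "<a href='series.php?ID=23'>Aishiteru ze Baby ( Love You Baby )</a>" &
        Environment.NewLine & "<a href='series.php?ID=230'>Akage no Anne ( Anne of Green Gables )</a>" &
        Environment.NewLine & "<a href='series.php?ID=7'>Fish < Chips</a>"

    ' Hypothetical helper: "id|title" for every link found.
    Public Function ExtractLinks(html As String) As List(Of String)
        Dim results As New List(Of String)()
        For Each m As Match In Regex.Matches(html, LinkPattern)
            results.Add(m.Groups("id").Value & "|" & m.Groups("title").Value)
        Next
        Return results
    End Function

    Sub Main()
        ' The thread's [^<]+ version skips the third link: prints 2.
        Console.WriteLine(Regex.Matches(SampleHtml, "<a href='series.php\?ID=\d+'>[^<]+</a>").Count)
        ' The lazy version finds all three, including "7|Fish < Chips".
        For Each item As String In ExtractLinks(SampleHtml)
            Console.WriteLine(item)
        Next
    End Sub
End Module
```

In well-formed HTML a literal "<" in link text would normally be written as &lt;, so the [^<]+ version is usually enough; the lazy version mainly guards against sloppy markup.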