Jun 22nd, 2006, 11:05 AM
#1
regex multiline argh
I'm trying to parse Google results to get each result.
The regex I'm using should get every string that begins with class=l and ends with nobr, but it only gets 2 results from the HTML below. The reason is that all the other results contain \n or \r.
When creating the Regex object I set Multiline, which I thought would sort the problem, but obviously not.
Code:
Regex reg = new Regex("class=l.*?nobr", RegexOptions.Multiline);
I've done a bit of googling and found mentions that . matches anything except a newline, and mentions of using \s, but when I try to use it within my expression it errors and says unrecognised escape sequence.
So what I'd like to know is how to amend my regular expression so that I get back all strings that start with class=l and end with nobr, regardless of whether there are newlines or not.
Code:
<html><head><meta HTTP-EQUIV=\"content-type\" CONTENT=\"text/html; charset=ISO-8859-1\"><title>thom yorke - Google Search</title><style><!--\nbody,td,div,.p,a{font-family:arial,sans-serif }\ndiv,td{color:#000}\n.f{color:#6f6f6f}\n.flc,.fl:link{color:#77c}\na:link,.w,a.w:link,.w a:link{color:#00c}\na:visited,.fl:visited{color:#551a8b}\na:active,.fl:active{color:#f00}\n.t a:link,.t a:active,.t a:visited,.t{color:#000}\n.t{background-color:#e5ecf9}\n.k{background-color:#36c}\n.j{width:34em}\n.h{color:#36c}\n.i,.i:link{color:#a90a08}\n.a,.a:link{color:#008000}\n.z{display:none}\ndiv.n{margin-top:1ex}\n.n a{font-size:10pt;color:#000}\n.n .i{font-size:10pt;font-weight:bold}\n.q:visited,.q:link,.q:active,.q{color:#00c;}\n.b a{font-size:12pt;color:#00c;font-weight:bold}\n.ch{cursor:pointer;cursor:hand}\n.e{margin-top:.75em;margin-bottom:.75em}\n.g{margin-top:1em;margin-bottom:1em}\n.sm{display:block;margin-top:0px;margin-bottom:0px;margin-left:40px}\n-->\n</style>\n<script>\n<!--\nfunction ss(w,id){window.status=w;return true;}\nfunction cs(){window.status='';}\nfunction ga(o,e) {return true;}\n//-->\n</script>\n</head><body bgcolor=#ffffff topmargin=3 marginheight=3><table border=0 cellspacing=0 cellpadding=0 width=100%><tr><td align=right nowrap><font size=-1><a href=\"https://www.google.com/accounts/Login?continue=http://www.google.co.uk/search%3Fhl%3Den%26q%3Dthom%2520yorke&hl=en\">Sign in</a></font></td></tr><tr height=4><td><img alt=\"\" width=1 height=1></td></tr></table><table border=0 cellpadding=0 cellspacing=0 width=100%><tr><form name=gs method=GET action=/search><td valign=top><a href=\"http://www.google.co.uk/webhp?hl=en\"><img src=\"/images/logo_sm.gif\" width=150 height=55 alt=\"Go to Google Home\" border=0 vspace=12></a></td><td> </td><td valign=top width=100% style=\"padding-top:0px\"><table cellpadding=0 cellspacing=0 border=0><tr><td height=14 valign=bottom><table border=0 cellpadding=4 cellspacing=0><tr><td nowrap><font size=-1><b>Web</b> <a id=t1a 
class=q href=\"http://images.google.co.uk/images?hl=en&q=thom+yorke&sa=N&tab=wi\">Images</a> <a id=t2a class=q href=\"http://groups.google.co.uk/groups?hl=en&q=thom+yorke&sa=N&tab=wg\">Groups</a> <a id=t4a class=q href=\"http://news.google.co.uk/news?hl=en&q=thom+yorke&sa=N&tab=wn\">News</a> <a id=t5a class=q href=\"http://froogle.google.co.uk/froogle?hl=en&q=thom+yorke&sa=N&tab=wf\">Froogle</a> <b><a href=\"/intl/en/options/\" class=q>more »</a></b></font></td></tr></table></td></tr><tr><td><table border=0 cellpadding=0 cellspacing=0><tr><td nowrap><input type=hidden name=hl value=\"en\"><input type=hidden name=ie value=\"ISO-8859-1\"><input type=text name=q size=41 maxlength=2048 value=\"thom yorke\" title=\"Search\"><font size=-1> <input type=submit name=\"btnG\" value=\"Search\"><span id=hf></span></font></td><td nowrap><font size=-2> <a href=/advanced_search?q=thom+yorke&hl=en&lr=&ie=UTF-8>Advanced Search</a><br> <a href=/preferences?q=thom+yorke&hl=en&lr=&ie=UTF-8>Preferences</a> </font></td></tr></table></td></tr></table><table cellpadding=0 cellspacing=0 border=0><tr><td><font size=-1>Search: <input id=all type=radio name=meta value=\"\" checked><label for=all> the web </label><input id=cty type=radio name=meta value=\"cr=countryUK|countryGB\" ><label for=cty> pages from the UK </label></font></td></tr><tr><td height=7><img width=1 height=1 alt=\"\"></td></tr></table></td></form></tr></table><table width=100% border=0 cellpadding=0 cellspacing=0><tr><td bgcolor=#3366cc><img Google</font></center></body></html>\r\n
Jun 26th, 2006, 12:48 PM
#2
Re: regex multiline argh
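. matches any character except a newline (\n).
You are looking for
class=l[\s\S]*?nobr
Inside a character class, . is a literal dot, so [.\r\n] would only match dots and line breaks. [\s\S] matches any character, line breaks included, and *? stops at the first nobr instead of the last.
Alternatively, keep class=l.*?nobr and pass RegexOptions.Singleline, which makes . match \n as well. RegexOptions.Multiline only changes what ^ and $ match, so it does not help here.
The "unrecognised escape sequence" error comes from the C# compiler, not the regex engine: write the pattern as a verbatim string (@"...") or double the backslashes ("\\s").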
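Here is a rough sketch of both approaches, in case it helps. The class and method names (ResultExtractor, FindWithSingleline, FindWithCharClass) and the sample string are made up for illustration; the sample is not the real Google markup from the first post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ResultExtractor
{
    // Hypothetical helper: Singleline makes '.' match '\n' as well,
    // and the lazy '*?' stops at the first "nobr" after each "class=l".
    public static List<string> FindWithSingleline(string html)
    {
        var reg = new Regex(@"class=l.*?nobr", RegexOptions.Singleline);
        return Collect(reg, html);
    }

    // Hypothetical helper: same result without any options.
    // The verbatim string (@"...") keeps the C# compiler from
    // rejecting \s and \S as unrecognised escape sequences.
    public static List<string> FindWithCharClass(string html)
    {
        var reg = new Regex(@"class=l[\s\S]*?nobr");
        return Collect(reg, html);
    }

    // Copies the text of every match into a list.
    private static List<string> Collect(Regex reg, string html)
    {
        var results = new List<string>();
        foreach (Match m in reg.Matches(html))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Made-up input: the second result spans a line break.
        string html = "<a class=l href=a>one</a><nobr>\r\n" +
                      "<a class=l href=b>two\r\nlines</a><nobr>";

        Console.WriteLine(FindWithSingleline(html).Count); // prints 2
        Console.WriteLine(FindWithCharClass(html).Count);  // prints 2
    }
}
```

With RegexOptions.Multiline in place of Singleline, the same input would give only one match, because . still refuses to cross the \n in the second result.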