[RESOLVED] Optimizing regular expression
TheBigB
May 27th, 2010, 05:40 PM
Hi,
I have this simple expression that retrieves the content from the HTML head.
/.*<head>(.*)<\/head>.*/
Since I don't have much experience with expressions, I'm not sure whether this is all that efficient.
Any suggestions?
Thanks.
kows
May 27th, 2010, 06:41 PM
you only really need this:
/<head>(.*)<\/head>/si
I added two modifiers: the 'i' modifier so that the match is case-insensitive (e.g. HEAD, Head), and the 's' modifier to turn on single-line mode so that the dot (".") also matches line breaks.
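A minimal sketch of what the two modifiers change, assuming a made-up $html sample with upper-case tags and a head that spans several lines (the sample markup and variable names are illustrative, not from the thread):

```php
<?php
// Hypothetical sample: upper-case tags, head content spread over several lines.
$html = "<html>\n<HEAD>\n<title>Demo</title>\n</HEAD>\n<body></body>\n</html>";

// No modifiers: the tag case must match exactly and "." stops at line breaks.
$plain = preg_match('/<head>(.*)<\/head>/', $html, $m1);

// 'i' only: the tags now match, but "." still cannot cross the line breaks.
$caseOnly = preg_match('/<head>(.*)<\/head>/i', $html, $m2);

// 's' and 'i' together: tags match in any case and "." spans line breaks.
$both = preg_match('/<head>(.*)<\/head>/si', $html, $m3);

var_dump($plain, $caseOnly, $both); // int(0), int(0), int(1)
echo trim($m3[1]), "\n";            // <title>Demo</title>
```

Only the last call matches here, which is why both modifiers are worth having when the input is real-world HTML.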
SambaNeko
May 27th, 2010, 07:00 PM
Avoid .* where it isn't necessary, or where there's a practical lazy alternative (lazy as opposed to greedy; use excluders, not includers). Instead of the .* in kows' sample, maybe you could match "anything that is not </head>". But my example isn't working. :)
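One way to get the "not </head>" idea working, offered as a sketch and not necessarily what SambaNeko had in mind: either make the quantifier lazy with .*?, or consume one character at a time behind a negative lookahead that refuses to step over </head>. The sample string is invented, with a stray </head> in the body to show where the greedy version overshoots.

```php
<?php
// Invented sample: a stray "</head>" appears inside the body text.
$html = "<head><title>T</title></head><body>the </head> tag closes the head</body>";

// Greedy: .* runs to the end of the input, then backtracks to the LAST </head>.
preg_match('/<head>(.*)<\/head>/si', $html, $greedy);

// Lazy: .*? expands only as far as the FIRST </head>.
preg_match('/<head>(.*?)<\/head>/si', $html, $lazy);

// "Not </head>": take a character only if </head> does not start at that position.
preg_match('/<head>((?:(?!<\/head>).)*)<\/head>/si', $html, $excluded);

echo $greedy[1], "\n";   // overshoots into the <body> text
echo $lazy[1], "\n";     // <title>T</title>
echo $excluded[1], "\n"; // <title>T</title>
```

The lazy and lookahead versions capture the same text here; the lookahead form is more verbose but states the exclusion explicitly.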
Zach_VB6
May 28th, 2010, 11:35 AM
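Or you could do it without regex:
function getHead($HTML) {
    // stripos() returns false when the tag is missing, so test before adding the offset
    $start = stripos($HTML, "<HEAD>");
    if ($start !== false) {
        $start += 6; // skip past "<head>" itself
        $stop = stripos($HTML, "</HEAD>", $start);
        if ($stop !== false) {
            return substr($HTML, $start, $stop - $start);
        }
    }
    return "";
}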
sciguyryan
Jun 3rd, 2010, 06:07 AM
Depending on how you want it done you could even use the strip_tags (http://uk3.php.net/manual/en/function.strip-tags.php) PHP function.
kows
Jun 3rd, 2010, 07:16 AM
Originally posted by sciguyryan:
"Depending on how you want it done you could even use the strip_tags (http://uk3.php.net/manual/en/function.strip-tags.php) PHP function."
uhh? all strip_tags() does is remove HTML tags. it doesn't remove content, which means that you wouldn't be able to use it to parse anything. it's used for sanitizing user input, usually.
regular expressions are the way to go in this case.
sciguyryan
Jun 3rd, 2010, 07:52 AM
Originally posted by kows:
"uhh? all strip_tags() does is remove HTML tags. it doesn't remove content, which means that you wouldn't be able to use it to parse anything. it's used for sanitizing user input, usually.
regular expressions are the way to go in this case."
No. But had you looked on the page, there is a function on there that does. Such as:
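function strip_selected_tags($str, $tags = "", $stripContent = false)
{
    // Collect the tag names listed in $tags, e.g. "<head><script>"
    preg_match_all("/<([^>]+)>/i", $tags, $allTags, PREG_PATTERN_ORDER);
    foreach ($allTags[1] as $tag) {
        // Paired form: <tag ...>content</tag>
        $replace = "%(<$tag.*?>)(.*?)(<\/$tag.*?>)%is";
        // Lone form, e.g. self-closing tags such as <input />
        $replace2 = "%(<$tag.*?>)%is";
        if ($stripContent) {
            // Remove the tags together with everything between them
            $str = preg_replace($replace, '', $str);
            $str = preg_replace($replace2, '', $str);
        }
        // Otherwise drop only the tags and keep the inner content (group 2)
        $str = preg_replace($replace, '${2}', $str);
        // $replace2 has no second group, so the whole lone tag is removed
        $str = preg_replace($replace2, '', $str);
    }
    return $str;
}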
... and it also makes for a pretty interesting demo of RegExp too.
kows
Jun 3rd, 2010, 08:54 AM
Originally posted by sciguyryan:
"No. But had you looked on the page, there is a function on there that does."
What you said -- that you could use strip_tags() -- is still incorrect.
If you're going to point out that a function posted in the comments of the strip_tags() documentation might be useful, then you should probably say that it actually has nothing to do with strip_tags() and link to the comment itself (http://uk3.php.net/manual/en/function.strip-tags.php#93414). Otherwise, you're just misinforming.
Should I even mention that the function you posted doesn't even do what was needed (or what you suggested it did), anyway? It was written that way to support self-closing tags (like <input />).
TheBigB
Jun 5th, 2010, 10:52 AM
Originally posted by kows:
/<head>(.*)<\/head>/si
This one also stores the head content including the <head> tags in $matches[0] (instead of the whole input), which is actually something I also needed.
As the Dutch say, two flies in one swat.
I also appreciate the other suggestions made. Thanks all :wave:
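For reference, a short sketch of what preg_match() puts in $matches for this pattern, using a made-up input string: index 0 holds the whole matched text, tags included, and index 1 holds only the first capture group.

```php
<?php
// Made-up input for illustration.
$html = "<html><head><title>Demo</title></head><body>Hi</body></html>";

if (preg_match('/<head>(.*)<\/head>/si', $html, $matches)) {
    echo $matches[0], "\n"; // <head><title>Demo</title></head>
    echo $matches[1], "\n"; // <title>Demo</title>
}
```

With the original /.*<head>(.*)<\/head>.*/ pattern, the leading and trailing .* would pull the surrounding text on the same line into $matches[0] as well.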
vbforums.com