-
Sep 10th, 2004, 06:49 PM
#1
Thread Starter
Stuck in the 80s
Problem with preg_replace()
I have the following code to create list tags (much like this forum):
PHP Code:
function listtag($text) {
    $lststyle = 'style="margin-top: 0px; margin-bottom: 0px;"';
    if (preg_match_all('/\[list=(.*)\](.*)\[\/list\]/siU', $text, $match)) {
        $return = preg_replace('/<br \/>/', '', $match[0]);
        $text = str_replace($match[0], $return, $text);
    }
    $list_tags = array('/\[list=3\](.*)\[\/list\](\r\n|)/siU', '/\[\*\](.*)\[\/\*\]/isU');
    $list_html = array('<ul ' . $lststyle . '>\\1</ul>', '<li>\\1</li>');
    $text = preg_replace($list_tags, $list_html, $text);
    $text = str_replace("</ul><br />", '</ul>', $text);
    return $text;
}
It works great, except when there are nested tags: it doesn't seem to grab the inner list tags. Any ideas on how to fix this?
-
Sep 11th, 2004, 06:37 AM
#2
I know what is causing your problem but don't have the time to solve it for you. Your expression, with the meta characters escaped as they should be:
/\[list=(.*)\](.*)\[\/list\]/siU
matches the smallest possible occurrence of the pattern, starting from the beginning of the string. With nested lists, the first match stops at the first closing tag (shown here with plain [list] tags for brevity):
[list][list][/list]
If you turn the ungreedy (U) modifier off, it matches the largest possible occurrence instead:
[list][list][/list][/list]
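To see the difference concretely, here is a small sketch. The sample string and the variable names are only illustrations, and it uses plain [list] tags without the "=" part:

```php
<?php
// Hypothetical sample input: one list nested inside another.
$text = '[list][list][/list][/list]';

// With U (ungreedy), .* stops at the first [/list] it can reach.
preg_match('/\[list\](.*)\[\/list\]/sU', $text, $lazy);

// Without U, .* is greedy and runs on to the last [/list].
preg_match('/\[list\](.*)\[\/list\]/s', $text, $greedy);

echo $lazy[0], "\n";   // the outer opening tag paired with the inner closing tag
echo $greedy[0], "\n"; // the whole string
```

Neither setting pairs each opening tag with its own closing tag, which is why nesting breaks.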
Now what you want is a recursive match. I believe this can be achieved by using the (?R) recursion item, but it is complex and I cannot remember how. I found this in the PHP documentation on pattern syntax, which describes how to do this:
PHP Documentation
Recursive patterns
Consider the problem of matching a string in parentheses, allowing for unlimited nested parentheses. Without the use of recursion, the best that can be done is to use a pattern that matches up to some fixed depth of nesting. It is not possible to handle an arbitrary nesting depth. Perl 5.6 has provided an experimental facility that allows regular expressions to recurse (among other things). The special item (?R) is provided for the specific case of recursion. This PCRE pattern solves the parentheses problem (assume the PCRE_EXTENDED option is set so that white space is ignored): \( ( (?>[^()]+) | (?R) )* \)
First it matches an opening parenthesis. Then it matches any number of substrings which can either be a sequence of non-parentheses, or a recursive match of the pattern itself (i.e. a correctly parenthesized substring). Finally there is a closing parenthesis.
This particular example pattern contains nested unlimited repeats, and so the use of a once-only subpattern for matching strings of non-parentheses is important when applying the pattern to strings that do not match. For example, when it is applied to (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() it yields "no match" quickly. However, if a once-only subpattern is not used, the match runs for a very long time indeed because there are so many different ways the + and * repeats can carve up the subject, and all have to be tested before failure can be reported.
The values set for any capturing subpatterns are those from the outermost level of the recursion at which the subpattern value is set. If the pattern above is matched against (ab(cd)ef) the value for the capturing parentheses is "ef", which is the last value taken on at the top level. If additional parentheses are added, giving \( ( ( (?>[^()]+) | (?R) )* ) \) then the string they capture is "ab(cd)ef", the contents of the top level parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE has to obtain extra memory to store data during a recursion, which it does by using pcre_malloc, freeing it via pcre_free afterwards. If no memory can be obtained, it saves data for the first 15 capturing parentheses only, as there is no way to give an out-of-memory error from within a recursion.
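Applied to list tags, the same idea might look like the sketch below. It is only an illustration under a few assumptions: a reasonably modern PHP (7 or later), a [list=...] tag meaning a numbered list, and function names (convertLists, listtagRecursive) that are made up for this example.

```php
<?php
// Sketch only: convertLists() and listtagRecursive() are hypothetical names.
function convertLists(string $text): string
{
    // The pattern reads, in order:
    //   an opening tag, where group 1 holds "=1" etc. for numbered lists;
    //   group 2, the content: runs of text without brackets (as a once-only
    //   subpattern), or a complete nested list matched by recursion, or any
    //   other bracket such as the one that starts an item marker;
    //   the matching closing tag.
    $pattern = '~\[list(=[^\]]*)?\]((?:(?>[^\[]+)|(?R)|\[(?!/?list\b))*)\[/list\]~i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $tag = ($m[1] !== '') ? 'ol' : 'ul';
        // Group 2 is the content of the outermost list, so nested lists are
        // converted by calling the function again on that content.
        return "<$tag>" . convertLists($m[2]) . "</$tag>";
    }, $text);

    return $result ?? $text; // fall back to the input if PCRE reports an error
}

function listtagRecursive(string $text): string
{
    // Item markers are explicitly paired, so they can be swapped one by one.
    return str_replace(['[*]', '[/*]'], ['<li>', '</li>'], convertLists($text));
}

echo listtagRecursive('[list=1][*]1[/*][*][list][*]a[/*][/list][/*][/list]'), "\n";
```

The regex only finds balanced outer lists; the PHP function recursing on group 2 is what reaches the inner ones, because the captured values reported are those of the outermost level.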
-
Sep 11th, 2004, 11:27 AM
#3
Thread Starter
Stuck in the 80s
I don't follow at all.
-
Sep 11th, 2004, 03:45 PM
#4
Thread Starter
Stuck in the 80s
This works, but I know there's a better way to do it than to loop:
PHP Code:
function listtag($text) {
    $lststyle = 'style="margin-top: 0px; margin-bottom: 0px;"';
    $text = preg_replace('/(\n|\r|\r\n|\n\r)/', '', $text);
    do {
        $text = preg_replace('/\[\*\](.*)\[\/\*\]/U', '<li>\\1</li>', $text);
    } while (preg_match('/\[\*\](.*)\[\/\*\]/U', $text) != 0);
    if (preg_match_all('/\[list=(.*)\](.*?)\[\/list\]/siU', $text, $match)) {
        $return = preg_replace('/<br \/>/', '', $match[0]);
        $text = str_replace($match[0], $return, $text);
    }
    do {
        $text = preg_replace('/\[list=(.*)\](.*)\[\/list\]/siU', '<ul ' . $lststyle . '>\\2</ul>', $text);
    } while (preg_match('/\[list=(.*)\](.*)\[\/list\]/i', $text) != 0);
    $text = str_replace("</li><br />", '</li>', $text);
    return $text;
}
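One possible variation on the loop, as a rough sketch rather than a finished replacement: convert only the innermost lists on each pass, so that an opening tag is never paired with the closing tag of a different list. The function name listtagInnermostFirst is made up for this example, the style attribute is left out for brevity, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: converts the innermost lists first, then works outwards.
function listtagInnermostFirst(string $text): string
{
    // [*] and [/*] are always written in pairs, so each marker can be swapped
    // on its own without worrying about nesting.
    $text = str_replace(['[*]', '[/*]'], ['<li>', '</li>'], $text);

    // The content group refuses to step over another opening list tag, so a
    // pass can only convert lists that have no unconverted list inside them.
    $innermost = '~\[list(?:=[^\]]*)?\]((?:(?!\[list[=\]]).)*?)\[/list\]~is';
    do {
        $text = preg_replace($innermost, '<ul>$1</ul>', $text, -1, $count);
    } while ($count > 0);

    return $text;
}

echo listtagInnermostFirst('[list=1][*]1[/*][*][list][*]a[/*][/list][/*][*]8[/*][/list]'), "\n";
```

It still loops, but the number of passes is bounded by the nesting depth plus one, and the $count argument of preg_replace() avoids running a second preg_match() just to test for remaining tags.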
-
Sep 12th, 2004, 11:07 AM
#5
Originally posted by The Hobo
I don't follow at all.
I don't really either. I did once, because I produced a regular expression for matching recursive subpatterns, but for the life of me I can't find the script. I will have another read of that paragraph later and try to come up with something.
I spent about 2 hours looking at your script last night trying to work out why it wasn't working. If you look carefully you'll see that the characters which should be escaped with a backslash are not. This is VB Forums' fault, because it automatically strips them when you post. So if you want your PHP code to be correct when you post it, you need to double up on all your backslashes, otherwise they won't show.
Your script works, but only for single lists. If you have a list inside a list then it fails, unless I am doing something wrong. I have been trying to get around the problem but to no avail, because when I fix the lists, it breaks the [*] matching. Again, I think the only way is to use a recursive pattern match.
The vBulletin software only goes to a list depth of 1 too, so I guess if you are not concerned about lists inside lists your script works wonders. Here's the script I tested and the output I got:
PHP Code:
<?php
$text =
"[list=1]
[*] 1 [/*]
[*] 2[/*]
[*] 3[/*]
[*]
[list]
[*]a[/*]
[*]b[/*]
[*]c[/*]
[/list][/*]
[*] 8[/*]
[/list]";
echo (listtag($text));
function listtag($text) {
    $lststyle = 'style="margin-top: 0px; margin-bottom: 0px;"';
    $text = preg_replace('/(\n|\r|\r\n|\n\r)/', '', $text);
    do {
        $text = preg_replace('/\[\*\](.*)\[\/\*\]/U', '<li>\\1</li>', $text);
    } while (preg_match('/\[\*\](.*)\[\/\*\]/U', $text) != 0);
    if (preg_match_all('/\[list=(.*)\](.*?)\[\/list\]/siU', $text, $match)) {
        $return = preg_replace('/<br \/>/', '', $match[0]);
        $text = str_replace($match[0], $return, $text);
    }
    do {
        $text = preg_replace('/\[list=(.*)\](.*)\[\/list\]/siU', '<ul ' . $lststyle . '>\\2</ul>', $text);
    } while (preg_match('/\[list=(.*)\](.*)\[\/list\]/i', $text) != 0);
    $text = str_replace("</li><br />", '</li>', $text);
    return $text;
}
?>
Output:
Code:
<ul style="margin-top: 0px; margin-bottom: 0px;">
<li> 1 </li> <li> 2</li> <li> 3</li>
<li> [list] <li>a</li>
<li>b</li> <li>c</li> </ul></li> <li> 8</li>[/list]
-
Sep 13th, 2004, 04:57 AM
#6
Fanatic Member
This:
Code:
<style>
body{
font:9pt arial;
}
</style>
<?php
$text =
"[list=1]
[*] 1[/*]
[*] 2[/*]
[*] 3[/*]
[list]
[*]a[/*]
[*]b[/*]
[*]c[/*][/list]
[*] 8[/*][/list]";
echo (listtag($text));
function listtag($text){
    $text=preg_replace("/\[list=1\](.*?)\[\/list\]/siU","<ol>$1</ol>",$text);
    $text=preg_replace("/\[list\](.*?)\[\/list\]/siU","<ul>$1</ul>",$text);
    $text=preg_replace("/\[\*\](.*)\[\/\*\]/","<li>$1</li>",$text);
    return $text;
}
?>
has a bug. If you do
Code:
"[list]
[*] 1[/*]
[*] 2[/*]
[*] 3[/*]
[list=1]
[*]a[/*]
[*]b[/*]
[*]c[/*][/list]
[*] 8[/*][/list]";
the output is wrong.
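As far as I can tell, the reason is that the U modifier together with the trailing ? makes (.*?) greedy again, so the [list=1] pattern runs from the inner opening tag to the last [/list], which belongs to the outer list. The tags then come out mis-nested. One way around it, sketched below under the assumption of PHP 7 or later and with a made-up function name (listtagWithStack), is to walk the tags in one pass and keep a stack of the lists that are currently open:

```php
<?php
// Hypothetical helper: one pass over the list tags, with a stack of open lists.
function listtagWithStack(string $text): string
{
    $open = []; // HTML element names of the lists that are currently open

    $text = preg_replace_callback(
        '~\[(/?)list(=[^\]]*)?\]~i',
        function (array $m) use (&$open): string {
            if ($m[1] === '') {
                // Opening tag: [list=...] becomes <ol>, plain [list] becomes <ul>.
                $tag = isset($m[2]) ? 'ol' : 'ul';
                $open[] = $tag;
                return "<$tag>";
            }
            // Closing tag: close whichever list was opened most recently.
            // A stray [/list] with nothing open is left untouched.
            $tag = array_pop($open);
            return $tag === null ? $m[0] : "</$tag>";
        },
        $text
    );

    // Item markers are explicitly paired, so they can be swapped one by one.
    return str_replace(['[*]', '[/*]'], ['<li>', '</li>'], $text);
}

echo listtagWithStack('[list][*]1[/*][list=1][*]a[/*][/list][*]8[/*][/list]'), "\n";
```

This sketch does not close lists that the input leaves open; that would need an extra step after the callback.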