preg_match_all, and whitespace


MasterEvilAce
Jan 17th, 2004, 01:28 PM
How do I make this code work no matter how much whitespace is between the HTML tags?

$template = '<table cellspacing="0" cellpadding="0" border="0" width="100%"><tr><td><h1>(.*)</h1></td><td align="right">&nbsp;</td></tr></table>';
preg_match_all("|".$template."|U", $copy, $system);

It doesn't work if the text appears like this:

<table cellspacing="0" cellpadding="0" border="0" width="100%">
<tr>
<td><h1>BLAH HAHA!</h1></td>
<td align="right">&nbsp;</td>
</tr>
</table>

I'd like to keep the PHP on a single line, just so it takes up less space.

The Hobo
Jan 17th, 2004, 02:14 PM
Did you try using the 'x' modifier? I believe that ignores whitespace in the pattern.

MasterEvilAce
Jan 17th, 2004, 02:32 PM
:/

Tried it, but it doesn't change anything.
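
A small sketch of why the 'x' modifier may not help here: it makes the engine ignore whitespace written inside the pattern, but whitespace in the text being searched still has to be matched by something. The subject strings below are hypothetical, not taken from the thread.

```php
<?php
// Hypothetical subjects: the same two cells, with and without a line break.
$compact = '<td><h1>Title</h1></td><td>x</td>';
$spread  = "<td><h1>Title</h1></td>\n<td>x</td>";

// The pattern itself contains a newline and spaces between the two cells.
// With the x modifier, that whitespace in the pattern is ignored.
$pattern = '|<td><h1>(.*)</h1></td>
            <td>x</td>|Ux';

$compactHits = preg_match_all($pattern, $compact, $m1);
$spreadHits  = preg_match_all($pattern, $spread, $m2);

echo $compactHits, "\n"; // the compact subject matches
echo $spreadHits, "\n";  // the subject with a line break does not
```

Note that with 'x' the literal spaces inside a tag such as <table cellspacing="0" ...> would be ignored as well, so they would need to be written as \s or escaped.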

CornedBee
Jan 19th, 2004, 03:56 AM
Then you'll have to change the template. \s matches a whitespace character, so put \s* everywhere there might be whitespace. And that's a lot of places ;)

It's not pretty.
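
A minimal sketch of that idea, assuming whitespace only ever appears between one tag and the next. It uses \s, the PCRE whitespace class (\w matches word characters instead). The str_replace shortcut is an assumption that suits this particular template; it is not a general HTML solution.

```php
<?php
// Subject laid out across several lines, as in the first post.
$copy = <<<'HTML'
<table cellspacing="0" cellpadding="0" border="0" width="100%">
<tr>
<td><h1>BLAH HAHA!</h1></td>
<td align="right">&nbsp;</td>
</tr>
</table>
HTML;

// The original single-line template from the first post.
$template = '<table cellspacing="0" cellpadding="0" border="0" width="100%"><tr><td><h1>(.*)</h1></td><td align="right">&nbsp;</td></tr></table>';

// Allow optional whitespace wherever one tag ends and the next begins.
$flexible = str_replace('><', '>\s*<', $template);

preg_match_all('|' . $flexible . '|U', $copy, $system);

echo $system[1][0], "\n"; // BLAH HAHA!
```

The template stays on one line in the source, and the whitespace handling is added in a single extra statement.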

MasterEvilAce
Jan 29th, 2004, 07:24 PM
OK, got that problem figured out and all...


But preg_match_all is behaving weirdly.

$img[1][1]
$img[1][2]
$img[1][3]

These grab correctly, but they SKIP the first instance of the found string.
Say I'm looking for the word "Hi" and I need that word 3 times, and there are exactly 3 instances of the word in the text.

The FIRST two variables will be "Hi", and the third will be EMPTY. It's skipping the first instance of the string to be found.


If I do:
$img[1][0]
$img[1][1]
$img[1][2]
The LAST two are "Hi", but the first one is a WHOLE LOT more text than I specified to search for. The second variable *IS* the second instance of "Hi" and the third IS the THIRD instance of "Hi".

So something is really weird, and I have no idea what's wrong with it. Any suggestions?

BTW: I'm using non-greedy statements (which match as little text as possible).

MasterEvilAce
Jan 30th, 2004, 12:10 AM
When I do
$img[1][0]

it seems to be grabbing TOO MUCH.
I have the greediness OFF, but it's still grabbing too much.

I don't understand. In my expression, are there any non-literal characters that I forgot to escape?
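
One possible explanation, sketched under assumptions because the actual expression is not shown in the thread: a non-greedy (.*) still starts at the first place the pattern can begin, then stretches as far as it must for the rest of the pattern to match. If the first image differs slightly from the others, the first capture can swallow everything up to the second image. The HTML and patterns below are hypothetical.

```php
<?php
// Hypothetical page: the first <img> has no alt attribute, the others do.
$html = '<img src="a.gif"><p>intro</p>'
      . '<img src="b.gif" alt="pic">'
      . '<img src="c.gif" alt="pic">';

// Ungreedy, yet (.*) may run across tags until " alt="pic"> finally appears.
preg_match_all('|<img src="(.*)" alt="pic">|U', $html, $loose);

// A negated character class cannot leave the attribute value.
preg_match_all('|<img src="([^"]*)" alt="pic">|', $html, $strict);

echo $loose[1][0], "\n";  // a.gif"><p>intro</p><img src="b.gif
echo $loose[1][1], "\n";  // c.gif
echo $strict[1][0], "\n"; // b.gif
echo $strict[1][1], "\n"; // c.gif
```

In this sketch the loose pattern reproduces the symptom described above: the first element is far too long and the following ones look right.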

CornedBee
Jan 30th, 2004, 04:19 AM
Arrays start at 0. But for regexp matches, [0] holds the full text of each match, and [1] holds the first captured group. What about $img[0][n]?
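
A minimal sketch of how preg_match_all lays out its result array with the default PREG_PATTERN_ORDER flag. The input text and pattern are hypothetical; only the variable name $img comes from the thread.

```php
<?php
// Hypothetical input containing three matches.
$text = '<b>Hi</b> <b>Hi</b> <b>Hi</b>';

// No flag given, so the default PREG_PATTERN_ORDER applies.
$count = preg_match_all('|<b>(.*)</b>|U', $text, $img);

// $img[0] lists the full matched text of each match.
// $img[1] lists what the first parenthesised group captured in each match.
echo $count, "\n";     // 3
echo $img[0][0], "\n"; // <b>Hi</b>
echo $img[1][0], "\n"; // Hi
echo $img[1][2], "\n"; // Hi

// There is no fourth match, so $img[1][3] is not set.
var_dump(isset($img[1][3])); // bool(false)
```

With three matches the valid second-level indexes are 0, 1 and 2, which is why $img[1][3] comes back empty.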

MasterEvilAce
Feb 10th, 2004, 03:03 PM
$img[1][0] -- Too much.. or whatever..
$img[1][1] -- Is actually the SECOND image on the page
$img[1][2] -- Is the THIRD image on the page

CornedBee
Feb 10th, 2004, 03:25 PM
You sure your expression captures the first image?

tempest1
Feb 15th, 2004, 02:18 PM
To keep a string on a single line, use this:


function sameLine($string)
{
    // Strip carriage returns and line feeds so the string sits on one line.
    return str_replace(array("\r", "\n"), "", $string);
}

MasterEvilAce
Feb 18th, 2004, 09:24 PM
Originally posted by CornedBee
You sure your expression captures the first image?
Yes, most definitely, it does.

All the images are set up the same way on the pages I'm grabbing the HTML from.


Originally posted by tempest1
To keep a string on a single line use this...
I use something similar, so yes, everything is on a single line, but the code still seems to capture too much the first time.