[RESOLVED] PHP - Regex (Q1)


Zeuz
Feb 13th, 2010, 06:14 AM
I hope this is not too much effort for someone who could help me a little.

Can someone give me an example of how to find the whole line containing the word "Hit" on this website: http://MyIP.WebEge.com/? I would like to try this as an example.

The result should be: "Hit Count: {Amount}"

I just don't understand the PHP.net tutorials on preg_match.

Examples usually help me a lot.

Thank you.

bharanidharanit
Feb 13th, 2010, 09:48 AM
do you want to count the number of visitors hitting your website?

kows
Feb 13th, 2010, 11:14 AM
no, bharanidharanit. this has to do with regular expressions, and nothing to do with a hit counter.

Zeuz, the first thing you need to do is look at the markup that you are going to encounter. in this case, you're looking at something along the lines of:
<p>Hit count: <b> 15</b></p>
generally speaking, this is a pretty easy thing to find a match for. if you want both the words "Hit count:" and the number, then you can capture both. if the number is the only important part, you can ignore the rest.

if you only want the number, then we can represent the markup above with the following regular expression:
/<p>.*[\s]<b>[\s]*([0-9]+)[\s]*<\/b><\/p>/i
now, to walk through it.

first, the slashes ("/") around the pattern are delimiters: they mark where the pattern starts and ends. anything after the closing slash is a modifier. there are many different ones; in this case, we're using the modifier "i", which enables case insensitivity.
second, we start by putting in the actual markup that we're looking for. the beginning of the string is a <p> tag, so we start with that. I'll keep matching literal markup from here on, but won't mention it again.
third, we match ".*"; this is an expression. it basically says, "match any character except a newline (.) repeated zero or more times (*)." this covers the "Hit count:" text. it is followed by a single "[\s]", which matches exactly one whitespace character (the space before the <b> tag).
fourth, after the <b> tag, we match "[\s]*"; this is another expression. it says, "match any whitespace character ([\s]) repeated zero or more times (*)."
fifth, we match "([0-9]+)"; it says, "match and return (signified by being wrapped in parentheses) any digit from 0-9 ([0-9]) repeated one or more times (+)." this is our number.
sixth, we match any more whitespace characters repeated 0 or more times and then end with the markup. make a note that the slashes ("/") used to close HTML tags must be escaped, because "/" is also the pattern delimiter. this is done with the backslash ("\"), just like in PHP strings. you would also need to escape other special characters if you wanted to match them literally -- ".", "*", "+", "(", ")", "[" and "]" for example.
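
to see the pieces above working together, here is a minimal sketch that runs the pattern against the sample markup from earlier in this post. the $html string is just that sample line hard-coded (an assumption, not the live page), and the second pattern uses "#" as the delimiter only to show that the slashes in the closing tags then need no escaping.

```php
<?php
// Sample markup copied from the post above; the real page may differ.
$html = '<p>Hit count: <b> 15</b></p>';

// Same pattern as above: "/" is the delimiter, so "/" inside the tags is escaped.
$withSlashes = '/<p>.*[\s]<b>[\s]*([0-9]+)[\s]*<\/b><\/p>/i';

// Equivalent pattern with "#" as the delimiter: "/" needs no escaping here.
$withHashes = '#<p>.*[\s]<b>[\s]*([0-9]+)[\s]*</b></p>#i';

$count = null;
if (preg_match($withSlashes, $html, $matches)) {
    $count = $matches[1]; // first parenthesized group: the digits
}

$sameCount = null;
if (preg_match($withHashes, $html, $other)) {
    $sameCount = $other[1];
}

echo $count, "\n";
echo $sameCount, "\n";
```

both patterns should capture the same digits; which delimiter you use is a matter of taste, as long as it is escaped wherever it appears inside the pattern.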

I made this as flexible as possible -- if the string "hit count:" changes to "total visitors:", this regular expression will still work. if the markup changes substantially, however, it will not.
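
as a rough illustration of that flexibility, the sketch below tries the same pattern on a few variations of the markup. all four sample strings are made up for this example (they are not taken from the real page), so treat it as a demonstration of the idea rather than a test of the site.

```php
<?php
$pattern = '/<p>.*[\s]<b>[\s]*([0-9]+)[\s]*<\/b><\/p>/i';

// Hypothetical variations of the markup, invented for this illustration.
$samples = array(
    'original label'  => '<p>Hit count: <b> 15</b></p>',
    'different label' => '<p>Total visitors: <b>15</b></p>',
    'uppercase tags'  => '<P>Hit count: <B> 15 </B></P>',
    'changed markup'  => '<div>Hit count: <span>15</span></div>',
);

$results = array();
foreach ($samples as $name => $html) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    $results[$name] = preg_match($pattern, $html, $m) ? $m[1] : null;
}

foreach ($results as $name => $number) {
    echo $name, ': ', ($number === null ? 'no match' : $number), "\n";
}
```

the first three samples keep the <p>...<b>...</b></p> structure, so the number is still found; the last one swaps the tags, which is the kind of substantial change the pattern cannot follow.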

if you wanted to also capture the text in front of the number, you could wrap the ".*" expression in parentheses. the text would then be in $matches[1] and the number would move to $matches[2].
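
a small sketch of that variant, again using the hard-coded sample line rather than the live page. the variable names ($label, $number, $line) are only chosen for this example.

```php
<?php
$html = '<p>Hit count: <b> 15</b></p>';

// ".*" is now wrapped in parentheses, so it becomes group 1
// and the number becomes group 2.
$pattern = '/<p>(.*)[\s]<b>[\s]*([0-9]+)[\s]*<\/b><\/p>/i';

$line = null;
if (preg_match($pattern, $html, $matches)) {
    $label  = $matches[1]; // the text before the <b> tag
    $number = $matches[2]; // the digits inside the <b> tag
    $line   = $label . ' ' . $number;
}

echo $line, "\n";
```

this rebuilds the "Hit count: {Amount}" line from the two captured pieces, without the HTML tags.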

you can start using this expression by using a function like file_get_contents() to retrieve the page source, and then applying this pattern with preg_match() on it. if you've already gotten the source of the page, then:
// $file holds the page source, e.g. from file_get_contents()
$pattern = '/<p>.*[\s]<b>[\s]*([0-9]+)[\s]*<\/b><\/p>/i';
if (preg_match($pattern, $file, $matches)) {
    echo $matches[1]; // the number
}
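
if you have not fetched the page yet, the whole thing could be put together roughly like this. extractHitCount() and fetchHitCount() are hypothetical helper names made up for this sketch, and it assumes the live page still uses the markup shown above and that allow_url_fopen is enabled so file_get_contents() can open URLs.

```php
<?php
// Hypothetical helper: returns the number as an int, or null if nothing matches.
function extractHitCount($html)
{
    $pattern = '/<p>.*[\s]<b>[\s]*([0-9]+)[\s]*<\/b><\/p>/i';
    if (preg_match($pattern, $html, $matches) === 1) {
        return (int) $matches[1];
    }
    return null; // no match (or a regex error)
}

// Hypothetical helper: downloads the page, then hands the source to extractHitCount().
function fetchHitCount($url)
{
    // file_get_contents() returns false on failure; "@" hides the warning.
    $file = @file_get_contents($url);
    if ($file === false) {
        return null;
    }
    return extractHitCount($file);
}

// Offline check with the sample markup from this thread:
var_dump(extractHitCount('<p>Hit count: <b> 15</b></p>'));
var_dump(extractHitCount('<p>no number here</p>'));

// Against the live page (assumes the markup is unchanged):
// var_dump(fetchHitCount('http://MyIP.WebEge.com/'));
```

keeping the matching in its own function means you can try the pattern on a pasted string first, and only bring the download into it once the pattern works.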

I would suggest you read through regular-expressions.info (http://www.regular-expressions.info/) if you'd like to learn more about them and the way they work. they can be incredibly difficult, but at the same time incredibly powerful. they could also be considered their own language -- so be prepared to not understand everything right away.

Zeuz
Feb 14th, 2010, 06:52 AM
Thank you so much.