Page source code with cURL

Aug 24th, 2009, 04:08 AM
neptun_

Page source code with cURL

Hello,

I would like to put all the domain names listed at ‘http://www.trafic.ro/’ into an array or list.
Is this possible, given that I can't see them in the source code (after retrieving it with the cURL functions)? :confused:
The data is not visible with “view source” either… :(

Thank you in advance
Aug 24th, 2009, 09:43 AM
SambaNeko

Re: Page source code with cURL

That page appears to be loading its data via JavaScript; cURL cannot execute JavaScript, and therefore won't get the content generated by it. I'm not sure how else you could go about doing this...
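
To illustrate the point, here is a minimal sketch, assuming PHP's cURL extension. The helper names `fetch_source` and `script_sources` and the sample markup are made up for illustration; this is not the real trafic.ro markup. What cURL hands back is only the markup as served, so the most it can show is which scripts the page references, not what those scripts later insert into the page.

```php
<?php
// Fetch the raw response body for a URL with PHP's cURL extension.
// Nothing here runs JavaScript: the result is the markup exactly as served.
function fetch_source(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up after 15 seconds
    $body = curl_exec($ch);
    return $body === false ? '' : $body;
}

// List the src attributes of external <script> tags found in the markup.
function script_sources(string $html): array
{
    preg_match_all('/<script\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
    return $m[1];
}

// Hypothetical sample of what such a page might return: an empty
// container plus a script that would fill it in the browser.
$sample = '<html><body><div id="list"></div>'
        . '<script src="/js/list.js"></script></body></html>';

print_r(script_sources($sample));
```

With a live page you would pass `fetch_source($url)` to `script_sources()` instead of the canned sample.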
Aug 24th, 2009, 03:58 PM
neptun_

Re: Page source code with cURL

In that case, I am wondering if there is a way to loop through all the pages and retrieve the list of domains. I think there must be a way, knowing that in computing nothing is impossible... :)
Aug 25th, 2009, 02:49 AM
visualAd

Re: Page source code with cURL

What do you mean by "retrieve the domains list"?
Aug 25th, 2009, 07:35 AM
neptun_

Re: Page source code with cURL

I would like to create an array with the list of all domains found on the site www.trafic.ro, for example:

$domain[0] = 'www.trilulilu.ro';
$domain[1] = 'forum.softpedia.com';
......

and so on, for all domains of the 3059 pages.
Aug 25th, 2009, 07:39 AM
visualAd

Re: Page source code with cURL

I don't think you would be allowed to do that as you are taking data from another site, which is effectively a breach of copyright.

In addition, if the page is generated by JavaScript, then you are not going to get very far with the source code unless you write your own JavaScript interpreter, run the source code through it, and then crawl the links.
Aug 26th, 2009, 02:06 AM
neptun_

Re: Page source code with cURL

Copyright is “a document granting exclusive right to publish and sell literary or musical or artistic work”. The site in question contains only a collection of public web addresses, so in my opinion it is not subject to copyright.
Moreover, I am using it only for a personal statistical analysis.
Aug 26th, 2009, 02:32 AM
kows

Re: Page source code with cURL

after a small amount of source-looking, this website might get their information from this website. you may have a much easier time crawling that website instead.
Aug 26th, 2009, 04:18 AM
visualAd

Re: Page source code with cURL

Quote:

Originally Posted by neptun_

Copyright is “a document granting exclusive right to publish and sell literary or musical or artistic work”. The site in question contains only a collection of public web addresses, so in my opinion it is not subject to copyright.
Moreover, I am using it only for a personal statistical analysis.

The content has not been compiled by you, therefore you have no right to modify and republish it. It does not matter what the content is; in order to proceed you must either get permission from the owner of the web site, or fully reference, with a link, the location from which you pulled the information and state clearly that the information came from that source.

Please refer to the hosting country's copyright law for clarification: http://www.legi-internet.ro/en/copyright.htm

If you are using it only for personal purposes, you still need to reference the source of the information in order to give credit to the copyright owner and more importantly add the required weight to any statistics derived from those data.
Aug 26th, 2009, 08:18 AM
neptun_

Re: Page source code with cURL

Kows, thanks for the address. Unfortunately they have only 400 sites in their statistics. Trafic.ro contains almost every site in the country (~45,000). I think they only ping the mentioned site.

visualAd, thank you too for the address regarding copyright. All I want to do is an analysis, only for my personal curiosity, to be able to compare the data with the official reports.

It remains an interesting question, from the technical point of view, how to get the source code from that kind of site, with pages generated in JavaScript. How can it be done, and how much time would it take: 5 hours, 5 days?
Aug 26th, 2009, 02:40 PM
kows

Re: Page source code with cURL

it's just using ajax. if you sifted through their code you could figure it out, I'm sure. I'm just not going to.
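
A rough sketch of that idea in PHP, under loud assumptions: once the request URL is found in the page's scripts, each page of results could be requested directly and the host names pulled out of the response. The response format (an HTML fragment with links), the helper names `extract_domains` and `collect_domains`, and the canned data are all hypothetical; the real endpoint and its format are not known here.

```php
<?php
// Pull host names out of one AJAX response.
// ASSUMPTION: the response is an HTML fragment containing absolute links.
function extract_domains(string $fragment): array
{
    preg_match_all('/href\s*=\s*["\']https?:\/\/([^\/"\']+)/i', $fragment, $m);
    return $m[1];
}

// Walk pages 1..$pages, calling $fetch($page) to get each response body,
// and collect the unique domains in first-seen order.
function collect_domains(callable $fetch, int $pages): array
{
    $seen = [];
    for ($page = 1; $page <= $pages; $page++) {
        foreach (extract_domains($fetch($page)) as $host) {
            $seen[$host] = true; // keyed by host, so duplicates collapse
        }
    }
    return array_keys($seen);
}

// Stand-in fetcher with canned data so the sketch runs offline.
// A real one would make a cURL request to the (unknown) AJAX URL.
$canned = [
    1 => '<a href="http://www.trilulilu.ro/">a</a>'
       . '<a href="http://forum.softpedia.com/">b</a>',
    2 => '<a href="http://www.trilulilu.ro/">a</a>',
];
$fetch = function (int $page) use ($canned): string {
    return $canned[$page] ?? '';
};

$domain = collect_domains($fetch, 2);
print_r($domain);
```

Passing the fetcher in as a callable keeps the parsing testable without network access. For the 3059 pages mentioned earlier, a real fetcher would want a pause between requests (for example `sleep(1)`), and the permission questions raised above still apply.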