Page source code with cURL
Hello,
I would like to put all the domain names listed at ‘http://www.trafic.ro/’ into an array (or list).
Is this possible? I can't see the list in the source code after retrieving the page with the cURL functions. :confused:
It doesn't show up in the browser's “View Source” either… :(
Thank you in advance
Re: Page source code with cURL
That page appears to load its data via JavaScript. cURL cannot execute JavaScript, so it won't get the content that JavaScript generates. I'm not sure how else you could go about doing this...
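To check whether the list is in what the server actually sends, here is a minimal PHP sketch that assumes the standard cURL extension. It downloads a page and counts the absolute links in the raw HTML. If the count is zero while the browser shows a full list, the list is being built by JavaScript after the page loads. `fetch_html` and `count_http_links` are hypothetical helper names. The regular expression is a rough check, not a real HTML parser.

```php
<?php
// Hypothetical helper: fetch a URL with PHP's cURL extension and return the raw body.
// cURL only downloads what the server sends. It does not run JavaScript,
// so anything a script adds to the page later will be missing.
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // give up after 30 seconds
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body; // the handle is freed when $ch goes out of scope
}

// Hypothetical helper: count <a ... href="http(s)://..."> links in raw HTML.
// A quick test of whether the links exist before any JavaScript runs.
function count_http_links(string $html): int
{
    return preg_match_all('/<a\s[^>]*href\s*=\s*["\']https?:\/\//i', $html);
}
```

For example, `count_http_links(fetch_html('http://www.trafic.ro/'))` reports how many absolute links the server delivers in the initial HTML.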
Re: Page source code with cURL
In that case, I wonder whether there is a way to loop through all the pages and retrieve the list of domains. There must be a way; in computing nothing is impossible. :)
Re: Page source code with cURL
What do you mean by "retrieve the domains list"?
Re: Page source code with cURL
I would like to create an array with all the domains found on the site www.trafic.ro, for example:
$domain[0] = 'www.trilulilu.ro';
$domain[1] = 'forum.softpedia.com';
...
and so on, for every domain on all 3059 pages.
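Assuming the HTML that contains the links can be obtained at all, a minimal PHP sketch for turning it into an array like the one above uses the built-in DOM extension. `extract_domains` is a hypothetical name. It keeps only absolute links, because relative links have no host, and it removes duplicates while preserving first-seen order.

```php
<?php
// Hypothetical helper: return the unique host names of all absolute links in $html.
function extract_domains(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $domains = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        // parse_url() returns null when the href has no host (e.g. "/page2").
        $host = parse_url($link->getAttribute('href'), PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            $domains[] = strtolower($host);
        }
    }
    // Drop duplicates and renumber the keys 0, 1, 2, ...
    return array_values(array_unique($domains));
}
```

Given links to `http://www.trilulilu.ro/` and `http://forum.softpedia.com/`, this yields `['www.trilulilu.ro', 'forum.softpedia.com']`, the same shape as `$domain` above.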
Re: Page source code with cURL
I don't think you would be allowed to do that as you are taking data from another site, which is effectively a breach of copyright.
In addition, if the page is generated by JavaScript, you are not going to get very far with the source code unless you write your own JavaScript interpreter, run the source code through it, and then crawl the links.
Re: Page source code with cURL
Copyright is “a document granting exclusive right to publish and sell literary or musical or artistic work”. The site in question is only a collection of public web addresses, so in my opinion it is not subject to copyright.
Moreover, I am using it only for a personal statistical analysis.
Re: Page source code with cURL
After a little digging in the source, it looks like this website might get its information from this website. You may have a much easier time crawling that website instead.
Re: Page source code with cURL
Quote:
Originally Posted by neptun_
Copyright is “a document granting exclusive right to publish and sell literary or musical or artistic work”. The site in question is only a collection of public web addresses, so in my opinion it is not subject to copyright.
Moreover, I am using it only for a personal statistical analysis.
You did not compile the content, so you have no right to modify and republish it. It does not matter what the content is: to proceed, you must either get permission from the owner of the website, or fully reference (with a link) the location from which you pulled the information and state clearly that the information came from that source.
Please refer to the hosting country's copyright law for clarification: http://www.legi-internet.ro/en/copyright.htm
Even if you are using it only for personal purposes, you still need to reference the source of the information, both to give credit to the copyright owner and, more importantly, to give the required weight to any statistics derived from those data.
Re: Page source code with cURL
Kows, thanks for the address. Unfortunately, they have only 400 sites in their statistics, while Trafic.ro contains almost every site in the country (~45,000). I think they only ping the site you mentioned.
visualAd, thank you too for the link about copyright. All I want to do is an analysis, purely out of personal curiosity, so that I can compare the data with the official reports.
From a technical point of view, it remains an interesting question how to get the source code from that kind of site, whose pages are generated with JavaScript. How can it be done, and how much time would it take… 5 hours? 5 days?
Re: Page source code with cURL
It's just using AJAX. If you sifted through their code, I'm sure you could figure it out. I'm just not going to.
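If the page really does fill itself in via AJAX, one approach is to find the request it makes (for example in the network tab of the browser's developer tools) and call that URL directly with cURL for each page number. The PHP sketch below assumes such an endpoint exists and takes a page parameter. `ENDPOINT_PATTERN` is a placeholder, not the site's real URL, and the response format (HTML, JSON, …) is unknown, so fetching and parsing are passed in as callables. `page_url` and `crawl_pages` are hypothetical names.

```php
<?php
// Placeholder only: replace with the request URL seen in the browser's network tab.
const ENDPOINT_PATTERN = 'http://www.example.com/ajax/list?page=%d'; // hypothetical

// Build the request URL for one page of results.
function page_url(int $page): string
{
    return sprintf(ENDPOINT_PATTERN, $page);
}

// Request pages 1..$lastPage, hand each response body to $parse (which must
// return an array of domain names), and merge everything into one unique list.
// $fetch is injected so a cURL-based fetcher can be swapped for a fake in tests.
function crawl_pages(int $lastPage, callable $fetch, callable $parse, int $delayMicroseconds = 500000): array
{
    $all = [];
    for ($page = 1; $page <= $lastPage; $page++) {
        $body = $fetch(page_url($page));
        foreach ($parse($body) as $domain) {
            $all[] = $domain;
        }
        if ($delayMicroseconds > 0) {
            usleep($delayMicroseconds); // pause between requests to spare the server
        }
    }
    return array_values(array_unique($all));
}
```

In real use, `$fetch` would wrap `curl_exec()` and `$parse` would match whatever the endpoint returns (for example `json_decode()` for JSON). With 3059 pages and the default 0.5 s pause, the pauses alone add up to roughly 25 minutes, plus the time each request takes.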