-
Aug 24th, 2009, 04:08 AM
#1
Thread Starter
Member
Page source code with cURL
Hello,
I would like to put all the domain names listed at ‘http://www.trafic.ro/’ into an array (or list).
Is this possible? I can't see the list in the source code after retrieving the page with the cURL functions.
It isn't visible with “View Source” either.
Thank you in advance
-
Aug 24th, 2009, 09:43 AM
#2
Re: Page source code with cURL
That page appears to be loading its data via Javascript; cURL cannot execute Javascript, and therefore won't get the content generated by it. I'm not sure how else you could go about doing this...
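One way to confirm this is to fetch the raw HTML exactly as cURL would, with no JavaScript executed, and check whether text you can see in the browser is actually in it. The Python sketch below is only an illustration. The function names are hypothetical, and listing the `<script src=...>` tags is just a starting point for finding where the data is loaded from.

```python
import re
import urllib.request

# Matches the src attribute of <script> tags, e.g. <script src="/js/app.js">.
SCRIPT_SRC_RE = re.compile(r'<script[^>]*\bsrc=["\']([^"\']+)["\']', re.IGNORECASE)


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Download a page the way cURL does: raw bytes, no JavaScript executed."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def content_present(html: str, expected: str) -> bool:
    """True if text visible in the browser also appears in the raw HTML (case-insensitive)."""
    return expected.lower() in html.lower()


def script_sources(html: str) -> list[str]:
    """External scripts referenced by the page; client-side content usually comes from these."""
    return SCRIPT_SRC_RE.findall(html)
```

For example, if `content_present(fetch_raw_html("http://www.trafic.ro/"), "trilulilu")` is `False` even though the name shows up in the browser, the list is being built client-side and cURL alone will not see it.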
-
Aug 24th, 2009, 03:58 PM
#3
Thread Starter
Member
Re: Page source code with cURL
In that case, is there a way to loop through all the pages and retrieve the list of domains? I think there must be a way; in computing, nothing is impossible.
-
Aug 25th, 2009, 02:49 AM
#4
Re: Page source code with cURL
What do you mean by "retrieve the domains list"?
-
Aug 25th, 2009, 07:35 AM
#5
Thread Starter
Member
Re: Page source code with cURL
I would like to create an array with the list of all the domains found on www.trafic.ro, for example:
$domain[0] = 'www.trilulilu.ro';
$domain[1] = 'forum.softpedia.com';
...
and so on, for all the domains on all 3,059 pages.
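Once the HTML (or data) for each page can actually be obtained, the loop described above comes down to expanding a page-number URL pattern, pulling the hostnames out of each page, and merging them without duplicates. A minimal Python sketch follows. The URL template format and the function names are hypothetical, and, as noted earlier in the thread, this only helps if the listing is present in what you download, which it is not when it is generated by JavaScript.

```python
import re
from typing import Iterable
from urllib.parse import urlparse

# Absolute links inside href attributes, e.g. href="http://www.trilulilu.ro/".
HREF_RE = re.compile(r'href=["\'](https?://[^"\']+)["\']', re.IGNORECASE)


def page_urls(template: str, last_page: int) -> list[str]:
    """Expand a (hypothetical) template such as 'http://example.com/list?page={}' for pages 1..last_page."""
    return [template.format(n) for n in range(1, last_page + 1)]


def extract_domains(html: str) -> list[str]:
    """Hostnames of absolute links in one page, in first-seen order, without duplicates."""
    hosts = (urlparse(url).hostname for url in HREF_RE.findall(html))
    return list(dict.fromkeys(h for h in hosts if h))


def collect_domains(pages: Iterable[str], exclude: Iterable[str] = ()) -> list[str]:
    """Merge the domains from many pages into one list, skipping hosts such as the listing site itself."""
    skip = set(exclude)
    merged: dict[str, None] = {}  # dict keeps insertion order, so it doubles as an ordered set
    for html in pages:
        for host in extract_domains(html):
            if host not in skip:
                merged.setdefault(host, None)
    return list(merged)
```

Passing `collect_domains` the HTML of each of the 3,059 pages (fetched with cURL or any HTTP client) yields the equivalent of the `$domain` array above, with `exclude={"www.trafic.ro"}` keeping the site's own navigation links out of the result.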
-
Aug 25th, 2009, 07:39 AM
#6
Re: Page source code with cURL
I don't think you would be allowed to do that as you are taking data from another site, which is effectively a breach of copyright.
In addition, if the page is generated by Javascript then you are not going to get very far with the source code unless you write your own Javascript interpreter, run the source code through it, and then crawl the links.
-
Aug 26th, 2009, 02:06 AM
#7
Thread Starter
Member
Re: Page source code with cURL
Copyright is “a document granting exclusive right to publish and sell literary or musical or artistic work”. The site mentioned contains only a collection of public web addresses, so in my opinion it is not subject to copyright.
Moreover, I am using it only for a personal statistical analysis.
-
Aug 26th, 2009, 02:32 AM
#8
Re: Page source code with cURL
After a quick look at the source, it seems this website might get its information from this website. You may have a much easier time crawling that website instead.
-
Aug 26th, 2009, 04:18 AM
#9
Re: Page source code with cURL
 Originally Posted by neptun_
Copyright is “a document granting exclusive right to publish and sell literary or musical or artistic work”. The site mentioned contains only a collection of public web addresses, so in my opinion it is not subject to copyright.
Moreover, I am using it only for a personal statistical analysis.
You did not compile the content, so you have no right to modify and republish it. It does not matter what the content is: to proceed, you must either get permission from the owner of the website, or fully reference the source with a link to the location you pulled the information from and state clearly that the information came from there.
Please refer to the hosting country's copyright law for clarification: http://www.legi-internet.ro/en/copyright.htm
If you are using it only for personal purposes, you still need to reference the source of the information in order to give credit to the copyright owner and more importantly add the required weight to any statistics derived from those data.
Last edited by visualAd; Aug 26th, 2009 at 04:21 AM.
-
Aug 26th, 2009, 08:18 AM
#10
Thread Starter
Member
Re: Page source code with cURL
Kows, thanks for the address. Unfortunately, they have only 400 sites in their statistics, while Trafic.ro contains almost every site in the country (~45,000). I think they only send a ping to the site you mentioned.
visualAd, thank you too for the link regarding copyright. All I want to do is an analysis, purely out of personal curiosity, so that I can compare the data with the official reports.
From a technical point of view, it remains an interesting question how to get the source code from that kind of site, where the pages are generated in JavaScript. How can it be done, and how much time would it take: 5 hours? 5 days?
-
Aug 26th, 2009, 02:40 PM
#11
Re: Page source code with cURL
It's just using AJAX. If you sifted through their code, I'm sure you could figure it out. I'm just not going to.
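When a page is filled in by AJAX, the request the JavaScript makes is visible in the browser's developer tools (network tab), and that URL can often be requested directly, skipping the JavaScript entirely. A Python sketch, assuming the endpoint returns JSON and that each site appears under a field named `domain`. Both the endpoint and the field name are hypothetical and would have to be read off the real traffic.

```python
import json
import urllib.request
from typing import Any


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Call the AJAX endpoint directly, mimicking the request the page's JavaScript would make."""
    req = urllib.request.Request(
        url,
        headers={
            "User-Agent": "Mozilla/5.0",
            # Many AJAX libraries send this header; some servers only answer requests that carry it.
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def values_for_key(payload: Any, key: str = "domain") -> list[str]:
    """Walk decoded JSON and collect every string stored under `key` (hypothetical field name)."""
    found: list[str] = []

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key and isinstance(v, str):
                    found.append(v)
                else:
                    walk(v)  # descend into nested objects and arrays
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(payload)
    return found
```

Walking the whole payload rather than hard-coding a path keeps the sketch usable before the exact response layout is known; once it is, indexing the right field directly is simpler.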