SE simplification request

**cornholio** · Nov 19th, 2024, 05:23 AM

I wish to understand how search engine crawlers traverse websites, in terms of the pathways they take.

**Arnoutdv** · Nov 19th, 2024, 05:34 AM

https://www.techtarget.com/whatis/fe...from-a-website

**cornholio** · Nov 19th, 2024, 05:42 AM

That's not helping me understand it.
I know it scrapes a webpage for links, OK, but what happens next, and after that?

I would also assume it starts with some sort of seed list of sites, but if so, wouldn't that leave some web addresses unreachable, lost in a void?

And if I were to assume it follows every scraped link, wouldn't that ever-growing list jam up the crawler?
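For a concrete picture, here is a minimal sketch of a generic breadth-first crawl. It runs against a small hypothetical in-memory link graph (`TOY_WEB`) instead of real HTTP requests. The names `TOY_WEB`, `extract_links`, `crawl`, and the `max_pages` cap are illustrative assumptions, not how any particular search engine is implemented. The sketch touches on the questions above. The crawler starts from seed URLs and keeps a queue (the "frontier") of discovered but not yet visited URLs. A `seen` set makes sure each URL is queued at most once, so repeated links do not inflate the queue.

```python
from collections import deque
from urllib.parse import urljoin, urldefrag

# Hypothetical toy "web": each page maps to the raw hrefs found on it.
# A real crawler would get these by fetching the page and parsing its HTML.
TOY_WEB = {
    "https://a.example/": ["/about", "https://b.example/", "#top"],
    "https://a.example/about": ["/", "https://b.example/news"],
    "https://b.example/": ["/news", "https://a.example/about"],
    "https://b.example/news": ["/", "/news#latest"],
    # Nothing above links here, so a crawl seeded with a.example never finds it.
    "https://orphan.example/": ["https://a.example/"],
}


def extract_links(url):
    """Return absolute, fragment-free links found on `url` (toy lookup)."""
    links = []
    for href in TOY_WEB.get(url, []):
        # Resolve relative hrefs, then drop "#..." so /news#latest == /news.
        absolute, _fragment = urldefrag(urljoin(url, href))
        links.append(absolute)
    return links


def crawl(seeds, max_pages=100):
    """Breadth-first crawl: visit pages in the order they were discovered."""
    frontier = deque(seeds)  # URLs waiting to be visited
    seen = set(seeds)        # every URL ever queued, so nothing is queued twice
    order = []               # pages actually visited, in visit order
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in extract_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


if __name__ == "__main__":
    print(crawl(["https://a.example/"]))
```

Seeded with `https://a.example/` only, the sketch visits four pages and never reaches `https://orphan.example/`, because nothing in the toy graph links to it. That page is crawled only if it is added as a seed. The `max_pages` cap is one simple way to bound the work. Real crawlers typically also prioritize and rate-limit their requests, which this sketch leaves out.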

**Arnoutdv** · Nov 19th, 2024, 07:19 AM

Then you first need to collect a list of all domains.
https://www.quora.com/How-do-I-scrap...p-level-domain
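If you already have a domain list, one way to use it is to turn each domain into a starting URL for the crawl frontier. Here is a minimal sketch. The function name `seeds_from_domains` is hypothetical. The sketch also assumes that each site's homepage over `https://` is a reasonable entry point, which will not hold for every domain.

```python
def seeds_from_domains(domains):
    """Turn raw domain names into deduplicated homepage seed URLs.

    Assumption: each domain serves a homepage at https://<domain>/.
    """
    seeds = []
    seen = set()
    for raw in domains:
        # Normalize: trim whitespace, lowercase, drop a trailing root dot.
        domain = raw.strip().lower().rstrip(".")
        if domain and domain not in seen:
            seen.add(domain)
            seeds.append(f"https://{domain}/")
    return seeds


if __name__ == "__main__":
    print(seeds_from_domains(["Example.com", "example.com.", " b.example ", ""]))
```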
