ScoutJet web crawler
ScoutJet is the web crawler for blekko, a new Silicon Valley-based search engine created by the founders of DMOZ and Topix.
We are developing next-generation search technology, and kindly request that you permit ScoutJet access to your site so that we can refine our relevance algorithms with the broadest variety of content available on the Internet.
ScoutJet obeys robots.txt
To control which parts of your site ScoutJet crawls, add rules like the following to your http://www.yoursite.com/robots.txt file:
# Allow only specific directories
User-agent: ScoutJet
Disallow: /
Allow: /public
You can also limit the rate at which ScoutJet crawls your site using the Crawl-delay directive:
# Limit ScoutJet's crawl rate (example is to crawl no more than 1 page every 5 seconds)
User-agent: ScoutJet
Crawl-delay: 5
In addition, ScoutJet understands wildcards and the Allow directive.
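If you want to check how a Crawl-delay rule like the one above is read before publishing it, the following is a minimal sketch using Python's standard urllib.robotparser module. It only shows how Python's parser interprets the rule, which may differ from ScoutJet's own parser; the ROBOTS_TXT constant is a hypothetical local copy of the example, not a file fetched from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical local copy of the Crawl-delay example (assumption for illustration).
ROBOTS_TXT = """\
# Limit ScoutJet's crawl rate
User-agent: ScoutJet
Crawl-delay: 5
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay in seconds for the given user agent,
# or None when no matching Crawl-delay rule exists.
delay = parser.crawl_delay("ScoutJet")
other = parser.crawl_delay("SomeOtherBot")
print(delay, other)
```

Because the rule sits under "User-agent: ScoutJet", it applies only to ScoutJet; other crawlers are unaffected unless they have their own entry.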
ScoutJet crawls from the following IP ranges:
64.13.159.*
199.87.248.*, 199.87.249.*, 199.87.250.*, 199.87.251.*, 199.87.252.*, 199.87.253.*, 199.87.254.*, 199.87.255.*
38.99.96.*, 38.99.97.*, 38.99.98.*, 38.99.99.*
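To recognize ScoutJet traffic in your server logs, you can compare a client address against the ranges above. The following is a minimal sketch using Python's standard ipaddress module. The CIDR blocks are our translation of the wildcard ranges, the names SCOUTJET_NETWORKS and is_scoutjet_ip are hypothetical, and the sketch assumes the published list is current.

```python
import ipaddress

# CIDR equivalents of the wildcard ranges listed above:
#   64.13.159.*                     -> 64.13.159.0/24
#   199.87.248.* through 199.87.255.* -> 199.87.248.0/21
#   38.99.96.* through 38.99.99.*     -> 38.99.96.0/22
SCOUTJET_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("64.13.159.0/24", "199.87.248.0/21", "38.99.96.0/22")
]


def is_scoutjet_ip(address: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)  # raises ValueError on malformed input
    return any(ip in network for network in SCOUTJET_NETWORKS)


if __name__ == "__main__":
    for candidate in ("199.87.253.17", "38.99.100.1"):
        print(candidate, is_scoutjet_ip(candidate))
```

An IP match alone does not prove that a request came from ScoutJet, so it is worth checking the User-agent header as well.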
ScoutJet tries its best to crawl politely, but if you do experience a problem with it, please let us know at crawler (at) blekko (dot) com.
Why can't I submit my site?
We don't currently accept URL submissions. We discover new sites and add them to our index by following links. Just make sure that ScoutJet is allowed to crawl your site; we will then find it through links from other sites.