Common Crawl.

Not a new swimming style, but an open set of five billion web pages, crawled regularly. You can run a query against the whole corpus using Amazon's infrastructure for around $150.

Nice idea. If you have an internet-scale question, you can answer it relatively cheaply.
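
To put "relatively cheaply" in numbers, here is a back-of-the-envelope sketch using only the two figures quoted above. Both are rough, so the result is an order of magnitude at best; the function name and constants are made up for illustration and are not part of any Common Crawl or Amazon tooling.

```python
# Figures quoted in the post; both are approximate.
TOTAL_PAGES = 5_000_000_000   # "five billion web pages"
QUERY_COST_USD = 150.0        # "around $150" for a full-corpus query


def cost_per_million_pages(total_pages: int, query_cost_usd: float) -> float:
    """Return the cost in US dollars of scanning one million pages."""
    return query_cost_usd / total_pages * 1_000_000


if __name__ == "__main__":
    per_million = cost_per_million_pages(TOTAL_PAGES, QUERY_COST_USD)
    print(f"${per_million:.2f} per million pages")  # prints "$0.03 per million pages"
```

On those assumptions, a full-corpus query works out to about three cents per million pages.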
