To avoid undesirable content in the search indexes, webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine's database by using a meta tag specific to robots (usually ). When a search engine visits a site, the robots.txt located in the root directory is the first file crawled. The robots.txt file is then parsed and will instruct the robot as to which pages are not to be crawled. As a search engine crawler may keep a cached copy of this file, it may on occasion crawl pages a webmaster does not wish crawled. Pages typically prevented from being crawled include login specific pages such as shopping carts and user-specific content such as search results from internal searches. In March 2007, Google warned webmasters that they should prevent indexing of internal search results because those pages are considered search spam.
Great post, your knowledge and innovative approach never fails to amaze me! This is certainly the first time I’ve heard someone suggest the Wikipedia dead link technique. It’s great that you’re getting people to think outside of the box. Pages like reddit are great for getting keywords and can also be used for link building although this can be difficult to get right. Even if you don’t succeed at using it to link build it’s still a really valuable platform for getting useful information. Thanks!
Nice work Laura! This is going to be a great series. I'm working my way through SEOmoz's Advanced SEO Training Series (videos) Vol. 1 & 2 to build upon the advice and guidance that you and your team provided to me during my time at Yahoo!. Now many others will benefit from your knowledge, experience and passion for SEO strategy and tactics. Best wishes for great success in your new role.