The Web is Lovely, Dark and Deep


In the July/August 2008 issue of D-Lib Magazine, Kat Hagedorn and Joshua Santelli conclude that hidden web URLs are still not being indexed by Google. Their study comes two years after a similar one showed that a large percentage of the web was not indexed by the top three search engines: Google, MSN, and Yahoo.

The present study uses the “OAIster metadata corpus to see what percentage of the corpus was found in the Google search index only.” Of course, this leads to the question: What is OAIster?

OAIster is a union catalog of digital resources. It provides access to these resources by “harvesting” their descriptive metadata.

Digital resources include items such as:
• digitized (i.e., scanned) books and articles
• born-digital texts
• audio files (e.g., wav, mp3)
• images (e.g., tiff, gif)
• movies (e.g., mp4, quicktime)
• datasets (e.g., downloadable statistics files)
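
For readers curious what “harvesting” descriptive metadata can look like in practice, here is a minimal sketch. It assumes the repositories expose their records over the Open Archives Initiative protocol (OAI-PMH) in the common Dublin Core format (oai_dc). The endpoint URL, the sample record, and the function names are hypothetical and only for illustration; this is not OAIster's actual code.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A made-up response in the shape an OAI-PMH ListRecords request returns.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample digitized book</dc:title>
          <dc:identifier>https://repository.example.org/items/1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list a repository's records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract the identifier, title, and resource URLs from each record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        # Dublin Core identifiers that look like web addresses point at the resource.
        urls = [
            element.text
            for element in record.iterfind(".//dc:identifier", NS)
            if element.text and element.text.startswith("http")
        ]
        records.append({"id": oai_id, "title": title, "urls": urls})
    return records


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    for item in parse_records(SAMPLE_RESPONSE):
        print(item["title"], item["urls"])
```

The resource URLs collected this way are the kind of “hidden web” addresses a study could then look up in a search engine's index.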

The study concludes that Google has not attempted “to increase their support and access to OAI materials,” which the authors believe is a mistake, since many of these materials are valuable for research and should be easily accessible through the world’s most popular search engine.

Access “Google Still Not Indexing Hidden Web URLs”
Visit OAIster.
