Authors
MA Kahttab, Yasser Fouad, Osama Abu Rawash
Publication date
2009/8
Journal
IJCSNS
Volume
9
Issue
8
Pages
40-45
Abstract
Web search engines use web crawlers that follow hyperlinks. This technique is ideal for discovering resources on the surface web but is often ineffective at finding deep web resources. The surface web is the portion of the World Wide Web that is indexed by conventional search engines, whereas the deep web refers to web content that is not part of the surface web. The deep web represents a major gap in the coverage of web search engines: it is believed to be of very high quality and is estimated to be several orders of magnitude larger than the surface web. Because deep web resources are growing so rapidly, exploring them efficiently requires an approach based on two main concepts: first, solving the problem from the side of the web servers, and second, automating the discovery process. In this paper we develop and implement the Host List Protocol model, which follows this approach to discover hidden web hosts and provides a way for them to be indexed by web search engines.
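The server-side, automated discovery idea can be illustrated with a minimal sketch. Assume, purely for illustration (the abstract does not specify the protocol's actual format), that a participating web server publishes the virtual hosts it serves as a plain-text list, one hostname per line with `#` comments; a crawler could then parse that list and turn the discovered hosts into seed URLs:

```python
# Hypothetical sketch of the host-list idea: a cooperating web server
# publishes the hosts it serves in a plain-text document (one hostname
# per line, '#' starts a comment). The file format and helper names are
# assumptions for illustration, not the paper's specified protocol.

def parse_host_list(text: str) -> list[str]:
    """Parse a host-list document into a list of hostnames."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            hosts.append(line.lower())
    return hosts

def crawl_seed_urls(hosts: list[str]) -> list[str]:
    """Turn discovered hostnames into seed URLs for a conventional crawler."""
    return [f"http://{h}/" for h in hosts]

if __name__ == "__main__":
    sample = """\
# hosts served by this machine
hidden-db.example.org
intranet-catalog.example.org   # not linked from anywhere
"""
    hosts = parse_host_list(sample)
    print(hosts)   # ['hidden-db.example.org', 'intranet-catalog.example.org']
    print(crawl_seed_urls(hosts))
```

The point of the sketch is the division of labor the abstract describes: the server side declares its hosts (including ones no page links to), and the crawler side consumes the list automatically instead of relying on hyperlinks.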
Total citations
[Citations-per-year chart, 2010–2018]