An intelligent system for focused crawling from Big Data sources

I Bifulco, S Cirillo, C Esposito, R Guadagni… - Expert Systems with …, 2021 - Elsevier
Nowadays, the proper management of data is a key business enabler and booster for
companies, so as to increase their competitiveness. Typically, companies hold massive …

A survey of grid-based searching techniques for large scale distributed data

MB Bashir, MSB Abd Latiff, Y Coulibaly… - Journal of Network and …, 2016 - Elsevier
The large-scale distributed dataset searching faced dynamicity, heterogeneity, and latency
issues that emphasize the importance of approach to orchestrate the search operations. The …

On the feasibility of geographically distributed web crawling

BB Cambazoglu, F Junqueira, V Plachouras… - 3rd International ICST …, 2010 - eudl.eu
We identify the issues that are important in design of a geographically distributed Web
crawler. The identified issues are discussed from a "benefit" and "challenge" point of view …

1-dimensional and pseudo 2-dimensional HMMs for the recognition of German literal amounts

R Bippus - Proceedings of the Fourth International Conference …, 1997 - ieeexplore.ieee.org
Hidden Markov models (HMMs) are frequently used in off-line cursive script recognition. In
most cases, the script is processed strictly from left to right, yielding a sequence of feature …

The research of a lightweight distributed crawling system

F Ye, Z Jing, Q Huang, Y Chen - 2018 IEEE 16th International …, 2018 - ieeexplore.ieee.org
Nowadays, information on the Internet is growing at an explosive rate. The ability of the
stand-alone web crawling system has come to its bottleneck, so more and more companies …

Design of an intelligent search engine-based UDDI for web service discovery

K Tamilarasi, M Ramakrishnan - … International Conference on …, 2012 - ieeexplore.ieee.org
Web Services and its discovery plays an important role in industry, academics and research.
This paper proposes an intelligent search engine-based UDDI for discovering web services …

The research and implementation of a distributed crawler system based on Apache Flink

F Ye, Z Jing, Q Huang, C Hu, Y Chen - Algorithms and Architectures for …, 2018 - Springer
Web information is growing at an explosive rate. The crawling ability of the single-machine
crawler becomes the bottleneck, so distributed web crawling techniques become the focus …

Research and analysis of cooperation strategies for multi-agent topical crawlers

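Du Yajun - Journal of Xihua University (Natural Science Edition), 2013 - xhuqk.com
When multiple Web topical crawlers crawl in parallel, how to avoid revisiting pages and efficiently obtain topic-relevant pages
has become one of the hot research topics in topical crawling for search engines. To complete the system's crawling task and make full use of each crawler's own capabilities …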

[CITATION][C] Distributed Web crawler for wide area networks

Xu Xiao, Zhang Weizhe, Zhang Hongli, Fang Binxing - Journal of Software, 2010

[CITATION][C] Performance modeling and evaluation of distributed search engine systems

Zhang Weizhe, Zhang Hongli, Xu Xiao, He Hui - Journal of Software, 2012