Techniques for inverted index compression

GE Pibiri, R Venturini - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
The data structure at the core of large-scale search engines is the inverted index, which is
essentially a collection of sorted integer sequences called inverted lists. Because of the …

POCLib: A high-performance framework for enabling near orthogonal processing on compression

F Zhang, J Zhai, X Shen, O Mutlu… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Parallel technology boosts data processing in recent years, and parallel direct data
processing on hierarchically compressed documents exhibits great promise. The high …

CC-News-En: A large English news corpus

J Mackenzie, R Benham, M Petri, JR Trippas… - Proceedings of the 29th …, 2020 - dl.acm.org
We describe a static, open-access news corpus using data from the Common Crawl
Foundation, who provide free, publicly available web archives, including a continuous crawl …

CompressDB: Enabling efficient compressed data direct processing for various databases

F Zhang, W Wan, C Zhang, J Zhai, Y Chai… - Proceedings of the 2022 …, 2022 - dl.acm.org
In modern data management systems, directly performing operations on compressed data
has been proven to be a big success facing big data problems. These systems have …

TADOC: Text analytics directly on compression

F Zhang, J Zhai, X Shen, D Wang, Z Chen, O Mutlu… - The VLDB Journal, 2021 - Springer
This article provides a comprehensive description of text analytics directly on compression
(TADOC), which enables direct document analytics on compressed textual data. The article …

Exploring data analytics without decompression on embedded GPU systems

Z Pan, F Zhang, Y Zhou, J Zhai, X Shen… - … on Parallel and …, 2021 - ieeexplore.ieee.org
With the development of computer architecture, even for embedded systems, GPU devices
can be integrated, providing outstanding performance and energy efficiency to meet the …

G-TADOC: Enabling efficient GPU-based text analytics without decompression

F Zhang, Z Pan, Y Zhou, J Zhai, X Shen… - 2021 IEEE 37th …, 2021 - ieeexplore.ieee.org
Text analytics directly on compression (TADOC) has proven to be a promising technology for
big data analytics. GPUs are extremely popular accelerators for data analytics systems …

LogGrep: Fast and cheap cloud log storage by exploiting both static and runtime patterns

J Wei, G Zhang, J Chen, Y Wang, W Zheng… - Proceedings of the …, 2023 - dl.acm.org
In cloud systems, near-line logs are mainly used for debugging, which means they prefer a
low query latency for a better user experience, and like any other logs, they also prefer a low …

Gemini: Learning to manage CPU power for latency-critical search engines

L Zhou, LN Bhuyan… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
Saving energy for latency-critical applications like web search can be challenging because
of their strict tail latency constraints. State-of-the-art power management frameworks use …

Toward Efficient Navigation of Massive-Scale Geo-Textual Streams

C Yang, L Chen, S Shang, F Zhu, L Liu, L Shao - IJCAI, 2019 - ijcai.org
With the popularization of portable devices, numerous applications continuously produce
huge streams of geo-tagged textual data, thus posing challenges to index geo-textual …