Techniques for inverted index compression
GE Pibiri, R Venturini - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
The data structure at the core of large-scale search engines is the inverted index, which is
essentially a collection of sorted integer sequences called inverted lists. Because of the …
POCLib: A high-performance framework for enabling near orthogonal processing on compression
Parallel technology boosts data processing in recent years, and parallel direct data
processing on hierarchically compressed documents exhibits great promise. The high …
CC-News-En: A large English news corpus
We describe a static, open-access news corpus using data from the Common Crawl
Foundation, who provide free, publicly available web archives, including a continuous crawl …
CompressDB: Enabling efficient compressed data direct processing for various databases
In modern data management systems, directly performing operations on compressed data
has been proven to be a big success facing big data problems. These systems have …
TADOC: Text analytics directly on compression
This article provides a comprehensive description of text analytics directly on compression
(TADOC), which enables direct document analytics on compressed textual data. The article …
Exploring data analytics without decompression on embedded GPU systems
With the development of computer architecture, even for embedded systems, GPU devices
can be integrated, providing outstanding performance and energy efficiency to meet the …
G-TADOC: Enabling efficient GPU-based text analytics without decompression
Text analytics directly on compression (TADOC) has proven to be a promising technology for
big data analytics. GPUs are extremely popular accelerators for data analytics systems …
Loggrep: Fast and cheap cloud log storage by exploiting both static and runtime patterns
In cloud systems, near-line logs are mainly used for debugging, which means they prefer a
low query latency for a better user experience, and like any other logs, they also prefer a low …
Gemini: Learning to manage CPU power for latency-critical search engines
Saving energy for latency-critical applications like web search can be challenging because
of their strict tail latency constraints. State-of-the-art power management frameworks use …
Toward Efficient Navigation of Massive-Scale Geo-Textual Streams.
With the popularization of portable devices, numerous applications continuously produce
huge streams of geo-tagged textual data, thus posing challenges to index geo-textual …