Techniques for inverted index compression
GE Pibiri, R Venturini - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
The data structure at the core of large-scale search engines is the inverted index, which is
essentially a collection of sorted integer sequences called inverted lists. Because of the …
Sparse and skew hashing of k-mers
GE Pibiri - Bioinformatics, 2022 - academic.oup.com
Motivation A dictionary of k-mers is a data structure that stores a set of n distinct k-mers and
supports membership queries. This data structure is at the heart of many important tasks in …
Fulgor: a fast and compact k-mer index for large-scale matching and color queries
The problem of sequence identification or matching—determining the subset of reference
sequences from a given collection that are likely to contain a short, queried nucleotide …
A learned approach to design compressed rank/select data structures
We address the problem of designing, implementing, and experimenting with compressed
data structures that support rank and select queries over a dictionary of integers. We shine a …
On weighted k-mer dictionaries
GE Pibiri - Algorithms for Molecular Biology, 2023 - Springer
We consider the problem of representing a set of k-mers and their abundance counts, or
weights, in compressed space so that assessing membership and retrieving the weight of a k …
A “Learned” Approach to Quicken and Compress Rank/Select Dictionaries
We address the well-known problem of designing, implementing and experimenting
compressed data structures for supporting rank and select queries over a dictionary of …
Tradeoff options for bipartite graph partitioning
Web connectivity graphs and similar linked data such as inverted indexes are important
components of the information access systems provided by social media and web search …
An accident prediction approach based on XGBoost
As an important threat to public security, urban fire accident causes huge economic loss and
catastrophic collapse. Predicting and analyzing the interior rule of urban fire accident from its …
Fast dictionary-based compression for inverted indexes
Dictionary-based compression schemes provide fast decoding operation, typically at the
expense of reduced compression effectiveness compared to statistical or probability-based …
Index compression using byte-aligned ANS coding and two-dimensional contexts
We examine approaches used for block-based inverted index compression, such as the
OptPFOR mechanism, in which fixed-length blocks of postings data are compressed …