Techniques for inverted index compression

GE Pibiri, R Venturini - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
The data structure at the core of large-scale search engines is the inverted index, which is
essentially a collection of sorted integer sequences called inverted lists. Because of the …

Sparse and skew hashing of k-mers

GE Pibiri - Bioinformatics, 2022 - academic.oup.com
Motivation: A dictionary of k-mers is a data structure that stores a set of n distinct k-mers and
supports membership queries. This data structure is at the heart of many important tasks in …

Fulgor: a fast and compact k-mer index for large-scale matching and color queries

J Fan, J Khan, NP Singh, GE Pibiri, R Patro - Algorithms for Molecular …, 2024 - Springer
The problem of sequence identification or matching—determining the subset of reference
sequences from a given collection that are likely to contain a short, queried nucleotide …

A learned approach to design compressed rank/select data structures

A Boffa, P Ferragina, G Vinciguerra - ACM Transactions on Algorithms …, 2022 - dl.acm.org
We address the problem of designing, implementing, and experimenting with compressed
data structures that support rank and select queries over a dictionary of integers. We shine a …

On weighted k-mer dictionaries

GE Pibiri - Algorithms for Molecular Biology, 2023 - Springer
We consider the problem of representing a set of k-mers and their abundance counts, or
weights, in compressed space so that assessing membership and retrieving the weight of a k …

A “Learned” Approach to Quicken and Compress Rank/Select Dictionaries

A Boffa, P Ferragina, G Vinciguerra - 2021 Proceedings of the Workshop on …, 2021 - SIAM
We address the well-known problem of designing, implementing and experimenting
compressed data structures for supporting rank and select queries over a dictionary of …

Tradeoff options for bipartite graph partitioning

J Mackenzie, M Petri, A Moffat - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Web connectivity graphs and similar linked data such as inverted indexes are important
components of the information access systems provided by social media and web search …

An accident prediction approach based on XGBoost

X Shi, Q Li, Y Qi, T Huang, J Li - 2017 12th International …, 2017 - ieeexplore.ieee.org
As an important threat to public security, urban fire accident causes huge economic loss and
catastrophic collapse. Predicting and analyzing the interior rule of urban fire accident from its …

Fast dictionary-based compression for inverted indexes

GE Pibiri, M Petri, A Moffat - … of the twelfth ACM international conference …, 2019 - dl.acm.org
Dictionary-based compression schemes provide fast decoding operation, typically at the
expense of reduced compression effectiveness compared to statistical or probability-based …

Index compression using byte-aligned ANS coding and two-dimensional contexts

A Moffat, M Petri - Proceedings of the Eleventh ACM International …, 2018 - dl.acm.org
We examine approaches used for block-based inverted index compression, such as the
OptPFOR mechanism, in which fixed-length blocks of postings data are compressed …