PTHash: Revisiting FCH minimal perfect hashing

GE Pibiri, R Trani - Proceedings of the 44th international ACM SIGIR …, 2021 - dl.acm.org
Given a set S of n distinct keys, a function f that bijectively maps the keys of S into the range
(0,..., n-1) is called a minimal perfect hash function for S. Algorithms that find such functions …

Locality-preserving minimal perfect hashing of k-mers

GE Pibiri, Y Shibuya, A Limasset - Bioinformatics, 2023 - academic.oup.com
Motivation Minimal perfect hashing is the problem of mapping a static set of n distinct keys
into the address space {1,…, n} bijectively. It is well-known that n log₂(e) bits are necessary …

On weighted k-mer dictionaries

GE Pibiri - Algorithms for Molecular Biology, 2023 - Springer
We consider the problem of representing a set of k-mers and their abundance counts, or
weights, in compressed space so that assessing membership and retrieving the weight of a k …

CoCo-trie: Data-aware compression and indexing of strings

A Boffa, P Ferragina, F Tosoni, G Vinciguerra - Information Systems, 2024 - Elsevier
We address the problem of compressing and indexing a sorted dictionary of strings to
support efficient lookups and more sophisticated operations, such as prefix, predecessor …

Parallel and external-memory construction of minimal perfect hash functions with PTHash

GE Pibiri, R Trani - IEEE Transactions on Knowledge and Data …, 2023 - ieeexplore.ieee.org
A function f is a minimal perfect hash function for a set S of size n, if f bijectively maps S into the first n
natural numbers. These functions are important for many practical applications in computing …

Handling Massive N-Gram Datasets Efficiently

GE Pibiri, R Venturini - ACM Transactions on Information Systems (TOIS), 2019 - dl.acm.org
Two fundamental problems concern the handling of large n-gram language models:
indexing, that is, compressing the n-grams and associated satellite values without …

Compressed string dictionaries via data-aware subtrie compaction

A Boffa, P Ferragina, F Tosoni… - … Symposium on String …, 2022 - Springer
String dictionaries are a core component of a plethora of applications, so it is not surprising
that they have been widely and deeply investigated in the literature since the introduction of …

Efficient and effective query auto-completion

S Gog, GE Pibiri, R Venturini - … of the 43rd International ACM SIGIR …, 2020 - dl.acm.org
Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search systems,
suggesting possible ways of completing the query being typed by the user. Efficiency is …

Compressed indexes for fast search of semantic data

R Perego, GE Pibiri, R Venturini - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
The sheer increase in volume of RDF data demands efficient solutions for the triple indexing
problem, that is to devise a compressed data structure to compactly represent RDF triples by …

Engineering faster double‐array Aho–Corasick automata

S Kanda, K Akabe, Y Oda - Software: Practice and Experience, 2023 - Wiley Online Library
Multiple pattern matching in strings is a fundamental problem in text processing applications
such as regular expressions or tokenization. This article studies efficient implementations of …