PTHash: Revisiting FCH minimal perfect hashing
Given a set S of n distinct keys, a function f that bijectively maps the keys of S into the range
(0,..., n-1) is called a minimal perfect hash function for S. Algorithms that find such functions …
Locality-preserving minimal perfect hashing of k-mers
Motivation: Minimal perfect hashing is the problem of mapping a static set of n distinct keys
into the address space {1,…, n} bijectively. It is well-known that n log₂(e) bits are necessary …
On weighted k-mer dictionaries
GE Pibiri - Algorithms for Molecular Biology, 2023 - Springer
We consider the problem of representing a set of k-mers and their abundance counts, or
weights, in compressed space so that assessing membership and retrieving the weight of a k …
CoCo-trie: Data-aware compression and indexing of strings
We address the problem of compressing and indexing a sorted dictionary of strings to
support efficient lookups and more sophisticated operations, such as prefix, predecessor …
Parallel and external-memory construction of minimal perfect hash functions with PTHash
A function f is a minimal perfect hash function for a set S of size n, if f bijectively maps S into the first n
natural numbers. These functions are important for many practical applications in computing …
Handling Massive N-Gram Datasets Efficiently
GE Pibiri, R Venturini - ACM Transactions on Information Systems (TOIS), 2019 - dl.acm.org
Two fundamental problems concern the handling of large n-gram language models:
indexing, that is, compressing the n-grams and associated satellite values without …
Compressed string dictionaries via data-aware subtrie compaction
String dictionaries are a core component of a plethora of applications, so it is not surprising
that they have been widely and deeply investigated in the literature since the introduction of …
Efficient and effective query auto-completion
Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search systems,
suggesting possible ways of completing the query being typed by the user. Efficiency is …
Compressed indexes for fast search of semantic data
The sheer increase in volume of RDF data demands efficient solutions for the triple indexing
problem, that is to devise a compressed data structure to compactly represent RDF triples by …
Engineering faster double‐array Aho–Corasick automata
S Kanda, K Akabe, Y Oda - Software: Practice and Experience, 2023 - Wiley Online Library
Multiple pattern matching in strings is a fundamental problem in text processing applications
such as regular expressions or tokenization. This article studies efficient implementations of …