Sparse and skew hashing of k-mers

GE Pibiri - Bioinformatics, 2022 - academic.oup.com
Motivation: A dictionary of k-mers is a data structure that stores a set of n distinct k-mers and
supports membership queries. This data structure is at the heart of many important tasks in …

Proactively identifying emerging hacker threats from the dark web: A diachronic graph embedding framework (d-gef)

S Samtani, H Zhu, H Chen - ACM Transactions on Privacy and Security …, 2020 - dl.acm.org
Cybersecurity experts have appraised the total global cost of malicious hacking activities to
be $450 billion annually. Cyber Threat Intelligence (CTI) has emerged as a viable approach …

PTHash: Revisiting FCH minimal perfect hashing

GE Pibiri, R Trani - Proceedings of the 44th international ACM SIGIR …, 2021 - dl.acm.org
Given a set S of n distinct keys, a function f that bijectively maps the keys of S into the range
(0,..., n-1) is called a minimal perfect hash function for S. Algorithms that find such functions …

SAT-Geo: A social sensing based content-only approach to geolocating abnormal traffic events using syntax-based probabilistic learning

L Shang, Y Zhang, C Youn, D Wang - Information Processing & …, 2022 - Elsevier
Social sensing has become an emerging and pervasive sensing paradigm to collect timely
observations of the physical world from human sensors. In this paper, we study the problem …

Locality-preserving minimal perfect hashing of k-mers

GE Pibiri, Y Shibuya, A Limasset - Bioinformatics, 2023 - academic.oup.com
Motivation: Minimal perfect hashing is the problem of mapping a static set of n distinct keys
into the address space {1,…, n} bijectively. It is well-known that n log₂(e) bits are necessary …

Deep learning from physicochemical information of concrete with an artificial language for property prediction and reaction discovery

S Mahjoubi, R Barhemat, W Meng, Y Bao - Resources, Conservation and …, 2023 - Elsevier
Existing machine learning-based approaches to investigate and design concrete mainly use
the mixture design variables to predict concrete properties and do not consider the …

On weighted k-mer dictionaries

GE Pibiri - Algorithms for Molecular Biology, 2023 - Springer
We consider the problem of representing a set of k-mers and their abundance counts, or
weights, in compressed space so that assessing membership and retrieving the weight of a k …

Parallel and external-memory construction of minimal perfect hash functions with PTHash

GE Pibiri, R Trani - IEEE Transactions on Knowledge and Data …, 2023 - ieeexplore.ieee.org
A function f is a minimal perfect hash function for a set S of size n, if f bijectively maps S into the first
n natural numbers. These functions are important for many practical applications in computing …

HyperEmbed: Tradeoffs between resources and performance in NLP tasks with hyperdimensional computing enabled embedding of n-gram statistics

P Alonso, K Shridhar, D Kleyko… - … Joint Conference on …, 2021 - ieeexplore.ieee.org
Recent advances in Deep Learning have led to a significant performance increase on
several NLP tasks, however, the models become more and more computationally …

DFSMN-T: Chinese speech recognition combined with the strong language model Transformer.

胡章芳, 蹇芳, 唐珊珊, 明子平… - Journal of Computer …, 2022 - search.ebscohost.com
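An automatic speech recognition system consists of two parts, an acoustic model and a language model, but the traditional
N-gram language model ignores the semantic similarity of terms and has too many parameters, among other problems, which limits further reduction of the character error rate in speech recognition …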