Parallel computing for genome sequence processing

Y Zou, Y Zhu, Y Li, FX Wu, J Wang - Briefings in Bioinformatics, 2021 - academic.oup.com
The rapid increase of genome data brought by gene sequencing technologies poses a
massive challenge to data processing. To solve the problems caused by enormous data and …

Searching for repetitions in biological networks: methods, resources and tools

S Panni, SE Rombo - Briefings in Bioinformatics, 2015 - academic.oup.com
We present here a compact overview of the data, models and methods proposed for the
analysis of biological networks based on the search for significant repetitions. In particular …

SIESTA: A scalable infrastructure of sequential pattern analysis

I Mavroudopoulos, A Gounaris - IEEE Transactions on Big Data, 2022 - ieeexplore.ieee.org
Sequential pattern analysis has become a mature topic with a lot of techniques for a variety
of sequential pattern mining-related problems. Moreover, tailored solutions for specific …

Pangenome comparison via ED strings

E Gabory, MN Mwaniki, N Pisanti, SP Pissis… - Frontiers in …, 2024 - frontiersin.org
An elastic-degenerate (ED) string is a sequence of sets of strings. It can also be
seen as a directed acyclic graph whose edges are labeled by strings. The notion of ED …

Sequence detection in event log files

I Mavroudopoulos, T Toliopoulos, C Bellas… - EDBT, 2021 - datalab-old.csd.auth.gr
Sequential pattern analysis has become a mature topic, with a lot of techniques for a variety
of sequential pattern mining-related problems. Moreover, tailored solutions for specific …

Parallel motif extraction from very long sequences

M Sahli, E Mansour, P Kalnis - Proceedings of the 22nd ACM …, 2013 - dl.acm.org
Motifs are frequent patterns used to identify biological functionality in genomic sequences,
periodicity in time series, or user trends in web logs. In contrast to a lot of existing work that …

ACME: A scalable parallel system for extracting frequent patterns from a very long sequence

M Sahli, E Mansour, P Kalnis - The VLDB Journal, 2014 - Springer
Modern applications, including bioinformatics, time series, and web log analysis, require the
extraction of frequent patterns, called motifs, from one very long (i.e., several gigabytes) …

Searching for compact hierarchical structures in DNA by means of the Smallest Grammar Problem

M Gallé - 2011 - theses.hal.science
Motivated by the goal of discovering hierarchical structures inside DNA sequences, we
address the Smallest Grammar Problem, the problem of finding a smallest context-free …

Pattern masking for dictionary matching: theory and practice

P Charalampopoulos, H Chen, P Christen, G Loukides… - Algorithmica, 2024 - Springer
Data masking is a common technique for sanitizing sensitive data maintained in database
systems which is becoming increasingly important in various application areas, such as in …

Irredundant tandem motifs

L Parida, C Pizzi, SE Rombo - Theoretical Computer Science, 2014 - Elsevier
Eliminating the possible redundancy from a set of candidate motifs occurring in an input
string is fundamental in many applications. The existing techniques proposed to extract …