Limitless—light-weight monitoring tool for large scale systems

A Cascajo, DE Singh, J Carretero - Microprocessors and Microsystems, 2022 - Elsevier
This work presents LIMITLESS, an HPC framework that provides new strategies for
monitoring clusters. LIMITLESS is a scalable light-weight monitor that is integrated with other …

On scalability for MPI runtime systems

G Bosilca, T Herault, A Rezmerita… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
The future of high performance computing, as being currently foretold, will gravitate toward
hundreds of thousands to million node machines, harnessing the computing power of …

Scalable failure recovery for high-performance data aggregation

DC Arnold, BP Miller - 2010 IEEE International Symposium on …, 2010 - ieeexplore.ieee.org
Many high-performance tools, applications and infrastructures, such as Paradyn, STAT,
TAU, Ganglia, SuperMon, Astrolabe, Borealis, and MRNet, use data aggregation to …

TAUmon: scalable online performance data analysis in TAU

CW Lee, AD Malony, A Morris - … HPCF, PROPER, CCPI, VHPC, Ischia, Italy …, 2011 - Springer
In this paper, we present an update on the scalable online support for performance data
analysis and monitoring in TAU. Extending on our prior work with TAUoverSupermon and …

Group file operations for scalable tools and middleware

MJ Brim, BP Miller - 2009 International Conference on High …, 2009 - ieeexplore.ieee.org
Group file operations are a new, intuitive idiom for tools and middleware, including parallel
debuggers and runtimes, performance measurement and steering, and distributed resource …

Reliable, scalable tree-based overlay networks

DC Arnold - 2008 - ftp1.cs.wisc.edu
Ultimately, our Creator is responsible for all. I was fortunate to have an excellent and strong
support system throughout this process. With great pleasure and extreme gratitude, I thank …

Estimating the size of peer-to-peer networks using Lambert's W function

J Bustos-Jimenez, N Bersano, SE Schaeffer… - Grid Computing …, 2008 - Springer
In this work, we address the problem of locally estimating the size of a Peer-to-Peer (P2P)
network using local information. We present a novel approach for estimating the size of a …

University

PM Michael, AM Texas - United States, 2008 - cs.wisc.edu
Group file operations are a new, intuitive idiom for tools and middleware, including parallel
debuggers and runtimes, performance measurement and steering, and distributed resource …

A scalable prescriptive parallel debugging model

NB Jensen, NQ Nielsen, GL Lee… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
Debugging is a critical step in the development of any parallel program. However, the
traditional interactive debugging model, where users manually step through code and …

MATE: toward scalable automated and dynamic performance tuning environment

A Morajko, A Martínez, E César, T Margalef… - Applied Parallel and …, 2012 - Springer
The use of parallel/distributed programming increases as it enables high performance
computing. There are many tools that help a user in the performance analysis of the …