Computational fluid dynamics (CFD) is an important tool for simulating cardiovascular function and dysfunction. Due to the complexity of the anatomy, the transitional regime of blood flow in the heart, and the strong mutual influence between the ...
Computer simulations of soft-matter systems at experimentally comparable sizes often require long wall-clock times. Using many cores can reduce the required time, ideally by a factor equal to the number of processors. In this paper, a parallel computati ...
The increasing number of cores integrated on a chip has brought about a number of challenges. Concerns about the scalability of cache coherence protocols have prompted both researchers and practitioners to explore alternative programming models, where cache co ...
The advent of manycore architectures raises new scalability challenges for concurrent applications. Implementing scalable data structures is one of them. Several manycore architectures provide hardware message passing as a means to efficiently exchange dat ...
The steady increase in single-core frequency has plateaued in recent years because the heat produced inside the chip can no longer be dissipated by existing cooling technologies. An alternative way to harvest more computational power per die is to fabricate ...
Appearing frequently in applications, generalized eigenvalue problems represent one of the core problems in numerical linear algebra. The QZ algorithm of Moler and Stewart is the most widely used algorithm for addressing such problems. Despite its importan ...
This paper introduces SIMinG-1k, a manycore simulator infrastructure. SIMinG-1k is a graphics processing unit (GPU)-accelerated, parallel simulator for design-space exploration of large-scale manycore systems. It features an optimal trade-off between modeling acc ...
This paper focuses on the design of an asynchronous broadcast primitive on the Intel SCC. Our solution is based on OC-Bcast, a state-of-the-art synchronous broadcast algorithm built on a k-ary tree that leverages the parallelism provided by on-chip Remote Memory Acce ...
At the heart of distributed computing lies the fundamental result that the level of agreement that can be obtained in an asynchronous shared-memory model where t processes can crash is exactly t + 1. In other words, an adversary that can crash any subset o ...