Scientific paper
2010-11-16
Computer Science
Distributed, Parallel, and Cluster Computing
Many high-performance computing algorithms are bandwidth limited, hence the need for optimal data rearrangement kernels that integrate easily into the rest of the application. In this work, we have built a CUDA library of fast kernels for a set of data rearrangement operations. In particular, we have built generic kernels for rearranging m-dimensional data into n dimensions, including Permute, Reorder, Interlace/De-interlace, etc. We have also built kernels for generic Stencil computations on two-dimensional data, using templates and functors that allow application developers to rapidly build customized high-performance kernels. All the kernels match or surpass the best-known performance in terms of bandwidth utilization.
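The abstract names templates and functors as the mechanism for building customized stencil kernels but does not show the library's interface. The following is a minimal sketch of that general pattern, not the authors' implementation: the names `FivePointAverage`, `stencil2d`, and `run_stencil2d` are hypothetical, boundary elements are simply copied through, and no shared-memory tiling or other bandwidth tuning is attempted. Error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical user-supplied functor: average of a point and its 4 neighbours.
// Any type exposing this __device__ operator() signature can be plugged in.
struct FivePointAverage {
    __device__ float operator()(float c, float n, float s, float w, float e) const {
        return 0.2f * (c + n + s + w + e);
    }
};

// Generic 2D stencil kernel (illustrative). One thread handles one element;
// the functor type is a template parameter, so the call is resolved at compile time.
template <typename Op>
__global__ void stencil2d(const float* in, float* out, int width, int height, Op op) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // guard threads outside the grid

    int i = y * width + x;  // row-major index
    if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
        out[i] = in[i];     // assumption: boundary values are copied unchanged
        return;
    }
    out[i] = op(in[i], in[i - width], in[i + width], in[i - 1], in[i + 1]);
}

// Host-side helper (hypothetical): copies data to the device, launches the
// kernel with 16x16 thread blocks, and copies the result back.
template <typename Op>
std::vector<float> run_stencil2d(const std::vector<float>& h_in,
                                 int width, int height, Op op) {
    assert(h_in.size() == static_cast<size_t>(width) * height);
    size_t bytes = h_in.size() * sizeof(float);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    stencil2d<<<grid, block>>>(d_in, d_out, width, height, op);

    std::vector<float> h_out(h_in.size());
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Under these assumptions, a different stencil is obtained by passing another functor type to `run_stencil2d`, for example `run_stencil2d(data, width, height, FivePointAverage{})`, without touching the kernel body.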
Bader Michael
Bungartz Hans-Joachim
Mudigere Dheevatsa
Narasimhan Srihari
Narayanan Babu
Fast GPGPU Data Rearrangement Kernels using CUDA