









A problem statement for accelerating an N-body simulation with GPUs, OpenMP, and MPI. It presents an algorithm that calculates new vortex positions, finds the maximum magnitude of the angular velocity, and assigns new positions and angular velocities to vortices that leave the box. The document also covers memory allocation and handling, future improvements, and references.
Typology: Slides
Start
1. Create a 1D array of floats.
2. Allocate memory on the device with cudaMalloc().
3. Initialize the array with uniformly distributed random values.
4. Copy the initialized array from host to device with cudaMemcpy().
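The host-side part of these steps can be sketched as follows. This is a minimal sketch under stated assumptions: the array holds `n` vortices times 3 coordinates, the random values are drawn with the C library's `rand()`, and the bounds `lo`/`hi` and the function name `init_positions` are hypothetical. The device allocation and copy are only indicated in a comment.

```c
#include <stdlib.h>

/* Hypothetical helper: creates a 1D float array of n*3 coordinates and fills
 * it with (approximately) uniformly distributed values between lo and hi.
 * In the CUDA version, this host array would then be copied to a device
 * buffer d_V allocated with
 *   cudaMalloc((void **)&d_V, n * 3 * sizeof(float));
 * using
 *   cudaMemcpy(d_V, V, n * 3 * sizeof(float), cudaMemcpyHostToDevice); */
float *init_positions(size_t n, float lo, float hi, unsigned seed)
{
    float *V = malloc(n * 3 * sizeof *V);
    if (V == NULL) {
        return NULL;  /* allocation failed */
    }
    srand(seed);
    for (size_t k = 0; k < n * 3; ++k) {
        /* u lies in [0, 1); the cast to float may round the result up to hi */
        double u = rand() / ((double)RAND_MAX + 1.0);
        V[k] = (float)(lo + (hi - lo) * u);
    }
    return V;  /* caller frees */
}
```

With 700 vortices, `init_positions(700, 0.0f, 1.0f, seed)` yields the 2100-element array that the slides call `float *V`.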
Parallelization of the Problem (continued)
Parallel code: number of blocks = 700; number of threads per block = 700.
/* radiika: squared distance between vortex bx and vortex tx */
radiika = powf(V[(bx * 3) + 0] - V[(tx * 3) + 0], 2)
        + powf(V[(bx * 3) + 1] - V[(tx * 3) + 1], 2)
        + powf(V[(bx * 3) + 2] - V[(tx * 3) + 2], 2);
dssss_dr = expf(-((3.1416f * 2.0f) / (S[tx] * S[tx])))
         * expf((-radiika) * ((3.1416f * 2.0f) / (S[tx]
[Figure: the array float *V (size = 2100) as accessed by Block 0 (bx = 0) and Block 1 (bx = 1)]
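/* Only threads tx = 0, 1, 2 run this: one thread per x, y, z component */
if (tx < 3) {
    /* Explicit Euler step (dt = 0.01f) for the position of vortex bx */
    VN[(bx * 3) + tx] = V[(bx * 3) + tx] + 0.01f * Vc[tx];
    /* Rate of change of angular velocity: dVx dotted with omega of vortex bx */
    domdt[tx] = dVx[0] * O[bx * 3 + 0] + dVx[1] * O[bx * 3 + 1] + dVx[2] * O[bx * 3 + 2];
    /* Explicit Euler step (dt = 0.01f) for the angular velocity */
    ON[bx * 3 + tx] = O[bx * 3 + tx] + domdt[tx] * 0.01f;
}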
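The update above is an explicit Euler step with dt = 0.01. A minimal sequential sketch for one vortex follows, mirroring the slide's expressions. It assumes `Vc` and `dVx` are already available as 3-element inputs (how the kernel computes them is not shown), and the function name `euler_step_vortex` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical sequential version of the per-block update for vortex i.
 * Vc and dVx are taken as inputs; dt corresponds to 0.01f on the slide. */
void euler_step_vortex(const float *V, const float *O, size_t i,
                       const float Vc[3], const float dVx[3], float dt,
                       float *VN, float *ON)
{
    /* As written on the slide, domdt[tx] does not depend on tx,
     * so it is computed once here. */
    float domdt = dVx[0] * O[i * 3 + 0]
                + dVx[1] * O[i * 3 + 1]
                + dVx[2] * O[i * 3 + 2];
    for (size_t c = 0; c < 3; ++c) {           /* c plays the role of tx */
        VN[i * 3 + c] = V[i * 3 + c] + dt * Vc[c];   /* new position */
        ON[i * 3 + c] = O[i * 3 + c] + domdt * dt;   /* new angular velocity */
    }
}
```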
[Figure: threads tx = 0, 1, 2 within each block]
Tesla M
Sequential vs. parallel code: percentage difference of 57%.
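The slide does not define "percentage difference". One common reading is the runtime reduction relative to the sequential run. The helper below, `percent_reduction` (hypothetical), computes that quantity under this assumption.

```c
/* Hypothetical helper: runtime reduction of the parallel code relative to
 * the sequential code, in percent. Assumes t_sequential > 0. */
double percent_reduction(double t_sequential, double t_parallel)
{
    return (t_sequential - t_parallel) / t_sequential * 100.0;
}
```

Under this definition, a 57% reduction means the parallel run takes 43% of the sequential time, which is a speedup of about 1 / 0.43 ≈ 2.3x.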
Future Improvements