Publication Type : Conference Paper
Publisher : Springer Nature Singapore
Source : Lecture Notes in Networks and Systems
URL : https://doi.org/10.1007/978-981-96-3247-3_8
Campus : Bengaluru
School : School of Computing
Year : 2025
Abstract : With advancements in computer architecture and the widespread adoption of GPU devices, the matrix multiplication operation can now be efficiently offloaded to GPUs for enhanced performance through parallel processing. Matrix multiplication computes the product of input matrices of n rows and n columns, where n denotes the size of the input matrices. This operation serves as a fundamental building block in recursive algorithms applied across various domains such as neural networks, graph problems, and machine learning. Optimizing the elapsed computation time for matrix multiplication is therefore crucial. This work measures the computational efficiency achieved by offloading matrix multiplication operations from the CPU to the GPU. The experiments are implemented in C++ using the CUDA API for GPU programming, compiled with the nvcc compiler, and conducted with matrix sizes ranging from 8 to 8192 elements. Furthermore, the study categorizes GPU activities by computation time across the available execution units and analyzes the time required for memory I/O operations for varying input matrix sizes.
Cite this Research Publication : Yogesh Narayan Gaur, B. M. Beena, Manju Khanna, "Performance Optimization for Matrix Multiplication When Offloading Computation to GPU Devices," Lecture Notes in Networks and Systems, Springer Nature Singapore, 2025, https://doi.org/10.1007/978-981-96-3247-3_8
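
The measurement setup the abstract describes can be illustrated with a short CUDA program. The sketch below is a minimal, hypothetical example, not the authors' code: it assumes a naive one-thread-per-output-element kernel (here named matMul), a fixed size n = 1024 standing in for the 8 to 8192 range, and a 16x16 thread block, and it uses cudaEvent timers to separate host-to-device transfer, kernel execution, and device-to-host transfer, mirroring the paper's split between computation time and memory I/O time.

// matmul_offload.cu -- illustrative sketch only; names and parameters are assumptions.
// Build: nvcc -O2 matmul_offload.cu -o matmul_offload
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Naive kernel: each thread computes one element of C = A * B (n x n, row-major).
__global__ void matMul(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

int main() {
    const int n = 1024;                          // assumed size; the study sweeps 8..8192
    const size_t bytes = size_t(n) * n * sizeof(float);

    std::vector<float> hA(size_t(n) * n, 1.0f), hB(size_t(n) * n, 2.0f), hC(size_t(n) * n);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    float h2d = 0, kern = 0, d2h = 0;

    // Host-to-device copies (memory I/O time)
    cudaEventRecord(t0);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(t1); cudaEventSynchronize(t1);
    cudaEventElapsedTime(&h2d, t0, t1);

    // Kernel execution (computation time)
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    cudaEventRecord(t0);
    matMul<<<grid, block>>>(dA, dB, dC, n);
    cudaEventRecord(t1); cudaEventSynchronize(t1);
    cudaEventElapsedTime(&kern, t0, t1);

    // Device-to-host copy (memory I/O time)
    cudaEventRecord(t0);
    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(t1); cudaEventSynchronize(t1);
    cudaEventElapsedTime(&d2h, t0, t1);

    std::printf("n=%d  H2D %.3f ms  kernel %.3f ms  D2H %.3f ms\n", n, h2d, kern, d2h);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cudaEventDestroy(t0); cudaEventDestroy(t1);
    return 0;
}

Timing the transfers and the kernel separately is what makes the comparison in the abstract possible: as n grows, the copy times typically scale with the O(n^2) data volume while the naive kernel scales with O(n^3) work, so the balance between memory I/O and computation shifts across the 8 to 8192 range.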