
Performance Optimization for Matrix Multiplication When Offloading Computation to GPU Devices

Publication Type : Conference Paper

Publisher : Springer Nature Singapore

Source : Lecture Notes in Networks and Systems

Url : https://doi.org/10.1007/978-981-96-3247-3_8

Campus : Bengaluru

School : School of Computing

Year : 2025

Abstract : With advancements in computer architecture and the widespread adoption of GPU devices, the matrix multiplication operation can now be efficiently offloaded to GPUs for enhanced performance through parallel processing. Matrix multiplication computes the product of input matrices of n rows and n columns, where n denotes the size of the input matrices. This operation serves as a fundamental building block in recursive algorithms applied across various domains such as neural networks, graph problems, and machine learning; optimizing its elapsed computation time is therefore crucial. This work measures the computational efficiency achieved by offloading matrix multiplication from the CPU to the GPU. The experimental work is implemented in C++ using the CUDA API for GPU programming and compiled with the nvcc compiler. Experiments are conducted with matrix sizes ranging from 8 to 8192 data elements. Furthermore, the study categorizes GPU activities by computation time across the available execution units and analyzes the time required for memory I/O operations for varying sizes of input matrices.
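For context, the computation being offloaded is the classic triple-loop product sketched below. This is a minimal CPU-side illustration, not the authors' code; in the CUDA version described by the abstract, the two outer loops are typically replaced by a grid of threads, with one thread computing each output element, and the matrices are copied to device memory before the kernel launch (those memory I/O costs are what the study measures separately).

```cpp
#include <cstddef>
#include <vector>

// Naive O(n^3) matrix multiplication on the CPU: C = A * B for n x n
// matrices stored in row-major order in flat vectors. Hypothetical
// baseline sketch only; the paper offloads this loop nest to the GPU.
std::vector<double> matmul(const std::vector<double>& A,
                           const std::vector<double>& B, std::size_t n) {
    std::vector<double> C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {        // row of C (GPU: thread y-index)
        for (std::size_t j = 0; j < n; ++j) {    // column of C (GPU: thread x-index)
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)  // dot product of row i and column j
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
    return C;
}
```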

Cite this Research Publication : Yogesh Narayan Gaur, B. M. Beena, Manju Khanna, Performance Optimization for Matrix Multiplication When Offloading Computation to GPU Devices, Lecture Notes in Networks and Systems, Springer Nature Singapore, 2025, https://doi.org/10.1007/978-981-96-3247-3_8
