    Online Resource
    SAGE Publications ; 2010
    In: The International Journal of High Performance Computing Applications, SAGE Publications, Vol. 24, No. 4 (2010-11), p. 511-515
    Abstract: We present an improved matrix-matrix multiplication routine (General Matrix Multiply [GEMM]) in the MAGMA BLAS library that targets NVIDIA Fermi graphics processing units (GPUs) using Compute Unified Device Architecture (CUDA). We show how to modify the previous MAGMA GEMM kernels to make more efficient use of Fermi's new architectural features, most notably its extended memory hierarchy and memory sizes. The improved kernels run at up to 300 GFlop/s in double precision and up to 645 GFlop/s in single precision arithmetic (on a C2050), which are 58% and 63% of the theoretical peak, respectively. We compare the improved kernels with the currently available version in CUBLAS 3.1. Further, we show the effect of the new kernels on higher-level dense linear algebra (DLA) routines such as the one-sided matrix factorizations, and compare their performance with that of the corresponding, currently available routines running on homogeneous multicore systems.
    Type of Medium: Online Resource
    ISSN: 1094-3420, 1741-2846
    Language: English
    Publisher: SAGE Publications
    Publication Date: 2010
    ZDB ID: 2017480-9