Updated on 2024/12/10

MUKUNOKI Daichi
 
Organization
Information Technology Center High Performance Computing division Designated assistant professor
Title
Designated assistant professor

Degree 1

  1. Ph.D. in Engineering ( 2013.11   University of Tsukuba ) 

Research Interests 7

  1. High performance computing

  2. Accurate computation

  3. Auto-tuning

  4. Numerical computation

  5. Reproducible computation

  6. Parallel Computing

  7. GPU computing

Research Areas 2

  1. Informatics / High performance computing

  2. Informatics / Computer system

Research History 13

  1. Nagoya University   Information Technology Center   Designated assistant professor

    2024.12

    Country: Japan

  2. Shibaura Institute of Technology   Temporary Technical Staff

    2024.4 - 2024.11

    Country: Japan

  3. Sony Interactive Entertainment Inc.   Sr. Software Engineer

    2023.11 - 2024.2

    Country: Japan

  4. RIKEN Center for Computational Science   Large-scale Parallel Numerical Computing Technology Research Team   Researcher

    2019.4 - 2023.10

    Country: Japan

  5. Tokyo Woman's Christian University   Graduate School of Science   Postdoctoral Research Fellow

    2017.10 - 2019.3

    Country: Japan

  6. RIKEN Advanced Institute for Computational Science   Architecture Development Team, Flagship 2020 Project   Postdoctoral Researcher

    2017.4 - 2017.9

    Country: Japan

  7. RIKEN Advanced Institute for Computational Science   Co-design Team, Flagship 2020 Project   Postdoctoral Researcher

    2015.5 - 2017.3

    Country: Japan

  8. RIKEN Advanced Institute for Computational Science   Large-scale Parallel Numerical Computing Technology Research Team, Research Division   Postdoctoral Researcher

    2014.6 - 2017.9

    Country: Japan

  9. Japan Society for the Promotion of Science   Research Fellow (PD)

    2013.12 - 2014.5

    Country: Japan

  10. Japan Society for the Promotion of Science   Research Fellow (DC2)

    2013.4 - 2013.11

    Country: Japan

  11. Information Technology Center, The University of Tokyo   Visiting Researcher

    2021.11 - 2023.3

    Country: Japan

  12. RIKEN Center for Computational Science   Architecture Development Team, Flagship 2020 Project   Visiting Researcher

    2017.10 - 2019.3

    Country: Japan

  13. RIKEN Advanced Institute for Computational Science   Large-scale Parallel Numerical Computing Technology Research Team, Research Division   Visiting Researcher

    2017.10 - 2019.3

    Country: Japan

Education 4

  1. University of Tsukuba   Graduate School of Systems and Information Engineering

    2011.4 - 2013.11

    Country: Japan

  2. University of Tsukuba   Graduate School of Systems and Information Engineering

    2009.4 - 2011.3

    Country: Japan

  3. University of Tsukuba   School of Library and Information Science

    2006.4 - 2009.3

    Country: Japan

  4. Gifu National College of Technology

    2001.4 - 2006.3

    Country: Japan

Professional Memberships 2

  1. Information Processing Society of Japan

    2008

  2. Auto-Tuning Research Group

Committee Memberships 34

  1. The 15th International Conference on Parallel Processing & Applied Mathematics (PPAM 2024)   Program Committee Member  

    2024   

  2. Mini Symposium: Exploring Arithmetic and Data Representation Beyond the Standard in HPC (at ICIAM 2023)   Mini-Symposium Organizer  

    2023   

  3. 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2023)   Program Committee Member  

    2023   

  4. Special Session: Performance Optimization and Auto-Tuning of Software on Multicore/Manycore Systems (POAT 2023) (in conjunction with MCSoC-2023)   Program Chair  

    2023   

    Committee type: Academic society

  5. The 24th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2023) (in conjunction with IPDPS 2023)   Program Committee Member  

    2023   

  6. The 22nd International Conference on Computational Science (ICCS 2022)   Program Committee Member  

    2022   

  7. 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022)   Program Committee Member (Algorithm track)  

    2022   

  8. The International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022)   Publicity Chair  

    2022   

  9. Auto-Tuning Research Group   Secretary (Exchange Promotion Committee)  

    2021 - 2023   

    Committee type: Academic society

  10. IPSJ Transactions on Advanced Computing Systems   Editorial Committee Member  

    2020 - 2024   

    Committee type: Academic society

  11. The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20)   Research Poster Committee Member  

    2020   

  12. The 4th International Workshop on GPU Computing and AI (GCA'19) (in conjunction with CANDAR'19)   Program Committee Member  

    2019   

  13. The Fourteenth International Workshop on Automatic Performance Tuning (iWAPT2019) (in conjunction with IPDPS 2019)   Program Committee Member  

    2019   

  14. The 14th International Conference on Parallel Processing & Applied Mathematics (PPAM 2022)   Program Committee Member  

    2022   

  15. Special Session: Auto-Tuning for Multicore and GPU (ATMG2022) (in conjunction with MCSoC-2022)   Program Chair  

    2022   

  16. IEEE 22nd International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2021) (in conjunction with IPDPS 2021)   Program Committee Member  

    2021   

  17. Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020 January)   Program Committee Member  

    2020   

  18. The 21st IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2020) (in conjunction with IPDPS 2020)   Program Committee Member  

    2020   

  19. 2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019)   Program Committee Member  

    2019   

  20. The 20th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2019) (in conjunction with IPDPS 2019)   Program Committee Member  

    2019   

  21. Mini Symposium: Development of Numerical Computing Software on Emerging Computing Platforms (at SIAM PP 18)   Mini-Symposium Organizer  

    2018   

  22. Special Session: Auto-Tuning for Multicore and GPU (ATMG 2018) (in conjunction with MCSoC-2018)   Program Committee Member  

    2018   

  23. The Thirteenth International Workshop on Automatic Performance Tuning (iWAPT2018) (in conjunction with IPDPS 2018)   Program Committee Member  

    2018   

  24. 2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2018)   Program Committee Member  

    2018   

  25. The 19th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2018) (in conjunction with IPDPS 2018)   Program Committee Member  

    2018   

  26. The Third International Workshop on GPU Computing and AI (GCA'18) (in conjunction with CANDAR'18)   Program Committee Member  

    2018   

  27. The Second International Workshop on GPU Computing and AI (GCA'17) (in conjunction with CANDAR'17)   Program Committee Member  

    2017   

  28. The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017) (in conjunction with IPDPS 2017)   Program Committee Member  

    2017   

  29. The Twelfth International Workshop on Automatic Performance Tuning (iWAPT2017) (in conjunction with IPDPS 2017)   Program Committee Member  

    2017   

  30. Special Session: Auto-Tuning for Multicore and GPU (ATMG 2017) (in conjunction with MCSoC-17)   Program Committee Member  

    2017   

  31. The First International Workshop on GPU Computing and Applications (GCA'16) (in conjunction with CANDAR'16)   Program Committee Member  

    2016   

  32. The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016) (in conjunction with IPDPS 2016)   Program Committee Member  

    2016   

  33. The 16th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2015) (in conjunction with IPDPS 2015)   Program Committee Member  

    2015   

  34. The 15th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2014) (in conjunction with IPDPS 2014)   Program Committee Member  

    2014   

Awards 9

  1. Best Paper Award

    2023.12   16th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2023)   Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor

    Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura

    Award type: Award from international society, conference, symposium, etc.

  2. Research Poster Award 2nd Place Winner

    2022.6   ISC High Performance 2022   A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers

    Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

  3. RIKEN Ohbu Award 2021

    2022.3  

  4. Research Poster Award

    2021.6   ISC High Performance 2021   Accurate Matrix Multiplication on Binary128 using Ozaki Scheme

    Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

  5. Best Research Poster Award

    2019.9   Russian Supercomputing Days   Accurate and Reproducible Linear Algebra Operations for Many-core Architectures

    Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

  6. PRACE-ISC Research Poster Award 2017

    2017.6   ISC High Performance 2017   Implementation & Evaluation of 2.5D Matrix Multiplication on K Computer

    Daichi Mukunoki, Toshiyuki Imamura

  7. IPSJ Yamashita SIG Research Award

    2016   Information Processing Society of Japan  

  8. IPSJ Computer Science Research Award for Young Scientists

    2013   Information Processing Society of Japan  

  9. Young Researcher Award

    2013   IPSJ Special Interest Group on System Architecture  

 

Papers 54

  1. Extension of accurate numerical algorithms for matrix multiplication based on error-free transformation Reviewed

    Katsuhisa Ozaki, Daichi Mukunoki, Takeshi Ogita

    Japan Journal of Industrial and Applied Mathematics     2024.10

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Springer Science and Business Media LLC  

    DOI: 10.1007/s13160-024-00677-z

    Other Link: https://link.springer.com/article/10.1007/s13160-024-00677-z/fulltext.html

  2. Reduced-Precision and Reduced-Exponent Formats for Accelerating Adaptive Precision Sparse Matrix–Vector Product Reviewed

    Stef Graillat, Fabienne Jézéquel, Théo Mary, Roméo Molina, Daichi Mukunoki

    Lecture Notes in Computer Science   Vol. 14803   page: 17 - 30   2024.8

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer Nature Switzerland  

    DOI: 10.1007/978-3-031-69583-4_2

  3. Mixed-precision conjugate gradient algorithm using the groupwise update strategy Reviewed

    Kensuke Aihara, Katsuhisa Ozaki, Daichi Mukunoki

    Japan Journal of Industrial and Applied Mathematics     2024.2

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Springer Science and Business Media LLC  

    DOI: 10.1007/s13160-024-00644-8

    Other Link: https://link.springer.com/article/10.1007/s13160-024-00644-8/fulltext.html

  4. Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor Reviewed

    Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura

    2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)     page: 608 - 615   2023.12

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/mcsoc60832.2023.00094

  5. Infinite-Precision Inner Product and Sparse Matrix-Vector Multiplication Using Ozaki Scheme with Dot2 on Manycore Processors Reviewed

    Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

    Parallel Processing and Applied Mathematics     page: 40 - 54   2023.4

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer International Publishing  

    DOI: 10.1007/978-3-031-30442-2_4

  6. Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs Reviewed

    Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura

    Proc. 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)     page: 234 - 241   2021.12

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)  

  7. A Rapid Euclidean Norm Calculation Algorithm that Reduces Overflow and Underflow. Reviewed

    Takeyuki Harayama, Shuhei Kudo, Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

    Proc. The 2021 International Conference on Computational Science and Its Applications (ICCSA 2021), Lecture Notes in Computer Science   Vol. 12949   page: 95 - 110   2021.9

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

    DOI: 10.1007/978-3-030-86653-2_7

    Other Link: https://dblp.uni-trier.de/db/conf/iccsa/iccsa2021-1.html#HarayamaKMIT21

  8. Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme Reviewed

    Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

    Proc. The 50th International Conference on Parallel Processing (ICPP-2021)     2021.8

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)  

  9. Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? Reviewed

    Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka

    Proc. 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021)     page: 1056 - 1065   2021.6

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/IPDPS49936.2021.00114

    Other Link: https://dblp.uni-trier.de/db/conf/ipps/ipdps2021.html#DomkeVDCO0SMPWM21

  10. Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme. Reviewed

    Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Roman Iakymchuk

    Proc. The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2021)     page: 100 - 109   2021.1

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ACM  

    DOI: 10.1145/3432261.3432270

    Other Link: https://dblp.uni-trier.de/db/conf/hpcasia/hpcasia2021.html#MukunokiOOI21

  11. Can We Avoid Rounding-Error Estimation in HPC Codes and Still Get Trustworthy Results? Reviewed

    Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk

    Proc. 13th International Workshop on Numerical Software Verification 2020 (NSV 20), Lecture Notes in Computer Science   Vol. 12549   page: 163 - 177   2020.12

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

    DOI: 10.1007/978-3-030-63618-0_10

    Other Link: https://dblp.uni-trier.de/db/conf/vstte/vstte2020.html#JezequelGMII20

  12. Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs. Reviewed

    Daichi Mukunoki, Takeshi Ogita

    J. Comput. Appl. Math.   Vol. 372   page: 112701 - 112701   2020.7

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:Elsevier BV  

    DOI: 10.1016/j.cam.2019.112701

  13. DGEMM Using Tensor Cores, and Its Accurate and Reproducible Versions Reviewed

    Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

    Proc. ISC High Performance 2020, Lecture Notes in Computer Science   Vol. 12151   page: 230 - 248   2020.6

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

    DOI: 10.1007/978-3-030-50743-5_12

    Other Link: https://dblp.uni-trier.de/db/conf/supercomputer/isc2020.html#MukunokiOOI20

  14. Design of an FPGA-Based Matrix Multiplier with Task Parallelism. Reviewed

    Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki

    Proc. International Conference on Parallel Computing (ParCo2019), Parallel Computing: Technology Trends   Vol. 36   page: 241 - 250   2019

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IOS Press  

    DOI: 10.3233/APC200047

    Other Link: https://dblp.uni-trier.de/db/conf/parco/parco2019.html#TanIM19

  15. Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-Core Architectures. Reviewed

    Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

    Proc. 13th International Conference on Parallel Processing and Applied Mathematics (PPAM2019), Lecture Notes in Computer Science   Vol. 12043   page: 516 - 527   2019

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

    DOI: 10.1007/978-3-030-43229-4_44

    Other Link: https://dblp.uni-trier.de/db/conf/ppam/ppam2019-1.html#MukunokiOO19

  16. Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster. Reviewed

    Daichi Mukunoki, Toshiyuki Imamura

    Proc. International Conference on Computational Science (ICCS 2018), Lecture Notes in Computer Science   Vol. 10862   page: 853 - 858   2018.6

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

    DOI: 10.1007/978-3-319-93713-7_85

    Other Link: https://dblp.uni-trier.de/db/conf/iccS/iccS2018-3.html#MukunokiI18

  17. Design Towards Modern High Performance Numerical LA Library Enabling Heterogeneity and Flexible Data Formats. Reviewed

    Toshiyuki Imamura, Daichi Mukunoki, Yusuke Hirota, Susumu Yamada, Masahiko Machida

    Proc. International Conference on Parallel Computing (ParCo2017), Advances in Parallel Computing     page: 97 - 106   2017.9

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IOS Press  

    DOI: 10.3233/978-1-61499-843-3-97

    Other Link: https://dblp.uni-trier.de/db/conf/parco/parco2017.html#ImamuraMHYM17

  18. Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer. Reviewed

    Daichi Mukunoki, Toshiyuki Imamura

    Proc. 12th International Conference on Parallel Processing and Applied Mathematics (PPAM2017), Lecture Notes in Computer Science   Vol. 10777   page: 348 - 358   2017

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

    DOI: 10.1007/978-3-319-78024-5_31

    Other Link: https://dblp.uni-trier.de/db/conf/ppam/ppam2017-1.html#MukunokiI17

  19. Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs. Reviewed

    Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

    Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16)     page: 377 - 384   2016

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE Computer Society  

    DOI: 10.1109/MCSoC.2016.32

    Other Link: https://dblp.uni-trier.de/db/conf/mcsoc/mcsoc2016.html#MukunokiIT16

  20. Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation. Reviewed

    Daichi Mukunoki, Toshiyuki Imamura

    Proc. IEEE International Conference on Cluster Computing (Cluster 2016)     page: 144 - 145   2016

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE Computer Society  

    DOI: 10.1109/CLUSTER.2016.77

    Other Link: https://dblp.uni-trier.de/db/conf/cluster/cluster2016.html#MukunokiI16

  21. Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs. Reviewed

    Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

    Proc. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015)     page: 642 - 650   2015

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE Computer Society  

    DOI: 10.1109/PDP.2015.66

    Other Link: https://dblp.uni-trier.de/db/conf/pdp/pdp2015.html#MukunokiIT15

  22. Implementation and Evaluation of Triple and Quadruple Precision Floating-point Operations on GPUs Reviewed

      Vol. 6 ( 1 ) page: 66 - 77   2013.1

    Authorship:Lead author, Corresponding author   Language:Japanese  

    Other Link: http://id.nii.ac.jp/1001/00089921/

  23. Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs. Reviewed

    Daichi Mukunoki, Daisuke Takahashi

    Proc. 13th International Conference on Computational Science and Its Applications (ICCSA 2013), Part V, Lecture Notes in Computer Science   Vol. 7975   page: 211 - 223   2013

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

    DOI: 10.1007/978-3-642-39640-3_15

    Other Link: https://dblp.uni-trier.de/db/conf/iccsa/iccsa2013-5.html#MukunokiT13

  24. Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs. Reviewed

    Daichi Mukunoki, Daisuke Takahashi

    Proc. 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Part I, Workshop on Numerical Algorithms on Hybrid Architectures, Lecture Notes in Computer Science   Vol. 8384   page: 632 - 642   2013

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

    DOI: 10.1007/978-3-642-55224-3_59

    Other Link: https://dblp.uni-trier.de/db/conf/ppam/ppam2013-1.html#MukunokiT13

  25. Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs. Reviewed

    Daichi Mukunoki, Daisuke Takahashi

    Proc. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW 2012), The 13th Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-12)     page: 1378 - 1386   2012

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE Computer Society  

    DOI: 10.1109/IPDPSW.2012.175

    Other Link: https://dblp.uni-trier.de/db/conf/ipps/ipdps2012w.html#MukunokiT12

  26. Implementation and Evaluation of Quadruple and Octuple Precision BLAS on GPUs Reviewed

      Vol. 2011 ( 2011 ) page: 148 - 156   2011.1

    Authorship:Lead author, Corresponding author   Language:Japanese  

  27. Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs. Reviewed

    Daichi Mukunoki, Daisuke Takahashi

    Proc. 10th International Conference on Applied Parallel and Scientific Computing (PARA 2010), Part I, Lecture Notes in Computer Science   Vol. 7133   page: 249 - 259   2010

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Springer  

    DOI: 10.1007/978-3-642-28151-8_25

    Other Link: https://dblp.uni-trier.de/db/conf/para/para2010-1.html#MukunokiT10

  28. Performance Evaluation of Adaptive-Precision SpMV with Reduced-Precision Formats

    Stef Graillat, Fabienne Jézéquel, Théo Mary, Roméo Molina, Daichi Mukunoki

    HAL   Vol. hal-04261073   2023.10

    Language:English   Publishing type:Research paper (other academic)  

  29. White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing.

    Roman Iakymchuk, Daichi Mukunoki, Artur Podobas, Fabienne Jézéquel, Toshiyuki Imamura, Norihisa Fujita, Jens Huthmann, Shuhei Kudo, Yiyu Tan, Jens Domke, Kai Torben Ohlhus, Takeshi Fukaya, Takeo Hoshi, Yuki Murakami, Maho Nakata, Takeshi Ogita, Kentaro Sano, Taisuke Boku

    CoRR   Vol. abs/2004.04628   2020.4

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

    In numerical computations, the precision of floating-point operations is a key factor in determining both performance (speed and energy efficiency) and reliability (accuracy and reproducibility). However, precision generally pulls these two in opposite directions. The ultimate concept for maximizing both at the same time is therefore minimal-precision computing through precision-tuning, which adjusts the precision of each operation and each datum to its optimum. Several studies have already addressed this (e.g. Precimonious and Verrou), but their scope is limited to precision-tuning alone. Hence, in 2019 we started the Minimal-Precision Computing project to propose a broader concept of a minimal-precision computing system with precision-tuning, involving both the hardware and software stack. Specifically, our system combines (1) a precision-tuning method based on Discrete Stochastic Arithmetic (DSA), (2) arbitrary-precision arithmetic libraries, (3) fast and accurate numerical libraries, and (4) Field-Programmable Gate Arrays (FPGAs) with High-Level Synthesis (HLS). In this white paper, we aim to provide an overview of various technologies related to minimal- and mixed-precision computing, to outline the future direction of the project, and to discuss current challenges together with our project members and guest speakers at the LSPANC 2020 workshop; https://www.r-ccs.riken.jp/labs/lpnctrt/lspanc2020jan/.

    Other Link: https://dblp.uni-trier.de/db/journals/corr/corr2004.html#abs-2004-04628

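  30. Error-Free Transformation of Matrix Multiplication Using Single-Precision Arithmetic and Tensor Cores on GPUs

    Katsuhisa Ozaki, Daichi Mukunoki, Takeshi Ogita

    Proceedings of the Annual Meeting of the Japan Society for Industrial and Applied Mathematics (CD-ROM)   Vol. 2020   2020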

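  31. Quadruple-Precision Matrix Multiplication in binary128 Using the Ozaki Scheme

    Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita

    Proceedings of the Annual Meeting of the Japan Society for Industrial and Applied Mathematics (CD-ROM)   Vol. 2020   2020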

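  32. An Accurate and Fast 2-Norm Computation Method That Suppresses Overflow and Underflow

    Takeyuki Harayama, Shuhei Kudo, Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

    IPSJ SIG Technical Reports (Web)   Vol. 2020 ( HPC-177 )   2020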

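  33. Accurate and Reproducible BLAS Implementation Using the Ozaki Scheme

    Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Toshiyuki Imamura

    Proceedings of the Annual Meeting of the Japan Society for Industrial and Applied Mathematics (CD-ROM)   Vol. 2019   2019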

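  34. Implementation and Optimization of Accurate and Reproducible BLAS Routines Using an Accurate Matrix Multiplication Method Based on Level-3 BLAS

    Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

    IPSJ SIG Technical Reports (Web)   Vol. 2018 ( HPC-166 )   2018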

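  35. Implementation and Evaluation of Distributed Parallel Matrix Multiplication Using the 2.5D Algorithm on the K Computer

    Daichi Mukunoki, Toshiyuki Imamura

    IPSJ SIG Technical Reports (Web)   Vol. 2017 ( HPC-159 )   2017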

  36. KMATHLIB-High Performance and Scalable Numerical Library for the K Computer-

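    Yoshiharu Ohi, Yusuke Hirota, Daichi Mukunoki, Toshiyuki Imamura

    Proceedings of the Annual Meeting of the Japan Society for Industrial and Applied Mathematics (CD-ROM)   Vol. 2016   2016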

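  37. Performance Evaluation of Verified Numerical Computation for Systems of Linear Equations on Large-Scale Parallel Computers

    Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin'ichi Oishi

    IPSJ SIG Technical Reports (Web)   Vol. 2016 ( HPC-157 )   2016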

  38. Implementation and Evaluation of an Eigenvalue Solver Optimized for Consumer-Range GPUs

    Toshiyuki Imamura, Daichi Mukunoki

    IPSJ SIG Technical Reports (Web)   Vol. 2016 ( HPC-157 )   2016

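  39. Performance Evaluation of the Fastest GPU Eigenvalue Solver by Selecting CUDA-BLAS and Other Components

    Toshiyuki Imamura, Daichi Mukunoki, Susumu Yamada, Masahiko Machida

    IPSJ SIG Technical Reports (Web)   Vol. 2015 ( HPC-148 )   2015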

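  40. Accumulated Error in Time-Evolution Problems Using FFT

    Narimasa Sasa, Susumu Yamada, Masahiko Machida, Daichi Mukunoki, Toshiyuki Imamura

    Proceedings of the Annual Meeting of the Japan Society for Industrial and Applied Mathematics (CD-ROM)   Vol. 2015   2015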

  41. A Study of Short-Length Floating-Point Formats

    Daichi Mukunoki, Toshiyuki Imamura

    IPSJ SIG Technical Reports (Web)   Vol. 2015 ( HPC-152 )   2015

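  42. Acceleration of Double-Double Precision Arithmetic on the K Computer and FX10

    Shinichi Sasaki, Toshiaki Hishinuma, Akihiro Fujii, Teruo Tanaka, Daichi Mukunoki, Toshiyuki Imamura

    IPSJ SIG Technical Reports (Web)   Vol. 2015 ( HPC-151 )   2015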

  43. Multi-GPU Implementation of SYMV and GEMV Routines and Its Evaluation

    Toshiyuki Imamura, Daichi Mukunoki, Susumu Yamada, Masahiko Machida

    IPSJ SIG Technical Reports (Web)   Vol. 2015 ( HPC-151 )   2015

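  44. An Automatic Thread-Count Selection Method for Memory-Bound BLAS Kernels on NVIDIA GPUs

    Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

    IPSJ SIG Technical Reports (Web)   Vol. 2015 ( HPC-150 )   2015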

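  45. Auto-Tuning of GEMV Kernels on NVIDIA GPUs

    Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

    Proceedings of the Conference on Computational Engineering and Science (CD-ROM)   Vol. 20   2015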

  46. Implementation and Evaluation of CUDA-xSYMV

    Toshiyuki Imamura, Daichi Mukunoki, Susumu Yamada, Masahiko Machida

    IPSJ SIG Technical Reports (Web)   Vol. 2014 ( HPC-146 )   2014

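  47. Implementation and Evaluation of DGEMM Using Pseudo-Double-Precision Arithmetic on Maxwell Architecture GPUs

    Daichi Mukunoki, Toshiyuki Imamura

    IPSJ SIG Technical Reports (Web)   Vol. 2014 ( ARC-213 )   2014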

  48. GPUにおける高速なCRS形式疎行列ベクトル積の実装

    椋木大地, 高橋大介

    研究報告ハイパフォーマンスコンピューティング(HPC)   Vol. 2013 ( 5 ) page: 1 - 7   2013.2

    Language:Japanese  

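    Sparse matrix-vector multiplication (SpMV) is an important basic operation widely used in scientific computing. This paper reports a fast implementation of SpMV in the CRS format on GPUs. We targeted NVIDIA's Kepler architecture and implemented it in the CUDA 5.0 environment. Building on implementation techniques proposed for GPUs up to the earlier Fermi architecture, we optimized the implementation by exploiting features newly supported in, and specification changes introduced with, the Kepler architecture. In a performance evaluation on a Kepler-architecture Tesla K20 with 200 matrices, compared with the double-precision CRS-format SpMV routine of cuSPARSE bundled with CUDA 5.0, our implementation was about 1.86 times faster on average and improved performance for 177 of the matrices.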

  49. GPUにおける4倍精度浮動小数点演算を用いたクリロフ部分空間法の高速化

    椋木大地, 高橋大介

    情報処理学会研究報告(Web)   Vol. 2013 ( HPC-140 )   2013

  50. GPUにおける4倍精度演算を用いた疎行列反復解法の実装と評価

    椋木大地, 高橋大介

    情報処理学会研究報告(CD-ROM)   Vol. 2012 ( 5 )   2013

  51. GPUにおける4倍精度演算を用いた疎行列反復解法の実装と評価

    椋木大地, 高橋大介

    研究報告ハイパフォーマンスコンピューティング(HPC)   Vol. 2012 ( 37 ) page: 1 - 8   2012.12

    Language:Japanese  

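    Krylov subspace methods, which are used as iterative solvers for sparse matrices, can require more iterations to converge, or fail to converge, because of rounding errors. It has been reported that using high-precision arithmetic can improve convergence in such cases. If the time saved by the reduced iteration count outweighs the increase in time per iteration caused by high-precision arithmetic, the time to solution may be shortened. We implemented the BiCGStab method, one of the Krylov subspace methods, on a GPU (Tesla M2050) using quadruple precision based on Double-Double (DD) arithmetic and evaluated its performance. On the GPU, the time per iteration of quadruple-precision BiCGStab was about 1.0-2.2 times that of double precision, and depending on how much the iteration count was reduced, there were cases where quadruple-precision arithmetic shortened the time to solution. This paper examines the performance and effectiveness of quadruple-precision arithmetic in sparse iterative solvers on GPUs.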

  52. GPUによる3倍精度浮動小数点演算の検討

    椋木大地, 高橋大介

    情報処理学会研究報告(CD-ROM)   Vol. 2011 ( 4 )   2011

  53. GPUによる4倍精度BLASの実装と評価

    椋木大地, 高橋大介

    計算工学講演会論文集   Vol. 15 ( 2 )   2010

  54. Implementation and Evaluation of Quadruple Precision BLAS on GPU

    椋木大地, 高橋大介

    情報処理学会研究報告(CD-ROM)   Vol. 2009 ( 4 )   2009

KAKENHI (Grants-in-Aid for Scientific Research) 6

  1. Development of Accurate and Validated Matrix Computation Software for Next Generation Supercomputers

    Grant number:20KK0259  2022.4 - 2023.10

    Japan Society for the Promotion of Science (JSPS)  Fund for the Promotion of Joint International Research (Fostering Joint International Research (A))

    Daichi Mukunoki

    Authorship:Principal investigator 

    Grant amount: ¥9,230,000 (Direct Cost: ¥7,100,000, Indirect Cost: ¥2,130,000)

  2. 超並列計算環境のための高精度かつ再現性のある行列計算ライブラリの開発

    Grant number:19K20286  2019.4 - 2022.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Early-Career Scientists

    椋木 大地

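    The goal of this research is to develop a BLAS library that provides high accuracy and guaranteed reproducibility in numerical computation while achieving high performance on state-of-the-art massively parallel computer architectures. We focus on four approaches: (1) the Ozaki scheme, (2) the ExBLAS scheme, (3) the DotK scheme, and (4) the CADNA scheme, with (1) as the primary approach.
    In FY2019, progress was made mainly on (1) and (4). For (1), we developed basic BLAS routines for CPUs and GPUs and released them as open-source software. We also presented a peer-reviewed paper on this work at an international conference (PPAM2019). As applications, we brought forward research on applying the approach to a sparse iterative solver (the CG method) and on the use of FP16 (both were originally planned for FY2021). For the latter, we developed a method that uses Tensor Cores, which are FP16/32 mixed-precision hardware, to obtain a fast implementation with high accuracy and reproducibility, and a peer-reviewed paper was accepted at an international conference (ISC2020). For (4), the CADNA scheme, a new method was devised at Sorbonne University, the developer of CADNA and our research collaborator, and a paper with the principal investigator as a co-author was submitted to an international conference (preprint available, currently under review).
    In addition, we began research on methods that optimize the arithmetic precision used in numerical computation to speed up computation and reduce power consumption while preserving the accuracy of the results. Approaches (1)-(4) addressed in this KAKENHI project can serve as building blocks for such methods, so this work is positioned as an application of the project. On this topic, we gave a peer-reviewed poster presentation at an international conference (SC19) this fiscal year.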

  3. Reduced-precision formats for high-performance and energy-efficient computations

    Grant number:16K16062  2016.4 - 2019.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B)

    Mukunoki Daichi

    This study explored reduced-precision formats, which have shorter bit lengths than the IEEE 32/64-bit floating-point formats, as a way to enhance the performance of numerical computations in terms of both computation speed and energy efficiency. We proposed a lightweight software implementation of reduced-precision formats and demonstrated improvements in both speed and energy efficiency on several data-intensive basic linear algebra operations.

  4. GPUスパコンのための3倍・4倍精度線形演算ライブラリの開発に関する研究

    Grant number:13J01290  2013.4 - 2015.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows

    椋木 大地

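    The aim of this research was to carry out basic research toward high-performance triple- and quadruple-precision linear algebra libraries on GPUs, with a view to making triple- and quadruple-precision arithmetic practical on GPU supercomputers. This fiscal year, we mainly studied efficient implementation techniques for GPU linear algebra libraries that support multiple arithmetic precisions. As a result, we developed an implementation technique for a fast matrix-vector multiplication routine (GEMV) that supports multiple NVIDIA GPU architectures. The implementation models the program execution mechanism of the GPU and uses online auto-tuning to automatically determine the thread block size that maximizes execution efficiency. Compared with existing implementations, this prevents performance fluctuations that depend on the execution environment and problem size, and consistently maintains high performance. The technique is expected to be useful when implementing and optimizing multiple versions of a linear algebra program (for example, a BLAS routine) that differ in arithmetic precision. In addition, as an application of quadruple-precision arithmetic techniques, we implemented pseudo double-precision arithmetic by software emulation on NVIDIA's latest GPU, whose double-precision performance is 1/32 of its single-precision performance, and showed that a double-precision matrix multiplication routine (DGEMM) built on it outperforms the implementation based on hardware double precision. Part of the GPU software developed this fiscal year has been released on the web as an open-source library, and we plan to continue its development.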

  5. Research on high-performance and high-dimensional numerical linear algebra applying an asynchronous task mechanism on the exascale computing era

    Grant number:19H04127  2019.4 - 2022.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Authorship:Coinvestigator(s) 

  6. Theory and Application of Scalable Numerical Software on an O(100M) core environment

    Grant number:15H02709  2015.4 - 2018.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    IMAMURA Toshiyuki, YAMAMOTO Yusaku, Todo Shinji

    This research project aims to realize the high-performance numerical services investigated in earlier work, based on new mathematical principles, on emerging computing systems equipped with tens of thousands to hundreds of millions of processing cores. With two main themes, "mixed-granularity numerical kernels" and "asynchronous numerical algorithms", we conducted (i) research on the theory of asynchronous numerical algorithms and on avoiding communication and synchronization at a practical level, which led to the proposal of CAHTR and a new method for the FDTD scheme; and (ii) research on core numerical infrastructure technologies, such as automatic tuning for scalable, lightweight code generation on super-many-core systems, along with innovative research leading to next-generation numerical computation software.

Teaching Experience (Off-campus) 1

  1. 情報処理技法(リテラシ)II

    2018.9 - 2019.1 (Tokyo Woman's Christian University)
