Faculty Profiles - MUKUNOKI Daichi

MUKUNOKI Daichi

Organization

Information Technology Center Assistant Professor

Graduate School

Graduate School of Informatics

Contact information

Homepage

https://mukunoki.github.io/index_en.html

Degree 3

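Doctor of Engineering (2013.11, University of Tsukuba)
Master of Engineering (2011.3, University of Tsukuba)
Bachelor of Library and Information Science (2009.3, University of Tsukuba)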

Research Interests 9

High performance computing
Accurate computation
Auto-tuning
Numerical computation
Reproducible computation
Parallel computing
GPU computing
Mixed-precision computation
Large language models

Research Areas 2

Informatics / High-performance computing
Informatics / Computer systems

Research History 19

Nagoya University Information Technology Center Assistant Professor

2025.4

Country：Japan

Nagoya University Information Technology Center Assistant Professor

2024.12 - 2025.3

Country：Japan

Shibaura Institute of Technology Temporary Technical Staff

2024.4 - 2024.10

Country：Japan

Sony Interactive Entertainment Inc. Sr. Software Engineer

2023.11 - 2024.2

Country：Japan

Information Technology Center, The University of Tokyo Visiting Researcher

2021.11 - 2023.3

Country：Japan

RIKEN Center for Computational Science Large-scale Parallel Numerical Computing Technology Research Team Research Scientist

2019.4 - 2023.10

Country：Japan

RIKEN Center for Computational Science Research Scientist

2019.4 - 2021.3

Country：Japan

RIKEN Center for Computational Science Large Scale Parallel Computation Technology Research Team Visiting Researcher

2018.4 - 2019.3

Country：Japan

RIKEN Center for Computational Science Visiting Researcher

2018.4 - 2019.3

Country：Japan

Tokyo Woman's Christian University Graduate School of Science Postdoctoral Research Fellow

2017.10 - 2019.3

Country：Japan

RIKEN Center for Computational Science Architecture Development Team, Flagship 2020 Project Visiting Researcher

2017.10 - 2018.3

Country：Japan

RIKEN Advanced Institute for Computational Science Large-scale Parallel Numerical Computing Technology Research Team, Research Division Visiting Researcher

2017.10 - 2018.3

Country：Japan

RIKEN Advanced Institute for Computational Science Architecture Development Team, Flagship 2020 Project Postdoctoral Researcher

2017.4 - 2017.9

Country：Japan

RIKEN Advanced Institute for Computational Science Postdoctoral Researcher

2016.4 - 2017.3

Country：Japan

RIKEN Advanced Institute for Computational Science Co-design Team, Flagship 2020 Project Postdoctoral Researcher

2015.5 - 2016.3

Country：Japan

RIKEN Advanced Institute for Computational Science Large-scale Parallel Numerical Computing Technology Research Team, Research Division Postdoctoral Researcher

2014.6 - 2017.9

Country：Japan

Japan Society for the Promotion of Science Research Fellow (PD)

2013.12 - 2014.5

Country：Japan

Japan Society for the Promotion of Science Research Fellow (DC2)

2013.4 - 2013.11

Country：Japan

Nagoya University Information Technology Center Assistant Professor

2025.11

Education 4

University of Tsukuba Graduate School of Systems and Information Engineering Doctoral Program in Computer Science

2011.4 - 2013.11

Country： Japan

University of Tsukuba Graduate School of Systems and Information Engineering Master's Program in Computer Science

2009.4 - 2011.3

Country： Japan

University of Tsukuba School of Library and Information Science

2006.4 - 2009.3

Country： Japan

Gifu National College of Technology

2001.4 - 2006.3

Country： Japan

Professional Memberships 4

The Japanese Society of Medical Imaging Technology (JAMIT)

2025.8

Association for Computing Machinery (ACM)

2025

Information Processing Society of Japan

2008

Auto-Tuning Research Group

Committee Memberships 40

The 1st International Workshop on Agentic AI for HPC (AgenticAI4HPC 2026) Co-Chair

2026

Committee type：Academic society

The 15th International Conference on Parallel Processing & Applied Mathematics (PPAM 2024) Program Committee Member

2024

Mini Symposium: Exploring Arithmetic and Data Representation Beyond the Standard in HPC (at ICIAM 2023) Mini-Symposium Organizer

2023

The 24th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2023) (in conjunction with IPDPS 2023) Program Committee Member

2023

2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2023) Program Committee Member

2023

Special Session: Performance Optimization and Auto-Tuning of Software on Multicore/Manycore Systems (POAT 2023) (in conjunction with MCSoC-2023) Program Chair

2023

Committee type：Academic society

The 22nd International Conference on Computational Science (ICCS 2022) Program Committee Member

2022

The International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022) Publicity Chair

2022

36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022) Program Committee Member (Algorithm track)

2022

Auto-Tuning Research Group Secretary (Exchange Promotion Committee)

2021 - 2023

Committee type：Academic society

IPSJ Transactions on Advanced Computing Systems (ACS) Editorial Committee Member

2020 - 2024

Committee type：Academic society

The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20) Research Poster Committee Member

2020

The 4th International Workshop on GPU Computing and AI (GCA'19) (in conjunction with CANDAR'19) Program Committee Member

2019

The Fourteenth International Workshop on Automatic Performance Tuning (iWAPT2019) (in conjunction with IPDPS 2019) Program Committee Member

2019

The 16th International Conference on Parallel Processing & Applied Mathematics (PPAM 2026) Program Committee Member

2026.9

Committee type：Academic society

The 2nd International Workshop on Foundational Large Language Models Advances for HPC (LLM4HPC 2026) Program Committee Member

2026.6

Committee type：Academic society

The International Conference on High Performance Computing in Asia-Pacific Region 2026 (HPCAsia2026) Poster Chair

2026.1

Committee type：Academic society

The 28th Workshop on Advances in Parallel and Distributed Computational Models (APDCM2026) Program Committee Member

2026

Committee type：Academic society

Auto-Tuning Research Group Research Promotion Committee Member

2025

The 14th International Conference on Parallel Processing & Applied Mathematics (PPAM 2022) Program Committee Member

2022

Special Session: Auto-Tuning for Multicore and GPU (ATMG2022) (in conjunction with MCSoC-2022) Program Chair

2022

IEEE 22nd International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2021) (in conjunction with IPDPS 2021) Program Committee Member

2021

Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020 January) Program Committee Member

2020

The 21st IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2020) (in conjunction with IPDPS 2020) Program Committee Member

2020

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019) Program Committee Member

2019

The 20th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2019) (in conjunction with IPDPS 2019) Program Committee Member

2019

Mini Symposium: Development of Numerical Computing Software on Emerging Computing Platforms (at SIAM PP 18) Mini-Symposium Organizer

2018

2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2018) Program Committee Member

2018

The 19th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2018) (in conjunction with IPDPS 2018) Program Committee Member

2018

The Third International Workshop on GPU Computing and AI (GCA'18) (in conjunction with CANDAR'18) Program Committee Member

2018

The Thirteenth International Workshop on Automatic Performance Tuning (iWAPT2018) (in conjunction with IPDPS 2018) Program Committee Member

2018

Special Session: Auto-Tuning for Multicore and GPU (ATMG 2018) (in conjunction with MCSoC-2018) Program Committee Member

2018

The Second International Workshop on GPU Computing and AI (GCA'17) (in conjunction with CANDAR'17) Program Committee Member

2017

Special Session: Auto-Tuning for Multicore and GPU (ATMG 2017) (in conjunction with MCSoC-17) Program Committee Member

2017

The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017) (in conjunction with IPDPS 2017) Program Committee Member

2017

The Twelfth International Workshop on Automatic Performance Tuning (iWAPT2017) (in conjunction with IPDPS 2017) Program Committee Member

2017

The First International Workshop on GPU Computing and Applications (GCA'16) (in conjunction with CANDAR'16) Program Committee Member

2016

The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016) (in conjunction with IPDPS 2016) Program Committee Member

2016

The 16th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2015) (in conjunction with IPDPS 2015) Program Committee Member

2015

The 15th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2014) (in conjunction with IPDPS 2014) Program Committee Member

2014

Awards 10

Best Paper Award

2026.1 The 1st International Workshop on Foundational Large Language Models Advances for HPC in Asia (LLM4HPCAsia 2026) Evaluating Claude Code's Coding and Test Automation for GPU Acceleration of a Legacy Fortran Application: A GeoFEM Case Study

Tetsuya Hoshino, Shun-ichiro Hayashi, Daichi Mukunoki, Takahiro Katagiri, Toshihiro Hanawa

Best Paper Award

2023.12 16th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2023) Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor

Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura

Award type：Award from international society, conference, symposium, etc.

Research Poster Award 2nd Place Winner

2022.6 ISC High Performance 2022 A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

RIKEN Ohbu Award 2021

2022.3

Research Poster Award

2021.6 ISC High Performance 2021 Accurate Matrix Multiplication on Binary128 using Ozaki Scheme

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

Best Research Poster Award

2019.9 Russian Supercomputing Days Accurate and Reproducible Linear Algebra Operations for Many-core Architectures

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

PRACE-ISC Research Poster Award 2017

2017.6 ISC High Performance 2017 Implementation & Evaluation of 2.5D Matrix Multiplication on K Computer

Daichi Mukunoki, Toshiyuki Imamura

IPSJ Yamashita SIG Research Award

2016 Information Processing Society of Japan

IPSJ Computer Science Research Award for Young Scientists

2013 Information Processing Society of Japan

Young Researcher Award

2013 IPSJ Special Interest Group on System Architecture

Papers 87

Performance Evaluation of Loop Body Splitting for Fast Modal Filtering in SCALE-DG on A64FX Reviewed Open Access

Xuanzhengbo Ren, Yuta Kawai, Hirofumi Tomita, Seiya Nishizawa, Takahiro Katagiri, Tetsuya Hoshino, Daichi Mukunoki, Masatoshi Kawai, Toru Nagai

Proceedings of the 2025 International Conference on High Performance Computing in Asia-Pacific Region Workshops page： 36 - 44 2025.2

Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3703001.3724385

Performance evaluation and modelling of single-precision matrix multiplication on Cerebras CS-2 Reviewed

Ryunosuke Matsuzaki, Daichi Mukunoki, Takaaki Miyajima

SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis page： 727 - 731 2024.11

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/scw63240.2024.00101

Evaluating Claude Code's Coding and Test Automation for GPU Acceleration of a Legacy Fortran Application: A GeoFEM Case Study. Reviewed

Tetsuya Hoshino, Shun-ichiro Hayashi, Daichi Mukunoki, Takahiro Katagiri, Toshihiro Hanawa

Proc. the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) - The 1st International Workshop on Foundational Large Language Models Advances for HPC in Asia (LLM4HPCAsia 2026) page： 353 - 360 2026.1

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3784828.3785335

Other Link： https://dblp.uni-trier.de/db/conf/hpcasia/hpcasia2026w.html#HoshinoHMKH26
Sparse Iterative Solvers Using High-Precision Arithmetic with Quasi Multi-Word Algorithms. Reviewed

Daichi Mukunoki, Katsuhisa Ozaki

Proc. 2025 IEEE 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2025) page： 33 - 40 2025.12

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/MCSoC67473.2025.00016

Other Link： https://dblp.uni-trier.de/db/conf/mcsoc/mcsoc2025.html#MukunokiO25
DGEMM without FP64 Arithmetic - Using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme Reviewed Open Access

Daichi Mukunoki

Proc. the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) - ExHET'26: The Fifth International Workshop on Extreme Heterogeneity Solutions Vol. abs/2508.00441 page： 303 - 311 2025.8

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings)

As the demand for AI computation rapidly increases, more hardware is being developed to efficiently perform the low-precision matrix multiplications required by such workloads. However, these operations are generally not directly applicable to scientific computations due to accuracy requirements. The Ozaki scheme - an accurate matrix multiplication method proposed by Ozaki et al. in 2012 - enables FP64 matrix multiplication (DGEMM) using low-precision matrix multiplication units, such as FP16 Tensor Cores. This approach has since been extended to utilize integer arithmetic, offering lower computational cost compared to floating-point-based implementations. In fact, it has achieved higher performance than hardware FP64 operations on GPUs equipped with fast INT8 Tensor Cores designed for AI workloads. However, recent trends in AI-oriented processors have shifted toward improving the performance of low-precision floating-point operations, such as FP8, rather than integer operations. Motivated by this shift, this study revisits the use of low-precision floating-point operations in the Ozaki scheme. Specifically, we explore the use of FP8 Tensor Cores. In addition, for processors that support very slow or no hardware-based FP64 operations, we also consider FP64 arithmetic emulation based on integer arithmetic. This completely eliminates hardware FP64 instructions. Furthermore, we explore the use of blocking in the inner-product dimension to accelerate FP16-based implementations. We demonstrate the effectiveness of these methods by evaluating the performance on an NVIDIA RTX Blackwell architecture GPU.

DOI： 10.1145/3784828.3785017

Other Link： https://arxiv.org/pdf/2508.00441v3
An Algorithm Portfolio Approach for Parameter Tuning in Coherent Ising Machines. Reviewed

Tatsuro Hanyu, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino

Proc. 2025 Thirteenth International Symposium on Computing and Networking Workshops (CANDARW) - 17th International Workshop on Parallel and Distributed Algorithms and Applications (PDAA 2025) page： 142 - 148 2025

Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1109/CANDARW68385.2025.00032

Other Link： https://dblp.uni-trier.de/db/conf/candar/candar2025w.html#HanyuKMH25
Extension of accurate numerical algorithms for matrix multiplication based on error-free transformation Reviewed

Katsuhisa Ozaki, Daichi Mukunoki, Takeshi Ogita

Japan Journal of Industrial and Applied Mathematics Vol. 42 ( 1 ) page： 1 - 20 2024.10

Language：English Publishing type：Research paper (scientific journal) Publisher：Springer Science and Business Media LLC

DOI： 10.1007/s13160-024-00677-z

Other Link： https://link.springer.com/article/10.1007/s13160-024-00677-z/fulltext.html
Reduced-Precision and Reduced-Exponent Formats for Accelerating Adaptive Precision Sparse Matrix–Vector Product Reviewed Open Access

Stef Graillat, Fabienne Jézéquel, Theo Mary, Roméo Molina, Daichi Mukunoki

Lecture Notes in Computer Science Vol. 14803 page： 17 - 30 2024.8

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer Nature Switzerland

DOI： 10.1007/978-3-031-69583-4_2

Mixed-precision conjugate gradient algorithm using the groupwise update strategy Reviewed

Kensuke Aihara, Katsuhisa Ozaki, Daichi Mukunoki

Japan Journal of Industrial and Applied Mathematics 2024.2

Language：English Publishing type：Research paper (scientific journal) Publisher：Springer Science and Business Media LLC

DOI： 10.1007/s13160-024-00644-8

Other Link： https://link.springer.com/article/10.1007/s13160-024-00644-8/fulltext.html
Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor Reviewed

Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura

2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) page： 608 - 615 2023.12

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

DOI： 10.1109/mcsoc60832.2023.00094

Infinite-Precision Inner Product and Sparse Matrix-Vector Multiplication Using Ozaki Scheme with Dot2 on Manycore Processors Reviewed

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

Parallel Processing and Applied Mathematics page： 40 - 54 2023.4

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer International Publishing

DOI： 10.1007/978-3-031-30442-2_4

Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs Reviewed

Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura

Proc. 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) page： 234 - 241 2021.12

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings)

A Rapid Euclidean Norm Calculation Algorithm that Reduces Overflow and Underflow. Reviewed

Takeyuki Harayama, Shuhei Kudo, Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

Proc. The 2021 International Conference on Computational Science and Its Applications (ICCSA 2021), Lecture Notes in Computer Science Vol. 12949 page： 95 - 110 2021.9

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/978-3-030-86653-2_7

Other Link： https://dblp.uni-trier.de/db/conf/iccsa/iccsa2021-1.html#HarayamaKMIT21
Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme Reviewed Open Access

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

Proc. The 50th International Conference on Parallel Processing (ICPP-2021) ( 78 ) page： 1 - 11 2021.8

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings)

DOI： 10.1145/3472456.3472493

Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme. Reviewed Open Access

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Roman Iakymchuk

Proc. The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2021) page： 100 - 109 2021.1

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：ACM

DOI： 10.1145/3432261.3432270

Other Link： https://dblp.uni-trier.de/db/conf/hpcasia/hpcasia2021.html#MukunokiOOI21
Can We Avoid Rounding-Error Estimation in HPC Codes and Still Get Trustworthy Results? Reviewed

Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk

Proc. 13th International Workshop on Numerical Software Verification 2020 (NSV 20), Lecture Notes in Computer Science Vol. 12549 page： 163 - 177 2020.12

Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/978-3-030-63618-0_10

Other Link： https://dblp.uni-trier.de/db/conf/vstte/vstte2020.html#JezequelGMII20
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? Reviewed

Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka

Proc. 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021) page： 1056 - 1065 2020.10

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE

Matrix engines or units, in different forms and affinities, are becoming a reality in modern processors; CPUs and otherwise. The current and dominant algorithmic approach to Deep Learning merits the commercial investments in these units, and deduced from the No.1 benchmark in supercomputing, namely High Performance Linpack, one would expect an awakened enthusiasm by the HPC community, too.
Hence, our goal is to identify the practical added benefits for HPC and machine learning applications by having access to matrix engines. For this purpose, we perform an in-depth survey of software stacks, proxy applications and benchmarks, and historical batch job records. We provide a cost-benefit analysis of matrix engines, both asymptotically and in conjunction with state-of-the-art processors. While our empirical data will temper the enthusiasm, we also outline opportunities to misuse these dense matrix-multiplication engines if they come for free.

DOI： 10.1109/IPDPS49936.2021.00114

Other Link： https://dblp.uni-trier.de/db/conf/ipps/ipdps2021.html#DomkeVDCO0SMPWM21
Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs. Reviewed

Daichi Mukunoki, Takeshi Ogita

J. Comput. Appl. Math. Vol. 372 page： 112701 - 112701 2020.7

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (scientific journal) Publisher：Elsevier BV

DOI： 10.1016/j.cam.2019.112701

DGEMM Using Tensor Cores, and Its Accurate and Reproducible Versions Reviewed Open Access

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

Proc. ISC High Performance 2020, Lecture Notes in Computer Science Vol. 12151 page： 230 - 248 2020.6

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/978-3-030-50743-5_12

Other Link： https://dblp.uni-trier.de/db/conf/supercomputer/isc2020.html#MukunokiOOI20
Design of an FPGA-Based Matrix Multiplier with Task Parallelism. Reviewed Open Access

Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki

Proc. International Conference on Parallel Computing (ParCo2019), Parallel Computing: Technology Trends Vol. 36 page： 241 - 250 2019

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IOS Press

DOI： 10.3233/APC200047

Other Link： https://dblp.uni-trier.de/db/conf/parco/parco2019.html#TanIM19
Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-Core Architectures. Reviewed

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Proc. 13th International Conference on Parallel Processing and Applied Mathematics (PPAM2019), Lecture Notes in Computer Science Vol. 12043 page： 516 - 527 2019

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/978-3-030-43229-4_44

Other Link： https://dblp.uni-trier.de/db/conf/ppam/ppam2019-1.html#MukunokiOO19
Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster. Reviewed

Daichi Mukunoki, Toshiyuki Imamura

Proc. International Conference on Computational Science (ICCS 2018), Lecture Notes in Computer Science Vol. 10862 page： 853 - 858 2018.6

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/978-3-319-93713-7_85

Other Link： https://dblp.uni-trier.de/db/conf/iccS/iccS2018-3.html#MukunokiI18
Design Towards Modern High Performance Numerical LA Library Enabling Heterogeneity and Flexible Data Formats. Reviewed

Toshiyuki Imamura, Daichi Mukunoki, Yusuke Hirota, Susumu Yamada, Masahiko Machida

Proc. International Conference on Parallel Computing (ParCo2017), Advances in Parallel Computing page： 97 - 106 2017.9

Language：English Publishing type：Research paper (international conference proceedings) Publisher：IOS Press

DOI： 10.3233/978-1-61499-843-3-97

Other Link： https://dblp.uni-trier.de/db/conf/parco/parco2017.html#ImamuraMHYM17
Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer. Reviewed

Daichi Mukunoki, Toshiyuki Imamura

Proc. 12th International Conference on Parallel Processing and Applied Mathematics (PPAM2017), Lecture Notes in Computer Science Vol. 10777 page： 348 - 358 2017

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/978-3-319-78024-5_31

Other Link： https://dblp.uni-trier.de/db/conf/ppam/ppam2017-1.html#MukunokiI17
Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs. Reviewed

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16) page： 377 - 384 2016

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE Computer Society

DOI： 10.1109/MCSoC.2016.32

Other Link： https://dblp.uni-trier.de/db/conf/mcsoc/mcsoc2016.html#MukunokiIT16
Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation. Reviewed

Daichi Mukunoki, Toshiyuki Imamura

Proc. IEEE International Conference on Cluster Computing (Cluster 2016) page： 144 - 145 2016

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE Computer Society

DOI： 10.1109/CLUSTER.2016.77

Other Link： https://dblp.uni-trier.de/db/conf/cluster/cluster2016.html#MukunokiI16
Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs. Reviewed

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

Proc. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015) page： 642 - 650 2015

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE Computer Society

DOI： 10.1109/PDP.2015.66

Other Link： https://dblp.uni-trier.de/db/conf/pdp/pdp2015.html#MukunokiIT15
Implementation and Evaluation of Triple and Quadruple Precision Floating-point Operations on GPUs Reviewed Open Access

Vol. 6 ( 1 ) page： 66 - 77 2013.1

Authorship：Lead author,　Corresponding author Language：Japanese

Other Link： http://id.nii.ac.jp/1001/00089921/
Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs. Reviewed

Daichi Mukunoki, Daisuke Takahashi

Proc. 13th International Conference on Computational Science and Its Applications (ICCSA 2013), Part V, Lecture Notes in Computer Science Vol. 7975 page： 211 - 223 2013

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/978-3-642-39640-3_15

Other Link： https://dblp.uni-trier.de/db/conf/iccsa/iccsa2013-5.html#MukunokiT13
Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs. Reviewed

Daichi Mukunoki, Daisuke Takahashi

Proc. 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Part I, Workshop on Numerical Algorithms on Hybrid Architectures, Lecture Notes in Computer Science Vol. 8384 page： 632 - 642 2013

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/978-3-642-55224-3_59

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/ppam/ppam2013-1.html#MukunokiT13
Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs. Reviewed

Daichi Mukunoki, Daisuke Takahashi

Proc. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW 2012), The 13th Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-12) page： 1378 - 1386 2012

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：IEEE Computer Society

DOI： 10.1109/IPDPSW.2012.175

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/ipps/ipdps2012w.html#MukunokiT12
Implementation and Evaluation of Quadruple and Octuple Precision BLAS on GPUs Reviewed Open Access

Vol. 2011 ( 2011 ) page： 148 - 156 2011.1

Authorship：Lead author,　Corresponding author Language：Japanese

Open Access

CiNii Research

researchmap
Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs. Reviewed

Daichi Mukunoki, Daisuke Takahashi

Proc. 10th International Conference on Applied Parallel and Scientific Computing (PARA 2010), Part I, Lecture Notes in Computer Science Vol. 7133 page： 249 - 259 2010

Authorship：Lead author,　Corresponding author Language：English Publishing type：Research paper (international conference proceedings) Publisher：Springer

DOI： 10.1007/978-3-642-28151-8_25

researchmap

Other Link： https://dblp.uni-trier.de/db/conf/para/para2010-1.html#MukunokiT10
Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards

Ryo Mikasa, Shun-ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

2026.2

Large language models (LLMs) have demonstrated strong code generation capabilities, yet the runtime performance of generated code is not guaranteed, and there have been few attempts to train LLMs using runtime performance as a reward in the HPC domain. We propose an online reinforcement learning approach that executes LLM-generated code on a supercomputer and directly feeds back the measured runtime performance (GFLOPS) as a reward. We further introduce a Staged Quality-Diversity (SQD) algorithm that progressively varies the permitted optimization techniques on a per-problem basis, enabling the model to learn code optimization from diverse perspectives. We build a distributed system connecting a GPU training cluster with a CPU benchmarking cluster, and train Qwen2.5 Coder 14B on a double-precision matrix multiplication task using Group Relative Policy Optimization (GRPO). Through two experiments, we show that reinforcement learning combining runtime performance feedback with staged optimization can improve the HPC code generation capability of LLMs.

arXiv

researchmap

Other Link： https://arxiv.org/pdf/2602.12049v1
Learning-Augmented Performance Model for Tensor Product Factorization in High-Order FEM

Xuanzhengbo Ren, Yuta Kawai, Tetsuya Hoshino, Hirofumi Tomita, Takahiro Katagiri, Daichi Mukunoki, Seiya Nishizawa

2026.1

Accurate performance prediction is essential for optimizing scientific applications on modern high-performance computing (HPC) architectures. Widely used performance models primarily focus on cache and memory bandwidth, which is suitable for many memory-bound workloads. However, it is unsuitable for highly arithmetic intensive cases such as the sum-factorization with tensor $n$-mode product kernels, which are an optimization technique for high-order finite element methods (FEM). On processors with relatively high single instruction multiple data (SIMD) instruction latency, such as the Fujitsu A64FX, the performance of these kernels is strongly influenced by loop-body splitting strategies. Memory-bandwidth-oriented models are therefore not appropriate for evaluating these splitting configurations, and a model that directly reflects instruction-level efficiency is required. To address this need, we develop a dependency-chain-based analytical formulation that links loop-splitting configurations to instruction dependencies in the tensor $n$-mode product kernel. We further use XGBoost to estimate key parameters in the analytical model that are difficult to model explicitly. Evaluations show that the learning-augmented model outperforms the widely used standard Roofline and Execution-Cache-Memory (ECM) models. On the Fujitsu A64FX processor, the learning-augmented model achieves mean absolute percentage errors (MAPE) between 1% and 24% for polynomial orders ($P$) from 1 to 15. In comparison, the standard Roofline and ECM models yield errors of 42%-256% and 5%-117%, respectively. On the Intel Xeon Gold 6230 processor, the learning-augmented model achieves MAPE values from 1% to 13% for $P$=1 to $P$=14, and 24% at $P$=15. In contrast, the standard Roofline and ECM models produce errors of 1%-73% and 8%-112% for $P$=1 to $P$=15, respectively.

arXiv

researchmap

Other Link： https://arxiv.org/pdf/2601.06886v1
Single-precision Matrix Multiplication Performance on Cerebras CS-2: Evaluation and Modelling of Performance, Scalability and Energy Efficiency Reviewed Open Access

Takaaki Miyajima, Ryunosuke Matsuzaki, Daichi Mukunoki

Journal of Information Processing Vol. 34 page： 132 - 139 2026

Language：English Publishing type：Research paper (scientific journal) Publisher：Information Processing Society of Japan

DOI： 10.2197/ipsjjip.34.132

Open Access

researchmap
Sparse Iterative Solvers Using High-Precision Arithmetic with Quasi Multi-Word Algorithms

Daichi Mukunoki, Katsuhisa Ozaki

CoRR Vol. abs/2510.13536 2025.10

Publishing type：Research paper (scientific journal)

To obtain accurate results in numerical computation, high-precision arithmetic is a straightforward approach. However, most processors lack hardware support for floating-point formats beyond double precision (FP64). Double-word arithmetic (Dekker 1971) extends precision by using standard floating-point operations to represent numbers with twice the mantissa length. Building on this concept, various multi-word arithmetic methods have been proposed to further increase precision by combining additional words. Simplified variants, known as quasi algorithms, have also been introduced, which trade a certain loss of accuracy for reduced computational cost. In this study, we investigate the performance of quasi algorithms for double- and triple-word arithmetic in sparse iterative solvers based on the Conjugate Gradient method, and compare them with both non-quasi algorithms and standard FP64. We evaluate execution time on an x86 processor, the number of iterations to convergence, and solution accuracy. Although quasi algorithms require appropriate normalization to preserve accuracy - without it, convergence cannot be achieved - they can still reduce runtime when normalization is applied correctly, while maintaining accuracy comparable to full multi-word algorithms. In particular, quasi triple-word arithmetic can yield more accurate solutions without significantly increasing execution time relative to double-word arithmetic and its quasi variant. Furthermore, for certain problems, a reduction in iteration count contributes to additional speedup. Thus, quasi triple-word arithmetic can serve as a compelling alternative to conventional double-word arithmetic in sparse iterative solvers.

DOI： 10.48550/arXiv.2510.13536

arXiv

researchmap

Other Link： https://arxiv.org/pdf/2510.13536v1
3Dify: a Framework for Procedural 3D-CG Generation Assisted by LLMs Using MCP and RAG

Shun-ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Satoshi Ohshima, Takahiro Katagiri

CoRR Vol. abs/2510.04536 2025.10

Publishing type：Research paper (scientific journal)

This paper proposes "3Dify," a procedural 3D computer graphics (3D-CG) generation framework utilizing Large Language Models (LLMs). The framework enables users to generate 3D-CG content solely through natural language instructions. 3Dify is built upon Dify, an open-source platform for AI application development, and incorporates several state-of-the-art LLM-related technologies such as the Model Context Protocol (MCP) and Retrieval-Augmented Generation (RAG). For 3D-CG generation support, 3Dify automates the operation of various Digital Content Creation (DCC) tools via MCP. When DCC tools do not support MCP-based interaction, the framework employs the Computer-Using Agent (CUA) method to automate Graphical User Interface (GUI) operations. Moreover, to enhance image generation quality, 3Dify allows users to provide feedback by selecting preferred images from multiple candidates. The LLM then learns variable patterns from these selections and applies them to subsequent generations. Furthermore, 3Dify supports the integration of locally deployed LLMs, enabling users to utilize custom-developed models and to reduce both time and monetary costs associated with external API calls by leveraging their own computational resources.

DOI： 10.48550/arXiv.2510.04536

arXiv

researchmap

Other Link： https://arxiv.org/pdf/2510.04536v1
VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs

Shun-ichiro Hayashi, Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

CoRR Vol. abs/2510.00031 2025.9

Publishing type：Research paper (scientific journal)

We propose VibeCodeHPC, an automatic tuning system for HPC programs based on multi-agent LLMs for code generation. VibeCodeHPC tunes programs through multi-agent role allocation and iterative prompt refinement. We describe the system configuration with four roles: Project Manager (PM), System Engineer (SE), Programmer (PG), and Continuous Delivery (CD). We introduce dynamic agent deployment and activity monitoring functions to facilitate effective multi-agent collaboration. In our case study, we convert and optimize CPU-based matrix-matrix multiplication code written in C to GPU code using CUDA. The multi-agent configuration of VibeCodeHPC achieved higher-quality code generation per unit time compared to a solo-agent configuration. Additionally, the dynamic agent deployment and activity monitoring capabilities facilitated more effective identification of requirement violations and other issues.

DOI： 10.48550/arXiv.2510.00031

arXiv

researchmap

Other Link： https://arxiv.org/pdf/2510.00031v1
Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach

Tatsuro Hanyu, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino

CoRR Vol. abs/2507.20295 2025.7

Publishing type：Research paper (scientific journal)

Coherent Ising Machines (CIMs) have recently gained attention as a promising computing model for solving combinatorial optimization problems. In particular, the Chaotic Amplitude Control (CAC) algorithm has demonstrated high solution quality, but its performance is highly sensitive to a large number of hyperparameters, making efficient tuning essential. In this study, we present an algorithm portfolio approach for hyperparameter tuning in CIMs employing Chaotic Amplitude Control with momentum (CACm) algorithm. Our method incorporates multiple search strategies, enabling flexible and effective adaptation to the characteristics of the hyperparameter space. Specifically, we propose two representative tuning methods, Method A and Method B. Method A optimizes each hyperparameter sequentially with a fixed total number of trials, while Method B prioritizes hyperparameters based on initial evaluations before applying Method A in order. Performance evaluations were conducted on the Supercomputer "Flow" at Nagoya University, using planted Wishart instances and Time to Solution (TTS) as the evaluation metric. Compared to the baseline performance with best-known hyperparameters, Method A achieved up to 1.47x improvement, and Method B achieved up to 1.65x improvement. These results demonstrate the effectiveness of the algorithm portfolio approach in enhancing the tuning process for CIMs.

DOI： 10.48550/arXiv.2507.20295

arXiv

researchmap

Other Link： https://arxiv.org/pdf/2507.20295v1
Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation

Daichi Mukunoki, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri

CoRR Vol. abs/2507.04697 2025.7

Publishing type：Research paper (scientific journal)

Generative AI technology based on Large Language Models (LLMs) has been developed and applied to assist with or automatically generate program code. In this paper, we evaluate the capability of existing general-purpose LLMs for Basic Linear Algebra Subprograms (BLAS) code generation for CPUs. We use two LLMs provided by OpenAI: GPT-4.1, a Generative Pre-trained Transformer (GPT) model, and o4-mini, one of the o-series of reasoning models. Both were released in April 2025. For routines from level-1 to level-3 BLAS, we tried to generate (1) C code without optimization from the routine name only, (2) C code with basic performance optimizations (thread parallelization, SIMD vectorization, and cache blocking) from the routine name only, and (3) C code with basic performance optimizations based on Fortran reference code. As a result, we found that correct code can be generated in many cases even when only the routine name is given. We also confirmed that thread parallelization with OpenMP, SIMD vectorization, and cache blocking can be implemented to some extent, and that the generated code is faster than the reference code.

DOI： 10.48550/arXiv.2507.04697

arXiv

researchmap

Other Link： https://arxiv.org/pdf/2507.04697v1
Application of AT to Parameter Tuning in Coherent Ising Machines

羽生達郎, 片桐孝洋, 森下誠, 高橋一郎, 河合直聡, 椋木大地, 星野哲也, 永井亨

Proceedings of the Conference on Computational Engineering and Science (CD-ROM) Vol. 30 page： 957 - 960 2025.6

Language：Japanese

J-GLOBAL

researchmap

Other Link： https://ndlsearch.ndl.go.jp/books/R000000004-I034175077
A Study on Numerical Code Implementation Support by GPT Models Using BLAS Code as a Case Study

椋木大地, 林俊一郎, 星野哲也, 片桐孝洋

IPSJ SIG Technical Reports (Web) Vol. 2025 ( HPC-200 ) 2025

J-GLOBAL

researchmap
Construction and Evaluation of a Deep-Learning-Based Execution Time Prediction Model for Sparse Iterative Solvers

中谷崇真, 河合直聡, 片桐孝洋, 星野哲也, 永井亨, 椋木大地

IPSJ SIG Technical Reports (Web) Vol. 2025 ( HPC-199 ) 2025

J-GLOBAL

researchmap
A Trial on Optimizing Test Sequences for LAPACK Eigenvalue Computation Routines Using Machine Learning

樫村寛大, 片桐孝洋, 森崎修司, 星野哲也, 椋木大地

IPSJ SIG Technical Reports (Web) Vol. 2025 ( HPC-201 ) 2025

J-GLOBAL

researchmap
Proposal of a Method with Selectable Search Algorithms for Performance Parameter Optimization of Coherent Ising Machines

羽生達郎, 森下誠, 水木直也, 片桐孝洋, 椋木大地, 河合直聡, 星野哲也, 永井亨

IPSJ SIG Technical Reports (Web) Vol. 2025 ( HPC-198 ) 2025

J-GLOBAL

researchmap
Performance Evaluation of Multiple Quantum-Inspired Annealers for SVM-Based Classification with Errors

水木直也, 森下誠, 河合直聡, 片桐孝洋, 椋木大地, 星野哲也, 永井亨

IPSJ SIG Technical Reports (Web) Vol. 2025 ( HPC-198 ) 2025

J-GLOBAL

researchmap
Proposal of 3Dify, a Procedural 3D Generation LLM Agent Using MCP and RAG, and Its Use of Supercomputers

林俊一郎, 椋木大地, 片桐孝洋, 星野哲也, 大島聡史

IPSJ SIG Technical Reports (Web) Vol. 2025 ( HPC-200 ) 2025

J-GLOBAL

researchmap
Evaluation of GPU Code Development with Claude Code Targeting GeoFEM

星野哲也, 林俊一郎, 椋木大地, 片桐孝洋, 塙敏博

IPSJ SIG Technical Reports (Web) Vol. 2025 ( HPC-201 ) 2025

J-GLOBAL

researchmap
VibeCodeHPC: A Multi-LLM Agent System for Auto-Tuning HPC Codes

林俊一郎, 森田光貴, 椋木大地, 星野哲也, 片桐孝洋

IPSJ SIG Technical Reports (Web) Vol. 2025 ( ARC-263 ) 2025

J-GLOBAL

researchmap
Prototype of a Multi-Agent System for Automatic Code Optimization Using gpt-oss-120b

椋木大地, 森田光貴, 林俊一郎, 三笠諒, 星野哲也, 片桐孝洋

IPSJ SIG Technical Reports (Web) Vol. 2025 ( ARC-263 ) 2025

J-GLOBAL

researchmap
csDF: Implementation of a Pseudo Double-Precision Floating-Point Arithmetic Library for Cerebras CS-2

村上魁, 長島令旺, 中村暁, 松崎竜之介, 吉井一友, 椋木大地, 宮島敬明

IPSJ SIG Technical Reports (Web) Vol. 2025 ( ARC-263 ) 2025

J-GLOBAL

researchmap
Application of Sextuple-Precision Operations using Quasi Triple-Word Arithmetic to Sparse Iterative Solvers Open Access

椋木大地, 尾崎克久

IPSJ SIG Technical Reports (Web) Vol. 2024-HPC-197 ( 11 ) page： 1 - 7 2024.12

Authorship：Lead author Language：Japanese Publishing type：Research paper (conference, symposium, etc.)

Open Access

J-GLOBAL

researchmap
Performance Evaluation of Adaptive-Precision SpMV with Reduced-Precision Formats

Stef Graillat, Fabienne Jézéquel, Théo Mary, Roméo Molina, Daichi Mukunoki

HAL Vol. hal-04261073 2023.10

Language：English Publishing type：Research paper (other academic)

researchmap
Task Scheduling Strategies for Batched BLAS on CPUs

椋木大地, 廣田悠輔, 今村俊幸

Proceedings of the JSIAM Annual Meeting (CD-ROM) Vol. 2021 2022

J-GLOBAL

researchmap
Infinite-Precision Inner Product Using the Ozaki Scheme and Its Application to Reproducible Sparse Iterative Solvers

椋木大地, 尾崎克久, 荻田武史, 今村俊幸

Proceedings of the JSIAM Annual Meeting (CD-ROM) Vol. 2022 2022

J-GLOBAL

researchmap
Application of Error-Free Transformation of Matrix Multiplication with Unequal Splitting to High-Precision Computation

尾崎克久, 椋木大地, 荻田武史

Proceedings of the JSIAM Annual Meeting (CD-ROM) Vol. 2022 2022

J-GLOBAL

researchmap
A mixed-precision algorithm of the CG method using the group-wise update strategy

AIHARA Kensuke, OZAKI Katsuhisa, MUKUNOKI Daichi

Conference Proceedings. JSST Annual International Conference on Simulation Technology (Web) Vol. 41st 2022

J-GLOBAL

researchmap
Acceleration of Error-Free Transformation of Matrix Multiplication using GPU Tensor Cores

OZAKI Katsuhisa, MUKUNOKI Daichi, OGITA Takeshi

International Conference on Simulation Technology (CD-ROM) Vol. 40th 2021

J-GLOBAL

researchmap
White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing

Roman Iakymchuk, Daichi Mukunoki, Artur Podobas, Fabienne Jézéquel, Toshiyuki Imamura, Norihisa Fujita, Jens Huthmann, Shuhei Kudo, Yiyu Tan, Jens Domke, Kai Torben Ohlhus, Takeshi Fukaya, Takeo Hoshi, Yuki Murakami, Maho Nakata, Takeshi Ogita, Kentaro Sano, Taisuke Boku

CoRR Vol. abs/2004.04628 2020.4

Language：English Publishing type：Research paper (conference, symposium, etc.)

In numerical computations, precision of floating-point computations is a key factor to determine the performance (speed and energy-efficiency) as well as the reliability (accuracy and reproducibility). However, precision generally plays a contrary role for both. Therefore, the ultimate concept for maximizing both at the same time is the minimal-precision computing through precision-tuning, which adjusts the optimal precision for each operation and data. Several studies have already been conducted on it so far (e.g. Precimonious and Verrou), but the scope of those studies is limited to the precision-tuning alone. Hence, we aim to propose a broader concept of the minimal-precision computing system with precision-tuning, involving both hardware and software stack. In 2019, we started the Minimal-Precision Computing project to propose a broader concept of the minimal-precision computing system with precision-tuning, involving both hardware and software stack. Specifically, our system combines (1) a precision-tuning method based on Discrete Stochastic Arithmetic (DSA), (2) arbitrary-precision arithmetic libraries, (3) fast and accurate numerical libraries, and (4) Field-Programmable Gate Array (FPGA) with High-Level Synthesis (HLS). In this white paper, we aim to provide an overview of various technologies related to minimal- and mixed-precision, to outline the future direction of the project, as well as to discuss current challenges together with our project members and guest speakers at the LSPANC 2020 workshop; https://www.r-ccs.riken.jp/labs/lpnctrt/lspanc2020jan/.

arXiv

researchmap

Other Link： https://dblp.uni-trier.de/db/journals/corr/corr2004.html#abs-2004-04628
Error-Free Transformation of Matrix Multiplication Using Single-Precision Arithmetic and Tensor Cores on GPUs

尾崎克久, 椋木大地, 荻田武史

Proceedings of the JSIAM Annual Meeting (CD-ROM) Vol. 2020 2020

J-GLOBAL

researchmap
An Accurate and Fast 2-Norm Computation Method That Suppresses Overflow and Underflow Open Access

原山赳幸, 工藤周平, 椋木大地, 今村俊幸, 高橋大介

IPSJ SIG Technical Reports (Web) Vol. 2020 ( HPC-177 ) 2020

Open Access

J-GLOBAL

researchmap
Quadruple-Precision Matrix Multiplication in binary128 Using the Ozaki Scheme

椋木大地, 尾崎克久, 荻田武史

Proceedings of the JSIAM Annual Meeting (CD-ROM) Vol. 2020 2020

J-GLOBAL

researchmap
Accurate and Reproducible BLAS Implementation Based on the Ozaki Scheme

椋木大地, 荻田武史, 尾崎克久, 今村俊幸

Proceedings of the JSIAM Annual Meeting (CD-ROM) Vol. 2019 2019

J-GLOBAL

researchmap
Implementation and Optimization of Accurate and Reproducible BLAS Routines Using a Level-3 BLAS-Based Accurate Matrix Multiplication Method Open Access

椋木大地, 荻田武史, 尾崎克久

IPSJ SIG Technical Reports (Web) Vol. 2018 ( HPC-166 ) 2018

Open Access

J-GLOBAL

researchmap
Implementation and Evaluation of Distributed Parallel Matrix Multiplication Using a 2.5D Algorithm on the K Computer Open Access

椋木大地, 今村俊幸

IPSJ SIG Technical Reports (Web) Vol. 2017 ( HPC-159 ) 2017

Open Access

J-GLOBAL

researchmap
KMATHLIB: High Performance and Scalable Numerical Library for the K Computer

大井祥栄, 廣田悠輔, 椋木大地, 今村俊幸

Proceedings of the JSIAM Annual Meeting (CD-ROM) Vol. 2016 2016

J-GLOBAL

researchmap
Performance Evaluation of Verified Numerical Computation for Systems of Linear Equations on Large-Scale Parallel Computers Open Access

森倉悠介, 椋木大地, 深谷猛, 山中脩也, 大石進一

IPSJ SIG Technical Reports (Web) Vol. 2016 ( HPC-157 ) 2016

Open Access

J-GLOBAL

researchmap
Implementation and Evaluation of an Eigenvalue Solver Optimized for Consumer-Grade GPUs Open Access

今村俊幸, 椋木大地

IPSJ SIG Technical Reports (Web) Vol. 2016 ( HPC-157 ) 2016

Open Access

J-GLOBAL

researchmap
Introduction of Research Activities for GPU Computing at Large-scale Parallel Numerical Computing Technology Research Team on AICS

MUKUNOKI Daichi, IMAMURA Toshiyuki, TAKAHASHI Daisuke

Plans and Future for International Collaborations on Extreme Scale Computing, 6th AICS International Symposium (RIKEN Symposium) 2016

J-GLOBAL

researchmap
Performance Evaluation of the Fastest GPU Eigenvalue Solver by Selecting among CUDA-BLAS and Other Libraries Open Access

今村俊幸, 椋木大地, 山田進, 町田昌彦

IPSJ SIG Technical Reports (Web) Vol. 2015 ( HPC-148 ) 2015

Open Access

J-GLOBAL

researchmap
Accumulated Errors in Time-Evolution Problems Using FFT

佐々成正, 山田進, 町田昌彦, 椋木大地, 今村俊幸

Proceedings of the JSIAM Annual Meeting (CD-ROM) Vol. 2015 2015

J-GLOBAL

researchmap
Auto-Tuning of GEMV Kernels on NVIDIA GPUs

椋木大地, 今村俊幸, 高橋大介

Proceedings of the Conference on Computational Engineering and Science (CD-ROM) Vol. 20 2015

J-GLOBAL

researchmap
A Study of Short-Length Floating-Point Formats Open Access

椋木大地, 今村俊幸

IPSJ SIG Technical Reports (Web) Vol. 2015 ( HPC-152 ) 2015

Open Access

J-GLOBAL

researchmap
Acceleration of Double-Double Precision Arithmetic on the K Computer and FX10 Open Access

佐々木信一, 菱沼利彰, 藤井昭宏, 田中輝雄, 椋木大地, 今村俊幸

IPSJ SIG Technical Reports (Web) Vol. 2015 ( HPC-151 ) 2015

Open Access

J-GLOBAL

researchmap
Multi-GPU Implementation of SYMV and GEMV Routines and Its Evaluation Open Access

今村俊幸, 椋木大地, 山田進, 町田昌彦

IPSJ SIG Technical Reports (Web) Vol. 2015 ( HPC-151 ) 2015

Open Access

J-GLOBAL

researchmap
An Automatic Thread-Count Selection Method for Memory-Bound BLAS Kernels on NVIDIA GPUs Open Access

椋木大地, 今村俊幸, 高橋大介

IPSJ SIG Technical Reports (Web) Vol. 2015 ( HPC-150 ) 2015

Open Access

J-GLOBAL

researchmap
Implementation and Evaluation of CUDA-xSYMV Open Access

今村俊幸, 椋木大地, 山田進, 町田昌彦

IPSJ SIG Technical Reports (Web) Vol. 2014 ( HPC-146 ) 2014

Open Access

J-GLOBAL

researchmap
Implementation and Evaluation of DGEMM Using Pseudo Double-Precision Arithmetic on Maxwell Architecture GPUs Open Access

椋木大地, 今村俊幸

IPSJ SIG Technical Reports (Web) Vol. 2014 ( ARC-213 ) 2014

Open Access

J-GLOBAL

researchmap
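Fast Implementation of Sparse Matrix-Vector Multiplication for the CRS Format on GPUs Open Access

椋木大地, 高橋大介

IPSJ SIG Technical Reports: High Performance Computing (HPC) Vol. 2013 ( 5 ) page： 1 - 7 2013.2

Language：Japanese

Sparse matrix-vector multiplication (SpMV) is an important basic operation used frequently in scientific computing. This paper reports on a fast implementation of SpMV for the CRS format on GPUs. We target NVIDIA's Kepler architecture and implement the kernel in the CUDA 5.0 environment. Building on implementation techniques proposed for GPUs up to the earlier Fermi architecture, we optimize the kernel by exploiting the features newly supported in the Kepler architecture and its specification changes. In a performance evaluation on a Kepler-architecture Tesla K20 using 200 matrices, our implementation was on average about 1.86 times faster than the double-precision CRS-format SpMV routine of cuSPARSE bundled with CUDA 5.0, and improved performance for 177 of the matrices.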

Open Access

CiNii Research

researchmap
Accelerating Krylov Subspace Methods Using Quadruple-Precision Floating-Point Arithmetic on GPUs Open Access

椋木大地, 高橋大介

IPSJ SIG Technical Reports (Web) Vol. 2013 ( HPC-140 ) 2013

Open Access

J-GLOBAL

researchmap
Implementation and Evaluation of Triple and Quadruple Precision Floating-point Operations on GPUs

椋木大地, 高橋大介

IPSJ Transactions (CD-ROM) Vol. 2012 ( 2 ) 2013

J-GLOBAL

researchmap
Fast Implementation of Sparse Matrix-Vector Multiplication for the CRS Format on GPUs

椋木大地, 高橋大介

IPSJ SIG Technical Reports (CD-ROM) Vol. 2012 ( 6 ) 2013

J-GLOBAL

researchmap
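Implementation and Evaluation of Sparse Iterative Solvers Using Quadruple-Precision Arithmetic on GPUs Open Access

椋木大地, 高橋大介

IPSJ SIG Technical Reports: High Performance Computing (HPC) Vol. 2012 ( 37 ) page： 1 - 8 2012.12

Language：Japanese

Krylov subspace methods, which are used as iterative solvers for sparse matrices, can require more iterations to converge, or fail to converge at all, because of rounding errors. It has been reported that high-precision arithmetic can improve convergence in such cases. If the time saved by reducing the iteration count outweighs the increase in time per iteration caused by high-precision arithmetic, the total time to solution may be shortened. We implemented the BiCGStab method, one of the Krylov subspace methods, on a GPU (Tesla M2050) in quadruple precision using Double-Double (DD) arithmetic and evaluated its performance. On the GPU, the time per iteration of quadruple-precision BiCGStab was about 1.0-2.2 times that of double precision, and depending on how much the iteration count was reduced, there were cases in which quadruple-precision arithmetic shortened the time to solution. This paper examines the performance and effectiveness of quadruple-precision arithmetic in sparse iterative solvers on GPUs.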

Open Access

CiNii Research

researchmap
A Study of Triple-Precision Floating-Point Arithmetic on GPUs Open Access

椋木大地, 高橋大介

IPSJ SIG Technical Reports (CD-ROM) Vol. 2011 ( 4 ) 2011

Open Access

J-GLOBAL

researchmap
Implementation and Evaluation of Quadruple Precision BLAS on GPUs

椋木大地, 高橋大介

Proceedings of the Conference on Computational Engineering and Science Vol. 15 ( 2 ) 2010

J-GLOBAL

researchmap
Implementation and Evaluation of Quadruple Precision BLAS on GPU Open Access

椋木大地, 高橋大介

IPSJ SIG Technical Reports (CD-ROM) Vol. 2009 ( 4 ) 2009

Open Access

J-GLOBAL

researchmap

Presentations 287

Toward Automatic Generation of High Performance Numerical Codes by LLMs International conference

Daichi Mukunoki, Koki Morita, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri

SIAM Conference on Parallel Processing for Scientific Computing (PP26) 2026.3

Event date： 2026.3

Language：English Presentation type：Oral presentation (general)
Development of Code-Generation AI Agents for High-Performance Computing

椋木大地

MateriAI 2025: Applications of AI Technology in Computational Materials Science 2026.2.2

Event date： 2026.2

Language：Japanese Presentation type：Oral presentation (general)
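Prospects for Porting HPC Codes to GPUs Using Generative AI

椋木大地

Opinion Exchange Meeting on the "Recommendations on Developing Software Environments with a View to Next-Generation Computing Infrastructure and Fostering the Personnel Who Will Support Them" 2026.1.21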

Event date： 2026.1

Language：Japanese Presentation type：Oral presentation (general)
Verification of the Effectiveness of Deep Learning in Preprocessing Parameter Estimation for the Conjugate Gradient Method International conference

Takamasa Nakaya, Takahiro Katagiri, Tetsuya Hoshino, Daichi Mukunoki, Masatoshi Kawai

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
Performance Evaluation of SVM with Multiple Quantum-inspired Annealers International conference

Naoya Mizuki, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
Evaluation of the Capability of Coding AI in Generating SYCL-Based Numerical Computation Codes for Intel GPUs International conference

Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
A Multi Agent System for Local LLM-Based HPC Code Generation International conference

Ryo Mikasa, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
Proposal of The AI Scientist v2 for High Performance Computing with Local Large Language Models International conference

Takanori Kotama, Rio Yokota, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
DGEMM using FP64 Arithmetic Emulation and FP8 Tensor Cores with Ozaki Scheme International conference

Daichi Mukunoki

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
VibeCodeHPC: A Multi-LLM Agent Auto-Tuner for HPC Codes International conference

Shun-Ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
GPU Acceleration of Medical Image Representation Learning Models with Distributed Data Parallel and I/O Optimization International conference

Koki Isobe, Daichi Mukunoki, Masahiro Oda, Tetsuya Oda, Kensaku Mori, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
A Trial on Optimizing Test Sequences for LAPACK Eigenvalue Computation Routines using Machine Learning International conference

Hiroto Kashimura, Takahiro Katagiri, Shuji Morisaki, Daichi Mukunoki, Tetsuya Hoshino

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
DGEMM using FP64 Arithmetic Emulation and FP8 Tensor Cores with Ozaki Scheme International conference

Daichi Mukunoki

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Event date： 2026.1

Language：English Presentation type：Poster presentation
Verification of the Effectiveness of Deep Learning in Preprocessing Parameter Estimation for the Conjugate Gradient Method International conference

Takamasa Nakaya, Takahiro Katagiri, Tetsuya Hoshino, Daichi Mukunoki, Masatoshi Kawai

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Event date： 2026.1

Language：English Presentation type：Poster presentation
Evaluation of the Capability of Coding AI in Generating SYCL-Based Numerical Computation Codes for Intel GPUs International conference

Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Event date： 2026.1

Language：English Presentation type：Poster presentation
Performance Evaluation of SVM with Multiple Quantum-inspired Annealers International conference

Naoya Mizuki, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
A Multi Agent System for Local LLM-Based HPC Code Generation International conference

Ryo Mikasa, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
Proposal of The AI Scientist v2 for High Performance Computing with Local Large Language Models International conference

Takanori Kotama, Rio Yokota, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
GPU Acceleration of Medical Image Representation Learning Models with Distributed Data Parallel and I/O Optimization International conference

Koki Isobe, Daichi Mukunoki, Masahiro Oda, Tetsuya Oda, Kensaku Mori, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
VibeCodeHPC: A Multi-LLM Agent Auto-Tuner for HPC Codes International conference

Shun-Ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

Event date： 2026.1

Language：English Presentation type：Poster presentation
Hardware in the AI Era and FP64 Emulation

椋木大地

17th Symposium on Automatic Tuning Technology and its Applications (ATTA2025) 2025.12.23

Event date： 2025.12

Language：Japanese Presentation type：Oral presentation (general)
Automatic Generation of Numerical Codes for GPUs Using LLMs International conference

Daichi Mukunoki

JHPCN Field Workshop: State-of-the-Art in Code Generative AI for High-Performance Computing 2025.12.5

Event date： 2025.12

Language：English Presentation type：Oral presentation (general)
Automatic Generation and GPU Porting of Numerical Computation Codes Using Generative AI International conference

Daichi Mukunoki

58th ASE Seminar 2025.12.1

Event date： 2025.12

Language：English Presentation type：Oral presentation (general)
csDF: a double-float arithmetic library for the Cerebras CS-2 International conference

Reo Nagashima, Akeru Nakamura, Kai Murakami, Ryunosuke Matsuzaki, Daichi Mukunoki, Takaaki Miyajima

SC25 research poster session 2025.11.16

Event date： 2025.11

Language：English Presentation type：Poster presentation
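Development Status of VibeCodeHPC, an LLM-Based Automatic Code Optimizer, and the Advantage of Multi-Agent Systems Demonstrated by Experiments

林俊一郎、森田光貴、椋木大地、星野哲也、片桐孝洋

Institute for Solid State Physics Software Development and Advancement Project Workshop: Development and Dissemination of Open-Source Software Supporting the Advancement of Computational Materials Science 2025.10.20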

Event date： 2025.10

Language：Japanese Presentation type：Poster presentation
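Challenges and Prospects of Automatic Generation and Automatic Performance Optimization of Numerical Codes Using LLMs

椋木大地、林俊一郎、星野哲也、森田光貴、片桐孝洋

Institute for Solid State Physics Software Development and Advancement Project Workshop: Development and Dissemination of Open-Source Software Supporting the Advancement of Computational Materials Science 2025.10.20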

Event date： 2025.10

Language：Japanese Presentation type：Poster presentation
Challenges and Prospects of Automatic Generation of Numerical and HPC Codes Using Generative AI

林俊一郎、椋木大地

2nd Materials Science Application Open Forum of FY2025 2025.9.29

Event date： 2025.9

Language：Japanese Presentation type：Oral presentation (general)
Challenges and Prospects in Automatic Generation of HPC Codes Using Generative AI International conference

Daichi Mukunoki

The 6th "FugakuNEXT" Application Seminar 2025.9.25

Event date： 2025.9

Language：English Presentation type：Oral presentation (general)
A Study of the Ability of General-Purpose LLMs to Automatically Generate BLAS Code

椋木大地

6th Supercomputer "Flow" Users' Meeting 2025.9.11

Event date： 2025.9

Language：Japanese Presentation type：Oral presentation (general)
Accelerating Lung Field Segmentation for COVID-19 Diagnosis Support Using a GPU-Equipped Supercomputer

湯淺義尚、小田昌宏、椋木大地、片桐孝洋、星野哲也、河合直聡、永井亨、森健策

44th Annual Meeting of the Japanese Society of Medical Imaging Technology (JAMIT 2025) 2025.8.28

Event date： 2025.8

Language：Japanese Presentation type：Poster presentation
HPC-GENIE: High-Performance Computing with Generative Neural Intelligence for Execution

林俊一郎、椋木大地、星野哲也、片桐孝洋

xSIG 2025 2025.8

Event date： 2025.8

Language：Japanese Presentation type：Poster presentation
A Study on BLAS Code Generation by LLMs

椋木大地

33rd Auto-Tuning Research Group Open Academic Session (ATOS33) 2025.7.28

Event date： 2025.7

Language：Japanese Presentation type：Oral presentation (general)
Toward Innovation in HPC Code Development with Generative AI: Efforts and Prospects of the HPC-GENIE Project

椋木大地

6th Lecture Meeting hosted by the IPSJ Tokai Branch 2025.1.9

Event date： 2025.1

Language：Japanese Presentation type：Oral presentation (general)
Multiple- and Mixed-Precision BLAS with C++ Template International conference

Toshiyuki Imamura, Daichi Mukunoki, Atsushi Suzuki

10th International Congress on Industrial and Applied Mathematics (ICIAM 2023) 2023.8.24

Event date： 2023.8

Language：English Presentation type：Oral presentation (general)
Reduced-Precision Data Representation on Sparse Matrix-Vector Multiplications International conference

Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura

10th International Congress on Industrial and Applied Mathematics (ICIAM 2023) 2023.8.21

Event date： 2023.8

Language：English Presentation type：Oral presentation (general)
tmBLAS: a Mixed Precision BLAS by C++ Template International conference

Atsushi Suzuki, Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2023) 2023.5

Event date： 2023.5

Language：English Presentation type：Poster presentation
Multiple and Mixed Precision BLAS with C++ Template International conference

Daichi Mukunoki, Atsushi Suzuki, Toshiyuki Imamura

5th R-CCS International Symposium 2023.2.6

Event date： 2023.2

Language：English Presentation type：Poster presentation
On Introducing Reduced-Precision Data Representation in Sparse Matrix-Vector Multiplication

椋木大地、河合直聡

14th Symposium on Automatic Tuning Technology and its Applications (ATTA2022) 2022.12.23

Event date： 2022.12

Language：Japanese Presentation type：Oral presentation (general)
Accurate Matrix Computations using Ozaki Scheme on CPUs and GPUs International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

The 30th Anniversary Symposium of the Center for Computational Sciences at the University of Tsukuba 2022.10.14

Event date： 2022.10

Language：English Presentation type：Poster presentation
A mixed-precision algorithm of the CG method using the group-wise update strategy International conference

Kensuke Aihara, Katsuhisa Ozaki, Daichi Mukunoki

The 41st JSST Annual International Conference on Simulation Technology (JSST2022) 2022.9.2

Event date： 2022.9

Language：English Presentation type：Oral presentation (general)
Remedies for Reproducibility Issue in Conjugate Gradient Solvers International conference

Daichi Mukunoki, Roman Iakymchuk, Fabienne Jézéquel, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

SparseDays2022 2022.6.20

Event date： 2022.6

Language：English Presentation type：Poster presentation
A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

ISC High Performance (ISC 2022) 2022.6.1

Event date： 2022.6

Language：English Presentation type：Poster presentation
Impact and Contribution of Ozaki scheme in High Performance Computing International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk

International Workshop on Reliable Computing and Computer-Assisted Proofs (ReCAP 2022) 2022.3.15

Event date： 2022.3

Language：English Presentation type：Oral presentation (general)
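Improving the Accuracy of Approximate Solutions of the CG Method with Flying Restart Using Mixed-Precision Arithmetic

相原研輔、尾崎克久、椋木大地

18th Joint Meeting of the Activity Groups of the Japan Society for Industrial and Applied Mathematics 2022.3.9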

Event date： 2022.3

Language：Japanese Presentation type：Oral presentation (general)
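Handling Errors in Trial-Based Error-Free Transformation for Matrix Multiplication and Its Applications

尾崎克久、椋木大地、荻田武史

18th Joint Meeting of the Activity Groups of the Japan Society for Industrial and Applied Mathematics 2022.3.8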

Event date： 2022.3

Language：Japanese Presentation type：Oral presentation (general)
Performance Evaluation of Batched BLAS on A64FX International conference

Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura

4th R-CCS International Symposium (lightning talk) 2022.2.7

Event date： 2022.2

Language：English Presentation type：Oral presentation (general)
A Study of Fundamental Technologies toward Automatic Precision Tuning

椋木大地

13th Symposium on Automatic Tuning Technology and its Applications (ATTA2021) 2021.12.13

Event date： 2021.12

Language：Japanese Presentation type：Oral presentation (general)
Accurate and Reproducible Conjugate Gradient in Hybrid Parallel Environments International conference

Roman Iakymchuk, Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Stef Graillat

ISC High Performance (ISC 2021) 2021.6.29

Event date： 2021.6

Language：English Presentation type：Poster presentation
Accurate Matrix Multiplication on Binary128 using Ozaki Scheme International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

ISC High Performance (ISC 2021) 2021.6.29

Event date： 2021.6

Language：English Presentation type：Poster presentation
Fast rounding error estimation for compute-intensive operations using standard floating-point arithmetic International conference

Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk

Rencontres Arithmétiques de l'Informatique Mathématique (RAIM) 2021.5

Event date： 2021.5

Language：English Presentation type：Oral presentation (general)
DGEMM using Tensor Cores International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

SIAM Conference on Computational Science and Engineering (CSE21) 2021.3.4

Event date： 2021.3

Language：English Presentation type：Oral presentation (general)
High-Precision, Accurate, and Reproducible Linear Algebra Operations using Ozaki Scheme International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk

3rd R-CCS International Symposium 2021.2.15

Event date： 2021.2

Language：English Presentation type：Poster presentation
Matrix Multiplication Using the Ozaki Scheme for binary128

椋木大地、尾崎克久、荻田武史

4th Workshop on Applications of Verified Numerical Computation to Practical Problems (NVR 2020) 2020.11.28

Event date： 2020.11

Language：Japanese Presentation type：Oral presentation (general)
Conjugate Gradient Solvers with Accuracy and Reproducibility Guarantees in Hybrid Parallel Environments International conference

Roman Iakymchuk, Daichi Mukunoki

Sparse Days Cerfacs 2020.11.24

Event date： 2020.11

Language：English Presentation type：Oral presentation (general)
DGEMM using Tensor Cores and OzBLAS International conference

Daichi Mukunoki

11th Joint Laboratory for Extreme Scale Computing (JLESC) Workshop 2020.9.8

Event date： 2020.9

Language：English Presentation type：Oral presentation (general)
An FPGA-based Matrix Multiplier with Task Parallelism International conference

Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki

2nd R-CCS International Symposium 2020.2.17

Event date： 2020.2

Language：English Presentation type：Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations International conference

Toshiyuki Imamura, Daichi Mukunoki, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

2nd R-CCS International Symposium 2020.2.17

Event date： 2020.2

Language：English Presentation type：Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations International conference

Daichi Mukunoki

SIAM Conference on Parallel Processing for Scientific Computing (PP20) 2020.2.15

Event date： 2020.2

Language：English Presentation type：Oral presentation (general)
Accurate BLAS implementations: OzBLAS and BLAS-DOT2 International conference

Daichi Mukunoki

Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020 January) 2020.1.30

Event date： 2020.1

Language：English Presentation type：Oral presentation (general)
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations International conference

Daichi Mukunoki

Sapporo Winter HPC Seminar 2020 2020.1.24

Event date： 2020.1

Language：English Presentation type：Oral presentation (general)
Optimizing Precision for High-Performance, Robust, and Energy-Efficient Computations International conference

Roman Iakymchuk, Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Norihisa Fujita, Taisuke Boku

The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2020) 2020.1.15

Event date： 2020.1

Language：English Presentation type：Poster presentation
Accurate DGEMM using Tensor Cores International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2020) 2020.1.15

Event date： 2020.1

Language：English Presentation type：Poster presentation
High-performance Implementations of Accurate Linear Algebra Kernels on GPUs International conference

Daichi Mukunoki, Takeshi Ogita

3rd International Conference on Modern Mathematical Methods and High Performance Computing in Science & Technology (M3HPCST) 2020.1.9

Event date： 2020.1

Language：English Presentation type：Oral presentation (general)
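OzBLAS: An Accurate BLAS Implementation Based on the Ozaki Scheme and Its Applications

椋木大地、荻田武史、尾崎克久

3rd Workshop on Applications of Verified Numerical Computation to Practical Problems (NVR 2019) 2019.12.1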

Event date： 2019.12

Language：Japanese Presentation type：Oral presentation (general)
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations International conference

Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

SC19 research poster session 2019.11.19

Event date： 2019.11

Language：English Presentation type：Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations International conference

Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

France-Japan-Germany trilateral workshop: Convergence of HPC and Data Science for Future Extreme Scale Intelligent Applications 2019.11.7

Event date： 2019.11

Language：English Presentation type：Poster presentation
Reduced and Extended-Precision Computations on FPGAs and GPUs International conference

Yiyu Tan, Daichi Mukunoki, Toshiyuki Imamura, Norihisa Fujita, Taisuke Boku

The 11th symposium on Discovery 2019.10.15

Event date： 2019.10

Language：English Presentation type：Poster presentation
Accurate and Reproducible CG Method on GPUs International conference

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

European Numerical Mathematics and Advanced Applications Conference 2019 (ENUMATH2019) 2019.10.1

Event date： 2019.10

Language：English Presentation type：Oral presentation (general)
Accurate and Reproducible Linear Algebra Operations for Many-core Architectures International conference

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Russian Supercomputing Days 2019 (RuSCDays 2019) 2019.9.23

Event date： 2019.9

Language：English Presentation type：Poster presentation
High-Performance Implementations of Accurate and Reproducible BLAS Routines on GPUs International conference

Daichi Mukunoki

Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2019 June) 2019.6.7

Event date： 2019.6

Language：English Presentation type：Oral presentation (general)
Implementation of Accurate and Reproducible BLAS Routines Based on the Ozaki Scheme and Application of Auto-Tuning

椋木大地

22nd Auto-Tuning Research Group Open Academic Session (ATOS22) 2019.5.13

Event date： 2019.5

Language：Japanese Presentation type：Oral presentation (general)
OzBLAS: Accurate and Reproducible BLAS Based on Ozaki Scheme International conference

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

GPU Technology Conference (GTC 2019) 2019.3.17

Event date： 2019.3

Language：English Presentation type：Poster presentation
Development of Scientific Numerical Libraries on post-K computer International conference

Toshiyuki Imamura, Yusuke Hirota, Daichi Mukunoki, Shuhei Kudo, Akiyoshi Kuroda, Naoki Sueyasu

1st R-CCS International Symposium 2019.2.18

　More details

Event date： 2019.2

Language：English Presentation type：Poster presentation
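Implementation and Evaluation of Accurate and Reproducible BLAS Routines Using the Ozaki Scheme

椋木大地、荻田武史、尾崎克久

2nd Workshop on Applications of Verified Numerical Computation to Real-World Problems (NVR 2018) 2018.12.2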

　More details

Event date： 2018.12

Language：Japanese Presentation type：Oral presentation (general)
High Performance Implementation of Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme International conference

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Computational Reproducibility at Exascale 2018 (CRE2018) 2018.11.11

　More details

Event date： 2018.11

Language：English Presentation type：Oral presentation (general)
Accurate and cost-efficient triangular solve International conference

Roman Iakymchuk, Pedro Valero-Lara, Daichi Mukunoki

The 18th International Symposium on Scientific Computing 2018.9.11

　More details

Event date： 2018.9

Language：English Presentation type：Oral presentation (general)
High Performance Implementation of Accurate Matrix Multiplications on GPUs International conference

Daichi Mukunoki, Takeshi Ogita

The 18th International Symposium on Scientific Computing 2018.9.11

　More details

Event date： 2018.9

Language：English Presentation type：Oral presentation (general)
High-performance implementations of reproducible and accurate matrix-multiplication International conference

Daichi Mukunoki, Roman Iakymchuk, Stef Graillat, Takeshi Ogita

10th International Workshop on Parallel Matrix Algorithms and Applications (PMAA18) 2018.6.27

　More details

Event date： 2018.6

Language：English Presentation type：Oral presentation (general)
Automatic Generation of Full-Set Batched BLAS International conference

Yusuke Hirota, Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2018) 2018.6.26

　More details

Event date： 2018.6

Language：English Presentation type：Poster presentation
Performance Analysis of 2.5D-PDGEMM on the K Computer International conference

Daichi Mukunoki, Toshiyuki Imamura

SIAM Conference on Parallel Processing for Scientific Computing (PP18) 2018.3.8

　More details

Event date： 2018.3

Language：English Presentation type：Oral presentation (general)
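Implementation Techniques for Numerical Libraries on Next-Generation Computers

椋木大地

JSIAM Joint Seminar of Three Activity Groups "Applied Mathematics Seminar" 2017.12.26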

　More details

Event date： 2017.12

Language：Japanese Presentation type：Oral presentation (general)
Development of Verified Numerical Computation in the HPC Field

荻田武史、椋木大地、尾崎克久

3rd CDMSI (Post-K Priority Issue 7) Symposium 2017.12.5

　More details

Event date： 2017.12

Language：Japanese Presentation type：Poster presentation
Implementation and Evaluation of 2.5D Matrix Multiplication on K Computer International conference

Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2017) 2017.6.20

　More details

Event date： 2017.6

Language：English Presentation type：Poster presentation
A Study of Implementation Methods for Reduced-/Extended-Precision BLAS

椋木大地、今村俊幸

Fifth Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2017) 2017.3.27

　More details

Event date： 2017.3

Language：Japanese Presentation type：Oral presentation (general)
Implementation Techniques for High Performance BLAS Kernels on Modern GPUs International conference

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

SIAM Conference on Computational Science and Engineering (CSE17) 2017.2.28

　More details

Event date： 2017.2

Language：English Presentation type：Oral presentation (general)
A Study of Implementation Techniques for Linear Algebra Kernels on Pascal Architecture GPUs

椋木大地、今村俊幸、高橋大介

GTC Japan 2016 2016.10.5

　More details

Event date： 2016.10

Language：Japanese Presentation type：Poster presentation
KMATHLIB -High Performance and Scalable Numerical Library for the K Computer-

大井祥栄、廣田悠輔、椋木大地、今村俊幸

Japan Society for Industrial and Applied Mathematics (JSIAM) Annual Meeting 2016 2016.9.13

　More details

Event date： 2016.9

Language：Japanese Presentation type：Poster presentation
Performance Evaluation of Verified Computation for Linear Systems on Supercomputer International conference

Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin’ichi Oishi

SIAM: East Asian Section Conference (EASIAM 2016) 2016.6.20

　More details

Event date： 2016.6

Language：English Presentation type：Oral presentation (general)
Introduction of Research Activities for GPU Computing at Large-scale Parallel Numerical Computing Technology Research Team on AICS International conference

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

The 6th AICS International Symposium 2016.2.22

　More details

Event date： 2016.2

Language：English Presentation type：Poster presentation
Automatic Thread-Block Size Adjustment for Dense Matrix-Vector Multiplication on CUDA International conference

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2016) 2016.2.19

　More details

Event date： 2016.2

Language：English Presentation type：Oral presentation (general)
Performance Evaluation of Verified Computation for Linear Systems on Parallel Computers International conference

Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin'ichi Oishi

2nd Annual Meeting on Advanced Computing System and Infrastructure (ACSI2016) 2016.1.19

　More details

Event date： 2016.1

Language：English Presentation type：Poster presentation
Implementation and Evaluation of "MUBLAS", a Set of Memory-Bound Linear Algebra Kernels with Automatic Thread-Count Selection on GPUs

椋木大地、今村俊幸、高橋大介

GTC Japan 2015 2015.9.18

　More details

Event date： 2015.9

Language：Japanese Presentation type：Poster presentation
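Implementation of KMATHLIB, a Numerical Library Suite for the K Computer

大井祥栄、廣田悠輔、椋木大地、今村俊幸

Japan Society for Industrial and Applied Mathematics (JSIAM) Annual Meeting 2015 2015.9.9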

　More details

Event date： 2015.9

Language：Japanese Presentation type：Poster presentation
High-Performance GEMV and SYMV with Auto-Tuning for Performance Stabilization on Multiple GPU Generations International conference

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

GPU Technology Conference (GTC 2015) 2015.3.17

　More details

Event date： 2015.3

Language：English Presentation type：Poster presentation
QP-Pack: A Pseudo-Quadruple-Precision Extended Math Package

今村俊幸、椋木大地、佐々成正、山田進、町田昌彦

Proceedings of the Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015 2015.1.26

　More details

Event date： 2015.1

Language：Japanese Presentation type：Poster presentation
Implementation of a Fast GEMV with Matrix-Shape-Independent Performance on Kepler and Maxwell Architecture GPUs

椋木大地、今村俊幸、高橋大介

Proceedings of the Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015 2015.1.26

　More details

Event date： 2015.1

Language：Japanese Presentation type：Poster presentation
Acceleration of Double-Double Precision Arithmetic on the K Supercomputer

佐々木信一、藤井昭宏、田中輝雄、椋木大地、今村俊幸

Proceedings of the Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015 2015.1.26

　More details

Event date： 2015.1

Language：Japanese Presentation type：Poster presentation
Implementation of a Fast SGEMV on Kepler Architecture GPUs

椋木大地、今村俊幸、高橋大介

GTC Japan 2014 2014.7.16

　More details

Event date： 2014.7

Language：Japanese Presentation type：Poster presentation
Linear Algebra Operations using Quadruple-precision Arithmetic on GPU International conference

Daichi Mukunoki, Daisuke Takahashi

GPU Technology Conference (GTC2014) 2014.3.24

　More details

Event date： 2014.3

Language：English Presentation type：Poster presentation
Triple-Precision Arithmetic and Quadruple-Precision Sparse Iterative Solvers on GPUs

椋木大地、高橋大介

3rd Multiple-Precision Arithmetic Forum 2013.3.8

　More details

Event date： 2013.3

Language：Japanese Presentation type：Oral presentation (general)
Iterative Method for Sparse Linear Systems using Quadruple Precision Operations on GPUs International conference

Daichi Mukunoki, Daisuke Takahashi

SIAM Conference on Computational Science and Engineering (CSE13) 2013.2.28

　More details

Event date： 2013.2

Language：English Presentation type：Oral presentation (general)
Performance Comparison of Double, Triple and Quadruple Precision Real and Complex BLAS Subroutines on GPUs International conference

Daichi Mukunoki, Daisuke Takahashi

Proc. ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way? (ATIP/A*CRC Workshop '12) 2012.5.7

　More details

Event date： 2012.5

Language：English Presentation type：Poster presentation
Quadruple-Precision Matrix Computation on GPUs

椋木大地、高橋大介

2011 Summer United Workshops on Parallel, Distributed and Cooperative Processing in Kagoshima (SWoPP Kagoshima 2011) 2011.7.27

　More details

Event date： 2011.7

Language：Japanese Presentation type：Oral presentation (general)
Exploring Multi-Agent Systems for HPC Code Development Invited International conference

Daichi Mukunoki

The 2nd International Workshop on Foundational Large Language Models Advances for HPC (in conjunction with ISC-HPC 2026) (LLM4HPC 2026) 2026.6.26

　More details

Language：English Presentation type：Oral presentation (general)

Venue：Hamburg Country：Germany

researchmap
HPC-GENIE: High-Performance Computing with Generative Neural Intelligence for Execution

林俊一郎、椋木大地、星野哲也、片桐孝洋

xSIG 2025 2025.8

　More details

Language：Japanese Presentation type：Poster presentation
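Acceleration of Lung Field Segmentation for COVID-19 Diagnosis Support Using a GPU-Equipped Supercomputer

湯淺義尚、小田昌宏、椋木大地、片桐孝洋、星野哲也、河合直聡、永井亨、森健策

44th Annual Meeting of the Japanese Society of Medical Imaging Technology (JAMIT 2025) 2025.8.28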

　More details

Language：Japanese Presentation type：Poster presentation
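Development Status of "VibeCodeHPC", an LLM-Based Automatic Code Optimizer, and the Advantage of Multi-Agent Systems Shown by Experiments

林俊一郎、森田光貴、椋木大地、星野哲也、片桐孝洋

Institute for Solid State Physics Software Development and Advancement Project Workshop: Development and Dissemination of Open-Source Software Supporting the Progress of Computational Materials Science 2025.10.20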

　More details

Language：Japanese Presentation type：Poster presentation
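Challenges and Prospects for Automatic Generation and Automatic Performance Optimization of Numerical Codes Using LLMs

椋木大地、林俊一郎、星野哲也、森田光貴、片桐孝洋

Institute for Solid State Physics Software Development and Advancement Project Workshop: Development and Dissemination of Open-Source Software Supporting the Progress of Computational Materials Science 2025.10.20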

　More details

Language：Japanese Presentation type：Poster presentation
Implementation and Evaluation of 2.5D Matrix Multiplication on K Computer International conference

Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2017) 2017.6.20

　More details

Language：English Presentation type：Poster presentation
Automatic Generation of Full-Set Batched BLAS International conference

Yusuke Hirota, Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2018) 2018.6.26

　More details

Language：English Presentation type：Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations International conference

Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

SC19 research poster session 2019.11.19

　More details

Language：English Presentation type：Poster presentation
Accurate and Reproducible Conjugate Gradient in Hybrid Parallel Environments International conference

Roman Iakymchuk, Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Stef Graillat

ISC High Performance (ISC 2021) 2021.6.29

　More details

Language：English Presentation type：Poster presentation
Accurate Matrix Multiplication on Binary128 using Ozaki Scheme International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

ISC High Performance (ISC 2021) 2021.6.29

　More details

Language：English Presentation type：Poster presentation
A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

ISC High Performance (ISC 2022) 2022.6.1

　More details

Language：English Presentation type：Poster presentation
tmBLAS: a Mixed Precision BLAS by C++ Template International conference

Atsushi Suzuki, Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2023) 2023.5

　More details

Language：English Presentation type：Poster presentation
csDF: a double-float arithmetic library for the Cerebras CS-2 International conference

Reo Nagashima, Akeru Nakamura, Kai Murakami, Ryunosuke Matsuzaki, Daichi Mukunoki, Takaaki Miyajima

SC25 research poster session 2025.11.16

　More details

Language：English Presentation type：Poster presentation
Verification of the Effectiveness of Deep Learning in Preprocessing Parameter Estimation for the Conjugate Gradient Method International conference

Takamasa Nakaya, Takahiro Katagiri, Tetsuya Hoshino, Daichi Mukunoki, Masatoshi Kawai

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Language：English Presentation type：Poster presentation
Evaluation of the Capability of Coding AI in Generating SYCL-Based Numerical Computation Codes for Intel GPUs International conference

Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Language：English Presentation type：Poster presentation
Performance Evaluation of SVM with Multiple Quantum-inspired Annealers International conference

Naoya Mizuki, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Language：English Presentation type：Poster presentation
A Multi Agent System for Local LLM-Based HPC Code Generation International conference

Ryo Mikasa, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Language：English Presentation type：Poster presentation
Proposal of The AI Scientist v2 for High Performance Computing with Local Large Language Models International conference

Takanori Kotama, Rio Yokota, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Language：English Presentation type：Poster presentation
A Trial on Optimizing Test Sequences for LAPACK Eigenvalue Computation Routines using Machine Learning International conference

Hiroto Kashimura, Takahiro Katagiri, Shuji Morisaki, Daichi Mukunoki, Tetsuya Hoshino

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Language：English Presentation type：Poster presentation
GPU Acceleration of Medical Image Representation Learning Models with Distributed Data Parallel and I/O Optimization International conference

Koki Isobe, Daichi Mukunoki, Masahiro Oda, Tetsuya Oda, Kensaku Mori, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Language：English Presentation type：Poster presentation
VibeCodeHPC: A Multi-LLM Agent Auto-Tuner for HPC Codes International conference

Shun-Ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Language：English Presentation type：Poster presentation
DGEMM using FP64 Arithmetic Emulation and FP8 Tensor Cores with Ozaki Scheme International conference

Daichi Mukunoki

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) 2026.1

　More details

Language：English Presentation type：Poster presentation
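Supercomputer Program Development with Generative AI: Introduction to the HPC-GENIE Project Invited

椋木大地

[97th] Cyber Symposium on Online Education and Digital Transformation at Universities and Other Institutions ("Educational Institution DX Symposium") 2026.3.16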

　More details

Language：Japanese Presentation type：Oral presentation (general)

Country：Japan

researchmap
Multiple and Mixed Precision BLAS with C++ Template International conference

Daichi Mukunoki, Atsushi Suzuki, Toshiyuki Imamura

5th R-CCS International Symposium 2023.2.6

　More details

Language：English Presentation type：Poster presentation
Matrix Multiplication with the Ozaki Scheme for binary128

椋木大地、尾崎克久、荻田武史

4th Workshop on Applications of Verified Numerical Computation to Real-World Problems (NVR 2020) 2020.11.28

　More details

Language：Japanese Presentation type：Oral presentation (general)
Fast rounding error estimation for compute-intensive operations using standard floating-point arithmetic International conference

Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk

Rencontres Arithmétiques de l'Informatique Mathématique (RAIM) 2021.5

　More details

Language：English Presentation type：Oral presentation (general)
DGEMM using Tensor Cores International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

SIAM Conference on Computational Science and Engineering (CSE21) 2021.3.4

　More details

Language：English Presentation type：Oral presentation (general)
A Study of Fundamental Technologies Toward Automatic Precision Tuning

椋木大地

13th Symposium on the Current State and Applications of Auto-Tuning Technology (ATTA2021) 2021.12.13

　More details

Language：Japanese Presentation type：Oral presentation (general)
Performance Evaluation of Batched BLAS on A64FX International conference

Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura

4th R-CCS International Symposium (lightning talk) 2022.2.7

　More details

Language：English Presentation type：Oral presentation (general)
Handling Errors in Trial-Based Error-Free Transformation for Matrix Multiplication and Its Applications

尾崎克久、椋木大地、荻田武史

JSIAM 18th Joint Meeting of Activity Groups 2022.3.8

　More details

Language：Japanese Presentation type：Oral presentation (general)
Improving Approximate Solution Accuracy with Mixed-Precision Arithmetic for the CG Method with Flying Restart

相原研輔、尾崎克久、椋木大地

JSIAM 18th Joint Meeting of Activity Groups 2022.3.9

　More details

Language：Japanese Presentation type：Oral presentation (general)
Impact and Contribution of Ozaki scheme in High Performance Computing International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk

International Workshop on Reliable Computing and Computer-Assisted Proofs (ReCAP 2022) 2022.3.15

　More details

Language：English Presentation type：Oral presentation (general)
A mixed-precision algorithm of the CG method using the group-wise update strategy International conference

Kensuke Aihara, Katsuhisa Ozaki, Daichi Mukunoki

The 41st JSST Annual International Conference on Simulation Technology (JSST2022) 2022.9.2

　More details

Language：English Presentation type：Oral presentation (general)
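On Introducing Low-Precision Data Representations in Sparse Matrix-Vector Multiplication

椋木大地、河合直聡

14th Symposium on the Current State and Applications of Auto-Tuning Technology (ATTA2022) 2022.12.23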

　More details

Language：Japanese Presentation type：Oral presentation (general)
Multiple- and Mixed-Precision BLAS with C++ Template International conference

Toshiyuki Imamura, Daichi Mukunoki, Atsushi Suzuki

10th International Congress on Industrial and Applied Mathematics (ICIAM 2023) 2023.8.24

　More details

Language：English Presentation type：Oral presentation (general)
Reduced-Precision Data Representation on Sparse Matrix-Vector Multiplications International conference

Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura

10th International Congress on Industrial and Applied Mathematics (ICIAM 2023) 2023.8.21

　More details

Language：English Presentation type：Oral presentation (general)
A Study on BLAS Code Generation by LLMs

椋木大地

33rd AT Research Group Open Academic Session (ATOS33) 2025.7.28

　More details

Language：Japanese Presentation type：Oral presentation (general)
A Study of the Automatic BLAS Code Generation Capability of General-Purpose LLMs

椋木大地

6th Supercomputer "Flow" Users Meeting 2025.9.11

　More details

Language：Japanese Presentation type：Oral presentation (general)
Challenges and Prospects in Automatic Generation of HPC Codes Using Generative AI International conference

Daichi Mukunoki

The 6th "FugakuNEXT" Application Seminar 2025.9.25

　More details

Language：English Presentation type：Oral presentation (general)
Challenges and Prospects for Automatic Generation of Numerical and HPC Codes Using Generative AI

林俊一郎、椋木大地

FY2025 2nd Materials Science Application Open Forum 2025.9.29

　More details

Language：Japanese Presentation type：Oral presentation (general)
Automatic Generation and GPU Porting of Numerical Computation Codes Using Generative AI International conference

Daichi Mukunoki

58th ASE Seminar 2025.12.1

　More details

Language：English Presentation type：Oral presentation (general)
Automatic Generation of Numerical Codes for GPUs Using LLMs International conference

Daichi Mukunoki

JHPCN Field Workshop: State-of-the-Art in Code Generative AI for High-Performance Computing 2025.12.5

　More details

Language：English Presentation type：Oral presentation (general)
Hardware in the AI Era and FP64 Emulation

椋木大地

17th Symposium on the Current State and Applications of Auto-Tuning Technology (ATTA2025) 2025.12.23

　More details

Language：Japanese Presentation type：Oral presentation (general)
Toward Innovation in HPC Code Development with Generative AI: Efforts and Prospects of the HPC-GENIE Project

椋木大地

6th Lecture Meeting Hosted by the IPSJ Tokai Branch 2025.1.9

　More details

Language：Japanese Presentation type：Oral presentation (general)
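Prospects for GPU Porting of HPC Codes Using Generative AI

椋木大地

Opinion Exchange Meeting on the "Recommendations on Developing the Software Environment for Next-Generation Computing Infrastructure and Fostering the Personnel to Support It" 2026.1.21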

　More details

Language：Japanese Presentation type：Oral presentation (general)
Development of Code-Generation AI Agents for High-Performance Computing

椋木大地

MateriAI 2025: Use of AI Technology in Computational Materials Science 2026.2.2

　More details

Language：Japanese Presentation type：Oral presentation (general)
Toward Automatic Generation of High Performance Numerical Codes by LLMs International conference

Daichi Mukunoki, Koki Morita, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri

SIAM Conference on Parallel Processing for Scientific Computing (PP26) 2026.3

　More details

Language：English Presentation type：Oral presentation (general)
Performance Comparison of Double, Triple and Quadruple Precision Real and Complex BLAS Subroutines on GPUs International conference

Daichi Mukunoki, Daisuke Takahashi

Proc. ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way? (ATIP/A*CRC Workshop '12) 2012.5.7

　More details

Language：English Presentation type：Poster presentation
Linear Algebra Operations using Quadruple-precision Arithmetic on GPU International conference

Daichi Mukunoki, Daisuke Takahashi

GPU Technology Conference (GTC2014) 2014.3.24

　More details

Language：English Presentation type：Poster presentation
Implementation of a Fast SGEMV on Kepler Architecture GPUs

椋木大地、今村俊幸、高橋大介

GTC Japan 2014 2014.7.16

　More details

Language：Japanese Presentation type：Poster presentation
QP-Pack: A Pseudo-Quadruple-Precision Extended Math Package

今村俊幸、椋木大地、佐々成正、山田進、町田昌彦

Proceedings of the Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015 2015.1.26

　More details

Language：Japanese Presentation type：Poster presentation
Acceleration of Double-Double Precision Arithmetic on the K Supercomputer

佐々木信一、藤井昭宏、田中輝雄、椋木大地、今村俊幸

Proceedings of the Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015 2015.1.26

　More details

Language：Japanese Presentation type：Poster presentation
Implementation of a Fast GEMV with Matrix-Shape-Independent Performance on Kepler and Maxwell Architecture GPUs

椋木大地、今村俊幸、高橋大介

Proceedings of the Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015 2015.1.26

　More details

Language：Japanese Presentation type：Poster presentation
High-Performance GEMV and SYMV with Auto-Tuning for Performance Stabilization on Multiple GPU Generations International conference

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

GPU Technology Conference (GTC 2015) 2015.3.17

　More details

Language：English Presentation type：Poster presentation
Implementation and Evaluation of "MUBLAS", a Set of Memory-Bound Linear Algebra Kernels with Automatic Thread-Count Selection on GPUs

椋木大地、今村俊幸、高橋大介

GTC Japan 2015 2015.9.18

　More details

Language：Japanese Presentation type：Poster presentation
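Implementation of KMATHLIB, a Numerical Library Suite for the K Computer

大井祥栄、廣田悠輔、椋木大地、今村俊幸

Japan Society for Industrial and Applied Mathematics (JSIAM) Annual Meeting 2015 2015.9.9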

　More details

Language：Japanese Presentation type：Poster presentation
Performance Evaluation of Verified Computation for Linear Systems on Parallel Computers International conference

Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin'ichi Oishi

2nd Annual Meeting on Advanced Computing System and Infrastructure (ACSI2016) 2016.1.19

　More details

Language：English Presentation type：Poster presentation
Introduction of Research Activities for GPU Computing at Large-scale Parallel Numerical Computing Technology Research Team on AICS International conference

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

The 6th AICS International Symposium 2016.2.22

Language：English Presentation type：Poster presentation
KMATHLIB -High Performance and Scalable Numerical Library for the K Computer-

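Yoshiharu Ohi, Yusuke Hirota, Daichi Mukunoki, Toshiyuki Imamura

Japan Society for Industrial and Applied Mathematics (JSIAM) 2016 Annual Meeting 2016.9.13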

Language：Japanese Presentation type：Poster presentation
A Study of Implementation Techniques for Linear Algebra Kernels on Pascal Architecture GPUs

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

GTC Japan 2016 2016.10.5

Language：Japanese Presentation type：Poster presentation
Development of Verified Numerical Computation in the HPC Field

Takeshi Ogita, Daichi Mukunoki, Katsuhisa Ozaki

3rd CDMSI (Post-K Priority Issue 7) Symposium 2017.12.5

Language：Japanese Presentation type：Poster presentation
Development of Scientific Numerical Libraries on post-K computer International conference

Toshiyuki Imamura, Yusuke Hirota, Daichi Mukunoki, Shuhei Kudo, Akiyoshi Kuroda, Naoki Sueyasu

1st R-CCS International Symposium 2019.2.18

Language：English Presentation type：Poster presentation
OzBLAS: Accurate and Reproducible BLAS Based on Ozaki Scheme International conference

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

GPU Technology Conference (GTC 2019) 2019.3.17

Language：English Presentation type：Poster presentation
Accurate and Reproducible Linear Algebra Operations for Many-core Architectures International conference

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Russian Supercomputing Days 2019 (RuSCDays 2019) 2019.9.23

Language：English Presentation type：Poster presentation
Reduced and Extended-Precision Computations on FPGAs and GPUs International conference

Yiyu Tan, Daichi Mukunoki, Toshiyuki Imamura, Norihisa Fujita, Taisuke Boku

The 11th symposium on Discovery 2019.10.15

Language：English Presentation type：Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations International conference

Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

France-Japan-Germany trilateral workshop: Convergence of HPC and Data Science for Future Extreme Scale Intelligent Applications 2019.11.7

Language：English Presentation type：Poster presentation
Optimizing Precision for High-Performance, Robust, and Energy-Efficient Computations International conference

Roman Iakymchuk, Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Norihisa Fujita, Taisuke Boku

The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2020) 2020.1.15

Language：English Presentation type：Poster presentation
Accurate DGEMM using Tensor Cores International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2020) 2020.1.15

Language：English Presentation type：Poster presentation
An FPGA-based Matrix Multiplier with Task Parallelism International conference

Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki

2nd R-CCS International Symposium 2020.2.17

Language：English Presentation type：Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations International conference

Toshiyuki Imamura, Daichi Mukunoki, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

2nd R-CCS International Symposium 2020.2.17

Language：English Presentation type：Poster presentation
High-Precision, Accurate, and Reproducible Linear Algebra Operations using Ozaki Scheme International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk

3rd R-CCS International Symposium 2021.2.15

Language：English Presentation type：Poster presentation
Remedies for Reproducibility Issue in Conjugate Gradient Solvers International conference

Daichi Mukunoki, Roman Iakymchuk, Fabienne Jézéquel, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

SparseDays2022 2022.6.20

Language：English Presentation type：Poster presentation
Accurate Matrix Computations using Ozaki Scheme on CPUs and GPUs International conference

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

The 30th Anniversary Symposium of the Center for Computational Sciences at the University of Tsukuba 2022.10.14

Language：English Presentation type：Poster presentation
Conjugate Gradient Solvers with Accuracy and Reproducibility Guarantees in Hybrid Parallel Environments International conference

Roman Iakymchuk, Daichi Mukunoki

Sparse Days Cerfacs 2020.11.24

Language：English Presentation type：Oral presentation (general)
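Quadruple-Precision Matrix Computation on GPUs

Daichi Mukunoki, Daisuke Takahashi

2011 Summer United Workshops on Parallel, Distributed and Cooperative Processing in Kagoshima (SWoPP Kagoshima 2011) 2011.7.27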

Language：Japanese Presentation type：Oral presentation (general)
Iterative Method for Sparse Linear Systems using Quadruple Precision Operations on GPUs International conference

Daichi Mukunoki, Daisuke Takahashi

SIAM Conference on Computational Science and Engineering (CSE13) 2013.2.28

Language：English Presentation type：Oral presentation (general)
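Triple-Precision Arithmetic and Quadruple-Precision Sparse Iterative Solvers on GPUs

Daichi Mukunoki, Daisuke Takahashi

3rd Multiple-Precision Computation Forum 2013.3.8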

Language：Japanese Presentation type：Oral presentation (general)
Automatic Thread-Block Size Adjustment for Dense Matrix-Vector Multiplication on CUDA International conference

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2016) 2016.2.19

Language：English Presentation type：Oral presentation (general)
Performance Evaluation of Verified Computation for Linear Systems on Supercomputer International conference

Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin’ichi Oishi

SIAM: East Asian Section Conference (EASIAM 2016) 2016.6.20

Language：English Presentation type：Oral presentation (general)
Implementation Techniques for High Performance BLAS Kernels on Modern GPUs International conference

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

SIAM Conference on Computational Science and Engineering (CSE17) 2017.2.28

Language：English Presentation type：Oral presentation (general)
A Study of Implementation Methods for Reduced-/Extended-Precision BLAS

Daichi Mukunoki, Toshiyuki Imamura

Fifth Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2017) 2017.3.27

Language：Japanese Presentation type：Oral presentation (general)
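Implementation Techniques for Numerical Libraries for Next-Generation Computers

Daichi Mukunoki

Japan Society for Industrial and Applied Mathematics (JSIAM) Three-Group Joint "Applied Mathematics Seminar" 2017.12.26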

Language：Japanese Presentation type：Oral presentation (general)
Performance Analysis of 2.5D-PDGEMM on the K Computer International conference

Daichi Mukunoki, Toshiyuki Imamura

SIAM Conference on Parallel Processing for Scientific Computing (PP18) 2018.3.8

Language：English Presentation type：Oral presentation (general)
High-performance implementations of reproducible and accurate matrix-multiplication International conference

Daichi Mukunoki, Roman Iakymchuk, Stef Graillat, Takeshi Ogita

10th International Workshop on Parallel Matrix Algorithms and Applications (PMAA18) 2018.6.27

Language：English Presentation type：Oral presentation (general)
Accurate and cost-efficient triangular solve International conference

Roman Iakymchuk, Pedro Valero-Lara, Daichi Mukunoki

The 18th International Symposium on Scientific Computing 2018.9.11

Language：English Presentation type：Oral presentation (general)
High Performance Implementation of Accurate Matrix Multiplications on GPUs International conference

Daichi Mukunoki, Takeshi Ogita

The 18th International Symposium on Scientific Computing 2018.9.11

Language：English Presentation type：Oral presentation (general)
High Performance Implementation of Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme International conference

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Computational Reproducibility at Exascale 2018 (CRE2018) 2018.11.11

Language：English Presentation type：Oral presentation (general)
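Implementation and Evaluation of Accurate and Reproducible BLAS Routines Using the Ozaki Scheme

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

2nd Workshop on Applications of Verified Numerical Computation to Real-World Problems (NVR 2018) 2018.12.2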

Language：Japanese Presentation type：Oral presentation (general)
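Implementation of Accurate and Reproducible BLAS Routines Based on the Ozaki Scheme and Application of Auto-Tuning

Daichi Mukunoki

22nd Auto-Tuning Research Group Open Academic Session (ATOS22) 2019.5.13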

Language：Japanese Presentation type：Oral presentation (general)
High-Performance Implementations of Accurate and Reproducible BLAS Routines on GPUs International conference

Daichi Mukunoki

Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2019 June) 2019.6.7

Language：English Presentation type：Oral presentation (general)
Accurate and Reproducible CG Method on GPUs International conference

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

European Numerical Mathematics and Advanced Applications Conference 2019 (ENUMATH2019) 2019.10.1

Language：English Presentation type：Oral presentation (general)
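OzBLAS: An Accurate BLAS Implementation Using the Ozaki Scheme, and Its Applications

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

3rd Workshop on Applications of Verified Numerical Computation to Real-World Problems (NVR 2019) 2019.12.1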

Language：Japanese Presentation type：Oral presentation (general)
High-performance Implementations of Accurate Linear Algebra Kernels on GPUs International conference

Daichi Mukunoki, Takeshi Ogita

3rd International Conference on Modern Mathematical Methods and High Performance Computing in Science & Technology (M3HPCST) 2020.1.9

Language：English Presentation type：Oral presentation (general)
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations International conference

Daichi Mukunoki

Sapporo Winter HPC Seminar 2020 2020.1.24

Language：English Presentation type：Oral presentation (general)
Accurate BLAS implementations: OzBLAS and BLAS-DOT2 International conference

Daichi Mukunoki

Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2020 January) 2020.1.30

Language：English Presentation type：Oral presentation (general)
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations International conference

Daichi Mukunoki

SIAM Conference on Parallel Processing for Scientific Computing (PP20) 2020.2.15

Language：English Presentation type：Oral presentation (general)
DGEMM using Tensor Cores and OzBLAS International conference

Daichi Mukunoki

11th Joint Laboratory for Extreme Scale Computing (JLESC) Workshop 2020.9.8

Language：English Presentation type：Oral presentation (general)

KAKENHI (Grants-in-Aid for Scientific Research) 7

Application of High Precision Operation Techniques for Accelerating Scientific Computations on AI Supercomputers

Grant number：25K24387 2025.7 - 2027.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Research Activity Start-up

Development of Accurate and Validated Matrix Computation Software for Next Generation Supercomputers

Grant number：20KK0259 2022.4 - 2023.10

Japan Society for the Promotion of Science (JSPS) Fund for the Promotion of Joint International Research (Fostering Joint International Research (A))

Daichi Mukunoki

Authorship：Principal investigator

Grant amount: ¥9,230,000 (Direct Cost: ¥7,100,000, Indirect Cost: ¥2,130,000)

Development of accurate and reproducible matrix computation library for massively parallel environments

Grant number：19K20286 2019.4 - 2022.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Early-Career Scientists

Mukunoki Daichi

In this study, we developed Basic Linear Algebra Subprograms (BLAS) for massively parallel architectures that are accurate and ensure reproducibility of computation results across different environments. Focusing mainly on the Ozaki scheme, we developed a high-performance implementation of accurate and reproducible BLAS routines and demonstrated its application to sparse iterative solvers on CPUs and GPUs. As further applications, we proposed an implementation of single/double-precision matrix multiplication using low-precision arithmetic units (Tensor Cores) and an implementation of binary128 matrix multiplication using single/double-precision matrix multiplication.

Reduced-precision formats for high-performance and energy-efficient computations

Grant number：16K16062 2016.4 - 2019.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (B)

Mukunoki Daichi

This study explored reduced-precision formats, which have shorter bit lengths than the IEEE 32/64-bit floating-point formats, as a way to enhance the performance of numerical computations in terms of both computation speed and energy efficiency. We proposed a lightweight software implementation of reduced-precision formats and demonstrated improvements in both speed and energy efficiency on several data-intensive basic linear algebra operations.

Research on the Development of Triple- and Quadruple-Precision Linear Algebra Libraries for GPU Supercomputers

Grant number：13J01290 2013.4 - 2015.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for JSPS Fellows

Mukunoki Daichi

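The aim of this research was to conduct basic research toward high-performance triple- and quadruple-precision linear algebra libraries on GPUs, with a view to making triple- and quadruple-precision arithmetic practical on GPU supercomputers. This fiscal year, we mainly studied efficient implementation methods for linear algebra libraries on GPUs that support multiple arithmetic precisions. As a result, we developed an implementation method for a fast matrix-vector multiplication routine (GEMV) that supports multiple NVIDIA GPU architectures. This implementation models the program execution mechanism on the GPU and adopts online auto-tuning that automatically determines the thread-block size that maximizes execution efficiency. Compared with existing implementations, this prevents the performance fluctuations that arise depending on the execution environment and problem size, and consistently maintains high performance. We consider this method effective for implementing and optimizing multiple versions, differing in arithmetic precision, of a program that performs a given linear algebra computation (for example, a BLAS routine). In addition, as an application of quadruple-precision arithmetic techniques, we implemented pseudo-double-precision arithmetic by software emulation on NVIDIA's latest GPU, whose double-precision performance is 1/32 of its single-precision performance, and showed that a double-precision matrix multiplication routine (DGEMM) achieves higher performance than the hardware-based implementation. Part of the GPU software developed this fiscal year has been released on the web as an open-source library, and we plan to continue its development.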

Research on high-performance and high-dimensional numerical linear algebra applying an asynchronous task mechanism on the exascale computing era

Grant number：19H04127 2019.4 - 2022.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Imamura Toshiyuki

Authorship：Coinvestigator(s)

The main objective of this research project is to study asynchronous numerical algorithms and task technologies to improve system execution efficiency in the exascale era, and to realize a development framework for high-performance numerical software that remains sustainable in the future. To address this issue, we investigate existing compiler runtime technologies, identify problems related to the conditional task invocation and dynamic processing of dependencies that are necessary to realize numerical algorithms, and incorporate these technologies into actual numerical libraries to achieve results that contribute not only to execution speed but also to utilization efficiency. As a result, we identified issues related to the next generation of mixed-precision computation technology.

Theory and Application of Scalable Numerical Software on an O(100M) core environment

Grant number：15H02709 2015.4 - 2018.3

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

IMAMURA Toshiyuki, YAMAMOTO Yusaku, Todo Shinji

This research project aims to realize the high-performance numerical services investigated in the past, based on new mathematical principles, on emerging computing systems equipped with tens of thousands to hundreds of millions of processing cores. Around two important themes, "mixed-granularity numerical kernels" and "asynchronous numerical algorithms," we (i) conducted research on the theory of asynchronous numerical algorithms and on avoiding communication and synchronization at a practical level, proposing CAHTR and a new method for the FDTD scheme, and (ii) promoted research on core numerical infrastructure technologies, such as automatic tuning for scalable, lightweight code generation on super-many-core systems, as well as innovative research leading to next-generation numerical computation software.

Teaching Experience (On-campus) 3

Numerical Analysis

2025
Computer Science Experiments a

2025
Computer Science Experiments b

2025

Teaching Experience (Off-campus) 1

Information Processing Techniques (Literacy) II

2018.9 - 2019.1 （Tokyo Woman's Christian University）
