Researcher Details - MUKUNOKI Daichi (椋木　大地)

Updated: 2026/06/17

ムクノキ　ダイチ

椋木　大地

MUKUNOKI Daichi

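Affiliation

Assistant Professor, Information Infrastructure Design and Development Division, Information Technology Center

Graduate School Affiliation

Graduate School of Informatics

Contact

Homepage

https://mukunoki.github.io/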

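Degrees (3)

Doctor of Engineering (November 2013, University of Tsukuba)
Master of Engineering (March 2011, University of Tsukuba)
Bachelor of Library and Information Science (March 2009, University of Tsukuba)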

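Research Keywords (9)

High-performance computing
High-precision computing
Auto-tuning
Numerical computation
Reproducible computation
Parallel computing
GPU computing
Mixed-precision computing
Large language models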

Research Areas (2)

Informatics / High-performance computing
Informatics / Computer systems

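Career (19)

Assistant Professor, Information Infrastructure Design and Development Division, Information Technology Center, Nagoya University
April 2025 - present
Country: Japan

Designated Assistant Professor, Information Technology Center, Nagoya University
December 2024 - March 2025
Country: Japan

Temporary Technical Staff, Department of Mathematical Sciences, College of Systems Engineering and Science, Shibaura Institute of Technology
April 2024 - October 2024
Country: Japan

Section 7, Department 2, G Division, Platform System and Experience Design Headquarters, Sony Interactive Entertainment Inc.
November 2023 - February 2024
Country: Japan

Visiting Researcher, Information Technology Center, The University of Tokyo
November 2021 - March 2023
Country: Japan

Researcher, Large-scale Parallel Numerical Computing Technology Research Team, Center for Computational Science, RIKEN (Designated National Research and Development Agency)
April 2019 - October 2023
Country: Japan

Researcher, Architecture Development Team, Flagship 2020 Project, Center for Computational Science, RIKEN (Designated National Research and Development Agency)
April 2019 - March 2021
Country: Japan

Visiting Researcher, Large-scale Parallel Numerical Computing Technology Research Team, Center for Computational Science, RIKEN (Designated National Research and Development Agency)
April 2018 - March 2019
Country: Japan

Visiting Researcher, Architecture Development Team, Flagship 2020 Project, Center for Computational Science, RIKEN (Designated National Research and Development Agency)
April 2018 - March 2019
Country: Japan

Project Researcher, Division of Mathematics, Doctoral Program, Graduate School of Science, Tokyo Woman's Christian University
October 2017 - March 2019
Country: Japan

Visiting Researcher, Architecture Development Team, Flagship 2020 Project, Advanced Institute for Computational Science, RIKEN (Designated National Research and Development Agency)
October 2017 - March 2018
Country: Japan

Visiting Researcher, Large-scale Parallel Numerical Computing Technology Research Team, Research Division, Advanced Institute for Computational Science, RIKEN (Designated National Research and Development Agency)
October 2017 - March 2018
Country: Japan

Postdoctoral Researcher, Architecture Development Team, Flagship 2020 Project, Advanced Institute for Computational Science, RIKEN (Designated National Research and Development Agency)
April 2017 - September 2017
Country: Japan

Postdoctoral Researcher, Co-design Promotion Team, Flagship 2020 Project, Advanced Institute for Computational Science, RIKEN (Incorporated Administrative Agency)
April 2016 - March 2017
Country: Japan

Postdoctoral Researcher, Co-design Promotion Team, Exascale Computing Development Project, Advanced Institute for Computational Science, RIKEN (Incorporated Administrative Agency)
May 2015 - March 2016
Country: Japan

Postdoctoral Researcher, Large-scale Parallel Numerical Computing Technology Research Team, Research Division, Advanced Institute for Computational Science, RIKEN (Incorporated Administrative Agency)
June 2014 - September 2017
Country: Japan

Research Fellow (PD), Japan Society for the Promotion of Science (Incorporated Administrative Agency)
December 2013 - May 2014
Country: Japan

Research Fellow (DC2), Japan Society for the Promotion of Science (Incorporated Administrative Agency)
April 2013 - November 2013
Country: Japan

Assistant Professor, Information Technology Center, Nagoya University
November 2025 - present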

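Education (4)

Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba
April 2011 - November 2013
Country: Japan
Note: Doctoral program

Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba
April 2009 - March 2011
Country: Japan
Note: Master's program

School of Library and Information Science, University of Tsukuba
April 2006 - March 2009
Country: Japan

Department of Electronic Control Engineering, Gifu National College of Technology
April 2001 - March 2006
Country: Japan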

Academic Society Memberships (4)

The Japanese Society of Medical Imaging Technology (日本医用画像工学会)
August 2025 - present

Association for Computing Machinery (ACM)
2025 - present

Information Processing Society of Japan (IPSJ)
2008 - present

Auto-Tuning Research Group (自動チューニング研究会)

Committee Memberships (40)

The 1st International Workshop on Agentic AI for HPC (AgenticAI4HPC 2026), Co-Chair
2026
Organization type: Academic society

The 15th International Conference on Parallel Processing & Applied Mathematics (PPAM 2024), Program Committee Member
2024

Mini Symposium: Exploring Arithmetic and Data Representation Beyond the Standard in HPC (at ICIAM 2023), Mini-Symposium Organizer
2023

The 24th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2023) (in conjunction with IPDPS 2023), Program Committee Member
2023

2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2023), Program Committee Member
2023

Special Session: Performance Optimization and Auto-Tuning of Software on Multicore/Manycore Systems (POAT 2023) (in conjunction with MCSoC-2023), Program Chair
2023
Organization type: Academic society

The 22nd International Conference on Computational Science (ICCS 2022), Program Committee Member
2022

The International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2022), Publicity Chair
2022

36th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2022), Program Committee Member (Algorithm track)
2022

Auto-Tuning Research Group (自動チューニング研究会), Secretary (Exchange Promotion Committee)
2021 - 2023
Organization type: Academic society

IPSJ Transactions on Advanced Computing Systems, Editorial Committee Member
2020 - 2024
Organization type: Academic society

The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), Research Poster Committee Member
2020

The 4th International Workshop on GPU Computing and AI (GCA'19) (in conjunction with CANDAR'19), Program Committee Member
2019

The Fourteenth International Workshop on Automatic Performance Tuning (iWAPT2019) (in conjunction with IPDPS 2019), Program Committee Member
2019

The 16th International Conference on Parallel Processing & Applied Mathematics (PPAM 2026), Program Committee Member
September 2026
Organization type: Academic society

The 2nd International Workshop on Foundational Large Language Models Advances for HPC (LLM4HPC 2026), Program Committee Member
June 2026
Organization type: Academic society

The International Conference on High Performance Computing in Asia-Pacific Region 2026 (HPCAsia2026), Poster Chair
January 2026
Organization type: Academic society

The 28th Workshop on Advances in Parallel and Distributed Computational Models (APDCM2026), Program Committee Member
2026
Organization type: Academic society

Auto-Tuning Research Group (自動チューニング研究会), Research Promotion Committee Member
2025 - present

The 14th International Conference on Parallel Processing & Applied Mathematics (PPAM 2022), Program Committee Member
2022

Special Session: Auto-Tuning for Multicore and GPU (ATMG2022) (in conjunction with MCSoC-2022), Program Chair
2022

IEEE 22nd International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2021) (in conjunction with IPDPS 2021), Program Committee Member
2021

Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020 January), Program Committee Member
2020

The 21st IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2020) (in conjunction with IPDPS 2020), Program Committee Member
2020

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2019), Program Committee Member
2019

The 20th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2019) (in conjunction with IPDPS 2019), Program Committee Member
2019

Mini Symposium: Development of Numerical Computing Software on Emerging Computing Platforms (at SIAM PP 18), Mini-Symposium Organizer
2018

2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-2018), Program Committee Member
2018

The 19th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2018) (in conjunction with IPDPS 2018), Program Committee Member
2018

The Third International Workshop on GPU Computing and AI (GCA'18) (in conjunction with CANDAR'18), Program Committee Member
2018

The Thirteenth International Workshop on Automatic Performance Tuning (iWAPT2018) (in conjunction with IPDPS 2018), Program Committee Member
2018

Special Session: Auto-Tuning for Multicore and GPU (ATMG 2018) (in conjunction with MCSoC-2018), Program Committee Member
2018

The Second International Workshop on GPU Computing and AI (GCA'17) (in conjunction with CANDAR'17), Program Committee Member
2017

Special Session: Auto-Tuning for Multicore and GPU (ATMG 2017) (in conjunction with MCSoC-17), Program Committee Member
2017

The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017) (in conjunction with IPDPS 2017), Program Committee Member
2017

The Twelfth International Workshop on Automatic Performance Tuning (iWAPT2017) (in conjunction with IPDPS 2017), Program Committee Member
2017

The First International Workshop on GPU Computing and Applications (GCA'16) (in conjunction with CANDAR'16), Program Committee Member
2016

The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016) (in conjunction with IPDPS 2016), Program Committee Member
2016

The 16th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2015) (in conjunction with IPDPS 2015), Program Committee Member
2015

The 15th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2014) (in conjunction with IPDPS 2014), Program Committee Member
2014

Awards (10)

Best Paper Award
January 2026, The 1st International Workshop on Foundational Large Language Models Advances for HPC in Asia (LLM4HPCAsia 2026)
Evaluating Claude Code's Coding and Test Automation for GPU Acceleration of a Legacy Fortran Application: A GeoFEM Case Study
Tetsuya Hoshino, Shun-Ichiro Hayashi, Daichi Mukunoki, Takahiro Katagiri, Toshihiro Hanawa

Best Paper Award
December 2023, 16th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2023)
Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor
Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura
Award type: Award from an international society, conference, or symposium

Research Poster Award 2nd Place Winner
June 2022, ISC High Performance 2022
A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

FY2021 RIKEN Ohbu Award (理化学研究所桜舞賞)
March 2022, RIKEN
Research on precision-aware numerical computation methods

Research Poster Award
June 2021, ISC High Performance 2021
Accurate Matrix Multiplication on Binary128 using Ozaki Scheme
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

Best Research Poster Award
September 2019, Russian Supercomputing Days
Accurate and Reproducible Linear Algebra Operations for Many-core Architectures
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

PRACE-ISC Research Poster Award 2017
June 2017, ISC High Performance 2017
Implementation & Evaluation of 2.5D Matrix Multiplication on K Computer
Daichi Mukunoki, Toshiyuki Imamura

FY2016 Yamashita SIG Research Award (山下記念研究賞)
2016, Information Processing Society of Japan
Automatic thread-count selection method for memory-bound BLAS kernels on NVIDIA GPUs
Daichi Mukunoki

FY2013 Computer Science Research Award for Young Scientists (コンピュータサイエンス領域奨励賞)
2013, Information Processing Society of Japan
Fast implementation of CRS-format sparse matrix-vector multiplication on GPUs
Daichi Mukunoki

Young Researcher Encouragement Award (若手奨励賞)
2013, IPSJ Special Interest Group on Computer Architecture (情報処理学会計算機アーキテクチャ研究会)
Implementation and evaluation of sparse iterative solvers using quadruple-precision arithmetic on GPUs
Daichi Mukunoki

Papers (87)

Performance Evaluation of Loop Body Splitting for Fast Modal Filtering in SCALE-DG on A64FX (Peer-reviewed, Open Access)
Xuanzhengbo Ren, Yuta Kawai, Hirofumi Tomita, Seiya Nishizawa, Takahiro Katagiri, Tetsuya Hoshino, Daichi Mukunoki, Masatoshi Kawai, Toru Nagai
Proceedings of the 2025 International Conference on High Performance Computing in Asia-Pacific Region Workshops, pp. 36 - 44, February 2025
Language: English; Publication type: Research paper (international conference proceedings); Publisher: ACM
DOI: 10.1145/3703001.3724385

Performance evaluation and modelling of single-precision matrix multiplication on Cerebras CS-2 (Peer-reviewed)
Ryunosuke Matsuzaki, Daichi Mukunoki, Takaaki Miyajima
SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 727 - 731, November 2024
Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE
DOI: 10.1109/scw63240.2024.00101

Evaluating Claude Code's Coding and Test Automation for GPU Acceleration of a Legacy Fortran Application: A GeoFEM Case Study (Peer-reviewed)
Tetsuya Hoshino, Shun-ichiro Hayashi, Daichi Mukunoki, Takahiro Katagiri, Toshihiro Hanawa
Proc. the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) - The 1st International Workshop on Foundational Large Language Models Advances for HPC in Asia (LLM4HPCAsia 2026), pp. 353 - 360, January 2026
Language: English; Publication type: Research paper (international conference proceedings)
DOI: 10.1145/3784828.3785335
Other link: https://dblp.uni-trier.de/db/conf/hpcasia/hpcasia2026w.html#HoshinoHMKH26

Sparse Iterative Solvers Using High-Precision Arithmetic with Quasi Multi-Word Algorithms (Peer-reviewed)
Daichi Mukunoki, Katsuhisa Ozaki
Proc. 2025 IEEE 18th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2025), pp. 33 - 40, December 2025
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings)
DOI: 10.1109/MCSoC67473.2025.00016
Other link: https://dblp.uni-trier.de/db/conf/mcsoc/mcsoc2025.html#MukunokiO25

DGEMM without FP64 Arithmetic - Using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme (Peer-reviewed, Open Access)
Daichi Mukunoki
Proc. the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26) - ExHET'26: The Fifth International Workshop on Extreme Heterogeneity Solutions, vol. abs/2508.00441, pp. 303 - 311, August 2025
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings)

As the demand for AI computation rapidly increases, more hardware is being developed to efficiently perform the low-precision matrix multiplications required by such workloads. However, these operations are generally not directly applicable to scientific computations due to accuracy requirements. The Ozaki scheme - an accurate matrix multiplication method proposed by Ozaki et al. in 2012 - enables FP64 matrix multiplication (DGEMM) using low-precision matrix multiplication units, such as FP16 Tensor Cores. This approach has since been extended to utilize integer arithmetic, offering lower computational cost compared to floating-point-based implementations. In fact, it has achieved higher performance than hardware FP64 operations on GPUs equipped with fast INT8 Tensor Cores designed for AI workloads. However, recent AI-oriented processors trends have shifted toward improving the performance of low-precision floating-point operations, such as FP8, rather than integer operations. Motivated by this shift, this study revisits the use of low-precision floating-point operations in the Ozaki scheme. Specifically, we explore the use of FP8 Tensor Cores. In addition, for processors that support very slow or no hardware-based FP64 operations, we also consider FP64 arithmetic emulation based on integer arithmetic. This completely eliminates hardware FP64 instructions. Furthermore, we explore the use of blocking in the inner-product dimension to accelerate FP16-based implementations. We demonstrate the effectiveness of these methods by evaluating the performance on an NVIDIA RTX Blackwell architecture GPU.

DOI: 10.1145/3784828.3785017
Other link: https://arxiv.org/pdf/2508.00441v3

An Algorithm Portfolio Approach for Parameter Tuning in Coherent Ising Machines (Peer-reviewed)
Tatsuro Hanyu, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino
Proc. 2025 Thirteenth International Symposium on Computing and Networking Workshops (CANDARW) - 17th International Workshop on Parallel and Distributed Algorithms and Applications (PDAA 2025), pp. 142 - 148, 2025
Language: English; Publication type: Research paper (international conference proceedings)
DOI: 10.1109/CANDARW68385.2025.00032
Other link: https://dblp.uni-trier.de/db/conf/candar/candar2025w.html#HanyuKMH25

Extension of accurate numerical algorithms for matrix multiplication based on error-free transformation (Peer-reviewed)
Katsuhisa Ozaki, Daichi Mukunoki, Takeshi Ogita
Japan Journal of Industrial and Applied Mathematics, vol. 42, no. 1, pp. 1 - 20, October 2024
Language: English; Publication type: Research paper (academic journal); Publisher: Springer Science and Business Media LLC
DOI: 10.1007/s13160-024-00677-z
Other link: https://link.springer.com/article/10.1007/s13160-024-00677-z/fulltext.html

Reduced-Precision and Reduced-Exponent Formats for Accelerating Adaptive Precision Sparse Matrix–Vector Product (Peer-reviewed, Open Access)
Stef Graillat, Fabienne Jézéquel, Theo Mary, Roméo Molina, Daichi Mukunoki
Lecture Notes in Computer Science, vol. 14803, pp. 17 - 30, August 2024
Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer Nature Switzerland
DOI: 10.1007/978-3-031-69583-4_2

Mixed-precision conjugate gradient algorithm using the groupwise update strategy (Peer-reviewed)
Kensuke Aihara, Katsuhisa Ozaki, Daichi Mukunoki
Japan Journal of Industrial and Applied Mathematics, February 2024
Language: English; Publication type: Research paper (academic journal); Publisher: Springer Science and Business Media LLC
DOI: 10.1007/s13160-024-00644-8
Other link: https://link.springer.com/article/10.1007/s13160-024-00644-8/fulltext.html

Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor (Peer-reviewed)
Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura
2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pp. 608 - 615, December 2023
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE
DOI: 10.1109/mcsoc60832.2023.00094

Infinite-Precision Inner Product and Sparse Matrix-Vector Multiplication Using Ozaki Scheme with Dot2 on Manycore Processors (Peer-reviewed)
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
Parallel Processing and Applied Mathematics, pp. 40 - 54, April 2023
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer International Publishing
DOI: 10.1007/978-3-031-30442-2_4

Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs (Peer-reviewed)
Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura
Proc. 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pp. 234 - 241, December 2021
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings)

A Rapid Euclidean Norm Calculation Algorithm that Reduces Overflow and Underflow (Peer-reviewed)
Takeyuki Harayama, Shuhei Kudo, Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi
Proc. The 2021 International Conference on Computational Science and Its Applications (ICCSA 2021), Lecture Notes in Computer Science, vol. 12949, pp. 95 - 110, September 2021
Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer
DOI: 10.1007/978-3-030-86653-2_7
Other link: https://dblp.uni-trier.de/db/conf/iccsa/iccsa2021-1.html#HarayamaKMIT21

Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme (Peer-reviewed, Open Access)
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
Proc. The 50th International Conference on Parallel Processing (ICPP-2021), no. 78, pp. 1 - 11, August 2021
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings)
DOI: 10.1145/3472456.3472493

Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme (Peer-reviewed, Open Access)
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Roman Iakymchuk
Proc. The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2021), pp. 100 - 109, January 2021
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: ACM
DOI: 10.1145/3432261.3432270
Other link: https://dblp.uni-trier.de/db/conf/hpcasia/hpcasia2021.html#MukunokiOOI21

Can We Avoid Rounding-Error Estimation in HPC Codes and Still Get Trustworthy Results? (Peer-reviewed)
Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk
Proc. 13th International Workshop on Numerical Software Verification 2020 (NSV 20), Lecture Notes in Computer Science, vol. 12549, pp. 163 - 177, December 2020
Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer
DOI: 10.1007/978-3-030-63618-0_10
Other link: https://dblp.uni-trier.de/db/conf/vstte/vstte2020.html#JezequelGMII20

Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? (Peer-reviewed)
Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka
Proc. 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), pp. 1056 - 1065, October 2020
Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

Matrix engines or units, in different forms and affinities, are becoming a reality in modern processors; CPUs and otherwise. The current and dominant algorithmic approach to Deep Learning merits the commercial investments in these units, and deduced from the No.1 benchmark in supercomputing, namely High Performance Linpack, one would expect an awakened enthusiasm by the HPC community, too.
Hence, our goal is to identify the practical added benefits for HPC and machine learning applications by having access to matrix engines. For this purpose, we perform an in-depth survey of software stacks, proxy applications and benchmarks, and historical batch job records. We provide a cost-benefit analysis of matrix engines, both asymptotically and in conjunction with state-of-the-art processors. While our empirical data will temper the enthusiasm, we also outline opportunities to misuse these dense matrix-multiplication engines if they come for free.

DOI: 10.1109/IPDPS49936.2021.00114
Other link: https://dblp.uni-trier.de/db/conf/ipps/ipdps2021.html#DomkeVDCO0SMPWM21

Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs (Peer-reviewed)
Daichi Mukunoki, Takeshi Ogita
J. Comput. Appl. Math., vol. 372, pp. 112701 - 112701, July 2020
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (academic journal); Publisher: Elsevier BV
DOI: 10.1016/j.cam.2019.112701

DGEMM Using Tensor Cores, and Its Accurate and Reproducible Versions (Peer-reviewed, Open Access)
Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura
Proc. ISC High Performance 2020, Lecture Notes in Computer Science, vol. 12151, pp. 230 - 248, June 2020
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer
DOI: 10.1007/978-3-030-50743-5_12
Other link: https://dblp.uni-trier.de/db/conf/supercomputer/isc2020.html#MukunokiOOI20

Design of an FPGA-Based Matrix Multiplier with Task Parallelism (Peer-reviewed, Open Access)
Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki
Proc. International Conference on Parallel Computing (ParCo2019), Parallel Computing: Technology Trends, vol. 36, pp. 241 - 250, 2019
Language: English; Publication type: Research paper (international conference proceedings); Publisher: IOS Press
DOI: 10.3233/APC200047
Other link: https://dblp.uni-trier.de/db/conf/parco/parco2019.html#TanIM19

Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-Core Architectures (Peer-reviewed)
Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki
Proc. 13th International Conference on Parallel Processing and Applied Mathematics (PPAM2019), Lecture Notes in Computer Science, vol. 12043, pp. 516 - 527, 2019
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer
DOI: 10.1007/978-3-030-43229-4_44
Other link: https://dblp.uni-trier.de/db/conf/ppam/ppam2019-1.html#MukunokiOO19

Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster (Peer-reviewed)
Daichi Mukunoki, Toshiyuki Imamura
Proc. International Conference on Computational Science (ICCS 2018), Lecture Notes in Computer Science, vol. 10862, pp. 853 - 858, June 2018
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer
DOI: 10.1007/978-3-319-93713-7_85
Other link: https://dblp.uni-trier.de/db/conf/iccS/iccS2018-3.html#MukunokiI18

Design Towards Modern High Performance Numerical LA Library Enabling Heterogeneity and Flexible Data Formats (Peer-reviewed)
Toshiyuki Imamura, Daichi Mukunoki, Yusuke Hirota, Susumu Yamada, Masahiko Machida
Proc. International Conference on Parallel Computing (ParCo2017), Advances in Parallel Computing, pp. 97 - 106, September 2017
Language: English; Publication type: Research paper (international conference proceedings); Publisher: IOS Press
DOI: 10.3233/978-1-61499-843-3-97
Other link: https://dblp.uni-trier.de/db/conf/parco/parco2017.html#ImamuraMHYM17

Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer (Peer-reviewed)
Daichi Mukunoki, Toshiyuki Imamura
Proc. 12th International Conference on Parallel Processing and Applied Mathematics (PPAM2017), Lecture Notes in Computer Science, vol. 10777, pp. 348 - 358, 2017
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer
DOI: 10.1007/978-3-319-78024-5_31
Other link: https://dblp.uni-trier.de/db/conf/ppam/ppam2017-1.html#MukunokiI17

Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs (Peer-reviewed)
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi
Proc. IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), pp. 377 - 384, 2016
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE Computer Society
DOI: 10.1109/MCSoC.2016.32
Other link: https://dblp.uni-trier.de/db/conf/mcsoc/mcsoc2016.html#MukunokiIT16

Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation (Peer-reviewed)
Daichi Mukunoki, Toshiyuki Imamura
Proc. IEEE International Conference on Cluster Computing (Cluster 2016), pp. 144 - 145, 2016
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE Computer Society
DOI: 10.1109/CLUSTER.2016.77
Other link: https://dblp.uni-trier.de/db/conf/cluster/cluster2016.html#MukunokiI16

Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs (Peer-reviewed)
Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi
Proc. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP 2015), pp. 642 - 650, 2015
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE Computer Society
DOI: 10.1109/PDP.2015.66
Other link: https://dblp.uni-trier.de/db/conf/pdp/pdp2015.html#MukunokiIT15

Implementation and Performance Evaluation of Triple and Quadruple Precision Floating-Point Operations on GPUs (Peer-reviewed, Open Access)
Daichi Mukunoki, Daisuke Takahashi
IPSJ Transactions on Advanced Computing Systems (ACS), vol. 6, no. 1, pp. 66 - 77, January 2013
Role: Lead author, Corresponding author; Language: Japanese; Publisher: Information Processing Society of Japan

We have implemented triple and quadruple precision floating-point operations on GPUs. As an example of the application of linear algebra operations, we have implemented triple and quadruple precision subroutines of the Basic Linear Algebra Subprograms (BLAS), AXPY, GEMV and GEMM, and evaluated their performance. For quadruple precision, we used Double-Double (DD) type quadruple precision operations (DD-operations). On the other hand, in our research we are proposing Double+Single (D+S) and Double+Int (D+I) type triple precision floating-point formats and triple precision operations that use DD-operations internally. On an NVIDIA Tesla M2090, the triple and quadruple precision AXPY and GEMV are memory-bound. Therefore, the execution time of the triple and quadruple precision operations is approximately 3x and 4x that of the single precision, respectively. Our triple precision operations have the advantage of speed compared to quadruple precision, in cases where the triple precision operations are memory-bound. In cases where quadruple precision is not required, but double precision is insufficient, we predict that our triple precision operations will perform well, especially in environments such as GPU clusters where the bandwidth of the PCI Express and the network may become bottlenecks.

Other link: http://id.nii.ac.jp/1001/00089921/

Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs (Peer-reviewed)
Daichi Mukunoki, Daisuke Takahashi
Proc. 13th International Conference on Computational Science and Its Applications (ICCSA 2013), Part V, Lecture Notes in Computer Science, vol. 7975, pp. 211 - 223, 2013
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer
DOI: 10.1007/978-3-642-39640-3_15
Other link: https://dblp.uni-trier.de/db/conf/iccsa/iccsa2013-5.html#MukunokiT13

Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs (Peer-reviewed)
Daichi Mukunoki, Daisuke Takahashi
Proc. 10th International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Part I, Workshop on Numerical Algorithms on Hybrid Architectures, Lecture Notes in Computer Science, vol. 8384, pp. 632 - 642, 2013
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer
DOI: 10.1007/978-3-642-55224-3_59
Other link: https://dblp.uni-trier.de/db/conf/ppam/ppam2013-1.html#MukunokiT13

Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs (Peer-reviewed)
Daichi Mukunoki, Daisuke Takahashi
Proc. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW 2012), The 13th Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-12), pp. 1378 - 1386, 2012
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE Computer Society
DOI: 10.1109/IPDPSW.2012.175
Other link: https://dblp.uni-trier.de/db/conf/ipps/ipdps2012w.html#MukunokiT12

Implementation and Evaluation of Quadruple and Octuple Precision BLAS on GPUs (Peer-reviewed, Open Access)
Daichi Mukunoki, Daisuke Takahashi
Proceedings of the Symposium on High Performance Computing and Computational Science (ハイパフォーマンスコンピューティングと計算科学シンポジウム論文集), vol. 2011, no. 2011, pp. 148 - 156, January 2011
Role: Lead author, Corresponding author; Language: Japanese

Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs (Peer-reviewed)
Daichi Mukunoki, Daisuke Takahashi
Proc. 10th International Conference on Applied Parallel and Scientific Computing (PARA 2010), Part I, Lecture Notes in Computer Science, vol. 7133, pp. 249 - 259, 2010
Role: Lead author, Corresponding author; Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer
DOI: 10.1007/978-3-642-28151-8_25
Other link: https://dblp.uni-trier.de/db/conf/para/para2010-1.html#MukunokiT10

Improving HPC Code Generation Capability of LLMs via Online Reinforcement Learning with Real-Machine Benchmark Rewards
Ryo Mikasa, Shun-ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri
February 2026

Large language models (LLMs) have demonstrated strong code generation capabilities, yet the runtime performance of generated code is not guaranteed, and there have been few attempts to train LLMs using runtime performance as a reward in the HPC domain. We propose an online reinforcement learning approach that executes LLM-generated code on a supercomputer and directly feeds back the measured runtime performance (GFLOPS) as a reward. We further introduce a Staged Quality-Diversity (SQD) algorithm that progressively varies the permitted optimization techniques on a per-problem basis, enabling the model to learn code optimization from diverse perspectives. We build a distributed system connecting a GPU training cluster with a CPU benchmarking cluster, and train Qwen2.5 Coder 14B on a double-precision matrix multiplication task using Group Relative Policy Optimization (GRPO). Through two experiments, we show that reinforcement learning combining runtime performance feedback with staged optimization can improve the HPC code generation capability of LLMs.

Other link: https://arxiv.org/pdf/2602.12049v1

Learning-Augmented Performance Model for Tensor Product Factorization in High-Order FEM
Xuanzhengbo Ren, Yuta Kawai, Tetsuya Hoshino, Hirofumi Tomita, Takahiro Katagiri, Daichi Mukunoki, Seiya Nishizawa
January 2026

Accurate performance prediction is essential for optimizing scientific applications on modern high-performance computing (HPC) architectures. Widely used performance models primarily focus on cache and memory bandwidth, which is suitable for many memory-bound workloads. However, it is unsuitable for highly arithmetic intensive cases such as the sum-factorization with tensor $n$-mode product kernels, which are an optimization technique for high-order finite element methods (FEM). On processors with relatively high single instruction multiple data (SIMD) instruction latency, such as the Fujitsu A64FX, the performance of these kernels is strongly influenced by loop-body splitting strategies. Memory-bandwidth-oriented models are therefore not appropriate for evaluating these splitting configurations, and a model that directly reflects instruction-level efficiency is required. To address this need, we develop a dependency-chain-based analytical formulation that links loop-splitting configurations to instruction dependencies in the tensor $n$-mode product kernel. We further use XGBoost to estimate key parameters in the analytical model that are difficult to model explicitly. Evaluations show that the learning-augmented model outperforms the widely used standard Roofline and Execution-Cache-Memory (ECM) models. On the Fujitsu A64FX processor, the learning-augmented model achieves mean absolute percentage errors (MAPE) between 1% and 24% for polynomial orders ($P$) from 1 to 15. In comparison, the standard Roofline and ECM models yield errors of 42%-256% and 5%-117%, respectively. On the Intel Xeon Gold 6230 processor, the learning-augmented model achieves MAPE values from 1% to 13% for $P$=1 to $P$=14, and 24% at $P$=15. In contrast, the standard Roofline and ECM models produce errors of 1%-73% and 8%-112% for $P$=1 to $P$=15, respectively.

arXiv

researchmap

Other link: https://arxiv.org/pdf/2601.06886v1
Single-precision Matrix Multiplication Performance on Cerebras CS-2: Evaluation and Modelling of Performance, Scalability and Energy Efficiency (Peer-reviewed) Open Access

Takaaki Miyajima, Ryunosuke Matsuzaki, Daichi Mukunoki

Journal of Information Processing, vol. 34, pp. 132-139, 2026

Language: English; Publication type: Research paper (scientific journal); Publisher: Information Processing Society of Japan

DOI： 10.2197/ipsjjip.34.132

Open Access

researchmap
Sparse Iterative Solvers Using High-Precision Arithmetic with Quasi Multi-Word Algorithms

Daichi Mukunoki, Katsuhisa Ozaki

CoRR abs/2510.13536, October 2025

Publication type: Research paper (scientific journal)

To obtain accurate results in numerical computation, high-precision arithmetic is a straightforward approach. However, most processors lack hardware support for floating-point formats beyond double precision (FP64). Double-word arithmetic (Dekker 1971) extends precision by using standard floating-point operations to represent numbers with twice the mantissa length. Building on this concept, various multi-word arithmetic methods have been proposed to further increase precision by combining additional words. Simplified variants, known as quasi algorithms, have also been introduced, which trade a certain loss of accuracy for reduced computational cost. In this study, we investigate the performance of quasi algorithms for double- and triple-word arithmetic in sparse iterative solvers based on the Conjugate Gradient method, and compare them with both non-quasi algorithms and standard FP64. We evaluate execution time on an x86 processor, the number of iterations to convergence, and solution accuracy. Although quasi algorithms require appropriate normalization to preserve accuracy - without it, convergence cannot be achieved - they can still reduce runtime when normalization is applied correctly, while maintaining accuracy comparable to full multi-word algorithms. In particular, quasi triple-word arithmetic can yield more accurate solutions without significantly increasing execution time relative to double-word arithmetic and its quasi variant. Furthermore, for certain problems, a reduction in iteration count contributes to additional speedup. Thus, quasi triple-word arithmetic can serve as a compelling alternative to conventional double-word arithmetic in sparse iterative solvers.

DOI： 10.48550/arXiv.2510.13536

arXiv

researchmap

Other link: https://arxiv.org/pdf/2510.13536v1
3Dify: a Framework for Procedural 3D-CG Generation Assisted by LLMs Using MCP and RAG

Shun-ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Satoshi Ohshima, Takahiro Katagiri

CoRR abs/2510.04536, October 2025

Publication type: Research paper (scientific journal)

This paper proposes "3Dify," a procedural 3D computer graphics (3D-CG) generation framework utilizing Large Language Models (LLMs). The framework enables users to generate 3D-CG content solely through natural language instructions. 3Dify is built upon Dify, an open-source platform for AI application development, and incorporates several state-of-the-art LLM-related technologies such as the Model Context Protocol (MCP) and Retrieval-Augmented Generation (RAG). For 3D-CG generation support, 3Dify automates the operation of various Digital Content Creation (DCC) tools via MCP. When DCC tools do not support MCP-based interaction, the framework employs the Computer-Using Agent (CUA) method to automate Graphical User Interface (GUI) operations. Moreover, to enhance image generation quality, 3Dify allows users to provide feedback by selecting preferred images from multiple candidates. The LLM then learns variable patterns from these selections and applies them to subsequent generations. Furthermore, 3Dify supports the integration of locally deployed LLMs, enabling users to utilize custom-developed models and to reduce both time and monetary costs associated with external API calls by leveraging their own computational resources.

DOI： 10.48550/arXiv.2510.04536

arXiv

researchmap

Other link: https://arxiv.org/pdf/2510.04536v1
VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs

Shun-ichiro Hayashi, Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

CoRR abs/2510.00031, September 2025

Publication type: Research paper (scientific journal)

We propose VibeCodeHPC, an automatic tuning system for HPC programs based on multi-agent LLMs for code generation. VibeCodeHPC tunes programs through multi-agent role allocation and iterative prompt refinement. We describe the system configuration with four roles: Project Manager (PM), System Engineer (SE), Programmer (PG), and Continuous Delivery (CD). We introduce dynamic agent deployment and activity monitoring functions to facilitate effective multi-agent collaboration. In our case study, we convert and optimize CPU-based matrix-matrix multiplication code written in C to GPU code using CUDA. The multi-agent configuration of VibeCodeHPC achieved higher-quality code generation per unit time compared to a solo-agent configuration. Additionally, the dynamic agent deployment and activity monitoring capabilities facilitated more effective identification of requirement violations and other issues.

DOI： 10.48550/arXiv.2510.00031

arXiv

researchmap

Other link: https://arxiv.org/pdf/2510.00031v1
Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach

Tatsuro Hanyu, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino

CoRR abs/2507.20295, July 2025

Publication type: Research paper (scientific journal)

Coherent Ising Machines (CIMs) have recently gained attention as a promising computing model for solving combinatorial optimization problems. In particular, the Chaotic Amplitude Control (CAC) algorithm has demonstrated high solution quality, but its performance is highly sensitive to a large number of hyperparameters, making efficient tuning essential. In this study, we present an algorithm portfolio approach for hyperparameter tuning in CIMs employing the Chaotic Amplitude Control with momentum (CACm) algorithm. Our method incorporates multiple search strategies, enabling flexible and effective adaptation to the characteristics of the hyperparameter space. Specifically, we propose two representative tuning methods, Method A and Method B. Method A optimizes each hyperparameter sequentially with a fixed total number of trials, while Method B prioritizes hyperparameters based on initial evaluations before applying Method A in order. Performance evaluations were conducted on the Supercomputer "Flow" at Nagoya University, using planted Wishart instances and Time to Solution (TTS) as the evaluation metric. Compared to the baseline performance with best-known hyperparameters, Method A achieved up to 1.47x improvement, and Method B achieved up to 1.65x improvement. These results demonstrate the effectiveness of the algorithm portfolio approach in enhancing the tuning process for CIMs.

DOI： 10.48550/arXiv.2507.20295

arXiv

researchmap

Other link: https://arxiv.org/pdf/2507.20295v1
Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation

Daichi Mukunoki, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri

CoRR abs/2507.04697, July 2025

Publication type: Research paper (scientific journal)

Generative AI technology based on Large Language Models (LLMs) has been developed and applied to assist with or automatically generate program code. In this paper, we evaluate the capability of existing general-purpose LLMs for Basic Linear Algebra Subprograms (BLAS) code generation for CPUs. We use two LLMs provided by OpenAI: GPT-4.1, a Generative Pre-trained Transformer (GPT) model, and o4-mini, one of the o-series of reasoning models. Both were released in April 2025. For routines from level-1 to level-3 BLAS, we tried to generate (1) C code without optimization from the routine name only, (2) C code with basic performance optimizations (thread parallelization, SIMD vectorization, and cache blocking) from the routine name only, and (3) C code with basic performance optimizations based on Fortran reference code. As a result, we found that correct code can be generated in many cases even when only the routine name is given. We also confirmed that thread parallelization with OpenMP, SIMD vectorization, and cache blocking can be implemented to some extent, and that the resulting code is faster than the reference code.

DOI： 10.48550/arXiv.2507.04697

arXiv

researchmap

Other link: https://arxiv.org/pdf/2507.04697v1
コヒーレントイジングマシンにおけるパラメタチューニングへのATの適用—Application of AT to Parameter Tuning in Coherent Ising Machines

羽生達郎, 片桐孝洋, 森下誠, 高橋一郎, 河合直聡, 椋木大地

計算工学講演会論文集 = Proceedings of the Conference on Computational Engineering and Science / 日本計算工学会編, vol. 30, pp. 957-960, June 2025

Language: Japanese; Publisher: 東京 : 日本計算工学会

J-GLOBAL

researchmap

Other link: https://ndlsearch.ndl.go.jp/books/R000000004-I034175077
BLASコードを題材としたGPTモデルによる数値計算コード実装支援に関する考察

椋木大地, 林俊一郎, 星野哲也, 片桐孝洋

情報処理学会研究報告(Web), vol. 2025 (HPC-200), 2025

J-GLOBAL

researchmap
疎行列反復解法の深層学習を用いた実行時間予測モデル構築と評価

中谷崇真, 河合直聡, 片桐孝洋, 星野哲也, 永井亨, 椋木大地

情報処理学会研究報告(Web), vol. 2025 (HPC-199), 2025

J-GLOBAL

researchmap
機械学習によるLAPACK固有値計算ルーチンのテストシーケンス最適化の試行

樫村寛大, 片桐孝洋, 森崎修司, 星野哲也, 椋木大地

情報処理学会研究報告(Web), vol. 2025 (HPC-201), 2025

J-GLOBAL

researchmap
コヒーレントイジングマシンの性能パラメタ最適化のための探索アルゴリズム選択可能な手法の提案

羽生達郎, 森下誠, 水木直也, 片桐孝洋, 椋木大地, 河合直聡, 星野哲也, 永井亨

情報処理学会研究報告(Web), vol. 2025 (HPC-198), 2025

J-GLOBAL

researchmap
SVMによる誤差を含むクラス分類における多種疑似量子アニーラの性能評価

水木直也, 森下誠, 河合直聡, 片桐孝洋, 椋木大地, 星野哲也, 永井亨

情報処理学会研究報告(Web), vol. 2025 (HPC-198), 2025

J-GLOBAL

researchmap
MCP・RAGを用いたプロシージャル3D生成LLMエージェント3Difyの提案とスパコンの利用

林俊一郎, 椋木大地, 片桐孝洋, 星野哲也, 大島聡史

情報処理学会研究報告(Web), vol. 2025 (HPC-200), 2025

J-GLOBAL

researchmap
GeoFEMを対象としたClaude CodeによるGPUコード開発の評価

星野哲也, 林俊一郎, 椋木大地, 片桐孝洋, 塙敏博

情報処理学会研究報告(Web), vol. 2025 (HPC-201), 2025

J-GLOBAL

researchmap
VibeCodeHPC:HPCコード自動チューニングのためのマルチLLMエージェントシステム

林俊一郎, 森田光貴, 椋木大地, 星野哲也, 片桐孝洋

情報処理学会研究報告(Web), vol. 2025 (ARC-263), 2025

J-GLOBAL

researchmap
gpt-oss-120bを用いたコード自動最適化マルチエージェントシステムの試作

椋木大地, 森田光貴, 林俊一郎, 三笠諒, 星野哲也, 片桐孝洋

情報処理学会研究報告(Web), vol. 2025 (ARC-263), 2025

J-GLOBAL

researchmap
csDF:Cerebras CS-2向け疑似倍精度浮動小数点演算ライブラリの実装

村上魁, 長島令旺, 中村暁, 松崎竜之介, 吉井一友, 椋木大地, 宮島敬明

情報処理学会研究報告(Web), vol. 2025 (ARC-263), 2025

J-GLOBAL

researchmap
Quasi Triple-Word Arithmeticによる6倍精度演算の疎行列反復解法への応用 Open Access

椋木大地, 尾崎克久

情報処理学会研究報告ハイパフォーマンスコンピューティング（HPC）, vol. 2024-HPC-197 (11), pp. 1-7, December 2024

Role: Lead author; Language: Japanese; Publication type: Research paper (workshop or symposium materials, etc.)

Open Access

J-GLOBAL

researchmap
Performance Evaluation of Adaptive-Precision SpMV with Reduced-Precision Formats

Stef Graillat, Fabienne Jézéquel, Théo Mary, Roméo Molina, Daichi Mukunoki

HAL hal-04261073, October 2023

Language: English; Publication type: Research paper (other academic conference materials, etc.)

researchmap
CPUにおけるbatched BLASのためのタスクスケジューリング戦略

椋木大地, 廣田悠輔, 今村俊幸

日本応用数理学会年会講演予稿集(CD-ROM), vol. 2021, 2022

J-GLOBAL

researchmap
尾崎スキームによる無限精度内積と再現可能疎行列反復ソルバーへの応用

椋木大地, 尾崎克久, 荻田武史, 今村俊幸

日本応用数理学会年会講演予稿集(CD-ROM), vol. 2022, 2022

J-GLOBAL

researchmap
不等分割による行列積のエラーフリー変換の高精度計算への応用

尾崎克久, 椋木大地, 荻田武史

日本応用数理学会年会講演予稿集(CD-ROM), vol. 2022, 2022

J-GLOBAL

researchmap
グループワイズ更新戦略を用いたCG手法の混合精度アルゴリズム

AIHARA Kensuke, OZAKI Katsuhisa, MUKUNOKI Daichi

Conference Proceedings. JSST Annual International Conference on Simulation Technology (Web), 41st, 2022

J-GLOBAL

researchmap
GPUテンソルコアを用いた行列乗算のエラーフリー変換の加速

OZAKI Katsuhisa, MUKUNOKI Daichi, OGITA Takeshi

International Conference on Simulation Technology (CD-ROM), 40th, 2021

J-GLOBAL

researchmap
White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing

Roman Iakymchuk, Daichi Mukunoki, Artur Podobas, Fabienne Jézéquel, Toshiyuki Imamura, Norihisa Fujita, Jens Huthmann, Shuhei Kudo, Yiyu Tan, Jens Domke, Kai Torben Ohlhus, Takeshi Fukaya, Takeo Hoshi, Yuki Murakami, Maho Nakata, Takeshi Ogita, Kentaro Sano, Taisuke Boku

CoRR abs/2004.04628, April 2020

Language: English; Publication type: Research paper (workshop or symposium materials, etc.)

In numerical computations, the precision of floating-point computations is a key factor in determining both performance (speed and energy efficiency) and reliability (accuracy and reproducibility). However, precision generally plays a contrary role for both. Therefore, the ultimate concept for maximizing both at the same time is minimal-precision computing through precision-tuning, which adjusts the optimal precision for each operation and data. Several studies have already been conducted on it (e.g. Precimonious and Verrou), but the scope of those studies is limited to precision-tuning alone. Hence, in 2019 we started the Minimal-Precision Computing project to propose a broader concept of the minimal-precision computing system with precision-tuning, involving both the hardware and software stack. Specifically, our system combines (1) a precision-tuning method based on Discrete Stochastic Arithmetic (DSA), (2) arbitrary-precision arithmetic libraries, (3) fast and accurate numerical libraries, and (4) Field-Programmable Gate Array (FPGA) with High-Level Synthesis (HLS). In this white paper, we aim to provide an overview of various technologies related to minimal- and mixed-precision, to outline the future direction of the project, as well as to discuss current challenges together with our project members and guest speakers at the LSPANC 2020 workshop; https://www.r-ccs.riken.jp/labs/lpnctrt/lspanc2020jan/.

arXiv

researchmap

Other link: https://dblp.uni-trier.de/db/journals/corr/corr2004.html#abs-2004-04628
GPUの単精度演算・Tensorコアを用いた行列積のエラーフリー変換

尾崎克久, 椋木大地, 荻田武史

日本応用数理学会年会講演予稿集(CD-ROM), vol. 2020, 2020

J-GLOBAL

researchmap
オーバー・アンダーフローを抑えた高精度かつ高速な2ノルム計算手法 Open Access

原山赳幸, 工藤周平, 椋木大地, 今村俊幸, 高橋大介

情報処理学会研究報告(Web), vol. 2020 (HPC-177), 2020

Open Access

J-GLOBAL

researchmap
尾崎スキームを用いたbinary128による4倍精度行列積

椋木大地, 尾崎克久, 荻田武史

日本応用数理学会年会講演予稿集(CD-ROM), vol. 2020, 2020

J-GLOBAL

researchmap
尾崎スキームによる高精度かつ再現性のあるBLAS実装

椋木大地, 荻田武史, 尾崎克久, 今村俊幸

日本応用数理学会年会講演予稿集(CD-ROM), vol. 2019, 2019

J-GLOBAL

researchmap
Level-3BLASに基づく高精度行列積計算法による高精度かつ再現性のあるBLASルーチンの実装とその最適化 Open Access

椋木大地, 荻田武史, 尾崎克久

情報処理学会研究報告(Web), vol. 2018 (HPC-166), 2018

Open Access

J-GLOBAL

researchmap
京コンピュータにおける2.5次元アルゴリズムを用いた分散並列行列積の実装と評価 Open Access

椋木大地, 今村俊幸

情報処理学会研究報告(Web), vol. 2017 (HPC-159), 2017

Open Access

J-GLOBAL

researchmap
KMATHLIB-京コンピュータにおける高性能かつスケーラブルな数値計算ライブラリ-

大井祥栄, 廣田悠輔, 椋木大地, 今村俊幸

日本応用数理学会年会講演予稿集(CD-ROM), vol. 2016, 2016

J-GLOBAL

researchmap
大規模並列計算機における連立一次方程式の精度保証付き数値計算に対する性能評価 Open Access

森倉悠介, 椋木大地, 深谷猛, 山中脩也, 大石進一

情報処理学会研究報告(Web), vol. 2016 (HPC-157), 2016

Open Access

J-GLOBAL

researchmap
コンシューマレンジのGPUに最適化した固有値ソルバーの実装と評価 Open Access

今村俊幸, 椋木大地

情報処理学会研究報告(Web), vol. 2016 (HPC-157), 2016

Open Access

J-GLOBAL

researchmap
AICSの大規模並列数値計算技術研究チームにおけるGPU計算に関する研究活動の紹介

MUKUNOKI Daichi, IMAMURA Toshiyuki, TAKAHASHI Daisuke

Plans and Future for International Collaborations on Extreme Scale Computing. 6th AICS International Symposium. RIKEN Symposium, 2016

J-GLOBAL

researchmap
CUDA-BLAS等の選択による最速GPU固有値ソルバーの性能評価 Open Access

今村俊幸, 椋木大地, 山田進, 町田昌彦

情報処理学会研究報告(Web), vol. 2015 (HPC-148), 2015

Open Access

J-GLOBAL

researchmap
FFTを使った時間発展問題における累積誤差

佐々成正, 山田進, 町田昌彦, 椋木大地, 今村俊幸

日本応用数理学会年会講演予稿集(CD-ROM), vol. 2015, 2015

J-GLOBAL

researchmap
NVIDIA GPUにおけるGEMVカーネルの自動チューニング

椋木大地, 今村俊幸, 高橋大介

計算工学講演会論文集(CD-ROM), vol. 20, 2015

J-GLOBAL

researchmap
短尺浮動小数点形式の検討 Open Access

椋木大地, 今村俊幸

情報処理学会研究報告(Web), vol. 2015 (HPC-152), 2015

Open Access

J-GLOBAL

researchmap
京・FX10における倍々精度演算の高速化 Open Access

佐々木信一, 菱沼利彰, 藤井昭宏, 田中輝雄, 椋木大地, 今村俊幸

情報処理学会研究報告(Web), vol. 2015 (HPC-151), 2015

Open Access

J-GLOBAL

researchmap
SYMV・GEMVルーチン群のマルチGPU化とその評価 Open Access

今村俊幸, 椋木大地, 山田進, 町田昌彦

情報処理学会研究報告(Web), vol. 2015 (HPC-151), 2015

Open Access

J-GLOBAL

researchmap
NVIDIA GPUにおけるメモリ律速なBLASカーネルのスレッド数自動選択手法 Open Access

椋木大地, 今村俊幸, 高橋大介

情報処理学会研究報告(Web), vol. 2015 (HPC-150), 2015

Open Access

J-GLOBAL

researchmap
CUDA-xSYMVの実装と評価 Open Access

今村俊幸, 椋木大地, 山田進, 町田昌彦

情報処理学会研究報告(Web), vol. 2014 (HPC-146), 2014

Open Access

J-GLOBAL

researchmap
MaxwellアーキテクチャGPUにおける疑似倍精度演算を用いたDGEMMの実装と評価 Open Access

椋木大地, 今村俊幸

情報処理学会研究報告(Web), vol. 2014 (ARC-213), 2014

Open Access

J-GLOBAL

researchmap
GPUにおける高速なCRS形式疎行列ベクトル積の実装 Open Access

椋木大地, 高橋大介

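研究報告ハイパフォーマンスコンピューティング（HPC）, vol. 2013 (5), pp. 1-7, February 2013

Language: Japanese

Sparse matrix-vector multiplication (SpMV) is an important basic operation used heavily in scientific computing. This paper reports on a fast implementation of CRS-format SpMV on GPUs. We targeted NVIDIA's Kepler architecture and implemented the kernel in the CUDA 5.0 environment. Starting from implementation techniques proposed for GPUs up to the earlier Fermi architecture, we optimized the kernel by exploiting features newly supported in the Kepler architecture and changes in its specifications. In a performance evaluation on a Kepler-architecture Tesla K20 over 200 matrices, our implementation was on average about 1.86 times faster than the double-precision CRS-format SpMV routine of cuSPARSE bundled with CUDA 5.0, and it improved performance for 177 of the matrices.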

Open Access

CiNii Research

researchmap
GPUにおける4倍精度浮動小数点演算を用いたクリロフ部分空間法の高速化 Open Access

椋木大地, 高橋大介

情報処理学会研究報告(Web), vol. 2013 (HPC-140), 2013

Open Access

J-GLOBAL

researchmap
GPUにおける3倍・4倍精度浮動小数点演算の実現と性能評価

椋木大地, 高橋大介

情報処理学会論文誌トランザクション(CD-ROM), vol. 2012 (2), 2013

J-GLOBAL

researchmap
GPUにおける高速なCRS形式疎行列ベクトル積の実装

椋木大地, 高橋大介

情報処理学会研究報告(CD-ROM), vol. 2012 (6), 2013

J-GLOBAL

researchmap
GPUにおける4倍精度演算を用いた疎行列反復解法の実装と評価 Open Access

椋木大地, 高橋大介

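研究報告ハイパフォーマンスコンピューティング（HPC）, vol. 2012 (37), pp. 1-8, December 2012

Language: Japanese

Krylov subspace methods, which are used as iterative solvers for sparse matrices, can require more iterations to converge, or fail to converge at all, because of rounding errors. It has been reported that using high-precision arithmetic can improve convergence in such cases. If the time saved by reducing the iteration count outweighs the increase in time per iteration caused by high-precision arithmetic, the total time to solution may be shortened. We implemented the BiCGStab method, one of the Krylov subspace methods, on a GPU (Tesla M2050) using quadruple precision based on Double-Double (DD) arithmetic, and evaluated its performance. On the GPU, the time per iteration of quadruple-precision BiCGStab was about 1.0 to 2.2 times that of double precision, and depending on the reduction in iteration count, there were cases where quadruple-precision arithmetic shortened the time to solution. This paper examines the performance and effectiveness of quadruple-precision arithmetic in sparse iterative solvers on GPUs.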

Open Access

CiNii Research

researchmap
GPUによる3倍精度浮動小数点演算の検討 Open Access

椋木大地, 高橋大介

情報処理学会研究報告(CD-ROM), vol. 2011 (4), 2011

Open Access

J-GLOBAL

researchmap
GPUによる4倍精度BLASの実装と評価

椋木大地, 高橋大介

計算工学講演会論文集, vol. 15 (2), 2010

J-GLOBAL

researchmap
GPUによる4倍精度BLASの実装と評価 Open Access

椋木大地, 高橋大介

情報処理学会研究報告(CD-ROM), vol. 2009 (4), 2009

Open Access

J-GLOBAL

researchmap


Lectures and oral presentations 287

Toward Automatic Generation of High Performance Numerical Codes by LLMs (International conference)

Daichi Mukunoki, Koki Morita, Shun-ichiro Hayashi, Tetsuya Hoshino, Takahiro Katagiri

SIAM Conference on Parallel Processing for Scientific Computing (PP26), March 2026

Event date: March 2026

Language: English; Presentation type: Oral presentation (general)
高性能計算のためのコード生成AIエージェント開発

椋木大地

MateriAI 2025 〜計算物質科学分野におけるAI技術の活用, February 2, 2026

Event date: February 2026

Language: Japanese; Presentation type: Oral presentation (general)
生成AIの活用によるHPCコードGPU化の展望

椋木大地

「次世代計算基盤を見据えたソフトウェア環境整備とそれを担う人材の育成に関する提言」についての意見交換会, January 21, 2026

Event date: January 2026

Language: Japanese; Presentation type: Oral presentation (general)
Verification of the Effectiveness of Deep Learning in Preprocessing Parameter Estimation for the Conjugate Gradient Method (International conference)

Takamasa Nakaya, Takahiro Katagiri, Tetsuya Hoshino, Daichi Mukunoki, Masatoshi Kawai

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Event date: January 2026

Language: English; Presentation type: Poster presentation
Performance Evaluation of SVM with Multiple Quantum-inspired Annealers (International conference)

Naoya Mizuki, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Event date: January 2026

Language: English; Presentation type: Poster presentation
Evaluation of the Capability of Coding AI in Generating SYCL-Based Numerical Computation Codes for Intel GPUs (International conference)

Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Event date: January 2026

Language: English; Presentation type: Poster presentation
A Multi Agent System for Local LLM-Based HPC Code Generation (International conference)

Ryo Mikasa, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Event date: January 2026

Language: English; Presentation type: Poster presentation
Proposal of The AI Scientist v2 for High Performance Computing with Local Large Language Models (International conference)

Takanori Kotama, Rio Yokota, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Event date: January 2026

Language: English; Presentation type: Poster presentation
DGEMM using FP64 Arithmetic Emulation and FP8 Tensor Cores with Ozaki Scheme (International conference)

Daichi Mukunoki

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Event date: January 2026

Language: English; Presentation type: Poster presentation
VibeCodeHPC: A Multi-LLM Agent Auto-Tuner for HPC Codes (International conference)

Shun-Ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Event date: January 2026

Language: English; Presentation type: Poster presentation
GPU Acceleration of Medical Image Representation Learning Models with Distributed Data Parallel and I/O Optimization (International conference)

Koki Isobe, Daichi Mukunoki, Masahiro Oda, Tetsuya Oda, Kensaku Mori, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Event date: January 2026

Language: English; Presentation type: Poster presentation
A Trial on Optimizing Test Sequences for LAPACK Eigenvalue Computation Routines using Machine Learning (International conference)

Hiroto Kashimura, Takahiro Katagiri, Shuji Morisaki, Daichi Mukunoki, Tetsuya Hoshino

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Event date: January 2026

Language: English; Presentation type: Poster presentation
AI時代のハードウェアとFP64エミュレーション

椋木大地

第17回自動チューニング技術の現状と応用に関するシンポジウム（ATTA2025）, December 23, 2025

Event date: December 2025

Language: Japanese; Presentation type: Oral presentation (general)
Automatic Generation of Numerical Codes for GPUs Using LLMs (International conference)

Daichi Mukunoki

JHPCN Field Workshop: State-of-the-Art in Code Generative AI for High-Performance Computing, December 5, 2025

Event date: December 2025

Language: English; Presentation type: Oral presentation (general)
Automatic Generation and GPU Porting of Numerical Computation Codes Using Generative AI (International conference)

Daichi Mukunoki

58th ASE Seminar, December 1, 2025

Event date: December 2025

Language: English; Presentation type: Oral presentation (general)
csDF: a double-float arithmetic library for the Cerebras CS-2 (International conference)

Reo Nagashima, Akeru Nakamura, Kai Murakami, Ryunosuke Matsuzaki, Daichi Mukunoki, Takaaki Miyajima

SC25 research poster session, November 16, 2025

Event date: November 2025

Language: English; Presentation type: Poster presentation
LLMによるコード自動最適化「VibeCodeHPC」の開発状況と実験が示したマルチエージェントの優位性

林俊一郎、森田光貴、椋木大地、星野哲也、片桐孝洋

物性研究所ソフトウェア開発・高度化プロジェクト研究会〜計算物質科学の発展を支えるオープンソースソフトウェアの開発と普及, October 20, 2025

Event date: October 2025

Language: Japanese; Presentation type: Poster presentation
LLMを用いた数値計算コードの自動生成・自動性能最適化への挑戦と展望

椋木大地、林俊一郎、星野哲也、森田光貴、片桐孝洋

物性研究所ソフトウェア開発・高度化プロジェクト研究会〜計算物質科学の発展を支えるオープンソースソフトウェアの開発と普及, October 20, 2025

Event date: October 2025

Language: Japanese; Presentation type: Poster presentation
生成AIを活用した数値計算・HPCコード自動生成への挑戦と展望

林俊一郎、椋木大地

2025年度第2回物性アプリオープンフォーラム, September 29, 2025

Event date: September 2025

Language: Japanese; Presentation type: Oral presentation (general)
Challenges and Prospects in Automatic Generation of HPC Codes Using Generative AI (International conference)

Daichi Mukunoki

The 6th "FugakuNEXT" Application Seminar, September 25, 2025

Event date: September 2025

Language: English; Presentation type: Oral presentation (general)
汎用LLMによるBLASコード自動生成能力の考察

椋木大地

第6回スーパーコンピュータ「不老」ユーザ会, September 11, 2025

Event date: September 2025

Language: Japanese; Presentation type: Oral presentation (general)
GPU搭載スーパーコンピュータを用いたCOVID-19診断支援のための肺野セグメンテーションの高速化

湯淺義尚、小田昌宏、椋木大地、片桐孝洋、星野哲也、河合直聡、永井亨、森健策

第44回日本医用画像工学会大会（JAMIT 2025）, August 28, 2025

Event date: August 2025

Language: Japanese; Presentation type: Poster presentation
HPC-GENIE: High-Performance Computing with Generative Neural Intelligence for Execution

林俊一郎、椋木大地、星野哲也、片桐孝洋

xSIG 2025, August 2025

Event date: August 2025

Language: Japanese; Presentation type: Poster presentation
LLMによるBLASコード生成に関する考察

椋木大地

第33回AT研究会オープンアカデミックセッション（ATOS33）, July 28, 2025

Event date: July 2025

Language: Japanese; Presentation type: Oral presentation (general)
生成AIによるHPCコード開発の革新に向けて：HPC-GENIEプロジェクトの取り組みと展望

椋木大地

情報処理学会東海支部主催第6回講演会, January 9, 2025

Date: January 2025

Language: Japanese; Presentation type: Oral presentation (general)
Multiple- and Mixed-Precision BLAS with C++ Template (international conference)

Toshiyuki Imamura, Daichi Mukunoki, Atsushi Suzuki

10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), August 24, 2023

Date: August 2023

Language: English; Presentation type: Oral presentation (general)
Reduced-Precision Data Representation on Sparse Matrix-Vector Multiplications (international conference)

Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura

10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), August 21, 2023

Date: August 2023

Language: English; Presentation type: Oral presentation (general)
tmBLAS: a Mixed Precision BLAS by C++ Template (international conference)

Atsushi Suzuki, Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2023), May 2023

Date: May 2023

Language: English; Presentation type: Poster presentation
Multiple and Mixed Precision BLAS with C++ Template (international conference)

Daichi Mukunoki, Atsushi Suzuki, Toshiyuki Imamura

5th R-CCS International Symposium, February 6, 2023

Date: February 2023

Language: English; Presentation type: Poster presentation
疎行列ベクトル積における低精度データ表現の導入について

椋木大地、河合直聡

第14回自動チューニング技術の現状と応用に関するシンポジウム（ATTA2022）, December 23, 2022

Date: December 2022

Language: Japanese; Presentation type: Oral presentation (general)
Accurate Matrix Computations using Ozaki Scheme on CPUs and GPUs (international conference)

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

The 30th Anniversary Symposium of the Center for Computational Sciences at the University of Tsukuba, October 14, 2022

Date: October 2022

Language: English; Presentation type: Poster presentation
A mixed-precision algorithm of the CG method using the group-wise update strategy (international conference)

Kensuke Aihara, Katsuhisa Ozaki, Daichi Mukunoki

The 41st JSST Annual International Conference on Simulation Technology (JSST2022), September 2, 2022

Date: September 2022

Language: English; Presentation type: Oral presentation (general)
Remedies for Reproducibility Issue in Conjugate Gradient Solvers (international conference)

Daichi Mukunoki, Roman Iakymchuk, Fabienne Jézéquel, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

SparseDays2022, June 20, 2022

Date: June 2022

Language: English; Presentation type: Poster presentation
A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers (international conference)

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

ISC High Performance (ISC 2022), June 1, 2022

Date: June 2022

Language: English; Presentation type: Poster presentation
Impact and Contribution of Ozaki scheme in High Performance Computing (international conference)

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk

International Workshop on Reliable Computing and Computer-Assisted Proofs (ReCAP 2022), March 15, 2022

Date: March 2022

Language: English; Presentation type: Oral presentation (general)
Flying restart付きCG法に対する混合精度演算による近似解精度の向上

相原研輔、尾崎克久、椋木大地

日本応用数理学会第18回研究部会連合発表会, March 9, 2022

Date: March 2022

Language: Japanese; Presentation type: Oral presentation (general)
行列積に対する試行型エラーフリー変換に対する誤差の対処法とその応用

尾崎克久、椋木大地、荻田武史

日本応用数理学会第18回研究部会連合発表会, March 8, 2022

Date: March 2022

Language: Japanese; Presentation type: Oral presentation (general)
Performance Evaluation of Batched BLAS on A64FX (international conference)

Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura

4th R-CCS International Symposium (lightning talk), February 7, 2022

Date: February 2022

Language: English; Presentation type: Oral presentation (general)
精度自動チューニングに向けた基盤技術の検討

椋木大地

第13回自動チューニング技術の現状と応用に関するシンポジウム (ATTA2021), December 13, 2021

Date: December 2021

Language: Japanese; Presentation type: Oral presentation (general)
Accurate and Reproducible Conjugate Gradient in Hybrid Parallel Environments (international conference)

Roman Iakymchuk, Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Stef Graillat

ISC High Performance (ISC 2021), June 29, 2021

Date: June 2021

Language: English; Presentation type: Poster presentation
Accurate Matrix Multiplication on Binary128 using Ozaki Scheme (international conference)

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

ISC High Performance (ISC 2021), June 29, 2021

Date: June 2021

Language: English; Presentation type: Poster presentation
Fast rounding error estimation for compute-intensive operations using standard floating-point arithmetic (international conference)

Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk

Rencontres Arithmétiques de l'Informatique Mathématique (RAIM), May 2021

Date: May 2021

Language: English; Presentation type: Oral presentation (general)
DGEMM using Tensor Cores (international conference)

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

SIAM Conference on Computational Science and Engineering (CSE21), March 4, 2021

Date: March 2021

Language: English; Presentation type: Oral presentation (general)
High-Precision, Accurate, and Reproducible Linear Algebra Operations using Ozaki Scheme (international conference)

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk

3rd R-CCS International Symposium, February 15, 2021

Date: February 2021

Language: English; Presentation type: Poster presentation
binary128 に対する尾崎スキーム行列積

椋木大地、尾崎克久、荻田武史

第4回精度保証付き数値計算の実問題への応用研究集会 (NVR 2020), November 28, 2020

Date: November 2020

Language: Japanese; Presentation type: Oral presentation (general)
Conjugate Gradient Solvers with Accuracy and Reproducibility Guarantees in Hybrid Parallel Environments (international conference)

Roman Iakymchuk, Daichi Mukunoki

Sparse Days Cerfacs, November 24, 2020

Date: November 2020

Language: English; Presentation type: Oral presentation (general)
DGEMM using Tensor Cores and OzBLAS (international conference)

Daichi Mukunoki

11th Joint Laboratory for Extreme Scale Computing (JLESC) Workshop, September 8, 2020

Date: September 2020

Language: English; Presentation type: Oral presentation (general)
An FPGA-based Matrix Multiplier with Task Parallelism (international conference)

Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki

2nd R-CCS International Symposium, February 17, 2020

Date: February 2020

Language: English; Presentation type: Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations (international conference)

Toshiyuki Imamura, Daichi Mukunoki, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

2nd R-CCS International Symposium, February 17, 2020

Date: February 2020

Language: English; Presentation type: Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations (international conference)

Daichi Mukunoki

SIAM Conference on Parallel Processing for Scientific Computing (PP20), February 15, 2020

Date: February 2020

Language: English; Presentation type: Oral presentation (general)
Accurate BLAS implementations: OzBLAS and BLAS-DOT2 (international conference)

Daichi Mukunoki

Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2020 January), January 30, 2020

Date: January 2020

Language: English; Presentation type: Oral presentation (general)
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations (international conference)

Daichi Mukunoki

Sapporo Winter HPC Seminar 2020, January 24, 2020

Date: January 2020

Language: English; Presentation type: Oral presentation (general)
Optimizing Precision for High-Performance, Robust, and Energy-Efficient Computations (international conference)

Roman Iakymchuk, Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Norihisa Fujita, Taisuke Boku

The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2020), January 15, 2020

Date: January 2020

Language: English; Presentation type: Poster presentation
Accurate DGEMM using Tensor Cores (international conference)

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2020), January 15, 2020

Date: January 2020

Language: English; Presentation type: Poster presentation
High-performance Implementations of Accurate Linear Algebra Kernels on GPUs (international conference)

Daichi Mukunoki, Takeshi Ogita

3rd International Conference on Modern Mathematical Methods and High Performance Computing in Science & Technology (M3HPCST), January 9, 2020

Date: January 2020

Language: English; Presentation type: Oral presentation (general)
尾崎スキームによる高精度BLAS実装「OzBLAS」とその応用

椋木大地、荻田武史、尾崎克久

第3回精度保証付き数値計算の実問題への応用研究集会 (NVR 2019), December 1, 2019

Date: December 2019

Language: Japanese; Presentation type: Oral presentation (general)
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations (international conference)

Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

SC19 research poster session, November 19, 2019

Date: November 2019

Language: English; Presentation type: Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations (international conference)

Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

France-Japan-Germany trilateral workshop: Convergence of HPC and Data Science for Future Extreme Scale Intelligent Applications, November 7, 2019

Date: November 2019

Language: English; Presentation type: Poster presentation
Reduced and Extended-Precision Computations on FPGAs and GPUs (international conference)

Yiyu Tan, Daichi Mukunoki, Toshiyuki Imamura, Norihisa Fujita, Taisuke Boku

The 11th symposium on Discovery, October 15, 2019

Date: October 2019

Language: English; Presentation type: Poster presentation
Accurate and Reproducible CG Method on GPUs (international conference)

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

European Numerical Mathematics and Advanced Applications Conference 2019 (ENUMATH2019), October 1, 2019

Date: October 2019

Language: English; Presentation type: Oral presentation (general)
Accurate and Reproducible Linear Algebra Operations for Many-core Architectures (international conference)

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Russian Supercomputing Days 2019 (RuSCDays 2019), September 23, 2019

Date: September 2019

Language: English; Presentation type: Poster presentation
High-Performance Implementations of Accurate and Reproducible BLAS Routines on GPUs (international conference)

Daichi Mukunoki

Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2019 June), June 7, 2019

Date: June 2019

Language: English; Presentation type: Oral presentation (general)
尾崎スキームに基づく高精度かつ再現性のあるBLASルーチンの実装と自動チューニングの適用

椋木大地

第22回AT研究会オープンアカデミックセッション（ATOS22）, May 13, 2019

Date: May 2019

Language: Japanese; Presentation type: Oral presentation (general)
OzBLAS: Accurate and Reproducible BLAS Based on Ozaki Scheme (international conference)

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

GPU Technology Conference (GTC 2019), March 17, 2019

Date: March 2019

Language: English; Presentation type: Poster presentation
Development of Scientific Numerical Libraries on post-K computer (international conference)

Toshiyuki Imamura, Yusuke Hirota, Daichi Mukunoki, Shuhei Kudo, Akiyoshi Kuroda, Naoki Sueyasu

1st R-CCS International Symposium, February 18, 2019

Date: February 2019

Language: English; Presentation type: Poster presentation
尾崎スキームによる高精度かつ再現性のあるBLASルーチンの実装と評価

椋木大地、荻田武史、尾崎克久

第2回精度保証付き数値計算の実問題への応用研究集会 (NVR 2018), December 2, 2018

Date: December 2018

Language: Japanese; Presentation type: Oral presentation (general)
High Performance Implementation of Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme (international conference)

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Computational Reproducibility at Exascale 2018 (CRE2018), November 11, 2018

Date: November 2018

Language: English; Presentation type: Oral presentation (general)
Accurate and cost-efficient triangular solve (international conference)

Roman Iakymchuk, Pedro Valero-Lara, Daichi Mukunoki

The 18th International Symposium on Scientific Computing, September 11, 2018

Date: September 2018

Language: English; Presentation type: Oral presentation (general)
High Performance Implementation of Accurate Matrix Multiplications on GPUs (international conference)

Daichi Mukunoki, Takeshi Ogita

The 18th International Symposium on Scientific Computing, September 11, 2018

Date: September 2018

Language: English; Presentation type: Oral presentation (general)
High-performance implementations of reproducible and accurate matrix-multiplication (international conference)

Daichi Mukunoki, Roman Iakymchuk, Stef Graillat, Takeshi Ogita

10th International Workshop on Parallel Matrix Algorithms and Applications (PMAA18), June 27, 2018

Date: June 2018

Language: English; Presentation type: Oral presentation (general)
Automatic Generation of Full-Set Batched BLAS (international conference)

Yusuke Hirota, Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2018), June 26, 2018

Date: June 2018

Language: English; Presentation type: Poster presentation
Performance Analysis of 2.5D-PDGEMM on the K Computer (international conference)

Daichi Mukunoki, Toshiyuki Imamura

SIAM Conference on Parallel Processing for Scientific Computing (PP18), March 8, 2018

Date: March 2018

Language: English; Presentation type: Oral presentation (general)
次世代計算機のための数値計算ライブラリの実装技術

椋木大地

日本応用数理学会三部会連携「応用数理セミナー」, December 26, 2017

Date: December 2017

Language: Japanese; Presentation type: Oral presentation (general)
HPC分野における精度保証付き数値計算学の展開

荻田武史、椋木大地、尾崎克久

第3回CDMSI（ポスト「京」重点課題（７））シンポジウム, December 5, 2017

Date: December 2017

Language: Japanese; Presentation type: Poster presentation
Implementation and Evaluation of 2.5D Matrix Multiplication on K Computer (international conference)

Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2017), June 20, 2017

Date: June 2017

Language: English; Presentation type: Poster presentation
Reduced-/Extended-precision BLASの実装方法の検討

椋木大地、今村俊幸

Fifth Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2017), March 27, 2017

Date: March 2017

Language: Japanese; Presentation type: Oral presentation (general)
Implementation Techniques for High Performance BLAS Kernels on Modern GPUs (international conference)

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

SIAM Conference on Computational Science and Engineering (CSE17), February 28, 2017

Date: February 2017

Language: English; Presentation type: Oral presentation (general)
PascalアーキテクチャGPUにおける線形計算カーネルの実装技術の検討

椋木大地、今村俊幸、高橋大介

GTC Japan 2016, October 5, 2016

Date: October 2016

Language: Japanese; Presentation type: Poster presentation
KMATHLIB -High Performance and Scalable Numerical Library for the K Computer-

大井祥栄、廣田悠輔、椋木大地、今村俊幸

応用数理学会2016年度年会, September 13, 2016

Date: September 2016

Language: Japanese; Presentation type: Poster presentation
Performance Evaluation of Verified Computation for Linear Systems on Supercomputer (international conference)

Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin’ichi Oishi

SIAM: East Asian Section Conference (EASIAM 2016), June 20, 2016

Date: June 2016

Language: English; Presentation type: Oral presentation (general)
Introduction of Research Activities for GPU Computing at Large-scale Parallel Numerical Computing Technology Research Team on AICS (international conference)

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

The 6th AICS International Symposium, February 22, 2016

Date: February 2016

Language: English; Presentation type: Poster presentation
Automatic Thread-Block Size Adjustment for Dense Matrix-Vector Multiplication on CUDA (international conference)

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2016), February 19, 2016

Date: February 2016

Language: English; Presentation type: Oral presentation (general)
Performance Evaluation of Verified Computation for Linear Systems on Parallel Computers (international conference)

Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin'ichi Oishi

2nd Annual Meeting on Advanced Computing System and Infrastructure (ACSI2016), January 19, 2016

Date: January 2016

Language: English; Presentation type: Poster presentation
GPUにおけるスレッド数自動選択機能を持ったメモリ律速な線形計算カーネル群「MUBLAS」の実装と評価

椋木大地、今村俊幸、高橋大介

GTC Japan 2015, September 18, 2015

Date: September 2015

Language: Japanese; Presentation type: Poster presentation
京コンピュータ向け数値計算ライブラリ群KMATHLIBの実装

大井祥栄、廣田悠輔、椋木大地、今村俊幸

応用数理学会2015年度年会, September 9, 2015

Date: September 2015

Language: Japanese; Presentation type: Poster presentation
High-Performance GEMV and SYMV with Auto-Tuning for Performance Stabilization on Multiple GPU Generations (international conference)

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

GPU Technology Conference (GTC 2015), March 17, 2015

Date: March 2015

Language: English; Presentation type: Poster presentation
疑似四倍精度拡張数学パッケージQP-Pack

今村俊幸、椋木大地、佐々成正、山田進、町田昌彦

Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015論文集, January 26, 2015

Date: January 2015

Language: Japanese; Presentation type: Poster presentation
Kepler・MaxwellアーキテクチャGPUにおける性能が行列形状に依存しない高速なGEMVの実装

椋木大地、今村俊幸、高橋大介

Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015論文集, January 26, 2015

Date: January 2015

Language: Japanese; Presentation type: Poster presentation
スーパコンピュータ京における倍々精度演算の高速化

佐々木信一、藤井昭宏、田中輝雄、椋木大地、今村俊幸

Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015論文集, January 26, 2015

Date: January 2015

Language: Japanese; Presentation type: Poster presentation
KeplerアーキテクチャGPUにおける高速なSGEMVの実装

椋木大地、今村俊幸、高橋大介

GTC Japan 2014, July 16, 2014

Date: July 2014

Language: Japanese; Presentation type: Poster presentation
Linear Algebra Operations using Quadruple-precision Arithmetic on GPU (international conference)

Daichi Mukunoki, Daisuke Takahashi

GPU Technology Conference (GTC2014), March 24, 2014

Date: March 2014

Language: English; Presentation type: Poster presentation
GPUにおける3倍精度演算と4倍精度疎行列反復解法

椋木大地、高橋大介

第3回多倍長精度計算フォーラム, March 8, 2013

Date: March 2013

Language: Japanese; Presentation type: Oral presentation (general)
Iterative Method for Sparse Linear Systems using Quadruple Precision Operations on GPUs (international conference)

Daichi Mukunoki, Daisuke Takahashi

SIAM Conference on Computational Science and Engineering (CSE13), February 28, 2013

Date: February 2013

Language: English; Presentation type: Oral presentation (general)
Performance Comparison of Double, Triple and Quadruple Precision Real and Complex BLAS Subroutines on GPUs (international conference)

Daichi Mukunoki, Daisuke Takahashi

Proc. ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way? (ATIP/A*CRC Workshop '12), May 7, 2012

Date: May 2012

Language: English; Presentation type: Poster presentation
GPUによる4倍精度行列計算

椋木大地、高橋大介

2011年並列／分散／協調処理に関する『鹿児島』サマー・ワークショップ（SWoPP鹿児島2011）, July 27, 2011

Date: July 2011

Language: Japanese; Presentation type: Oral presentation (general)
Exploring Multi-Agent Systems for HPC Code Development [Invited] [International conference]

Daichi Mukunoki

The 2nd International Workshop on Foundational Large Language Models Advances for HPC (in conjunction with ISC-HPC 2026) (LLM4HPC 2026), June 26, 2026

Language: English; Presentation type: Oral presentation (general)

Location: Hamburg; Country: Germany
HPC-GENIE: High-Performance Computing with Generative Neural Intelligence for Execution

林俊一郎、椋木大地、星野哲也、片桐孝洋

xSIG 2025, August 2025

Language: Japanese; Presentation type: Poster presentation
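Accelerating Lung Field Segmentation for COVID-19 Diagnosis Support Using a GPU-Equipped Supercomputer

湯淺義尚、小田昌宏、椋木大地、片桐孝洋、星野哲也、河合直聡、永井亨、森健策

44th Annual Meeting of the Japanese Society of Medical Imaging Technology (JAMIT 2025), August 28, 2025

Language: Japanese; Presentation type: Poster presentation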
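Development Status of VibeCodeHPC, an LLM-Based Automatic Code Optimizer, and the Multi-Agent Advantage Shown by Experiments

林俊一郎、森田光貴、椋木大地、星野哲也、片桐孝洋

ISSP Software Development and Advancement Project Workshop: Development and Dissemination of Open-Source Software Supporting Computational Materials Science, October 20, 2025

Language: Japanese; Presentation type: Poster presentation
Challenges and Prospects for Automatic Generation and Automatic Performance Optimization of Numerical Codes Using LLMs

椋木大地、林俊一郎、星野哲也、森田光貴、片桐孝洋

ISSP Software Development and Advancement Project Workshop: Development and Dissemination of Open-Source Software Supporting Computational Materials Science, October 20, 2025

Language: Japanese; Presentation type: Poster presentation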
Implementation and Evaluation of 2.5D Matrix Multiplication on K Computer [International conference]

Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2017), June 20, 2017

Language: English; Presentation type: Poster presentation
Automatic Generation of Full-Set Batched BLAS [International conference]

Yusuke Hirota, Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2018), June 26, 2018

Language: English; Presentation type: Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations [International conference]

Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

SC19 research poster session, November 19, 2019

Language: English; Presentation type: Poster presentation
Accurate and Reproducible Conjugate Gradient in Hybrid Parallel Environments [International conference]

Roman Iakymchuk, Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki, Stef Graillat

ISC High Performance (ISC 2021), June 29, 2021

Language: English; Presentation type: Poster presentation
Accurate Matrix Multiplication on Binary128 using Ozaki Scheme [International conference]

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

ISC High Performance (ISC 2021), June 29, 2021

Language: English; Presentation type: Poster presentation
A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers [International conference]

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

ISC High Performance (ISC 2022), June 1, 2022

Language: English; Presentation type: Poster presentation
tmBLAS: a Mixed Precision BLAS by C++ Template [International conference]

Atsushi Suzuki, Daichi Mukunoki, Toshiyuki Imamura

ISC High Performance (ISC 2023), May 2023

Language: English; Presentation type: Poster presentation
csDF: a double-float arithmetic library for the Cerebras CS-2 [International conference]

Reo Nagashima, Akeru Nakamura, Kai Murakami, Ryunosuke Matsuzaki, Daichi Mukunoki, Takaaki Miyajima

SC25 research poster session, November 16, 2025

Language: English; Presentation type: Poster presentation
Verification of the Effectiveness of Deep Learning in Preprocessing Parameter Estimation for the Conjugate Gradient Method [International conference]

Takamasa Nakaya, Takahiro Katagiri, Tetsuya Hoshino, Daichi Mukunoki, Masatoshi Kawai

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Language: English; Presentation type: Poster presentation
Evaluation of the Capability of Coding AI in Generating SYCL-Based Numerical Computation Codes for Intel GPUs [International conference]

Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Language: English; Presentation type: Poster presentation
Performance Evaluation of SVM with Multiple Quantum-inspired Annealers [International conference]

Naoya Mizuki, Takahiro Katagiri, Daichi Mukunoki, Tetsuya Hoshino

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Language: English; Presentation type: Poster presentation
A Multi Agent System for Local LLM-Based HPC Code Generation [International conference]

Ryo Mikasa, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Language: English; Presentation type: Poster presentation
Proposal of The AI Scientist v2 for High Performance Computing with Local Large Language Models [International conference]

Takanori Kotama, Rio Yokota, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Language: English; Presentation type: Poster presentation
A Trial on Optimizing Test Sequences for LAPACK Eigenvalue Computation Routines using Machine Learning [International conference]

Hiroto Kashimura, Takahiro Katagiri, Shuji Morisaki, Daichi Mukunoki, Tetsuya Hoshino

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Language: English; Presentation type: Poster presentation
GPU Acceleration of Medical Image Representation Learning Models with Distributed Data Parallel and I/O Optimization [International conference]

Koki Isobe, Daichi Mukunoki, Masahiro Oda, Tetsuya Oda, Kensaku Mori, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Language: English; Presentation type: Poster presentation
VibeCodeHPC: A Multi-LLM Agent Auto-Tuner for HPC Codes [International conference]

Shun-Ichiro Hayashi, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Language: English; Presentation type: Poster presentation
DGEMM using FP64 Arithmetic Emulation and FP8 Tensor Cores with Ozaki Scheme [International conference]

Daichi Mukunoki

the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops (SCA/HPCAsiaWS '26), January 2026

Language: English; Presentation type: Poster presentation
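Program Development for Supercomputers with Generative AI: Introduction to the HPC-GENIE Project [Invited]

椋木大地

97th Cyber Symposium on Online Education and Digital Transformation at Universities and Other Institutions ("Educational Institution DX Symposium"), March 16, 2026

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan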
Multiple and Mixed Precision BLAS with C++ Template [International conference]

Daichi Mukunoki, Atsushi Suzuki, Toshiyuki Imamura

5th R-CCS International Symposium, February 6, 2023

Language: English; Presentation type: Poster presentation
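Matrix Multiplication for binary128 Using the Ozaki Scheme

椋木大地、尾崎克久、荻田武史

4th Workshop on Applications of Verified Numerical Computation to Real-World Problems (NVR 2020), November 28, 2020

Language: Japanese; Presentation type: Oral presentation (general)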
Fast rounding error estimation for compute-intensive operations using standard floating-point arithmetic [International conference]

Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Roman Iakymchuk

Rencontres Arithmétiques de l'Informatique Mathématique (RAIM), May 2021

Language: English; Presentation type: Oral presentation (general)
DGEMM using Tensor Cores [International conference]

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

SIAM Conference on Computational Science and Engineering (CSE21), March 4, 2021

Language: English; Presentation type: Oral presentation (general)
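A Study of Foundational Technologies for Precision Auto-Tuning

椋木大地

13th Symposium on the Current Status and Applications of Auto-Tuning Technology (ATTA2021), December 13, 2021

Language: Japanese; Presentation type: Oral presentation (general)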
Performance Evaluation of Batched BLAS on A64FX [International conference]

Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura

4th R-CCS International Symposium (lightning talk), February 7, 2022

Language: English; Presentation type: Oral presentation (general)
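Handling Errors in Trial-Based Error-Free Transformation for Matrix Multiplication and Its Applications

尾崎克久、椋木大地、荻田武史

18th Joint Meeting of the Research Activity Groups, Japan Society for Industrial and Applied Mathematics (JSIAM), March 8, 2022

Language: Japanese; Presentation type: Oral presentation (general)
Improving the Accuracy of Approximate Solutions of the CG Method with Flying Restart Using Mixed-Precision Arithmetic

相原研輔、尾崎克久、椋木大地

18th Joint Meeting of the Research Activity Groups, Japan Society for Industrial and Applied Mathematics (JSIAM), March 9, 2022

Language: Japanese; Presentation type: Oral presentation (general)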
Impact and Contribution of Ozaki scheme in High Performance Computing [International conference]

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk

International Workshop on Reliable Computing and Computer-Assisted Proofs (ReCAP 2022), March 15, 2022

Language: English; Presentation type: Oral presentation (general)
A mixed-precision algorithm of the CG method using the group-wise update strategy [International conference]

Kensuke Aihara, Katsuhisa Ozaki, Daichi Mukunoki

The 41st JSST Annual International Conference on Simulation Technology (JSST2022), September 2, 2022

Language: English; Presentation type: Oral presentation (general)
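On Introducing Low-Precision Data Representations in Sparse Matrix-Vector Multiplication

椋木大地、河合直聡

14th Symposium on the Current Status and Applications of Auto-Tuning Technology (ATTA2022), December 23, 2022

Language: Japanese; Presentation type: Oral presentation (general)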
Multiple- and Mixed-Precision BLAS with C++ Template [International conference]

Toshiyuki Imamura, Daichi Mukunoki, Atsushi Suzuki

10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), August 24, 2023

Language: English; Presentation type: Oral presentation (general)
Reduced-Precision Data Representation on Sparse Matrix-Vector Multiplications [International conference]

Daichi Mukunoki, Masatoshi Kawai, Toshiyuki Imamura

10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), August 21, 2023

Language: English; Presentation type: Oral presentation (general)
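A Study of BLAS Code Generation by LLMs

椋木大地

33rd Auto-Tuning Research Group Open Academic Session (ATOS33), July 28, 2025

Language: Japanese; Presentation type: Oral presentation (general)
A Study of the Ability of General-Purpose LLMs to Automatically Generate BLAS Code

椋木大地

6th Supercomputer "Flow" Users' Meeting, September 11, 2025

Language: Japanese; Presentation type: Oral presentation (general)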
Challenges and Prospects in Automatic Generation of HPC Codes Using Generative AI [International conference]

Daichi Mukunoki

The 6th "FugakuNEXT" Application Seminar, September 25, 2025

Language: English; Presentation type: Oral presentation (general)
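Challenges and Prospects for Automatic Generation of Numerical and HPC Codes Using Generative AI

林俊一郎、椋木大地

FY2025 2nd Open Forum on Condensed Matter Applications, September 29, 2025

Language: Japanese; Presentation type: Oral presentation (general)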
Automatic Generation and GPU Porting of Numerical Computation Codes Using Generative AI [International conference]

Daichi Mukunoki

58th ASE Seminar, December 1, 2025

Language: English; Presentation type: Oral presentation (general)
Automatic Generation of Numerical Codes for GPUs Using LLMs [International conference]

Daichi Mukunoki

JHPCN Field Workshop: State-of-the-Art in Code Generative AI for High-Performance Computing, December 5, 2025

Language: English; Presentation type: Oral presentation (general)
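Hardware in the AI Era and FP64 Emulation

椋木大地

17th Symposium on the Current Status and Applications of Auto-Tuning Technology (ATTA2025), December 23, 2025

Language: Japanese; Presentation type: Oral presentation (general)
Toward Innovation in HPC Code Development with Generative AI: Efforts and Prospects of the HPC-GENIE Project

椋木大地

6th Lecture Meeting Hosted by the IPSJ Tokai Branch, January 9, 2025

Language: Japanese; Presentation type: Oral presentation (general)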
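Prospects for GPU Porting of HPC Codes Using Generative AI

椋木大地

Opinion Exchange Meeting on the "Recommendations on Developing Software Environments for Next-Generation Computing Infrastructure and Training the People Responsible for Them", January 21, 2026

Language: Japanese; Presentation type: Oral presentation (general)
Development of Code-Generation AI Agents for High-Performance Computing

椋木大地

MateriAI 2025: Use of AI Technology in Computational Materials Science, February 2, 2026

Language: Japanese; Presentation type: Oral presentation (general)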
Toward Automatic Generation of High Performance Numerical Codes by LLMs [International conference]

Daichi Mukunoki, Koki Morita, Hayashi Shun-ichiro, Tetsuya Hoshino, Takahiro Katagiri

SIAM Conference on Parallel Processing for Scientific Computing (PP26), March 2026

Language: English; Presentation type: Oral presentation (general)
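Implementation of Fast SGEMV on Kepler Architecture GPUs

椋木大地、今村俊幸、高橋大介

GTC Japan 2014, July 16, 2014

Language: Japanese; Presentation type: Poster presentation
QP-Pack: A Pseudo Quadruple-Precision Extended Math Package

今村俊幸、椋木大地、佐々成正、山田進、町田昌彦

Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015 Proceedings, January 26, 2015

Language: Japanese; Presentation type: Poster presentation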
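Accelerating Double-Double Arithmetic on the K Computer

佐々木信一、藤井昭宏、田中輝雄、椋木大地、今村俊幸

Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015 Proceedings, January 26, 2015

Language: Japanese; Presentation type: Poster presentation
Implementation of Fast GEMV with Matrix-Shape-Independent Performance on Kepler and Maxwell Architecture GPUs

椋木大地、今村俊幸、高橋大介

Annual Meeting on Advanced Computing System and Infrastructure (ACSI) 2015 Proceedings, January 26, 2015

Language: Japanese; Presentation type: Poster presentation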
High-Performance GEMV and SYMV with Auto-Tuning for Performance Stabilization on Multiple GPU Generations [International conference]

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

GPU Technology Conference (GTC 2015), March 17, 2015

Language: English; Presentation type: Poster presentation
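Implementation and Evaluation of MUBLAS, a Set of Memory-Bound Linear Algebra Kernels with Automatic Thread-Count Selection on GPUs

椋木大地、今村俊幸、高橋大介

GTC Japan 2015, September 18, 2015

Language: Japanese; Presentation type: Poster presentation
Implementation of KMATHLIB, a Suite of Numerical Libraries for the K Computer

大井祥栄、廣田悠輔、椋木大地、今村俊幸

JSIAM Annual Meeting 2015, September 9, 2015

Language: Japanese; Presentation type: Poster presentation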
Performance Evaluation of Verified Computation for Linear Systems on Parallel Computers [International conference]

Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin'ichi Oishi

2nd Annual Meeting on Advanced Computing System and Infrastructure (ACSI2016), January 19, 2016

Language: English; Presentation type: Poster presentation
Introduction of Research Activities for GPU Computing at Large-scale Parallel Numerical Computing Technology Research Team on AICS [International conference]

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

The 6th AICS International Symposium, February 22, 2016

Language: English; Presentation type: Poster presentation
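KMATHLIB -High Performance and Scalable Numerical Library for the K Computer-

大井祥栄、廣田悠輔、椋木大地、今村俊幸

JSIAM Annual Meeting 2016, September 13, 2016

Language: Japanese; Presentation type: Poster presentation
A Study of Implementation Techniques for Linear Algebra Kernels on Pascal Architecture GPUs

椋木大地、今村俊幸、高橋大介

GTC Japan 2016, October 5, 2016

Language: Japanese; Presentation type: Poster presentation
Development of Verified Numerical Computation in the HPC Field

荻田武史、椋木大地、尾崎克久

3rd CDMSI (Post-K Priority Issue 7) Symposium, December 5, 2017

Language: Japanese; Presentation type: Poster presentation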
Development of Scientific Numerical Libraries on post-K computer [International conference]

Toshiyuki Imamura, Yusuke Hirota, Daichi Mukunoki, Shuhei Kudo, Akiyoshi Kuroda, Naoki Sueyasu

1st R-CCS International Symposium, February 18, 2019

Language: English; Presentation type: Poster presentation
OzBLAS: Accurate and Reproducible BLAS Based on Ozaki Scheme [International conference]

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

GPU Technology Conference (GTC 2019), March 17, 2019

Language: English; Presentation type: Poster presentation
Accurate and Reproducible Linear Algebra Operations for Many-core Architectures [International conference]

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Russian Supercomputing Days 2019 (RuSCDays 2019), September 23, 2019

Language: English; Presentation type: Poster presentation
Reduced and Extended-Precision Computations on FPGAs and GPUs [International conference]

Yiyu Tan, Daichi Mukunoki, Toshiyuki Imamura, Norihisa Fujita, Taisuke Boku

The 11th symposium on Discovery, October 15, 2019

Language: English; Presentation type: Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations [International conference]

Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

France-Japan-Germany trilateral workshop: Convergence of HPC and Data Science for Future Extreme Scale Intelligent Applications, November 7, 2019

Language: English; Presentation type: Poster presentation
Optimizing Precision for High-Performance, Robust, and Energy-Efficient Computations [International conference]

Roman Iakymchuk, Fabienne Jézéquel, Stef Graillat, Daichi Mukunoki, Toshiyuki Imamura, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Norihisa Fujita, Taisuke Boku

The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2020), January 15, 2020

Language: English; Presentation type: Poster presentation
Accurate DGEMM using Tensor Cores [International conference]

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

The International Conference on High Performance Computing in Asia-Pacific Region (HPCAsia 2020), January 15, 2020

Language: English; Presentation type: Poster presentation
An FPGA-based Matrix Multiplier with Task Parallelism [International conference]

Yiyu Tan, Toshiyuki Imamura, Daichi Mukunoki

2nd R-CCS International Symposium, February 17, 2020

Language: English; Presentation type: Poster presentation
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations [International conference]

Toshiyuki Imamura, Daichi Mukunoki, Yiyu Tan, Atsushi Koshiba, Jens Huthmann, Kentaro Sano, Fabienne Jézéquel, Stef Graillat, Roman Iakymchuk, Norihisa Fujita, Taisuke Boku

2nd R-CCS International Symposium, February 17, 2020

Language: English; Presentation type: Poster presentation
High-Precision, Accurate, and Reproducible Linear Algebra Operations using Ozaki Scheme [International conference]

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura, Roman Iakymchuk

3rd R-CCS International Symposium, February 15, 2021

Language: English; Presentation type: Poster presentation
Remedies for Reproducibility Issue in Conjugate Gradient Solvers [International conference]

Daichi Mukunoki, Roman Iakymchuk, Fabienne Jezequel, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

SparseDays2022, June 20, 2022

Language: English; Presentation type: Poster presentation
Accurate Matrix Computations using Ozaki Scheme on CPUs and GPUs [International conference]

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

The 30th Anniversary Symposium of the Center for Computational Sciences at the University of Tsukuba, October 14, 2022

Language: English; Presentation type: Poster presentation
Conjugate Gradient Solvers with Accuracy and Reproducibility Guarantees in Hybrid Parallel Environments [International conference]

Roman Iakymchuk, Daichi Mukunoki

Sparse Days Cerfacs, November 24, 2020

Language: English; Presentation type: Oral presentation (general)
Automatic Thread-Block Size Adjustment for Dense Matrix-Vector Multiplication on CUDA [International conference]

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

2016 Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2016), February 19, 2016

Language: English; Presentation type: Oral presentation (general)
Performance Evaluation of Verified Computation for Linear Systems on Supercomputer [International conference]

Yusuke Morikura, Daichi Mukunoki, Takeshi Fukaya, Naoya Yamanaka, Shin'ichi Oishi

SIAM: East Asian Section Conference (EASIAM 2016), June 20, 2016

Language: English; Presentation type: Oral presentation (general)
Implementation Techniques for High Performance BLAS Kernels on Modern GPUs [International conference]

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

SIAM Conference on Computational Science and Engineering (CSE17), February 28, 2017

Language: English; Presentation type: Oral presentation (general)
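A Study of Implementation Methods for Reduced-/Extended-Precision BLAS

椋木大地、今村俊幸

Fifth Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2017), March 27, 2017

Language: Japanese; Presentation type: Oral presentation (general)
Implementation Techniques for Numerical Libraries on Next-Generation Computers

椋木大地

JSIAM Three-Activity-Group Joint "Applied Mathematics Seminar", December 26, 2017

Language: Japanese; Presentation type: Oral presentation (general)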
Performance Analysis of 2.5D-PDGEMM on the K Computer [International conference]

Daichi Mukunoki, Toshiyuki Imamura

SIAM Conference on Parallel Processing for Scientific Computing (PP18), March 8, 2018

Language: English; Presentation type: Oral presentation (general)
High-performance implementations of reproducible and accurate matrix-multiplication [International conference]

Daichi Mukunoki, Roman Iakymchuk, Stef Graillat, Takeshi Ogita

10th International Workshop on Parallel Matrix Algorithms and Applications (PMAA18), June 27, 2018

Language: English; Presentation type: Oral presentation (general)
Accurate and cost-efficient triangular solve [International conference]

Roman Iakymchuk, Pedro Valero-Lara, Daichi Mukunoki

The 18th International Symposium on Scientific Computing, September 11, 2018

Language: English; Presentation type: Oral presentation (general)
High Performance Implementation of Accurate Matrix Multiplications on GPUs [International conference]

Daichi Mukunoki, Takeshi Ogita

The 18th International Symposium on Scientific Computing, September 11, 2018

Language: English; Presentation type: Oral presentation (general)
High Performance Implementation of Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme [International conference]

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Computational Reproducibility at Exascale 2018 (CRE2018), November 11, 2018

Language: English; Presentation type: Oral presentation (general)
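Implementation and Evaluation of Accurate and Reproducible BLAS Routines Using the Ozaki Scheme

椋木大地、荻田武史、尾崎克久

2nd Workshop on Applications of Verified Numerical Computation to Real-World Problems (NVR 2018), December 2, 2018

Language: Japanese; Presentation type: Oral presentation (general)
Implementation of Accurate and Reproducible BLAS Routines Based on the Ozaki Scheme and Application of Auto-Tuning

椋木大地

22nd Auto-Tuning Research Group Open Academic Session (ATOS22), May 13, 2019

Language: Japanese; Presentation type: Oral presentation (general)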
High-Performance Implementations of Accurate and Reproducible BLAS Routines on GPUs [International conference]

Daichi Mukunoki

Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2019 June), June 7, 2019

Language: English; Presentation type: Oral presentation (general)
Accurate and Reproducible CG Method on GPUs [International conference]

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

European Numerical Mathematics and Advanced Applications Conference 2019 (ENUMATH2019), October 1, 2019

Language: English; Presentation type: Oral presentation (general)
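OzBLAS, an Accurate BLAS Implementation Using the Ozaki Scheme, and Its Applications

椋木大地、荻田武史、尾崎克久

3rd Workshop on Applications of Verified Numerical Computation to Real-World Problems (NVR 2019), December 1, 2019

Language: Japanese; Presentation type: Oral presentation (general)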
High-performance Implementations of Accurate Linear Algebra Kernels on GPUs [International conference]

Daichi Mukunoki, Takeshi Ogita

3rd International Conference on Modern Mathematical Methods and High Performance Computing in Science & Technology (M3HPCST), January 9, 2020

Language: English; Presentation type: Oral presentation (general)
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations [International conference]

Daichi Mukunoki

Sapporo Winter HPC Seminar 2020, January 24, 2020

Language: English; Presentation type: Oral presentation (general)
Accurate BLAS implementations: OzBLAS and BLAS-DOT2 [International conference]

Daichi Mukunoki

Workshop on Largescale Parallel Numerical Computing Technology (LSPANC 2020 January), January 30, 2020

Language: English; Presentation type: Oral presentation (general)
Minimal-Precision Computing for High-Performance, Energy-Efficient, and Reliable Computations [International conference]

Daichi Mukunoki

SIAM Conference on Parallel Processing for Scientific Computing (PP20), February 15, 2020

Language: English; Presentation type: Oral presentation (general)
DGEMM using Tensor Cores and OzBLAS [International conference]

Daichi Mukunoki

11th Joint Laboratory for Extreme Scale Computing (JLESC) Workshop, September 8, 2020

Language: English; Presentation type: Oral presentation (general)

KAKENHI (Grants-in-Aid for Scientific Research) 7

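Application of High-Precision Arithmetic Techniques for Accelerating Scientific Computing on AI Supercomputers

Research project number: 25K24387, July 2025 - March 2027

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Research Activity Start-up

椋木大地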

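Development of High-Precision and Accuracy-Verifiable Matrix Computation Methods for Next-Generation Computers

Research project number: 20KK0259, April 2022 - October 2023

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Fund for the Promotion of Joint International Research (Fostering Joint International Research (A))

椋木大地

Role: Principal investigator

Amount allocated: 9,230,000 yen (direct costs: 7,100,000 yen; indirect costs: 2,130,000 yen)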

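In FY2023, continuing from the previous year, we studied how to speed up adaptive precision SpMV (a method developed at Sorbonne University that adaptively optimizes precision in sparse matrix-vector multiplication based on theoretical error analysis) by combining it with the fast reduced-precision memory accessor (RpFp) that we developed. On the application of RpFp to SpMV, which had been ongoing since the previous year, we gave an oral presentation at an international conference (ICIAM2023) and submitted a peer-reviewed paper to an international conference (MCSoC2023; accepted, to be presented in December 2023). In September and October 2023, the principal investigator visited Sorbonne University in France, demonstrated the speedup of adaptive precision SpMV with RpFp, and published the results on the French preprint server HAL (hal-04261073). During the stay in France, we also discussed how to develop BLAS/LAPACK with accuracy verification by CADNA on top of a C++ template BLAS implementation, and visited Professor David Defour at the University of Perpignan to discuss and prototype a high-performance memory accessor for mixed-precision arithmetic.
As an extension of the previous project, joint research with Associate Professor Kensuke Aihara of Tokyo City University and colleagues studied accurate and fast mixed-precision sparse iterative solvers that apply the DotK scheme. Joint research with Professor Katsuhisa Ozaki of Shibaura Institute of Technology and colleagues studied fine-grained accuracy tuning of the Ozaki scheme. Papers on both are currently under review.
This project was discontinued at the end of October 2023 because the principal investigator left his position. The main results over the full one-and-a-half-year period are those presented during 2023, as described above. Some of the research was suspended before it produced results, but the discussions held and the knowledge gained in this project have been passed on to the collaborators and are expected to develop further.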

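Development of Accurate and Reproducible Matrix Computation Libraries for Massively Parallel Computing Environments

Research project number: 19K20286, April 2019 - March 2022

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Early-Career Scientists

椋木大地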

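The goal of this research is to develop a BLAS library that achieves higher accuracy and guaranteed reproducibility in numerical computation while delivering high performance on state-of-the-art massively parallel computer architectures. The research focuses on four methods: (1) the Ozaki scheme, (2) the ExBLAS scheme, (3) the DotK scheme, and (4) the CADNA scheme, with (1) examined as the primary method.
In FY2019, progress was made mainly on (1) and (4). For (1), we developed basic BLAS routines for CPUs and GPUs, released them as open-source software, and presented a peer-reviewed paper on them at an international conference (PPAM2019). As applications, we carried out ahead of schedule the research on applying the method to a sparse iterative solver (the CG method) and on using FP16 (both originally planned for FY2021). For the latter, we developed a method that uses Tensor Cores, which are FP16/FP32 mixed-precision hardware, to build fast implementations with high accuracy and reproducibility, and a peer-reviewed paper was accepted at an international conference (ISC2020). For (4), the CADNA scheme, a new method was devised at Sorbonne University, which develops CADNA and is our collaborator, and a paper with the principal investigator as a co-author was submitted to an international conference (preprint published, currently under review).
We also began research on methods that optimize the arithmetic precision used in numerical computations while guaranteeing the accuracy of the results, in order to make computations faster and more energy efficient. Methods (1)-(4) above can serve as building blocks for this, so the work is positioned as an application of this research. In this fiscal year, we presented a peer-reviewed poster on it at an international conference (SC19).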
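
The DotK scheme named above builds on error-free transformations of floating-point sums and products. The following is a minimal sketch, assuming the K = 2 variant (commonly called Dot2) and Python floats as FP64. The function names are illustrative and are not taken from the project's software.

```python
def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp split of an FP64 value into high and low parts with a = hi + lo."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Dekker's TwoProduct: returns (p, e) with p = fl(a * b) and p + e == a * b,
    barring overflow and underflow."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def dot2(x, y):
    """Inner product that accumulates the rounding errors in a second variable."""
    p = 0.0  # running sum, as in an ordinary dot product
    s = 0.0  # accumulated rounding errors of products and sums
    for xi, yi in zip(x, y):
        h, r = two_prod(xi, yi)
        p, q = two_sum(p, h)
        s += q + r
    return p + s


def naive_dot(x, y):
    """Ordinary FP64 inner product with an explicit loop, for comparison."""
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc
```

For `x = [1e16, 1.0, -1e16]` and `y = [1.0, 1.0, 1.0]`, the ordinary loop loses the middle term to rounding, while the compensated version recovers it. The comparison uses an explicit loop because the built-in `sum` may itself apply compensation in recent Python versions.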

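A Study of Short Floating-Point Representations for High-Performance, Energy-Efficient Computing

Research project number: 16K16062, April 2016 - March 2019

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Young Scientists (B)

椋木大地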

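This research examined whether introducing short floating-point formats, with fewer bits than the 32/64-bit IEEE floating-point formats widely used in numerical computation, can make computations faster and more energy efficient. We studied lightweight software implementations and, mainly targeting GPUs, demonstrated effectiveness in both computation speed and power performance for basic linear algebra kernels whose performance is bound by data access.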
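
One lightweight software approach to a short format is to store only the leading bytes of each binary64 value and pad with zero bytes on load, so that arithmetic still runs in FP64. The sketch below illustrates that idea under this assumption. It is not confirmed as the project's method, and the function names are hypothetical.

```python
import struct


def truncate_fp64(value, keep_bytes):
    """Keep the most significant `keep_bytes` bytes of the IEEE 754 binary64 encoding.

    The first 12 bits hold the sign and exponent, so keep_bytes = 4 leaves
    20 mantissa bits and keep_bytes = 6 leaves 36.
    """
    if not 2 <= keep_bytes <= 8:
        raise ValueError("keep_bytes must be between 2 and 8")
    return struct.pack(">d", value)[:keep_bytes]


def expand_fp64(stored):
    """Pad the dropped low-order mantissa bytes with zeros and decode to FP64."""
    return struct.unpack(">d", stored + b"\x00" * (8 - len(stored)))[0]


def dot_reduced_storage(stored_x, y):
    """Dot product that reads x from reduced storage but computes in FP64."""
    acc = 0.0
    for sx, yi in zip(stored_x, y):
        acc += expand_fp64(sx) * yi
    return acc
```

With `keep_bytes = 4`, each stored element takes half the memory traffic of FP64, and truncation introduces a relative error below 2^-20 for normal numbers. That is the trade-off a data-access-bound kernel would weigh.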

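Research on Developing Triple- and Quadruple-Precision Linear Algebra Libraries for GPU Supercomputers

Research project number: 13J01290, April 2013 - March 2015

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for JSPS Fellows

椋木大地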

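The goal of this research was to conduct basic research toward high-performance triple- and quadruple-precision linear algebra libraries on GPUs, with a view to making triple- and quadruple-precision arithmetic practical on GPU supercomputers. This fiscal year we mainly studied efficient implementation techniques for GPU linear algebra libraries that support multiple arithmetic precisions. As a result, we developed an implementation technique for a fast matrix-vector multiplication routine (GEMV) that supports multiple NVIDIA GPU architectures. The implementation models the program execution mechanism of the GPU and uses online auto-tuning to determine automatically the thread-block size that maximizes execution efficiency. Compared with existing implementations, this prevents the performance fluctuations that depend on the execution environment and problem size, and sustains high performance throughout. The technique is considered effective for implementing and optimizing multiple versions, differing in arithmetic precision, of a program that performs a given linear algebra operation (for example, a BLAS routine). In addition, as an application of quadruple-precision arithmetic techniques, we implemented pseudo double-precision arithmetic by software emulation on NVIDIA's latest GPU, whose double-precision performance is 1/32 of its single-precision performance, and showed that a double-precision matrix multiplication routine (DGEMM) built this way outperforms the hardware-based implementation. Part of the GPU software developed this fiscal year is released on the web as an open-source library, and development will continue.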
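
The idea of choosing a thread-block size from a model of GPU execution can be sketched as follows. This is a deliberately simplified, hypothetical model, not the model developed in the research. It assumes one thread per matrix row, the hardware limits given as default arguments, and a score that multiplies thread occupancy by the fraction of multiprocessors that receive at least one block.

```python
def occupancy(block_size, max_threads_per_sm=2048, max_blocks_per_sm=16, warp_size=32):
    """Fraction of a multiprocessor's thread slots filled by resident blocks.

    The hardware limits are illustrative assumptions, not values for a specific GPU.
    """
    warps_per_block = -(-block_size // warp_size)  # ceiling division
    threads_per_block = warps_per_block * warp_size  # a partial warp occupies a full warp
    resident_blocks = min(max_blocks_per_sm, max_threads_per_sm // threads_per_block)
    return resident_blocks * threads_per_block / max_threads_per_sm


def choose_block_size(n_rows, num_sms=15, candidates=(32, 64, 128, 256, 512, 1024)):
    """Pick the candidate block size with the highest modeled efficiency.

    Ties resolve to the smallest candidate because max() returns the first
    maximal item of an ascending candidate list.
    """
    def score(block_size):
        num_blocks = -(-n_rows // block_size)  # one thread per row (assumption)
        sm_utilization = min(1.0, num_blocks / num_sms)
        return occupancy(block_size) * sm_utilization

    return max(candidates, key=score)
```

Because the choice depends only on the problem size and a few hardware parameters, it can be evaluated at run time before each kernel launch, which is what makes the tuning "online". A realistic model would also account for registers, shared memory, and memory access patterns.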

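Research on High-Performance, High-Dimensional Numerical Linear Algebra Using Asynchronous Tasks for the Exascale Era

Research project number: 19H04127, April 2019 - March 2022

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

今村俊幸, 工藤周平, 廣田悠輔, 鈴木智博, 椋木大地, 鈴木厚

Role: Co-investigator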

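In this first year of the research plan, we surveyed the hard scheduling problems arising from numerical algorithms that this project targets, and used prototype implementations to identify their effects and problems. The problems will be organized and then implemented as the main functions of a scheduler prototype. For the survey of scheduling arising from numerical algorithms, we first developed fine-grained internal pipeline processing centered on batch scheduling and tested it on real problems, and presented interim results at an international conference as a preliminary study of scheduling methods (higher-order FFT and coarse/fine mixed-precision matrix computation). Asynchronous and prioritized scheduling is the novel proposal at the core of this research, and thorough preliminary surveys and trial implementations were carried out. In particular, we studied how well the prioritization mechanisms of an existing language, such as the OpenMP task construct and priority clause, match the numerical algorithms we want (including feasibility, affinity, and expressiveness), and reported the results at workshops in Japan and abroad. We implemented a matrix factorization algorithm on a CPU/GPU hybrid environment and found that prioritizing tasks with the priority clause partially increases the number of tasks that can run in parallel, although the effect is not very large. As a survey of existing schedulers, we examined StarPU, developed by INRIA, and checked its internal functions and basic performance. In practice its overhead is larger than that of the task scheduler we built in Dissection, and a performance loss of 20-30% is expected. We also examined the expressiveness of the DAGs used by existing schedulers and found, within the limited scope of LDU factorization, that the descriptive power of DAGs is not a problem for numerical algorithms. These favorable results will be carried into the next fiscal year.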
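
Prioritized scheduling over a task DAG can be illustrated with a small sequential simulation: among the tasks whose prerequisites are complete, the one with the highest priority runs first. This sketch is written for illustration in Python and is neither OpenMP nor StarPU code. The task names and priorities in the usage example are hypothetical.

```python
import heapq


def schedule(tasks, deps, priority):
    """Return an execution order that respects dependencies and priorities.

    tasks: list of task names.
    deps: dict mapping a task to the set of tasks it must wait for.
    priority: dict mapping a task to an integer (larger runs earlier).
    """
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    dependents = {t: [] for t in tasks}
    for t, prerequisites in remaining.items():
        for p in prerequisites:
            dependents[p].append(t)

    # heapq is a min-heap, so priorities are negated to pop the largest first.
    ready = [(-priority[t], t) for t, pre in remaining.items() if not pre]
    heapq.heapify(ready)

    order = []
    while ready:
        _, task = heapq.heappop(ready)
        order.append(task)
        for d in dependents[task]:
            remaining[d].discard(task)
            if not remaining[d]:
                heapq.heappush(ready, (-priority[d], d))

    if len(order) != len(remaining):
        raise ValueError("dependency cycle detected")
    return order


# Usage: panel tasks on the critical path get a higher priority than updates.
example_order = schedule(
    ["panel0", "update_a", "update_b", "panel1"],
    {"update_a": {"panel0"}, "update_b": {"panel0"}, "panel1": {"update_a"}},
    {"panel0": 3, "update_a": 2, "update_b": 1, "panel1": 3},
)
```

In the usage example, `panel1` overtakes `update_b` as soon as its prerequisite finishes, so the critical path advances before lower-priority work. This is the kind of effect a priority hint is meant to produce in a factorization.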

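Theory and Application of Scalable Numerical Software for O(100 Million)-Core Environments

Research project number: 15H02709, April 2015 - March 2018

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

今村俊幸, 大井祥栄, 深谷猛, 廣田悠輔, 椋木大地, 山本有作, 藤堂眞治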

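This research aimed to realize, on new mathematical principles, the high-performance numerical computing services accumulated in the past, in computing environments equipped with tens of thousands to hundreds of millions of processor cores. Under two main themes, "construction of numerical kernels of different granularity" and "asynchronous numerical algorithms", we (1) studied the theory of asynchronous numerical algorithms and practical communication- and synchronization-reducing algorithms, and proposed methods for CAHTR and FDTD. We also (2) advanced core technologies such as auto-tuning for scalable lightweight code generation on ultra-many-core systems, pursuing new directions that lead to new technologies for next-generation numerical software.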


Courses taught (at this university) 3

Numerical Analysis and Exercises

2025
Computer Science Laboratory a

2025
Computer Science Laboratory b

2025

Courses taught (at other institutions) 1

Information Processing Techniques (Literacy) II

September 2018 - January 2019 (Tokyo Woman's Christian University)