Researcher Details - 星野　哲也

ホシノ　テツヤ

星野　哲也

HOSHINO Tetsuya

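Affiliation

Associate Professor, Data Science Research Division, Information Technology Center

Graduate School Teaching

Graduate School of Informatics

Research Fields (1)

Informatics / High-performance computing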

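Career (2)

Associate Professor, Information Technology Center, Nagoya University

January 2023 - present

Assistant Professor, Information Technology Center, The University of Tokyo

January 2016 - December 2022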

Papers (26)

A Survey of the Azure CycleCloud Usage Environment and Considerations on Supercomputer Center-Cloud Integration

永井亨, 五十木秀一, 河合直聡, 片桐孝洋, 星野哲也

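学術情報処理研究 (Journal for Academic Computing and Networking), Vol. 28, No. 1, pp. 114-124, November 2024

Language: Japanese; Publisher: Academic eXchange for Information Environment and Strategy (AXIES, 大学ICT推進協議会)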

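To survey the usage environment of public clouds, we measured the performance of virtual machines on Microsoft Azure. Specifically, under joint research between the Information Technology Center of Nagoya University and Microsoft Japan, we ran various benchmark programs on virtual machines using Azure CycleCloud, which is specialized for HPC use. This paper reports on the Azure CycleCloud usage environment and the benchmark results obtained on the virtual machines, and discusses collaboration between supercomputer centers and public clouds.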

DOI： 10.24669/jacn.28.1_114

Auto-Tuning Mixed-Precision Computation by Specifying Multiple Regions

Ren, XZB; Kawai, M; Hoshino, T; Katagiri, T; Nagai, T

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, Vol. 37, No. 2, November 2024

Type: Research paper (academic journal); Publisher: Concurrency and Computation: Practice and Experience

Mixed-precision computation is a promising method for substantially improving high-performance computing applications. However, using mixed-precision data is a double-edged sword. While it can improve computational performance, the reduction in precision introduces more uncertainties and errors. As a result, precision tuning is necessary to determine the optimal mixed-precision configurations. Much effort is therefore spent on selecting appropriate variables while balancing execution time and numerical accuracy. Auto-tuning (AT) is one of the technologies that can assist in alleviating this intensive task. In recent years, ppOpen-AT, an AT language, introduced a directive for mixed-precision tuning called “Blocks.” In this study, we investigated an AT strategy for the “Blocks” directive for multi-region tuning of a program. The non-hydrostatic icosahedral atmospheric model (NICAM), a global cloud-resolving model, was used as a benchmark program to evaluate the effectiveness of the AT strategy. Experimental results indicated that when a single region of the program performed well in mixed-precision computation, combining these regions resulted in better performance. When tested on the supercomputer “Flow” Type I (Fujitsu PRIMEHPC FX1000) and Type II (Fujitsu PRIMEHPC CX1000) subsystems, the mixed-precision NICAM benchmark program tuned by the AT strategy achieved a speedup of nearly 1.31× on the Type I subsystem compared to the original double-precision program, and a 1.12× speedup on the Type II subsystem.
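The multi-region strategy summarized above can be illustrated with a toy precision-tuning loop. This is a minimal sketch under stated assumptions, not ppOpen-AT or its "Blocks" directive: the three-region kernel and its region names are hypothetical, and the acceptance rule looks only at accuracy, whereas a real auto-tuner would also time each candidate. Each region is first tried in float32 on its own; the regions that stay within a relative-error tolerance are then combined, and the combination is shrunk until it also meets the tolerance.

```python
import numpy as np


def kernel(x, low):
    """Toy kernel with three hypothetical regions; `low` holds the names of regions run in float32."""
    def cast(region, a):
        return a.astype(np.float32 if region in low else np.float64)

    a = cast("scale", x) * 2.5       # region "scale"
    b = np.sqrt(cast("sqrt", a))     # region "sqrt"
    s = cast("sum", b)               # region "sum"
    return float(s.sum(dtype=s.dtype))


def tune(x, regions, tol):
    """Keep regions that pass individually, then drop candidates until the combination passes."""
    ref = kernel(x, set())  # all-double reference result

    def rel_err(low):
        return abs(kernel(x, low) - ref) / abs(ref)

    chosen = [r for r in regions if rel_err({r}) <= tol]
    while chosen and rel_err(set(chosen)) > tol:
        chosen.pop()  # remove the last candidate and re-check the combination
    return chosen, rel_err(set(chosen))


x = np.linspace(1.0, 2.0, 100_000)
print(tune(x, ["scale", "sqrt", "sum"], tol=1e-5))
```

The greedy shrinking step is only one possible way to reconcile regions that pass individually but fail together; it mirrors the observation in the abstract that combining well-behaved regions is the natural next step after tuning them one at a time.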

DOI： 10.1002/cpe.8326

Optimize Efficiency of Utilizing Systems by Dynamic Core Binding

Kawai M., Ida A., Hanawa T., Hoshino T.

ACM International Conference Proceeding Series, pp. 77-82, 2024

Type: Research paper (international conference proceedings); Publisher: ACM International Conference Proceeding Series

Load balancing at both the process and thread levels is imperative for minimizing application computation time in the context of MPI/OpenMP hybrid parallelization. This necessity arises from the constraint that, within a typical hybrid parallel environment, an identical number of cores is bound to each process. Dynamic Core Binding, however, adjusts the core binding based on the process’s workload, thereby realizing load balancing at the core level. In prior research, we have implemented the DCB library, which has two policies for computation time reduction or power reduction. In this paper, we show that the two policies provided by the DCB library can be used together to achieve both computation time reduction and power consumption reduction.
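As a rough illustration of core-level load balancing, the sketch below splits the cores of one node among MPI processes in proportion to their workloads, guaranteeing each process at least one core. The function name and the largest-remainder rounding are hypothetical choices for this illustration, not the DCB library's interface; measuring workloads and actually re-binding OpenMP threads to cores are left out.

```python
def bind_cores(workloads, total_cores):
    """Split total_cores among processes in proportion to workload (largest-remainder rounding).

    Every process keeps at least one core. Hypothetical policy, not the DCB library's API.
    """
    n = len(workloads)
    if n == 0 or total_cores < n:
        raise ValueError("need at least one core per process")
    if any(w <= 0 for w in workloads):
        raise ValueError("workloads must be positive")
    spare = total_cores - n              # cores left after the guaranteed one per process
    total = sum(workloads)
    shares = [spare * w / total for w in workloads]
    cores = [1 + int(s) for s in shares]
    leftover = total_cores - sum(cores)  # cores lost to rounding down
    # hand the leftover cores to the processes with the largest fractional shares
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        cores[i] += 1
    return cores


# A static binding would give 4 + 4 cores; a 3:1 workload gets 6 + 2 instead.
print(bind_cores([3.0, 1.0], 8))
```

With a fixed, identical core count per process, the slowest (most loaded) process dictates the step time; shifting cores toward it is what reduces computation time, and handing idle cores back is where the power-saving policy would come in.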

DOI： 10.1145/3636480.3637221

Other link: https://dblp.uni-trier.de/db/conf/hpcasia/hpcasia2024w.html#KawaiIHH24

Development Status of the FMO Program ABINIT-MP, 2023

望月祐志, 中野達也, 坂倉耕太, 奥脇弘次, 土居英男, 加藤季広, 滝沢寛之, 成瀬彰, 大島聡史, 星野哲也, 片桐孝洋

J. Comp. Chem. Jpn., Vol. 23, No. 1, pp. 4-8, 2024

Language: Japanese; Publisher: Society of Computer Chemistry, Japan

In August 2023, we released the latest version of our ABINIT-MP program, Open Version 2 Revision 8. In this version, the most commonly used FMO-MP2 calculations are even faster than in the previous Revision 4. It is now also possible to calculate excitation and ionization energies for regions of interest. Improved interaction analysis is also available. In addition, we have started GPU-oriented modifications. In this preliminary report, we present the current status of ABINIT-MP.

DOI： 10.2477/jccj.2024-0001

Performance Evaluation of CMOS Annealing with Support Vector Machine

Fukuhara R., Morishita M., Katagiri T., Kawai M., Nagai T., Hoshino T.

Proceedings - 2024 IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC 2024, pp. 548-555, 2024

Publisher: Proceedings - 2024 IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC 2024

In this study, support vector machine (SVM) performance was assessed using a quantum-inspired complementary metal-oxide semiconductor (CMOS) annealer. The evaluation focused primarily on the accuracy rate in binary classification problems, and SVM performance on a CPU (classical computation) was compared with that on the quantum-inspired annealer. On the CMOS annealing machine, accuracy rates of 93.7%, 92.7%, and 97.6% were obtained for a linearly separable problem and for nonlinearly separable problems 1 and 2, respectively. These results indicate that a CMOS annealing machine can achieve accuracy rates that closely rival those of classical computation.

DOI： 10.1109/MCSoC64144.2024.00094

Implementing Fast Modal Filtering of SCALE-DG

Ren, XZB; Kawai, Y; Tomita, H; Nishizawa, S; Katagiri, T; Kawai, M; Hoshino, T; Nagai, T

2024 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING WORKSHOPS, CLUSTER WORKSHOPS 2024, pp. 150-151, 2024

Publisher: Proceedings - 2024 IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2024

For future high-resolution atmospheric simulations, a dynamical core based on the discontinuous Galerkin method (DGM), called SCALE-DG [1], is being developed as an option for high-order fluid schemes in the SCALE library [2]. Unlike the traditional finite element method (FEM), the DGM allows discontinuities across element boundaries. When a first-order derivative operator is evaluated, only the nodal values of the element itself and the values on the boundaries it shares with neighboring elements are used. As a result, most computations can be performed independently for each element, making full use of data locality. In addition, the DGM can achieve high-order accuracy by using high-order polynomial basis functions within each element. The DGM is therefore well suited to high-resolution atmospheric simulations that require high-order numerical accuracy, and it is expected to perform well on modern computer architectures.
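Because DG operations act element by element, a filter on the modal coefficients can be precomputed once as a small nodal-to-nodal matrix F = V diag(sigma) V^-1 and then applied to every element with a single matrix product. The sketch below assumes an exponential filter on a Legendre modal basis with Chebyshev-Gauss-Lobatto nodes; the filter shape, the parameters `alpha` and `order`, and the node set are illustrative assumptions, not necessarily what SCALE-DG or this paper uses.

```python
import numpy as np
from numpy.polynomial import legendre


def filter_matrix(nodes, alpha=36.0, order=16):
    """Nodal-to-nodal matrix F = V diag(sigma) V^{-1} for an exponential modal filter."""
    p = len(nodes) - 1                           # polynomial degree per element
    V = legendre.legvander(nodes, p)             # V[i, k] = P_k(x_i)
    k = np.arange(p + 1)
    sigma = np.exp(-alpha * (k / p) ** order)    # sigma_0 = 1, sigma_p = exp(-alpha)
    return V @ np.diag(sigma) @ np.linalg.inv(V)


p = 7
nodes = np.cos(np.pi * np.arange(p + 1) / p)     # assumed node set on [-1, 1]
F = filter_matrix(nodes)

# One column per element: filtering every element is one (p+1)x(p+1) by (p+1)xn_elem product.
rng = np.random.default_rng(0)
U = rng.standard_normal((p + 1, 1000))
U_filtered = F @ U
```

Folding the modal transform, damping, and inverse transform into one dense matrix turns the filter into a batched small matrix multiplication, which is the kind of element-local, cache-friendly operation the abstract says DGM favors.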

DOI： 10.1109/CLUSTERWORKSHOPS61563.2024.00033

Adaptation of XAI to Auto-tuning for Numerical Libraries (Open Access)

Aoki S., Katagiri T., Ohshima S., Kawai M., Nagai T., Hoshino T.

Proceedings - 2024 IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC 2024, pp. 556-563, 2024

Publisher: Proceedings - 2024 IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC 2024

The unregulated use of Artificial Intelligence (AI) outputs, which can lead to various societal issues, has received considerable attention. While humans routinely validate information, manually inspecting the vast volume of AI-generated results is impractical, so automation and visualization are imperative. In this context, Explainable AI (XAI) technology is gaining prominence, aiming to streamline AI model development and to reduce the burden of explaining AI outputs to users. At the same time, software Auto-Tuning (AT) technology has emerged to reduce the man-hours required for performance tuning in numerical calculations. AT is a potent tool for reducing costs during parameter optimization and high-performance programming for numerical computing. AT mechanisms and AI technology are closely linked, and AI is used extensively in AT; however, applying AI to AT mechanisms raises challenges in the explainability of the AI models. This study focuses on XAI for AI models integrated into two processes used in practical numerical computation: performance-parameter tuning of accuracy-guaranteed numerical calculations, and a sparse iterative algorithm.

DOI： 10.1109/MCSoC64144.2024.00095

Auto-Tuning Mixed-precision Computation by Specifying Multiple Regions

Ren X., Kawai M., Hoshino T., Katagiri T., Nagai T.

Proceedings - 2023 11th International Symposium on Computing and Networking, CANDAR 2023, pp. 175-181, November 2023

Type: Research paper (international conference proceedings); Publisher: Proceedings - 2023 11th International Symposium on Computing and Networking, CANDAR 2023

Mixed-precision computation is a promising method for substantially increasing the speed of numerical computations. However, using mixed-precision data is a double-edged sword. Although it can improve the computational performance, the reduction in precision brings more uncertainties and errors. It is necessary to determine which variables can be represented with a lower-precision format without affecting the accuracy of the results. Hence, much effort is spent on selecting appropriate variables while considering the execution time and numerical accuracy. Auto-Tuning (AT) is one of several technologies that can assist in eliminating this intensive work. In this study, we investigated an AT strategy for the 'Blocks' directive in the auto-Tuning language ppOpen-AT to tune multiple regions of a program and evaluated the effectiveness. A benchmark program of the nonhydrostatic icosahedral atmospheric model (NICAM), which is a global cloud resolving model, was considered as a study case. Experimental results indicated that when a single part of the program could perform well in the mixed-precision computation, a combination achieved a better performance. When used on the Flow Type I Subsystem (The Fujitsu PRIMEHPC FX1000), this method achieved almost 1.27× speedup compared with the NICAM benchmark program using all double-precision data.

DOI： 10.1109/candar60563.2023.00031

Four-Dimensional PET Reconstruction Using GPU Supercomputers

大島聡史, 湯淺義尚, 松村海飛, 横田達也, 本谷秀堅, 坂田宗之, 木村裕一, 片桐孝洋, 永井亨, 塙敏博, 星野哲也

Medical Imaging Technology, Vol. 41, No. 4-5, pp. 150-156, November 2023

Language: Japanese; Publisher: Japanese Society of Medical Imaging Technology

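Advances in medical image processing have produced a wide range of techniques for visually understanding the interior of the living body, and these techniques are in use. However, what they provide directly are images and videos, and diagnosis is still performed by people such as physicians. Expectations are high for software that reduces this workload, and a growing number of such techniques are already used in clinical practice, but because they require knowledge and skills in both medicine (medical imaging) and computing, their scope remains limited. In this work, researchers in medical image processing and high-performance computing are therefore collaborating to accelerate and scale up image reconstruction in PET. This paper introduces this effort and the results obtained so far.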

DOI： 10.11409/mit.41.150

Implementation of Radio Wave Propagation using RT Cores and Consideration of Programming Models

Hashinoki, S; Ohshima, S; Katagiri, T; Nagai, T; Hoshino, T

2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW, pp. 673-681, 2023

Type: Research paper (international conference proceedings); Publisher: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023

Starting with the NVIDIA Turing architecture, several NVIDIA graphics processing units (GPUs) include ray tracing acceleration hardware (RT cores). Ray tracing can be regarded as a simulation of wave and particle propagation, collision, and reflection, so it is expected to find applications in computational science and high-performance computing. However, few studies have used RT cores for such purposes. The purpose of this research is to demonstrate the use of RT cores in scientific and technical computing. We implemented a radio wave propagation loss calculation with OptiX, a programmable ray tracing application framework, and evaluated its performance. We also investigated the challenges of reducing framework-specific setup code and of meeting hardware allocation requirements. In a simple experiment with two spheres, the RT core implementation showed the highest performance, and its speedup scaled superlinearly between the (10000, 5000) and (20000, 10000) configurations. In an experiment with a sphere and planes, the RT cores achieved up to approximately 390 times the performance of a parallel execution of the BVH search algorithm, and a larger number of RT cores yielded higher performance. In an experiment using an open-data problem space, we evaluated various GPUs and found that a larger number of RT cores is effective. These results show that RT cores are sufficiently effective for radio propagation calculations with an adequate number of ray projections. Through this research, we contributed to the use of RT cores in computational science by proposing an implementation method for ray tracing applications and by revealing the effects of RT cores on radio wave propagation loss calculations.
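RT cores accelerate bounding-volume-hierarchy (BVH) traversal and ray-primitive intersection tests. The sketch below shows, in plain Python, the kind of geometric test such hardware offloads (a ray-sphere intersection) together with the textbook free-space path loss formula FSPL = 20 log10(4πdf/c). It is an illustration under stated assumptions, not the OptiX implementation or the propagation model used in the paper, neither of which is described here.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def ray_sphere_distance(origin, direction, center, radius):
    """Distance along the ray to the first hit on the sphere, or None if it misses."""
    norm = math.sqrt(sum(v * v for v in direction))
    d = [v / norm for v in direction]                   # unit direction
    oc = [o - q for o, q in zip(origin, center)]        # origin relative to sphere center
    b = 2.0 * sum(di * oi for di, oi in zip(d, oc))
    c = sum(oi * oi for oi in oc) - radius * radius
    disc = b * b - 4.0 * c                              # quadratic with a = 1 (unit direction)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):   # nearer root first
        if t >= 0.0:
            return t
    return None


def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space path loss in dB between isotropic antennas."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)


hit = ray_sphere_distance((0, 0, 0), (1, 0, 0), (10, 0, 0), 1.0)
print(hit, free_space_path_loss_db(hit, 2.4e9))
```

A propagation solver launches very many such rays and accumulates loss along each hit path; the intersection test is the inner operation that RT cores execute in hardware instead of in a software BVH search.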

DOI： 10.1109/IPDPSW59300.2023.00115

Other link: https://dblp.uni-trier.de/db/conf/ipps/ipdps2023w.html#HashinokiOKNH23

Large-scale earthquake sequence simulations on 3D nonplanar faults using the boundary element method accelerated by lattice H-matrices

So Ozawa, Akihiro Ida, Tetsuya Hoshino, Ryosuke Ando

Geophysical Journal International, October 2022

Type: Research paper (academic journal); Publisher: Oxford University Press (OUP)

Summary

Large-scale earthquake sequence simulations using the boundary element method (BEM) incur extreme computational costs because a dense matrix must be multiplied by a slip-rate vector. Hierarchical matrices (H-matrices) have often been used to accelerate this multiplication. However, the complexity of the structures of the H-matrices and the communication costs between processors limit their scalability, so they cannot be used efficiently on distributed memory computer systems. Lattice H-matrices have recently been proposed as a tool to improve the parallel scalability of H-matrices. In this study, we developed a method for earthquake sequence simulations on 3D nonplanar faults using lattice H-matrices. We present a simulation example and verify the mesh convergence of our method for a 3D nonplanar thrust fault using rectangular and triangular discretizations. We also performed performance and scalability analyses of our code. Our simulations, using over 10^5 degrees of freedom, demonstrated parallel acceleration beyond 10^4 MPI processors and a > 10-fold acceleration over the best performance obtained with normal H-matrices. Using this code, we can perform unprecedented large-scale earthquake sequence simulations on geometrically complex faults with supercomputers. The software is open source and freely available.
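The saving that H-matrices exploit comes from far-field blocks of the BEM matrix being numerically low-rank: a block approximated as W·Vt with rank k can be applied to a vector in O(k(m+n)) operations instead of O(mn). The sketch below demonstrates this on a hypothetical 1/|x - y| kernel between two well-separated point clusters. It uses a truncated SVD for brevity, whereas practical H-matrix codes typically build such factors with cheaper methods such as adaptive cross approximation (ACA); it also omits the lattice block structure and the MPI distribution.

```python
import numpy as np


def compress(block, rel_tol):
    """Truncated SVD: return W (m x k) and Vt (k x n) with block ~= W @ Vt."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    k = int(np.sum(s > rel_tol * s[0]))   # keep singular values above the tolerance
    return U[:, :k] * s[:k], Vt[:k]


# Two well-separated clusters of 400 points each (hypothetical geometry).
x = np.linspace(0.0, 1.0, 400)
y = np.linspace(10.0, 11.0, 400)
A = 1.0 / np.abs(x[:, None] - y[None, :])   # admissible far-field block

W, Vt = compress(A, 1e-10)
v = np.random.default_rng(1).random(400)
approx = W @ (Vt @ v)                        # O(k(m+n)) work instead of O(mn)
rel_err = np.linalg.norm(approx - A @ v) / np.linalg.norm(A @ v)
print("rank", W.shape[1], "relative error", rel_err)
```

Note the parenthesization: computing `Vt @ v` first keeps the product at O(k(m+n)); forming `W @ Vt` explicitly would bring back the dense O(mn) cost.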

DOI： 10.1093/gji/ggac386

Optimizations of H-matrix-vector Multiplication for Modern Multi-core Processors.

Tetsuya Hoshino, Akihiro Ida, Toshihiro Hanawa

CLUSTER, pp. 462-472, 2022

Type: Research paper (international conference proceedings); Publisher: IEEE

DOI： 10.1109/CLUSTER51413.2022.00056

Other link: https://dblp.uni-trier.de/db/conf/cluster/cluster2022.html#HoshinoIH22

Evaluation of GPU Offloading Methods Using the Fortran Standard do concurrent Construct

星野哲也, 塙敏博, 大島聡史

IPSJ SIG Technical Report (Web), Vol. 2022-HPC-183, pp. 1-8, 2022

Performance Evaluation of Hierarchical Matrix Operations on A64FX

星野哲也, 伊田明弘, 塙敏博

IPSJ SIG Technical Report (Web), Vol. 2021-HPC-180, pp. 1-8, 2021

Large-scale earthquake sequence simulations of 3D geometrically complex faults using the boundary element method accelerated by lattice H-matrices on distributed memory computer systems

伊田明弘, 星野哲也

arXiv preprint, pp. 1-26, 2021

Implementation and Performance Evaluation of Temporal Blocking on A64FX

星野哲也, 塙敏博

IPSJ SIG Technical Report: High Performance Computing, Vol. 2021-HPC-178, pp. 1-8, 2021

Preliminary development of training environment for deep learning on supercomputer system (peer-reviewed)

Y. Nomura, I. Sato, T. Hanawa, S. Hanaoka, T. Nakao, T. Takenaga, D. Sato, T. Hoshino, Y. Sekiya, S. Ohshima, N. Hayashi, O. Abe

International Journal of Computer Assisted Radiology and Surgery, Vol. 13, Issue 1 Supplement, pp. S105-S106, June 2018

Type: Research paper (international conference proceedings)

DOI： 10.1007/s11548-018-1766-y

Optimization of Coefficient Matrix Generation in the Finite Element Method for Multicore and Manycore Processors

中島研吾, 星野哲也, 成瀬彰, 塙敏博, 三木洋平

IPSJ SIG Technical Report (Web), Vol. 2018 (HPC-163), No. 28, pp. 1-8 (web only), February 2018

Language: Japanese

Design of Parallel BEM Analysis Framework for SIMD Processors (peer-reviewed)

星野哲也

International Conference on Computational Science, Vol. 10860, pp. 601-613, 2018

Language: English; Type: Research paper (academic journal)

DOI： 10.1007/978-3-319-93698-7_46

Load-Balancing-Aware Parallel Algorithms of H-Matrices with Adaptive Cross Approximation for GPUs (peer-reviewed)

Tetsuya Hoshino, Akihiro Ida, Toshihiro Hanawa, Kengo Nakajima

IEEE International Conference on Cluster Computing, CLUSTER 2018, Belfast, UK, September 10-13, 2018, pp. 35-45, 2018

Type: Research paper (international conference proceedings); Publisher: IEEE Computer Society

DOI： 10.1109/CLUSTER.2018.00016

Initial Construction of a Deep Learning Training Environment on a Supercomputer

野村行弘, 佐藤一誠, 塙敏博, 花岡昇平, 中尾貴祐, 竹永智美, 佐藤大介, 星野哲也, 関谷勇司, 大島聡史, 林直人, 阿部修

IEICE Technical Report, Vol. 117, No. 281 (MI2017 47-62), pp. 1-2, October 2017

Language: Japanese

Pascal vs KNL: Performance Evaluation with ICCG Solver (peer-reviewed)

Tetsuya Hoshino, Satoshi Ohshima, Toshihiro Hanawa, Kengo Nakajima, Akihiro Ida

HPC in Asia Workshop Poster Session, ISC High Performance 2017, June 2017

Language: English; Type: Research paper (international conference proceedings)

Performance Evaluation of an ICCG Solver Using OpenACC on Pascal GPUs

星野哲也, 大島聡史, 塙敏博, 中島研吾, 伊田明弘

IPSJ SIG Technical Report (Web), Vol. 2017 (HPC-158), No. 18, pp. 1-9 (web only), March 2017

Language: Japanese; Type: Research paper (academic journal)

A Directive-based Data Layout Abstraction for Performance Portability of OpenACC Applications (peer-reviewed)

Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka

PROCEEDINGS OF 2016 IEEE 18TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 14TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), pp. 1147-1154, 2016

Language: English; Type: Research paper (international conference proceedings); Publisher: IEEE

Directive-based programming interfaces such as OpenACC and OpenMP are becoming more prevalent in application development targeting accelerators, in particular when porting existing CPU-only code. Unlike vendor-specific alternatives such as CUDA, they are designed to be portable across different accelerators: once the necessary directives are added to existing CPU-only code, it can be executed on different accelerator architectures, depending on the availability of supporting compilers. However, this does not automatically mean that such code runs efficiently on different architectures; in fact, architecture-specific coding such as choosing optimal data layouts is almost mandatory for optimal performance, imposing a significant burden if implemented manually. Towards realizing performance portability in accelerator programming, we propose a set of extended directives that allow the programmer to optimize data layouts for a given accelerator without modifying the original program code. Unlike the manual approach, code changes are confined to the directives, and the original code is kept as it is. This paper evaluates the effectiveness of our proposed extensions to the OpenACC standard by applying them to the UPACS and CCS-QCD OpenACC applications. A prototype source-to-source translator for the extensions achieves 123% and 120% of the baseline performance, respectively, which is comparable to manually tuned versions.

DOI： 10.1109/HPCC-SmartCity-DSS.2016.34

An OpenACC extension for data layout transformation (peer-reviewed)

Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka

Proceedings of WACCPD 2014: 1st Workshop on Accelerator Programming Using Directives - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 12-18, April 2015

Language: English; Type: Research paper (international conference proceedings); Publisher: Institute of Electrical and Electronics Engineers Inc.

OpenACC is gaining momentum as an implicit and portable interface for porting legacy CPU-based applications to heterogeneous, highly parallel computational environments involving many-core accelerators such as GPUs and Intel Xeon Phi. OpenACC provides a set of loop directives similar to OpenMP for parallelization and for managing data movement, attaining functional portability across different heterogeneous devices; however, the performance portability of OpenACC is said to be insufficient because of the differing characteristics of target devices, especially regarding memory layouts, since automatic adaptation by compilers is currently difficult. We are working to propose a set of directives that give compilers better semantic information for adaptation; here, we focus on data layouts such as the Structure of Arrays (SoA), which is advantageous on GPUs, as opposed to the Array of Structures (AoS), which performs well on CPUs. We propose a directive extension to OpenACC that allows users to flexibly specify optimal layouts, even for nested data structures. Performance results show gains of up to 96% on CPUs and 165% on GPUs compared with programs without such directives, essentially attaining both functional and performance portability in OpenACC.

DOI： 10.1109/WACCPD.2014.12

CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application (peer-reviewed)

Tetsuya Hoshino, Naoya Maruyama, Satoshi Matsuoka, Ryoji Takaki

PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013), pp. 136-143, 2013

Language: English; Type: Research paper (international conference proceedings); Publisher: IEEE

OpenACC is a new accelerator programming interface that provides a set of OpenMP-like loop directives for the programming of accelerators in an implicit and portable way. It allows the programmer to express the offloading of data and computations to accelerators, such that the porting process for legacy CPU-based applications can be significantly simplified. This paper focuses on the performance aspects of OpenACC using two microbenchmarks and one real-world computational fluid dynamics application. Both evaluations show that in general OpenACC performance is approximately 50% lower than CUDA. However, for some applications it can reach up to 98% with careful manual optimizations. The results also indicate several limitations of the OpenACC specification that hamper full use of the GPU hardware resources, resulting in a significant performance gap when compared to a highly tuned CUDA code. The lack of a programming interface for the shared memory in particular results in as much as three times lower performance.

DOI： 10.1109/CCGrid.2013.12


MISC (38)

Current Status and Future Prospects of the ABINIT-MP Program (invited, peer-reviewed)

望月祐志, 中野達也, 坂倉耕太, 土居英男, 奥脇弘次, 加藤季広, 滝沢寛之, 大島聡史, 星野哲也, 片桐孝洋

J. Comp. Chem. Jpn., Vol. 23, No. 4, pp. 85-97, December 2024

Language: Japanese; Type: Article, review, commentary, or editorial (academic journal); Publisher: Society of Computer Chemistry, Japan

The fragment molecular orbital (FMO) program ABINIT-MP has a quarter-century history, and research and development of its Open Version 2 series is currently underway. This paper first summarizes the current status of the latest release, Revision 8 (released in August 2023). It then describes planned improvements and enhancements, including GPU support. The connection with coarse-grained simulation (dissipative particle dynamics) and the possibility of cooperation with quantum computation are also discussed.

DOI： 10.2477/jccj.2024-0022

Development Status of the FMO Program ABINIT-MP, 2023 (invited, peer-reviewed)

望月祐志, 中野達也, 坂倉耕太, 奥脇弘次, 土居英男, 加藤季広, 滝沢寛之, 成瀬彰, 大島聡史, 星野哲也, 片桐孝洋

J. Comp. Chem. Jpn., Vol. 23, No. 1, pp. 4-8, March 2024

Language: Japanese; Type: Rapid communication, short paper, or research note (academic journal); Publisher: Society of Computer Chemistry, Japan

In August 2023, we released the latest version of our ABINIT-MP program, Open Version 2 Revision 8. In this version, the most commonly used FMO-MP2 calculations are even faster than in the previous Revision 4. It is now also possible to calculate excitation and ionization energies for regions of interest. Improved interaction analysis is also available. In addition, we have started GPU-oriented modifications. In this preliminary report, we present the current status of ABINIT-MP.

DOI： 10.2477/jccj.2024-0001

Introduction to Parallel Programming on CPU and GPU (4)

星野哲也, 中島研吾

Journal of the Japan Society for Simulation Technology, Vol. 43, No. 1, 2024

Multi-GPU Parallelization of Earthquake Simulations Using Lattice H-Matrices

百武尚輝, 星野哲也, 小澤創, 伊田明弘, 安藤亮輔, 河合直聡, 永井亨, 片桐孝洋

IPSJ SIG Technical Report (Web), Vol. 2024 (HPC-195), 2024

Accelerating Allreduce Between Heterogeneous Systems with a WaitIO+MPI Hybrid

植野貴大, 住元真司, 中島研吾, 片桐孝洋, 大島聡史, 星野哲也, 河合直聡, 永井亨

IPSJ SIG Technical Report (Web), Vol. 2024 (HPC-196), 2024

Performance Evaluation of Sapphire Rapids HBM Using HPC Kernel Benchmarks

星野哲也, 河合直聡, 伊田明弘, 塙敏博, 片桐孝洋

IPSJ SIG Technical Report (Web), Vol. 2024 (HPC-193), 2024

Introduction to Parallel Programming on CPU and GPU (1)

中島研吾, 星野哲也

Journal of the Japan Society for Simulation Technology (edited by the Japan Society for Simulation Technology), Vol. 42, No. 2, pp. 103-109, June 2023

Language: Japanese; Publisher: 小宮山印刷工業

An Adaptation of XAI to Auto-tuning for Numerical Calculation Library

青木将太, 片桐孝洋, 大島聡史, 永井亨, 星野哲也

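Proceedings of the Conference on Computational Engineering and Science (edited by the Japan Society for Computational Engineering and Science), Vol. 28, pp. 904-907, May 2023

Language: Japanese; Publisher: Japan Society for Computational Engineering and Science

Introduction to Parallel Programming on CPU and GPU (2)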

中島研吾, 星野哲也

Journal of the Japan Society for Simulation Technology, Vol. 42, No. 3, 2023

Introduction to Parallel Programming on CPU and GPU (3)

星野哲也, 中島研吾

Journal of the Japan Society for Simulation Technology, Vol. 42, No. 4, 2023

Evaluation of GPU Offloading Methods Using the Fortran Standard do concurrent Construct

星野哲也, 塙敏博

IPSJ SIG Technical Report (Web), Vol. 2022-HPC-183, pp. 1-8, 2022

Implementation and Performance Evaluation of a Direct N-Body Code Supporting Both AMD and NVIDIA GPUs

三木洋平, 塙敏博, 河合直聡, 星野哲也

Proceedings of the Annual Meeting of the Astronomical Society of Japan, Vol. 2022, 2022

Evaluation of the Effectiveness of GPU Offloading Using OpenMP

河合直聡, 三木洋平, 星野哲也, 塙敏博, 中島研吾

IPSJ SIG Technical Report (Web), Vol. 2022 (HPC-183), 2022

Implementation and Performance Evaluation of Temporal Blocking on A64FX

星野哲也, 塙敏博

IPSJ SIG Technical Report: High Performance Computing (HPC), Vol. 2021-HPC-178, No. 17, pp. 1-8, March 2021

Role: First author

Overview of Wisteria/BDEC-01, a Supercomputer System Integrating Computation, Data, and Learning

中島研吾, 塙敏博, 下川辺隆史, 伊田明弘, 芝隼人, 三木洋平, 星野哲也, 有間英志, 河合直聡, 坂本龍一, 近藤正章, 岩下武史, 八代尚, 長尾大道, 松葉浩也, 荻田武史, 片桐孝洋, 古村孝志, 鶴岡弘, 市村強, 藤田航平

IPSJ SIG Technical Report (Web), Vol. 2021 (HPC-179), 2021

Performance Evaluation of Wisteria/BDEC-01, a Supercomputer System Integrating Computation, Data, and Learning

塙敏博, 中島研吾, 下川辺隆史, 芝隼人, 三木洋平, 星野哲也, 河合直聡, 似鳥啓吾, 今村俊幸, 工藤周平, 中尾昌広

IPSJ SIG Technical Report (Web), Vol. 2021 (HPC-180), 2021

Performance Evaluation of Hierarchical Matrix Operations on A64FX

星野哲也, 伊田明弘, 塙敏博

IPSJ SIG Technical Report (Web), Vol. 2021 (HPC-180), pp. 1-8, 2021

Large-scale earthquake sequence simulations of 3D geometrically complex faults using the boundary element method accelerated by lattice H-matrices on distributed memory computer systems

伊田明弘, 星野哲也

arXiv preprint, pp. 1-26, 2021

An Optimization of H-matrix-vector Multiplication by Using Un-used Cores

Tetsuya Hoshino, Toshihiro Hanawa, Akihiro Ida

HPC Asia 2020, January 2020

Numerical Linear Algebra Based on Lattice H-Matrices

伊田明弘, Ichitaro Yamazaki, Rio Yokota, Satoshi Ohshima, Tasuku Hiraishi, Takeshi Iwashita, Tetsuya Hoshino, Toshihiro Hanawa

HPC Asia, January 2020

Performance Evaluation Toward Accelerating Hierarchical Matrix Methods on Manycore Clusters

星野哲也, 伊田明弘

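Proceedings of the Conference on Computational Engineering and Science (CD-ROM), Vol. 24, Paper No. C-07-02, June 2019

Language: Japanese; Publisher: Japan Society for Computational Engineering and Science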

High-Level Abstraction for High-Performance Computing on Manycore Processors

星野哲也

September 2018

Language: English

FPGA-Based Hierarchical Matrix Computation Using OpenCL

塙敏博, 伊田明弘, 星野哲也

IPSJ SIG Technical Report (Web), Vol. 2018 (HPC-163), No. 26, pp. 1-8 (web only), February 2018

Language: Japanese

Applying Hierarchical Matrix Computation to FPGAs

塙敏博, 伊田明弘, 星野哲也

IPSJ SIG Technical Report (Web), Vol. 2017 (HPC-161), No. 10, pp. 1-10 (web only), September 2017

Language: Japanese

Manycore Optimization of Applications Using the Hierarchical Matrix Library HACApK

星野哲也, 伊田明弘, 塙敏博, 中島研吾

IPSJ SIG Technical Report (Web), Vol. 2017 (HPC-160), No. 15, pp. 1-10 (web only), July 2017

Language: Japanese

Performance Evaluation of the GPU-Equipped Supercomputer Reedbush-H

塙敏博, 星野哲也, 中島研吾, 大島聡史, 伊田明弘

IPSJ SIG Technical Report (Web), Vol. 2017 (HPC-159), No. 9, pp. 1-6 (web only), April 2017

Language: Japanese

OpenMP and MPI Performance Optimization in a Xeon Phi + OmniPath Environment

塙敏博, 星野哲也, 中島研吾, 大島聡史, 伊田明弘

IPSJ SIG Technical Report (Web), Vol. 2017 (HPC-158), No. 21, pp. 1-8 (web only), March 2017

Language: Japanese

Optimization of an ICCG Solver for Intel Xeon Phi

中島研吾, 大島聡史, 塙敏博, 星野哲也, 伊田明弘

IPSJ SIG Technical Report (Web), Vol. 2016 (HPC-157), No. 16, pp. 1-8 (web only), December 2016

Language: Japanese

Performance Evaluation of the Pipelined Conjugate Gradient Method

塙敏博, 中島研吾, 大島聡史, 星野哲也, 伊田明弘

IPSJ SIG Technical Report (Web), Vol. 2016 (HPC-157), No. 6, pp. 1-9 (web only), December 2016

Language: Japanese

Performance Evaluation of Reedbush-U, a Supercomputer System Integrating Data Analysis and Simulation

塙敏博, 中島研吾, 大島聡史, 伊田明弘, 星野哲也, 田浦健次朗

IPSJ SIG Technical Report (Web), Vol. 2016 (HPC-156), No. 10, pp. 1-10 (web only), September 2016

Language: Japanese

Accelerating OpenACC Applications with Data Layout Optimization Directives

星野哲也

IPSJ SIG Technical Report: High Performance Computing (HPC), Vol. 2016-HPC-155, pp. 1-8, 2016

Accelerating a Compressible Flow Program with OpenACC

星野哲也

IPSJ SIG Technical Report: High Performance Computing (HPC), Vol. 2016-HPC-153, pp. 1-10, 2016

Data Layout Optimization Through OpenACC Directive Extensions

星野哲也, 丸山直也, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC), Vol. 2014, No. 45, pp. 1-8, July 2014

Language: Japanese; Publisher: Information Processing Society of Japan

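For porting existing programs to computing environments equipped with accelerators such as GPUs, which have become increasingly common in recent years, high-level directive-based programming models such as OpenACC are attracting attention as an alternative to low-level programming models typified by CUDA and OpenCL. An advantage of such directive-based programming models is high functional portability across devices, because programs can be ported while the original code is kept intact. However, current high-level programming models such as OpenACC cannot accommodate the differences between the data layouts favored by scalar processors and by many-core accelerators, which limits performance portability across devices with different characteristics. In this work, we prototyped OpenACC extension directives that abstract the data layout to improve performance portability across devices, used a translator to change the data layout of the Himeno benchmark, and evaluated it on a multicore CPU, an Intel Xeon Phi, and a K20X GPU. Compared with the original data layout, we observed performance improvements of 27% on the Intel Xeon Phi and 24% on the K20X GPU.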

OpenACC Directive Extensions Enabling Selection of the Optimal Data Layout for Both CPUs and GPUs

星野哲也, 丸山直也, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC), Vol. 2014, No. 5, pp. 1-5, February 2014

Language: Japanese; Publisher: Information Processing Society of Japan

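For porting existing programs to computing environments equipped with accelerators such as GPUs, which have become increasingly common in recent years, high-level directive-based programming models such as OpenACC can be used instead of low-level programming models typified by CUDA and OpenCL. An advantage of such directive-based programming models is high portability across devices, because programs can be ported without breaking the original code. However, current programming models such as OpenACC cannot accommodate differences such as the data layouts favored by scalar processors and by many-core accelerators, which limits performance portability across devices with different characteristics. In this work, we prototyped and evaluated OpenACC extension directives that abstract the data layout to improve performance portability across devices.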

Performance Evaluation of the Directive-Based Programming Language OpenACC

星野哲也, 丸山直也, 松岡聡

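Proceedings of the High Performance Computing and Computational Science Symposium (HPCS), Vol. 2013, p. 91, January 2013

Language: Japanese

Evaluating the Portability of a Large-Scale Fluid Application to CUDA and OpenACC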

星野哲也, 丸山直也, 松岡聡

IPSJ SIG Technical Report: High Performance Computing (HPC), Vol. 2012, No. 42, pp. 1-9, July 2012

Language: Japanese

Computational fluid dynamics (CFD) applications, used in simulations such as earthquake and weather prediction and the design of aircraft and high-rise buildings, have been achieving remarkable results on supercomputers equipped with GPUs, which have become common in recent years. However, obtaining high performance with GPU programming is still said to be difficult, and porting legacy programs to GPU environments remains a problem. In this paper, we manually ported UPACS, a large-scale CFD application used in practice, to CUDA, applying classical optimizations, and evaluated it in terms of performance and porting cost. We also conducted a preliminary evaluation of OpenACC, which is expected to provide portability across CPUs and GPUs. We present the results of these evaluations and discuss the performance problems that remain to be resolved.

Evaluation of GPU Acceleration Methods for a Large-Scale Fluid Application

星野哲也, 丸山直也, 松岡聡

Proceedings of the Symposium on Advanced Computing Systems and Infrastructures (SACSIS), Vol. 2012, pp. 73-74, May 2012

Language: Japanese

OpenACC Programming

丸山直也, 星野哲也

The Journal of the Institute of Image Information and Television Engineers, Vol. 66, No. 10, pp. 817-822, 2012

Language: English; Publisher: The Institute of Image Information and Television Engineers

DOI： 10.3169/itej.66.817


Grants-in-Aid for Scientific Research (KAKENHI) 5

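Expanding the Applicability of Low-Rank Structured Matrix Methods and Exploiting Diverse Computer Architectures

Grant number: 24K02949, April 2024 - March 2027

Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Scientific Research (B)

Akihiro Ida, Rio Yokota, Toshihiro Hanawa, Takeshi Iwashita, Satoshi Ohshima, Tetsuya Hoshino, Tasuku Hiraishi, Masatoshi Kawai

Role: Co-Investigator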

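This research enhances the functionality of a library for low-rank structured matrix methods. In scientific computing, numerical linear algebra libraries built on dense matrix operations are widely used. We will broaden the applicability of low-rank structured matrix methods so that dense matrix operations can be replaced with low-rank structured matrix operations, and we will develop new numerical algorithms based on low-rank structured matrices. The algorithms will be designed, and their implementations optimized, for clusters built from the latest architectures such as GPUs and FPGAs. We will study implementation techniques that assign the most suitable architecture to each kind of low-rank structured matrix operation and that use mixed-precision arithmetic, dynamic load balancing, and similar techniques to extract the full performance of the machine.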
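Research on Accelerating Real-World Applications with Spatio-Temporal Blocking

Grant number: 22K17898, April 2022 - March 2024

Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Early-Career Scientists

Tetsuya Hoshino

Role: Principal Investigator

Grant amount: 1,430,000 yen (direct costs: 1,100,000 yen; indirect costs: 330,000 yen)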

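The latest generation of CPUs in supercomputers has large shared caches. Spatio-temporal blocking, an optimization known for using such caches efficiently, accelerates the stencil computations that appear frequently in scientific and engineering simulations. However, because spatio-temporal blocking requires complicated programming, it has seen little use in real applications. This project focuses on the overlapped variant of spatio-temporal blocking, which can be implemented with a relatively simple code transformation yet runs efficiently by exploiting a large shared cache. We model its performance on various CPUs and validate its effectiveness in real applications.
Stencil computation is the particular computational pattern over discrete space-time grids that arises when differential equations are solved numerically, and it is an important kernel that appears frequently in many fluid simulations. Accelerating stencil computation is an active research area. Spatio-temporal blocking is one such technique, but because it demands very complicated programming, it has rarely been applied to real applications. Moreover, its performance depends heavily on the performance parameters of the processor that runs it, so optimizing it by hand is impractical. This research therefore developed, on the latest CPUs, the performance modeling needed to optimize spatio-temporal blocking automatically.
A key value of this research is that the performance modeling was carried out mainly on the latest CPUs equipped with High Bandwidth Memory (HBM), namely the A64FX of the Fugaku supercomputer, as well as on Intel Xeon Sapphire Rapids-generation CPUs. By its nature, the performance of spatio-temporal blocking depends strongly on the ratio between main-memory performance and last-level-cache performance. HBM changes this ratio substantially compared with conventional CPUs, and clarifying that effect through a performance model is a significant result for the high-performance computing field. Another meaningful result is that the research revealed the influence of instruction latency, which had not been anticipated at the outset.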
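The overlapped scheme can be sketched in a few lines of Python, assuming a 1-D three-point averaging stencil with fixed boundary values. The real applications use 3-D stencils, and the function names here are hypothetical. Each block is widened by a halo of `steps` cells, advanced `steps` time steps on its own, and only its interior is written back. Cells near the edges of the widened block go stale, but the error spreads inward only one cell per step, so the interior stays exact and the result matches a plain step-by-step sweep.

```python
from typing import List


def naive(u: List[float], steps: int) -> List[float]:
    """Reference: `steps` full sweeps of a 3-point averaging stencil.
    The first and last cells are fixed boundary values."""
    u = list(u)
    for _ in range(steps):
        v = list(u)
        for i in range(1, len(u) - 1):
            v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = v
    return u


def overlapped_blocking(u: List[float], steps: int, block: int) -> List[float]:
    """Overlapped temporal blocking of the same stencil.

    Each block [start, stop) is widened by a halo of `steps` cells
    (stencil radius 1 times the number of fused steps), advanced `steps`
    times on a local copy, and only [start, stop) is written back.
    The two edge cells of the local copy are never updated. At a global
    boundary this matches the fixed boundary condition; elsewhere the
    resulting error moves inward one cell per step and stays in the halo.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, block):
        stop = min(start + block, n)
        lo, hi = max(0, start - steps), min(n, stop + steps)
        local = u[lo:hi]  # in a real code this tile would be sized to fit in cache
        for _ in range(steps):
            nxt = list(local)
            for j in range(1, len(local) - 1):
                nxt[j] = (local[j - 1] + local[j] + local[j + 1]) / 3.0
            local = nxt
        out[start:stop] = local[start - lo:stop - lo]
    return out


u0 = [0.0] * 10 + [1.0] + [0.0] * 9
print(overlapped_blocking(u0, steps=4, block=5) == naive(u0, steps=4))
```

The halo adds roughly 2 × `steps` redundantly computed cells per block. A larger `steps` therefore saves passes over main memory but adds redundant work and enlarges the working set that must stay in the last-level cache, which is why the benefit depends on the memory-to-cache performance ratio.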
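Construction of Numerical Linear Algebra Based on Lattice H-Matrices and High-Performance Implementation Methods for the Latest Architectures

Grant number: 21H03447, April 2021 - March 2024

Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Scientific Research (B)

Akihiro Ida, Rio Yokota, Toshihiro Hanawa, Takeshi Iwashita, Satoshi Ohshima, Tetsuya Hoshino, Tasuku Hiraishi

Role: Co-Investigator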

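With the aim of building a numerical linear algebra framework based on lattice H-matrices, we researched and developed new algorithms for eigenvalue computation, LU factorization, QR factorization, and related operations. Among the many results, a major one is an eigenvalue computation method for BLR (Block Low-Rank) matrices. We developed an algorithm that computes all eigenvalues of a BLR matrix, which is a special case of a lattice H-matrix. We estimated the computational complexity of the algorithm as a function of the quantities that characterize a BLR matrix (matrix size, block size, and the rank of each block) and examined the optimal conditions. We showed theoretically that, under the optimal conditions, the proposed algorithm requires far less computation than the conventional dense-matrix approach. Numerical experiments confirmed that the computation time is proportional to the computational cost, as the theory predicts, and that the computed eigenvalues and eigenvectors approach those of the dense matrix as the rank increases.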
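To show how matrix size, block size, and rank enter such estimates, the following Python sketch counts the storage of a BLR matrix under common simplifying assumptions: square b × b blocks with b dividing n, dense diagonal blocks, and every off-diagonal block stored as a rank-k product U Vᵀ. The same count equals the multiply-adds of one matrix-vector product. It is not the complexity of the eigenvalue algorithm from this project, which the summary does not spell out. Under these assumptions the total is n·b + 2k(n²/b - n), which is minimized at b = sqrt(2kn).

```python
import math


def dense_storage(n: int) -> int:
    """Entries needed to store an n x n dense matrix."""
    return n * n


def blr_storage(n: int, b: int, k: int) -> int:
    """Entries for a BLR matrix, assuming square b x b blocks (b divides n),
    dense diagonal blocks, and every off-diagonal block stored as a rank-k
    factorization U V^T with U and V of size b x k."""
    if n % b != 0:
        raise ValueError("this sketch assumes b divides n")
    p = n // b                              # blocks per block-row
    diagonal = p * b * b                    # p dense diagonal blocks
    off_diagonal = (p * p - p) * 2 * b * k  # low-rank factors U and V
    return diagonal + off_diagonal


def optimal_block_size(n: int, k: int) -> float:
    """Minimizer of n*b + 2*k*n*n/b, i.e. the storage model without its constant term."""
    return math.sqrt(2 * k * n)


n, k = 4096, 8
b = int(optimal_block_size(n, k))
print(b, blr_storage(n, b, k), dense_storage(n))
```

For n = 4096 and k = 8, the model gives b = 256 and about 2.0 million stored entries, versus about 16.8 million for the dense matrix. Halving or doubling b increases the count, consistent with the optimum.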
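We also worked to broaden the applicability of the lattice H-matrix method. Low-rank structured matrix methods, including lattice H-matrices, had conventionally been intended for boundary element analysis (integral equation methods over a spatial domain with no time term). To extend their applicability to space-time domain integral equation methods, we developed a method that combines FDP (Fast Domain Partitioning) with lattice H-matrices and showed theoretically that it requires far less computation than the conventional method. We further developed a code that performs 3-D elastic wave propagation analysis with the proposed method and confirmed that the analysis runs in close to the theoretically predicted time.
Research on high-performance implementation of the lattice H-matrix method also yielded many results. A major one is a parallel implementation, for distributed-memory environments, of the generation of the matrix partition structure, achieved by extending the task-parallel language Tascell. In numerical experiments on 3-D electric field analysis with about 100 million elements, good speedup was achieved with up to 8 nodes × 36 workers.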
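Supporting High-Performance Computing and Data Analysis by Exploiting Surplus Cores

Grant number: 20H00580, April 2020 - March 2023

Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Scientific Research (A)

Toshihiro Hanawa, Takashi Shimokawabe, Tetsuya Hoshino, Yohei Miki, Akihiro Ida

Role: Co-Investigator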

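① We studied methods for dynamic profiling without recompiling user programs. Using SystemTap, we achieved profiling focused on the functions of interest, as well as dynamic changes of the number of OpenMP threads, with almost no overhead. We also measured the impact of running other processing concurrently with the main computation.
② Aiming to run MPI+OpenMP applications efficiently, we studied a method that controls the number of cores (OpenMP threads) assigned to each MPI process so as to equalize the load on every core. We implemented the method as the DCB library, which applications can use with only simple API calls. Applying the DCB library to a lattice H-matrix code improved computation speed by 15.5% and reduced power consumption by 8.0%.
③ In preparation for in situ data analysis, we added to an astrophysics application the ability to run analysis during the time-evolution computation. We also added a feature that frequently appends only the data satisfying specific conditions to a single output file, so that it can be used for a preliminary evaluation of asynchronous file I/O.
④ Stencil computations, which appear frequently in fluid simulations, are generally known to be memory-bound, a computation pattern that tends to leave arithmetic units idle. Temporal blocking uses cache memory to execute several time steps without writing results back to main memory, and is known as a way to put those idle units to work. We applied it to a 3-D diffusion equation kernel and evaluated it on the latest processors, achieving a performance improvement of up to 4.99×.
⑤ In adaptive mesh refinement (AMR), a stencil method that allows locally higher resolution, the complexity of the data structures makes communication a source of performance loss. From the viewpoint of exploiting surplus cores, we studied how to use temporal blocking efficiently on the latest processors.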
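Item ② can be illustrated with a hypothetical greedy allocator in Python. Given an estimated workload per MPI process and the number of cores on a node, it hands out cores one at a time, always to the process with the highest load per thread, so that work per core evens out. This is only a sketch of the balancing idea under those assumptions. It is not the DCB library's API or algorithm, which the summary does not describe, and `allocate_threads` and `max_load_per_core` are illustrative names.

```python
from typing import List


def allocate_threads(work: List[float], total_cores: int) -> List[int]:
    """Hypothetical sketch: choose an OpenMP thread count per MPI process
    so that work per core is roughly equal. Every process keeps at least
    one thread; each remaining core goes to the process whose current
    load per thread is highest (ties go to the lowest rank)."""
    nproc = len(work)
    if total_cores < nproc:
        raise ValueError("need at least one core per process")
    threads = [1] * nproc
    for _ in range(total_cores - nproc):
        busiest = max(range(nproc), key=lambda r: work[r] / threads[r])
        threads[busiest] += 1
    return threads


def max_load_per_core(work: List[float], threads: List[int]) -> float:
    """The slowest process bounds the time of a bulk-synchronous step."""
    return max(w / t for w, t in zip(work, threads))


work = [4.0, 2.0, 1.0, 1.0]
threads = allocate_threads(work, 8)
print(threads, max_load_per_core(work, threads))
print(max_load_per_core(work, [2, 2, 2, 2]))
```

With workloads in the ratio 4:2:1:1 on 8 cores, the allocator returns [4, 2, 1, 1] and every core carries the same load. An even split of two threads per process would leave the first process with twice the per-core load.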
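An Automatic Optimization Framework for Many-Core Processors Focusing on Application Data Structures

Grant number: 16H06679, August 2016 - March 2018

Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Research Activity Start-up

Tetsuya Hoshino

Many-core processors are increasingly used in computing environments, and using their Vector Processing Units (VPUs) efficiently is essential to extracting their performance. However, efficient VPU use requires knowledge of the hardware and the compiler, and often also requires changes to the program's data structures.
In this research, we proposed compiler directives for abstracting data structures and developed a translator that interprets them. We also proposed a framework design that promotes automatic vectorization and developed a framework for the boundary element method based on that design.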


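Courses Taught (at This University) 3

Advanced Topics in Large-Scale Parallel Numerical Computation

2023
Advanced Topics in Large-Scale Computation B

2023
Programming 2

2023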


Outreach Activities 1

Recent Trends in GPU Programming for Fortran (JAXA internal training course)

Role: Lecturer

December 2023


Academic Service Activities 2

HPC Asia 2024 Local Arrangement Chair

Role: Planning and management

January 2024

Type: Conference / workshop
xSIG 2023 Program Committee Member

Role: Reviewing

August 2023

Type: Peer review
