Researcher Details - Hideharu Amano (天野 英晴)

GPU-accelerated language and communication support by FPGA

Boku T., Hanawa T., Murai H., Nakao M., Miki Y., Amano H., Umemura M., Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project, December 2018

Although the GPU is one of the most successful accelerators for HPC, several issues arise when it is used in large-scale parallel systems. To describe real applications on GPU-ready parallel systems, we need to combine different programming paradigms such as CUDA/OpenCL, MPI, and OpenMP for advanced platforms. In the hardware configuration, inter-GPU communication requires the PCIe channel and CPU support, which causes a large overhead that becomes a bottleneck for overall parallel processing performance. In the project described in this chapter, we developed an FPGA-based platform to reduce the latency of inter-GPU communication, as well as a PGAS language for distributed-memory programming with accelerating devices such as GPUs. This work provides a new approach to compensating for the hardware and software weaknesses of parallel GPU computing. Moreover, FPGA technology for accelerating computation and communication is described for an astrophysical problem where GPU or CPU computation alone does not deliver sufficient performance.
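Principles and Structures of FPGAs (FPGAの原理と構成)

Hideharu Amano (ed.), Masahiro Iida and 14 others, Ohmsha, April 2016
Digital Design and Computer Architecture: ARM Edition (ディジタル回路設計とコンピュータアーキテクチャ　ARM版)

Hideharu Amano, SiB Access, April 2016
Computer Architecture: A Quantitative Approach (コンピュータアーキテクチャ　定量的アプローチ)

J.L. Hennessy and D.A. Patterson, Shoeisha, March 2014
CMOS VLSI Circuit Design (CMOS VLSI回路設計)

N.H.E. Weste, D.M. Harris, Maruzen Publishing, January 2014

Contribution: Chapter 10 and appendices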

Papers

RT-libSGM: An Implementation of a Real-time Stereo Matching System on FPGA

Wei K., Kuno Y., Arai M., Amano H.

ACM International Conference Proceeding Series, pp. 1-9, June 2022

Stereo depth estimation has become an attractive topic in the computer vision field. Although various algorithms strive to optimize the speed and the precision of estimation, the energy cost of a system is also an essential metric for an embedded system. Among these various algorithms, Semi-Global Matching (SGM) has been a popular choice for some real-world applications because of its accuracy-and-speed balance. However, its power consumption makes it difficult to apply to an embedded system. Thus, we propose a robust stereo matching system, RT-libSGM, working on Xilinx field-programmable gate array (FPGA) platforms. The dedicated design of each module optimizes the speed of the entire system while ensuring the flexibility of the system structure. Through an evaluation running on a Zynq FPGA board called M-KUBOS, RT-libSGM achieves state-of-the-art performance with lower power consumption. Compared with the original design (libSGM) working on the Tegra X2 GPU, RT-libSGM runs 2× faster at a lower energy cost.
Mapping-Aware Kernel Partitioning Method for CGRAs Assisted by Deep Learning

Kojima T., Ohwada A., Amano H.

IEEE Transactions on Parallel and Distributed Systems, 33(5), pp. 1213-1230, May 2022

ISSN 1045-9219

Coarse-grained reconfigurable architectures (CGRAs) provide high energy efficiency with word-level programmability rather than the bit-level programmability of FPGAs. The coarser reconfigurability brings about higher energy efficiency and reduces the complexity of compiler tasks compared to FPGAs. However, the application mapping process for CGRAs is still time-consuming. When the compiler tries to map a large and complicated application data-flow graph (DFG) onto the reconfigurable fabric, it tends to result in inefficient resource use or to fail in mapping. In case of failure, the compiler must divide the DFG into several sub-DFGs and go back to the same flow. In this work, we propose a novel partitioning method based on a genetic algorithm to eliminate the unmappable DFGs and improve the mapping quality. In order not to generate unmappable sub-DFGs, we also propose an estimation model which predicts the mappability and resource requirements using a DGCNN (Deep Graph Convolutional Neural Network). The genetic algorithm with this model can seek the most resource-efficient mapping without the back-end mapping process. Our model can predict the mappability with more than 98% accuracy and resource usage with a negligible error for two studied CGRAs. In addition, the proposed partitioning method demonstrates 53-75% memory savings, 1.28-1.39x higher throughput, and better mapping quality compared with three other approaches.
A traffic-aware memory-cube network using bypassing

Shikama Y., Kawano R., Matsutani H., Amano H., Nagasaka Y., Fukumoto N., Koibuchi M.

Microprocessors and Microsystems, 90, April 2022

ISSN 0141-9331

Three-dimensional stacked memory, which provides both high-bandwidth access and large capacity, is a promising technology for next-generation computer systems. While a large number of memory cubes increases the aggregate memory capacity, the communication latency and power consumption increase significantly owing to the low-radix, large-diameter packet network. In this context, we propose a memory-cube network called Diagonal Memory Network (DMN). A diagonal network topology, its floor layout, and its lightweight router were designed for low-latency and low-voltage memory-read communication. DMN routing efficiently avoids deadlocks of packets, although it allows each packet transmitted to a processor to use both bypassing and original datapaths. Our evaluation results show that the DMN router decreases the use of hardware resources by more than 31% compared with a conventional virtual channel router. The DMN router reduces the energy consumed to transit a packet by 13% along the original datapath and by 67% along the bypassing datapath. Furthermore, in flit-level discrete event simulation, a DMN topology achieves high throughput and lower latency than existing network topologies using conventional packet routers.
Body Bias Control on a CGRA based on Convex Optimization

Kojima T., Okuhara H., Kondo M., Amano H.

25th IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL Chips 2022 - Proceedings, 2022

Body biasing is one of the critical techniques to realize more energy-efficient computing with reconfigurable devices, such as Coarse-Grained Reconfigurable Architectures (CGRAs). Its benefit depends on the control granularity, whereas fine-grained control makes it challenging to find the best body bias voltage for each domain due to the complexity of the optimization problem. This work reformulates the optimization problem and introduces continuous relaxation to solve it faster than previous work. Experimental results show that the proposed method can solve the problem within 0.5 sec for all benchmarks under all conditions and demonstrates up to 5.65x speed-up compared to the previous method with negligible loss of accuracy.
An efficient compilation of coarse-grained reconfigurable architectures utilizing pre-optimized sub-graph mappings

Ohwada A., Kojima T., Amano H.

Proceedings - 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2022, pp. 1-9, 2022

In recent years, IoT devices have become widespread, and energy-efficient coarse-grained reconfigurable architectures (CGRAs) have attracted attention. CGRAs comprise several processing units called processing elements (PEs) arranged in a two-dimensional array. The operations of PEs and the interconnections between them are adaptively changed depending on a target application, and this contributes to a higher energy efficiency compared to general-purpose processors. The application kernel executed on CGRAs is represented as a data flow graph (DFG), and CGRA compilers are responsible for mapping the DFG onto the PE array. Thus, mapping algorithms significantly influence the performance and power efficiency of CGRAs as well as the compile time. This paper proposes POCOCO, a compiler framework for CGRAs that can use pre-optimized subgraph mappings. This contributes to reducing the compiler optimization task. To leverage the subgraph mappings, we extend an existing mapping method based on a genetic algorithm. Experiments on three architectures demonstrated that the proposed method reduces the optimization time by 48%, on average, for the best case of the three architectures.

Papers and Other Items in KOARA (Repository)

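A stacking method using chip bridges for building-block computing systems

Amano, Hideharu

Grant-in-Aid for Scientific Research (KAKENHI) research report, 2020
Research on building-block computing systems using inductive coupling

Amano, Hideharu

Grant-in-Aid for Scientific Research (KAKENHI) research report, 2017
Research on accelerating practical CFD applications using reconfigurable LSIs

Amano, Hideharu

Grant-in-Aid for Scientific Research (KAKENHI) research report, 2011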

Reviews and Commentaries

Message from the Organizing Committee Chair

Amano H.

IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings, pp. I-II, May 2019
Preface

Weinhardt M., Koch D., Hochberger C., Schwarz A., Amano H., Bauer L., Cardoso J.M.P., Chow P., Hannig F., Kenter T., Koch A., Leeser M., Marino M.D., Poznanovic D., Ul-Abdin Z., Willenberg R., Ziener D.

6th International Workshop on FPGAs for Software Programmers, FSP 2019, co-located with International Conference on Field Programmable Logic and Applications, FPL 2019, 2019

Presentations

Zynq Cluster for CFD Parametric Survey

Hideharu Amano

[International conference] International Symposium on Applied Reconfigurable Computing (ARC) (Rio de Janeiro), February 2016, oral presentation (general)
Randomizing Packet Memory Networks for Low-latency Processor-memory Communication

Hideharu Amano

[International conference] The 24th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (Crete), February 2016, oral presentation (general), IEEE
Power Optimization considering the chip temperature of low power reconfigurable accelerator CMA-SOTB

Hideharu Amano

[International conference] The 3rd International Symposium on Computing and Networking (CANDAR), December 2015, oral presentation (general), IEICE
A 297MOPS/0.4mW Ultra Low Power Coarse-grained Reconfigurable Accelerator CMA-SOTB-2

Hideharu Amano

[International conference] The 10th International Conference on ReConFigurable Computing and FPGAs (IEEE), December 2015, oral presentation (general)
On-Chip Decentralized Routers with Balanced Pipelines for Avoiding Interconnect Bottleneck

Hideharu Amano

[International conference] The 9th ACM/IEEE International Symposium on Networks-on-Chip (NOCS) (Vancouver), October 2015, oral presentation (general)

Competitive Research Funding Projects

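A stacking method using chip bridges for building-block computing systems

April 2018 - March 2021

MEXT / Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Hideharu Amano, Grant-in-Aid for Scientific Research (B), subsidy, principal investigator
Research on building-block computing systems using inductive coupling

May 2013 - March 2018

MEXT / Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Hideharu Amano, Grant-in-Aid for Scientific Research (S), subsidy, principal investigator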

Awards

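IEICE Fellow

September 2015
ISS Achievement Award

May 2014, IEICE

Award category: award from a domestic academic society, conference, or symposium
Paper Award

Matsutani, Koibuchi, Amano, May 2008, Information Processing Society of Japan (IPSJ), Research on the Fat H-Tree topology for Network-on-Chip

Award category: award from a domestic academic society, conference, or symposium
Paper Award

Shibata, Uno, Amano, May 2003, IEICE, Implementation of virtual hardware on DRL

Award category: award from a domestic academic society, conference, or symposium
IPSJ Sakai Memorial Academic Award

Hideharu Amano, 1997, IPSJ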

Courses Taught

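Advanced Computer Architecture

Academic year 2025
Advanced Computer Architecture

Academic year 2024
Information and Computer Science Seminar

Academic year 2023
Information and Computer Science Laboratory 2B

Academic year 2023
Research Project in Science for Open and Environmental Systems

Academic year 2023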

Social Activities

ASP Design Automation Conference 2000

1998 - present
Japanese FPGA/PLD Conference and Exhibit

1998 - present
Cool Chips 1999

1998 - present
ASP Design Automation Conference 1998

1997 - present
IASTED International Conference of Applied Informa

1997 - 1998

Academic Society Memberships

IEICE Technical Committee on Computer Systems

May 2011 - present
First International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART)

2010 - present
Cool Chips

2009 - present
International Symposium on Applied Reconfigurable Computing

March 2008 - present
International Conference on Field Programmable Technology

December 2007 - present

Committee Memberships

October 2015 - present

General Chair, IEEE/ACM International Symposium on Networks on Chip (NOCS) 2016
May 2015 - present

FIT Executive Committee Chair, IEICE and IPSJ
May 2015 - present

ISS Vice President, IEICE
May 2015 - March 2016

National Convention Program Committee Chair, IPSJ
May 2011 - May 2013

Committee Chair, IEICE Technical Committee on Computer Systems

Keio University Researcher Information Database
