天野 英晴 (アマノ ヒデハル)

Amano, Hideharu

Affiliation (Campus)

Faculty of Science and Technology, Department of Information and Computer Science (Yagami)

Position

Professor

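Career

  • April 1985 - March 1989

    University Research Associate (Department of Electrical Engineering, Faculty of Science and Technology)

  • April 1989 - March 1994

    University Full-time Lecturer (Department of Electrical Engineering, Faculty of Science and Technology)

  • October 1989 - September 1990

    Stanford University, Visiting Lecturer

  • April 1994 - March 1996

    University Associate Professor (Department of Electrical Engineering, Faculty of Science and Technology)

  • April 1996 - March 2001

    University Associate Professor (Department of Information and Computer Science, Faculty of Science and Technology)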

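Education

  • March 1981

    Keio University, Faculty of Engineering, Department of Electrical Engineering

    University, graduated

  • March 1983

    Keio University, Graduate School of Engineering, Electrical Engineering

    Graduate school, completed, Master's

  • March 1986

    Keio University, Graduate School of Engineering, Electrical Engineering

    Graduate school, completed, Doctoral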

Degree

  • Engineering, Keio University, March 1986

Research Areas

  • Information and communication / Theory of informatics (computer science)

Research Themes

  • Parallel computer architecture, reconfigurable systems

    Interconnection networks for SANs; RHiNET, a network for PC clusters
    Virtual hardware, dynamically adaptive hardware, DRP

Books

  • GPU-accelerated language and communication support by FPGA

    Boku T., Hanawa T., Murai H., Nakao M., Miki Y., Amano H., Umemura M., Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project, December 2018

    Although the GPU is one of the most successfully used accelerating devices for HPC, there are several issues when it is used for large-scale parallel systems. To describe real applications on GPU-ready parallel systems, we need to combine different paradigms of programming such as CUDA/OpenCL, MPI, and OpenMP for advanced platforms. In the hardware configuration, inter-GPU communication through PCIe channel and support by CPU are required which causes large overhead to be a bottleneck of total parallel processing performance. In our project to be described in this chapter, we developed an FPGA-based platform to reduce the latency of inter-GPU communication and also a PGAS language for distributed-memory programming with accelerating devices such as GPU. Through this work, a new approach to compensate the hardware and software weakness of parallel GPU computing is provided. Moreover, FPGA technology for computation and communication acceleration is described upon astrophysical problem where GPU or CPU computation is not sufficient on performance.

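  • Principles and Structures of FPGAs (FPGAの原理と構成)

    Hideharu Amano (ed.), Masahiro Iida, and 14 others, Ohmsha, April 2016

  • Digital Design and Computer Architecture: ARM Edition (ディジタル回路設計とコンピュータアーキテクチャ ARM版)

    Hideharu Amano, SiB Access, April 2016

  • Computer Architecture: A Quantitative Approach (コンピュータアーキテクチャ 定量的アプローチ)

    J. L. Hennessy and D. A. Patterson, Shoeisha, March 2014

  • CMOS VLSI Design (CMOS VLSI回路設計)

    N. H. E. Weste, D. M. Harris, Maruzen Publishing, January 2014

    Contribution: Chapter 10, appendices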

Papers

  • RT-libSGM: An Implementation of a Real-time Stereo Matching System on FPGA

    Wei K., Kuno Y., Arai M., Amano H.

    ACM International Conference Proceeding Series, pp. 1-9, June 2022

    Stereo depth estimation has become an attractive topic in the computer vision field. Although various algorithms strive to optimize the speed and the precision of estimation, the energy cost of a system is also an essential metric for an embedded system. Among these various algorithms, Semi-Global Matching (SGM) has been a popular choice for some real-world applications because of its accuracy-and-speed balance. However, its power consumption makes it difficult to be applied to an embedded system. Thus, we propose a robust stereo matching system, RT-libSGM, working on the Xilinx Field-programmable gate array (FPGA) platforms. The dedicated design of each module optimizes the speed of the entire system while ensuring the flexibility of the system structure. Through an evaluation running on a Zynq FPGA board called M-KUBOS, RT-libSGM achieves state-of-the-art performance with lower power consumption. Compared with the original design (libSGM), when working on the Tegra X2 GPU, RT-libSGM runs 2 × faster at a lower energy cost.

  • Mapping-Aware Kernel Partitioning Method for CGRAs Assisted by Deep Learning

    Kojima T., Ohwada A., Amano H.

    IEEE Transactions on Parallel and Distributed Systems, 33(5), pp. 1213-1230, May 2022

    ISSN 1045-9219

    Coarse-grained reconfigurable architectures (CGRAs) provide high energy efficiency with word-level programmability rather than bit-level ones such as FPGAs. The coarser reconfigurability brings about higher energy efficiency and reduces the complexity of compiler tasks compared to the FPGAs. However, application mapping process for CGRAs is still time-consuming. When the compiler tries to map a large and complicated application data-flow-graph(DFG) onto the reconfigurable fabric, it tends to result in inefficient resource use or to fail in mapping. In case of failure, the compiler must divide it into several sub-DFGs and goes back to the same flow. In this work, we propose a novel partitioning method based on a genetic algorithm to eliminate the unmappable DFGs and improve the mapping quality. In order not to generate unmappable sub-DFGs, we also propose an estimation model which predicts the mappability and resource requirements using a DGCNN (Deep Graph Convolutional Neural Network). The genetic algorithm with this model can seek the most resource-efficient mapping without the back-end mapping process. Our model can predict the mappability with more than 98% accuracy and resource usage with a negligible error for two studied CGRAs. Besides, the proposed partitioning method demonstrates 53-75% of memory saving, 1.28-1.39x higher throughput, and better mapping quality over three comparative approaches.

  • A traffic-aware memory-cube network using bypassing

    Shikama Y., Kawano R., Matsutani H., Amano H., Nagasaka Y., Fukumoto N., Koibuchi M.

    Microprocessors and Microsystems, 90, April 2022

    ISSN 0141-9331

    Three-dimensional stack memory which provides both high-bandwidth access and large capacity is a promising technology for next-generation computer systems. While a large number of memory cubes increase the aggregate memory capacity, the communication latency and power consumption increase significantly owing to its low-radix large-diameter packet network. In this context, we propose a memory-cube network called Diagonal Memory Network (DMN). A diagonal network topology, its floor layout, and its lightweight router were designed for low-latency and low-voltage memory-read communication. DMN routing efficiently avoids deadlocks of packets, although it allows each packet transmitted to a processor to use both bypassing and original datapaths. Our evaluation results show that the DMN router decreases the use of hardware resources by more than 31% compared with a conventional virtual channel router. The DMN router reduces energy consumption by 13% and 67% to transit a packet along with the original datapath and bypassing datapath, respectively. Furthermore, using flit-level discrete event simulation, a DMN topology achieves high throughput and latency that is lower than that of existing network topologies using conventional packet routers.

  • An efficient compilation of coarse-grained reconfigurable architectures utilizing pre-optimized sub-graph mappings

    Ohwada A., Kojima T., Amano H.

    Proceedings - 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2022, pp. 1-9, 2022

    In recent years, IoT devices have become widespread, and energy-efficient coarse-grained reconfigurable architectures (CGRAs) have attracted attention. CGRAs comprise several processing units called processing elements (PEs) arranged in a two-dimensional array. The operations of PEs and the interconnections between them are adaptively changed depending on a target application, and this contributes to a higher energy efficiency compared to general-purpose processors. The application kernel executed on CGRAs is represented as a data flow graph (DFG), and CGRA compilers are responsible for mapping the DFG onto the PE array. Thus, mapping algorithms significantly influence the performance and power efficiency of CGRAs as well as the compile time. This paper proposes POCOCO, a compiler framework for CGRAs that can use pre-optimized subgraph mappings. This contributes to reducing the compiler optimization task. To leverage the subgraph mappings, we extend an existing mapping method based on a genetic algorithm. Experiments on three architectures demonstrated that the proposed method reduces the optimization time by 48%, on an average, for the best case of the three architectures.

  • Power Consumption Reduction Method and Edge Offload Server for Multiple Robots

    Natsuho S., Ohkawa T., Amano H., Sugaya M.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12990 LNCS, pp. 1-19, 2022

    ISSN 0302-9743

    Emerging services for transport and nursing that use multiple robots have become more familiar to our society. Considering the increasing demand for automatic multiple robotic services, it appears the research into automatic multiple robotic services is not satisfactory. Specifically, the issues of power consumption of these robots, and its potential reduction, have not been sufficiently discussed. In this research, we propose a method and system to reduce the aggregated power consumption of multiple robots by modelling the characteristics of the hardware and service of each robot. We firstly discuss the prediction model of the robot and improve the formula with consideration of its use in a wide range of situations. Then, we achieve the objective of reducing the aggregate power consumption of multiple robots, using consumption logs and re-allocating their tasks based on the power consumption prediction model of the individual robot. We propose the design and develop a system using a ROS (Robot Operating System) asynchronous server to collect the data from the robots, make the prediction model for each robot, and reallocate tasks based on the findings of the optimized combination on the server. Through the evaluation of the design and implementation with the proposed system and the actual robot Zoom (GR-PEACH + Raspberry Pi), we achieve an average power reduction effect of 14%. In addition, by offloading high-load processing to an edge server configured with FPGA instead of the Intel Core i7 performance computer, we achieved an increase in processing speed of up to about 70 times.

Papers, etc., Registered in KOARA (Repository)

Reviews, Commentaries, etc.

  • Message from the Organizing Committee Chair

    Amano H.

    IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings, pp. I-II, May 2019

  • Preface

    Weinhardt M., Koch D., Hochberger C., Schwarz A., Amano H., Bauer L., Cardoso J.M.P., Chow P., Hannig F., Kenter T., Koch A., Leeser M., Marino M.D., Poznanovic D., Ul-Abdin Z., Willenberg R., Ziener D.

    6th International Workshop on FPGAs for Software Programmers, FSP 2019, co-located with International Conference on Field Programmable Logic and Applications, FPL 2019, 2019

Presentations

  • Zynq Cluster for CFD Parametric Survey

    Hideharu Amano

    The International Symposium on Applied Reconfigurable Computing (ARC) (Rio de Janeiro),

    February 2016

    Oral presentation (general)

  • Randomizing Packet Memory Networks for Low-latency Processor-memory Communication

    Hideharu Amano

    The 24th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (Crete),

    February 2016

    Oral presentation (general), IEEE

  • Power Optimization considering the chip temperature of low power reconfigurable accelerator CMA-SOTB

    Hideharu Amano

    The 3rd International Symposium on Computing and Networking (CANDAR),

    December 2015

    Oral presentation (general), IEICE

  • A 297MOPS/0.4mW Ultra Low Power Coarse-grained Reconfigurable Accelerator CMA-SOTB-2

    Hideharu Amano

    The 10th International Conference on ReConFigurable Computing and FPGAs (IEEE),

    December 2015

    Oral presentation (general)

  • On-Chip Decentralized Routers with Balanced Pipelines for Avoiding Interconnect Bottleneck

    Hideharu Amano

    The 9th ACM/IEEE International Symposium on Networks-on-Chip (NOCS) (Vancouver),

    October 2015

    Oral presentation (general)

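Competitive Research Funding

  • Stacking method using chip bridges in building-block computing systems

    April 2018 - March 2021

    MEXT / Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Hideharu Amano, Grant-in-Aid for Scientific Research (B), subsidy, Principal Investigator

  • Research on building-block computing systems using inductive coupling

    May 2013 - March 2018

    MEXT / Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Hideharu Amano, Grant-in-Aid for Scientific Research (S), subsidy, Principal Investigator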

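Awards

  • IEICE Fellow

    September 2015

  • ISS Achievement Award

    May 2014, Institute of Electronics, Information and Communication Engineers (IEICE)

    Award type: Award from a domestic society, conference, or symposium

  • Paper Award

    Matsutani, Koibuchi, Amano, May 2008, Information Processing Society of Japan (IPSJ), Research on the Fat H-Tree topology for Network-on-Chip

    Award type: Award from a domestic society, conference, or symposium

  • Paper Award

    Shibata, Uno, Amano, May 2003, Institute of Electronics, Information and Communication Engineers (IEICE), Implementation of virtual hardware on DRL

    Award type: Award from a domestic society, conference, or symposium

  • IPSJ Sakai Memorial Academic Award

    Hideharu Amano, 1997, Information Processing Society of Japan (IPSJ)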

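Courses Taught

  • Seminar in Information and Computer Science

    Academic year 2023

  • Laboratories in Information and Computer Science 2B

    Academic year 2023

  • Independent Study on Science for Open and Environmental Systems

    Academic year 2023

  • Graduate Research on Science for Open and Environmental Systems 2

    Academic year 2023

  • Graduate Research on Science for Open and Environmental Systems 1

    Academic year 2023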

Social Activities

  • ASP Design Automation Conference 2000

    1998 - present

  • Japanese FPGA/PLD Conference and Exhibit

    1998 - present

  • Cool Chips 1999

    1998 - present

  • ASP Design Automation Conference 1998

    1997 - present

  • IASTED International Conference of Applied Informa

    1997 - 1998

Memberships in Academic Societies

  • IEICE Technical Committee on Computer Systems

    May 2011 - present

  • First international workshop on highly-efficient accelerators and reconfigurable technologies (HEART)

    2010 - present

  • Cool Chips

    2009 - present

  • International Symposium on Applied Reconfigurable Computing

    March 2008 - present

  • International Conference on Field Programmable Technology

    December 2007 - present

Committee Experience

  • October 2015 - present

    General Chair, IEEE/ACM International Symposium on Networks on Chip (NOCS) 2016

  • May 2015 - present

    FIT Executive Committee Chair, IEICE and IPSJ

  • May 2015 - present

    ISS Vice President, IEICE

  • May 2015 - March 2016

    National Convention Program Committee Chair, IPSJ

  • May 2011 - May 2013

    Committee Chair, IEICE Technical Committee on Computer Systems
