Amano, Hideharu


Affiliation

Faculty of Science and Technology (Mita)

Position

Professor Emeritus


Career

  • 1985.04
    -
    1989.03

Research Associate (Department of Electrical Engineering, Faculty of Science and Technology)

  • 1989.04
    -
    1994.03

Full-time Lecturer (Department of Electrical Engineering, Faculty of Science and Technology)

  • 1989.10
    -
    1990.09

Visiting Lecturer, Stanford University

  • 1994.04
    -
    1996.03

Associate Professor (Department of Electrical Engineering, Faculty of Science and Technology)

  • 1996.04
    -
    2001.03

Associate Professor (Department of Information and Computer Science, Faculty of Science and Technology)


Academic Background

  • 1981.03

Keio University, Faculty of Engineering, Department of Electrical Engineering

    University, Graduated

  • 1983.03

Keio University, Graduate School, Division of Engineering, Major in Electrical Engineering

    Graduate School, Completed, Master's course

  • 1986.03

Keio University, Graduate School, Division of Engineering, Major in Electrical Engineering

    Graduate School, Completed, Doctoral course

Academic Degrees

Doctor of Engineering, Keio University, 1986.03

 

Research Areas

  • Informatics / Theory of informatics (Computer Science)

Research Themes

Parallel Computer Architecture, Reconfigurable Systems

     


    SAN, RHiNET, Virtual Hardware, Dynamic Adaptive hardware, DRP

 

Books

  • GPU-accelerated language and communication support by FPGA

    Boku T., Hanawa T., Murai H., Nakao M., Miki Y., Amano H., Umemura M., Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project, 2018.12


Although the GPU is one of the most successfully used accelerating devices for HPC, several issues arise when it is used in large-scale parallel systems. To describe real applications on GPU-ready parallel systems, we need to combine different programming paradigms such as CUDA/OpenCL, MPI, and OpenMP for advanced platforms. In the hardware configuration, inter-GPU communication through the PCIe channel, with support from the CPU, is required, which incurs a large overhead and becomes a bottleneck for total parallel processing performance. In the project described in this chapter, we developed an FPGA-based platform to reduce the latency of inter-GPU communication, as well as a PGAS language for distributed-memory programming with accelerating devices such as GPUs. Through this work, a new approach to compensating for the hardware and software weaknesses of parallel GPU computing is provided. Moreover, FPGA technology for computation and communication acceleration is described for an astrophysical problem where GPU or CPU computation alone does not provide sufficient performance.

Principles and Structure of FPGAs (FPGAの原理と構成)

AMANO HIDEHARU, Ohmsha, 2016.04

Digital Design and Computer Architecture, ARM Edition (ディジタル回路設計とコンピュータアーキテクチャ ARM版)

AMANO HIDEHARU, SiB Access, 2016.04

  • Computer Architecture, A Quantitative Approach

J.L. Hennessy and D.A. Patterson, Shoeisha, 2014.03

  • CMOS VLSI Design

N.H.E. Weste, D.M. Harris, Maruzen Publishing, 2014.01

Scope: Chapter 10 and appendices


Papers

  • RT-libSGM: An Implementation of a Real-time Stereo Matching System on FPGA

    Wei K., Kuno Y., Arai M., Amano H.

ACM International Conference Proceeding Series, 1-9, 2022.06


Stereo depth estimation has become an attractive topic in the computer vision field. Although various algorithms strive to optimize the speed and precision of estimation, the energy cost of a system is also an essential metric for an embedded system. Among these algorithms, Semi-Global Matching (SGM) has been a popular choice for some real-world applications because of its balance between accuracy and speed. However, its power consumption makes it difficult to apply to an embedded system. Thus, we propose a robust stereo matching system, RT-libSGM, working on Xilinx Field-Programmable Gate Array (FPGA) platforms. The dedicated design of each module optimizes the speed of the entire system while ensuring the flexibility of the system structure. Through an evaluation on a Zynq FPGA board called M-KUBOS, RT-libSGM achieves state-of-the-art performance with lower power consumption. Compared with the original design (libSGM) running on the Tegra X2 GPU, RT-libSGM runs 2x faster at a lower energy cost.
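The core of SGM is a per-path cost aggregation recurrence over disparities. The sketch below is a minimal NumPy illustration of the textbook SGM recurrence for a single left-to-right scanline path, not the RT-libSGM FPGA design; the penalty values P1 and P2 are hypothetical defaults.

```python
import numpy as np

def sgm_path_cost(cost, P1=10, P2=120):
    """Aggregate matching cost along one left-to-right scanline path.

    cost: (W, D) array of per-pixel matching costs for one scanline
          (W pixels, D candidate disparities).
    Returns the (W, D) aggregated path cost L(p, d).
    """
    W, D = cost.shape
    L = np.zeros_like(cost, dtype=np.float64)
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        prev_min = prev.min()
        # Candidate transitions per disparity d: stay at d (no penalty),
        # move from d-1 or d+1 (small penalty P1), jump from any d (P2).
        up = np.roll(prev, 1);  up[0] = np.inf      # from disparity d-1
        down = np.roll(prev, -1); down[-1] = np.inf  # from disparity d+1
        best = np.minimum(np.minimum(prev, up + P1),
                          np.minimum(down + P1, prev_min + P2))
        # Subtracting prev_min keeps the values bounded (standard SGM trick).
        L[x] = cost[x] + best - prev_min
    return L
```

A full SGM implementation sums such path costs over several directions (typically 4 or 8) and takes the disparity with the minimum total; the FPGA design pipelines these aggregations in hardware.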

  • Mapping-Aware Kernel Partitioning Method for CGRAs Assisted by Deep Learning

    Kojima T., Ohwada A., Amano H.

IEEE Transactions on Parallel and Distributed Systems, 33(5), 1213-1230, 2022.05

    ISSN  10459219


Coarse-grained reconfigurable architectures (CGRAs) provide high energy efficiency with word-level programmability rather than the bit-level programmability of FPGAs. The coarser reconfigurability brings higher energy efficiency and reduces the complexity of compiler tasks compared to FPGAs. However, the application mapping process for CGRAs is still time-consuming. When the compiler tries to map a large and complicated application data flow graph (DFG) onto the reconfigurable fabric, it tends to result in inefficient resource use or to fail to map. In case of failure, the compiler must divide the DFG into several sub-DFGs and repeat the same flow. In this work, we propose a novel partitioning method based on a genetic algorithm to eliminate unmappable DFGs and improve mapping quality. To avoid generating unmappable sub-DFGs, we also propose an estimation model that predicts mappability and resource requirements using a DGCNN (Deep Graph Convolutional Neural Network). The genetic algorithm with this model can seek the most resource-efficient mapping without invoking the back-end mapping process. Our model predicts mappability with more than 98% accuracy and resource usage with negligible error for the two studied CGRAs. Moreover, the proposed partitioning method demonstrates 53-75% memory savings, 1.28-1.39x higher throughput, and better mapping quality compared with three comparative approaches.
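The idea of driving a genetic algorithm with a cheap mappability predictor instead of the full back-end mapper can be sketched in a few lines. The toy below is an assumption-laden illustration, not the paper's method: the DFG, the per-partition node limit, and the `predict_mappable` stand-in (which replaces the DGCNN surrogate with a simple resource check) are all hypothetical.

```python
import random

# Toy DFG: node -> list of successor nodes (stand-in for a real kernel DFG).
DFG = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
MAX_NODES = 3  # hypothetical per-partition resource limit of the target CGRA

def predict_mappable(part):
    """Stand-in for the learned surrogate: here, just a node-count check."""
    return len(part) <= MAX_NODES

def fitness(assign):
    """Fewer cut edges is better; predicted-unmappable partitions are penalized."""
    parts = {}
    for node, p in assign.items():
        parts.setdefault(p, set()).add(node)
    cut = sum(1 for u in DFG for v in DFG[u] if assign[u] != assign[v])
    penalty = sum(0 if predict_mappable(p) else 100 for p in parts.values())
    return -(cut + penalty)

def evolve(n_parts=2, pop_size=20, gens=50, seed=0):
    rng = random.Random(seed)
    nodes = list(DFG)
    # Each individual assigns every DFG node to one of n_parts partitions.
    pop = [{n: rng.randrange(n_parts) for n in nodes} for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            child = {n: (a if rng.random() < 0.5 else b)[n] for n in nodes}
            if rng.random() < 0.2:                # point mutation
                child[rng.choice(nodes)] = rng.randrange(n_parts)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Because the predictor is evaluated instead of the expensive place-and-route back end, every generation stays cheap; the paper's contribution is making such a predictor accurate enough (via a DGCNN) that the GA never needs the back end until the final mapping.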

  • A traffic-aware memory-cube network using bypassing

    Shikama Y., Kawano R., Matsutani H., Amano H., Nagasaka Y., Fukumoto N., Koibuchi M.

Microprocessors and Microsystems, 90, 2022.04

    ISSN  01419331


Three-dimensional stacked memory, which provides both high-bandwidth access and large capacity, is a promising technology for next-generation computer systems. While a large number of memory cubes increases the aggregate memory capacity, the communication latency and power consumption increase significantly owing to the low-radix, large-diameter packet network. In this context, we propose a memory-cube network called the Diagonal Memory Network (DMN). A diagonal network topology, its floor layout, and its lightweight router were designed for low-latency, low-voltage memory-read communication. DMN routing efficiently avoids packet deadlocks while allowing each packet transmitted to a processor to use both the bypassing and the original datapaths. Our evaluation results show that the DMN router uses more than 31% fewer hardware resources than a conventional virtual-channel router. The DMN router reduces the energy consumed to transit a packet by 13% along the original datapath and by 67% along the bypassing datapath. Furthermore, in flit-level discrete-event simulation, the DMN topology achieves higher throughput and lower latency than existing network topologies using conventional packet routers.

  • An efficient compilation of coarse-grained reconfigurable architectures utilizing pre-optimized sub-graph mappings

    Ohwada A., Kojima T., Amano H.

Proceedings - 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2022, 1-9, 2022


In recent years, IoT devices have become widespread, and energy-efficient coarse-grained reconfigurable architectures (CGRAs) have attracted attention. CGRAs comprise several processing units called processing elements (PEs) arranged in a two-dimensional array. The operations of the PEs and the interconnections between them are adaptively changed depending on the target application, which contributes to higher energy efficiency compared to general-purpose processors. The application kernel executed on a CGRA is represented as a data flow graph (DFG), and CGRA compilers are responsible for mapping the DFG onto the PE array. Thus, mapping algorithms significantly influence the performance and power efficiency of CGRAs as well as the compile time. This paper proposes POCOCO, a compiler framework for CGRAs that can use pre-optimized subgraph mappings, which reduces the compiler's optimization task. To leverage the subgraph mappings, we extend an existing mapping method based on a genetic algorithm. Experiments on three architectures demonstrated that the proposed method reduces the optimization time by 48% on average for the best case of the three architectures.

  • Power Consumption Reduction Method and Edge Offload Server for Multiple Robots

    Natsuho S., Ohkawa T., Amano H., Sugaya M.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12990 LNCS, 1-19, 2022

    ISSN  03029743


Services that use multiple robots for transport and nursing care have become more familiar in our society. Considering the increasing demand for automated multi-robot services, research into such services does not yet appear satisfactory. Specifically, the power consumption of these robots, and its potential reduction, has not been sufficiently discussed. In this research, we propose a method and system to reduce the aggregate power consumption of multiple robots by modeling the characteristics of the hardware and service of each robot. We first discuss the power prediction model of a robot and refine its formula so that it applies to a wide range of situations. We then reduce the aggregate power consumption of multiple robots by using consumption logs and reallocating their tasks based on the power consumption prediction model of each individual robot. We design and develop a system that uses a ROS (Robot Operating System) asynchronous server to collect data from the robots, build the prediction model for each robot, and reallocate tasks based on the optimized combination found on the server. Through an evaluation of the design and implementation with the proposed system and the actual robot Zoom (GR-PEACH + Raspberry Pi), we achieve an average power reduction of 14%. In addition, by offloading high-load processing to an edge server configured with an FPGA instead of an Intel Core i7 performance computer, we achieved an increase in processing speed of up to about 70 times.
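The reallocation step above, assigning tasks to robots so that predicted aggregate power is minimized, can be illustrated with a toy greedy allocator. This is a minimal sketch under assumptions, not the paper's system: the linear per-robot power model (`idle + coeff * load`), the robot names, and the greedy policy are all hypothetical stand-ins for the learned prediction models and the server-side optimizer.

```python
# Hypothetical per-robot power model: P(load) = idle + coeff * load (watts).
ROBOTS = {
    "robot_a": {"idle": 2.0, "coeff": 0.8},
    "robot_b": {"idle": 1.5, "coeff": 1.2},
}

def predict_power(robot, load):
    """Predicted power draw of one robot at a given task load."""
    m = ROBOTS[robot]
    return m["idle"] + m["coeff"] * load

def reallocate(tasks):
    """Assign each task (heaviest first) to the robot whose predicted
    power increases the least, then return the total predicted power."""
    loads = {r: 0.0 for r in ROBOTS}
    assignment = {}
    for name, load in sorted(tasks.items(), key=lambda kv: -kv[1]):
        best = min(ROBOTS, key=lambda r: predict_power(r, loads[r] + load)
                                         - predict_power(r, loads[r]))
        loads[best] += load
        assignment[name] = best
    return assignment, sum(predict_power(r, loads[r]) for r in ROBOTS)
```

With a purely linear model the greedy rule simply favors the robot with the smaller per-load coefficient; the value of logging real consumption, as in the paper, is that the fitted models become nonlinear and robot-specific, making the reallocation nontrivial.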


Papers, etc., Registered in KOARA

Reviews, Commentaries, etc.

  • Message from the Organizing Committee Chair

    Amano H.

IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings, I-II, 2019.05

  • Preface

    Weinhardt M., Koch D., Hochberger C., Schwarz A., Amano H., Bauer L., Cardoso J.M.P., Chow P., Hannig F., Kenter T., Koch A., Leeser M., Marino M.D., Poznanovic D., Ul-Abdin Z., Willenberg R., Ziener D.

6th International Workshop on FPGAs for Software Programmers, FSP 2019, co-located with International Conference on Field Programmable Logic and Applications, FPL 2019, 2019

Presentations

  • Zynq Cluster for CFD Parametric Survey

    AMANO HIDEHARU

the International Symposium on Applied Reconfigurable Computing (ARC) (Rio de Janeiro),

    2016.02

    Oral presentation (general)

  • Randomizing Packet Memory Networks for Low-latency Processor-memory Communication

    AMANO HIDEHARU

    The 24th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (Crete) , 

    2016.02

    Oral presentation (general), IEEE

  • Power Optimization considering the chip temperature of low power reconfigurable accelerator CMA-SOTB

    AMANO HIDEHARU

The 4th International Symposium on Computing and Networking (CANDAR),

    2015.12

    Oral presentation (general), IEICE

  • A 297MOPS/0.4mW Ultra Low Power Coarse-grained Reconfigurable Accelerator CMA-SOTB-2

    AMANO HIDEHARU

    The 10th International Conference on ReConFigurable Computing and FPGAs (IEEE) , 

    2015.12

    Oral presentation (general)

  • On-Chip Decentralized Routers with Balanced Pipelines for Avoiding Interconnect Bottleneck

    AMANO HIDEHARU

the 9th ACM/IEEE International Symposium on Networks-on-Chip (NOCS) (Vancouver),

    2015.10

    Oral presentation (general)


Research Projects of Competitive Funds, etc.

  • Stacking methods with chip bridges for a building block computing system

    2018.04
    -
    2021.03

MEXT/JSPS, Grant-in-Aid for Scientific Research (B), Principal investigator

  • A Study on Building-Block Computing Systems using Inductive Coupling Interconnect

    2013.05
    -
    2018.03

MEXT/JSPS, Grant-in-Aid for Scientific Research (S), Principal investigator

Awards

IEICE Fellow

    2015.09

ISS Achievement Award

2014.05, IEICE

    Type of Award: Award from Japanese society, conference, symposium, etc.

  • Best Paper Award

Matsutani, Koibuchi, Amano, 2008.05, IPSJ, A Study on the Fat H-Tree Topology for Networks-on-Chip

    Type of Award: Award from Japanese society, conference, symposium, etc.

  • Best Paper Award

Shibata, Uno, Amano, 2003.05, IEICE, Implementation of Virtual Hardware on DRL

    Type of Award: Award from Japanese society, conference, symposium, etc.

IPSJ Sakai Memorial Academic Award

AMANO Hideharu, 1997, IPSJ


 

Courses Taught

  • COMPUTER ARCHITECTURE

    2024

  • RECITATION IN INFORMATION AND COMPUTER SCIENCE

    2023

  • LABORATORIES IN INFORMATION AND COMPUTER SCIENCE 2B

    2023

  • INDEPENDENT STUDY ON SCIENCE FOR OPEN AND ENVIRONMENTAL SYSTEMS

    2023

  • GRADUATE RESEARCH ON SCIENCE FOR OPEN AND ENVIRONMENTAL SYSTEMS 2

    2023


 

Social Activities

  • ASP Design Automation Conference 2000

    1998
    -
    Present
  • Japanese FPGA/PLD Conference and Exhibit

    1998
    -
    Present
  • Cool Chips 1999

    1998
    -
    Present
  • ASP Design Automation Conference 1998

    1997
    -
    Present
IASTED International Conference on Applied Informatics

    1997
    -
    1998


Memberships in Academic Societies

IEICE Technical Committee on Computer Systems,

    2011.05
    -
    Present
  • First international workshop on highly-efficient accelerators and reconfigurable technologies (HEART), 

    2010
    -
    Present
  • Cool Chips, 

    2009
    -
    Present
  • International Symposium on Applied Reconfigurable Computing, 

    2008.03
    -
    Present
  • International Conference on Field Programmable Technology, 

    2007.12
    -
    Present


Committee Experiences

  • 2015.10
    -
    Present

    General Chair, IEEE/ACM International Symposium on Networks on Chip (NOCS) 2016

  • 2015.05
    -
    Present

FIT Executive Committee Chair, IEICE and IPSJ

  • 2015.05
    -
    Present

ISS Vice President, IEICE

  • 2015.05
    -
    2016.03

Program Committee Chair of the National Convention, IPSJ

  • 2011.05
    -
    2013.05

Technical Committee Chair, IEICE Technical Committee on Computer Systems
