Kondo, Masaaki



Faculty of Science and Technology, Department of Information and Computer Science (Yagami)





Research Areas

  • Informatics / Computer system

Research Keywords

  • Computer Architecture

  • High Performance Computing

  • Quantum Computing


Papers

  • Evaluation of Performance and Power Consumption on Supercomputer Fugaku Using SPEC HPC Benchmarks

    Kodama Y., Kondo M., Sato M.

    IEICE Transactions on Electronics  E106-C (6) 303-311 2023.06

    ISSN  09168524


    The supercomputer “Fugaku”, which ranked number one in multiple supercomputer rankings, including the TOP500 in June 2020, has various power control features: (1) an eco mode that uses only one of the two floating-point pipelines while lowering the power supply to the chip; (2) a boost mode that raises the clock frequency; and (3) a core retention feature that puts unused cores into a low-power state. By orchestrating these power-performance features according to the characteristics of the running applications, we can potentially achieve even better system-level energy efficiency. In this paper, we report on the performance and power consumption of Fugaku measured with the SPEC HPC benchmarks. We confirmed that combining boost mode and eco mode can reduce energy consumption by about 17% while improving performance by about 2% relative to normal mode.

  • An Interactive and Reductive Graph Processing Library for Edge Computing in Smart Society

    Zhou J., Kondo M.

    IEICE Transactions on Information and Systems  E106-D (3) 319-327 2023.03

    ISSN  09168532


    Because of the limitations of cloud computing in latency, bandwidth, and data confidentiality, edge computing has emerged as a novel location-aware paradigm. It provides more processing capacity to improve computing performance and quality of service (QoS) in several typical domains of human activity in a smart society, such as social networks, medical diagnosis, telecommunications, recommendation systems, internal threat detection, transportation, and the Internet of Things (IoT). These application domains often handle vast collections of entities with various relationships, which can be naturally represented by graph data structures. Graph processing is a powerful tool for modeling and optimizing complex problems that involve graph-based data. Given the relatively limited resources of portable terminals, in this paper we propose, for the first time to our knowledge, an interactive and reductive graph processing library (GPL) for edge computing in a smart society with low overhead. An experimental evaluation on different graph datasets with a variety of popular algorithms shows that the proposed GPL is more user-friendly than, and highly competitive with, established systems such as igraph, NetworKit, and NetworkX.

  • Exploiting Data Parallelism in Graph-Based Simultaneous Localization and Mapping: A Case Study with GPU Accelerations

    Zheng J., He Y., Kondo M.

    ACM International Conference Proceeding Series  126-139 2023.02


    Graph-based simultaneous localization and mapping (G-SLAM) is an intuitive SLAM implementation in which graphs represent poses, landmarks, and sensor measurements while a mobile robot builds a map of its environment and locates itself within it. G-SLAM is an important application used in many real-world scenarios, but estimating the whole environment and all trajectories by solving graph problems can incur a large amount of computation and consume a significant amount of energy. To improve both performance and energy efficiency, in this paper we identify the critical path of the G-SLAM algorithm and implement a GPU-based solution to accelerate it. In particular, we offload performance-critical components of the G-SLAM process (such as matrix inversions when updating the trajectory) to GPUs through CUDA to exploit data parallelism. With our solution, we observe a speed-up of up to 19.7x and an energy saving of up to 83.7% over a modern workstation-class x86 CPU; on a platform dedicated to edge computing (NVIDIA Jetson Nano), we achieve a speed-up of up to 2.5x and an energy saving of up to 6.4% with its integrated GPU.

  • A Scalable Body Bias Optimization Method Toward Low-Power CGRAs

    Kojima T., Okuhara H., Kondo M., Amano H.

    IEEE Micro  43 (1) 49-57 2023.01

    ISSN  02721732


    Body biasing is one of the critical techniques for realizing more energy-efficient computing with reconfigurable devices such as coarse-grained reconfigurable architectures (CGRAs). Its benefit depends on the control granularity, but fine-grained control makes it challenging to find the best body bias voltage for each domain because of the complexity of the optimization problem. This work reformulates the optimization problem and introduces continuous relaxation to solve it faster than previous work based on integer linear programming. Experimental results show that the proposed method solves the problem within 0.5 s for all benchmarks under all conditions. For a medium-sized problem, a speedup of up to 5.65x and a geometric-mean speedup of 2.06x are demonstrated compared with the previous method, with negligible loss of accuracy. In addition, we explore finer body bias control while considering the power and area overhead of an on-chip body bias generator, and we suggest that the most reasonable design saves 66% of energy consumption.

  • Memory Bandwidth Conservation for SpMV Kernels Through Adaptive Lossy Data Compression

    Hu S., Ito M., Yoshikawa T., He Y., Kondo M.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)  13798 LNCS 467-480 2023

    ISSN  03029743


    Sparse matrix-vector multiplication (SpMV) is a very common kernel in linear algebra and is widely used by today's machine learning applications. In particular, fully connected MLP layers account for many of the SpMV tasks that play a critical role in diverse services, so a large fraction of data center cycles is spent on them. Even with sparse matrix storage formats such as CSR/CSC, SpMV still suffers from limited memory bandwidth during data transfer because of the architecture of modern computing systems. However, we find that both the integer and the floating-point data used in matrix-vector multiplications are transferred as-is, without any pre-processing. We add compression and decompression between the main memory and the Last Level Cache (LLC), which can dramatically reduce memory bandwidth consumption. Furthermore, we observe that the convergence speed of some typical scientific computation benchmarks is not degraded when compressed floating-point data are used instead of the original double type. Based on these findings, we propose a simple yet effective compression approach that can be implemented in general computing architectures and, preferably, in HPC systems. With this technique, a performance improvement of 1.92x is achieved in the best case.
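The SpMV summary above combines a sparse storage format (CSR) with lossy compression of floating-point values. The Python sketch below illustrates only that general idea, in software. It stores a matrix in CSR form and "compresses" the nonzero values by rounding each 64-bit double to a 32-bit float, which halves the bytes per value. It is not the paper's adaptive scheme, which places compression and decompression between main memory and the LLC. The helper names `to_csr`, `compress_values`, and `spmv_csr` are hypothetical.

```python
from array import array

# Hypothetical illustration only: a software analogue of lossy value
# compression for SpMV, not the paper's adaptive memory-to-LLC scheme.

def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)    # nonzero value
                col_idx.append(j)   # its column index
        row_ptr.append(len(values))  # end of this row's nonzeros
    return values, col_idx, row_ptr

def compress_values(values):
    """Lossy 'compression': array('f') rounds each double to a C float
    (single precision), typically 4 bytes instead of 8 per value."""
    return array('f', values)

def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for A in CSR form (values may be float32-rounded)."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

if __name__ == "__main__":
    A = [[4.0, 0.0, 0.1], [0.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
    x = [1.0, 2.0, 3.0]
    vals, cols, ptr = to_csr(A)
    exact = spmv_csr(vals, cols, ptr, x)
    packed = compress_values(vals)
    approx = spmv_csr(packed, cols, ptr, x)
    err = max(abs(a - b) for a, b in zip(exact, approx))
    print("bytes per value: 8 ->", packed.itemsize)
    print("max abs error:", err)
```

Values that are exactly representable in single precision (such as 4.0 or 2.5) survive unchanged, while a value like 0.1 picks up a small rounding error. This is the kind of accuracy-for-bandwidth trade-off the summary says did not slow convergence in some benchmarks.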


