ウェイ, カイジ ( ウエイ カイジ )

Wei, Kaijie

Affiliation (Campus)

Graduate School of Science and Technology (Yagami)

Position

Project Assistant Professor (fixed-term)

 

Research Areas

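  • Information and Communications / Computer Systems (Quantum communication)

  • Information and Communications / Computer Systems (Image processing)

  • Information and Communications / Computer Systems (Reconfigurable systems)

  • Information and Communications / Computer Systems (Quantum computing)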

Research Keywords

  • FPGA

  • Hardware acceleration

  • Hardware/software co-design

  • Real-time image and signal processing

  • Quantum circuit simulation

 

Papers

  • FPT-EMS: An FPGA Implementation Using NB-LDPC Code for Continuous-Variable Quantum Key Distribution

    Wei K., Garg D., Nagai R., Tomono T., Amano H.

    Proceedings of the 15th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2025), 117 - 125, May 2025

    Continuous-Variable Quantum Key Distribution (CV-QKD) is a groundbreaking technology that enables two parties, Alice and Bob, to share secret cryptographic keys with security guaranteed by the fundamental principles of quantum mechanics. The optical signal's amplitude and phase quadratures are transmitted through a quantum channel and measured by the receiver (Bob). Due to channel noise, loss, and other imperfections, error correction is necessary to ensure both parties share identical raw key bits while minimizing information leakage to potential eavesdroppers (Eve). Non-Binary Low-Density Parity-Check (NB-LDPC) codes are well-suited for CV-QKD because they achieve high reconciliation efficiency, particularly in low-SNR scenarios. However, their high computational complexity hinders further deployment in real-world applications. In this paper, we present an HLS-based decoder system, FPT-EMS (Field-Programmable T-EMS), which consists of six submodules aligned with the structure of the base design, Trellis-Based Extended Min-Sum (T-EMS). After a dedicated design of each submodule that considers algorithm properties and FPGA characteristics, we ultimately achieved a 9.36× speedup over the ARM cores of the target platform, RFSoC 4x2, at a throughput of 0.89 Mbps over one iteration.

  • Qu-Trefoil: Large-Scale Quantum Circuit Simulator Working on FPGA With SATA Storages

    Wei K., Amano H., Niwase R., Yamaguchi Y., Miyoshi T.

    IEEE Transactions on Computers 74 (4), 1306 - 1321, 2025

    ISSN 0018-9340

    Quantum circuits are fundamental components of quantum computing, and state-vector-based quantum circuit simulation is a widely used technique for tracking qubit behavior throughout circuit evolution. However, simulating a circuit with n qubits requires 2^(n+4) bytes of memory, making simulations of more than 40 qubits feasible only on supercomputers. To address this limitation, we propose Qu-Trefoil, a system designed for large-scale quantum circuit simulations on an FPGA-based platform called Trefoil. Trefoil is a multi-FPGA system connected to eight storage subsystems, each equipped with 32 SATA disks. Qu-Trefoil integrates a suite of HLS-based universal quantum gates, including Clifford gates (Hadamard (H), Pauli-Z (Z), Phase (S), Controlled-NOT (CNOT)), the T gate, and unitary matrix computation, along with HDL-designed modules for system-wide integration. Our extensive evaluation demonstrates the system's robustness and flexibility, covering quantum gate performance, chunk size, disk extensibility, and efficiency across different SATA generations. We successfully simulated quantum circuits with over 43 qubits, which required more than 128 TB of memory, in approximately 3.72 to 13.06 hours on a single storage subsystem equipped with one FPGA. This achievement represents a significant milestone in the advancement of quantum computing simulations. Furthermore, thanks to its unique architecture, Qu-Trefoil is more accessible, flexible, and cost-efficient than other existing simulators for large-scale quantum circuit simulations, making it a viable option for researchers with limited access to supercomputers.

  • A data compressor for FPGA-based state vector quantum simulators

    Wei K., Amano H., Niwase R., Yamaguchi Y.

    ACM International Conference Proceeding Series, 63 - 70, June 2024

    A quantum computer simulator is a tool that simulates the operation of a quantum computer using classical computers. Researchers widely adopt the state-vector-based simulator to reproduce the state of quantum bits (qubits) faithfully. A challenge arises because handling n qubits requires 2^(n+4) bytes of main memory. Considering power efficiency and cost, we propose implementing a quantum computer simulator named Qu-Trefoil using Serial ATA (SATA) disks directly connected to an FPGA board called Trefoil. Due to the limited data transfer speed between the SATA disks and the FPGA, improving the overall system performance is challenging. This study proposes improving the overall system throughput using data compression. Considering the data characteristics of the state vector method and the overall system structure, we verify that we can eliminate the communication bottleneck on the host side of Trefoil by employing an FPGA-based implementation of the floating-point compression algorithm ZFP. Focusing on the compression aspect of Trefoil, this paper elucidates the bottlenecks of the target system and presents an implementation and evaluation. Moreover, by utilizing a partially optimized compression IP, it is possible to achieve performance nearly four times higher than the original design, resulting in a throughput of 3.7 GB/s, comparable with the benchmark ZFP running on CPU devices. Optimizing the entire IP can potentially improve the overall simulator's performance.

  • RT-libSGM: FPGA-Oriented Real-Time Stereo Matching System with High Scalability

    Wei K., Kuno Y., Arai M., Amano H.

    IEICE Transactions on Information and Systems E106D (3), 337 - 348, March 2023

    ISSN 0916-8532

    Stereo depth estimation has become an attractive topic in the computer vision field. Although various algorithms strive to optimize the speed and the precision of estimation, the energy cost of a system is also an essential metric for an embedded system. Among these algorithms, Semi-Global Matching (SGM) has been a popular choice for some real-world applications because of its balance of accuracy and speed. However, its power consumption makes it difficult to apply to an embedded system. Thus, we propose a robust stereo matching system, RT-libSGM, running on Xilinx Field-Programmable Gate Array (FPGA) platforms. The dedicated design of each module optimizes the speed of the entire system while ensuring the flexibility of the system structure. In an evaluation on a Zynq FPGA board called M-KUBOS, RT-libSGM achieves state-of-the-art performance with lower power consumption. Compared with the benchmark design (libSGM) running on the Tegra X2 GPU, RT-libSGM runs more than 2× faster at a much lower energy cost.

  • A cost/power efficient storage system with directly connected FPGA and SATA disks

    Niwase R., Harasawa H., Yamaguchi Y., Wei K., Amano H.

    Proceedings of the 2023 16th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC 2023), 51 - 58, 2023

    Providing large storage on Multi-access Edge Computing (MEC) devices has various advantages. A large amount of raw data that cannot be transferred to the cloud without anonymization can be kept easily. Preprocessing on the MEC devices is a promising approach for producing "well-selected data" for training AI algorithms on the cloud. By distributing big data across numerous base stations, it is possible to prevent data loss during times of disaster. For this purpose, we propose the Trefoil storage subsystem, which supports cost- and power-efficient stand-alone storage by directly connecting FPGA boards and up to 32 SATA Solid-State Drives (SSDs). We introduced LiteX, an open-source FPGA system integration tool, to design the storage system. Various LiteX IP cores, including soft CPU cores (VexRiscv), memory controllers (LiteDRAM), and FPGA interconnect controllers (LiteICLink), are combined with the SATA controller (LiteSATA). We built an example evaluation system and evaluated the storage system's performance and power consumption with multiple SATA disks. Our evaluation results show that the designed SATA system can make full use of the SATA disk's maximum bandwidth: 569.1 MB/s for write operations and 523.8 MB/s for read operations. Striping data access across multiple disks shows almost no overhead for read operations, while a slight speed decrease of 4.24 MB/s per unit was observed for write operations. Only 3 W is needed for the disk unit during disk access. As an example application, a quantum computer simulation is briefly introduced.
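
The Qu-Trefoil abstracts above state that a state-vector simulation of n qubits needs 2^(n+4) bytes of memory. The following is a minimal sketch of that arithmetic, not code from the papers. It assumes each of the 2^n amplitudes is stored as a double-precision complex number (16 bytes), which is what the 2^(n+4) figure implies; the function names `state_vector_bytes` and `format_binary_size` are hypothetical.

```python
def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a full state vector: 2**n amplitudes times the amplitude size.

    The 16-byte default is an assumption (one complex number made of two
    8-byte floats); it reproduces the 2**(n+4) figure quoted in the abstracts.
    """
    if n_qubits < 0:
        raise ValueError("n_qubits must be non-negative")
    return (1 << n_qubits) * bytes_per_amplitude


def format_binary_size(num_bytes: int) -> str:
    """Render a byte count using binary units (1 KiB = 1024 B)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:g} {unit}"
        size /= 1024


if __name__ == "__main__":
    for n in (30, 40, 43):
        print(f"{n} qubits: {format_binary_size(state_vector_bytes(n))}")
```

Under these assumptions, 43 qubits correspond to 2^47 bytes, which matches the "more than 128 TB" reported for circuits with over 43 qubits; each additional qubit doubles the requirement.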
