Aoki, Yoshimitsu

Affiliation

Faculty of Science and Technology, Department of Electronics and Electrical Engineering (Yagami)

Position

Professor

Related Websites

Remarks

Professor

External Links

Profile Summary

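  • 1999.04 - 2001.03: Research Associate, Department of Applied Physics, Faculty of Science and Engineering, Waseda University. In the laboratory of Professor Shuji Hashimoto (橋本周司), engaged in research on facial image recognition and synthesis, precision image measurement for industrial use, and vision systems for humanoid robots.

  • 2002.04 - 2005.03: Lecturer, Department of Information Science and Engineering, Faculty of Engineering, Shibaura Institute of Technology (Aoki Laboratory established). 2005.04 - 2008.03: Associate Professor in the same department. Engaged in research on medical and dental applications of 3D image analysis of facial shape and motion, integrated use of satellite imagery and other remote sensing data, road traffic imaging systems, and high-precision image measurement systems. Note: supervised the research of about 90 students over seven years at Shibaura Institute of Technology.

  • 2008.04 - present: Associate Professor, Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University. Research on image measurement and recognition technologies targeting people, and on systems that apply them. Target application areas include security, marketing, medical care and welfare, beauty, interfaces, entertainment, and automobiles, with the aim of broad industrial application. Also engaged in research on media understanding technologies that take human cognitive mechanisms and sensibility into account and their applications, new visual sensors, and robust image features.

  • 2013.02 - present: Concurrently Director, Ideaquest Inc. (株式会社イデアクエスト). Aiming at practical use in the medical field of image sensing technologies originating from the Keio University Faculty of Science and Technology.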

Career

  • 1999.04
    -
    2002.03

    Waseda University, Faculty of Science and Engineering, Research Associate

  • 2002.04
    -
    2005.03

    Shibaura Institute of Technology, Faculty of Engineering, Department of Information Science and Engineering, Lecturer

  • 2005.04
    -
    2008.03

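    Shibaura Institute of Technology, Faculty of Engineering, Department of Information Science and Engineering, Associate Professor (title changed from 助教授 to 准教授 in 2007)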

  • 2008.04
    -
    2017.03

    Keio University, Faculty of Science and Technology, Associate Professor

  • 2013.02
    -
    2017.03

    Ideaquest Inc. (株式会社イデアクエスト), Director

Academic Background

  • 1996.03

    Waseda University, Faculty of Science and Engineering, Department of Applied Physics

    University, Graduated

  • 1998.03

    Waseda University, Graduate School, Division of Science and Engineering, Major in Pure and Applied Physics

    Graduate School, Completed, Master's course

  • 2001.02

    Waseda University, Graduate School, Division of Science and Engineering, Major in Pure and Applied Physics

    Graduate School, Completed, Doctoral course

Academic Degrees

  • Doctor of Engineering (博士(工学)), Waseda University, Coursework, 2001.02

 

Research Areas

  • Manufacturing Technology (Mechanical Engineering, Electrical and Electronic Engineering, Chemical Engineering) / Measurement engineering (Measurement Engineering)

  • Informatics / Database (Media Informatics/Data Base)

  • Informatics / Perceptual information processing (Perception Information Processing/Intelligent Robotics)

  • Life Science / Medical systems (Medical Systems)

 

Books

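  • 画像センシングのしくみと開発がしっかりわかる教科書 [A Textbook for Thoroughly Understanding the Mechanisms and Development of Image Sensing]

    青木義満, 輿水大和, et al., 技術評論社 (Gijutsu Hyoron-sha), 2023.06, Pages: 239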

  • 顔の百科事典 [Encyclopedia of the Face]

    Maruzen Publishing (丸善出版), 2015.09

    Scope: Chapter 7, Computers and the Face: Informatics of the Face

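    Hardly a day goes by without our seeing a face, so we take the face for granted, but how much do we actually know about it? "Face studies" is the field that studies the face comprehensively. Face studies is not only treated as a cultural subject in zoology, anthropology, anatomy, physiology, dentistry, psychology, and sociology; it is also connected to a truly diverse range of disciplines, including the arts such as theater and fine art, informatics in the field of computing, cosmetology, and physiognomy. This book compiles the historical, cultural, social, and scientific aspects of the face, which is inseparable from us, as an encyclopedia of medium-length entries, giving easy access to knowledge that cuts across diverse fields. Published to commemorate the 20th anniversary of the Japanese Academy of Facial Studies, it is the first encyclopedia to systematize face studies.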

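  • 三次元画像センシングの新展開 [New Developments in 3D Image Sensing]

    AOKI Yoshimitsu, NTS, 2015.05

    Scope: Chapter 5, Section 1, Development of a high-resolution 3D range sensor by fusion of color information and range data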

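  • 電気学会125年史 [125-Year History of the Institute of Electrical Engineers of Japan]

    AOKI Yoshimitsu, The Institute of Electrical Engineers of Japan (電気学会), 2013.05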

Papers

  • RetinaViT: Efficient Visual Backbone for Online Video Streams

    Suzuki T., Aoki Y.

    Sensors 24 ( 17 )  2024.09

    In online video understanding, which has a wide range of real-world applications, inference speed is crucial. Many approaches involve frame-level visual feature extraction, which often represents the biggest bottleneck. We propose RetinaViT, an efficient method for extracting frame-level visual features in an online video stream, aiming to fundamentally enhance the efficiency of online video understanding tasks. RetinaViT is composed of efficiently approximated Transformer blocks that only take changed tokens (event tokens) as queries and reuse the already processed tokens from the previous timestep for the others. Furthermore, we restrict keys and values to the spatial neighborhoods of event tokens to further improve efficiency. RetinaViT involves tuning multiple parameters, which we determine through a multi-step process. During model training, we randomly vary these parameters and then perform black-box optimization to maximize accuracy and efficiency on the pre-trained model. We conducted extensive experiments on various online video recognition tasks, including action recognition, pose estimation, and object segmentation, validating the effectiveness of each component in RetinaViT and demonstrating improvements in the speed/accuracy trade-off compared to baselines. In particular, for action recognition, RetinaViT built on ViT-B16 reduces inference time by approximately 61.9% on the CPU and 50.8% on the GPU, while achieving slight accuracy improvements rather than degradation.

  • Proto-Adapter: Efficient Training-Free CLIP-Adapter for Few-Shot Image Classification

    Kato N., Nota Y., Aoki Y.

    Sensors 24 ( 11 )  2024.06

    Large vision-language models, such as Contrastive Vision-Language Pre-training (CLIP), pre-trained on large-scale image–text datasets, have demonstrated robust zero-shot transfer capabilities across various downstream tasks. To further enhance the few-shot recognition performance of CLIP, Tip-Adapter augments the CLIP model with an adapter that incorporates a key-value cache model constructed from the few-shot training set. This approach enables training-free adaptation and has shown significant improvements in few-shot recognition, especially with additional fine-tuning. However, the size of the adapter increases in proportion to the number of training samples, making it difficult to deploy in practical applications. In this paper, we propose a novel CLIP adaptation method, named Proto-Adapter, which employs a single-layer adapter of constant size regardless of the amount of training data and even outperforms Tip-Adapter. Proto-Adapter constructs the adapter’s weights based on prototype representations for each class. By aggregating the features of the training samples, it successfully reduces the size of the adapter without compromising performance. Moreover, the performance of the model can be further enhanced by fine-tuning the adapter’s weights using a distance margin penalty, which imposes additional inter-class discrepancy to the output logits. We posit that this training scheme allows us to obtain a model with a discriminative decision boundary even when trained with a limited amount of data. We demonstrate the effectiveness of the proposed method through extensive experiments of few-shot classification on diverse datasets.

  • Event-Based Background-Oriented Schlieren

    Shiba S., Hamann F., Aoki Y., Gallego G.

    IEEE Transactions on Pattern Analysis and Machine Intelligence  46 ( 4 ) 2011 - 2026 2024.04

    Joint Work,  ISSN  01628828

    Schlieren imaging is an optical technique to observe the flow of transparent media, such as air or water, without any particle seeding. However, conventional frame-based techniques require both high spatial and temporal resolution cameras, which impose bright illumination and expensive computation limitations. Event cameras offer potential advantages (high dynamic range, high temporal resolution, and data efficiency) to overcome such limitations due to their bio-inspired sensing principle. This article presents a novel technique for perceiving air convection using events and frames by providing the first theoretical analysis that connects event data and schlieren. We formulate the problem as a variational optimization one combining the linearized event generation model with a physically-motivated parameterization that estimates the temporal derivative of the air density. The experiments with accurately aligned frame- and event camera data reveal that the proposed method enables event cameras to obtain on par results with existing frame-based optical flow techniques. Moreover, the proposed method works under dark conditions where frame-based schlieren fails, and also enables slow-motion analysis by leveraging the event camera's advantages. Our work pioneers and opens a new stack of event camera applications, as we publish the source code as well as the first schlieren dataset with high-quality frame and event data.

  • Synthetic Document Images with Diverse Shadows for Deep Shadow Removal Networks

    Matsuo Y., Aoki Y.

    Sensors 24 ( 2 )  2024.01

    ISSN  14248220

    Shadow removal for document images is an essential task for digitized document applications. Recent shadow removal models have been trained on pairs of shadow images and shadow-free images. However, obtaining a large, diverse dataset for document shadow removal takes time and effort. Thus, only small real datasets are available. Graphic renderers have been used to synthesize shadows to create relatively large datasets. However, the limited number of unique documents and the limited lighting environments adversely affect the network performance. This paper presents a large-scale, diverse dataset called the Synthetic Document with Diverse Shadows (SynDocDS) dataset. The SynDocDS comprises rendered images with diverse shadows augmented by a physics-based illumination model, which can be utilized to obtain a more robust and high-performance deep shadow removal network. In this paper, we further propose a Dual Shadow Fusion Network (DSFN). Unlike natural images, document images often have constant background colors requiring a high understanding of global color features for training a deep shadow removal network. The DSFN has a high global color comprehension and understanding of shadow regions and merges shadow attentions and features efficiently. We conduct experiments on three publicly available datasets, the OSR, Kligler’s, and Jung’s datasets, to validate our proposed method’s effectiveness. In comparison to training on existing synthetic datasets, our model training on the SynDocDS dataset achieves an enhancement in the PSNR and SSIM, increasing them from 23.00 dB to 25.70 dB and 0.959 to 0.971 on average. In addition, the experiments demonstrated that our DSFN clearly outperformed other networks across multiple metrics, including the PSNR, the SSIM, and its impact on OCR performance.

  • Improving Perceptual Loss with CLIP for Super-Resolution

    Ohtani G., Kataoka H., Aoki Y.

    Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering  90 ( 2 ) 217 - 223 2024

    ISSN  09120289

    Perceptual loss, calculated by VGG network pre-trained on ImageNet, has been widely employed in the past for super-resolution tasks, enabling the generation of photo-realistic images. However, it has been reported that grid-like artifacts frequently appear in the generated images. To address this problem, we consider that large-scale pre-trained models can make significant contributions to super-resolution across different scenes. In particular, by combining language, those models can exhibit a strong capability to comprehend complex scenes, potentially enhancing super-resolution performance. Therefore, this paper proposes new perceptual loss with Contrastive Language-Image Pre-training (CLIP) based on Vision Transformer (ViT) instead of VGG network. The results demonstrate our proposed perceptual loss can generate photorealistic images without grid-like artifacts.

Papers, etc., Registered in KOARA

Reviews, Commentaries, etc.

  • 密集領域での動作を理解するためのハイブリッド型映像解析 [Hybrid Video Analysis for Understanding Actions in Crowded Areas]

    大内一成,小林大祐,中州俊信,青木義満

    Toshiba Review (Toshiba)  72 ( 4 ) 30 - 34 2017.09

    Internal/External technical report, pre-print, etc., Joint Work

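  • 画像センシング技術によるチームスポーツ映像からのプレー解析 [Play Analysis from Team Sports Video Using Image Sensing Technology]

    林 昌希,青木 義満

    The Journal of The Institute of Image Information and Television Engineers (The Institute of Image Information and Television Engineers)  70 ( 5 ) 710 - 714 2016.09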

    Article, review, commentary, editorial, etc. (scientific journal), Joint Work

  • Image Sensing Technologies and its Applications for Human Action Recognition

    AOKI Yoshimitsu

    Journal of JSNDI (The Japanese Society for Non-Destructive Inspection)  65 ( 6 ) 254 - 260 2016.06

    Article, review, commentary, editorial, etc. (scientific journal), Single Work

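  • パターン計測技術の深化と広がる産業応用 -総論- [The Deepening of Pattern Measurement Technology and Its Expanding Industrial Applications: An Overview]

    AOKI Yoshimitsu

    Journal of The Society of Instrument and Control Engineers (SICE)  53 ( 7 ) 555 - 556 2014.07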

    Article, review, commentary, editorial, etc. (scientific journal), Single Work

Presentations

  • 不確実性を考慮したセマンティックマップの生成 [Generation of Semantic Maps Considering Uncertainty]

    竹中悠,森巧磨,谷口恭弘,青木義満

    27th Intelligent Mechatronics Workshop (第27回 知能メカトロニクスワークショップ)

    2022.09

    Oral presentation (general)

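  • 自由な表現と被写体の質感を維持するメイク生成モデルの開発 [Development of a Makeup Generation Model That Preserves Freedom of Expression and the Texture of the Subject]

    帯金駿, 田川晴菜, 中川雄介, 中村理恵, 青木義満

    27th Annual Conference of the Japanese Academy of Facial Studies (Forum Kaogaku 2022)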

    2022.09

    Oral presentation (general)

  • 重要パッチ選択に基づく効率的動画認識 [Efficient Video Recognition Based on Important Patch Selection]

    鈴木 智之, 青木 義満

    25th Meeting on Image Recognition and Understanding (MIRU2022)

    2022.07

    Poster presentation

  • 音響信号を用いた人物の3次元姿勢推定 [3D Human Pose Estimation Using Acoustic Signals]

    川島穣, 柴田優斗, 五十川麻理子, 入江豪, 木村昭悟, 青木義満

    25th Meeting on Image Recognition and Understanding (MIRU2022)

    2022.07

    Oral presentation (general)

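  • 完全合成画像での学習による文書画像の影除去 [Shadow Removal for Document Images by Training on Fully Synthetic Images]

    松尾祐飛,青木義満

    28th Symposium on Sensing via Image Information (SSII2022)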

    2022.06

    Poster presentation

Intellectual Property Rights, etc.

  • 画像処理装置,画像処理プログラムおよび画像処理方法 [Image processing device, image processing program, and image processing method]

    Application No.: 2019-105297, Date applied: 2019.06

    Joint

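  • 危険度推定装置,危険度推定方法及び危険度推定用コンピュータプログラム [Risk level estimation device, risk level estimation method, and computer program for risk level estimation]

    Application No.: Japanese Patent Application No. 2015-005241 (特願2015-005241), Date applied: 2015.01

    Patent No.: Japanese Patent No. 6418574 (特許第6418574号), Date issued: 2018.10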

    Patent, Joint

Awards

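  • HCG Symposium 2018 Special Theme Session Award

    秋月 秀一 (Keio Univ.), 大木 美加, バティスト ブロー, 鈴木 健嗣 (Univ. of Tsukuba), 青木 義満 (Keio Univ.), 2018.12, IEICE Human Communication Group, 床面プロジェクションに伴う動的な環境変化に対応する人物追跡技術 [Person tracking technology that copes with dynamic environmental changes caused by floor projection]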

    Type of Award: Award from Japanese society, conference, symposium, etc.

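  • HCG Symposium 2018 Excellent Interactive Presentation Award

    秋月 秀一 (Keio Univ.), 大木 美加, バティスト ブロー, 鈴木 健嗣 (Univ. of Tsukuba), 青木 義満 (Keio Univ.), 2018.12, IEICE Human Communication Group, 床面プロジェクションに伴う動的な環境変化に対応する人物追跡技術 [Person tracking technology that copes with dynamic environmental changes caused by floor projection]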

    Type of Award: Award from Japanese society, conference, symposium, etc.

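  • JSPE Numata Memorial Paper Award (精密工学会沼田記念論文賞)

    加藤直樹,箱崎浩平,里雄二,古山純子,田靡雅基,青木義満, 2018.03, The Japan Society for Precision Engineering, 畳み込みニューラルネットワークによる距離学習を用いた動画像人物再同定 [Video-based person re-identification using metric learning with convolutional neural networks]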

    Type of Award: Award from Japanese society, conference, symposium, etc.

  • IWAIT2018 Best Paper Award

    Ryunosuke Kurose, Masaki Hayashi, Yoshimitsu Aoki, 2018.01, IWAIT2018

    Type of Award: International academic award (Japan or overseas)

  • IES-KCIC2017 Best Paper Award

    Siti Nor Khuzaimah Amit, Yoshimitsu Aoki, 2017.09, IEEE Indonesia Section, Disaster Detection from Aerial Imagery with Convolutional Neural Network

    Type of Award: International academic award (Japan or overseas)

 

Courses Taught

  • SEMINAR IN ELECTRONICS AND INFORMATION ENGINEERING(2)

    2024

  • RECITATION IN ELECTRONICS AND INFORMATION ENGINEERING

    2024

  • LABORATORIES IN ELECTRONICS AND INFORMATION ENGINEERING(2)

    2024

  • INDEPENDENT STUDY ON INTEGRATED DESIGN ENGINEERING

    2024

  • IMAGING SCIENCE AND TECHNOLOGY

    2024

 

Social Activities

  • Computer Graphic Arts Society (画像情報教育振興協会)

    2013.07
    -
    2015.03

  • National Traffic Safety and Environment Laboratory, Independent Administrative Institution (独立行政法人 交通安全環境研究所)

    2009.12
    -
    2012.03

Memberships in Academic Societies

  • International Symposium on Optomechatronic Technologies 2013

    2013.04
    -
    2013.11

  • International Workshop on Advanced Image Technology 2013 (IWAIT2013)

    2013.01
    -
    2013.09

  • 11th International Conference on Quality Control by Artificial Vision (QCAV2013)

    2012.12
    -
    2013.05

  • 3rd International Conference on 3D Body Scanning Technologies

    2012.06
    -
    2012.10

  • SICE Technical Committee on Pattern Measurement (計測自動制御学会パターン計測部会)

    2012.04
    -
    Present

Committee Experiences

  • 2017.04
    -
    Present

    Technical Committee Member, NEDO

  • 2016.07
    -
    2016.11

    Promotion Committee Member, Optics & Photonics Japan 2016, The Optical Society of Japan

  • 2016.07
    -
    2016.12

    Program committee member, International Workshop on Human Tracking and Behavior Analysis 2016

  • 2015.09
    -
    2016.08

    Executive Committee Chair, 22nd Symposium on Sensing via Image Information, 画像センシング技術研究会 (Image Sensing Technology Research Group)

  • 2014.09
    -
    2015.08

    Executive Committee Chair, 21st Symposium on Sensing via Image Information, 画像センシング技術研究会 (Image Sensing Technology Research Group)
