福島 幸宏 (フクシマ ユキヒロ)

Fukusima, Yukihiro


Affiliation (Campus)

Faculty of Letters, Department of Humanities and Social Science (Library and Information Science) (Mita)

Position

Associate Professor

 

Papers

  • Inconsistency-driven approach for human-in-the-loop entity matching

    Ito H., Koizumi T., Yoshimoto R., Fukushima Y., Harada T., Morishima A.

    Information Research 30 (iConf 2025), 1024 - 1038, 2025

    Abstract

    Introduction. Entity matching is a fundamental operation in a wide range of information management applications and a tremendous number of methods have been proposed to address the problem. Human-in-the-loop entity matching is a human-AI collaborative approach which is effective when the data for entity matching is incomplete or requires domain knowledge. A typical human-in-the-loop approach is to allow a machine-learning-based matcher to ask humans to match entities when it cannot match them with high confidence. However, ML-based matchers cannot avoid the unknown-unknown problem, i.e., they can resolve the entities incorrectly with high confidence. Method. This paper addresses an inconsistency-based method to deal with this problem. The method asks humans to resolve the entities when we find inconsistency in the transitivity property behind entity matching. For example, if a matcher returns a positive result only for two combinations among three entities, the result is inconsistent. Analysis. This paper shows an implementation of our idea in similarity-based blocking method and Bayesian inference and explains the result of an extensive set of experiments that reveals how and when the method is effective. Results. The result showed that the inconsistency-based sampling selects very different entity pairs compared to other sampling strategies and that a simple hybrid strategy performs well in many practical situations. Conclusion. The results indicate our approach complements any existing matcher that can cause the unknown-unknown problem in entity matching.

  • Sustainability of Digital Archives in Japan

    Harada T., Komura I., Fukusima Y.

    Proceedings of the 2023 Pacific Neighborhood Consortium Annual Conference and Joint Meetings Sea Change Renewal Reform and Resolve in Global Arts Sciences and Business, PNC 2023, 54 - 60, 2023

    Abstract

    Since the mid-1990s, numerous digital archives have been established in Japan. Regrettably, some have been discontinued because they became unmanageable. Moreover, budget and management system cuts have impacted the longevity of existing digital archives. This study aims to identify factors influencing the longevity of these archives by analyzing the current state of Japanese digital archives and conducting interviews with pertinent stakeholders. In our investigation into the survival of digital archives, we manually confirmed the continued web accessibility of 512 digital archives that were surveyed by the National Diet Library in 2009 and for which contact information was provided. In addition, we distributed surveys to 348 institutions responsible for maintaining digital archives. The results of our web publication status survey revealed that, of the 453 digital archives made publicly accessible online in 2009, 431 (or 95.1%) continued to be published in 2018. In addition, 38 (or 64.4%) of the 59 that were not initially made available in 2009 had begun to be published online. However, the survey responses painted a different picture. Among the 192 responding institutions, 151 (or 78.6%) reported that they continue to operate their archives, whereas 41 (or 21.4%) reported that they no longer do so. In light of these results, it is evident that a substantial number of digital archives that are no longer actively operating maintain a web presence. In addition, interviews with key figures engaged in Japanese digital archives were conducted. Results reveal a number of reasons for the discontinuation of digital archives, including budgetary and human resource concerns, and technical issues like copyright handling and adapting to changes in system and data formats. 
    In addition to these factors related to the construction and operation of digital archives, the magnitude of the external response to digital archives was identified to significantly affect their continuity. Specifically, it was suggested that the discontinuation of digital archives that receive little response from within and outside the institution could pose significant continuity issues.

  • BUBBLE : A Quality-Aware Human-in-the-loop Entity Matching Framework

    Osawa N., Ito H., Fukushima Y., Harada T., Morishima A.

    Proceedings, 2021 IEEE International Conference on Big Data, Big Data 2021, 3557 - 3565, 2021

    Abstract

    Entity matching is an issue of interest in information integration and data cleaning. Since the representations of the same entity vary, it is often impossible to fully automate the entity matching and require human inputs. However, to guarantee high-quality entity matching, how to integrate human resources into the entity matching while minimizing the cost of human resources? In this paper, we propose BUBBLE, a novel human-in-the-loop entity matching framework hybridizing Bayesian inference and crowdsourcing. To guarantee entity matching quality, Bayesian inference is conducted to determine whether the matching requires crowdsourcing. We show that we can define Bayesian error rate for this problem. For optimization, we use metric learning to select the candidate matching pairs by nearest-neighbor search in the learned embedding space, and we construct a k-nearest neighbor graph to avoid the redundant matching. We applied BUBBLE to a bibliographic data matching problem on the National Diet Library. The experimental results show that BUBBLE can assign tasks to humans with higher quality results compared to those of the same number of task assignments to humans. The result also shows that our optimization scheme is effective without sacrificing the quality.

 

Courses Taught

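  • Library Fundamentals II (図書館基礎Ⅱ)

    Academic year 2025

  • Special Studies in Information Science VIII (情報学特殊研究Ⅷ)

    Academic year 2025

  • Special Studies in Information Science VII (情報学特殊研究Ⅶ)

    Academic year 2025

  • Seminar in Library and Information Science II (図書館・情報学研究会Ⅱ)

    Academic year 2025

  • Seminar in Library and Information Science I (図書館・情報学研究会Ⅰ)

    Academic year 2025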
