Researcher Details - 駒水 孝裕

Updated: 2026/02/19

コマミズ タカヒロ

駒水 孝裕

KOMAMIZU Takahiro

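Affiliation

Associate Professor, Core Education Division, Center for Mathematical Science, Data Science, and Artificial Intelligence Education and Research

Graduate School Affiliation

Graduate School of Informatics

Contact

Website

http://taka-coma.pro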

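Degrees (3)

Doctor of Engineering (2015/03, University of Tsukuba)
Master of Engineering (2011/03, University of Tsukuba)
Bachelor of Information Engineering (2009/03, University of Tsukuba)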

Research Keywords (6)

Information retrieval
Linked Open Data
OLAP
Data engineering
Databases
Multimedia information processing

Research Fields (3)

Informatics / Intelligent informatics
Informatics / Databases
Informatics / Web informatics and service informatics

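Career (5)

Associate Professor, Center for Mathematical Science, Data Science, and Artificial Intelligence Education and Research, Nagoya University, 2022/03 - present
Designated Lecturer, Institutes of Innovation for Future Society, Nagoya University, 2021/04 - 2021/12
Assistant Professor, Information Technology Center, Nagoya University, 2018/02 - 2021/03
Researcher, Center for Computational Sciences, University of Tsukuba, 2015/04 - 2018/01 (Country: Japan)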

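Education (3)

Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, 2011/04 - 2015/03 (Country: Japan)
Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, 2009/04 - 2011/03 (Country: Japan)
College of Information Science, Third Cluster of Colleges, University of Tsukuba, 2005/04 - 2009/03 (Country: Japan)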

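Academic Society Memberships (8)

The Institute of Electronics, Information and Communication Engineers (IEICE), Regular Member, 2018/06 - present
The Japanese Society for Artificial Intelligence (JSAI), Regular Member, 2018/04 - present
The Association for Natural Language Processing (ANLP), Regular Member, 2018/02 - present
the American Association for Artificial Intelligence, 2016/12 - 2017/12
Association for Computing Machinery, Regular Member, 2012/05 - present
Institute of Electrical and Electronics Engineers, 2012/03 - present
Information Processing Society of Japan (IPSJ), Regular Member, 2010/06 - present
The Database Society of Japan (DBSJ), Regular Member, 2008/12 - present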

Committee Memberships (120)

IEEE BigData 2026 PC member, 2026/01 - 2026/12
IEEE MIPR 2026 Poster co-chair, 2026/01 - 2026/08
ICMR 2026 PC member, 2025/12 - 2026/06
SCSN@ICSC 2026 PC member, 2025/11 - 2026/02
SoICT 2025 PC member, 2025/09 - 2025/12
18th Forum on Data Engineering and Information Management (DEIM 2026), Executive Committee, Program Vice Chair, 2025/07 - 2026/03
IEICE Technical Committee on Data Engineering, Committee Member, 2025/06 - 2027/06
Tokai-Kansai Database Workshop 2025, Program Committee Member, 2025/06 - 2025/09
ICDM 2025 PC member, 2025/05 - 2025/12
IEEE BigData 2025 PC member, 2025/03 - 2025/12
iiWAS 2025 Organization co-chair, 2025/02 - 2025/12
MUWS@ACMMM 2025 Organizing Committee member, 2025/02 - 2025/10
IntentVC Organizing Committee member, 2025/02 - 2025/10
KJDB 2025 Organizing Co-chair, 2024/12 - 2025/12
ICDAR@ICMR 2025 Organizing Committee member, 2024/11 - 2025/07
SCSN@ICSC 2025 PC member, 2024/11 - 2025/02
SoICT 2024 PC member, 2024/09 - 2024/12
Tokai-Kansai Database Workshop 2024, Program Committee Member, 2024/09
AMLDS 2025 Conference Co-chair, 2024/07 - 2025/07
ExpertSUM@MMM 2025 Session Co-chair, 2024/03 - 2025/01
IEEE BigData 2024 PC member, 2024/02 - 2024/12
DEXA 2024 PC member, 2024/02 - 2024/08
MMM 2025 Web Co-chair, 2024/01 - 2025/01
ICDAR@ICMR 2024 Organizing Committee member, 2023/12 - 2024/06
ICMR 2024 PC member, 2023/11 - 2024/06
IW-FCV 2024 TPC member, 2023/11 - 2024/02
23rd Forum on Information Technology (FIT 2024), Committee Member in Charge of Technical Groups (ISS-DE), 2023/10 - 2024/09
SCSN@ICSC 2024 PC member, 2023/10 - 2024/02
SoICT 2023 PC member, 2023/09 - 2023/12
Tokai-Kansai Database Workshop 2023, Program Committee Member, 2023/09
IEICE Technical Committee on Data Engineering, Assistant Secretary, 2023/06 - 2025/06
IPSJ Journal / JIP Editorial Committee, Editorial Board Member, 2023/04 - 2027/05
IPSJ Transactions on Databases Editorial Committee, Editorial Board Member, 2023/04 - 2027/03
IPSJ Special Interest Group on Database Systems, Steering Committee Member, 2023/04 - 2025/03
IEEE BigData 2023 PC member, 2023/04 - 2023/12
Consortium for Strengthening Mathematics, Data Science, and AI Education, Teaching Materials Subcommittee Member, 2023/04
NarSUM@ACMMM 2023 Program Co-chair, 2023/03 - 2023/11
IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI 2023) Workshop Co-Chair, 2023/01 - 2023/08
11th IEEE International Workshop on Semantic Computing for Social Networks and Organization Sciences (SCSN@ICSC 2023) PC member, 2022/09 - 2023/02
Tokai-Kansai Database Workshop 2022, Program Committee Member, 2022/09
15th Forum on Data Engineering and Information Management (DEIM 2023), Executive Committee, Local Co-chair, 2022/04 - 2023/03
Consortium for Strengthening Mathematics, Data Science, and AI Education, Survey and Research Subcommittee Member, 2022/04 - 2023/03
1st Workshop on User-Centric Narrative Summarization of Long Videos (NarSUM@ACM MM 2022) Web and SNS Chair, PC member, 2022/04 - 2022/10
The Database Society of Japan (DBSJ) Public Relations Committee, Secretary, 2022/04
DASFAA 2024 Local Arrangement Committee member, 2022/03 - 2024/07
The Database Society of Japan (DBSJ) System Committee, Member, 2022/03 - 2022/09
10th IEEE International Workshop on Semantic Computing for Social Networks and Organization Sciences (SCSN@ICSC 2022) PC member, 2021/11 - 2022/01
Tokai-Kansai Database Workshop 2021, Program Committee Member, 2021/09
11th International Symposium on Information and Communication Technology (SoICT 2022) PC member, 2021/07 - 2022/12
IEICE Transactions on Information and Systems Associate Editor, 2021/04 - 2026/07
14th Forum on Data Engineering and Information Management (DEIM 2022), Executive Committee, Local Co-chair, 2021/04 - 2022/03
TMI Educational Video Competition Organizing Co-Chair, 2021/04 - 2021/09
TMI Educational Video Competition in collaboration with IV21 Organizing Co-Chair, 2021/04 - 2021/08
9th International Workshop on Semantic Computing for Social Networks (SCSN@ICSC 2021) PC member, 2020/11 - 2021/01
18th Workshop on Informatics (WiNF 2020), Program Committee Member, 2020/09 - 2020/11
4th International Conference on Multimedia Information Processing and Retrieval (MIPR 2021) Web and SNS Co-Chair, 2020/06 - 2021/12
3rd International Workshop on EntitY REtrieval (EYRE@CIKM 2020) PC member, 2020/04 - 2020/10
8th International Workshop on Semantic Computing for Social Networks (SCSN@ICSC 2020) PC member, 2019/09 - 2020/02
17th Workshop on Informatics (WiNF 2019), Program Committee Member, 2019/07 - 2019/11
IEICE Technical Committee on Data Engineering, Committee Member, 2019/06 - 2023/06
2nd International Workshop on EntitY REtrieval (EYRE@CIKM 2019) PC member, 2019/04 - 2019/11
12th Forum on Data Engineering and Information Management (DEIM 2020), Executive Committee, Secretary (Web and Publication), 2019/03 - 2020/05
7th International Workshop on Semantic Computing for Social Networks (SCSN@ICSC 2019) PC member, 2018/12 - 2019/02
PC member, 2018/10 - 2019/07
16th Workshop on Informatics, Executive Committee, Local Organizing Member, 2018/09 - 2018/11
11th Forum on Web and Databases (WebDB Forum 2018), Student Encouragement Award Evaluation Committee Member, 2018/09
16th Workshop on Informatics (WiNF 2018), Executive Committee Member (Treasurer), 2018/07 - 2019/03
25th Annual Meeting of the Association for Natural Language Processing (NLP 2019), Executive Committee Member, 2018/06 - 2019/03
11th Forum on Web and Databases (WebDB Forum 2018), Executive Committee, Secretary (Publication and Printing), 2018/05 - 2018/09
11th Forum on Data Engineering and Information Management (DEIM 2019), Executive Committee, Web and Publication Chair, 2018/03 - 2019/05
11th Forum on Data Engineering and Information Management (DEIM 2019), Executive Committee, Web and Publication Chair, 2018/03 - 2019/03
ICSC 2018 PC member, 2017/12 - 2018/02
10th Forum on Web and Databases, Executive Committee, Secretary (Web), 2017/09
10th Forum on Web and Databases (WebDB Forum 2017), Executive Committee, Secretary (Web), 2017/08 - 2017/09
9th Forum on Web and Databases (WebDB Forum 2016), Student Encouragement Award Evaluation Committee Member, 2016/09
DBSJ Electronic Newsletter Editorial Committee, Editorial Board Member, 2015/08 - present
The Database Society of Japan Electronic Newsletter Editorial Committee, Editorial Board Member, 2015/07 - 2022/03
13th Forum on Information Technology (FIT 2014), Student Staff Leader, 2014/09
DASFAA 2010 Student leading staff, 2010/02 - 2010/04

Awards (31)

Best Oral Award
2025/12, MMAsia 2025, Q-Adapter: Visual Query Adapter for Extracting Textually-related Features in Video Captioning
Junan Chen, Trung Thanh Nguyen, Takahiro Komamizu, Ichiro Ide
Award category: International society, conference, or symposium award; Country: Malaysia

Best Paper Award
2024/08, DEXA 2024, R-DiP: Re-ranking Based Diffusion Pre-computation for Image Retrieval
Tatsuya Kato, Takahiro Komamizu, Ichiro IDE
Award category: International society, conference, or symposium award; Country: Italy

Best Paper Award
2023/08, DEXA 2023, Towards Ensemble-Based Imbalanced Text Classification Using Metric Learning
Takahiro Komamizu
Award category: International society, conference, or symposium award; Country: Malaysia

Best Student Paper Award
2025/05, FG 2025, MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
Trung Thanh NGUYEN, Yasutomo Kawanishi, Vijay John, Takahiro Komamizu, Ichiro Ide
Award category: International society, conference, or symposium award; Country: United States of America

Best Paper Nomination
2024/10, ACM MM 2024, Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation
Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Ichiro Ide
Award category: International society, conference, or symposium award; Country: Australia

MIRU Interactive Presentation Award
2023/07, MIRU 2023, Image Generation for the Pronunciation of Unknown Words Considering the Associations of Similar-Sounding Words (類音語の連想性を考慮した未知語の発音に対する画像生成)
松平茅隼, カストナーマークアウレル, 駒水孝裕, 平山高嗣, 道満恵介, 川西康友, 井手一郎
Award category: Domestic society, conference, or symposium award; Country: Japan

Student Presentation Award
2023/03, DEIM 2023, A Data Augmentation Method with Replacement Constraints Based on Named-Entity Tags and POS Tags (固有表現タグおよびPOSタグによる交換制約付きデータ拡張手法)
寺本優香, 駒水孝裕, 波多野賢治
Award category: Domestic society, conference, or symposium award; Country: Japan

Best Paper Runner-up
2022/12, The 24th International Conference on Asia-Pacific Digital Libraries (ICADL 2022), Towards Efficient Data Access Through Multiple Relationship in Graph-Structured Digital Archives
Kazuma Kusu, Takahiro Komamizu, Kenji Hatano
Award category: International society, conference, or symposium award; Country: Vietnam

JSAI SIG Research Excellence Award (人工知能学会研究会優秀賞)
2021/06, 51st SWO SIG Meeting, Identifying Statute Entities in DBpedia for Constructing a Legislative History LOD (法令沿革LOD構築のためのDBpediaにおける法令エンティティの同定)
駒水孝裕, 小川泰弘, 外山勝彦
Award category: Domestic society, conference, or symposium award; Country: Japan

Grand Prize (最優秀賞)
2020/11, 18th Workshop on Informatics, Using EasyEnsemble for Imbalanced Data Classification in Detecting Unfair Sentences in Terms of Service (利用規約中の不公平文検出における不均衡データ分類に対する EasyEnsemble の利用)
近藤匠, 駒水孝裕, 小川泰弘, 外山勝彦
Award category: Domestic society, conference, or symposium award; Country: Japan

Online Presentation Award
2020/03, 12th Forum on Data Engineering and Information Management (DEIM 2020), Optimizing the Sampling Ratio in an Imbalanced Data Classification Framework (不均衡データ分類フレームワークにおけるサンプリング比率の最適化)
植原リサ, 駒水孝裕, 小川泰弘, 外山勝彦
Award category: Domestic society, conference, or symposium award; Country: Japan

FUJITSU Award
2019/09, 12th Forum on Web and Databases (WebDB Forum 2019), An Ensemble Framework for Imbalanced Data Based on Tuning Weak Classifiers (弱分類器の調整に基づく不均衡データ向けアンサンブル・フレームワーク)
植原リサ, 駒水孝裕, 小川泰弘, 外山勝彦
Award category: Domestic society, conference, or symposium award; Country: Japan

MicroAd Award
2019/09, 12th Forum on Web and Databases (WebDB Forum 2019), An Ensemble Framework for Imbalanced Data Based on Tuning Weak Classifiers (弱分類器の調整に基づく不均衡データ向けアンサンブル・フレームワーク)
植原リサ, 駒水孝裕, 小川泰弘, 外山勝彦
Award category: Domestic society, conference, or symposium award; Country: Japan

FRONTEO, Inc. Award
2019/09, 12th Forum on Web and Databases (WebDB Forum 2019), An Ensemble Framework for Imbalanced Data Based on Tuning Weak Classifiers (弱分類器の調整に基づく不均衡データ向けアンサンブル・フレームワーク)
植原リサ, 駒水孝裕, 小川泰弘, 外山勝彦
Award category: Domestic society, conference, or symposium award; Country: Japan

Best Paper Award
2018/12, JURIX 2018, Japanese Legal Term Correction using Random Forests
Takahiro Yamakoshi, Takahiro Komamizu, Yasuhiro Ogawa, Katsuhiko Toyama
Award category: International society, conference, or symposium award; Country: Netherlands

Outstanding Interactive Award (優秀インタラクティブ賞)
2018/03, 10th Forum on Data Engineering and Information Management (DEIM 2018), Learning Distributed Representations of Nodes and Words in Dynamic Networks Whose Nodes Carry Text Information (ノードがテキスト情報を持つ動的ネットワークにおけるノードと単語の分散表現学習)
伊藤寛祥, 駒水孝裕, 天笠俊之, 北川博之
Award category: Domestic society, conference, or symposium award; Country: Japan

Student Presentation Award
2017/03, DEIM 2017, Community Detection in Graphs Whose Nodes Have Multiple Attributes (ノードが複数の属性を持つグラフにおけるコミュニティ検出)
伊藤寛祥, 駒水孝裕, 天笠俊之, 北川博之
Award category: Domestic society, conference, or symposium award; Country: Japan

Student Encouragement Award
2017/03, IPSJ National Convention 2017, A Unified Analysis of User Behavior on GitHub and Stack Overflow (GitHubとStack Overflowにおけるユーザ行動の統一的な分析)
永野真知, 早瀬康裕, 駒水孝裕, 北川博之
Award category: Domestic society, conference, or symposium award; Country: Japan

iiWAS 2015 Best paper award
2015/12, the 17th International Conference on Information Integration and Web-based Applications & Services (iiWAS 2015)
Award category: International society, conference, or symposium award; Country: Belgium

IPSJ 73rd National Convention Student Encouragement Award
2011/03, Information Processing Society of Japan (IPSJ)
Award category: Domestic society, conference, or symposium award; Country: Japan

Yamashita SIG Research Award (山下記念研究賞)
2011/03, Information Processing Society of Japan (IPSJ)
Award category: Domestic society, conference, or symposium award; Country: Japan

2nd Forum on Data Engineering and Information Management 2010 Student Encouragement Award
2010/03, 2nd Forum on Data Engineering and Information Management
Award category: Domestic society, conference, or symposium award; Country: Japan

Papers (115)

Towards Ensemble-Based Imbalanced Text Classification Using Metric Learning

Komamizu, T

DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2023, PT II, vol. 14147, pp. 188 - 202, 2023

Publication type: Research paper (international conference proceedings); Publisher: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

This paper reports a series of ensemble approaches for imbalanced text classification. All the approaches utilize a metric learning technique for obtaining better representations of texts to train weak classifiers. Each approach deals with the class imbalance problem with an undersampling-based ensemble approach, because metric learning techniques also suffer from this problem. In this paper, four ensemble approaches (namely, MLBagging, MLBoosting, MLStacking, and MLBoostacking) are proposed, three of which correspond to ensemble frameworks (namely, bagging, boosting, and stacking), and the other is a combination of boosting and stacking. MLBagging, MLBoosting, and MLStacking train metric learners on the individual undersampled dataset and combine them, while MLBoostacking trains metric learners in a step-by-step manner; that is, a metric learner learns a feature transformation so that failed-to-classify samples in the previous step should be correctly classified. The experimental evaluation on three imbalanced text classification datasets (namely, unfair statement classification in terms of service, hate speech detection in a forum, and hate speech tweet detection) shows that the proposed approaches lift classification performance from BERT-based approaches, by improving the representations of texts through metric learning.

DOI： 10.1007/978-3-031-39821-6_15

Web of Science

Scopus

Other links: https://dblp.uni-trier.de/db/conf/dexa/dexa2023-2.html#Komamizu23
MMEnsemble: Imbalanced Classification Framework Using Metric Learning and Multi-sampling Ratio Ensemble

Komamizu, T

DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2021, PT II, vol. 12924, pp. 176 - 188, 2021

Publication type: Research paper (international conference proceedings); Publisher: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

In classification, class imbalance is a factor that degrades the classification performance of many classification methods. Resampling is one widely accepted approach to the class imbalance; however, it still suffers from an insufficient data space, which also degrades performance. To overcome this, in this paper, an undersampling-based imbalanced classification framework, MMEnsemble, is proposed that incorporates metric learning into a multi-ratio undersampling-based ensemble. This framework also overcomes a problem with determining the appropriate sampling ratio in the multi-ratio ensemble method. It was evaluated by using 12 real-world datasets. It outperformed the state-of-the-art approaches of metric learning, undersampling, and oversampling in recall and ROC-AUC, and it performed comparably with them in terms of Gmean and F-measure metrics.

DOI： 10.1007/978-3-030-86475-0_18

Web of Science

Scopus

Other links: https://dblp.uni-trier.de/db/conf/dexa/dexa2021-2.html#Komamizu21
Analysis and Prediction of Attractive Fonts on Title-Overlaid Food Images

Takagi, N; Kyutoku, NH; Doman, K; Komamizu, T; Ide, I

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, vol. E109D, no. 2, pp. 284 - 287, 2026/02

Language: English; Publisher: IEICE Transactions on Information and Systems

Title-overlaid images are useful as thumbnails on social media, where users prefer concise information to share and watch contents. Focusing on food contents, we aim to support creation of attractive title-overlaid food images to attract viewers’ attentions. This paper first analyzes the effect of font styles of the title on the attractiveness of title-overlaid images via preference experiments, and creates a dataset. Next, we design a prototype model of attractive font selection for a food image and its title. Its effectiveness is demonstrated through experiments on the created dataset.

DOI： 10.1587/transinf.2025DVL0007

Open Access

Web of Science

Scopus

CiNii Research
Hierarchical Global-Local Fusion for One-stage Open-vocabulary Temporal Action Detection

Nguyen T.T., Kawanishi Y., Komamizu T., Ide I.

ACM Transactions on Multimedia Computing Communications and Applications, vol. 22, no. 1, 2026/01

Publisher: ACM Transactions on Multimedia Computing Communications and Applications

Open-vocabulary Temporal Action Detection (Open-vocab TAD) extends the detection scope of Closed-vocabulary Temporal Action Detection (Closed-vocab TAD) to unseen action classes specified by vocabularies not included in the training data, within untrimmed video. Typical Open-vocab TAD methods adopt a two-stage approach that first proposes candidate action intervals and then identifies those actions. However, errors in the first stage can affect the subsequent stage and the final detection results. Moreover, conventional methods for temporal context analyses tend to focus solely on either global or local context. Focusing solely on the global context can lead to lack of momentary detail, making it difficult to distinguish one action from another. Conversely, focusing only on the local context makes it challenging to determine the start and end timings of action intervals. To address these challenges, we introduce a one-stage approach named Hierarchical Open-vocab TAD (HOTAD), consisting of two branches: Temporal Context Analysis (TCA) and Video-Text Alignment (VTA). The former utilizes Hierarchical Encoder (HE) to fuse global and local temporal features, enabling a comprehensive capture of temporal actions, while the latter branch exploits the synergy between visual and textual modalities for precisely detecting unseen actions in the Open-vocab setting. Experiments and in-depth analysis using the widely recognized datasets THUMOS14 and ActivityNet-1.3 are performed to show the effectiveness of HOTAD. The results highlight remarkable accuracy in detecting a wide range of unseen actions. Furthermore, HOTAD significantly reduces wrong labels and localizes action instances with high precision, showcasing its robustness in complex and dynamic video settings.

DOI： 10.1145/3773986

Open Access

Scopus
Lip Shape-Aware Word Selection for Lyric Translation

Ikeda K., Matsuhira C., Kato H., Kastner M.A., Hirayama T., Komamizu T., Ide I.

Lecture Notes in Computer Science, vol. 16175 LNCS, pp. 48 - 62, 2026

Publisher: Lecture Notes in Computer Science

The pronunciation of a language and corresponding lip shapes have a strong relationship. As such, technologies that align the lip movements of a sentence in the translated language with that in the original language can be useful for scenarios such as movie dubbing, particularly in singing scenes, where consistency between a character’s speech and their visual lip movements is desirable. In this paper, we define lip shape similarity based on the International Phonetic Alphabet (IPA) chart which is a knowledge on phonetics, and integrate it into a word selection algorithm for general machine translation. We propose an automatic lyric translation method that balances the semantics and the lip shape similarity when translating the source lyrics into a target language. For quantitative evaluation, by using professionally translated lyrics as a reference, we optimize the proposed method to best preserve both semantics and lip shape similarity. Experimental results demonstrate that the generated translations yield higher lip shape similarity than that by baseline translations.

DOI： 10.1007/978-981-95-4398-4_4

Scopus
Origami Crease Recognition for Automatic Folding Diagrams Generation

Kato H., Kato H., Hirayama T., Komamizu T., Ide I.

Lecture Notes in Computer Science, vol. 16175 LNCS, pp. 16 - 31, 2026

Publisher: Lecture Notes in Computer Science

Origami is a recreational activity that involves folding a paper to create shapes imitating various objects. To facilitate the sharing and instruction of origami creations, folding diagrams and instructional videos are widely employed to illustrate the folding procedure. However, since creating folding diagrams requires significant time and effort, we are attempting to automatically generate them from instructional videos. In this paper, we first decompose the generation process into a series of sub-problems. Then, we focus on one of the sub-problems, the crease estimation, which is the task to estimate a folding line (i.e., crease) from a pair of images taken before and after a folding operation. This paper is the first attempt to address this task using a learning-based methodology. Evaluation on a hand-crafted dataset showed that a model using a five-layer CNN as the backbone achieved the best performance, with a distance displacement of approximately 6% of the image’s side length and an angle displacement of approximately 14°. These results suggest the potential feasibility of constructing a learning-based crease estimator.

DOI： 10.1007/978-981-95-4398-4_2

Scopus
Q-Adapter: Visual Query Adapter for Extracting Textually-related Features in Video Captioning

Chen J., Nguyen T.T., Komamizu T., Ide I.

Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025, 2025/12

Publisher: Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025

Recent advances in video captioning are driven by large-scale pretrained models, which follow the standard “pre-training followed by fine-tuning” paradigm, where the full model is fine-tuned for downstream tasks. Although effective, this approach becomes computationally prohibitive as the model size increases. The Parameter-Efficient Fine-Tuning (PEFT) approach offers a promising alternative, but primarily focuses on the language components of Multimodal Large Language Models (MLLMs). Despite recent progress, PEFT remains underexplored in multimodal tasks and lacks sufficient understanding of visual information during fine-tuning the model. To bridge this gap, we propose Query-Adapter (Q-Adapter), a lightweight visual adapter module designed to enhance MLLMs by enabling efficient fine-tuning for the video captioning task. Q-Adapter introduces learnable query tokens and a gating layer into Vision Encoder, enabling effective extraction of sparse, caption-relevant features without relying on external textual supervision. We evaluate Q-Adapter on two well-known video captioning datasets, MSR-VTT and MSVD, where it achieves state-of-the-art performance among the methods that take the PEFT approach across BLEU@4, METEOR, ROUGE-L, and CIDEr metrics. Q-Adapter also achieves competitive performance compared to methods that take the full fine-tuning approach while requiring only 1.4% of the parameters. We further analyze the impact of key hyperparameters and design choices on fine-tuning effectiveness, providing insights into optimization strategies for adapter-based learning. These results highlight the strong potential of Q-Adapter in balancing caption quality and parameter efficiency, demonstrating its scalability for video-language modeling.

DOI： 10.1145/3743093.3770950

Scopus
IntentVC 2025: The ACM Multimedia Grand Challenge on Intention-Oriented Controllable Video Captioning

Komamizu T., Kastner M.A., Kawanishi Y., Nguyen T.T., Chen J.

MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, pp. 13813 - 13814, 2025/10

Publisher: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

The IntentVC Challenge, held in conjunction with ACM Multimedia 2025, introduces a novel benchmark for intention-oriented controllable video captioning. Unlike conventional captioning methods that generate generic, scene-level summaries, IntentVC focuses on intention-specific generation. Participants are required to produce captions explicitly conditioned on user-defined intentions, such as emphasizing a specific object tracked within a video. To support this task, the challenge provides an extended version of the LaSOT dataset annotated with intention-focused captions across 70 object categories. A standardized evaluation protocol and public leaderboard enable fair and reproducible comparison among submitted methods. By advancing research in personalized and adaptive video understanding, IntentVC offers a platform for exploring controllable vision-language modeling with practical relevance for accessibility, retrieval, and human-AI interaction. As a result, a total of 23 teams and 58 active participants have participated, and a total of 1,443 entries have been submitted. More information and resources are available at https://sites.google.com/view/intentvc/.

DOI： 10.1145/3746027.3762057

Scopus
MUWS 2025: The 4th International Workshop on Multimodal Human Understanding for the Web and Social Media

Hakimov S., Semedo D., Müller-Budack E., Kastner M.A., Komamizu T.

MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, pp. 14308 - 14310, 2025/10

Publisher: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Multimodal human understanding is an evolving interdisciplinary field integrating computer science, psychology, and social sciences to model human perception, behaviour, and biases in multimodal data. While recent advancements in multimodal learning excel in tasks like image-text synthesis, they often overlook nuanced human-centric dynamics, such as cultural, political, and individual influences on how modalities (e.g., text and images) interact, complement, or contradict each other. The 4th International Workshop on Multimodal Human Understanding (MUWS) aims at addressing these challenges, fostering novel solutions that explicitly model human perception, behaviour, and biases in multimodal data, with a particular emphasis on real-world challenges in web and social media analysis. This year's edition covers two tracks: (1) human-centred multimodal understanding, such as quantifying social biases, analysing sentiment and hate speech, and modelling cross-modal interactions through interdisciplinary theories (e.g., semiotics, gestalt psychology); and (2) multimodal understanding of global events, supported by a newly curated dataset covering news articles with diverse stances, which facilitates research on cultural framing, societal impact, and bias mitigation in vision-language models. The event features two keynotes from renowned experts from journalism and computer science, research presentations for six accepted papers, and interactive discussions to explore and discuss cutting-edge methodologies and applications in multimodal human understanding. The workshop proceedings can be found at: https://dl.acm.org/doi/proceedings/10.1145/3728481

DOI： 10.1145/3746027.3762109

Scopus
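A Method for Constructing Reachable-Point Indexes for Iterative Graph Traversal Using Adjacency Lists (隣接リストを用いた反復グラフ走査のための到達点索引構築手法)

楠和馬, 駒水孝裕, 波多野賢治

The Harris Science Review of Doshisha University, vol. 66, no. 3, pp. 183 - 195, 2025/10

Language: English; Publisher: Harris Science Research Institute of Doshisha University

A graph database management system (GDBMS) is a database specialized for graphs, but repeatedly traversing edges via high-degree nodes (HDNs) makes the computational cost remarkably high. In this study, we propose a new graph index that distinguishes HDNs in order to optimize these specific graph traversals. We also propose a recursive scan operation as a way to build the index efficiently. The evaluation experiments showed significant results: the index on repeated paths improved traversal performance on a huge graph by up to a factor of 1.176. Furthermore, the recursive scan operation reduced the time required for index construction by up to 64.6% compared with the baseline method.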

DOI： 10.14988/0002001369

CiNii Research
Multi-proposal collaboration and multi-task training for weakly-supervised video moment retrieval

Zhang, BL; Yang, C; Jiang, B; Komamizu, T; Ide, I

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, vol. 16, no. 7-8, pp. 4509 - 4524, 2025/08

Publisher: International Journal of Machine Learning and Cybernetics

This study focuses on weakly-supervised Video Moment Retrieval (VMR), aiming to identify a moment semantically similar to the given query within an untrimmed video using only video-level correspondences, without relying on temporal annotations during training. Previous methods either aggregate predictions for all instances in the video, or indirectly address the task by proposing reconstructions for the query. However, these methods often produce low-quality temporal proposals, struggle with distinguishing misaligned moments in the same video, or lack stability due to a reliance on a single auxiliary task. To address these limitations, we present a novel weakly-supervised method called Multi-proposal Collaboration and Multi-task Training (MCMT). Initially, we generate multiple proposals and derive corresponding learnable Gaussian masks from them. These masks are then combined to create a high-quality positive sample mask, highlighting video clips most relevant to the query. Concurrently, we classify other clips in the same video as the easy negative sample and the entire video as the hard negative sample. During training, we introduce forward and inverse masked query reconstruction tasks to impose more substantial constraints on the network, promoting more robust and stable retrieval performance. Extensive experiments on two standard benchmarks affirm the effectiveness of the proposed method in VMR.

DOI： 10.1007/s13042-024-02520-w

Web of Science

Scopus
Analysis and Prediction of Attractive Fonts on Title-overlaid Food Images

Takagi Nanami, Kyutoku Haruya, Doman Keisuke, Komamizu Takahiro, Ide Ichiro

IEICE Proceeding Series, vol. 93, p. O3-1-3, 2025/07

Language: English; Publisher: The Institute of Electronics, Information and Communication Engineers

Title-overlaid images are useful as thumbnails on social media, where users prefer concise information to share and watch contents. Focusing on food contents, we aim to support creation of attractive title-overlaid food images to attract viewers' attentions. This paper first analyzes the effect of font styles of the title on the attractiveness of title-overlaid images via preference experiments, and creates a dataset. Next, we propose an attractive font selection model for a food image and its title. Its effectiveness is demonstrated through experiments on the created dataset.

DOI： 10.34385/proc.93.o3-1-3

CiNii Research
ICDAR 25: Intelligent Cross-Data Analysis and Retrieval

Komamizu T., Kastner M.A., Dao M.S., Riegler M.A., Dang-Nguyen D.T., Tran S.

ICMR 2025 - Proceedings of the 2025 International Conference on Multimedia Retrieval, pp. 2145 - 2147, 2025/06

Publisher: ICMR 2025 - Proceedings of the 2025 International Conference on Multimedia Retrieval

The sixth edition of the Intelligent Cross-Data Analysis and Retrieval (ICDAR) workshop continues to serve as a forum for researchers and practitioners addressing the integration, analysis, and retrieval of heterogeneous data sources. While individual modalities such as wearable sensors, lifelogging cameras, and social media have been well studied, analyzing cross-data that incorporates multiple perspectives remains a crucial yet challenging task for advancing human-centered applications. In 2025, the workshop received 19 submissions, of which 7 were accepted following a careful peer-review process, resulting in an acceptance rate of 37%. The accepted papers covered a wide range of topics, including zero-shot composed image retrieval, vision-language scene understanding, adaptive modality fusion, lightweight fine-tuning with truncated SVD, and real-world federated split learning on mobile devices. By fostering interdisciplinary collaboration across domains such as well-being, disaster mitigation, mobility, food computing, and smart cities, the workshop continues to highlight emerging challenges and solutions for building intelligent, sustainable, and human-centric systems driven by cross-modal and multimodal data analytics.

DOI： 10.1145/3731715.3734511

Scopus
ICDAR'25 Workshop Chairs' Welcome Message

Komamizu T., Kastner M.A., Dao M.S., Riegler M.A., Dang-Nguyen D.T., Tran S.

ICDAR 2025 Proceedings of the 6th ACM International Conference on Intelligent Cross Data Analysis and Retrieval, June 2025

Publisher: ICDAR 2025 Proceedings of the 6th ACM International Conference on Intelligent Cross Data Analysis and Retrieval

Scopus
Feature Extraction for Claim Check-Worthiness Prediction Tasks Using LLM

Teramoto, Y; Komamizu, T; Matsushita, M; Hatano, K

INFORMATION INTEGRATION AND WEB INTELLIGENCE, IIWAS 2024, PT I, vol. 15342, pp. 53-58, 2025

Publication type: Article in a collection (book); Publisher: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

This study explores the use of Large Language Models (LLMs) for Claim Check-Worthiness Prediction (CCWP), a crucial pre-screening task in fact-checking. We predict the time between a claim’s occurrence and verification by analyzing data from fact-checking organizations. The results show that validation time is the same between the top 25% and bottom 75% of total checklist condition fulfillment claims. That is, further optimization is needed for LLMs to perform effective CCWPs.

DOI： 10.1007/978-3-031-78090-5_5

Web of Science

Scopus
Towards Visual Storytelling by Understanding Narrative Context Through Scene-Graphs

Phueaksri, I; Kastner, MA; Kawanishi, Y; Komamizu, T; Ide, I

MULTIMEDIA MODELING, MMM 2025, PT IV, vol. 15523, pp. 226-239, 2025

Publisher: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

VIsual STorytelling (VIST) is a task that transforms a sequence of images into narrative text stories. A narrative story requires an understanding of the contexts and relationships among images. Our study introduces a story generation process that emphasizes creating a coherent narrative by constructing both image and narrative contexts to control the coherence. First, the image contexts are generated from the content of individual images, using image features and scene graphs that detail the elements of the images. Second, the narrative context is generated by focusing on the overall image sequence, ensuring that each caption fits within the overall story while maintaining continuity and coherence. We also introduce a narrative concept summary, which is external knowledge represented as a knowledge graph. This summary encapsulates the narrative concept of an image sequence to enhance the understanding of its overall content. Following this, both image and narrative contexts are used to generate a coherent and engaging narrative. This framework is based on Long Short-Term Memory (LSTM) with an attention mechanism. We evaluate the proposed method using the VIST dataset, and the results highlight the importance of understanding the context of an image sequence in generating coherent and engaging stories. The study demonstrates the significance of incorporating narrative context into the generation process to ensure the coherence of the generated narrative.

DOI： 10.1007/978-981-96-2071-5_17

Web of Science

Scopus
Quantifying Image-Adjective Associations by Leveraging Large-Scale Pretrained Models

Matsuhira, C; Kastner, MA; Komamizu, T; Hirayama, T; Ide, I

MULTIMEDIA MODELING, MMM 2025, PT IV, vol. 15523, pp. 428-441, 2025

Publisher: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

Quantifying the associations between images and adjectives, i.e., how much the visual characteristics of an image are connected with a certain adjective, is important for better image understanding. For instance, the appearance of a kitten can be associated with adjectives such as “soft”, “small”, and “cute” rather than the opposite “hard”, “large”, and “scary”. Thus, giving scores for a kitten photo considering the degree of its association with each antonym adjective pair (termed adjective axis, e.g., “round” vs. “sharp”) aids in understanding the image content and its atmosphere. Existing methods rely on subjective human engagement, making it difficult to estimate the association of images with arbitrary adjective axes in a single framework. To enable the extension to arbitrary axes, we explore the use of large-scale pretrained models, including Large Language Models (LLMs) and Vision Language Models (VLMs). In the proposed training-free framework, users only need to specify a pair of antonym nouns that negatively and positively describe the target axis (e.g., “roundness” and “sharpness”). Evaluation confirms that the proposed framework can predict negative and positive associations between adjectives and images as correctly as the manually-assisted comparative. The result also highlights the pros and cons of utilizing the VLM’s textual or visual embedding for specific types of adjective axes. Furthermore, computing the similarities among four adjective axes unveils how the proposed framework connects them with each other, such as its tendency to regard a sharp object as being small, hard, and quick in motion.

DOI： 10.1007/978-981-96-2071-5_31

Web of Science

Scopus
ASC: Aggregating Sentence-Level Classifications for Multi-label Long Text Classification

Komamizu T.

Communications in Computer and Information Science, vol. 2352 CCIS, pp. 174-185, 2025

Publisher: Communications in Computer and Information Science

Classification is a fundamental task for metadata estimation in archival document management within a digital library. Although pre-trained language models (PLMs) have evolved significantly, multi-label long text classification (MLLTC) remains challenging for PLM-based text classification methods due to their input text length limitations. Existing PLM-based classifiers typically utilize a single representation for a long text. In contrast, this paper explores a sentence-level classification approach. The basic idea is two-fold: a sentence in a text can often focus on one or a few classes, meaning multiple classes can be derived from the individual sentences; furthermore, sentences can typically fit within the length limit. There are two main issues with implementing a sentence-level classifier: the loss of context for each sentence and the increased training cost due to the larger number of documents that need to be processed by a PLM-based model. To address these issues, this paper proposes a framework, ASC, that uses sentence-level n-grams to form a sentence representation and employs a sentence selection method to reduce the number of sentences needed for training. The experimental results demonstrate that ASC outperforms existing text-level classifiers, achieving 25% and 48% improvements in Macro F1 metrics.

DOI： 10.1007/978-981-96-4288-5_14

Scopus
Semantic Alignment on Action for Image Captioning

Huo D., Kastner M.A., Hirayama T., Komamizu T., Kawanishi Y., Ide I.

IEEE Access, vol. 13, pp. 199615-199629, 2025

Publisher: IEEE Access

Image captioning is a popular task in vision and language processing, which aims to generate textual descriptions for images. Previously, it simply used image and text as input with self-attention to capture global dependencies. Recent research further uses objects detected from the input image, so-called object tags, as anchor points to ease alignment between image and text with the attention mechanism. However, they only consider object information in images, while neglecting the actions and object interactions that also appear in the image, which causes actions not caught properly in image captioning. To tackle this previously underrepresented dimension of the semantic alignment, we take account of actions on the semantic level. Specifically, our work focuses on human actions and interactions, which ensures that more salient parts of the image get captioned. We introduce a new type of tag, called action tag, to anchor the action information. First, we provide a method for obtaining such action tags using an action detection model which predicts actions in the image. Next, we leverage these action tags into the captioning model. Experimental results indicate that the proposed action tags can help learn action semantics and catch the salient actions leading to perceived improvements in common performance. Experimental results on MS-COCO Karpathy test split show that the proposed model achieves good scores in BLEU-4 and CIDEr metrics, using action tags as anchors. Furthermore, the number of action tags (no more than 5) is smaller than that of object tags (commonly more than 20), which means there is a potential to reduce FLOPs by reducing the total sequence length. It indicates the potential for efficient reasoning and may be applied to daily activity scenes in the future.

DOI： 10.1109/ACCESS.2025.3631093

Open Access

Scopus
MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion

Nguyen, TT; Kawanishi, Y; John, V; Komamizu, T; Ide, I

2025 IEEE 19TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG, 2025

Publisher: 2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition, FG 2025

Multi-modal multi-view action recognition is a rapidly growing field in computer vision, offering significant potential for applications in surveillance. However, current datasets often fail to address real-world challenges such as wide-area distributed settings, asynchronous data streams, and the lack of frame-level annotations. Furthermore, existing methods face difficulties in effectively modeling inter-view relationships and enhancing spatial feature learning. In this paper, we introduce the MultiSensor-Home dataset, a novel benchmark designed for comprehensive action recognition in home environments, and also propose the Multi-modal Multi-view Transformer-based Sensor Fusion (MultiTSF) method. The proposed MultiSensor-Home dataset features untrimmed videos captured by distributed sensors, providing high-resolution RGB and audio data along with detailed multi-view frame-level action labels. The proposed MultiTSF method leverages a Transformer-based fusion mechanism to dynamically model inter-view relationships. Furthermore, the proposed method integrates a human detection module to enhance spatial feature learning, guiding the model to prioritize frames with human activity to enhance the action recognition accuracy. Experiments on the proposed MultiSensor-Home and the existing MM-Office datasets demonstrate the superiority of MultiTSF over the state-of-the-art methods. Quantitative and qualitative results highlight the effectiveness of the proposed method in advancing real-world multi-modal multi-view action recognition.

DOI： 10.1109/FG61629.2025.11099071

Web of Science

Scopus
Cross-modal recipe retrieval based on unified text encoder with fine-grained contrastive learning

Zhang, BL; Kyutoku, H; Doman, K; Komamizu, T; Ide, I; Qian, JB

KNOWLEDGE-BASED SYSTEMS, vol. 305, December 2024

Publisher: Knowledge Based Systems

Cross-modal recipe retrieval is vital for transforming visual food cues into actionable cooking guidance, making culinary creativity more accessible. Existing methods separately encode the recipe Title, Ingredient, and Instruction using different text encoders, then aggregate them to obtain recipe feature, and finally match it with encoded image feature in a joint embedding space. These methods perform well but require significant computational cost. In addition, they only consider matching the entire recipe and the image but ignore the fine-grained correspondence between recipe components and the image, resulting in insufficient cross-modal interaction. To this end, we propose Unified Text Encoder with Fine-grained Contrastive Learning (UTE-FCL) to achieve a simple but efficient model. Specifically, in each recipe, UTE-FCL first concatenates each of the Ingredient and Instruction texts composed of multiple sentences as a single text. Then, it connects these two concatenated texts with the original single-phrase Title to obtain the concatenated recipe. Finally, it encodes these three concatenated texts and the original Title by a Transformer-based Unified Text Encoder (UTE). This proposed structure greatly reduces the memory usage and improves the feature encoding efficiency. Further, we propose fine-grained contrastive learning objectives to capture the correspondence between recipe components and the image at Title, Ingredient, and Instruction levels by measuring the mutual information. Extensive experiments demonstrate the effectiveness of UTE-FCL compared to existing methods.

DOI： 10.1016/j.knosys.2024.112641

Open Access

Web of Science

Scopus
ICDAR'24 Workshop Chairs' Welcome Message

Dao M.S., Riegler M.A., Dang-Nguyen D.T., Tran H.N., Kiran R.U., Komamizu T.

ACM International Conference Proceeding Series, June 2024

Publisher: ACM International Conference Proceeding Series

Scopus
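Bird Tracking in Wide-Angle Video Considering Detection Region Narrowing and Detection History (検出領域絞込みと検出履歴を考慮した広角映像中の鳥追跡)

Tingwei Liu, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide

JSAI Type 2 SIG Technical Reports (人工知能学会第二種研究会資料), vol. 2024, no. Challenge-064, p. 07, March 2024

Language: Japanese; Publisher: The Japanese Society for Artificial Intelligence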

DOI： 10.11517/jsaisigtwo.2024.challenge-064_07

CiNii Research
Computational measurement of perceived pointiness from pronunciation (vol 83, pg 26183, 2024)

Matsuhira, C; Kastner, MA; Komamizu, T; Ide, I; Hirayama, T; Kawanishi, Y; Doman, K; Deguchi, D

MULTIMEDIA TOOLS AND APPLICATIONS, vol. 83, no. 9, pp. 26211-26212, March 2024

Publication type: Research paper (academic journal); Publisher: Multimedia Tools and Applications

The original publication of this article contains the following errors: missing ORCID of authors; incorrect author contribution statement; pronunciation symbols were not shown correctly in both online and PDF versions; the gamma symbol "Γ" was incorrectly displayed as "0" in the PDF version. The original article has been corrected.

DOI： 10.1007/s11042-023-17657-z

Open Access

Web of Science

Scopus
Computational measurement of perceived pointiness from pronunciation

Matsuhira, C; Kastner, MA; Komamizu, T; Ide, I; Hirayama, T; Kawanishi, Y; Doman, K; Deguchi, D

MULTIMEDIA TOOLS AND APPLICATIONS, vol. 83, no. 9, pp. 26183-26210, March 2024

Publication type: Research paper (academic journal); Publisher: Multimedia Tools and Applications

Sound symbolism is a well-researched topic of psycholinguistics, which tries to comprehend the connection between the sound of a word and its meanings. The Bouba-Kiki effect, one form of sound symbolism, claims that people perceive the pronunciation of “Kiki” as pointier than that of “Bouba.” There is no research that focuses on modeling such perception, i.e., how pointy a pronunciation sounds to humans, through computational and data-driven approaches. To address this, this paper first proposes the novel concept of “phonetic pointiness” defined as how pointy a shape humans are most likely to associate with a given pronunciation. We then model this phonetic pointiness from computational and data-driven approaches to calculate a score for an arbitrary pronunciation. There are three proposed models: a referential model, an expressive model, and a combined model, which integrates the previous two. The idea comes from an existing psycholinguistic classification of two types of sound symbolisms: referential symbolism and expressive symbolism, where the former relates to vocabulary knowledge, while the latter is based on pure human intuition. The proposed models are constructed only with image and language data available on the Web, therefore not requiring task-specific human annotations. We evaluate these models through a crowd-sourced user study, finding a promising correlation between human perception and the phonetic pointiness calculated by the proposed models. The results indicate that human perception can be modeled better by combining both types of sound symbolisms. Furthermore, by observing the behaviors of the models, we show several possible use-cases, such as product naming and psycholinguistic research, which can be a useful insight to further studies and applications.

DOI： 10.1007/s11042-023-15732-z

Open Access

Web of Science

Scopus
Image-Collection Summarization Using Scene-Graph Generation With External Knowledge

Phueaksri, I; Kastner, MA; Kawanishi, Y; Komamizu, T; Ide, I

IEEE ACCESS, vol. 12, pp. 17499-17512, 2024

Publication type: Research paper (academic journal); Publisher: IEEE Access

Summarization tasks aim to summarize multiple pieces of information into a short description or representative information. A text summarization task summarizes textual information into a short description, whereas an image collection summarization task summarizes an image collection into images or textual representation in which the challenge is to understand the relationship between images. In recent years, scene-graph generation has shown the advantage of describing the visual contexts of a single-image, and incorporating external knowledge into the scene-graph generation model has also given effective directions for unseen single-image scene-graph generation. While external knowledge has been implemented in related work, it is still challenging to use this information efficiently for relationship estimation during the summarization. Following this trend, in this paper, we propose a novel scene-graph-based image-collection summarization model that aims to generate a summarized scene-graph of an image collection. The key idea of the proposed method is to enhance the relation predictor toward relationships between images in an image collection incorporating knowledge graphs as external knowledge for training a model. With this approach, we build an end-to-end framework that can generate a summarized scene graph of an image collection. To evaluate the proposed method, we also build an extended annotated MS-COCO dataset for this task and introduce an evaluation process that focuses on estimating the similarity between a summarized scene graph and ground-truth scene graphs. Traditional evaluation focuses on calculating precision and recall scores, which involve true positive predictions without balancing precision and recall. Meanwhile, the proposed evaluation process focuses on calculating the F-score of the similarity between a summarized scene graph and ground-truth scene graphs, which aims to balance both false positives and false negatives. Experimental results show that using external knowledge to enhance the relation predictor achieves better results than existing methods.

DOI： 10.1109/ACCESS.2024.3360113

Open Access

Web of Science

Scopus
Zero-Shot Pill-Prescription Matching With Graph Convolutional Network and Contrastive Learning

Nguyen, TT; Nguyen, PL; Kawanishi, Y; Komamizu, T; Ide, I

IEEE ACCESS, vol. 12, pp. 55889-55904, 2024

Publisher: IEEE Access

Patients' safety is paramount in the healthcare industry, and reducing medication errors is essential for improvement. A promising solution to this problem involves the development of automated systems capable of assisting patients in verifying their pill intake mistakes. This paper investigates a Pill-Prescription matching task that seeks to associate pills in a multi-pill photo with their corresponding names in the prescription. We specifically aim to overcome the limitations of existing pill detection methods when faced with unseen pills, a situation characteristic of zero-shot learning. We propose a novel method named Zero-PIMA (Zero-shot Pill-Prescription Matching), designed to match pill images with prescription names effectively, even for pills not included in the training dataset. Zero-PIMA is an end-to-end model that includes an object localization module to determine and extract features of pill images and a graph convolutional network to capture the spatial relationship of the pills' text in the prescription. After that, we leverage the contrastive learning paradigm to increase the distance between mismatched pill images and pill name pairs while minimizing the distance between matched pairs. In addition, to deal with the zero-shot pill detection problem, we leverage pills' metadata retrieved from the DrugBank database to fine-tune a pre-trained text encoder, thereby incorporating visual information about pills (e.g., shape, color) into their names, making them more informative and ultimately enhancing the pill image-name matching accuracy. Extensive experiments are conducted on our collected real-world VAIPEPP dataset of multi-pill photos and prescriptions. Through a series of comprehensive experiments, the proposed method outperforms other methods for both seen and unseen pills in terms of mean average precision. These results indicate that the proposed method could reduce medication errors and improve patients' safety.

DOI： 10.1109/ACCESS.2024.3390153

Open Access

Web of Science

Scopus
R-DiP: Re-ranking Based Diffusion Pre-computation for Image Retrieval

Kato, T; Komamizu, T; Ide, I

DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT II, DEXA 2024, vol. 14911, pp. 233-247, 2024

Publisher: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

In image retrieval tasks, although efficient methods based on pre-computing information related to retrieval and effective methods utilizing re-ranking have been proposed, developing a method that achieves both efficiency and effectiveness at the same time remains challenging. To develop an efficient and effective image retrieval method, we propose a simple-yet-effective novel image retrieval framework: R-DiP (Re-ranking based Diffusion Pre-computation). It incorporates an effective re-ranking model into the pre-computation step of an existing efficient method, namely, Offline Diffusion, which pre-computes the diffusion process in the offline step and provides a simple linear combination-based retrieval in the online step. Experimental results on standard benchmarks show that R-DiP performs comparably to the State-Of-The-Art (SOTA) image retrieval method, namely SuperGlobal, while maintaining the efficiency of Offline Diffusion. Notably, in million-scale datasets, R-DiP improves the mAP (mean Average Precision) by about 2.0%, and reduces the speed by about 75% on average, surpassing SOTA methods. These results indicate that R-DiP is a promising solution to the efficiency-effectiveness trade-off in image retrieval that offers the flexibility to incorporate any advanced re-ranking method in the future.

DOI： 10.1007/978-3-031-68312-1_18

Web of Science

Scopus
One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features

Nguyen, TT; Kawanishi, Y; Komamizu, T; Ide, I

2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024

Publisher: 2024 IEEE 18th International Conference on Automatic Face and Gesture Recognition, FG 2024

Open-vocabulary Temporal Action Detection (Open-vocab TAD) is an advanced video analysis approach that expands Closed-vocabulary Temporal Action Detection (Closed-vocab TAD) capabilities. Closed-vocab TAD is typically confined to localizing and classifying actions based on a predefined set of categories. In contrast, Open-vocab TAD goes further and is not limited to these predefined categories. This is particularly useful in real-world scenarios where the variety of actions in videos can be vast and not always predictable. The prevalent methods in Open-vocab TAD typically employ a 2-stage approach, which involves generating action proposals and then identifying those actions. However, errors made during the first stage can adversely affect the subsequent action identification accuracy. Additionally, existing studies face challenges in handling actions of different durations owing to the use of fixed temporal processing methods. Therefore, we propose a 1-stage approach consisting of two primary modules: Multi-scale Video Analysis (MVA) and Video-Text Alignment (VTA). The MVA module captures actions at varying temporal resolutions, overcoming the challenge of detecting actions with diverse durations. The VTA module leverages the synergy between visual and textual modalities to precisely align video segments with corresponding action labels, a critical step for accurate action identification in Open-vocab scenarios. Evaluations on the widely recognized datasets THUMOS14 and ActivityNet-1.3 showed that the proposed method achieved superior results compared to the other methods in both Open-vocab and Closed-vocab settings. This serves as a strong demonstration of the effectiveness of the proposed method in the TAD task.

DOI： 10.1109/FG59268.2024.10581896

Web of Science

Scopus
Interpolating the Text-to-Image Correspondence Based on Phonetic and Phonological Similarities for Nonword-to-Image Generation

Matsuhira, C; Kastner, MA; Komamizu, T; Hirayama, T; Doman, K; Kawanishi, Y; Ide, I

IEEE ACCESS, vol. 12, pp. 41299-41316, 2024

Publication type: Research paper (academic journal); Publisher: IEEE Access

Text-to-Image (T2I) generation is the task of synthesizing images corresponding to a given text input. The recent innovations in artificial intelligence have enhanced the capacity of conventional T2I generation, yielding more and more powerful models day by day. However, their behavior is known to become unstable in the face of text inputs containing nonwords that have no definition within a language. This behavior not only results in situations where image generation does not match human expectations but also hinders these models from being utilized in psycholinguistic applications and simulations. This paper exploits the human nature of associating nonwords with their phonetically and phonologically similar words and uses it to propose a T2I generation framework robust against nonword inputs. The framework comprises a phonetics-aware language model as well as an adjusted T2I generation model. Our evaluations confirm that the proposed nonword-to-image generation synthesizes images that depict visual concepts of phonetically similar words more stably than comparative methods. We also assess how the image generation results match human expectations, showing a better agreement than the phonetics-blind baseline.

DOI： 10.1109/ACCESS.2024.3378095

Open Access

Web of Science

Scopus
ICDAR 24: Intelligent Cross-Data Analysis and Retrieval

Dao, MS; Riegler, MA; Dang-Nguyen, DT; Tran, HN; Kiran, RU; Komamizu, T

PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, pp. 1332-1333, 2024

Publisher: ICMR 2024 Proceedings of the 2024 International Conference on Multimedia Retrieval

Our workshop aims to provide a platform for both academic and industrial professionals engaged in the analysis and retrieval of cross-data from diverse perspectives, with a particular emphasis on wearable and ambient sensors, lifelog cameras, social networks, and surrounding sensors. Despite numerous studies exploring individual viewpoints, there remains a significant gap in the analysis and retrieval of cross-data to maximize benefits for humanity. Additionally, challenges such as data security and distributed learning for cross-modal model training and inference arise when dealing with large and distributed datasets. We invite researchers to contribute to this initiative, with the overarching goal of fostering the development of a smart and sustainable society through the efficient utilization of intelligent cross-data analysis and retrieval techniques.

DOI： 10.1145/3652583.3659999

Web of Science

Scopus
Do LLMs Agree with Humans on Emotional Associations to Nonsense Words?

Miyakawa Y., Matsuhira C., Kato H., Hirayama T., Komamizu T., Ide I.

CMCL 2024 13th Edition of the Workshop on Cognitive Modeling and Computational Linguistics Proceedings of the Workshop, pp. 81-85, 2024

Publisher: CMCL 2024 13th Edition of the Workshop on Cognitive Modeling and Computational Linguistics Proceedings of the Workshop

Understanding human perception of nonsense words is helpful to devise product and character names that match their characteristics. Previous studies have suggested the usefulness of Large Language Models (LLMs) for estimating such human perception, but they did not focus on its emotional aspects. Hence, this study aims to elucidate the relationship of emotions evoked by nonsense words between humans and LLMs. Using a representative LLM, GPT-4, we reproduce the procedure of an existing study to analyze evoked emotions of humans for nonsense words. A positive correlation of 0.40 was found between the emotion intensity scores reproduced by GPT-4 and those manually annotated by humans. Although the correlation is not very high, this demonstrates that GPT-4 may agree with humans on emotional associations to nonsense words. Considering that the previous study reported that the correlation among human annotators was about 0.68 on average and that between a regression model trained on the annotations for real words and humans was 0.17, GPT-4’s agreement with humans is notably strong.

DOI： 10.18653/v1/2024.cmcl-1.7

Open Access

Scopus
Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation

Matsuhira, C; Kastner, MA; Komamizu, T; Hirayama, T; Ide, I

PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2024, pp. 7307-7315, 2024

Publication type: Research paper (international conference proceedings); Publisher: MM 2024 Proceedings of the 32nd ACM International Conference on Multimedia

Text-to-image diffusion models sometimes depict blended concepts in the generated images. One promising use case of this effect would be the nonword-to-image generation task which attempts to generate images intuitively imaginable from a non-existing word (nonword). To realize nonword-to-image generation, an existing study focused on associating nonwords with similar-sounding words. Since each nonword can have multiple similar-sounding words, generating images containing their blended concepts would increase intuitiveness, facilitating creative activities and promoting computational psycholinguistics. Nevertheless, no existing study has quantitatively evaluated this effect in either diffusion models or the nonword-to-image generation paradigm. Therefore, this paper first analyzes the conceptual blending in a pretrained diffusion model, Stable Diffusion. The analysis reveals that a high percentage of generated images depict blended concepts when inputting an embedding interpolating between the text embeddings of two text prompts referring to different concepts. Next, this paper explores the best text embedding space conversion method of an existing nonword-to-image generation framework to ensure both the occurrence of conceptual blending and image generation quality. We compare the conventional direct prediction approach with the proposed method that combines k-nearest neighbor search and linear regression. Evaluation reveals that the enhanced accuracy of the embedding space conversion by the proposed method improves the image generation quality, while the emergence of conceptual blending could be attributed mainly to the specific dimensions of the high-dimensional text embedding space.

DOI： 10.1145/3664647.3681202

Web of Science

Scopus

Other links: https://dblp.uni-trier.de/db/conf/mm/mm2024.html#Matsuhira0KHI24
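The abstract of the entry above (Matsuhira et al., MM 2024) analyzes images generated from an embedding that interpolates between the text embeddings of two prompts. The following is a minimal sketch of that interpolation step only. It assumes plain linear interpolation, which the abstract does not specify, and it uses short flat lists in place of real text-encoder outputs. The function name `interpolate_embeddings` and the parameter `alpha` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def interpolate_embeddings(
    emb_a: Sequence[float], emb_b: Sequence[float], alpha: float
) -> List[float]:
    """Return (1 - alpha) * emb_a + alpha * emb_b, element by element.

    Assumption: linear interpolation. alpha = 0 reproduces emb_a and
    alpha = 1 reproduces emb_b.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(emb_a, emb_b)]


# Toy vectors standing in for the text embeddings of two prompts.
concept_a = [0.0, 2.0]
concept_b = [4.0, 6.0]
midpoint = interpolate_embeddings(concept_a, concept_b, 0.5)
```

In an actual pipeline the interpolated embedding would be passed to the diffusion model in place of an encoded prompt; that step is omitted here because it depends on the model interface.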
Lightweight Maize Disease Detection through Post-Training Quantization with Similarity Preservation

Padeiro, CV; Chen, TW; Komamizu, T; Ide, I

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, pp. 2111 - 2120, 2024

Publisher: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Traditional crop disease diagnosis, reliant on expert visual observation, is expensive, time-consuming, and prone to error. While Convolutional Neural Networks (CNNs) offer promising alternatives, their high resource demands limit their accessibility to farmers, particularly those in resource-constrained settings. Lightweight models that operate on resource-limited devices without network access are crucial to address this gap. This paper proposes a Similarity-Preserving Quantization (SPQ) method to convert high-precision CNNs into lower-precision models while maintaining similar feature representations. While quantization offers a promising approach for building lightweight CNNs for crop disease detection, the quality of quantized models often suffers. SPQ addresses this challenge by ensuring equivalent activation patterns for similar crop images in both the original and quantized models. Experimental evaluation using MobileNetV2 and ResNet-50 demonstrates that SPQ improves throughput, inference, and memory footprint more than 3 times while preserving the detection performance.

DOI： 10.1109/CVPRW63382.2024.00216

Web of Science

Scopus
L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models

Yusuke Kimura, Takahiro Komamizu, Kenji Hatano

Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U), pp. 53 - 62, 2024

Publication type: Research paper (international conference proceedings)

Publisher: Association for Computational Linguistics

DOI： 10.18653/v1/2024.customnlp4u-1.6
L3Masking: Multi-task Fine-tuning for Language Models by Leveraging Lessons Learned from Vanilla Models Open Access

Kimura Y., Komamizu T., Hatano K.

1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual, CustomNLP4U 2024, Proceedings of the Workshop, pp. 53 - 62, 2024

Publisher: 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual, CustomNLP4U 2024, Proceedings of the Workshop

When distributional differences exist between pre-training and fine-tuning data, language models (LMs) may perform poorly on downstream tasks. Recent studies have reported that multi-task learning of downstream task and masked language modeling (MLM) task during the fine-tuning phase improves the performance of the downstream task. Typical MLM tasks (e.g., random token masking (RTM)) tend not to care about tokens corresponding to the knowledge already acquired during the pre-training phase; therefore, LMs may not notice important clues or may not effectively acquire linguistic knowledge of the task or domain. To overcome this limitation, we propose a new masking strategy for MLM task, called L3Masking, that leverages lessons (specifically, token-wise likelihood in a context) learned from the vanilla language model to be fine-tuned. L3Masking actively masks tokens with low likelihood on the vanilla model. Experimental evaluations on text classification tasks in different domains confirm that a multi-task text classification method with L3Masking performed task adaptation more effectively than that with RTM. These results suggest the usefulness of assigning a preference to the tokens to be learned as the task or domain adaptation.

DOI： 10.18653/v1/2024.customnlp4u-1.6

Open Access

Scopus
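The abstract of the entry above states that L3Masking masks tokens to which the vanilla model assigns low likelihood. The following is a minimal sketch of that selection idea, not the authors' implementation. It assumes the per-token likelihoods have already been computed by some model, and it deterministically masks the lowest-likelihood tokens. The function name `mask_low_likelihood`, the `mask_ratio` parameter, and the `[MASK]` string are all hypothetical choices for illustration; the paper may select tokens differently (for example, by sampling).

```python
from typing import List, Sequence

MASK = "[MASK]"  # placeholder mask symbol (assumption)


def mask_low_likelihood(
    tokens: Sequence[str], likelihoods: Sequence[float], mask_ratio: float = 0.15
) -> List[str]:
    """Replace the tokens with the lowest likelihoods by MASK.

    likelihoods[i] is assumed to be the likelihood that a not-yet-fine-tuned
    model assigns to tokens[i] in its context.
    """
    if len(tokens) != len(likelihoods):
        raise ValueError("tokens and likelihoods must have the same length")
    if not tokens:
        return []
    # Mask at least one token so that the MLM task has a target.
    n_mask = max(1, round(len(tokens) * mask_ratio))
    # Indices sorted from least likely to most likely.
    order = sorted(range(len(tokens)), key=lambda i: likelihoods[i])
    chosen = set(order[:n_mask])
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


tokens = ["the", "patient", "has", "dyspnea"]
likelihoods = [0.9, 0.4, 0.8, 0.05]  # made-up values for illustration
masked = mask_low_likelihood(tokens, likelihoods, mask_ratio=0.25)
```

By contrast, random token masking would pick the masked positions uniformly at random, regardless of the likelihood values.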
Action Selection Learning for Multi-label Multi-view Action Recognition Open Access

Nguyen, TT; Kawanishi, Y; Komamizu, T; Ide, I

PROCEEDINGS OF THE 6TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA IN ASIA, MMASIA 2024, 2024

DOI： 10.1145/3696409.3700211

Web of Science
RecipeMeta: Metapath-enhanced Recipe Recommendation on Heterogeneous Recipe Network Open Access

Shi J., Komamizu T., Doman K., Kyutoku H., Ide I.

Proceedings of the 5th ACM International Conference on Multimedia in Asia, MMAsia 2023, December 2023

Publisher: Proceedings of the 5th ACM International Conference on Multimedia in Asia, MMAsia 2023

Recipe is a set of instructions that describes how to make food. It can help people from the preparation of ingredients, food cooking process, etc. to prepare the food, and increasingly in demand on the Web. To help users find the vast amount of recipes on the Web, we address the task of recipe recommendation. Due to multiple data types and relationships in a recipe, we can treat it as a heterogeneous network to describe its information more accurately. To effectively utilize the heterogeneous network, metapath was proposed to describe the higher-level semantic information between two entities by defining a compound path from peer entities. Therefore, we propose a metapath-enhanced recipe recommendation framework, RecipeMeta, that combines GNN (Graph Neural Network)-based representation learning and specific metapath-based information in a recipe to predict User-Recipe pairs for recommendation. Through extensive experiments, we demonstrate that the proposed model, RecipeMeta, outperforms state-of-the-art methods for recipe recommendation.

DOI： 10.1145/3595916.3626430

Open Access

Scopus
NarSUM 2023 Chairs Welcome

Kankanhalli M.S., Patras I., Liu J., Wong Y., Komamizu T.

NarSUM 2023 Proceedings of the 2nd Workshop on User-Centric Narrative Summarization of Long Videos, Co-located with MM 2023, October 2023

Publisher: NarSUM 2023 Proceedings of the 2nd Workshop on User-Centric Narrative Summarization of Long Videos, Co-located with MM 2023

Scopus
An Automatic Labeling Method for Subword-Phrase Recognition in Effective Text Classification Open Access

Kimura Y., Komamizu T., Hatano K.

Informatica Slovenia, Vol. 47 (3), pp. 315 - 326, August 2023

Publication type: Research paper (academic journal)

Publisher: Informatica Slovenia

The deep learning-based text classification methods perform better than traditional ones. In addition to the success of the deep learning technique, multi-task learning (MTL) has come to become a promising approach for text classification; for instance, an MTL approach in text classification employs named entity recognition as an auxiliary task and has showcased that the task helps to improve the text classification performance. Existing MTL-based text classification methods depend on the auxiliary tasks using supervised labels. Obtaining such supervision labels requires additional human and financial costs in addition to those for the main text classification task. To reduce these additional costs, we propose an MTL-based text classification framework on supervised label creation by automatically labeling phrases in texts for the auxiliary recognition task. A basic idea to realize the proposed framework is to utilize phrasal expressions consisting of subwords (called subword-phrases). To the best of our knowledge, no text classification approach has been designed on top of subword-phrases because subwords only sometimes express a coherent set of meanings. The novelty of the proposed framework is in adding subword-phrase recognition as an auxiliary task and utilizing subword-phrases for text classification. It extracts subword-phrases in an unsupervised manner using the statistics approach. To construct labels for effective subword-phrase recognition tasks, extracted subword-phrases are classified based on document classes to ensure that subword-phrases dedicated to some classes can be distinguishable. Experimental evaluation for text classification using five popular datasets showcased the effectiveness of the subword-phrase recognition as an auxiliary task. It also showed that comparing various labeling schemes in recent studies indicated insights for labeling common subword-phrases among several document classes.

DOI： 10.31449/inf.v47i3.4742

Open Access

Scopus
Image Impression Estimation by Clustering People with Similar Tastes

Kojima Banri, Komamizu Takahiro, Kawanishi Yasutomo, Doman Keisuke, Ide Ichiro

IEICE Proceeding Series, Vol. 78, p. P1-14, July 2023

Language: English

Publisher: The Institute of Electronics, Information and Communication Engineers

This paper proposes a method for estimating impressions received from images according to the personal attributes of users, so that they can find the desired images based on their tastes. A previous study taking into account gender and age as personal attributes showed promising results. However, it also showed that users sharing the same gender and age do not necessarily share similar tastes. Therefore, other attributes should be considered to well capture users' personal tastes. However, taking more attributes into account leads to a problem that insufficient amounts of data are served to classifiers, due to explosion of the number of combinations of attributes. To tackle this problem, we propose an aggregation-based method to condense training data for impression estimation while personal attribute information is taken into account. For evaluation, a dataset of 4,000 carpet images annotated with 24 impression words by crowd-workers was prepared, which contained 273k annotations. Experimental results showed that the use of combinations of personal attributes improved the accuracy of impression estimation. This indicates that combinations of personal attributes are helpful to estimate impressions of individual viewers to images.

DOI： 10.34385/proc.78.p1-14

CiNii Research
Towards Achieving Lightweight Deep Neural Network for Precision Agriculture with Maize Disease Detection

Padeiro Carlos-Victorino, Komamizu Takahiro, Ide Ichiro

IEICE Proceeding Series, Vol. 78, p. P1-23, July 2023

Language: English

Publisher: The Institute of Electronics, Information and Communication Engineers

Agriculture is the pillar industry of human survival. However, various crop diseases reduce the human food supply and lead to starvation and death in the worst cases. Experts perform visual symptoms observation for crop disease diagnosis. Which process is time-consuming and expensive. Also, the process has significant risk of human error due to subjective perception. Convolutional Neural Networks (CNN) use image processing techniques to show great potential in plant disease detection. However, it requires thousands of channels to learn rich features, resulting in large models requiring powerful computing, power supply, and high bandwidth, making it more expensive and difficult for farmers to acquire. Therefore, deploying these solutions on resource-constrained devices is desirable to make them more accessible. Thus, we propose a lightweight object detection CNN that can run on resource-constrained devices to detect crop diseases. Channel pruning is applied to optimize resource use by removing unimportant channels and filter weights to reduce network parameters, inference time, and the number of FLOPS. Experimental results with object detector, Faster R-CNN with two backbones, ResNet-50, and EfficientNet-B7, show significant improvement in model efficiency, keeping high accuracy.

DOI： 10.34385/proc.78.p1-23

CiNii Research
Small Object Detection for Birds with Swin Transformer

Huo Da, Kastner Marc-A., Liu Tingwei, Kawanishi Yasutomo, Hirayama Takatsugu, Komamizu Takahiro, Ide Ichiro

IEICE Proceeding Series, Vol. 78, p. TE-3, July 2023

Language: English

Publisher: The Institute of Electronics, Information and Communication Engineers

Object detection is the task of detecting objects in an image. In this task, the detection of small objects is particularly difficult. Other than the small size, it is also accompanied by difficulties due to blur, occlusion, and so on. Current small object detection methods are tailored to small and dense situations, such as pedestrians in a crowd or far objects in remote sensing scenarios. However, when the target object is small and sparse, there is a lack of objects available for training, making it more difficult to learn effective features. In this paper, we propose a specialized method for detecting a specific category of small objects; birds. Particularly, we improve the features learned by the neck; the sub-network between the backbone and the prediction head, to learn more effective features with a hierarchical design. We employ Swin Transformer to upsample the image features. Moreover, we change the shifted window size for adapting to small objects. Experiments show that the proposed Swin Transformer-based neck combined with CenterNet can lead to good performance by changing the window sizes. We further find that smaller window sizes (default 2) benefit mAPs for small object detection.

DOI： 10.34385/proc.78.te-3

CiNii Research
MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results

Kondo Yuki, Ukita Norimichi, Yamaguchi Takayuki, Hou Hao-Yu, Shen Mu-Yi, Hsu Chia-Chi, Huang En-Ming, Huang Yu-Chen, Xia Yu-Cheng, Wang Chien-Yao, Lee Chun-Yi, Huo Da, Kastner Marc-A., Liu Tingwei, Kawanishi Yasutomo, Hirayama Takatsugu, Komamizu Takahiro, Ide Ichiro, Shinya Yosuke, Liu Xinyao, Liang Guang, Yasui Syusuke

IEICE Proceeding Series, Vol. 78, p. TE-1, July 2023

Language: English

Publisher: The Institute of Electronics, Information and Communication Engineers

Small Object Detection (SOD) is an important machine vision topic because (i) a variety of real-world applications require object detection for distant objects and (ii) SOD is a challenging task due to the noisy, blurred, and less-informative image appearances of small objects. This paper proposes a new SOD dataset consisting of 39,070 images including 137,121 bird instances, which is called the Small Object Detection for Spotting Birds (SOD4SB) dataset. The detail of the challenge with the SOD4SB dataset is introduced in this paper. In total, 223 participants joined this challenge. This paper briefly introduces the award-winning methods. The dataset, the baseline code, and the website for evaluation on the public testset are publicly available.

DOI： 10.34385/proc.78.te-1

CiNii Research
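[D21] Construction of a Statute Database as Historical Information (歴史情報としての法令データベースの構築)

佐野智也, 外山勝彦, 駒水孝裕, 増田知子

Journal of the Japan Society for Digital Archive (デジタルアーカイブ学会誌), Vol. 7 (s2), pp. s142 - s145, 2023

Language: Japanese

Publisher: Japan Society for Digital Archive (デジタルアーカイブ学会)

Policies concerning national and social institutions are institutionalized through laws and regulations, so the movements of Japanese society can be captured through legal information. This study aims to establish a research infrastructure for tracing the continuous evolution of laws and regulations through enactment and amendment, and for investigating long-term changes in the governance of the Japanese state and society. As a first goal, we are building an open database system in which all laws and regulations since the Meiji era can be searched. So far, we have completed the XML encoding of the laws and imperial ordinances promulgated from Meiji 19 (1886) to Heisei 29 (2017), and have finished building a database that supports full-text search over them. This report describes the problems of existing databases and then explains the database we have built.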

DOI： 10.24506/jsda.7.s2_s142

CiNii Research
Visual Passage Score Aggregation for Image Retrieval

Komamizu T.

Proceedings 2023 IEEE 6th International Conference on Multimedia Information Processing and Retrieval, MIPR 2023, pp. 37 - 42, 2023

Publisher: Proceedings 2023 IEEE 6th International Conference on Multimedia Information Processing and Retrieval, MIPR 2023

This paper proposes an effective image retrieval method. Recent image retrieval approaches attempt to construct a single global feature, including local features of an image. In contrast, this paper proposes multiple features for each image. The basic idea is that a target object in a query image is not necessarily in a major part of a database image; therefore, its single feature may include noisy information from the surroundings of the target object. To deal with this, this paper proposes a Visual Passage Score Aggregation framework (VPSA). VPSA first decomposes an image into several pieces of images, called Visual Passages. Based on visual passages, VPSA aggregates relevance scores of visual passages for ranking. VPSA is efficient in the retrieval phase because an ordinary nearest neighbor search is used. The experiment revealed that VPSA showed superior or comparable performance to the state-of-the-art methods, and it takes a shorter time in the retrieval phase.

DOI： 10.1109/MIPR59079.2023.00021

Scopus
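The abstract of the entry above describes VPSA as splitting each database image into visual passages, retrieving passages with an ordinary nearest neighbor search, and aggregating passage relevance scores per image for ranking. The following is a minimal sketch of the aggregation step only. It assumes the nearest neighbor search has already returned (image ID, relevance score) pairs for the retrieved passages, and it assumes summation as the aggregation function, which the abstract does not specify. The function name `rank_images` and the sample IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_images(hits: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Aggregate passage-level scores into image-level scores and rank.

    Each hit is (image_id, score) for one retrieved visual passage, where
    image_id names the image the passage was cut from. Scores of passages
    from the same image are summed (an assumption for this sketch).
    """
    totals: Dict[str, float] = defaultdict(float)
    for image_id, score in hits:
        totals[image_id] += score
    # Highest total first; ties are broken by image ID for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up passage hits: two passages from img_a, one from img_b.
hits = [("img_a", 0.5), ("img_b", 0.75), ("img_a", 0.5)]
ranking = rank_images(hits)
```

With summation, an image with several moderately relevant passages can outrank an image with a single strongly relevant passage; taking the maximum instead would favor the latter.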
Towards Captioning an Image Collection from a Combined Scene Graph Representation Approach

Phueaksri, I; Kastner, MA; Kawanishi, Y; Komamizu, T; Ide, I

MULTIMEDIA MODELING, MMM 2023, PT I, Vol. 13833, pp. 178 - 190, 2023

Publication type: Paper in a proceedings volume (book)

Publisher: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

Most content summarization models from the field of natural language processing summarize the textual contents of a collection of documents or paragraphs. In contrast, summarizing the visual contents of a collection of images has not been researched to this extent. In this paper, we present a framework for summarizing the visual contents of an image collection. The key idea is to collect the scene graphs for all images in the image collection, create a combined representation, and then generate a visually summarizing caption using a scene-graph captioning model. Note that this aims to summarize common contents across all images in a single caption rather than describing each image individually. After aggregating all the scene graphs of an image collection into a single scene graph, we normalize it by using an additional concept generalization component. This component selects the common concept in each sub-graph with ConceptNet based on word embedding techniques. Lastly, we refine the captioning results by replacing a specific noun phrase with a common concept from the concept generalization component to improve the captioning results. We construct a dataset for this task based on the MS-COCO dataset using techniques from image classification and image-caption retrieval. An evaluation of the proposed method on this dataset shows promising performance.

DOI： 10.1007/978-3-031-27077-2_14

Web of Science

Scopus
Towards Achieving Lightweight Deep Neural Network for Precision Agriculture with Maize Disease Detection

Padeiro, CV; Komamizu, T; Ide, I

2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, pp. 1 - 6, 2023

Publication type: Research paper (international conference proceedings)

Publisher: Proceedings of MVA 2023 18th International Conference on Machine Vision and Applications

Agriculture is the pillar industry of human survival. However, various crop diseases reduce the human food supply and lead to starvation and death in the worst cases. Experts perform visual symptoms observation for crop disease diagnosis. Which process is time-consuming and expensive. Also, the process has significant risk of human error due to subjective perception. Convolutional Neural Networks (CNN) use image processing techniques to show great potential in plant disease detection. However, it requires thousands of channels to learn rich features, resulting in large models requiring powerful computing, power supply, and high bandwidth, making it more expensive and difficult for farmers to acquire. Therefore, deploying these solutions on resource-constrained devices is desirable to make them more accessible. Thus, we propose a lightweight object detection CNN that can run on resource-constrained devices to detect crop diseases. Channel pruning is applied to optimize resource use by removing unimportant channels and filter weights to reduce network parameters, inference time, and the number of FLOPS. Experimental results with object detector, Faster R-CNN with two backbones, ResNet-50, and EfficientNet-B7, show significant improvement in model efficiency, keeping high accuracy.

DOI： 10.23919/MVA57639.2023.10215815

Web of Science

Scopus

Other links: https://dblp.uni-trier.de/db/conf/mva/mva2023.html#PadeiroKI23
Small Object Detection for Birds with Swin Transformer Open Access

Huo, D; Kastner, MA; Liu, T; Kawanishi, Y; Hirayama, T; Komamizu, T; Ide, I

2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, pp. 1 - 5, 2023

Publication type: Research paper (international conference proceedings)

Publisher: Proceedings of MVA 2023 18th International Conference on Machine Vision and Applications

Object detection is the task of detecting objects in an image. In this task, the detection of small objects is particularly difficult. Other than the small size, it is also accompanied by difficulties due to blur, occlusion, and so on. Current small object detection methods are tailored to small and dense situations, such as pedestrians in a crowd or far objects in remote sensing scenarios. However, when the target object is small and sparse, there is a lack of objects available for training, making it more difficult to learn effective features. In this paper, we propose a specialized method for detecting a specific category of small objects; birds. Particularly, we improve the features learned by the neck; the sub-network between the backbone and the prediction head, to learn more effective features with a hierarchical design. We employ Swin Transformer to upsample the image features. Moreover, we change the shifted window size for adapting to small objects. Experiments show that the proposed Swin Transformer-based neck combined with CenterNet can lead to good performance by changing the window sizes. We further find that smaller window sizes (default 2) benefit mAPs for small object detection.

DOI： 10.23919/MVA57639.2023.10216093

Web of Science

Scopus

Other links: https://dblp.uni-trier.de/db/conf/mva/mva2023.html#HuoKLKHKI23
Nonword-to-Image Generation Considering Perceptual Association of Phonetically Similar Words

Matsuhira, C; Kastner, MA; Komamizu, T; Hirayama, T; Doman, K; Ide, I

PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON MULTIMEDIA CONTENT GENERATION AND EVALUATION, MCGE 2023, pp. 115 - 125, 2023

Publication type: Research paper (international conference proceedings)

DOI： 10.1145/3607541.3616818

Web of Science

Other links: https://dblp.uni-trier.de/db/conf/mcge/mcge2023.html#Matsuhira0KHDI23
NarSUM'23: The 2nd Workshop on User-Centric Narrative Summarization of Long Videos Open Access

Kankanhalli, M; Patras, I; Liu, JQ; Wong, YK; Komamizu, T; Yamazaki, S; Stephen, K; Kansal, K

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, pp. 9731 - 9733, 2023

Publisher: MM 2023 Proceedings of the 31st ACM International Conference on Multimedia

With video capture devices becoming widely popular, the amount of video data generated per day has seen a rapid increase over the past few years. Browsing through hours of video data to retrieve useful information is a tedious and boring task. Video Summarization technology has played a crucial role in addressing this issue. It is a well-researched topic in the multimedia community. However, the focus so far has been limited to creating summary to videos which are short (only a few minutes). This workshop aims to call for researchers on relevant background to focus on novel solutions for user-centric narrative summarization of long videos. This workshop will also cover important aspects of video summarization research like what is "important" in a video, how to evaluate the goodness of a created summary, open challenges in video summarization, etc.

DOI： 10.1145/3581783.3610946

Open Access

Web of Science

Scopus
MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results.

Yuki Kondo, Norimichi Ukita, Takayuki Yamaguchi, Hao-Yu Hou, Mu-Yi Shen, Chia-Chi Hsu, En-Ming Huang, Yu-Chen Huang, Yu-Cheng Xia, Chien-Yao Wang, Chun-Yi Lee, Da Huo, Marc A. Kastner 0001, Tingwei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide, Yosuke Shinya, Xinyao Liu, Guang Liang, Syusuke Yasui

CoRR, Vol. abs/2307.09143, 2023

Publication type: Research paper (academic journal)

DOI： 10.48550/arXiv.2307.09143
MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results

Kondo, Y; Ukita, N; Yamaguchi, T; Hou, HY; Shen, MY; Hsu, CC; Huang, EM; Huang, YC; Xia, YC; Wang, CY; Lee, CY; Da Huo; Kastner, MA; Liu, TW; Kawanishi, Y; Hirayama, T; Komamizu, T; Ide, I; Shinya, Y; Liu, XY; Liang, G; Yasui, S

2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, pp. 1 - 11, 2023

Publication type: Research paper (international conference proceedings)

Publisher: Proceedings of MVA 2023 18th International Conference on Machine Vision and Applications

Small Object Detection (SOD) is an important machine vision topic because (i) a variety of real-world applications require object detection for distant objects and (ii) SOD is a challenging task due to the noisy, blurred, and less-informative image appearances of small objects. This paper proposes a new SOD dataset consisting of 39,070 images including 137,121 bird instances, which is called the Small Object Detection for Spotting Birds (SOD4SB) dataset. The detail of the challenge with the SOD4SB dataset is introduced in this paper. In total, 223 participants joined this challenge. This paper briefly introduces the award-winning methods. The dataset, the baseline code, and the website for evaluation on the public testset are publicly available.

DOI： 10.23919/MVA57639.2023.10215935

Web of Science

Scopus

Other links: https://dblp.uni-trier.de/db/conf/mva/mva2023.html#KondoUYHSHHHXWLHKLKHKISLLY23
IPA-CLIP: Integrating Phonetic Priors into Vision and Language Pretraining.

Chihaya Matsuhira, Marc A. Kastner 0001, Takahiro Komamizu, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, Ichiro Ide

CoRR, Vol. abs/2303.03144, 2023

Publication type: Research paper (academic journal)

DOI： 10.48550/arXiv.2303.03144
Image Impression Estimation by Clustering People with Similar Tastes

Kojima, B; Komamizu, T; Kawanishi, Y; Doman, K; Ide, I

2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, pp. 1 - 5, 2023

Publication type: Research paper (international conference proceedings)

Publisher: Proceedings of MVA 2023 18th International Conference on Machine Vision and Applications

This paper proposes a method for estimating impressions from images according to the personal attributes of users so that they can find the desired images based on their tastes. Our previous work, which considered gender and age as personal attributes, showed promising results, but it also showed that users sharing these attributes do not necessarily share similar tastes. Therefore, other attributes should be considered to capture the personal tastes of each user well. However, taking more attributes into account leads to a problem in which insufficient amounts of data are served to classifiers due to the explosion of the number of combinations of attributes. To tackle this problem, we propose an aggregation-based method to condense training data for impression estimation while considering personal attribute information. For evaluation, a dataset of 4,000 carpet images annotated with 24 impression words was prepared. Experimental results showed that the use of combinations of personal attributes improved the accuracy of impression estimation, which indicates the effectiveness of the proposed approach.

DOI： 10.23919/MVA57639.2023.10216055

Web of Science

Scopus

Other links: https://dblp.uni-trier.de/db/conf/mva/mva2023.html#KojimaKKDI23
An Approach to Generate a Caption for an Image Collection Using Scene Graph Generation Open Access

Phueaksri, I; Kastner, MA; Kawanishi, Y; Komamizu, T; Ide, I

IEEE ACCESS, Vol. 11, pp. 128245 - 128260, 2023

Publication type: Research paper (academic journal)

Publisher: IEEE Access

Summarization is a challenging task that aims to generate a summary by grasping common information of a given set of information. Text summarization is a popular task of determining the topic or generating a textual summary of documents. In contrast, image summarization aims to find a representative summary of a collection of images. However, current methods are still restricted to generating a visual scene graph, tags, and noun phrases, but cannot generate a fitting textual description of an image collection. Thus, we introduce a novel framework for generating a summarized caption of an image collection. Since scene graph generation shows advancement in describing objects and their relationships on a single image, we use it in the proposed method to generate a scene graph for each image in an image collection. Then, we find common objects and their relationships from all scene graphs and represent them as a summarized scene graph. For this, we merge all scene graphs and select part of it by estimating the most common objects and relationships. Finally, the summarized scene graph is input into a captioning model. In addition, we introduce a technique to generalize specific words in the final caption into common concept words incorporating external knowledge. To evaluate the proposed method, we construct a dataset for this task by extending the annotation of the MS-COCO dataset using an image retrieval method. The evaluation of the proposed method on this dataset showed promising performance compared to text summarization-based methods.

DOI： 10.1109/ACCESS.2023.3332098

Open Access

Web of Science

Scopus
Multi-Task Learning-based Text Classification with Subword-Phrase Extraction Open Access

Kimura Y., Komamizu T., Hatano K.

ACM International Conference Proceeding Series, pp. 23 - 30, December 2022

Publication type: Research paper (international conference proceedings)

Publisher: ACM International Conference Proceeding Series

Text classification using deep learning, which is trained with a tremendous amount of text, has achieved superior performance than traditional methods. In addition to its success, multi-task learning has become a promising approach for text classification; for instance, a multi-task learning approach employs named entity recognition as an auxiliary task for text classification. The existing MTL-based text classification methods depend on auxiliary tasks using supervised labels, which require large human and/or financial efforts to create. To reduce these efforts, this paper proposes a multi-task learning-based text classification framework which reduces the additional efforts on supervised label creation. A basic idea to realize this is that to utilize phrasal expressions consisting of subwords (called subword-phrase). To the best of our knowledge, there has been no text classification approach on top of subword-phrases, because subwords do not always express a coherent set of meanings. The proposed framework is new to add subword-phrase recognition as an auxiliary task, and to utilize subword-phrases for text classification. To realize the low-cost auxiliary recognition task, the framework extracts subword-phrases in an unsupervised manner. The experimental evaluation of the five popular datasets for text classification showcases the effectiveness of the involvement of the subword-phrase recognition as an auxiliary task. It also shows comparative results with the state-of-the-art method.

DOI： 10.1145/3568562.3568635

Open Access

Scopus
Detection of Birds in a 3D Environment Referring to Audio-Visual Information

Kawanishi, Y; Ide, I; Chu, B; Matsuhira, C; Kastner, MA; Komamizu, T; Deguchi, D

2022 18TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS 2022), pp. 1 - 7, 2022

Publication type: Research paper (international conference proceedings)

Publisher: AVSS 2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance

We propose a method to detect birds in a 3D environment referring to both audio information observed from a microphone array and visual information observed from a panorama camera. In general, in panorama images, birds appear relatively too small to be detected accurately even with the state-of-the-art deep learning models. Thus, the proposed method takes a two step approach where the birds are first roughly located referring to audio information by Sound Source Localization (SSL), and then image detection is applied within its vicinity. Through evaluation on a dataset annotated with bounding boxes surrounding the birds, we show that the proposed method improves detection performance of birds that appear in relatively small sizes in the image, in both accuracy and processing speed.

DOI： 10.1109/AVSS56176.2022.9959510

Web of Science

Scopus

その他リンク： https://dblp.uni-trier.de/db/conf/avss/avss2022.html#KawanishiICMKKD22
Intuitive Gait Modeling using Mimetic-Words for Gait Description and Generation

Kato, H; Hirayama, T; Doman, K; Ide, I; Kawanishi, Y; Komamizu, T; Deguchi, D; Murase, H

2022 IEEE 5TH INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL, MIPR 頁： 240 - 245 2022年

　詳細を見る

掲載種別：研究論文（国際会議プロシーディングス）出版者・発行元：Proceedings 5th International Conference on Multimedia Information Processing and Retrieval Mipr 2022

Gait is one of the most familiar actions for us, which is why we can distinguish slight differences between human gaits and perceive their impressions. However, the relationship has never been explored because of the absence of intuitive labels for the slight differences. In this paper, to solve this problem, we propose an intuitive gait model using Japanese mimetic-words. A mimetic-word has sound-symbolism, which means that there is an association between linguistic sounds and sensory experiences, and the phonemes of a mimetic-word are strongly related to the visual sensation. Thanks to the sound-symbolism, Japanese mimetic-words have the potential to model gaits intuitively. Thus, we have previously proposed a method which describes gait with a mimetic-word. In this paper, in the opposite direction, we propose a method which generates gait from a mimetic-word, and confirm the effectiveness of the proposed intuitive gait model, which consists of the phonetic-vector, through evaluations of both the generation task and the description task.

DOI： 10.1109/MIPR54900.2022.00050

Web of Science

Scopus

その他リンク： https://dblp.uni-trier.de/db/conf/mipr/mipr2022.html#KatoHDIKKDM22
Towards Efficient Data Access Through Multiple Relationship in Graph-Structured Digital Archives

Kusu, K; Komamizu, T; Hatano, K

FROM BORN-PHYSICAL TO BORN-VIRTUAL: AUGMENTING INTELLIGENCE IN DIGITAL LIBRARIES, ICADL 2022 13636 巻頁： 377 - 391 2022年

　詳細を見る

掲載種別：論文集(書籍)内論文出版者・発行元：Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

The research field of digital libraries mainly deals with graph-structured data. Graph database management systems (GDBMSs) are suitable for managing data in digital libraries because the data are large and their structure is complex. However, when performing a non-trivial search or analysis on a graph, GDBMSs cannot avoid reaching already-scanned nodes from different starting nodes when repeatedly traversing edges, as with property path patterns in SPARQL. Therefore, when a GDBMS reaches high-degree nodes, the number of graph traversals increases in proportion to the number of their adjacent nodes. Consequently, in conventional GDBMSs, the cost of traversing multiple paths increases drastically under the influence of nodes connected to an enormous number of edges. In this paper, we propose a data access approach that repeatedly traverses edges belonging to a specific relationship or to any relationship while distinguishing between high-degree nodes and low-degree ones. Finally, the results of our experiment indicate that our approach can speed up repeated traversals by a factor of up to ten.

DOI： 10.1007/978-3-031-21756-2_29

Web of Science

Scopus
Action Semantic Alignment for Image Captioning

Huo, D; Kastner, MA; Komamizu, T; Ide, I

2022 IEEE 5TH INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL, MIPR 頁： 194 - 197 2022年

　詳細を見る

掲載種別：研究論文（国際会議プロシーディングス）出版者・発行元：Proceedings 5th International Conference on Multimedia Information Processing and Retrieval Mipr 2022

Image captioning is one of the main goals in vision and language processing, which aims to generate proper descriptions of images. Recently, attention mechanisms have become crucial in captioning tasks, as they can capture global dependencies between modalities. Moreover, some works have used objects detected from the input image as anchor points, so-called object tags, to ease such alignments, resulting in good performance for this task. In this paper, we newly introduce action information as a prior to further improve this, by adding action tags for training. The action tags can learn alignment at the action semantic level and capture the previously ignored dimension of action, which could be very important in image captioning. We found that training with action tags can be used to describe images in a dynamic style. Furthermore, we found that it can lead to a significant improvement in captioning performance, as measured by common metrics, compared with other methods.

DOI： 10.1109/MIPR54900.2022.00041

Web of Science

Scopus

その他リンク： https://dblp.uni-trier.de/db/conf/mipr/mipr2022.html#HuoKKI22
An Ensemble Framework of Multi-ratio Undersampling-based Imbalanced Classification 査読有り Open Access

駒水孝裕

Journal of Data Intelligence 2 (1) 巻頁： 30 - 46 2021年

　詳細を見る

Open Access

CiNii Research
FPX-G: First Person Exploration for Graph

Komamizu T., Ito S., Ogawa Y., Toyama K.

Proceedings 4th International Conference on Multimedia Information Processing and Retrieval Mipr 2021 頁： 70 - 76 2021年

　詳細を見る

掲載種別：研究論文（国際会議プロシーディングス）出版者・発行元：Proceedings 4th International Conference on Multimedia Information Processing and Retrieval Mipr 2021

Data exploration is a fundamental user task in the information seeking process. In data exploration, users have ambiguous information needs, and they traverse the data to gather information. In this paper, a novel data exploration system, called FPX-G, is proposed that uses virtual reality (VR) technology. VR-based data exploration (or immersive analytics) is a recent trend in data analytics, and existing approaches handle aggregated information in an interactive, 3D manner. However, exploration of individual pieces of data has scarcely been addressed. Traditional data exploration is done on 2D displays; therefore, space is limited and there is no depth. FPX-G fully utilizes 3D space to make individual pieces of data visible in the user’s line of sight. In this paper, the data structure in FPX-G is designed as a graph, and the data exploration process is modeled as graph traversal. To utilize the capability of VR, FPX-G provides a first person view-based interface from which users can look at individual pieces of data and can walk through the data (like walking in a library). In addition to the walking mechanism, to deal with limited physical space in a room, FPX-G introduces eye-tracking technology for traversing data through a graph. A simulation-based evaluation reveals that FPX-G provides a significantly more efficient interface for exploring data than the traditional 2D interface.

DOI： 10.1109/MIPR51284.2021.00018

Scopus

その他リンク： https://dblp.uni-trier.de/db/conf/mipr/mipr2021.html#KomamizuIOT21
Evaluation Scheme of Focal Translation for Japanese Partially Amended Statutes

Yamakoshi T., Komamizu T., Ogawa Y., Toyama K.

WAT 2021 - 8th Workshop on Asian Translation, Proceedings of the Workshop 頁： 124 - 132 2021年

　詳細を見る

出版者・発行元：WAT 2021 - 8th Workshop on Asian Translation, Proceedings of the Workshop

To update the translations of Japanese statutes based on their amendments, we need to consider the translation “focality”; that is, we should only modify expressions that are relevant to the amendment and retain the others to avoid misconstruing its contents. In this paper, we introduce an evaluation metric and a corpus to improve focality evaluations. Our metric is called the Inclusive Score for DIfferential Translation (ISDIT). ISDIT consists of two factors: (1) the n-gram recall of expressions unaffected by the amendment and (2) the n-gram precision of the output compared to the reference. This metric supersedes an existing one for focality by simultaneously calculating the translation quality of the changed expressions in addition to that of the unchanged expressions. We also newly compile a corpus for Japanese partial amendment translation that secures the focality of the post-amendment translations, while an existing evaluation corpus does not. With the metric and the corpus, we examine the performance of existing translation methods for Japanese partial amendment translations.

DOI： 10.18653/v1/2021.wat-1.12

Scopus
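
The two ISDIT factors named in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function names are hypothetical, "expressions unaffected by the amendment" is approximated here as the n-grams shared by the pre-amendment translation and the reference, and the abstract does not say how the two factors are combined, so they are returned separately.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def unchanged_recall(old, reference, output, n=2):
    """Factor (1): share of 'unaffected' n-grams that survive in the output.

    Assumption: an n-gram counts as unaffected when it occurs in both the
    pre-amendment translation (old) and the reference.
    """
    unaffected = ngrams(old, n) & ngrams(reference, n)
    total = sum(unaffected.values())
    if total == 0:
        return 0.0
    kept = unaffected & ngrams(output, n)  # clipped overlap
    return sum(kept.values()) / total


def output_precision(reference, output, n=2):
    """Factor (2): share of output n-grams that also occur in the reference."""
    out = ngrams(output, n)
    total = sum(out.values())
    if total == 0:
        return 0.0
    matched = out & ngrams(reference, n)  # clipped overlap
    return sum(matched.values()) / total


old = "the minister shall notify the applicant".split()
reference = "the minister shall promptly notify the applicant".split()
output = "the minister must promptly notify the applicant".split()

recall = unchanged_recall(old, reference, output)
precision = output_precision(reference, output)
```

In this toy case the output needlessly rewrites "shall" as "must", which lowers the recall over unaffected bigrams; that is the kind of unfocused edit the abstract says should be penalised.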
Combining Multi-ratio Undersampling and Metric Learning for Imbalanced Classification.

Takahiro Komamizu

Journal of Data Intelligence 2 巻 ( 4 ) 頁： 462 - 474 2021年

　詳細を見る

掲載種別：研究論文（学術雑誌）

DOI： 10.26421/JDI2.4-5
Random walk-based entity representation learning and re-ranking for entity search

Komamizu, T

KNOWLEDGE AND INFORMATION SYSTEMS 62 巻 ( 8 ) 頁： 2989 - 3013 2020年8月

　詳細を見る

掲載種別：研究論文（学術雑誌）出版者・発行元：Knowledge and Information Systems

Linked Data (LD) has become a valuable source of factual records, and entity search is a fundamental task in LD. The task is, given a query consisting of a set of keywords, to retrieve a set of relevant entities in LD. The state-of-the-art approaches for entity search are based on information retrieval techniques. This paper first examines these approaches with a traditional evaluation metric, recall@k, to reveal their potential for improvement. To obtain evidence for the potentials, an investigation is carried out on the relationship between queries and answer entities in terms of path lengths on a graph of LD. On the basis of the investigation, learning representations of entities are dealt with. The existing methods of entity search are based on heuristics that determine relevant fields (i.e., predicates and related entities) to constitute entity representations. Since the heuristics require burdensome human decisions, this paper is aimed at removing the burden with a graph proximity measurement. To this end, in this paper, RWRDoc is proposed. It is an RWR (random walk with restart)-based representation learning method that learns representations of entities by using weighted combinations of representations of reachable entities w.r.t. RWR. RWRDoc is mainly designed to improve recall scores; therefore, as shown in experiments, it lacks capability in ranking. In order to improve the ranking qualities, this paper proposes a personalized PageRank-based re-ranking method, PPRSD (Personalized PageRank-based Score Distribution), for the retrieved results. PPRSD distributes relevance scores calculated by text-based entity search methods in a personalized PageRank manner. Experimental evaluations showcase that RWRDoc can improve search qualities in terms of recall@1000 and PPRSD can compensate for RWRDoc’s insufficient ranking capability, and the evaluations confirmed this compensation.

DOI： 10.1007/s10115-020-01445-4

Web of Science

Scopus
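
The abstract above builds on random walk with restart and personalized PageRank. The following is a minimal power-iteration sketch of personalized PageRank on a small directed graph, not the paper's RWRDoc or PPRSD code. The function name, the damping value, and the use of text-based relevance scores as the restart distribution are assumptions made for illustration.

```python
def personalized_pagerank(graph, restart, damping=0.85, iters=50):
    """Power iteration for personalized PageRank.

    graph:   dict mapping each node to a list of successor nodes
             (every successor must also be a key of the dict).
    restart: dict of non-negative restart weights, e.g. relevance scores
             from a text-based search step (an assumption of this sketch).
    """
    nodes = list(graph)
    total = sum(restart.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("restart weights must have a positive sum")
    prior = {n: restart.get(n, 0.0) / total for n in nodes}

    rank = dict(prior)
    for _ in range(iters):
        # Restart step: jump back to the prior with probability 1 - damping.
        nxt = {n: (1.0 - damping) * prior[n] for n in nodes}
        for n in nodes:
            successors = graph[n]
            if successors:
                share = damping * rank[n] / len(successors)
                for m in successors:
                    nxt[m] += share
            else:
                # Dangling node: hand its mass back to the restart distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * prior[m]
        rank = nxt
    return rank


graph = {"a": ["b"], "b": ["c"], "c": []}
ranks = personalized_pagerank(graph, {"a": 1.0})
```

Because the walk keeps restarting at "a", nodes closer to "a" receive more score, which is the proximity effect a re-ranking step of this kind exploits.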
ランダムフォレストを用いた法令用語の校正

山腰貴大, 小川泰弘, 駒水孝裕, 外山勝彦

人工知能学会論文誌 35 巻 ( 1 ) 頁： H-J53_1 - 14 2020年1月

　詳細を見る

記述言語：日本語掲載種別：研究論文（学術雑誌）出版者・発行元：一般社団法人人工知能学会

We propose a method that assists legislation drafters in finding inappropriate use of Japanese legal terms and their corrections from Japanese statutory sentences. In particular, we focus on sets of similar legal terms whose usages are strictly defined in legislation drafting rules that have been established over the years. In this paper, we first define input and output of legal term correction task. We regard it as a special case of sentence completion test with multiple choices. Next, we describe a legal term correction method for Japanese statutory sentences. Our method predicts suitable legal terms using Random Forest classifiers. The classifiers in our method use adjacent words to a target legal term as input features, and are optimized in various parameters including the number of adjacent words to be used for each legal term set. We conduct an experiment using actual statutory sentences from 3,983 existing acts and cabinet orders that consist of approximately 47M words in total. As for legal term sets, we pick 27 sets from legislation drafting manuals. The experimental result shows that our method outperformed existing modern word prediction methods using neural language models and that each Random Forest classifier utilizes characteristics of its corresponding legal term set.

DOI： 10.1527/tjsai.h-j53

Scopus

CiNii Research
Japanese mistakable legal term correction using infrequency-aware bert classifier Open Access

Yamakoshi T., Komamizu T., Ogawa Y., Toyama K.

Transactions of the Japanese Society for Artificial Intelligence 35 巻 ( 4 ) 頁： 1 - 17 2020年

　詳細を見る

記述言語：英語出版者・発行元：Transactions of the Japanese Society for Artificial Intelligence

We propose a method to assist legislative drafters that locates inappropriate legal terms in Japanese statutory sentences and suggests corrections. We focus on sets of mistakable legal terms whose usages are defined in legislation drafting rules. Our method predicts suitable legal terms using a classifier based on BERT (Bidirectional Encoder Representations from Transformers). The BERT classifier is pretrained with a huge number of whole sentences; thus, it contains abundant linguistic knowledge. Classifiers for predicting legal terms suffer from two-level infrequency: term-level infrequency and set-level infrequency. The former causes a class imbalance problem and the latter causes an underfitting problem; both degrade classification performance. To overcome these problems, we apply three techniques, namely, preliminary domain adaptation, repetitive soft undersampling, and classifier unification. The preliminary domain adaptation improves overall performance by providing prior knowledge of statutory sentences, the repetitive soft undersampling overcomes term-level infrequency, and the classifier unification overcomes set-level infrequency while saving storage consumption. Our experiments show that our classifier outperforms conventional classifiers using Random Forest or language models, and that all three training techniques improve performance.

DOI： 10.1527/tjsai.E-K25

Open Access

Scopus

CiNii Research
事前学習モデルBERTによる法令用語の校正

山腰貴大, 駒水孝裕, 小川泰弘, 外山勝彦

人工知能学会全国大会論文集 2020 巻 ( 0 ) 頁： 4P3OS805 - 4P3OS805 2020年

　詳細を見る

出版者・発行元：一般社団法人人工知能学会

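Legal documents contain legal terms that resemble one another, such as 「者」, 「物」, and 「もの」, or 「規定」 and 「規程」. Such legal terms are defined, together with their usage, by the customs and rules of legislation drafting practice (the work of creating and managing legal documents, including drafting, enacting, amending, and repealing statutes). In statutes, these terms must be strictly distinguished in accordance with those definitions. In legal documents in the broader sense, such as contracts and terms and conditions, it is also desirable to distinguish them correctly in line with statutes in order to prevent misunderstanding. This study therefore proposes a method that supports the drafting of legal documents by detecting legal terms in given statutory sentences and outputting correction candidates for those that appear to be misused. The method regards this task as a multiple-choice fill-in-the-blank problem and solves it with a classifier. The classifier is built from a BERT model pretrained on general text. Three techniques are applied to improve performance: (1) domain adaptation using statutory sentences, (2) undersampling of the training data, and (3) unification of the classifiers. The experiments show that the proposed method achieves higher performance than classifiers based on Random Forest or neural language models.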

DOI： 10.11517/pjsai.JSAI2020.0_4P3OS805
Exploring Relevant Parts Between Legal Documents Using Substructure Matching 査読有り

Komamizu, T; Fujioka, K; Ogawa, Y; Toyama, K

NEW FRONTIERS IN ARTIFICIAL INTELLIGENCE, JSAI-ISAI 2019 12331 巻頁： 5 - 19 2020年

　詳細を見る

記述言語：英語掲載種別：研究論文（その他学術会議資料等）出版者・発行元：Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

Legal documents are typically hierarchically structured. This paper focuses on ordinances and rules (OR documents for short) in the local governments, which are designed for social lives under the governments. OR documents are composed of provisions for social lives in various aspects such as healthy development of youths and landscape preservation. OR documents in different local governments share common provisions but also include different provisions depending on their social situations. There is a large demand on helping governmental officers draft OR documents, especially searching “relevant parts” of OR documents. To help drafting OR documents, this paper designs the relevancy of OR documents with two basic measurements; matching ratio and provision commonality. Based on the relevancy, this paper develops a structured document search algorithm for OR documents. Experimental evaluation on real OR documents in Japan demonstrates that the proposed algorithm successfully discovers relevant parts of OR documents.

DOI： 10.1007/978-3-030-58790-1_1

Web of Science

Scopus

その他リンク： https://dblp.uni-trier.de/db/conf/jsai/jsai2019w.html#KomamizuFOT19
MUEnsemble: Multi-ratio Undersampling-Based Ensemble Framework for Imbalanced Data

Komamizu, T; Uehara, R; Ogawa, Y; Toyama, K

DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2020, PT II 12392 巻頁： 213 - 228 2020年

　詳細を見る

掲載種別：研究論文（国際会議プロシーディングス）出版者・発行元：Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

Class imbalance is commonly observed in real-world data, and it is still problematic in that it hurts classification performance due to biased supervision. Undersampling is one of the effective approaches to the class imbalance. The conventional undersampling-based approaches involve a single fixed sampling ratio. However, different sampling ratios have different preferences toward classes. In this paper, an undersampling-based ensemble framework, MUEnsemble, is proposed. This framework involves weak classifiers of different sampling ratios, and it allows for a flexible design for weighting weak classifiers in different sampling ratios. To demonstrate the principle of the design, in this paper, three quadratic weighting functions and a Gaussian weighting function are presented. To reduce the effort required by users in setting parameters, a grid search-based parameter estimation automates the parameter tuning. An experimental evaluation shows that MUEnsemble outperforms undersampling-based methods and oversampling-based state-of-the-art methods. Also, the evaluation showcases that the Gaussian weighting function is superior to the fundamental weighting functions. In addition, the parameter estimation predicted near-optimal parameters, and MUEnsemble with the estimated parameters outperforms the state-of-the-art methods.

DOI： 10.1007/978-3-030-59051-2_14

Web of Science

Scopus

その他リンク： https://dblp.uni-trier.de/db/conf/dexa/dexa2020-2.html#KomamizuUOT20
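
The idea in the MUEnsemble abstract above (weak classifiers trained at several undersampling ratios and combined with a Gaussian weighting over the ratios) can be sketched as follows. This is a toy illustration under stated assumptions, not the MUEnsemble implementation: all function names and parameters (`mu`, `sigma`, the ratio list) are hypothetical, the weak learner is a 1-D nearest-mean rule, and the weights are applied to votes here, which may differ from how the paper uses its weighting functions.

```python
import math
import random


def undersample(majority, minority, ratio, rng):
    """Keep all minority items and draw about ratio * len(minority) majority items."""
    k = min(len(majority), max(1, round(ratio * len(minority))))
    return rng.sample(majority, k), list(minority)


def gaussian_weight(ratio, mu, sigma):
    """Unnormalised Gaussian weight for a sampling ratio."""
    return math.exp(-((ratio - mu) ** 2) / (2 * sigma ** 2))


def train_nearest_mean(majority, minority):
    """Toy weak classifier on 1-D data: predict the class with the nearer mean."""
    mean0 = sum(majority) / len(majority)
    mean1 = sum(minority) / len(minority)
    return lambda x: 1 if abs(x - mean1) < abs(x - mean0) else 0


def fit_ensemble(majority, minority, ratios, mu=1.0, sigma=0.5, seed=0):
    """Train one weak classifier per sampling ratio and attach its weight."""
    rng = random.Random(seed)
    members = []
    for ratio in ratios:
        maj, mino = undersample(majority, minority, ratio, rng)
        members.append((gaussian_weight(ratio, mu, sigma),
                        train_nearest_mean(maj, mino)))
    return members


def predict(members, x):
    """Weighted vote over the weak classifiers (1 = minority class)."""
    total = sum(weight for weight, _ in members)
    score = sum(weight * clf(x) for weight, clf in members) / total
    return 1 if score >= 0.5 else 0


majority = [i * 0.01 for i in range(200)]   # 200 majority samples in [0, 2)
minority = [4.8, 5.0, 5.1, 5.2]             # 4 minority samples near 5
members = fit_ensemble(majority, minority, ratios=[0.5, 1.0, 2.0])
```

With `mu=1.0`, the classifier trained on a balanced sample gets the largest weight, and classifiers trained at more skewed ratios contribute less.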
SPARQL with XQuery-based Filtering.

Takahiro Komamizu

CoRR abs/2009.06194 巻 2020年

　詳細を見る

掲載種別：研究論文（学術雑誌）

その他リンク： https://dblp.uni-trier.de/db/journals/corr/corr2009.html#abs-2009-06194
SPARQL with XQuery-based filtering

Komamizu T.

Ceur Workshop Proceedings 2721 巻頁： 69 - 73 2020年

　詳細を見る

出版者・発行元：Ceur Workshop Proceedings

Linked Open Data (LOD) has proliferated over various domains; however, there are still lots of open data in formats other than RDF. Document-centric XML data are such open data, connected with entities in LOD as supplemental documents for these entities. To utilize document-centric XML data linked from entities in LOD, in this paper, a SPARQL-based seamless access method on RDF and XML data is proposed. In particular, an extension to SPARQL, XQueryFILTER, which enables XQuery as a filter in SPARQL, is proposed. For efficient query processing of the combination of SPARQL and XQuery, a query optimization is proposed. Experimental scenarios using real-world data showcase the effectiveness of XQueryFILTER and the efficiency of the optimization.

Scopus
Japanese Mistakable Legal Term Correction using Infrequency-aware BERT Classifier 査読有り

Takahiro Yamakoshi, Takahiro Komamizu, Yasuhiro Ogawa, Katsuhiko Toyama

the 3rd Annual Workshop on Applications of Artificial Intelligence in the Legal Industry (LegalAI 2019) 頁： 4342-4351 2019年12月

　詳細を見る

記述言語：英語
Japanese Mistakable Legal Term Correction using Infrequency-aware BERT Classifier 査読有り

Takahiro Yamakoshi, Takahiro Komamizu, Yasuhiro Ogawa, Katsuhiko Toyama

Proc. 3rd Annual Workshop on Applications of Artificial Intelligence in the Legal Industry 頁： 4342 - 4351 2019年12月

　詳細を見る

記述言語：英語掲載種別：研究論文（その他学術会議資料等）出版者・発行元：Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

We propose a method that assists legislative drafters in locating inappropriate legal terms in Japanese statutory sentences and suggests corrections. We focus on sets of mistakable legal terms whose usages are defined in legislation drafting rules. Our method predicts suitable legal terms using a classifier based on a BERT (Bidirectional Encoder Representations from Transformers) model. We apply three techniques in training the BERT classifier, specifically, preliminary domain adaptation, repetitive soft undersampling, and classifier unification. These techniques cope with two levels of infrequency: legal term-level infrequency that causes class imbalance and legal term set-level infrequency that causes underfitting. Concretely, preliminary domain adaptation improves overall performance by providing prior knowledge of statutory sentences, repetitive soft undersampling improves performance on infrequent legal terms without sacrificing performance on frequent legal terms, and classifier unification improves performance on infrequent legal term sets by sharing common knowledge among legal term sets. Our experiments show that our classifier outperforms conventional classifiers using Random Forest or a language model, and that all three training techniques contribute to performance improvement.

DOI： 10.1109/BigData47090.2019.9006511

Scopus

その他リンク： https://dblp.uni-trier.de/db/conf/bigdataconf/bigdataconf2019.html#YamakoshiKOT19
Exploring Relevant Parts between Legal Documents using Substructure Matching 査読有り

Takahiro Komamizu, Kazuya Fujioka, Yasuhiro Ogawa, Katsuhiko Toyama

the Thirteenth International Workshop on Juris-informatics (JURISIN 2019) 頁： 16-28 2019年11月

　詳細を見る

担当区分：筆頭著者記述言語：英語
言い換えによる自然言語－SPARQL対訳コーパスの拡張

李偉嘉, 小川泰弘, 駒水孝裕, 外山勝彦

第17回情報学ワークショップ論文集頁： - 2019年11月

　詳細を見る

記述言語：日本語掲載種別：研究論文（その他学術会議資料等）
Analyzing Japanese Law History through Modeling Multi-versioned Entity 査読有り

Takahiro Komamizu, Yushi Uchida, Yasuhiro Ogawa, Katsuhiko Toyama

the 2nd International Workshop on Contextualized Knowledge Graphs (CKG 2019) 頁： - 2019年10月

　詳細を見る

担当区分：筆頭著者記述言語：英語
利用規約中の不公平文の自動検出

青山恵子, 小川泰弘, 駒水孝裕, 外山勝彦

第15回テキストアナリティクス・シンポジウム NLC2019-8(2019-9) 頁： 1-6 2019年9月

　詳細を見る

記述言語：日本語掲載種別：研究論文（その他学術会議資料等）
Thai Legal Term Correction using Random Forests with Outside-the-sentence Features 査読有り

Takahiro Yamakoshi, Vee Satayamas, Hutchatai Chanlekha, Yasuhiro Ogawa, Takahiro Komamizu, Asanee Kawtrakul, Katsuhiko Toyama

the 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33) 頁： 161-170 2019年9月

　詳細を見る

記述言語：英語
弱分類器の調整に基づく不均衡データ向けアンサンブル・フレームワーク査読有り Open Access

植原リサ, 駒水孝裕, 小川泰弘, 外山勝彦

第12回Webとデータベースに関するフォーラム頁： 81-84 2019年9月

　詳細を見る

記述言語：日本語

Open Access
共通BERT分類器による紛らわしい法令用語の校正

山腰貴大, 駒水孝裕, 小川泰弘, 外山勝彦

言語処理学会NLP若手の会第14回シンポジウム頁： - 2019年8月

　詳細を見る

記述言語：日本語掲載種別：研究論文（その他学術会議資料等）
nagoy Team's Summarization System at the NTCIR-14 QA-Lab PoliInfo 査読有り

Yasuhiro Ogawa, Michiaki Satou, Takahiro Komamizu, Katsuhiko Toyama

the Fourteenth NTCIR conference (NTCIR-14) , Revised Selected Papers 頁： 110-121 2019年6月

　詳細を見る

記述言語：英語
法律の要約のためのランダムフォレストを用いた重要文抽出 Open Access

小川泰弘, 佐藤充晃, 駒水孝裕, 外山勝彦

人工知能学会全国大会(第33回)論文集 JSAI2019 巻 ( 0 ) 頁： 4E2OS7a02 - 4E2OS7a02 2019年6月

　詳細を見る

記述言語：日本語掲載種別：研究論文（学術雑誌）出版者・発行元：一般社団法人人工知能学会

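The goal of this study is to provide summaries of Japanese statutes. To this end, we propose automatic summarization based on important-sentence extraction using random forests. Previous research on automatic summarization has used only information from the source document. In recent years, machine learning-based methods have also been proposed. However, the amount of training data available for such machine learning has been insufficient, especially for Japanese. For statute summarization, this study solves the problem by using the 「法令のあらまし」 (outlines of statutes) prepared by the government. Furthermore, in place of the conventional methods using decision trees or SVMs, we propose important-sentence extraction using random forests and show that its performance exceeds that of the conventional methods. The contributions of this paper are the creation of a summarization corpus larger than previous ones and the confirmation of the effectiveness of random forests for important-sentence extraction.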

DOI： 10.11517/pjsai.jsai2019.0_4e2os7a02

Open Access

CiNii Research
Detecting Communities and Correlated Attribute Clusters on Multi-Attributed Graphs 査読有り Open Access

Ito, H; Komamizu, T; Amagasa, T; Kitagawa, H

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E102D 巻 ( 4 ) 頁： 810 - 820 2019年4月

　詳細を見る

記述言語：英語掲載種別：研究論文（学術雑誌）出版者・発行元：IEICE Transactions on Information and Systems

Multi-attributed graphs, in which each node is characterized by multiple types of attributes, are ubiquitous in the real world. Detection and characterization of communities of nodes could have a significant impact on various applications. Although previous studies have attempted to tackle this task, it is still challenging due to difficulties in the integration of graph structures with multiple attributes and the presence of noises in the graphs. Therefore, in this study, we have focused on clusters of attribute values and strong correlations between communities and attribute-value clusters. The graph clustering methodology adopted in the proposed study involves Community detection, Attribute-value clustering, and deriving Relationships between communities and attribute-value clusters (CAR for short). Based on these concepts, the proposed multi-attributed graph clustering is modeled as CAR-clustering. To achieve CAR-clustering, a novel algorithm named CARNMF is developed based on non-negative matrix factorization (NMF) that can detect CAR in a cooperative manner. Results obtained from experiments using real-world datasets show that the CARNMF can detect communities and attribute-value clusters more accurately than existing comparable methods. Furthermore, clustering results obtained using the CARNMF indicate that CARNMF can successfully detect informative communities with meaningful semantic descriptions through correlations between communities and attribute-value clusters.

DOI： 10.1587/transinf.2018DAP0022

Open Access

Web of Science

Scopus

CiNii Research
Japanese Mistakable Legal Term Correction using Infrequency-aware BERT Classifier

Yamakoshi, T; Komamizu, T; Ogawa, Y; Toyama, K

2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) 頁： 4342 - 4351 2019年

　詳細を見る

DOI： 10.1109/BigData47090.2019.9006511

Web of Science
Graph Analytical Re-ranking for Entity Search

Komamizu T.

Ceur Workshop Proceedings 2482 巻 2019年

　詳細を見る

出版者・発行元：Ceur Workshop Proceedings

Entity search is a fundamental task in Linked Data (LD). The task is, given a keyword search query, to retrieve a set of entities in LD that are relevant to the query. The state-of-the-art approaches for entity search are based on information retrieval technologies such as TF-IDF vectorization and ranking models. This paper examines these approaches by applying a traditional evaluation metric, recall@k, and shows that their ranking qualities still leave room for improvement. In order to improve the ranking qualities, this paper explores the possibilities of graph analytical methods. Since LD can be regarded as a large graph, graph analytical approaches are appropriate for this purpose. Since query-based graph analytical approaches fit entity search tasks, this paper proposes a personalized PageRank-based re-ranking method, PPRSD (Personalized PageRank based Score Distribution), for the results retrieved by the state-of-the-art approaches. The experimental evaluation shows improvements, but the results are not yet satisfactory. For further improvements, this paper reports investigations into the relationship between queries and entities in terms of path lengths on the graph, and discusses future directions for graph analytical approaches.

Scopus
Analyzing Japanese law history through modeling multi-versioned entity

Komamizu T., Uchida Y., Ogawa Y., Toyama K.

Ceur Workshop Proceedings 2599 巻 2019年

　詳細を見る

出版者・発行元：Ceur Workshop Proceedings

As law is a blueprint of a society and changes over time as social environments change, analyzing the histories (change provenances) of laws can reveal important facts such as legislative facts and critical events for the society. Linked Open Data (LOD) has emerged as a preferred method for publishing and sharing open data; however, there is an ontological barrier to publishing law history data as LOD. To break through the barrier, this paper proposes an ontology for law history data of the Japanese statute law. The ontology is inspired by the PROV-O and SIOC ontologies. The LOD dataset based on the proposed ontology enables a wide variety of analyses on the law history data by simple SPARQL queries. The analyses include simple search, visualization, temporal analysis, data mining, etc. This paper presents parts of the analyses, which indicate several legislative facts behind changes of laws. The analyses demonstrate that the proposed ontology and LOD dataset are useful for legal data analysis. The proposed ontology is comparable with ELI (European Legislation Identifier), which is designed for EU laws; this paper thus discusses the comparability and future directions of the proposed ontology.

Scopus
nagoy Team's Summarization System at the NTCIR-14 QA Lab-PoliInfo 査読有り

Yasuhiro Ogawa, Michiaki Satou, Takahiro Komamizu, Katsuhiko Toyama

Post-conference Proceedings of the 14th NTCIR Conference on Evaluation of Information Access Technologies 11966 LNCS 巻頁： 110 - 121 2019年

　詳細を見る

記述言語：英語掲載種別：研究論文（その他学術会議資料等）出版者・発行元：Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

The nagoy team participated in the NTCIR-14 QA Lab-PoliInfo’s summarization subtask. This paper describes our summarization system for assembly member speeches using random forest classifiers. Since we encountered an imbalance in the data, we were unable to achieve good results in this subtask when training on all data. To solve this problem, we developed a new summarization system that applies multiple random forest classifiers training on different-sized data sets step by step. As a result, our system achieved good performance, especially in the evaluation by ROUGE scores. In this paper, we also compare our system with a single random forest classifier using probability.

DOI： 10.1007/978-3-030-36805-0_9

Scopus

その他リンク： https://dblp.uni-trier.de/db/conf/ntcir/ntcir2019.html#OgawaSKT19
Thai legal term correction using random forests with outside-the-sentence features

Yamakoshi T., Satayamas V., Chanlekha H., Ogawa Y., Komamizu T., Kawtrakul A., Toyama K.

Proceedings of the 33rd Pacific Asia Conference on Language Information and Computation Paclic 2019 頁： 279 - 287 2019年

　詳細を見る

出版者・発行元：Proceedings of the 33rd Pacific Asia Conference on Language Information and Computation Paclic 2019

We propose a method for finding and correcting misused Thai legal terms in Thai statutory sentences. Our method predicts legal terms using Random Forest classifiers, each of which is optimized for each set of similar legal terms. Each classifier utilizes outside-the-sentence features, namely, promulgation year, title keywords, and section keywords of statutes, in addition to words adjacent to the targeted legal term. Our experiment shows that our method outperformed not only a Random Forest method without the outside-the-sentence features, but also BERT (Bidirectional Encoder Representations from Transformers), a powerful language representation model, in overall accuracy.

Scopus
Graph Analytical Re-ranking for Entity Search

Takahiro Komamizu

Proceedings of the 1st International Workshop on EntitY REtrieval (EYRE 2018) 頁： (to appear) 2018年10月

　詳細を見る

記述言語：英語
Learning Interpretable Entity Representation in Linked Data 査読有り

Takahiro Komamizu

Proc. the 29th International Conference on Database and Expert Systems Applications 頁： 153-168 2018年9月

　詳細を見る

担当区分：筆頭著者記述言語：英語

DOI： 10.1007/978-3-319-98809-2_10
Community Detection and Correlated Attribute Cluster Analysis on Multi-Attributed Graphs 査読有り

Hiroyoshi Ito, Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 2nd International workshop on Data Analytics solutions for Real-LIfe APplications (DARLI-AP 2018) co-located with the 21st International Conference on Extending Database Technology (EDBT 2018) 2018年3月

　詳細を見る

記述言語：英語

頁： 2-9
Network-Word Embedding for Dynamic Text Attributed Networks (peer-reviewed)

Hiroyoshi Ito, Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 6th International Workshop on Semantic Computing for Social Networks and Organization Sciences (SCSN 2018) co-located with the 12th IEEE International Conference on Semantic Computing (ICSC 2018), January 2018

Language: English

Pages: 334-339
Analytical toolbox for smart city applications: Garbage collection log use case (peer-reviewed)

Takahiro Komamizu, Jin Nakazawa, Toshiyuki Amagasa, Hiroyuki Kitagawa, Hideyuki Tokuda

Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017, vol. 2018-, pp. 4105 - 4110, January 2018

Language: English; Publication type: Research paper (international conference proceedings); Publisher: Institute of Electrical and Electronics Engineers Inc.

Analyzing and feeding back the results on real-world services are important missions in the Big Data era to realize smart city. However, analyzing real-world data is still challenging because of dirtiness of data and large variety of analytic requirements. To cope with the challenges, this paper proposes and develops an analytical toolbox for smart city applications. The analytical toolbox consists of three phases: preparation, analysis, and visualization. The preparation phase deals with the dirtiness of the data by including fundamental data cleansing techniques and data integration techniques. The analysis phase is responsible for ETL (extract, transform and load) process and analytical query processing from the next phase. The visualization phase deals with analytical requirements from users and visualization of analytical results. This paper showcases a real-world use case of the proposed analytical toolbox. The use case is now open in public with help of Fujisawa city, Japan, and this fact indicates that the proposed analytical toolbox is feasible for real-world data analysis and feeding back to citizens.

DOI： 10.1109/BigData.2017.8258429

Scopus
Implicit order join: Joining log data with property data by discovering implicit order-oriented keys with human assistance (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017, vol. 2018-, pp. 4400 - 4406, January 2018

Language: English; Publication type: Research paper (international conference proceedings); Publisher: Institute of Electrical and Electronics Engineers Inc.

Data integration is still a laborious task when the data to be integrated are not consistently managed. Such inconsistency can easily arise in real-world situations, for example when properties of objects are managed by a central organization while trajectories (or logs) of the objects are recorded by other, peripheral organizations. This paper deals with a case of missing ordering information. Integrating property data and log data without ordering information causes duplicated results. To solve this problem, this paper proposes a join algorithm, called implicit order join, which discovers implicit ordering information from both property data and log data with the help of partial true integrated results obtained through human assistance. With the discovered ordering information, the implicit order join makes it possible to integrate the property data and log data. To discover the implicit ordering information, the ordering correlation between attribute sequences of property data and log data must be found through comprehensive examination of possible attribute sequence pairs. The potential number of sequence pairs is as high as the factorial order of the number of attributes. Therefore, this paper develops a heuristic approach that prunes unnecessary examinations based on ordering dependency between attribute sequences. Experimental evaluation indicates that implicit order join can reduce integration labor by 77% and that the pruning method reduces the number of attribute sequences by orders of magnitude.

DOI： 10.1109/BigData.2017.8258474

Scopus
Japanese Legal Term Correction Using Random Forests (peer-reviewed)

Yamakoshi, T; Komamizu, T; Ogawa, Y; Toyama, K

LEGAL KNOWLEDGE AND INFORMATION SYSTEMS (JURIX 2018), vol. 313, pp. 161 - 170, 2018

Language: English; Publisher: Frontiers in Artificial Intelligence and Applications

We propose a method that assists legislation officers in finding inappropriate Japanese legal terms in Japanese statutory sentences and suggests corrections. In particular, we focus on sets of similar legal terms whose usages are defined in legislation drafting rules. Our method predicts suitable legal terms in statutory sentences using Random Forest classifiers, each of which is optimized for each set of similar legal terms. Our experiment shows that our method outperformed existing modern word prediction methods using neural language models.

DOI： 10.3233/978-1-61499-935-5-161

Web of Science

Scopus
Learning Interpretable Entity Representation in Linked Data

Komamizu, T

DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2018, PT I, vol. 11029, pp. 153 - 168, 2018

Publication type: Research paper (international conference proceedings); Publisher: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

Linked Data has become a valuable source of factual records. However, because of its simple representations of records (i.e., a set of triples), learning representations of entities is required for various applications such as information retrieval and data mining. Entity representations can be roughly classified into two categories; (1) interpretable representations, and (2) latent representations. Interpretability of learned representations is important for understanding relationship between two entities, like why they are similar. Therefore, this paper focuses on the former category. Existing methods are based on heuristics which determine relevant fields (i.e., predicates and related entities) to constitute entity representations. Since the heuristics require laboursome human decisions, this paper aims at removing the labours by applying a graph proximity measurement. To this end, this paper proposes RWRDoc, an RWR (random walk with restart)-based representation learning method which learns representations of entities by weighted combinations of minimal representations of whole reachable entities w.r.t. RWR. Comprehensive experiments on diverse applications (such as ad-hoc entity search, recommender system using Linked Data, and entity summarization) indicate that RWRDoc learns proper interpretable entity representations.

DOI： 10.1007/978-3-319-98809-2_10

Web of Science

Scopus

Other link: https://dblp.uni-trier.de/db/conf/dexa/dexa2018-1.html#Komamizu18
CROISSANT: Centralized Relational Interface for Web-scale SPARQL Endpoints (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 19th International Conference on Information Integration and Web-based Applications & Services (iiWAS2017), December 2017

Role: Lead author; Language: English

Pages: 284-288
SOLA: Stream OLAP-based Analytical Framework for Roadway Maintenance (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Salman Ahmed Shaikh, Hiroaki Shiokawa, Hiroyuki Kitagawa

Proc. the 9th International Conference on Management of Digital EcoSystems (MEDES 2017), November 2017

Role: Lead author; Language: English

Pages: 35-42
Repository Recommendation Combining Developer Activity Records from GitHub and Stack Overflow (peer-reviewed, Open Access)

永野真知, 早瀬康裕, 駒水孝裕, 北川博之

Proceedings of the Software Engineering Symposium 2017, August 2017

Language: Japanese

Pages: 138-145
FORK: Feedback-Aware ObjectRank-Based Keyword Search over Linked Data (peer-reviewed)

Takahiro Komamizu, Sayami Okumura, Toshiyuki Amagasa, Hiroyuki Kitagawa

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10648, pp. 58 - 70, 2017

Language: English; Publication type: Research paper (international conference proceedings); Publisher: Springer Verlag

Ranking quality for keyword search over Linked Data (LD) is crucial when users look for entities in LD, since LD datasets have complicated structures as well as a large amount of content. This paper proposes a keyword search method, FORK, which ranks entities in LD by ObjectRank, a well-known link-structure analysis algorithm that can deal with different types of nodes and edges. The first attempt at applying ObjectRank to LD search reveals that ObjectRank with inappropriate settings gives worse ranking results than PageRank, which is equivalent to ObjectRank with all authority transfer weights equal. Therefore, deriving appropriate authority transfer weights is the most important issue in applying ObjectRank to LD search. FORK involves a relevance feedback algorithm that modifies the authority transfer weights according to users’ relevance judgements on ranking results. The experimental evaluation of ranking quality using an entity search benchmark showcases the effectiveness of FORK, and it shows that ObjectRank is a more feasible ranking method for LD search than PageRank and other comparative baselines, including information retrieval techniques and graph analytic methods.

DOI： 10.1007/978-3-319-70145-5_5

Scopus
Exploring identical users on GitHub and stack overflow (peer-reviewed)

Takahiro Komamizu, Yasuhiro Hayase, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE, pp. 584 - 589, 2017

Language: English; Publication type: Research paper (international conference proceedings); Publisher: Knowledge Systems Institute Graduate School

Analyzing the behaviours of developers on different platforms (in particular, GitHub and Stack Overflow in this paper) can reveal interesting facts related to development activities. There are only a few datasets for analysing cross-platform user behaviours, especially across GitHub and Stack Overflow. Users on GitHub and Stack Overflow are identifiable by matching email addresses. To increase the number of identifiable users in these datasets, this paper retrieves potentially identifiable users between GitHub and Stack Overflow without relying only on email addresses. This paper employs classification-based link prediction, which formulates the user identification problem as a link prediction problem on the bipartite graph consisting of GitHub users and Stack Overflow users. With the identification method, this paper generates a probabilistic dataset containing pairs of users with probabilities (or confidences). This paper also publishes the identification tool to enable further data generation on newly appearing datasets of GitHub, Stack Overflow, and others. The generated dataset and tool are helpful for accelerating research on mining software repositories.

DOI： 10.18293/SEKE2017-109

Scopus
Towards Real-time Analysis of Smart City Data: A Case Study on City Facility Utilizations (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Salman Ahmed Shaikh, Hiroaki Shiokawa, Hiroyuki Kitagawa

Proc. the 14th IEEE International Conference on Smart City (SmartCity 2016), pp. 1357-1364, December 2016

Role: Lead author; Language: English

DOI： 10.1109/HPCC-SmartCity-DSS.2016.0192
Interleaving Clustering of Classes and Properties for Disambiguating Linked Data (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 18th International Conference on Asia-Pacific Digital Libraries (ICADL 2016), December 2016

Role: Lead author; Language: English

Pages: 251-256
Visual Spatial-OLAP for Vehicle Recorder Data on Micro-sized Electric Vehicles (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 20th International Database Engineering & Applications Symposium (IDEAS 2016), pp. 358-363, July 2016

Role: Lead author; Language: English

DOI： 10.1145/2938503.2938532
H-SPOOL: A SPARQL-based ETL Framework for OLAP over Linked Data with Dimension Hierarchy Extraction (invited, peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

vol. 12, no. 3, pp. 359-378, June 2016

Role: Lead author; Language: English; Publication type: Research paper (scientific journal)

DOI： 10.1108/IJWIS-03-2016-0014
SPOOL: A SPARQL-based ETL Framework for OLAP over Linked Data (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 17th International Conference on Information Integration and Web-based Applications & Services (iiWAS 2015), no. 49, pp. 1-10, December 2015

Role: Lead author; Language: English

DOI： 10.1145/2837185.2837230
Facet-value Extraction Scheme from Textual Contents in XML Data (invited, peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

vol. 11, no. 3, pp. 270-290, June 2015

Role: Lead author; Language: English; Publication type: Research paper (scientific journal)

DOI： 10.1108/IJWIS-04-2015-0012
Extracting Facets from Textual Contents for Faceted Search over XML Data (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 16th International Conference on Information Integration and Web-based Applications & Services (iiWAS 2014), pp. 420-429, December 2014

Role: Lead author; Language: English

DOI： 10.1145/2684200.2684294
Frequent-Pattern based Facet Extraction from Graph Data (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 17th International Conference on Network-Based Information Systems (NBiS 2014), pp. 318-323, September 2014

Role: Lead author; Language: English

DOI： 10.1109/NBiS.2014.77
A Scheme of Automated Object and Facet Extraction for Faceted Search over XML Data (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 18th International Database Engineering & Applications Symposium (IDEAS 2014), pp. 338-341, July 2014

Role: Lead author; Language: English

DOI： 10.1145/2628194.2628241
A Scheme of Fragment-Based Faceted Image Search (peer-reviewed)

Takahiro Komamizu, Mariko Kamie, Kazuhiro Fukui, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 23rd International Conference on Database and Expert Systems Applications (DEXA 2012), pp. 450-457, September 2012

Role: Lead author; Language: English

DOI： 10.1007/978-3-642-32597-7_40
Faceted Navigation Framework for XML Data (invited, peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

vol. 8, no. 4, pp. 348-370, August 2012

Role: Lead author; Language: English; Publication type: Research paper (scientific journal)

DOI： 10.1108/17440081211282865
A Framework of Faceted Navigation for XML Data (peer-reviewed)

Takahiro Komamizu, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 13th International Conference on Information Integration and Web-based Applications & Services (iiWAS 2011), pp. 28-35, December 2011

Role: Lead author; Language: English

DOI： 10.1145/2095536.2095544

MISC 6

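Common Guidelines for Data Science Human Resource Development through University Collaboration and Their Practice

松原茂樹, 中岩浩巳, 駒水孝裕, 鈴木優, 井手一郎, 西村訓弘, 速水悟, 武田一哉

IEICE Technical Report: Educational Technology (ET), vol. 122, no. 431, pp. 117 - 127, March 2023

Language: Japanese; Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)

This paper describes the creation of common guidelines for promoting data science human resource development through university collaboration, and their practice. The guidelines set out principles for effectively and efficiently delivering educational programs centered on real-world data exercises, in which groups solve problems using real data. A data scientist training program was designed based on the common guidelines and has been offered to graduate students and working professionals since fiscal year 2019.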
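Practice of Data Science Human Resource Development through an Industry-Academia Consortium

松原茂樹, 中岩浩巳, 駒水孝裕, 鈴木優, 井手一郎, 西村訓弘, 速水悟, 武田一哉

IPSJ SIG Technical Report: Computers in Education (CE), vol. 2022-CE-167, no. 16, pp. 1 - 6, November 2022

Language: Japanese; Publisher: Information Processing Society of Japan (IPSJ)

This paper describes the design and operation of an educational program that carries out data science human resource development through industry-academia collaboration. The program targets graduate students and working professionals and aims to cultivate the abilities a data scientist needs: "real-world data knowledge," "tool utilization skills," and "a mindset for cross-disciplinary collaboration." The program was designed around "real-world data exercises," in which groups solve problems using data provided by companies and local governments. By establishing guidelines that let multiple universities teach in collaboration with industry, educational resources such as problems, data, tools, and mentors can be shared among universities. In the three years from fiscal year 2019, when the program started, through fiscal year 2021, 165 students completed the program.
SIG Computers in Education (CE) meeting (Saturday, December 3 to Sunday, December 4, 2022; Fukuoka Institute of Technology and online)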
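A Study on Retrieving Mobility-Related Statutes Using Relationships between Statutes

駒水孝裕, 外山勝彦, 河口信夫, 佐野智也

JSAI Type 2 SIG Technical Reports, vol. 2022, no. SWO-057, p. 04, August 2022

Language: Japanese; Publisher: The Japanese Society for Artificial Intelligence (JSAI)

This paper describes statute retrieval using relationships between statutes. In particular, targeting statutes related to mobility, it presents the retrieval method and the retrieval results. Legal data is gradually being released as open data. So far, starting with the publication of statutes as Linked Open Data (LOD) to serve as a hub for open legal data, various statutory documents as well as Diet proceedings and bill data have been opened. Applications of these data, however, have not been studied sufficiently. Taking mobility as the subject, this paper presents a method for retrieving related statutes. Specifically, it extracts relationships between statutes from the LOD on statutes and retrieves related statutes using Personalized PageRank, a retrieval technique for graphs. Through this retrieval, the paper clarifies the limitations of current open legal data and discusses future directions.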

DOI： 10.11517/jsaisigtwo.2022.swo-057_04

CiNii Research
The Web Conference 2020 Participation Report

駒水孝裕

Joho Shori (IPSJ Magazine), vol. 61, no. 10, pp. 1078 - 1079, September 2020

Language: Japanese

CiNii Research
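Identifying Statute Entities in DBpedia for Constructing a Statute History LOD

駒水孝裕, 小川泰弘, 外山勝彦

JSAI Type 2 SIG Technical Reports, vol. 2020, no. SWO-051, p. 06, July 2020

Language: Japanese; Publisher: The Japanese Society for Artificial Intelligence (JSAI)

This paper describes the design of a statute history ontology and the linking of its entities to statute entities in DBpedia. Although many aspects of society are related to statutes, little data on Japanese statutes has been opened. In particular, there is almost no open data published as highly reusable LOD (Linked Open Data). The statute history ontology proposed here is designed to achieve two goals: (1) it can cover Japanese statutes comprehensively, and (2) it can identify the statute that was in force at a given point in time (a statute version). The former aims to promote the publication of open legal data as LOD. The latter is necessary for reasons such as the principle of non-retroactivity and transitional measures. A change to the content of a statute is realized by bringing into force a statute that describes the change. The record of such changes is called the history of the statute. This paper extends the ontology of prior work, which was designed for acts, so that it can also handle the history of statutes other than acts. Based on the extended ontology, data were obtained from the Japanese Law Index (日本法令索引) of the National Diet Library, and a statute history LOD was constructed. The constructed LOD consists of 3,412,748 triples and contains 106,341 statutes. To connect with LOD in external domains, its entities were linked to statute entities in DBpedia. The paper shows that simple linking based on statute names can achieve 99% precision and 96% recall.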

DOI： 10.11517/jsaisigtwo.2020.swo-051_06

CiNii Research
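Legal Term Correction Using the Pretrained Model BERT

山腰貴大, 駒水孝裕, 小川泰弘, 外山勝彦

Proceedings of the Annual Conference of JSAI, vol. JSAI2020, no. 0, pp. 4P3OS805 - 4P3OS805, 2020

Language: Japanese; Publisher: The Japanese Society for Artificial Intelligence (JSAI)

Statutory documents contain legal terms that resemble one another, such as 「者」, 「物」, and 「もの」, or 「規定」 and 「規程」. Such legal terms are defined, together with their usage, by the customs and rules of legislative drafting (the work of creating and managing statutory documents, including drafting, enacting, amending, and repealing statutes). In statutes, these terms must be strictly distinguished in accordance with those rules. In legal documents in the broad sense, such as contracts and terms and conditions, it is also desirable to distinguish them correctly, following statutes, to prevent misunderstanding. This study therefore proposes a method that supports the drafting of legal documents by detecting legal terms in a given statutory sentence and outputting correction candidates for those that appear to be misused. The method regards this task as a multiple-choice fill-in-the-blank problem and solves it with a classifier. The classifier is built from a BERT model pretrained on general text. Three measures are applied to improve performance: (1) domain adaptation with statutory sentences, (2) undersampling of training data, and (3) unification of classifiers. Experiments showed that the proposed method outperforms classifiers based on Random Forests and neural language models.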

DOI： 10.11517/pjsai.jsai2020.0_4p3os805

CiNii Research

Lectures, oral presentations, etc. 32

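Design of a Statute History Ontology

内田勇志, 駒水孝裕, 小川泰弘, 外山勝彦

47th Meeting of the JSAI Special Interest Group on Semantic Web and Ontology (SWO)

Event date: March 2019

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan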
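Entity Search Using Graph Structure

駒水孝裕

11th Forum on Data Engineering and Information Management

Event date: March 2019

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan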
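Retrieving Similar Local Ordinances and Rules Using Substructures

藤岡和弥, 駒水孝裕, 小川泰弘, 外山勝彦

11th Forum on Data Engineering and Information Management

Event date: March 2019

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan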
nagoy Team’s Summarization System at the NTCIR-14 QA Lab-PoliInfo

Event date: 2019

Language: English; Presentation type: Oral presentation (general)

DOI： 10.1007/978-3-030-36805-0_9

Scopus
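Improving the Readability of Statutory Sentences by Splitting Coordinate Structures

青山恵子, 駒水孝裕, 小川泰弘, 外山勝彦

2018 Tokai-Section Joint Conference on Electrical, Electronics, Information, and Related Engineering

Event date: September 2018

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan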
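Legal Term Correction Using Random Forests

山腰貴大, 駒水孝裕, 小川泰弘, 外山勝彦

2018 Tokai-Section Joint Conference on Electrical, Electronics, Information, and Related Engineering

Event date: September 2018

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan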
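Machine Translation Combining a Neural Model and Translation Memory

重野泰和, 駒水孝裕, 小川泰弘, 外山勝彦

2018 Tokai-Section Joint Conference on Electrical, Electronics, Information, and Related Engineering

Event date: September 2018

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan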
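Automatic Extraction of Statutory Sentences for Statute Summarization Using a Classifier

佐藤充晃, 駒水孝裕, 小川泰弘, 外山勝彦

2018 Tokai-Section Joint Conference on Electrical, Electronics, Information, and Related Engineering

Event date: September 2018

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan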
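Acquiring Relations between Legal Terms Using Distributed Word Representations

植原リサ, 駒水孝裕, 小川泰弘, 外山勝彦

2018 Tokai-Section Joint Conference on Electrical, Electronics, Information, and Related Engineering

Event date: September 2018

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan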
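Zipf's Law Also Holds for the Occurrence Counts of Sentences in Local Ordinances and Rules

藤岡和弥, 駒水孝裕, 小川泰弘, 外山勝彦

2018 Tokai-Section Joint Conference on Electrical, Electronics, Information, and Related Engineering

Event date: September 2018

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan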
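OLAP Analysis of RDF Data Based on Graph Aggregation

仁木美来, 天笠俊之, 駒水孝裕, 北川博之

80th National Convention of IPSJ

Event date: March 2018

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan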
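Learning Distributed Representations of Nodes and Words in Dynamic Networks Whose Nodes Have Text Information

伊藤寛祥, 駒水孝裕, 天笠俊之, 北川博之

10th Forum on Data Engineering and Information Management

Event date: March 2018

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan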
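Garbage Reduction G1 Grand Prix (invited)

駒水孝裕

1st Symposium on Regional IoT and Information Power

Event date: April 2017

Language: Japanese; Presentation type: Poster presentation

Country: Japan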
Unified Analysis of User Behavior on GitHub and Stack Overflow

永野真知, 早瀬康裕, 駒水孝裕, 北川博之

79th National Convention of IPSJ

Event date: March 2017

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan
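Community Detection in Graphs Whose Nodes Have Multiple Attributes

伊藤寛祥, 駒水孝裕, 天笠俊之, 北川博之

9th Forum on Data Engineering and Information Management

Event date: March 2017

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan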
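Automating Facet Extraction for Faceted Search over XML Data

駒水孝裕, 天笠俊之, 北川博之

13th Forum on Information Technology

Event date: September 2014

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan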
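An Object Extraction Method Using Frequent Patterns for Faceted Exploration of Graph Data

駒水孝裕, 天笠俊之, 北川博之

4th Forum on Data Engineering and Information Management

Event date: March 2012

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan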
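Technologies and Research in the Field of Data Engineering (invited)

駒水孝裕

Course "ICT Utilization"

Event date: February 2012

Language: Japanese; Presentation type: Public lecture, seminar, tutorial, course, lecture, etc.

Country: Japan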
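A Study of Faceted Exploration Suited to Software Component Retrieval

駒水孝裕, 早瀬康裕, 北川博之

Software Science Technical Meeting

Event date: October 2011

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan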
FACTUS: Faceted Twitter User Search Using Twitter Lists (international conference)

Takahiro Komamizu, Yuto Yamaguchi, Toshiyuki Amagasa, Hiroyuki Kitagawa

Proc. the 12th International Conference on Web Information System Engineering (WISE 2011)

Event date: October 2011

Language: English; Presentation type: Poster presentation

Country: Australia
Usability Evaluation of Faceted Search over XML Data

駒水孝裕, 天笠俊之, 北川博之

73rd National Convention of IPSJ

Event date: March 2011

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan
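Faceted Exploration of XML Data with Keyword Search Support

駒水孝裕, 天笠俊之, 北川博之

3rd Forum on Data Engineering and Information Management

Event date: February 2011 - March 2011

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan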
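Performance Evaluation of a Faceted Search System for Heterogeneous XML Data

駒水孝裕, 天笠俊之, 北川博之

72nd National Convention of IPSJ

Event date: March 2010

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan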
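Diverse Search in Faceted Search over Heterogeneous XML Data

駒水孝裕, 天笠俊之, 北川博之

2nd Forum on Data Engineering and Information Management

Event date: February 2010 - March 2010

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan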
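A Proposal of a Faceted Search Method for Heterogeneous XML Data

駒水孝裕, 天笠俊之, 北川博之

Digital Document Technical Meeting

Event date: September 2009

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan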
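Proposal of FoX, a Framework for Faceted Navigation over XML Data

駒水孝裕, 天笠俊之, 北川博之

1st Forum on Data Engineering and Information Management

Event date: March 2009

Language: Japanese; Presentation type: Oral presentation (general)

Country: Japan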
Thai legal term correction using random forests with outside-the-sentence features

Yamakoshi T.

Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation, PACLIC 2019, 2019

Scopus
SPARQL with XQuery-based filtering

Komamizu T.

CEUR Workshop Proceedings, 2020

Scopus
Muensemble: Multi-ratio undersampling-based ensemble framework for imbalanced data

Komamizu T.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2020

Scopus
Learning Interpretable Entity Representation in Linked Data

Komamizu T.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018

Scopus
Exploring Relevant Parts Between Legal Documents Using Substructure Matching

Komamizu T.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2020

Scopus
Analyzing Japanese law history through modeling multi-versioned entity

Komamizu T.

CEUR Workshop Proceedings, 2019

Scopus

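Research projects for joint research, competitive funding, etc. 1

Research Grant, Artificial Intelligence Research Promotion Foundation (public interest incorporated foundation)

January 2019 - September 2020

KAKENHI (Grants-in-Aid for Scientific Research) 13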

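Research on an Information Integration Platform for High-Precision Knowledge Extraction and Provision Using Heterogeneous Data

Grant number: 25K00161, April 2025 - March 2029

Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

駒水孝裕, 井手一郎, KASTNER MarcAurel, 石川佳治, 波多野賢治

Role: Principal investigator

Grant amount: 18,850,000 yen (direct costs: 14,500,000 yen; indirect costs: 4,350,000 yen)

Against the backdrop of open data utilization and advances in generative AI, this research integrates and structures heterogeneous multimedia data such as text, images, and video within the Linked Open Data (LOD) framework, and aims to advance information provision methods based on RAG (Retrieval-Augmented Generation). In particular, through constructing knowledge graphs from multimodal data and implementing and validating GraphRAG, it addresses the hallucination problem of generative AI and seeks to provide accurate and highly reliable information. It is an advanced study aiming to establish technologies for the integrated use of heterogeneous data.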
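The End of the Cold War and U.S. Space Policy: A Re-examination through an Interdisciplinary Approach

Grant number: 24K00227, April 2024 - March 2027

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

渡邉浩崇, 養老真一, 外山勝彦, 小塚荘一郎, 上田真二, 駒水孝裕

Role: Co-investigator

To clarify how the end of the Cold War affected U.S. space policy, this research re-examines U.S. space policy during the Cold War, around its end, and after it, through an interdisciplinary approach that centers on political and diplomatic history while drawing on international law, domestic law, the history of science and technology, and legal informatics. By thoroughly collecting, analyzing, and organizing U.S. primary sources (such as internal government documents), it compiles an overview of the history and sources of U.S. space policy. International research meetings are held as a venue for presenting, sharing, and developing the results, and the "Space Policy and Law Document Database (Linked Open Data, LOD)" built for Japanese space policy is expanded, improved, and published by adding and linking documents on U.S. space policy.
To clarify "how the end of the Cold War affected U.S. space policy," this research re-examines U.S. space policy during the Cold War, around its end, and after it, through an interdisciplinary approach that centers on political and diplomatic history while drawing on international law, domestic law, the history of science and technology, and legal informatics. This fiscal year, the first year, full research meetings were held twice (in person in June and online in March), the Japanese and English versions of the public and liaison website were updated and published in February, and the following work was carried out.
First, for the collection, analysis, and organization of U.S. primary sources, the holdings and availability of space policy and law documents in the U.S. presidential libraries and the Foreign Relations of the United States (FRUS) series were surveyed using websites and related books. In addition, materials on Japan-U.S. space policy continued to be collected at the Diplomatic Archives of the Ministry of Foreign Affairs of Japan, the National Archives of Japan, and elsewhere.
Next, regarding the construction and publication of the Space Policy and Law Document Database (Linked Open Data, LOD): as a first stage, the overall design of the database already built for Japanese space policy, which supports full-text search, in-context search (Bilingual KWIC), and parallel Japanese-English display, was reviewed; the interface (top page, etc.) was improved; the metadata of the stored documents (date, abbreviation or common name, type, author (organization, country), original language, translation language, source, description, related documents, etc.) was edited; and a standard bilingual dictionary for space terms was edited and adjusted. As a second stage, in preparation for building LOD that can visualize relationships between documents, analysis of the relationships between the stored documents was advanced.
While carrying out these activities, as results for this fiscal year, conference presentations (papers and oral presentations) were made on "the formation process of Japan's legal system for space insurance," in light of the situation in Western countries, and on "the construction of the Space Policy and Law Document Database system" to date.
A trip to the United States for source collection and research meetings had been planned for this fiscal year if possible, but scheduling could not be arranged. As preparatory work, the holdings and availability of space policy and law documents in the U.S. presidential libraries and FRUS were surveyed using websites and related books. Effort was also devoted to reviewing and improving the overall design of the database. Because these steps make it possible to visit the United States to collect sources and to add U.S. space policy documents to the database from now on, progress can be considered generally on track.
As in this fiscal year, in the next fiscal year (FY2025) the project secretariat will take the lead in holding full research meetings three times a year, in person or online, and, making use of the public and liaison website, the project will continue to work on collecting, analyzing, and organizing U.S. primary sources, constructing and publishing the database, and writing papers and giving conference presentations based on them.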
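Research on Quality-Guaranteed End-to-End Approximate Big Data Processing Technology

Grant number: 23K24850, April 2024 - March 2026

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

石川佳治, 杉浦健人, 駒水孝裕

Role: Co-investigator

This research establishes end-to-end approximate big data processing technology, which extends the approach of approximate query processing, a topic of great recent interest, to the entire big data processing workflow. By exploiting compact summary information and using an integrated model of approximate data processing throughout the big data processing pipeline, it achieves a substantial speedup over conventional big data processing and enables unified management of approximation quality across the whole system. Because it is important to control the trade-off between approximation quality and processing efficiency appropriately, the research develops quality-driven approximate data processing technology that controls the big data processing workflow so that the required approximation quality is met.
The research consists of five subthemes: (A) development of an integrated model of approximate data processing, (B) development of approximate data processing methods that incorporate machine learning, (C) development of end-to-end approximate processing technology for big data processing systems, (D) development of quality-driven approximate data processing technology, and (E) implementation and evaluation of a system prototype. In FY2024 the plan was to focus on (C) and (D). Continued work on (A) and (B), however, also produced progress.
Two of the papers accepted by academic journals concern (A) and (C); they established, in a form that spans the entire system, techniques for accelerating query processing over large databases through data summarization. The techniques create summary data called synopses; the proposal was recognized for its originality and for its superior performance. As a journal paper concerning (C) and (D), a query processing technique was developed for stream environments that approximately aggregates data while managing quality from the entry point, such as sensing, through to final processing. Concerning (B) and (D), to accelerate nearest neighbor search, a core technique of machine learning, indexing and search techniques that are approximate but very fast were developed. This result was accepted at a top-level machine learning conference.
Furthermore, as topics related to (B), work advanced on a method that applies causal inference techniques to databases to realize database queries involving more accurate inference that also takes background knowledge into account, and on a method that applies the multi-armed bandit approach from machine learning to sample selection for approximate processing. In addition, for (E), algorithms were developed to draw out the performance of database management systems (DBMSs).
This fiscal year, two journal papers were accepted on synopsis construction and query methods using synopses, which had been developed as fundamental techniques for approximate query processing. A paper on approximate stream processing was also accepted. The acceptance of a paper on approximate nearest neighbor processing at the top conference ICML is also considered noteworthy. That the issues targeted by this research have yielded results in the form of papers is highly commendable. Work on new topics is also progressing, and further development can be expected.
Because FY2025 is the final year, the project will compile its research results and work on building the prototype system described in subtheme (E). At the same time, progress has been made on new topics that emerged during this research, such as realizing more advanced database queries based on causal inference and applying the multi-armed bandit problem to approximate query processing, and the project intends to develop these further.
Based on the knowledge gained from this research, the project intends to plan a theme for obtaining a Grant-in-Aid for Scientific Research in the next fiscal year.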
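Analysis of the Visual Properties of Linguistic Information and Its Application to Integrated Multimedia Processing

Grant number: 23K24868, April 2024 - March 2026

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

井手一郎, 平山高嗣, 駒水孝裕, 川西康友, 道満恵介, KASTNER MarcAurel

Role: Co-investigator

This research proposes a methodology for relating linguistic information and visual information across the so-called "semantic gap." In contrast to prior work on methodologies for extracting features that express linguistic information from visual information, that is, on clarifying "the linguistic properties of visual information," it works on clarifying "the visual properties of linguistic information." These properties have conventionally been quantified through costly subjective evaluation experiments; here they are quantified at low cost with data-driven methods that use image generation technology. In addition, through application cases whose behavior changes according to the degree of impression, the semantic gap is narrowed on the basis of both the linguistic properties of visual information and the visual properties of linguistic information, and the effectiveness of integrated multimedia processing is demonstrated.
In this project, the various visual properties of linguistic information are analyzed separately as static impressions inherent in a phenomenon and dynamic impressions related to the motion of a phenomenon, and methods are proposed for quantifying the degree to which given linguistic information carries them. Application cases of integrated multimedia processing whose behavior changes according to the degree of these impressions are also proposed. Specifically, to clarify the visual properties of linguistic information, two tasks are addressed: [Task 1] quantifying static impressions inherent in phenomena, focusing on nouns, and [Task 2] quantifying dynamic impressions related to the motion of phenomena, focusing on verbs. The effectiveness of the proposed methodology is also demonstrated empirically in application cases of integrated multimedia processing whose behavior changes according to the degree of impression.

In FY2024, for [Task 1], the results of previous years were consolidated, and a method for estimating the static impression of words was realized through image generation that reflects the impression of unknown words. Estimation methods using large language models, which have advanced rapidly in recent years, were also verified empirically.
For [Task 2], because applying the same methodology directly to verbs proved difficult, nouns and adjectives with dynamic impressions were examined first.
[Task 1] progressed beyond the original plan, and part of the results was nominated for an award at a top conference in the multimedia field.
For [Task 2], on the other hand, because estimating dynamic impressions from still images proved difficult, it was decided to first verify the proposed methodology on adjectives, some of which also carry dynamic impressions.
In particular, for [Task 2], instead of directly estimating impressions of verbs as originally envisaged, the proposed methodology will be verified on adjectives, some of which also carry dynamic impressions.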
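Digital Transformation (DX) and Dissemination of Legal Information in Local Governments

Grant number: 23K25155, April 2024 - March 2026

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

小川泰弘, 駒水孝裕, 木村泰知

Role: Co-investigator

This research constructs, in Linked Open Data (LOD) format, ordinance histories that link ordinances with their amendment histories and summaries, and with the assembly minutes that record the deliberations on them. It also develops summarization systems for ordinances and minutes and a translation system for ordinances, and links the data they produce to the ordinance history LOD. In this research, not only the content of statutes and ordinances but also meta-information such as amendment histories and the assembly minutes showing the deliberations are collectively called legal information. In realizing the systems, the project not only applies technologies we have developed for national statutes but also tackles new problems such as extracting ordinance history information from minutes.
This research aims to advance the digital transformation (DX) of legal information and to develop systems that provide easy-to-understand legal information, in order to promote residents' participation in the activities of local governments. Specifically, ordinance histories linking ordinances, their amendment histories, and assembly minutes are constructed in LOD format, and these data are used to develop summarization and translation systems for ordinances and assembly minutes.
The main progress this fiscal year was work toward organizing the ordinance history information of the City of Nagoya. Based on data obtained from the Reiki-Base search system commissioned by the Legal Affairs Division of the Administrative DX Promotion Department of Nagoya City Hall, ordinance history information was analyzed, and detailed design is proceeding on the basis of the results. This has made it possible to link ordinance history information and assembly minutes more precisely.
The original plan was to use the ordinance outlines (あらまし) published in the City of Nagoya's official gazette, but in October 2024 the format of the gazette was changed and outlines are no longer published. It therefore became necessary to create outline data anew. In response, methods for creating and publishing new outline data are being studied, and the design is being revised.
As future research activities, the acquired history information will be analyzed further, and design changes to accommodate the new format of the Nagoya official gazette will be carried out. Activities will also be strengthened to recover the delay and accelerate construction of the ordinance history LOD.
Progress is behind the original plan. The main reason is that the principal investigator was appointed assistant to the president from FY2024. When the research was planned, this appointment was not anticipated, and it was assumed that sufficient time could be secured for research. The duties of assistant to the president, however, are heavier than expected, making it difficult to proceed with the research as planned.
The following measures will be taken to recover the delay. First, cooperation with the research team will be strengthened, each member's role reconfirmed, and work divided efficiently. Priorities among research activities will be clarified, and time management will be revised to balance the duties of assistant to the president with research. In addition, progress will be checked regularly and the plan adjusted flexibly as needed, with the aim of recovering the delay.
By carrying out these measures, the project will strive to accelerate the research and achieve the planned results.
In FY2025, analysis of the ordinance history data provided from the Reiki-Base search system used by the City of Nagoya will proceed, and a system to convert these data into LOD format will be built. First, a prototype LOD model will be created on a trial basis with part of the data, and an appropriate link structure will be examined. The model will then be extended to the ordinance data of the whole city. If the system can be fully automated, it will be applied to all data in the Reiki-Base search system, and sampling and accuracy evaluation will be carried out for the links between amending ordinances.
The original plan was to apply the approach to various local governments, but because of the delay the policy has changed to focus first on Nagoya data and, if that goes well, to extend to Aichi Prefecture. Furthermore, because the creation of ordinance summaries (outlines) has been completed, realization of the summarization system will also be considered. The performance of automatic summarization using large language models (LLMs) will be evaluated, and a method suited to the specialized style of ordinances will be selected.
For assembly data, negotiations with the City of Nagoya will proceed to investigate whether assembly minutes and bill data can be provided. If the negotiations succeed, these data will be integrated into the LOD and a method for visualizing the ordinance amendment process will be developed. This will promote the DX of legal information in local governments, with the aim of extending the results obtained in Nagoya to other local governments.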
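Research and Development of a Data Integration and Management Platform for Utilizing Heterogeneous Open Data

Grant number: 23K21726, April 2024 - March 2025

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

駒水孝裕, 井手一郎, 石川佳治, 波多野賢治

Role: Principal investigator

Grant amount: 17,160,000 yen (direct costs: 13,200,000 yen; indirect costs: 3,960,000 yen)

Through the previous fiscal year, fundamental techniques for integrating heterogeneous data were built (in particular, for the class imbalance problem in classification, which assigns labels to data, and for effectively and efficiently retrieving needed information from large amounts of data). Because issues remain, improving these further is one part of this fiscal year's research.
At the same time, building on these results, the research will advance from fundamental to applied techniques for integrating data, particularly across different modalities.
This research worked on building an integration and management platform that uses multimodal information and graph structures, with the aim of integrated use of heterogeneous open data. For caption generation and summarization of image collections, semantic integration was realized by linking scene graphs with external knowledge. Models were also proposed for applied problems that connect heterogeneous data, such as prescription drug information, recipe recommendation, and disease detection in agriculture. Furthermore, flexible and reliable information processing methods were developed using zero-shot learning and multi-task learning, contributing to the use of data across datasets.
The academic significance of this research lies in having built fundamental technology for linking heterogeneous open data across sources and integrating them semantically. In particular, it is novel in applying advanced techniques such as scene graphs, multimodal information processing, and metric learning to enable data linkage that was previously difficult. Socially, it showed the potential to promote open data use in diverse fields such as medicine, agriculture, and public administration, and to contribute to knowledge discovery, decision support, and more efficient services. In particular, use in resource-constrained environments through lightweight models and improved public services through reliable information provision are expected.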
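Research and Development of Methods for Composing On-Demand Virtual Statute Compendia (仮想六法)

Grant number: 23K18507, June 2023 - March 2025

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Challenging Research (Exploratory)

駒水孝裕, 外山勝彦, 佐野智也

Role: Principal investigator

Grant amount: 6,370,000 yen (direct costs: 4,900,000 yen; indirect costs: 1,470,000 yen)

Statutes set out the rules needed to operate society safely. However, because provisions span multiple statutes, among other reasons, it is not easy to grasp comprehensively the provisions concerning a given matter. This proposal develops a method for composing on-demand virtual statute compendia. Because compiling a collection of statutes on an arbitrary topic requires human labor, making it easy to build specialized virtual compendia will break through the current situation in which provisions on a specific matter are scattered. To realize this, the proposal puts forward Concept-Oriented Legal Information Management (COLIM) as a new methodology for managing legal information.
Toward realizing on-demand virtual statute compendia that flexibly compose and present legal information according to the user's purpose, this research addressed several problems in natural language processing, information retrieval, and data infrastructure. Specifically, it developed adaptation of language models through multi-task learning (L3Masking), R-DiP for improving retrieval accuracy for image information, a method for estimating the credibility of claims, and hashtag recommendation aimed at curbing misinformation on social media (HashtagMeta). It also built a database in which past statutes are full-text searchable, providing a foundation for understanding changes in the legal system. These results contribute substantially as fundamental technologies for building virtual compendia.
This research developed technology that enables contextual extraction and flexible composition of legal information, making an academic contribution to natural language processing and information retrieval. In particular, results such as L3Masking, R-DiP, credibility estimation, and the construction of the statute database contribute to more advanced legal information processing. At the same time, realizing on-demand virtual statute compendia is significant in that it leads to solving social problems such as visualizing the legal system, supporting legal judgments by citizens and experts, and countering misinformation.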
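Analysis of the Visual Properties of Linguistic Information and Its Application to Integrated Multimedia Processing

Grant number: 22H03612, April 2022 - March 2026

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

井手一郎, 平山高嗣, 駒水孝裕, 川西康友, 道満恵介

Role: Co-investigator

This research proposes a methodology for relating linguistic information and visual information across the so-called "semantic gap." In contrast to prior work on methodologies for extracting features that express linguistic information from visual information, that is, on clarifying "the linguistic properties of visual information," it works on clarifying "the visual properties of linguistic information." These properties have conventionally been quantified through costly subjective evaluation experiments; here they are quantified at low cost with data-driven methods that use image generation technology. In addition, through application cases whose behavior changes according to the degree of impression, the semantic gap is narrowed on the basis of both the linguistic properties of visual information and the visual properties of linguistic information, and the effectiveness of integrated multimedia processing is demonstrated.
In this project, the various visual properties of linguistic information are analyzed separately as static impressions inherent in a phenomenon and dynamic impressions related to the motion of a phenomenon, and methods are proposed for quantifying the degree to which given linguistic information carries them. Application cases of integrated multimedia processing whose behavior changes according to the degree of these impressions are also proposed. Specifically, to clarify the visual properties of linguistic information, two tasks are addressed: [Task 1] quantifying static impressions inherent in phenomena, focusing on nouns, and [Task 2] quantifying dynamic impressions related to the motion of phenomena, focusing on verbs. The effectiveness of the proposed methodology is also demonstrated empirically in application cases of integrated multimedia processing whose behavior changes according to the degree of impression.
In FY2022, for [Task 1], verification of the methodology proceeded on "sharpness/roundness," which was already under way, and, as the first stage toward a method for estimating the static impression of words, an image generation method reflecting the impression of unknown words was examined. For [Task 2], without directly building a model that estimates dynamic impressions, an image captioning method that parametrically controls the dynamic impression of generated captions was examined as [Application 1]. As [Application 2], aiming at a highlight detection method for video content based on the impressions of comments, the required dataset was constructed automatically and a method for estimating the impression of comments was developed.
For [Task 2], the original plan was to build a model that estimates dynamic impressions and then, as [Application 1], examine an image captioning method that parametrically controls the dynamic impression of generated captions. Because a way to realize [Application 1] without directly building such a model was found, work on [Task 2] was postponed for the time being and priority was given to [Application 1] instead.
First, for [Task 1], building on the image generation method reflecting the impression of unknown words, which was examined in FY2022 as the first stage toward a method for estimating the static impression of words, the second stage will examine impression estimation for the images generated in the first stage.
Furthermore, regarding the impressions latent in the pronunciation of language, which were found in the course of this work, the project will attempt to extract phonesthemes (象徴素) in Japanese by analyzing the readings of kanji.
For [Task 2], the original plan was to build a model that estimates dynamic impressions and then, as [Application 1], examine an image captioning method that parametrically controls the dynamic impression of generated captions; instead, the realization of [Application 1] without directly building such a model, the approach found in FY2022, will be pursued with priority and its effectiveness verified.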
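Digital transformation (DX) and dissemination of legal information in local governments

Research project/Project number: 22H03901, April 2022 - March 2026

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

小川泰弘, 駒水孝裕, 木村泰知

Role: Co-investigator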

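The digital transformation (DX) of local governments pursued in recent years has proceeded from the viewpoint of the administration providing services to residents. However, since residents are the principal actors in local government, this research takes the view that true local-government DX means enabling residents to easily request and realize new services. We therefore develop a system that presents information from local ordinances and assembly minutes in an easily understandable way, with the aim of realizing this.
Specifically, we develop summarization systems for ordinances and minutes and a database that organically links this information, and realize a mechanism for accessing them easily.
The aim of this research is to encourage residents' active participation in local government activities by realizing the digital transformation (DX) of legal information in local governments and developing a system that supports the dissemination of easy-to-understand legal information. Specifically, we construct ordinance histories, which link ordinances with their amendment histories and summaries as well as the assembly minutes that record the deliberations, in the form of Linked Open Data (LOD). We also develop summarization systems for ordinances and minutes and a translation system for ordinances, and link the data they produce to the ordinance-history LOD.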
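The linked structure described above can be pictured with a minimal Python sketch. It is only an illustration: the `ex:` identifiers, the predicate names (`ex:amendedBy`, `ex:summarizedBy`, `ex:discussedIn`), and the sample records are all hypothetical, since the project's actual LOD schema is not given here.

```python
from collections import defaultdict

# Hypothetical vocabulary; the project's actual LOD schema is not specified.
AMENDED_BY = "ex:amendedBy"
SUMMARIZED_BY = "ex:summarizedBy"
DISCUSSED_IN = "ex:discussedIn"

# Made-up (subject, predicate, object) triples for one ordinance.
triples = [
    ("ex:ordinance/1", AMENDED_BY, "ex:ordinance/1/rev/2"),
    ("ex:ordinance/1", SUMMARIZED_BY, "ex:summary/1"),
    ("ex:ordinance/1/rev/2", SUMMARIZED_BY, "ex:summary/2"),
    ("ex:ordinance/1/rev/2", DISCUSSED_IN, "ex:minutes/2023-02"),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under each subject."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def history(index, ordinance):
    """Follow amendedBy links, collecting the other links of each version."""
    versions, seen, current = [], set(), ordinance
    while current is not None and current not in seen:
        seen.add(current)  # guard against cyclic amendment links
        links = index.get(current, [])
        versions.append((current, [(p, o) for p, o in links if p != AMENDED_BY]))
        current = next((o for p, o in links if p == AMENDED_BY), None)
    return versions


if __name__ == "__main__":
    for version, links in history(index_by_subject(triples), "ex:ordinance/1"):
        print(version, links)
```

Keeping summaries and minutes as links on each version, rather than on the ordinance as a whole, is one way to let a reader see which deliberation led to which amendment.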
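This fiscal year, to build the ordinance summary database, we first cleaned and formatted data on the ordinances of Nagoya City and their outlines. Specifically, because the outlines of Nagoya City ordinances are published in the city's official gazette, we obtained 1,313 outlines from 950 gazette issues dated from 2004 to January 12, 2023, and created paired data with the corresponding ordinances.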
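For assembly minutes, together with the co-investigators we organized QA Lab-PoliInfo-4 at the competitive evaluation workshop NTCIR-17, which is currently under way. PoliInfo-4 comprises four tasks: Question Answering-2, Answer Verification, Stance Classification-2, and Minutes-to-Budget Linking. Among them, Question Answering-2 takes a summary of a question asked in an assembly as input and returns a summary of the corresponding answer, and we advanced the development of a summarization system for it.
The principal investigator moved from Nagoya University to Nagoya City University. Because the School of Data Science at Nagoya City University is an entirely new school, its only students are first-year undergraduates. The original plan was to conduct the research together with graduate students and fourth-year undergraduates, and deciding how to change that plan took time, which delayed progress. This was resolved by the principal investigator visiting Nagoya University as an invited faculty member, which allows the research to continue with students.
Equipment purchases were also delayed because the room size at the new institution and whether the server room could be used were unknown.
As for data creation, we obtained and analyzed summary data for Nagoya City ordinances. This was also slightly delayed, and we will next collect and analyze data from other local governments.
This fiscal year we continue building the ordinance summary database. Last fiscal year we collected summary data for Nagoya City ordinances; this fiscal year we collect data and build the database for Tokyo Metropolis, Aichi Prefecture, Hokkaido, Kyoto Prefecture, Kanagawa Prefecture, Nagano Prefecture, Fukuoka Prefecture, Nagoya City, Tottori City, Hakodate City, Kiryu City, Kyoto City, and Kobayashi City.
We also align ordinances with their summaries for each local government.
In addition, we proceed with building LOD that links the resulting ordinance and summary databases with local assembly minutes. After constructing a template for the LOD, we verify whether it fits the databases of different local governments and revise it as needed where it does not.
For automatic summarization and machine translation, we build deep-learning-based systems in addition to existing systems to improve performance.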
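Research on end-to-end approximate big data processing with quality guarantees

Research project/Project number: 22H03594, April 2022 - March 2026

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

石川佳治, 杉浦健人, 駒水孝裕

Role: Co-investigator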

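We establish end-to-end approximate big data processing, which extends approximate query processing, an approach that has attracted considerable attention in recent years, to the entire big data processing workflow. By exploiting compact summary information and using an integrated model of approximate data processing throughout the big data processing pipeline, we achieve a substantial speedup over conventional big data processing and make unified management of approximation quality across the whole system possible. Because it is important to properly control the trade-off between approximation quality and processing efficiency, this research develops quality-driven approximate data processing that controls the big data processing workflow so that the required approximation quality is met.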
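As a rough illustration of the quality-driven idea (not the project's actual system), the following Python sketch estimates a mean from a growing random sample and stops once a confidence-interval half-width falls below a requested bound. The function name `approximate_mean`, its parameters, and the normal-approximation error bound are assumptions made for this example.

```python
import math
import random
import statistics


def approximate_mean(values, target_half_width, z=1.96, batch=200, seed=0):
    """Estimate the mean of `values` from a growing random sample.

    Sampling stops once the normal-approximation confidence-interval
    half-width is at most `target_half_width`, or the data is exhausted.
    Returns (estimate, half_width, sample_size).
    """
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # a random permutation gives sampling without replacement

    sample = []
    half_width = math.inf
    for start in range(0, len(order), batch):
        sample.extend(values[i] for i in order[start:start + batch])
        if len(sample) < 2:
            continue  # a standard deviation needs at least two points
        std_err = statistics.stdev(sample) / math.sqrt(len(sample))
        half_width = z * std_err
        if half_width <= target_half_width:
            break  # requested approximation quality reached
    return statistics.fmean(sample), half_width, len(sample)


if __name__ == "__main__":
    data_rng = random.Random(42)
    data = [data_rng.gauss(100.0, 15.0) for _ in range(100_000)]
    estimate, half_width, n = approximate_mean(data, target_half_width=1.0)
    print(f"estimate={estimate:.2f} +/- {half_width:.2f} using {n} of {len(data)} rows")
```

The caller states the quality it needs and the routine decides how much data to read, which is the trade-off between approximation quality and processing cost in miniature. Checking the bound after each batch is a simplification; repeated checks weaken the nominal confidence level.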
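Research and development of a data integration and management platform for utilizing heterogeneous open data

Research project/Project number: 21H03555, April 2021 - March 2025

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

駒水孝裕, 井手一郎, 石川佳治, 波多野賢治

Role: Principal investigator

Amount allocated: 17,160,000 yen (direct costs: 13,200,000 yen; indirect costs: 3,960,000 yen)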

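This fiscal year, we (1) addressed class imbalance in attribute estimation, which is an issue in data integration, and (2) examined "phrases" in text data processing.
(1) In data integration, the same entity in different data sources may carry different information as attributes, which affects integration performance. Ideally, both entities would have the same attributes, and their identity would be judged from how well those attributes match. Real-world data, however, rarely has this property. One solution is attribute estimation by classification: the entity's class is used as an attribute, which requires building a model that classifies entities. In this classification, skew in the data prevents classification performance from improving sufficiently; this is called the class imbalance problem. We improved performance by combining the undersampling-based ensemble method we proposed last fiscal year with metric learning, a technique for transforming features.
(2) Related to (1), we focused on text classification and explored ways to improve classification performance. Text is now often handled in units called subwords. In text classification, explicitly handling phrases that express a specific meaning is known to improve performance. However, the notion of a phrase has hardly been considered in the subword setting. We examined what effect explicitly handling sequences of subwords has. Specifically, we treated high-frequency subword sequences as stopwords and showed that this contributes to improved classification performance.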
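The idea in (2) of treating high-frequency subword sequences as stopwords can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used in the project: documents are assumed to be already split into subwords (the `##` continuation marker and the toy corpus are made up), sequences are fixed-length n-grams, and frequency is measured as document frequency.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous length-n subword sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_sequences(corpus, n=2, top_k=1):
    """Pick the top_k subword n-grams by document frequency."""
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(ngrams(tokens, n)))  # count each document once
    return {seq for seq, _ in doc_freq.most_common(top_k)}


def remove_sequences(tokens, stop_sequences, n=2):
    """Drop every occurrence of a stop sequence, scanning left to right."""
    kept, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + n]) in stop_sequences:
            i += n  # skip the whole matched sequence
        else:
            kept.append(tokens[i])
            i += 1
    return kept


if __name__ == "__main__":
    # Hypothetical pre-tokenized documents.
    corpus = [
        ["the", "data", "##base", "is", "fast"],
        ["the", "data", "##base", "is", "slow"],
        ["a", "data", "##base", "index"],
    ]
    stops = frequent_sequences(corpus, n=2, top_k=1)
    filtered = [remove_sequences(doc, stops) for doc in corpus]
    print(stops, filtered)
```

The filtered token lists would then be passed to whatever classifier is in use. How many sequences to remove (`top_k`) and how long they may be (`n`) are tuning choices that this sketch leaves open.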
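Research and development of a multi-perspective analysis management system for large-scale data analysis

April 2018 - March 2021

Grant-in-Aid for Scientific Research

Role: Principal investigator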
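Research and development of a multi-perspective analysis management system for large-scale data analysis

Research project/Project number: 18K18056, April 2018 - March 2021

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Early-Career Scientists

駒水孝裕

Role: Principal investigator

Amount allocated: 4,160,000 yen (direct costs: 3,200,000 yen; indirect costs: 960,000 yen)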

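As open data is promoted, how to make use of the published data remains a challenge. This research developed techniques for efficiently searching complexly structured data and for integrating independently created data so that they can be handled across sources. These make it possible to extract the data needed to apply analysis techniques. Linking related information also enables more advanced and finer-grained analysis.
With open data and digital transformation under way, making use of digitized and opened data is important. However, data are often created by separate organizations, and obstacles to cross-organizational use remain. This research made it possible to integrate data based on the relationships among data published by different organizations, and to retrieve needed information from increasingly complex data using common search methods. These are foundational technologies for future open data utilization.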


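Courses taught (at this university) 3

Data Processing Tools Exercises

2018
Exercises in Fundamentals of Mathematical Sciences

2018
Information Engineering Experiments

2018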
