Minkuk Kim

Info

Ph.D. Student
at Augmented Intelligence (AMI) Laboratory
from Kyung Hee University
Contact: asdjklfgh97 at khu.ac.kr

CV / Github / LinkedIn / Google Scholar

For any suggestions, please contact me by email.

Short Bio

My name is Minkuk Kim, and I am currently a Ph.D. student at Kyung Hee University's Department of Artificial Intelligence, where I am part of the Augmented Intelligence (AMI) Laboratory. Under the supervision of Prof. Seong-Tae Kim, my research lies at the intersection of computer vision and natural language processing, with a particular focus on multi-modal learning and memory-augmented reasoning.

My recent work has explored how external memory and cross-modal retrieval can enhance dense video captioning. At CVPR 2024, I presented a novel framework that retrieves relevant textual cues from memory to improve both event localization and caption generation in untrimmed videos. At AAAI 2025, I further introduced a hierarchical compact memory structure inspired by human cognition, which organizes memory information across multiple levels of abstraction. This approach enables both improved semantic recall and efficient retrieval in long and complex video contexts. Through these studies, I aim to bridge episodic memory modeling with multi-modal sequence understanding in video-language tasks.

In the short term, I am particularly interested in advancing multi-modal AI systems to handle long-form, real-world video content—such as hour-long narratives—by integrating structured memory mechanisms with scalable temporal reasoning.

My goal is to continue advancing the field of AI by developing innovative solutions that bridge the gap between visual and textual data, ultimately contributing to the creation of intelligent systems that understand and interact with the world more naturally and effectively.

Recent News

Dec. 2024 One paper on Vision-Language Models got accepted to AAAI 2025!
Aug. 2024 Completed my M.S. program and started as a Ph.D. student!
Jun. 2024 One paper got accepted to ICIP Workshop 2024!
Feb. 2024 One paper on Vision-Language Models got accepted to CVPR 2024!
Nov. 2023 One paper got accepted to Image and Vision Computing!