@article{M16DBD3CC, title = "Context-Dependent Video Data Augmentation for Human Instance Segmentation", journal = "KIPS Transactions on Software and Data Engineering", year = "2023", issn = "2287-5905", doi = "https://doi.org/10.3745/KTSDE.2023.12.5.217", author = "HyunJin Chun and JongHun Lee and InCheol Kim", keywords = "Drama Video, Human Instance Segmentation, Class Imbalance, Video Data Augmentation, Spatio-Temporal Context", abstract = "Video instance segmentation is a highly complex visual task because it requires not only object instance segmentation in each image frame of a video, but also accurate tracking of instances throughout the frame sequence. In particular, human instance segmentation in drama videos has a unique characteristic: it requires accurate tracking of several main characters interacting in various places and times. It is also characterized by a class imbalance problem, because there is a significant difference between the frequency of main characters and that of supporting or auxiliary characters in drama videos. In this paper, we introduce a new human instance dataset called MHIS, which is built upon videos of the drama Miseang, and then propose a novel video data augmentation method, CDVA, to overcome the data imbalance problem between character classes. Unlike previous video data augmentation methods, the proposed CDVA generates more realistic augmented videos by deciding the optimal location within the background clip at which to insert a target human instance, taking into account the rich spatio-temporal context embedded in videos. Therefore, the proposed augmentation method, CDVA, can improve the performance of a deep neural network model for video instance segmentation. Through both quantitative and qualitative experiments on the MHIS dataset, we demonstrate the usefulness and effectiveness of the proposed video data augmentation method." }