This is a key research topic currently pursued by our team, with work conducted by several PhD and graduate students. The topic is also widely recognized in industry, with significant practical applications. In this domain, we focus on developing advanced multi-modality models and systems for comprehensive understanding of video content, encompassing tasks such as video question answering, referring video segmentation, and video grounding. These studies often require the joint processing of multiple modalities, including video, images, textual descriptions, and speech.
Generating unobserved motion sequences from given conditions, such as textual descriptions, music, or partially observed sequences, is an important research direction with significant applications in human-computer interaction, virtual reality, and other fields. A key challenge is building a prediction system that can effectively comprehend and align motion with different input modalities, while also mitigating error accumulation during long-term motion propagation. We are dedicated to solving these problems and have developed a set of well-performing models, including a Continual Prior Compensation algorithm and an action-guided motion prediction model.
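To illustrate the error-accumulation problem mentioned above, here is a minimal, hypothetical sketch (not our actual model): an autoregressive predictor feeds each predicted pose back as its next input, so even a tiny per-step bias compounds over a long prediction horizon. The `rollout` helper and the toy one-dimensional dynamics are assumptions for illustration only.

```python
import numpy as np

def rollout(start_pose, step_fn, horizon):
    """Autoregressively predict `horizon` future poses from `start_pose`,
    feeding each prediction back as the next input."""
    poses = [start_pose]
    for _ in range(horizon):
        poses.append(step_fn(poses[-1]))
    return np.array(poses)

# Toy ground-truth dynamics: the pose drifts by +1.0 per step.
true_step = lambda pose: pose + 1.0
# An imperfect "learned" model with a small per-step bias of 0.05.
learned_step = lambda pose: pose + 1.0 + 0.05

gt = rollout(0.0, true_step, horizon=100)
pred = rollout(0.0, learned_step, horizon=100)
errors = np.abs(pred - gt)

# The per-step error is tiny (0.05), but it grows with the horizon:
# after 100 steps the accumulated error is roughly 100 * 0.05 = 5.0.
print("error after 1 step:", errors[1])
print("error after 100 steps:", errors[-1])
```

This compounding is why long-term motion prediction needs mechanisms (such as prior compensation) that correct drift during propagation rather than relying on one-step accuracy alone.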
Please refer to this for more visualization results!