Our group's research focuses on human identification and activity recognition. We are also interested in developing large-scale machine learning methods for processing large-scale data.
Person re-identification (re-id) aims to match images of the same person captured at different places and times across non-overlapping camera views in a visual surveillance system.
Relative Distance Comparison: At an early stage, we proposed a relative distance comparison model, a soft discriminant model that alleviates the over-fitting caused by large intra-class appearance variations across non-overlapping camera views.
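The idea of relative distance comparison can be illustrated with a small sketch. It assumes a squared Euclidean distance and a logistic soft penalty over a triplet of feature vectors; the function names are hypothetical, and this is not necessarily the exact formulation used in our model.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def relative_comparison_loss(anchor, positive, negative):
    """Soft penalty log(1 + exp(d(anchor, positive) - d(anchor, negative))).

    The penalty is close to zero when the anchor is much closer to the image
    of the same person (positive) than to the image of a different person
    (negative), and grows roughly linearly when the order is violated.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    # Numerically stable evaluation of log(1 + exp(gap)).
    return max(gap, 0.0) + math.log1p(math.exp(-abs(gap)))


if __name__ == "__main__":
    probe = [0.0, 0.0]
    same_person = [1.0, 0.0]
    other_person = [3.0, 0.0]
    print(relative_comparison_loss(probe, same_person, other_person))
    print(relative_comparison_loss(probe, other_person, same_person))
```

Because the penalty depends only on the difference between the two distances, no absolute distance threshold has to be fixed, which is what makes the comparison "relative".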
Open-world Re-ID: Based on the relative comparison model, we further extended it to open-world person re-identification. In this work, we assume that only a short list (i.e., a watch list) of people is of concern for tracking in a camera network, while all others are imposters to the re-id system. We model this with a transfer local relative distance comparison, which can use a source dataset to assist re-id when only limited target data are available for the people on the watch list.
In addition to the open-world problem, we also consider a general transfer problem for re-id in order to reduce the labeling cost of deploying the system in a new scenario.
Modeling view-specific transforms: In our research we find that images of the same group of people follow different distributions in different views. Modeling view-specific transforms helps to fully exploit the feature characteristics of each view and to learn a common feature space that aligns the data distributions of the different views. We have two recent works from this perspective. In TCSVT, we presented an asymmetric distance model for different views. In IJCAI 2005, we further proposed a more effective and easier-to-implement view-specific domain adaptation method.
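As a rough illustration of view-specific transforms, the cross-view distance can be written with a separate projection for each camera view. The symbols $U_a$ and $U_b$ (projections for views $a$ and $b$) and $x_i^a$, $x_j^b$ (features of person $i$ in view $a$ and person $j$ in view $b$) are notation assumed here for illustration, not taken from the papers:

```latex
d\left(x_i^a, x_j^b\right) = \left\| U_a^{\top} x_i^a - U_b^{\top} x_j^b \right\|_2
```

A conventional symmetric model corresponds to the special case $U_a = U_b$, in which both views share one transform.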
Partial Re-ID: Recently, we have also considered more challenging re-id problems, including the partial re-id problem and the low-resolution problem. In particular, partial re-id addresses the partial observation of a person in a real-world crowded scenario.
Occlusion & Low resolution: We also consider the low-resolution problem alongside partial re-id, where a person is only partially observed in a real-world crowded scenario.
Deep Re-ID: In WACV, we proposed a deep fusion neural network that lets the deep network learn features complementary to hand-crafted features.
Context: We have developed transfer context learning for object and human detection. Context is critical for minimising ambiguity in object detection. In this work, we propose a novel context modelling framework that needs no prior scene segmentation or context annotation. This is achieved by exploring a new polar geometric histogram descriptor for context representation. To quantify context, we formulate a new context risk function and a maximum margin context (MMC) model that minimises it. Crucially, the usefulness and goodness of contextual information are evaluated directly and explicitly through a discriminant context inference method and a context confidence function, so that only reliable contextual information relevant to object detection is utilised.
Interestingly, we have also explored group context for assisting person re-identification. In a crowded public space, people often walk in groups, either with people they know or with strangers. Associating a group of people over space and time can help in understanding individuals' behaviours, as it provides vital visual context for matching individuals within the group. Although seemingly an "easier" task than person matching, this problem is in fact very challenging, because a group of people can be highly non-rigid, with changing relative positions of people within the group and severe self-occlusions. We address, for the first time, the problem of matching/associating groups of people over large space and time captured in multiple non-overlapping camera views. Specifically, we propose a novel people group representation and a group matching algorithm. The former addresses changes in the relative positions of people in a group, and the latter deals with variations in illumination and viewpoint across camera views. We also demonstrate a notable improvement in individual person matching by utilising the group description as visual context.
For this topic, we are interested in interaction recognition, either between humans and objects or among humans.
Human-Object-Interaction (HOI): Our first work presented an exemplar-based HOI model that makes the recognition system tolerant to inaccurate object detection.
Later, to reduce the impact of lighting and to exploit heterogeneous features for more robust recognition, we developed an RGB-D based HOI method built on joint learning of heterogeneous features.
Collective Activity Recognition: For learning interaction between people in a group, we presented a graph-based interaction learning model.
Discriminant subspace learning: How to select principal components in PCA+LDA has been a matter of debate. This work shows that small principal components (those corresponding to small eigenvalues) are useful and should be carefully selected in PCA+LDA. A foundation for principal component selection in LDA is established, and a new genetic algorithm (GA) technique is used for the implementation.
From 2003 to 2008, many works showed that algorithms with (2D) matrix-based representations perform better than traditional (1D) vector-based ones. In particular, 2D-LDA was widely reported to outperform 1D-LDA. However, is matrix-based linear discriminant analysis always superior, and when would 1D-LDA be better? This work gives a theoretical analysis and an experimental comparison of 1D-LDA and 2D-LDA. Contrary to existing views, we find no convincing evidence that 2D-LDA always outperforms 1D-LDA when the number of training samples per class is small or when the number of discriminant features used is small.
In deriving Fisher's LDA formulation, it is assumed that the class empirical mean equals its expectation. However, this may not hold in practice, and the problem has rarely been discussed. From the "perturbation" perspective, we develop a new algorithm, called perturbation LDA (P-LDA), in which perturbation random vectors are introduced to learn the effect of the difference between the class empirical mean and its expectation in the Fisher criterion.
Sparse Feature Learning:
NMF, a matrix factorization with two-sided non-negativity constraints, is popular for extracting sparse features. However, why should non-negativity be imposed on both the components and the coefficients? What happens if one of the constraints is released? In this work, we find that releasing the non-negativity constraint on the coefficient term in NMF helps extract components/features that are equally sparse or much sparser, and more reconstructive, than those of two-sided non-negative matrix factorization techniques. The exact 17 local components of the Swimmer data set are successfully extracted for the first time (to the best of our knowledge).
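The difference between the two settings can be stated compactly. Assuming a data matrix $X$, a component matrix $W$, and a coefficient matrix $H$ (notation introduced here for illustration), standard two-sided NMF and the one-sided relaxation differ only in the constraint on $H$. Any additional regularization used in the actual model is omitted from this sketch:

```latex
\begin{align}
\text{Two-sided NMF:} \quad & \min_{W,H} \; \| X - WH \|_F^2 \quad \text{s.t. } W \ge 0,\; H \ge 0 \\
\text{One-sided relaxation:} \quad & \min_{W,H} \; \| X - WH \|_F^2 \quad \text{s.t. } W \ge 0
\end{align}
```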
We present a sparse correntropy framework for computing robust sparse representations of face images for recognition. Compared with the state-of-the-art l1-norm-based sparse representation classifier (SRC), which assumes that noise also has a sparse representation, our sparse algorithm is based on the maximum correntropy criterion, which is much less sensitive to outliers. Within the proposed correntropy framework, several new methods have been developed for face recognition and object recognition.
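To show why a correntropy-based criterion is less sensitive to outliers than a squared-error criterion, here is a minimal sketch. It assumes the usual Gaussian-kernel sample estimate of correntropy; the function names and the kernel width `sigma` are illustrative choices, and the sketch does not implement our sparse representation algorithm.

```python
import math


def correntropy(a, b, sigma=1.0):
    """Sample correntropy of two vectors with an unnormalised Gaussian kernel.

    Each coordinate contributes a value in [0, 1], so a single gross error
    can lower the total by at most 1 / len(a).
    """
    total = sum(math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2)) for x, y in zip(a, b))
    return total / len(a)


def mean_squared_error(a, b):
    """Mean squared error, where a single gross error can dominate the total."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    clean = [1.0, 2.0, 3.0]
    corrupted = [1.0, 2.0, 103.0]  # one coordinate is a gross outlier
    print("correntropy:", correntropy(clean, clean), correntropy(clean, corrupted))
    print("MSE:", mean_squared_error(clean, clean), mean_squared_error(clean, corrupted))
```

Maximising correntropy therefore rewards agreement on the uncorrupted coordinates, while the influence of the outlier saturates instead of growing quadratically.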
VIS-NIR Face Recognition: Visual versus near infrared (VIS-NIR) face image matching uses a NIR face image as the probe and conventional VIS face images as enrollment. Existing VIS-NIR techniques assume that, during classifier learning, the VIS images of each target person have NIR counterparts. However, corresponding VIS-NIR image pairs of the same people are not always available. To address this problem, we propose a transductive method named transductive heterogeneous face matching (THFM), which adapts the VIS-NIR matching learned from the available image pairs to all people in the target set. In addition, we propose a simple feature representation for effective VIS-NIR matching, computed in three steps, namely Log-DoG filtering, local encoding, and uniform feature normalization, to reduce heterogeneities between VIS and NIR images. The transductive approach can reduce the domain difference due to heterogeneous data and learn the discriminative model for target people simultaneously.
KPCA is a promising technique for nonlinear processing of images. A main problem in this approach is how to learn the pre-image of a kernel feature point in the input image space, and this estimation is always ill-posed. We present a regularized method and introduce weakly supervised learning to alleviate the ill-posed estimation problem.
In solving the illumination problem for face recognition, most (if not all) existing methods either use only extracted small-scale features and discard large-scale features, or perform normalization on the whole image. In the latter case, small-scale features may be distorted when the large-scale features are modified. In this work, we argue that the large-scale features of a face image are important and contain useful information for face recognition as well as for the visual quality of the normalized image. We suggest that illumination normalization should mainly be performed on the large-scale features of a face image rather than on the whole image. A new framework is therefore developed.
Nowadays, we have more data to process. Recently, our group has been working on 1) online classifiers; 2) fast search; 3) large-scale clustering.
For online classifiers, we developed a locality-sensitive online learning method that jointly learns local hyperplanes on streaming data.
For fast search, we focus on developing hashing models, which search for similar items in Hamming space. Our research ranges from single-modal hashing to cross-modal hashing.
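Searching in Hamming space can be sketched as follows. The example assumes the binary codes have already been produced by some hash model and are stored as Python integers; `hamming_distance` and `nearest_codes` are hypothetical helper names, not part of our released code.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(code_a ^ code_b).count("1")


def nearest_codes(query: int, database: list, k: int = 1) -> list:
    """Indices of the k database codes closest to the query in Hamming distance.

    This is a plain linear scan; ties keep their original database order
    because Python's sort is stable.
    """
    order = sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))
    return order[:k]


if __name__ == "__main__":
    database = [0b1010, 0b1111, 0b0000]
    query = 0b1011
    print(nearest_codes(query, database, k=2))
```

The XOR-and-count step is what makes hashing attractive for fast search: comparing two short binary codes is far cheaper than comparing two high-dimensional real-valued features.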
We are also interested in large-scale clustering, for which we have developed Euler clustering and fast competitive learning.
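As a hedged illustration of the idea behind Euler clustering, the sketch below maps real-valued features onto the complex plane using the Euler representation found in the literature, z = exp(i * alpha * pi * x) / sqrt(2), and measures distances in that explicit space. The parameter `alpha`, the function names, and the assumption that features are scaled to [0, 1] are ours for illustration; the clustering step itself is omitted.

```python
import cmath
import math


def euler_map(x, alpha=1.0):
    """Map each real feature v to the complex number exp(i*alpha*pi*v) / sqrt(2)."""
    return [cmath.exp(1j * alpha * math.pi * v) / math.sqrt(2.0) for v in x]


def squared_distance(z1, z2):
    """Squared Euclidean distance between two complex vectors."""
    return sum(abs(a - b) ** 2 for a, b in zip(z1, z2))


if __name__ == "__main__":
    x = [0.0, 0.2, 0.9]
    y = [1.0, 0.2, 0.1]
    mapped = squared_distance(euler_map(x), euler_map(y))
    # Closed form of the same quantity: each coordinate contributes
    # 1 - cos(alpha * pi * (x - y)), which is bounded by 2.
    closed_form = sum(1.0 - math.cos(math.pi * (a - b)) for a, b in zip(x, y))
    print(mapped, closed_form)
```

Since the mapped data have the same dimension as the input, distances can be computed on explicit vectors, and each coordinate's contribution is bounded, so a few large deviations cannot dominate the distance.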
Jian-sheng Wu (student), Wei-Shi Zheng*, Jian-huang Lai, "Euler Clustering," International Joint Conference on Artificial Intelligence (IJCAI), 2013.
Jian-sheng Wu (student), Wei-Shi Zheng*, Jian-huang Lai, "Approximate Kernel Competitive Learning," Neural Networks, pp. 117-132, 2015.