Multiple Context Features in Siamese Networks for Visual Object Tracking
Henrique Morimitsu
| |
Siamese networks have been successfully utilized to learn a robust matching function between pairs of images. Visual object tracking methods based on siamese networks have been gaining popularity recently due to their robustness and speed. However, existing siamese approaches are still unable to perform on par with the most accurate trackers. In this paper, we propose to extend the SiamFC tracker to extract features at multiple context and semantic levels from very deep networks. We show that our approach effectively extracts complementary features for siamese matching from different layers, which provides a significant performance boost when fused. Experimental results on VOT and OTB datasets show that our multi-context tracker is comparable to the most accurate methods, while still being faster than most of them. In particular, we outperform several other state-of-the-art siamese methods.
@inproceedings{morimitsu2018multiple,
author = {Henrique Morimitsu},
title = {Multiple Context Features in Siamese Networks for Visual Object Tracking},
booktitle = {ECCV workshops},
year = {2018}
}
|
Exploring Structure for Long-Term Tracking of Multiple Objects in Sports Videos
Henrique Morimitsu, Isabelle Bloch and Roberto M. Cesar-Jr.
| |
In this paper, we propose a novel approach for exploiting structural relations to track multiple objects that may undergo long-term occlusion and abrupt motion. We use a model-free approach that relies only on annotations given in the first frame of the video to track all the objects online, i.e., without knowledge from future frames. We initialize a probabilistic Attributed Relational Graph (ARG) from the first frame, which is incrementally updated along the video. Instead of using the structural information only to evaluate the scene, the proposed approach considers it to generate new tracking hypotheses. In this way, our method is capable of generating relevant object candidates that are used to improve or recover the track of lost objects. The proposed method is evaluated on several videos of table tennis, volleyball, and on the ACASVA dataset. The results show that our approach is very robust, flexible and able to outperform other state-of-the-art methods in sports videos that present structural patterns.
@article{morimitsu2017exploring,
author = {Henrique Morimitsu and Isabelle Bloch and Roberto M. Cesar-Jr.},
title = {Exploring structure for long-term tracking of multiple objects in sports videos},
journal = {Computer Vision and Image Understanding},
year = {2017},
volume = {159},
pages = {89--104},
doi = {10.1016/j.cviu.2016.12.003}
}
|
Keygraphs: Structured Features for Object Detection and Applications
Marcelo Hashimoto, Henrique Morimitsu, Roberto Hirata-Jr. and Roberto M. Cesar-Jr.
| |
Object detection is one of the most important problems in computer vision and it is the basis for many others, such as navigation, stereo matching and augmented reality. One of the most popular and powerful choices for performing object detection is using keypoint correspondence approaches. Several keypoint detectors and descriptors have already been proposed, but they often extract information from the neighborhood of each point individually, without considering the structure and relationships between them. Exploring structural pattern recognition techniques is a powerful way to fill this gap. In this chapter, the concept of keygraphs is explored for extracting structural features from regular keypoints. Keygraphs provide more flexibility to the description process and are more robust than traditional keypoint descriptors, such as SIFT and SURF, because they rely on structural information. The results observed in different tests show that this simplicity significantly improves the time performance, while also keeping the descriptors highly discriminative. The effectiveness of keygraphs is validated by using them to detect objects in real-time applications on a mobile phone.
@incollection{hashimoto2017keygraphs,
author = {Marcelo Hashimoto and Henrique Morimitsu and Roberto Hirata-Jr. and Roberto M. Cesar-Jr.},
title = {Keygraphs: Structured Features for Object Detection and Applications},
booktitle = {Pattern Recognition and Big Data},
editor = {Amita Pal and Sankar K. Pal},
publisher = {World Scientific},
year = {2017}
}
|
Attributed Graphs for Tracking Multiple Objects in Structured Sports Videos
Henrique Morimitsu, Roberto M. Cesar-Jr. and Isabelle Bloch
| |
In this paper, we propose a novel approach for tracking multiple objects in structured sports videos using graphs. The objects are tracked by combining particle filters and frame description with Attributed Relational Graphs. We start by learning a probabilistic structural model graph from annotated images and then use it to evaluate and correct the current tracking state. Unlike previous studies, our approach is also capable of using the learned model to generate new hypotheses of where the object is likely to be found after situations of occlusion or abrupt motion. We test the proposed method on two datasets: videos of table tennis matches extracted from YouTube and badminton matches from the ACASVA dataset. We show that all the players are successfully tracked even after they occlude each other or when there is a camera cut.
@inproceedings{morimitsu2015attributed,
title={Attributed Graphs for Tracking Multiple Objects in Structured Sports Videos},
author={Morimitsu, Henrique and Cesar-Jr., Roberto M. and Bloch, Isabelle},
booktitle={IEEE International Conference on Computer Vision Workshops},
pages={34--42},
year={2015}
}
|
A Spatio-Temporal Approach for Multiple Object Detection in Videos Using Graphs and Probability Maps
Henrique Morimitsu, Roberto M. Cesar-Jr. and Isabelle Bloch
| |
This paper presents a novel framework for object detection in videos that considers both structural and temporal information. Detection is performed by first applying low-level feature extraction techniques in each frame of the video. Then, additional robustness is obtained by considering the temporal stability of videos, using particle filters and probability maps, which encode information about the expected location of each object. Lastly, structural information of the scene is described using graphs, which allows us to further improve the results. As a practical application, we evaluate our approach on two table tennis video databases: the UCF101 table tennis shots and an in-house one. The observed results indicate that the proposed approach is robust, showing a high hit rate on both databases.
@incollection{morimitsu2014spatio,
title={A Spatio-temporal Approach for Multiple Object Detection in Videos Using Graphs and Probability Maps},
author={Morimitsu, Henrique and Cesar-Jr., Roberto M. and Bloch, Isabelle},
booktitle={Image Analysis and Recognition},
pages={421--428},
year={2014},
publisher={Springer}
}
|
A Graph-based Approach for Object Detection and Action Recognition in Videos
Henrique Morimitsu, Roberto M. Cesar-Jr. and Isabelle Bloch
| |
@inproceedings{morimitsu2014graph,
title={A graph-based approach for object detection and action recognition in videos},
author={Morimitsu, Henrique and Cesar-Jr., Roberto M. and Bloch, Isabelle},
booktitle={FEAST Workshop of International Conference on Pattern Recognition},
year={2014},
}
|
Wi-fi and Keygraphs for Localization with Cell Phone
Henrique Morimitsu, Rodrigo B. Pimentel, Marcelo Hashimoto, Roberto M. Cesar-Jr. and Roberto Hirata-Jr.
| |
We present a mobile device application that uses information from Wi-Fi signals and from the device's camera to aid localization in indoor environments. The application runs entirely on the mobile device without relying on an external server to achieve real-time performance. The camera-based location estimate is obtained by keygraph matching against previously selected sign images whose locations in the environment are known. The Wi-Fi-based location estimate is implemented using a naive Bayes classifier on the signals of existing local wireless networks. The final estimate uses the Wi-Fi result as a rough estimate of the device location while no sign is detected and, when the device gets closer to a sign, uses the camera to refine the initial Wi-Fi estimate and obtain a much more precise localization. We show results obtained with our approach in a local indoor environment.
@inproceedings{morimitsu2011wifi,
author = {Henrique Morimitsu and Rodrigo B. Pimentel and Marcelo Hashimoto and Roberto M. Cesar-Jr. and Roberto Hirata-Jr.},
title = {Wi-fi and Keygraphs for Localization with Cell Phone},
booktitle = {IEEE International Conference on Computer Vision Workshops},
pages = {92--99},
year = {2011}
}
|
Keygraphs for Sign Detection in Indoor Environments by Mobile Phones
Henrique Morimitsu, Marcelo Hashimoto, Rodrigo B. Pimentel, Roberto M. Cesar-Jr. and Roberto Hirata-Jr.
| |
We present an application for mobile phones to detect indoor signs and help in localization. Because it depends only on device capabilities, it is flexible and unconstrained. Detection is accomplished online by keygraph matching between sign images collected offline and the image from a mobile camera phone. After detection we apply a simple localization method based on a comparison between the detected sign and a dataset, consisting of images of the whole environment taken at different positions. We show the results obtained using the application in a local indoor environment.
@inproceedings{morimitsu2011keygraphs,
author = {Henrique Morimitsu and Marcelo Hashimoto and Rodrigo B. Pimentel and Roberto M. Cesar-Jr. and Roberto Hirata-Jr.},
title = {Keygraphs for Sign Detection in Indoor Environments by Mobile Phones},
booktitle = {Graph-Based Representations in Pattern Recognition},
series = {Lecture Notes in Computer Science},
editor = {Jiang, Xiaoyi and Ferrer, Miquel and Torsello, Andrea},
publisher = {Springer Berlin / Heidelberg},
pages = {315--324},
volume = {6658},
year = {2011}
}
|
Using Visual Metrics to Selecting ICA Basis for Image Compression: A Comparative Study
Patricia R. Oliveira, Henrique Morimitsu and Esteban F. Tuesta
| |
In order to obtain a good image compression result, it would be appropriate to estimate beforehand the error between a distorted image and its reference in such a process. Traditionally, the Mean-Squared Error (MSE) has been used as a standard measure for evaluating the effect of dimensionality reduction methods. More recently, other measures for assessing perceptual image quality have been proposed in the literature. In this paper, the main interest lies in a comparative study between the MSE and the Structural Similarity Index (SSIM), which uses structural similarity as a principle for measuring image quality. The basic aim of this study is to propose a procedure for ordering and selecting the transformation basis found by Independent Component Analysis (ICA), which can take one of these measures into account. The principal motivation for this idea is that, in contrast to Principal Component Analysis (PCA), ICA does not have a property that allows a natural ordering of its components (called ICs). To evaluate the efficiency of this approach, a comparative study between PCA and the ICA-based proposal is also carried out for an image dimensionality reduction application. It can be noted that the ICA method, when using the hyperbolic tangent function, could provide an efficient method to select the best ICs.
@incollection{oliveira2010using,
title={Using visual metrics to selecting ICA basis for image compression: a comparative study},
author={Oliveira, Patricia R and Morimitsu, Henrique and Tuesta, Esteban F},
booktitle={Advances in Artificial Intelligence--IBERAMIA 2010},
pages={80--89},
year={2010},
publisher={Springer}
}
|