What is this?
=============

This is code for object detection on images. The method is described in

@inproceedings{cinbis:hal-00873134,
  AUTHOR = {Cinbis, Ramazan Gokberk and Verbeek, Jakob and Schmid, Cordelia},
  TITLE = {{Segmentation Driven Object Detection with Fisher Vectors}},
  BOOKTITLE = {{ICCV 2013 - IEEE International Conference on Computer Vision}},
  YEAR = {2013},
  MONTH = Dec,
  PUBLISHER = {IEEE},
  KEYWORDS = {object detection, fisher vectors},
  ADDRESS = {Sydney, Australia},
  URL = {http://hal.inria.fr/hal-00873134}
}

It is applied by default to the Pascal VOC detection challenge, see

  http://pascallin.ecs.soton.ac.uk/challenges/VOC/

Structure
=========

The code is written in Matlab, with critical routines in C/C++ interfaced
through mex. It does not depend on any particular Matlab toolbox.

Input data
----------

Please download the maskfishdet_1.0_data.tgz package and unpack it in the
same directory (this creates a voc07_data subdirectory). It contains the
Pascal VOC 2007 images and the annotations in a more manageable format than
the original XML.

Data layout
-----------

All intermediate data files are written to fixed locations in the data/
subdirectory. You can symlink it to somewhere else if needed. All data files
are in .mat format, except the Fisher vectors.

Masks
-----

The masks for an image are represented as a cell array. Each cell contains a
single matrix with the weight values for the pixels in the corresponding
bounding box.

Fisher Vectors
--------------

Because of their bulkiness, and because they must also be loaded from C code,
the FVs are stored in a raw format with separate headers (flexdata format).

For the standard 1 + 4x4-cell spatial pyramid, the FVs for each candidate box
are made of 38 sub-vectors of dimension 8256 (313728 D in total):

  local      | sampled | grid | cell #  | PQ
  descriptor | from    | size | on grid |
  -------------------------------------------
  SIFT       | box     | 1x1  | 1,1     | 1
  SIFT       | box     | 4x4  | 1,1     | 1
  ...
  SIFT       | box     | 4x4  | 4,4     | 1
  SIFT       | mask    | 1x1  | 1,1     | 2
  SIFT       | mask    | 4x4  | 1,1     | 2
  ...
  SIFT       | mask    | 4x4  | 4,4     | 2
  SIFT       | im      |      |         | 1
  color      | box     | 1x1  | 1,1     | 3
  color      | mask    | 1x1  | 1,1     | 4
  color      | im      |      |         | 3
  -------------------------------------------

All local descriptors are reduced by PCA to 64 D and the GMMs have 64
components, hence 64*64 D for the derivatives wrt. mu + 64*64 D for the
derivatives wrt. the diagonal sigma + 64 D for the mixture weights, i.e.
8256 D per sub-vector.

The local descriptors can be sampled from:

- box: the detected candidate boxes
- mask: idem, mask-weighted with the UVA segmentations
- im: the full image.

Color descriptors do not use a spatial pyramid.

Each sub-vector is compressed with a product quantizer (there are 4 of them)
to 1032 bytes. For a given sub-quantizer, the box descriptors of an image are
compressed together with Blosc, which uses a fast implementation of LZ
compression. Power normalization and L2 normalization are done after
decompression.

Multiprocessing
---------------

Multiprocessing is done with (an adapted version of) the multicore package by
Markus Buehren, a.k.a. "poor man's parfor". The Matlab program offloads
computations to "slave" Matlab processes. The input/output data is written to
files in a common directory (/tmp/multicore_tmp by default, see startup.m to
adjust). A local directory is faster to read/write, but a global directory
makes it possible to run slaves on several machines.
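
For illustration, here is a minimal sketch of how a computation is typically
farmed out to the slaves. It assumes the stock
startmulticoremaster(fun, parameterCell) calling convention of the multicore
package; the names process_one_image and image_list (a cell array of image
names) are purely illustrative and not part of this code base:

  params = cell(numel(image_list), 1);
  for i = 1:numel(image_list)
      params{i} = {image_list{i}};    % one cell of arguments per task
  end
  % slaves started with run_multicore_slaves.bash (see below) pick the tasks
  % up from the shared directory configured in startup.m
  results = startmulticoremaster(@process_one_image, params);

Passing struct('disablemc', 1) as an extra argument to the adapted version
runs the tasks in a plain loop instead, which is convenient for debugging
(see Troubleshooting).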
To start a pool of slaves on the local machine, adjust the Matlab path in the
script and run

  bash run_multicore_slaves.bash

The code is often multithreaded as well, so it is not necessary to run as
many Matlab processes as there are cores. In the following, we indicate how
many slaves are useful on a 32-core machine (adjust linearly to the actual
number of cores you have available).

Installing
----------

We provide pre-compiled mex files for Linux x86_64 and Matlab R2012b
(8.0.0.783) 64-bit (glnxa64). You can re-compile with

  bash compile_all.bash

Adjust the paths to OpenCV in the script if required.

How does it run?
================

Edit dataset_definition.m to define the image files, the train/test split,
the image classes, etc. This file will be called by all subsequent functions.
By default, it contains a 10-image subset of Pascal VOC 2007, for validation
purposes. Definitions for the small subset of Pascal VOC 2007 mentioned in
the paper (860 images) and for VOC 2007 itself are also provided.

Initialization
--------------

Edit startup.m to set the paths to VLFEAT and Piotr Dollar's toolbox.

Computing the masks
-------------------

To compute the Selective Search segmentation on all images (train & test) and
generate the candidate windows, run:

  compute_segmentations

The output goes to data/segmentations and data/masks. Speed can be improved
by using 20 computation slaves.

The segmentation is done with UVA's closed-source (.p) code. The combination
of segments into masks is done in fv_candbox_superpixelsinside.m.

Unsupervised training
---------------------

To train the local descriptor PCA, the GMMs and the PQs, run:

  unsupervised_training

The output goes to data/trained. Speed is improved with 9 slaves.

Random training samples (local features or FVs from boxes) are extracted from
descriptors computed on the fly on the training images. The PCA is computed
with Piotr Dollar's toolbox. The GMM training is done with an adapted version
of Jakob Verbeek's MFA package. The k-means clustering is in-house code.

For non-toy datasets, adjust the max_sample value to Inf.

Computing the descriptors
-------------------------

To compute Fisher vectors on train + test and compress them with PQ and
Blosc, run:

  compute_descriptors

The output goes to data/fisher_vectors. Speed is improved with 3 slaves. This
is the most computationally intensive part.

For non-toy datasets, adjust the grids value (for the toy dataset, the grid
is only 1 + 2x1 cells).

Training
--------

To train the classifiers, run

  train_classifiers

It does not use multiprocessing. This is the most memory-intensive part of
the processing, because during hard-negative mining all training samples have
to be scored. Therefore, there are 3 possible ways of accessing the data,
depending on the cache_level parameter passed to prepare_descriptors.m:

- cache_level = 0: all data is loaded in Matlab. For documentation purposes.

- cache_level = 1: data is loaded and manipulated in mex. Good enough if all
  the training FVs fit in RAM. You can check this by doing

    du -sh data/fisher_vectors

  dividing the result by 2 (for train/test) and comparing with the amount of
  RAM the machine has (a small Matlab sketch of this check is given below).

- cache_level = 2: data is distributed over several machines. For this to
  work, the machines should be declared in prepare_descriptors.m. On each of
  the machines, the daemon compression/cache_server should be launched from
  the same NFS-shared directory.

It is also in this file that the components used in the descriptor can be
adjusted. In particular, for the non-toy dataset, the cell grid must be
changed (from 1 + 2x1 cells to 1 + 4x4).

The output goes to data/classifiers.
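
As a quick way to run the cache_level = 1 RAM check from within Matlab, here
is a minimal sketch (not part of this code base); it assumes the Fisher
vector files sit directly under data/fisher_vectors/:

  d = dir('data/fisher_vectors');
  fv_bytes = sum([d(~[d.isdir]).bytes]);  % total size of the train + test FVs
  train_gb = fv_bytes / 2 / 2^30;         % roughly half of the data is for training
  fprintf('training FVs: about %.1f GB, compare with the available RAM\n', train_gb);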
The "W" vector of the classifier is expressed as a 256 * D matrix (field alls_pq), so that classification scores can be looked up using the PQ-quantized descriptors. Testing and evaluation ---------------------- To test the classifiers on the testing images, run evaluate_classifiers This outputs the detection scores in AP for all classes. The same script, prepapre_descriptors, is used to access the data, so the same comments as for train_classifiers apply. Normally, for the mini dataset with 10 classes, it should output: *** testing classifier for class aeroplane Loading classifier data/classifiers/classifier_aeroplane.mat AP=8.92 % maxRecall=75.00 % *** testing classifier for class bicycle Loading classifier data/classifiers/classifier_bicycle.mat AP=0.00 % maxRecall=0.00 % For the 860-image dataset, the output should be (compare to last line in Tab 1 of the paper): *** testing classifier for class bus Loading classifier data/classifiers/classifier_bus.mat AP=46.99 % maxRecall=73.68 % *** testing classifier for class cat Loading classifier data/classifiers/classifier_cat.mat AP=53.33 % maxRecall=86.84 % *** testing classifier for class motorbike Loading classifier data/classifiers/classifier_motorbike.mat AP=56.98 % maxRecall=84.30 % *** testing classifier for class sheep Loading classifier data/classifiers/classifier_sheep.mat AP=42.52 % maxRecall=66.93 % mAP=49.95 % For VOC07, the ouput should be (compare with one-before-last line of Tab 2 of the paper): *** testing classifier for class aeroplane Loading classifier data/classifiers/classifier_aeroplane.mat AP=55.28 % maxRecall=79.30 % .... *** testing classifier for class tvmonitor Loading classifier data/classifiers/classifier_tvmonitor.mat AP=52.50 % maxRecall=84.09 % mAP=39.25 % Troubleshooting --------------- Most of the computation stages overwrite the result files if they exist. If this is not what you want, make sure to comment out the relevant code. If you want to use the Matlab debugger (or keyboard) in code called by the multicore package, add option struct('disablemc', 1) at the end of the startmulticoremaster call. This converts the call to a simple loop. The whole process produces about 20 GB of data for the 860-image subset, and 135 GB for the full VOC 2007, so make sure that there is enough disk space in the data/ subdirectory. For VOC 2007, the whole process (descriptor extraction, training and testing) takes about 4 days on a powerful computer. Be patient! Authors & dependencies ====================== Authors ------- The code was written by Ramazan Gokberk Cinbis between 2010 and 2013, and adapted for release by Matthijs Douze. All code below mytools/ and fishervec/ is from Gokberk, the code in the root directory and compression/ is mostly by Matthijs. For questions, bug reports: ramazan.cinbis@inria.fr matthijs.douze@inria.fr Dependencies ------------ Dependencies with code included: - Selective Search segmentation by Jasper Uijlings et al., see http://disi.unitn.it/~uijlings/MyHomepage/index.php#page=projects1 - liblinear, by the National Taiwan Univ. - FalstLZ by Ariya Hidayat, interfaced via Blosc, by Francesc Alted. External dependencies: - VLFEAT by A. Vedaldi for dense SIFT computations (works with version 0.9.17) - Piotr Dollar's toolbox http://vision.ucsd.edu/~pdollar/toolbox/doc/ (works with version 3.25) - OpenCV, used for simple tasks like image resizing (works with version 2.4.6). Legal ----- Distributed under the GPL. This is version 1.5 from Tue Mar 11 17:30:06 CET 2014