What is this?
=============

This is code for object detection on images. The method is described in

@inproceedings{cinbis:hal-00873134,
  AUTHOR = {Cinbis, Ramazan Gokberk and Verbeek, Jakob and Schmid, Cordelia},
  TITLE = {{Segmentation Driven Object Detection with Fisher Vectors}},
  BOOKTITLE = {{ICCV 2013 - IEEE International Conference on Computer Vision}},
  YEAR = {2013},
  MONTH = Dec,
  PUBLISHER = {IEEE},
  KEYWORDS = {object detection, fisher vectors},
  ADDRESS = {Sydney, Australia},
  URL = {http://hal.inria.fr/hal-00873134}
}

It is applied by default to the Pascal VOC detection challenge, see

  http://pascallin.ecs.soton.ac.uk/challenges/VOC/

Structure
=========

The code is written in Matlab, with critical routines in C/C++ interfaced
through mex. It does not depend on any particular Matlab toolbox.

Input data
----------

Please download the maskfishdet_1.0_data.tgz package and unpack it in the
same directory (this creates a voc07_data subdirectory). It contains the
Pascal VOC 2007 images and the annotations in a more manageable format than
the original XML.

Data layout
-----------

All intermediate data files are written to fixed locations in the data/
subdirectory. You can symlink it to somewhere else if needed. All data files
are in .mat format, except the Fisher vectors.

Masks
-----

The masks for an image are represented as a cell array. Each cell contains a
single matrix with the weight values for the pixels in the corresponding
bounding box.

Fisher Vectors
--------------

Because of their bulkiness, and because they must also be loaded from C code,
the FVs are stored in a raw format with separate headers (flexdata format).

For the standard 1 + 4x4-cell spatial pyramid, the FVs for each candidate box
are made of 38 sub-vectors of dimension 8256 (313728 D in total):

  local      | sampled | grid | cell #  | PQ
  descriptor | from    | size | on grid |
  -------------------------------------------
  SIFT       | box     | 1x1  | 1,1     | 1
  SIFT       | box     | 4x4  | 1,1     | 1
  ...
  SIFT       | box     | 4x4  | 4,4     | 1
  SIFT       | mask    | 1x1  | 1,1     | 2
  SIFT       | mask    | 4x4  | 1,1     | 2
  ...
  SIFT       | mask    | 4x4  | 4,4     | 2
  SIFT       | im      |      |         | 1
  color      | box     | 1x1  | 1,1     | 3
  color      | mask    | 1x1  | 1,1     | 4
  color      | im      |      |         | 3
  -------------------------------------------

All local descriptors are reduced by PCA to 64 D and the GMMs have 64
components, hence 64*64 D for the derivatives wrt. mu + 64*64 D for the
derivatives wrt. the diagonal sigma + 64 D for the mixture weights, i.e.
8256 D per sub-vector.

The local descriptors can be sampled from:

- box: the detected candidate boxes
- mask: idem, mask-weighted with the UVA segmentations
- im: the full image.

Color descriptors do not use a spatial pyramid.

Each sub-vector is compressed with a product quantizer (there are 4 of them)
to 1032 bytes. For a given sub-quantizer, the box descriptors of an image are
compressed together with Blosc, which uses a fast implementation of LZ
compression. Power normalization and L2 normalization are done after
decompression.

Multiprocessing
---------------

Multiprocessing is done with (an adapted version of) the multicore package by
Markus Buehren, a.k.a. "poor man's parfor". The Matlab program offloads
computations to "slave" Matlab processes. The input/output data is written to
files in a common directory (/tmp/multicore_tmp by default, see startup.m to
adjust). A local directory is faster to read/write, but a global directory
makes it possible to run slaves on several machines.
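
For illustration, here is a minimal sketch of how a computation is typically
farmed out to the slaves. It assumes the stock
startmulticoremaster(fun, parameterCell) calling convention of the multicore
package; the names process_one_image and image_list (a cell array of image
names) are purely illustrative and not part of this code base:

  params = cell(numel(image_list), 1);
  for i = 1:numel(image_list)
      params{i} = {image_list{i}};    % one cell of arguments per task
  end
  % slaves started with run_multicore_slaves.bash (see below) pick the tasks
  % up from the shared directory configured in startup.m
  results = startmulticoremaster(@process_one_image, params);

Passing struct('disablemc', 1) as an extra argument to the adapted version
runs the tasks in a plain loop instead, which is convenient for debugging
(see Troubleshooting).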
To start a pool of slaves on the local machine, adjust the Matlab path in the
script and run

  bash run_multicore_slaves.bash

The code is often multithreaded as well, so it is not necessary to run as
many Matlab processes as there are cores. In the following, we indicate how
many slaves are useful on a 32-core machine (adjust linearly to the actual
number of cores you have available).

Installing
----------

We provide pre-compiled mex files for Linux x86_64 and Matlab R2012b
(8.0.0.783) 64-bit (glnxa64). You can re-compile with

  bash compile_all.bash

Adjust the paths to OpenCV in the script if required.

How does it run?
================

Edit dataset_definition.m to define the image files, the train/test split,
the image classes, etc. This file will be called by all subsequent functions.
By default, it contains a 10-image subset of Pascal VOC 2007, for validation
purposes. Definitions for the small subset of Pascal VOC 2007 mentioned in
the paper (860 images) and for VOC 2007 itself are also provided.

Initialization
--------------

Edit startup.m to set the paths to VLFEAT and Piotr Dollar's toolbox.

Computing the masks
-------------------

To compute the Selective Search segmentation on all images (train & test) and
generate the candidate windows, run:

  compute_segmentations

The output goes to data/segmentations and data/masks. Speed can be improved
by using 20 computation slaves.

The segmentation is done with UVA's closed-source (.p) code. The combination
of segments into masks is done in fv_candbox_superpixelsinside.m.

Unsupervised training
---------------------

To train the local descriptor PCA, the GMMs and the PQs, run:

  unsupervised_training

The output goes to data/trained. Speed is improved with 9 slaves.

Random training samples (local features or FVs from boxes) are extracted from
descriptors computed on the fly on the training images. The PCA is computed
with Piotr Dollar's toolbox. The GMM training is done with an adapted version
of Jakob Verbeek's MFA package. The k-means clustering is in-house code.

For non-toy datasets, adjust the max_sample value to Inf.

Computing the descriptors
-------------------------

To compute Fisher vectors on train + test and compress them with PQ and
Blosc, run:

  compute_descriptors

The output goes to data/fisher_vectors. Speed is improved with 3 slaves. This
is the most computationally intensive part.

For non-toy datasets, adjust the grids value (for the toy dataset, the grid
is only 1 + 2x1 cells).

Training
--------

To train the classifiers, run

  train_classifiers

It does not use multiprocessing. This is the most memory-intensive part of
the processing, because during hard-negative mining all training samples have
to be scored. Therefore, there are 3 possible ways of accessing the data,
depending on the cache_level parameter passed to prepare_descriptors.m:

- cache_level = 0: all data is loaded in Matlab. For documentation purposes.

- cache_level = 1: data is loaded and manipulated in mex. Good enough if all
  the training FVs fit in RAM. You can check this by doing

    du -sh data/fisher_vectors

  dividing the result by 2 (for train/test) and comparing with the amount of
  RAM the machine has (a small Matlab sketch of this check is given below).

- cache_level = 2: data is distributed over several machines. For this to
  work, the machines should be declared in prepare_descriptors.m. On each of
  the machines, the daemon compression/cache_server should be launched from
  the same NFS-shared directory.

It is also in this file that the components used in the descriptor can be
adjusted. In particular, for the non-toy dataset, the cell grid must be
changed (from 1 + 2x1 cells to 1 + 4x4).

The output goes to data/classifiers.
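
As a quick way to run the cache_level = 1 RAM check from within Matlab, here
is a minimal sketch (not part of this code base); it assumes the Fisher
vector files sit directly under data/fisher_vectors/:

  d = dir('data/fisher_vectors');
  fv_bytes = sum([d(~[d.isdir]).bytes]);  % total size of the train + test FVs
  train_gb = fv_bytes / 2 / 2^30;         % roughly half of the data is for training
  fprintf('training FVs: about %.1f GB, compare with the available RAM\n', train_gb);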
The "W" vector of the classifier is expressed as a 256 * D matrix (field alls_pq), so that classification scores can be looked up using the PQ-quantized descriptors. Testing and evaluation ---------------------- To test the classifiers on the testing images, run evaluate_classifiers This outputs the detection scores in AP for all classes. The same script, prepapre_descriptors, is used to access the data, so the same comments as for train_classifiers apply. Normally, for the mini dataset with 10 classes, it should output: *** testing classifier for class aeroplane Loading classifier data/classifiers/classifier_aeroplane.mat AP=8.92 % maxRecall=75.00 % *** testing classifier for class bicycle Loading classifier data/classifiers/classifier_bicycle.mat AP=0.00 % maxRecall=0.00 % For the 860-image dataset, the output should be (compare to last line in Tab 1 of the paper): *** testing classifier for class bus Loading classifier data/classifiers/classifier_bus.mat AP=46.99 % maxRecall=73.68 % *** testing classifier for class cat Loading classifier data/classifiers/classifier_cat.mat AP=53.33 % maxRecall=86.84 % *** testing classifier for class motorbike Loading classifier data/classifiers/classifier_motorbike.mat AP=56.98 % maxRecall=84.30 % *** testing classifier for class sheep Loading classifier data/classifiers/classifier_sheep.mat AP=42.52 % maxRecall=66.93 % mAP=49.95 % For VOC07, the ouput should be (compare with one-before-last line of Tab 2 of the paper): *** testing classifier for class aeroplane Loading classifier data/classifiers/classifier_aeroplane.mat AP=55.28 % maxRecall=79.30 % .... *** testing classifier for class tvmonitor Loading classifier data/classifiers/classifier_tvmonitor.mat AP=52.50 % maxRecall=84.09 % mAP=39.25 % Troubleshooting --------------- Most of the computation stages overwrite the result files if they exist. If this is not what you want, make sure to comment out the relevant code. If you want to use the Matlab debugger (or keyboard) in code called by the multicore package, add option struct('disablemc', 1) at the end of the startmulticoremaster call. This converts the call to a simple loop. The whole process produces about 20 GB of data for the 860-image subset, and 135 GB for the full VOC 2007, so make sure that there is enough disk space in the data/ subdirectory. For VOC 2007, the whole process (descriptor extraction, training and testing) takes about 4 days on a powerful computer. Be patient! Authors & dependencies ====================== Authors ------- The code was written by Ramazan Gokberk Cinbis between 2010 and 2013, and adapted for release by Matthijs Douze. All code below mytools/ and fishervec/ is from Gokberk, the code in the root directory and compression/ is mostly by Matthijs. For questions, bug reports: ramazan.cinbis@inria.fr matthijs.douze@inria.fr Dependencies ------------ Dependencies with code included: - Selective Search segmentation by Jasper Uijlings et al., see http://disi.unitn.it/~uijlings/MyHomepage/index.php#page=projects1 - liblinear, by the National Taiwan Univ. - FalstLZ by Ariya Hidayat, interfaced via Blosc, by Francesc Alted. External dependencies: - VLFEAT by A. Vedaldi for dense SIFT computations (works with version 0.9.17) - Piotr Dollar's toolbox http://vision.ucsd.edu/~pdollar/toolbox/doc/ (works with version 3.25) - OpenCV, used for simple tasks like image resizing (works with version 2.4.6). Legal ----- Distributed under the GPL. This is version 1.5 from Tue Mar 11 17:30:06 CET 2014