Although instance segmentation methods have improved considerably, the dominant
paradigm is to rely on fully-annotated training images, which are tedious to
obtain. To alleviate this reliance and boost results, semi-supervised
approaches leverage unlabeled data as an additional training signal that limits
overfitting to the labeled samples. In this context, we present novel design
choices to significantly improve teacher-student distillation models. In
particular, we (i) improve the distillation approach by introducing a novel
"guided burn-in" stage, and (ii) evaluate different instance segmentation
architectures, as well as backbone networks and pre-training strategies.
Contrary to previous work, which uses only supervised data for the burn-in
period of the student model, we also use guidance from the teacher model to
exploit unlabeled data during the burn-in period. Our improved distillation
approach leads to substantial improvements over previous state-of-the-art
results. For example, on the Cityscapes dataset we improve mask-AP from 23.7 to
33.9 when using labels for 10\% of images, and on the COCO dataset we improve
mask-AP from 18.3 to 34.1 when using labels for only 1\% of the training data.
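
Below is a minimal, illustrative sketch of the guided burn-in idea described above: during the burn-in phase, the student is trained not only on labeled data but also on teacher pseudo-labels for unlabeled images. All class, function, and loss names are hypothetical and simplified (a per-pixel classifier stands in for an instance segmentation model, and the teacher is tracked by EMA); the paper's actual training procedure may differ in its details.

\begin{verbatim}
# Hypothetical sketch of a "guided burn-in" stage for teacher-student
# distillation. Names, models, and losses are illustrative only.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySegModel(nn.Module):
    """Stand-in for an instance segmentation network (illustrative)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 2, kernel_size=3, padding=1)

    def forward(self, x):
        # Per-pixel logits as a stand-in for mask predictions.
        return self.net(x)

def ema_update(teacher, student, decay=0.999):
    """Exponential moving average of student weights into the teacher."""
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

student = TinySegModel()
teacher = copy.deepcopy(student)   # teacher initialized from the student
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(student.parameters(), lr=0.01)

# Dummy batches: one labeled image with per-pixel targets, one unlabeled image.
labeled_img = torch.randn(1, 3, 32, 32)
labels = torch.randint(0, 2, (1, 32, 32))
unlabeled_img = torch.randn(1, 3, 32, 32)

for step in range(100):  # burn-in loop (length is illustrative)
    # Supervised loss on labeled data, as in a standard burn-in stage.
    sup_loss = F.cross_entropy(student(labeled_img), labels)

    # "Guidance": teacher pseudo-labels on unlabeled data are used
    # already during burn-in, rather than only after it.
    with torch.no_grad():
        pseudo = teacher(unlabeled_img).argmax(dim=1)
    unsup_loss = F.cross_entropy(student(unlabeled_img), pseudo)

    loss = sup_loss + unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    ema_update(teacher, student)  # teacher slowly tracks the student
\end{verbatim}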