Self-supervised learning models have been shown to learn rich visual
representations without requiring human annotations. However, in many
real-world scenarios, labels are partially available, motivating a recent line
of work on semi-supervised methods inspired by self-supervised principles. In
this paper, we propose a conceptually simple yet empirically powerful approach
to turn clustering-based self-supervised methods such as SwAV or DINO into
semi-supervised learners. More precisely, we introduce a multi-task framework
merging a supervised objective using ground-truth labels and a self-supervised
objective relying on clustering assignments into a single cross-entropy loss.
This approach may be interpreted as constraining the cluster centroids to be
class prototypes. Despite its simplicity, we provide empirical evidence that our
approach is highly effective and achieves state-of-the-art performance on
CIFAR100 and ImageNet.
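
As a rough illustration of the single-loss idea described above, the following minimal sketch computes one cross-entropy per sample against a shared set of prototypes, with one prototype per class. The target is the one-hot ground-truth label when a label is available, and a cluster assignment otherwise. All names here (`semi_supervised_loss`, `prototype_logits`, the temperature values) are hypothetical, and the cluster assignment is a simple stand-in (a sharpened softmax over a second view); the actual assignment procedures of SwAV or DINO are not reproduced.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_logits(feature, prototypes):
    """Dot product between a feature vector and each prototype."""
    return [sum(f * c for f, c in zip(feature, proto)) for proto in prototypes]

def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

def semi_supervised_loss(batch, prototypes, temperature=0.1):
    """Average cross-entropy over a batch of (view_a, view_b, label_or_None).

    Assumption: there is one prototype per class, so the same output layer
    serves both the labeled and the unlabeled samples.
    """
    num_classes = len(prototypes)
    total = 0.0
    for view_a, view_b, label in batch:
        probs = softmax(prototype_logits(view_a, prototypes), temperature)
        if label is not None:
            # Labeled sample: the target is the one-hot ground-truth class.
            target = [1.0 if k == label else 0.0 for k in range(num_classes)]
        else:
            # Unlabeled sample: stand-in cluster assignment computed from the
            # other view with a sharper temperature (illustrative only).
            target = softmax(prototype_logits(view_b, prototypes), temperature / 2)
        total += cross_entropy(target, probs)
    return total / len(batch)

# Toy usage: two classes, 2-D features, one labeled and one unlabeled sample.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
batch = [
    ([0.9, 0.1], [0.8, 0.2], 0),     # labeled with class 0
    ([0.2, 0.7], [0.1, 0.9], None),  # unlabeled
]
loss = semi_supervised_loss(batch, prototypes)
```

Because labeled and unlabeled samples share the same prototypes and the same loss, the labeled samples tie each prototype to a class, which is one way to read the interpretation of cluster centroids as class prototypes.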