Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning: a setting where not all the data samples are labeled. A primary issue in this scenario is the model forgetting representations of unlabeled data and overfitting to the labeled samples. We leverage the power of nearest-neighbor classifiers to nonlinearly partition the feature space and flexibly model the underlying data distribution, thanks to their non-parametric nature. This enables the model to learn a strong representation for the current task and to distill relevant information from previous tasks. We perform a thorough experimental evaluation and show that our method outperforms all existing approaches by large margins, setting a solid state of the art for continual semi-supervised learning. For example, on CIFAR-100 we surpass several competing methods even when using at least 30 times less supervision (0.8% vs. 25% of annotations). Finally, our method works well on both low- and high-resolution images and scales seamlessly to more complex datasets such as ImageNet-100. Our source code is publicly available at https://github.com/kangzhiq/NNCSL.
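To make the core idea concrete, the following is a minimal sketch of a soft nearest-neighbor classifier operating in feature space, of the kind the abstract alludes to. It is not the authors' implementation; the function name `soft_nn_predict`, the argument names, and the temperature `tau` are illustrative assumptions, and only the general non-parametric, similarity-weighted voting mechanism is taken from the text.

```python
# Minimal sketch (assumed, not the paper's code): classify queries by
# softmax-weighted voting over nearest stored feature/label pairs.
import torch
import torch.nn.functional as F

def soft_nn_predict(query_feats, support_feats, support_labels, num_classes, tau=0.1):
    """Soft nearest-neighbor classification in feature space.

    query_feats:    (Q, D) query embeddings
    support_feats:  (S, D) stored embeddings (e.g., labeled samples or a memory buffer)
    support_labels: (S,)   integer class labels of the support set
    tau:            temperature controlling how "soft" the neighborhood is
    """
    q = F.normalize(query_feats, dim=1)
    s = F.normalize(support_feats, dim=1)
    sims = q @ s.t() / tau                                     # (Q, S) scaled cosine similarities
    weights = F.softmax(sims, dim=1)                           # soft neighbor weights per query
    one_hot = F.one_hot(support_labels, num_classes).float()   # (S, C) label indicators
    return weights @ one_hot                                   # (Q, C) class probabilities

# Toy usage: 5 support points over 2 classes, 3 queries
support = torch.randn(5, 16)
labels = torch.tensor([0, 0, 1, 1, 1])
queries = torch.randn(3, 16)
probs = soft_nn_predict(queries, support, labels, num_classes=2)
print(probs.shape)  # torch.Size([3, 2])
```

Because predictions are derived directly from stored features rather than from fixed classifier weights, such a non-parametric decision rule can adapt as the feature space evolves across tasks, which is the property the abstract credits for reducing forgetting on unlabeled data.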