Color and strokes are the salient features of text regions in an image. In this
work, we use both these features as cues, and introduce a novel energy function
to formulate the text binarization problem. The minimum of this energy function
corresponds to the optimal binarization. We minimize the energy function with
an iterative graph cut based algorithm. Our model is robust to variations in
foreground and background as we learn Gaussian mixture models for color and
strokes in each iteration of the graph cut. We show results on word images from
the challenging ICDAR 2003/2011, born-digital image and street view text
datasets, as well as on full scene images containing text from the ICDAR 2013
dataset, and compare our performance with state-of-the-art methods. Our
approach shows significant improvements under a variety of measures commonly
used to assess text binarization schemes. In addition, our method adapts to
diverse document images, such as text in videos and handwritten text images.