Tamil Character Recognition Using Pytesseract and NLTK

L Suriya Kala

L Suriya Kala, Research Scholar, Department of Computer Science, Mother Teresa Women's University, Kodaikanal, Tamil Nadu, India

Keywords: Pytesseract, NLTK, Machine Learning, Artificial Intelligence

Abstract

India is a multilingual, multiscript nation with more than 18 languages and 10 distinct major scripts. Little research has been done on recognizing handwritten characters in these Indian scripts. Tamil, an official and widely used script in southern India, Singapore, Malaysia, and Sri Lanka, has a large character set that includes many compound characters. Large collections of images offer one way to gather useful data: according to Mary Meeker's annual Internet Trends report, in 2014 people uploaded an average of 1.8 billion digital images every day, or 657 billion photos per year. These images can be collected, stored, processed, and analyzed to extract useful information. This paper proposes an approach to Tamil character recognition using Pytesseract and NLTK.