Kannada

The experts below are selected from a list of 7,296 experts worldwide ranked by the ideXlab platform.

Deepa Gupta - One of the best experts on this subject based on the ideXlab platform.

  • Regular Expression Tagger for Kannada Parts of Speech Tagging
    Proceedings of the Second International Conference on Computational Intelligence and Informatics, 2018
    Co-Authors: K. M. Shiva Kumar, Deepa Gupta
    Abstract:

    Part-of-speech (POS) tagging for Indian languages in general, and Kannada in particular, is not a widely explored territory. There have been many attempts at developing a good POS tagger for Kannada, but the morphological complexity of the language makes it a hard nut to crack. Some of the best taggers available for Indian languages employ hybrids of machine learning or stochastic methods and linguistic knowledge. Though the results achieved using such methods are good, their practicability for other inflective Indian languages is reduced by their heavy dependence on linguistic knowledge. Even though taggers can achieve very good results when provided with good morphological information, the cost of creating these resources renders such methods impractical. In this paper, we present a regular-expression part-of-speech tagger for Kannada. We apply 100 patterns incorporating the TDIL tags for Kannada and test for accuracy against a manually tagged corpus.

  • Kannada speech to text conversion using CMU Sphinx
    2016 International Conference on Inventive Computation Technologies (ICICT), 2016
    Co-Authors: K. M. Shivakumar, K G Aravind, T V Anoop, Deepa Gupta
    Abstract:

    This paper investigates the complex problem of speech-to-text conversion for the Kannada language. We propose a novel Kannada Automated Speech to Text Conversion (ASTC) system. We train and test the speech processing system using the CMU Sphinx framework. CMU Sphinx is dynamic in nature, with support for other languages along with English. We train the acoustic model for Kannada speech with 1000 general spoken sentences and test it on 150 sentences. We build our system using features available in CMU Sphinx, thus showcasing the flexibility of this framework for Kannada voice-to-text conversion. In this paper, Kannada sentences of four to ten words in length are studied. The speech conversion system permits ordinary people to speak to the computer in order to retrieve information in textual form. Kannada has 52 letters. The system investigates the extensibility of recognizing all letters and morphological variants of spoken Kannada words.

  • Comparative study of factored SMT with baseline SMT for English to Kannada
    2016 International Conference on Inventive Computation Technologies (ICICT), 2016
    Co-Authors: K. M. Shivakumar, N. Shivaraju, Vighnesh Sreekanta, Deepa Gupta
    Abstract:

    Dravidian languages are highly agglutinative and morphologically rich. Language processing for these languages requires more annotated data than for European or Indo-European languages. In this paper we present a comparison of Statistical Machine Translation (SMT) models built with linguistic and non-linguistic data for English-to-Kannada translation. The experiments show an improvement in BLEU score for the factored MT system over the baseline MT system for English-to-Kannada SMT. Kannada fonts can take ten different forms in representing a word; any change of a font variant in a word changes the meaning of the word. We model Kannada lemmas, their morphological variants and PoS as factors in our MT system.
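
The regular-expression tagging idea from the first abstract in this list can be sketched as follows. This is a minimal illustration, not the authors' tagger: the suffix rules, the romanized tokens and the tag names (`NUM`, `PUNC`, `NOUN`, `VERB`, `UNK`) are hypothetical placeholders, not the paper's 100 patterns or the actual TDIL tagset. The accuracy helper mirrors the idea of scoring against a manually tagged corpus.

```python
import re

# Hypothetical (pattern, tag) rules on romanized tokens; the first match wins.
# These suffixes are illustrative only and are not linguistically validated.
RULES = [
    (re.compile(r"^\d+([.,]\d+)?$"), "NUM"),
    (re.compile(r"^[.,;:!?]+$"), "PUNC"),
    (re.compile(r".+(galu|annu|alli|inda)$"), "NOUN"),
    (re.compile(r".+(ttane|ttale|ttare|ide)$"), "VERB"),
]
DEFAULT_TAG = "UNK"  # fallback when no pattern matches


def tag(tokens):
    """Return (token, tag) pairs using the first matching rule."""
    tagged = []
    for tok in tokens:
        for pattern, label in RULES:
            if pattern.match(tok.lower()):
                tagged.append((tok, label))
                break
        else:
            tagged.append((tok, DEFAULT_TAG))
    return tagged


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the manual (gold) tag."""
    if not gold:
        return 0.0
    correct = sum(1 for (_, p), (_, g) in zip(predicted, gold) if p == g)
    return correct / len(gold)
```

Ordering the rules from most to least specific keeps the first-match policy predictable; a real rule set would need patterns written against the tagset and script actually used.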

K. P. Soman - One of the best experts on this subject based on the ideXlab platform.

  • Kernel based part of speech tagger for Kannada
    International Conference on Machine Learning and Cybernetics, 2010
    Co-Authors: P J Antony, K. P. Soman
    Abstract:

    This paper presents the development of a part-of-speech (POS) tagger for the Kannada language that can be used for analyzing and annotating Kannada texts. POS tagging is considered one of the basic tools and components necessary for many Natural Language Processing (NLP) applications, such as speech recognition, natural language parsing, information retrieval and information extraction. To alleviate problems for the Kannada language, we propose a new machine learning approach to POS tagging. Identifying the ambiguities in Kannada lexical items is the main challenge in developing an efficient and accurate POS tagger. We have developed our own tagset, which consists of 30 tags, and built a POS tagger for Kannada using a Support Vector Machine (SVM). A corpus of texts, extracted from Kannada newspapers and books, is manually morphologically analyzed and tagged using our tagset. The performance of the system is evaluated, and we found that the results obtained were more efficient and accurate than those of earlier methods for Kannada POS tagging.

  • Kernel Method for English to Kannada Transliteration
    2010 International Conference on Recent Trends in Information Telecommunication and Computing, 2010
    Co-Authors: P J Antony, V. P. Ajith, K. P. Soman
    Abstract:

    Language transliteration is one of the important areas in natural language processing. Accurate transliteration of named entities plays an important role in the performance of machine translation and cross-language information retrieval. The transliteration model must be designed in such a way that the phonetic structure of words is preserved as closely as possible. This paper addresses the problem of transliterating English to Kannada using a publicly available structured-output Support Vector Machine (SVM). The proposed transliteration scheme uses a sequence labeling method to model the transliteration problem. The technique was demonstrated for English-to-Kannada transliteration and achieved exact Kannada transliterations for 87.28% of English names.

B. V. Dhandra - One of the best experts on this subject based on the ideXlab platform.

  • Handwritten Kannada Characters Recognition using Curvelet Transform
    2015
    Co-Authors: Shashikala Parameshwarppa, B. V. Dhandra
    Abstract:

    The selection of a feature extraction method for recognition of an object or character is probably the single most important factor in achieving high recognition accuracy. Therefore, in this paper an effort is made to identify the Second Generation Discrete Curvelet Transform (DCTG2) as a source of potential features for a handwritten Kannada character recognition system. Images are made noise-free by a median filter and normalized to 64x64 pixels. The curvelet transform is applied at different scales to the input images to generate the curvelet coefficients. The standard deviations of the curvelet coefficients are then computed to form a feature vector of size 20. A total of 2800 Kannada vowel and 6800 handwritten Kannada consonant sample images are used for classification with the KNN classifier. To test the performance of the proposed algorithm, two-fold cross validation is used. An average recognition accuracy of 90.57% is obtained for handwritten basic Kannada characters. The proposed algorithm is independent of thinning and skew of the character images.

  • A Zone Based Character Recognition Engine for Kannada and English Scripts
    Procedia Engineering, 2012
    Co-Authors: Gururaj Mukarambi, B. V. Dhandra, Mallikarjun Hangarge
    Abstract:

    In this paper, an Optical Character Recognition engine for Kannada and English characters is proposed based on zone features. Zoning is one of the older concepts in document image analysis research, but the method works well for Kannada and English character recognition. A total of 2800 Kannada consonant and 2300 English lowercase letter sample images are classified using the SVM classifier. All preprocessed images are normalized to 32 x 32 pixels, which is optimum. Each preprocessed image is then divided into 64 non-overlapping zones, and the zone-based pixel density is calculated for each of the 64 zones, thereby generating 64 features. These features are fed to the SVM classifier for classification of character images. To test the performance of the algorithm, 2-fold cross validation is used. Average recognition accuracies of 73.33% and 96.13% are obtained for Kannada consonants and English lowercase letters, respectively. Further, an average recognition accuracy of 83.02% is obtained for mixed input of both Kannada and English characters. The recognition accuracy obtained for Kannada consonants is low because most of the characters are similar in shape. Hence, more dominant features may need to be added to discriminate the characters; work in this direction is in progress. This is an initial attempt at recognizing a mixture of Kannada and English characters with a single algorithm. The novelty of the algorithm is its independence from thinning and slant of the characters.

  • Handwritten Kannada Vowels and English Character Recognition System
    International Journal of Image Processing and Vision Science, 2012
    Co-Authors: B. V. Dhandra, Gururaj Mukarambi, Mallikarjun Hangarge
    Abstract:

    In this paper, zone-based features are extracted from handwritten Kannada vowel and English uppercase character images for their recognition. A total of 4,000 handwritten Kannada and English sample images are collected for classification. The collected images are normalized to 32 x 32 pixels. The normalized images are then divided into 64 zones and their pixel densities are calculated, generating a total of 64 features. These 64 features are submitted to KNN and SVM classifiers with 2-fold cross validation for recognition of the characters. The proposed algorithm works for individual Kannada vowels, English uppercase letters and a mixture of both. Recognition accuracies of 92.71% for KNN and 96.00% for SVM are achieved for handwritten Kannada vowels, and 97.51% for KNN and 98.26% for SVM for handwritten English uppercase letters. Further, recognition accuracies of 95.77% and 97.03% are obtained for mixed characters (i.e. Kannada vowels and English uppercase letters). Hence, the proposed algorithm is efficient for recognition of these characters. The novelty of the proposed work is that the algorithm is independent of thinning and slant of the characters.

  • Kannada and English Numeral Recognition System
    International Journal of Computer Applications, 2011
    Co-Authors: B. V. Dhandra, Mukarambi Gururaj, Hangarge Mallikarjun
    Abstract:

    In this paper, zone-based features are used for recognition of handwritten and printed Kannada and English numerals. The numeral images are normalized to 32 x 32 pixels. The normalized images are then divided into 64 zones and their pixel densities are used as the feature vector, so the dimension of the feature vector is 64. The algorithm is tested on 4,000 sample images of handwritten and printed Kannada and English numerals and obtains an accuracy of 95.25% with the KNN classifier and 97.05% with the SVM classifier for mixed numeral inputs, using 2-fold cross validation. A total of 40 classes is reduced to 19 classes covering handwritten and printed Kannada numerals and handwritten and printed English numerals in order to increase the recognition accuracy. The novelty of the proposed algorithm is that it is thinning-free and independent of the slant of the characters. General Terms: Document Image Analysis

  • A recognition system for handwritten Kannada and English characters
    International Journal of Computational Vision and Robotics, 2011
    Co-Authors: B. V. Dhandra, Gururaj Mukarambi, Mallikarjun Hangarge
    Abstract:

    In multilingual countries like India, the majority of documents may contain text in more than one script or language. For automatic processing of such documents through optical character recognition (OCR), it is necessary to design a multilingual OCR. With reference to Karnataka state, this paper proposes a handwritten Kannada and English character recognition system. The proposed zone-based pixel density features are employed for classification of Kannada and English characters. A total of 6,000 handwritten Kannada and English sample images are used for classification. The character images are normalised to 32 × 32 pixels. The normalised images are then divided into 64 zones and their pixel densities are calculated, generating a total of 64 features. These features are fed to KNN and SVM classifiers for recognition of the characters. To measure the performance of the classifiers, two-fold cross validation is employed. The proposed algorithm classifies Kannada numerals and vowels and English numerals and uppercase letters, both independently and in combination. Average recognition accuracies of 89.21% with KNN and 93.22% with SVM are achieved. The novelty of the proposed algorithm is that it is free from thinning and independent of the slant of the characters.
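
The zone-based pixel density features that recur in the abstracts above (a normalized 32 x 32 image split into 64 zones, one density per zone) can be sketched as below. The abstracts do not define the zone layout or the density formula, so this sketch assumes a binary image (1 = foreground), an 8 x 8 grid of square 4 x 4 zones, and density taken as the fraction of foreground pixels in a zone. The function name `zone_densities` is hypothetical.

```python
def zone_densities(image, zones_per_side=8):
    """Return one pixel density per zone, scanning zones row by row.

    image: square list of lists of 0/1 values (1 = foreground pixel).
    With a 32 x 32 image and zones_per_side=8 this yields 64 features.
    """
    size = len(image)
    if size == 0 or any(len(row) != size for row in image):
        raise ValueError("image must be a non-empty square grid")
    if size % zones_per_side != 0:
        raise ValueError("image size must be divisible by zones_per_side")

    step = size // zones_per_side      # zone edge length in pixels
    area = step * step                 # pixels per zone
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            foreground = sum(
                image[r][c]
                for r in range(zr * step, (zr + 1) * step)
                for c in range(zc * step, (zc + 1) * step)
            )
            # Assumed definition: share of foreground pixels in the zone.
            features.append(foreground / area)
    return features
```

The resulting 64-element list is the kind of vector that would then be passed to a KNN or SVM classifier; the classifier itself is outside this sketch.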

Mallikarjun Hangarge - One of the best experts on this subject based on the ideXlab platform.

  • A Zone Based Character Recognition Engine for Kannada and English Scripts
    Procedia Engineering, 2012
    Co-Authors: Gururaj Mukarambi, B. V. Dhandra, Mallikarjun Hangarge
    Abstract:

    In this paper, an Optical Character Recognition engine for Kannada and English characters is proposed based on zone features. Zoning is one of the older concepts in document image analysis research, but the method works well for Kannada and English character recognition. A total of 2800 Kannada consonant and 2300 English lowercase letter sample images are classified using the SVM classifier. All preprocessed images are normalized to 32 x 32 pixels, which is optimum. Each preprocessed image is then divided into 64 non-overlapping zones, and the zone-based pixel density is calculated for each of the 64 zones, thereby generating 64 features. These features are fed to the SVM classifier for classification of character images. To test the performance of the algorithm, 2-fold cross validation is used. Average recognition accuracies of 73.33% and 96.13% are obtained for Kannada consonants and English lowercase letters, respectively. Further, an average recognition accuracy of 83.02% is obtained for mixed input of both Kannada and English characters. The recognition accuracy obtained for Kannada consonants is low because most of the characters are similar in shape. Hence, more dominant features may need to be added to discriminate the characters; work in this direction is in progress. This is an initial attempt at recognizing a mixture of Kannada and English characters with a single algorithm. The novelty of the algorithm is its independence from thinning and slant of the characters.

  • Handwritten Kannada Vowels and English Character Recognition System
    International Journal of Image Processing and Vision Science, 2012
    Co-Authors: B. V. Dhandra, Gururaj Mukarambi, Mallikarjun Hangarge
    Abstract:

    In this paper, zone-based features are extracted from handwritten Kannada vowel and English uppercase character images for their recognition. A total of 4,000 handwritten Kannada and English sample images are collected for classification. The collected images are normalized to 32 x 32 pixels. The normalized images are then divided into 64 zones and their pixel densities are calculated, generating a total of 64 features. These 64 features are submitted to KNN and SVM classifiers with 2-fold cross validation for recognition of the characters. The proposed algorithm works for individual Kannada vowels, English uppercase letters and a mixture of both. Recognition accuracies of 92.71% for KNN and 96.00% for SVM are achieved for handwritten Kannada vowels, and 97.51% for KNN and 98.26% for SVM for handwritten English uppercase letters. Further, recognition accuracies of 95.77% and 97.03% are obtained for mixed characters (i.e. Kannada vowels and English uppercase letters). Hence, the proposed algorithm is efficient for recognition of these characters. The novelty of the proposed work is that the algorithm is independent of thinning and slant of the characters.

  • A recognition system for handwritten Kannada and English characters
    International Journal of Computational Vision and Robotics, 2011
    Co-Authors: B. V. Dhandra, Gururaj Mukarambi, Mallikarjun Hangarge
    Abstract:

    In multilingual countries like India, the majority of documents may contain text in more than one script or language. For automatic processing of such documents through optical character recognition (OCR), it is necessary to design a multilingual OCR. With reference to Karnataka state, this paper proposes a handwritten Kannada and English character recognition system. The proposed zone-based pixel density features are employed for classification of Kannada and English characters. A total of 6,000 handwritten Kannada and English sample images are used for classification. The character images are normalised to 32 × 32 pixels. The normalised images are then divided into 64 zones and their pixel densities are calculated, generating a total of 64 features. These features are fed to KNN and SVM classifiers for recognition of the characters. To measure the performance of the classifiers, two-fold cross validation is employed. The proposed algorithm classifies Kannada numerals and vowels and English numerals and uppercase letters, both independently and in combination. Average recognition accuracies of 89.21% with KNN and 93.22% with SVM are achieved. The novelty of the proposed algorithm is that it is free from thinning and independent of the slant of the characters.

  • Spatial Features for Handwritten Kannada and English Character Recognition
    International Journal of Computer Applications, 2010
    Co-Authors: B. V. Dhandra, Mallikarjun Hangarge, Gururaj Mukarambi
    Abstract:

    This paper presents a handwritten Kannada and English character recognition system based on spatial features. Directional spatial features, viz. stroke density, stroke length and the number of strokes, are employed as potential features to characterize handwritten Kannada numerals and vowels and English uppercase letters. A KNN classifier is used to classify the characters based on these features with four-fold cross validation. The proposed system achieves recognition accuracies of 96.2%, 90.1% and 91.04% for handwritten Kannada numerals, vowels and English uppercase letters, respectively. General Terms: Pattern Recognition, Document Image Analysis

  • Kannada, Telugu and Devanagari Handwritten Numeral Recognition with Probabilistic Neural Network: A Novel Approach
    International Journal of Computer Applications, 2010
    Co-Authors: B. V. Dhandra, R G Benne, Mallikarjun Hangarge
    Abstract:

    In this paper, a novel approach to Kannada, Telugu and Devanagari handwritten numeral recognition based on global and local structural features is proposed. A Probabilistic Neural Network (PNN) classifier is used to classify the Kannada, Telugu and Devanagari numerals separately. The algorithm is validated on Kannada, Telugu and Devanagari numeral datasets by setting various radial values of the PNN classifier under different experimental setups. The experimental results obtained are encouraging and comparable with other methods found in the literature. The novelty of the proposed method is that it is free from thinning and size normalization. General Terms: Pattern Recognition, Document Image Processing.
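
The PNN classifier named in the last abstract of this list can be illustrated with a generic textbook formulation: each class score is the average of Gaussian kernels centred on that class's training vectors, and the highest score wins. This is a sketch under assumptions, not the authors' implementation. In particular, it reads the abstract's "radial values" as the Gaussian smoothing parameter `sigma`, which the abstract does not confirm, and the feature vectors here are arbitrary numbers rather than the paper's structural features.

```python
import math


def pnn_classify(x, train, sigma=1.0):
    """Classify x with a basic probabilistic neural network.

    x:     feature vector (sequence of numbers).
    train: list of (feature_vector, label) pairs.
    sigma: Gaussian smoothing parameter (assumed meaning of "radial value").
    """
    if not train:
        raise ValueError("training set must not be empty")
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    sums, counts = {}, {}
    for vec, label in train:
        # Pattern layer: Gaussian kernel on squared Euclidean distance.
        d2 = sum((a - b) ** 2 for a, b in zip(x, vec))
        sums[label] = sums.get(label, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1

    # Summation layer averages per class; decision layer takes the maximum.
    return max(sums, key=lambda label: sums[label] / counts[label])
```

Trying several `sigma` values on held-out data is the usual way to tune such a classifier, which is one plausible reading of "setting various radial values".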

Bellur Rajashekar - One of the best experts on this subject based on the ideXlab platform.

  • Specific language impairment in a morphologically complex agglutinative Indian language-Kannada.
    Journal of Communication Disorders, 2017
    Co-Authors: Shivani Tiwari, Prathibha Karanth, Bellur Rajashekar
    Abstract:

    Specific Language Impairment (SLI) remains an underinvestigated disorder in morphologically complex agglutinative languages such as Kannada. Currently, only a few case reports are available on SLI in Dravidian languages. The morphological complexity inherent to Dravidian languages such as Kannada provides a potential avenue to verify one of the two prevailing accounts of SLI: the morphological richness theory and the CGC (Computational Grammatical Complexity) hypothesis. While the former predicts relatively spared performance of children with SLI (CwSLI) on syntactic morphology in morphologically complex languages, the latter predicts a diametrically opposite performance. Data from a group of 15 Kannada-speaking CwSLI supported the morphological richness theory and further revealed five distinct profiles of SLI. The results showed that CwSLI learning the agglutinative language Kannada, compared with language-matched children without SLI, displayed some deficits shared with CwSLI learning English (e.g., in phonological processing on a non-word repetition task). However, CwSLI learning the morphosyntactically rich language Kannada differed remarkably from English-learning CwSLI by not showing deficits in syntactic morphology (e.g., PNG, verb, tense, case, and pronoun) relative to language-matched peers.