Hindi

The Experts below are selected from a list of 360 Experts worldwide ranked by ideXlab platform

Pushpak Bhattacharyya - One of the best experts on this subject based on the ideXlab platform.

  • a fall back strategy for sentiment analysis in Hindi a case study
    2010
    Co-Authors: Aditya Joshi, Pushpak Bhattacharyya
    Abstract:

    Sentiment Analysis (SA) research has gained tremendous momentum in recent times. However, there has been little work in this area for Indian languages. We propose in this paper a fall-back strategy for sentiment analysis of Hindi documents, a problem on which, to the best of our knowledge, no work has been done until now. We study three approaches to performing SA in Hindi. (A) We have developed a sentiment-annotated corpus in the Hindi movie review domain; the first approach trains a classifier on this annotated Hindi corpus and uses it to classify a new Hindi document. (B) In the second approach, we translate the given document into English and use a classifier trained on standard English movie reviews to classify the document. (C) In the third approach, we develop a lexical resource called Hindi-SentiWordNet (H-SWN) and implement a majority-score-based strategy to classify the given document.
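
The lexicon-based approach (C) can be pictured with a minimal sketch. Everything here is an assumption for illustration: the four lexicon entries and their scores are invented stand-ins rather than actual H-SWN content, and the tie-breaking rule is a guess, since the abstract does not specify one.

```python
from collections import Counter

# Toy stand-in for an H-SWN-style lexicon (hypothetical entries and scores):
# word -> (positive score, negative score).
TOY_LEXICON = {
    "अच्छी": (0.75, 0.0),    # "good"
    "शानदार": (0.875, 0.0),  # "splendid"
    "बुरी": (0.0, 0.75),     # "bad"
    "बेकार": (0.0, 0.625),   # "useless"
}


def word_polarity(word, lexicon):
    """Return 'pos', 'neg', or None when the word is unknown or balanced."""
    if word not in lexicon:
        return None
    pos, neg = lexicon[word]
    if pos == neg:
        return None
    return "pos" if pos > neg else "neg"


def classify_by_majority(tokens, lexicon=TOY_LEXICON):
    """Label a document with the polarity held by most of its scored words."""
    polarities = (word_polarity(t, lexicon) for t in tokens)
    votes = Counter(p for p in polarities if p is not None)
    if votes["pos"] == votes["neg"]:
        # Tie or no lexicon hits; returning "neutral" is an assumption.
        return "neutral"
    return "pos" if votes["pos"] > votes["neg"] else "neg"
```

In a fall-back arrangement, a function like `classify_by_majority` would be tried only when the in-language classifier (A) and the translation route (B) are unavailable.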

  • a hybrid model for urdu Hindi transliteration
    International Joint Conference on Natural Language Processing, 2009
    Co-Authors: M Abbas G Malik, Laurent Besacier, Christian Boitet, Pushpak Bhattacharyya
    Abstract:

    We report in this paper a novel hybrid approach for Urdu to Hindi transliteration that combines finite-state machine (FSM) based techniques with a statistical word language model. The output from the FSM is filtered with the word language model to produce the correct Hindi output. The main problem handled is the omission of diacritical marks from the input Urdu text. Our system produces the correct Hindi output even when the crucial information carried by diacritical marks is absent. The approach improves on the accuracy of the transducer-only approach, from 50.7% to 79.1%. The results show that performance can be improved by using a word language model to disambiguate the output produced by the transducer-only approach, especially when diacritical marks are not present in the Urdu input.
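
The filtering step can be sketched as follows. This is not the authors' system: the candidate list stands in for FSM output, the counts are invented, the Hindi forms are romanized for readability, and a smoothed unigram model replaces whatever word language model the paper actually uses.

```python
# Hypothetical unigram counts from a Hindi corpus (romanized, invented values).
HINDI_COUNTS = {"kis": 120, "kas": 15}


def pick_best(candidates, counts):
    """Choose the candidate that a smoothed unigram model finds most probable."""
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    def prob(word):
        # Add-one smoothing so unseen candidates get a small non-zero score.
        return (counts.get(word, 0) + 1) / (total + vocab_size)

    return max(candidates, key=prob)


# An undiacritized Urdu word may yield several Hindi readings from the
# transducer; the language model keeps the most plausible one.
best = pick_best(["kas", "kis", "kus"], HINDI_COUNTS)
```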

  • case markers and morphology addressing the crux of the fluency problem in english Hindi smt
    International Joint Conference on Natural Language Processing, 2009
    Co-Authors: Ananthakrishnan Ramanathan, Hansraj Choudhary, Avishek Ghosh, Pushpak Bhattacharyya
    Abstract:

    We report in this paper our work on accurately generating case markers and suffixes in English-to-Hindi SMT. Hindi is a relatively free word-order language, and makes use of a comparatively richer set of case markers and morphological suffixes for correct meaning representation. From our experience of large-scale English-Hindi MT, we are convinced that fluency and fidelity in the Hindi output get an order of magnitude facelift if accurate case markers and suffixes are produced. Now, the moot question is: what entity on the English side encodes the information contained in case markers and suffixes on the Hindi side? Our studies of correspondences in the two languages show that case markers and suffixes in Hindi are predominantly determined by the combination of suffixes and semantic relations on the English side. We therefore augment the aligned corpus of the two languages with the correspondence of English suffixes and semantic relations with Hindi suffixes and case markers. Our results on 400 test sentences, translated using an SMT system trained on around 13,000 parallel sentences, show that suffix + semantic relation → case marker/suffix is a very useful translation factor, in the sense of making a significant difference to output quality as indicated by subjective evaluation as well as BLEU scores.

  • simple syntactic and morphological processing can help english Hindi statistical machine translation
    International Joint Conference on Natural Language Processing, 2008
    Co-Authors: Ananthakrishnan Ramanathan, Pushpak Bhattacharyya, Jayprasad Hegde, Ritesh Shah, M Sasikumar
    Abstract:

    In this paper, we report our work on incorporating syntactic and morphological information for English to Hindi statistical machine translation. Two simple and computationally inexpensive ideas have proven to be surprisingly effective: (i) reordering the English source sentence as per Hindi syntax, and (ii) using the suffixes of Hindi words. The former is done by applying simple transformation rules on the English parse tree. The latter, by using a simple suffix separation program. With only a small amount of bilingual training data and limited tools for Hindi, we achieve reasonable performance and substantial improvements over the baseline phrase-based system. Our approach eschews the use of parsing or other sophisticated linguistic tools for the target language (Hindi) making it a useful framework for statistical machine translation from English to Indian languages in general, since such tools are not widely available for Indian languages currently.
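
Both ideas can be illustrated with a deliberately simplified sketch. The paper applies transformation rules to an English parse tree; the version below only reorders a flat subject-verb-object triple. The suffix list is hypothetical and romanized, and is not the authors' suffix separation program.

```python
def reorder_svo_to_sov(subject, verb, obj):
    """Move the verb after the object, mirroring Hindi's verb-final order."""
    return [subject, obj, verb]


# Hypothetical romanized Hindi suffixes, for illustration only.
SUFFIXES = ["iyon", "on", "en", "e", "i", "a"]


def split_suffix(word, suffixes=SUFFIXES):
    """Split off the longest matching suffix; return (stem, suffix)."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Require a non-empty stem so a word is never reduced to nothing.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""
```

For example, `reorder_svo_to_sov("Ram", "eats", "mango")` yields the order Ram, mango, eats, and `split_suffix("ladkon")` separates the stem from the plural oblique ending.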

Gurpreet Singh Lehal - One of the best experts on this subject based on the ideXlab platform.

  • web based Hindi to punjabi machine translation system
    Journal of Emerging Technologies in Web Intelligence, 2010
    Co-Authors: Vishal Goyal, Gurpreet Singh Lehal
    Abstract:

    Hindi and Punjabi are closely related languages with many similarities in syntax and vocabulary. Both languages originated from Sanskrit, which is one of the oldest languages. In terms of speakers, Hindi is the third most widely spoken language and Punjabi is the twelfth most widely spoken language. Punjabi is mostly used in Northern India and in some areas of Pakistan, as well as in the UK, Canada and the USA. Hindi is the national language of India and is spoken and used by people all over the country. In the present research, a basic Hindi to Punjabi machine translation system using the direct translation approach has been developed. The results of this translation system are surprisingly good. The system includes lexicon-based translation, transliteration, and a machine learning module that continuously improves the system. It also takes care of basic word sense disambiguation.
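
A direct translation pipeline of this kind can be sketched as a lexicon lookup with a transliteration fallback. The sketch below is an assumption-laden illustration, not the described system: the lexicon is supplied by the caller, and the fallback relies only on the fact that the Devanagari (U+0900) and Gurmukhi (U+0A00) Unicode blocks share an ISCII-derived layout, so shifting code points by 0x100 gives a crude approximation. Some Devanagari characters have no Gurmukhi counterpart at the shifted position, and real transliteration needs script-specific rules.

```python
DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
GURMUKHI_OFFSET = 0x0A00 - 0x0900


def transliterate(word):
    """Crude fallback: shift each Devanagari code point into the Gurmukhi block."""
    shifted = []
    for ch in word:
        if DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END:
            shifted.append(chr(ord(ch) + GURMUKHI_OFFSET))
        else:
            shifted.append(ch)  # leave digits, punctuation, Latin text alone
    return "".join(shifted)


def translate(tokens, lexicon):
    """Word-by-word translation: lexicon first, transliteration otherwise."""
    output = []
    for token in tokens:
        if token in lexicon:
            output.append(lexicon[token])
        else:
            output.append(transliterate(token))
    return output
```

The lexicon matters even for cognates: पानी (water) corresponds to Punjabi ਪਾਣੀ, whereas the code point shift alone would produce ਪਾਨੀ.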

  • Evaluation of Hindi to Punjabi Machine Translation System
    arXiv: Computation and Language, 2009
    Co-Authors: Vishal Goyal, Gurpreet Singh Lehal
    Abstract:

    Machine translation in India is relatively young. The earliest efforts date from the late 1980s and early 1990s. The success of every system is judged by its experimental evaluation results. A number of machine translation systems have been started, but to the best of the authors' knowledge, no high-quality system that can be used in real applications has been completed. Recently, Punjabi University, Patiala, India has developed a Punjabi to Hindi machine translation system with a high accuracy of about 92%. Both systems, i.e. the system under evaluation and the previously developed system, operate between the same closely related languages. Thus, this paper presents the evaluation results of the Hindi to Punjabi machine translation system. It makes sense to use the same evaluation criteria as those used for the Punjabi to Hindi machine translation system. After evaluation, the accuracy of the system is found to be about 95%.

  • a two stage word segmentation system for handling space insertion problem in urdu script
    2009
    Co-Authors: Gurpreet Singh Lehal
    Abstract:

    Hindi and Urdu are variants of the same language, but while Hindi is written in the Devanagari script from left to right, Urdu is written from right to left in a script derived from a Persian modification of the Arabic script. To break the script barrier, an Urdu-Devnagri transliteration system has been developed. The transliteration system faced many problems related to word segmentation of Urdu script, as in many cases spaces are not placed properly between Urdu words. Sometimes a space is omitted, so that several Urdu words run together, and at other times an extra space is inserted within a word, resulting in over-segmentation of that word. In this paper, a two-stage system for handling the extra space insertion problem in Urdu is presented. In the first stage, Urdu grammar rules are applied, while a statistical approach is employed in the second stage. For statistical analysis, lexical resources from both Urdu and Hindi, including Urdu and Hindi unigram and bigram probabilities, are used. In addition, the Urdu-Devnagri transliteration module is executed in parallel to help in decision making. The system was tested on a 1.84-million-word Urdu corpus and the success rate was 98.57%. This is the first time such a system has been developed for Urdu script.
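
The statistical second stage might be pictured as follows. This is a guess at the general shape of such a decision, not the published method: the decision rule (join two tokens when the joined form's unigram probability exceeds the pair's bigram probability), the greedy left-to-right pass, and the romanized example values are all assumptions.

```python
def repair_extra_spaces(tokens, unigram, bigram):
    """Greedily join adjacent tokens when the joined form is a likelier word.

    unigram maps word -> probability; bigram maps (word, word) -> probability.
    Missing entries are treated as probability 0.
    """
    repaired = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            joined = tokens[i] + tokens[i + 1]
            pair = (tokens[i], tokens[i + 1])
            if unigram.get(joined, 0.0) > bigram.get(pair, 0.0):
                repaired.append(joined)
                i += 2  # both halves consumed
                continue
        repaired.append(tokens[i])
        i += 1
    return repaired
```

With invented probabilities in which "kitab" is a common word and the sequence "ki tab" is rare, the over-segmented input `["meri", "ki", "tab"]` is repaired to `["meri", "kitab"]`.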

  • a punjabi to Hindi machine translation system
    International Conference on Computational Linguistics, 2008
    Co-Authors: Gurpreet Singh Josan, Gurpreet Singh Lehal
    Abstract:

    Punjabi and Hindi are two closely related languages, as both share the same origin and have many syntactic and semantic similarities. These similarities make the direct translation methodology an obvious choice for the Punjabi-Hindi language pair. The proposed system for Punjabi to Hindi translation has been implemented with various research techniques based on the Direct MT architecture and a language corpus. The output is evaluated by already prescribed methods in order to assess the suitability of the system for the Punjabi-Hindi language pair.

Vishal Goyal - One of the best experts on this subject based on the ideXlab platform.

  • rule based Hindi part of speech tagger
    International Conference on Computational Linguistics, 2012
    Co-Authors: Navneet Garg, Vishal Goyal, Suman Preet
    Abstract:

    A part-of-speech tagger is an important tool used in developing language translators and information extraction systems. The problem of tagging in natural language processing is to find a way to tag every word in a sentence. In this paper, we present a rule-based part-of-speech tagger for Hindi. Our system is evaluated over a corpus of 26,149 words with 30 different standard part-of-speech tags for Hindi. The evaluation is done on different domains of the Hindi corpus, including news, essays, and short stories. Our system achieved an accuracy of 87.55%.

  • development of Hindi punjabi parallel corpus using existing Hindi punjabi machine translation system and using sentence alignments
    International Journal of Computer Applications, 2010
    Co-Authors: Pardeep Kumar, Vishal Goyal
    Abstract:

    In this survey paper, we address the problem of developing a Hindi-Punjabi parallel corpus using an existing Hindi to Punjabi machine translation system and sentence alignment. The alignment is based on length-based, location-based, and lexical techniques. We use the Hindi-Punjabi machine translation system at h2p.learnpunjabi.org. These tasks are needed to build the Hindi-Punjabi parallel corpus. Sentence alignment is useful for developing both a Hindi-Punjabi parallel corpus and a Hindi-Punjabi dictionary. Accuracy depends on the complexity of the corpus: the greater the complexity, the lower the accuracy. Complexity here refers to how sentences are distributed in the target file, that is, whether alignments of the types 1:1, 1:2, 2:1, 1:3, and 3:1 occur together in a paragraph. Our objective in this paper is to develop a Hindi-Punjabi parallel corpus using the latest existing techniques and methods with high accuracy and time efficiency.
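
To make the length-based technique and the alignment types concrete, here is a minimal dynamic-programming sketch. It is a simplification under stated assumptions: the cost is the absolute difference in character length plus a fixed penalty for any non-1:1 alignment, and both the cost function and the `merge_penalty` value are invented for illustration rather than taken from the paper.

```python
def align_by_length(src, tgt,
                    beads=((1, 1), (1, 2), (2, 1), (1, 3), (3, 1)),
                    merge_penalty=1):
    """Align two sentence lists by minimizing total character-length mismatch.

    Returns a list of (source_sentences, target_sentences) pairs, or None
    when no combination of the allowed alignment types covers both lists.
    """
    inf = float("inf")
    n, m = len(src), len(tgt)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue  # this prefix pair is unreachable
            for di, dj in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src_len = sum(len(s) for s in src[i:ni])
                tgt_len = sum(len(t) for t in tgt[j:nj])
                step = abs(src_len - tgt_len)
                if (di, dj) != (1, 1):
                    step += merge_penalty
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    if cost[n][m] == inf:
        return None
    # Walk the back-pointers from the end to recover the chosen alignments.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return pairs[::-1]
```

A paragraph in which one long source sentence was translated as two shorter target sentences would come out as a 1:1, 1:2, 1:1 sequence, which is the kind of mixed distribution the abstract calls complexity.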

  • web based Hindi to punjabi machine translation system
    Journal of Emerging Technologies in Web Intelligence, 2010
    Co-Authors: Vishal Goyal, Gurpreet Singh Lehal
    Abstract:

    Hindi and Punjabi are closely related languages with many similarities in syntax and vocabulary. Both languages originated from Sanskrit, which is one of the oldest languages. In terms of speakers, Hindi is the third most widely spoken language and Punjabi is the twelfth most widely spoken language. Punjabi is mostly used in Northern India and in some areas of Pakistan, as well as in the UK, Canada and the USA. Hindi is the national language of India and is spoken and used by people all over the country. In the present research, a basic Hindi to Punjabi machine translation system using the direct translation approach has been developed. The results of this translation system are surprisingly good. The system includes lexicon-based translation, transliteration, and a machine learning module that continuously improves the system. It also takes care of basic word sense disambiguation.

  • Evaluation of Hindi to Punjabi Machine Translation System
    arXiv: Computation and Language, 2009
    Co-Authors: Vishal Goyal, Gurpreet Singh Lehal
    Abstract:

    Machine translation in India is relatively young. The earliest efforts date from the late 1980s and early 1990s. The success of every system is judged by its experimental evaluation results. A number of machine translation systems have been started, but to the best of the authors' knowledge, no high-quality system that can be used in real applications has been completed. Recently, Punjabi University, Patiala, India has developed a Punjabi to Hindi machine translation system with a high accuracy of about 92%. Both systems, i.e. the system under evaluation and the previously developed system, operate between the same closely related languages. Thus, this paper presents the evaluation results of the Hindi to Punjabi machine translation system. It makes sense to use the same evaluation criteria as those used for the Punjabi to Hindi machine translation system. After evaluation, the accuracy of the system is found to be about 95%.

  • Hindi morphological analyzer and generator
    International Conference on Emerging Trends in Engineering and Technology, 2008
    Co-Authors: Vishal Goyal, Singh G Lehal
    Abstract:

    Morphology is the field of linguistics that studies the internal structure of words. Morphological analysis and generation are essential steps in any NLP application. Morphological analysis means taking a word as input and identifying its stem and affixes. It provides information about a word's semantics and the syntactic role it plays in a sentence. Morphological analysis is essential for Hindi because it has a rich system of inflectional morphology, like other languages of the Indo-Aryan family. A morphological analyzer and generator is a tool that analyzes a given word and generates a word given the stem and its features (like affixes). This paper presents a morphological analyzer and generator tool for Hindi that uses the paradigm approach and runs on the Windows platform with a GUI. This project has been developed as part of the development of a machine translation system from Hindi to Punjabi.
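
The paradigm approach can be sketched with a single hand-written paradigm. The table below is an illustrative assumption, not the tool's data: it covers one romanized masculine noun of the "ladkaa" (boy) type, and the names `PARADIGM_MASC_AA` and `STEMS` are hypothetical.

```python
# One paradigm: (suffix, (number, case)). The suffix "e" is ambiguous
# between singular oblique and plural direct, so it appears twice.
PARADIGM_MASC_AA = [
    ("aa", ("sg", "direct")),
    ("e", ("sg", "oblique")),
    ("e", ("pl", "direct")),
    ("on", ("pl", "oblique")),
]

# Stem lexicon: each stem points at the paradigm it inflects by.
STEMS = {"ladk": PARADIGM_MASC_AA}


def generate(stem, number, case):
    """Generation: stem + features -> surface form."""
    for suffix, features in STEMS[stem]:
        if features == (number, case):
            return stem + suffix
    raise ValueError(f"no form for {stem} with {number}/{case}")


def analyze(word):
    """Analysis: surface form -> all (stem, features) readings."""
    readings = []
    for stem, paradigm in STEMS.items():
        if not word.startswith(stem):
            continue
        ending = word[len(stem):]
        for suffix, features in paradigm:
            if ending == suffix:
                readings.append((stem, features))
    return readings
```

Because analysis and generation share one table, adding a new word only requires assigning its stem to an existing paradigm.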

M Sasikumar - One of the best experts on this subject based on the ideXlab platform.

  • simple syntactic and morphological processing can help english Hindi statistical machine translation
    International Joint Conference on Natural Language Processing, 2008
    Co-Authors: Ananthakrishnan Ramanathan, Pushpak Bhattacharyya, Jayprasad Hegde, Ritesh Shah, M Sasikumar
    Abstract:

    In this paper, we report our work on incorporating syntactic and morphological information for English to Hindi statistical machine translation. Two simple and computationally inexpensive ideas have proven to be surprisingly effective: (i) reordering the English source sentence as per Hindi syntax, and (ii) using the suffixes of Hindi words. The former is done by applying simple transformation rules on the English parse tree. The latter, by using a simple suffix separation program. With only a small amount of bilingual training data and limited tools for Hindi, we achieve reasonable performance and substantial improvements over the baseline phrase-based system. Our approach eschews the use of parsing or other sophisticated linguistic tools for the target language (Hindi) making it a useful framework for statistical machine translation from English to Indian languages in general, since such tools are not widely available for Indian languages currently.

Dipti Misra Sharma - One of the best experts on this subject based on the ideXlab platform.

  • joining hands exploiting monolingual treebanks for parsing of code mixing data
    arXiv: Computation and Language, 2017
    Co-Authors: Irshad Ahmad Bhat, Riyaz Ahmad Bhat, Manish Shrivastava, Dipti Misra Sharma
    Abstract:

    In this paper, we propose efficient and less resource-intensive strategies for parsing of code-mixed data. These strategies are not constrained by in-domain annotations, rather they leverage pre-existing monolingual annotated resources for training. We show that these methods can produce significantly better results as compared to an informed baseline. Besides, we also present a data set of 450 Hindi and English code-mixed tweets of Hindi multilingual speakers for evaluation. The data set is manually annotated with Universal Dependencies.

  • the Hindi urdu treebank project
    2017
    Co-Authors: Riyaz Ahmad Bhat, Bhuvana Narasimhan, Rajesh Bhatt, Dipti Misra Sharma, Annahita Farudi, Prescott Klassen, Martha Palmer, Owen Rambow, Ashwini Vaidya, Sri Ramagurumurthy Vishnu
    Abstract:

    The goal of the Hindi/Urdu treebanking project is to build multi-layered treebanks that will provide both syntactic and semantic annotations. In the past two decades, dozens of treebanks have been created for languages such as Arabic, Chinese, Czech, English, French, German, and many more. Our treebanks differ from the previous treebanks in two important aspects: they are multi-representational, i.e., they include several layers of representation from the initial design; and they cover two standardized registers that are often considered separate languages: Hindi and Urdu.

  • exploring semantic information in Hindi wordnet for Hindi dependency parsing
    International Joint Conference on Natural Language Processing, 2013
    Co-Authors: Sambhav Jain, Naman Jain, Aniruddha Tammewar, Riyaz Ahmad Bhat, Dipti Misra Sharma
    Abstract:

    In this paper, we present our efforts towards incorporating external knowledge from Hindi WordNet to aid dependency parsing. We conduct parsing experiments on Hindi, an Indo-Aryan language, utilizing the information from concept ontologies available in Hindi WordNet to complement the morpho-syntactic information already available. The work is driven by the insight that concept ontologies capture a specific real world aspect of lexical items, which is quite distinct and unlikely to be deduced from morpho-syntactic information such as morph, POS-tag and chunk. This complementing information is encoded as an additional feature for data driven parsing and experiments are conducted. We perform experiments over datasets of different sizes. We achieve an improvement of 1.1% (LAS) when training on 1,000 sentences and 0.2% (LAS) on 13,371 sentences over the baseline. The improvements are statistically significant at p<0.01. The higher improvements on 1,000 sentences suggest that the semantic information could address the data sparsity problem.
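
The LAS figures quoted above refer to the labeled attachment score: the fraction of tokens whose predicted head and dependency label both match the gold annotation. A minimal sketch of the computation follows; the function name and the input representation (one `(head_index, label)` pair per token) are choices made for this example.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two parallel lists of (head_index, label) pairs.

    UAS counts tokens with the correct head; LAS additionally requires the
    correct dependency label.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    return correct_heads / len(gold), correct_both / len(gold)
```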

  • Hindi derivational morphological analyzer
    Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology, 2012
    Co-Authors: Nikhil Kanuparthi, Abhilash Inumella, Dipti Misra Sharma
    Abstract:

    Hindi is an Indian language which is relatively rich in morphology. A few morphological analyzers of this language have been developed. However, they give only inflectional analysis of the language. In this paper, we present our Hindi derivational morphological analyzer. Our algorithm upgrades an existing inflectional analyzer to a derivational analyzer and primarily achieves two goals. First, it successfully incorporates derivational analysis in the inflectional analyzer. Second, it also increases the coverage of the inflectional analysis of the existing inflectional analyzer.

  • two methods to incorporate local morphosyntactic features in Hindi dependency parsing
    North American Chapter of the Association for Computational Linguistics, 2010
    Co-Authors: Bharat Ram Ambati, Samar Husain, Sambhav Jain, Dipti Misra Sharma, Rajeev Sangal
    Abstract:

    In this paper we explore two strategies to incorporate local morphosyntactic features in Hindi dependency parsing. These features are obtained using a shallow parser. We first explore which information provided by the shallow parser is most beneficial and show that local morphosyntactic features in the form of chunk type, head/non-head information, chunk boundary information, distance to the end of the chunk and suffix concatenation are crucial in Hindi dependency parsing. We then investigate the best way to incorporate this information during dependency parsing. Further, we compare the results of various experiments based on various criteria and do some error analysis. All the experiments were done with two data-driven parsers, MaltParser and MSTParser, on a part of the multi-layered and multi-representational Hindi Treebank, which is under development. This paper is also the first attempt at complete sentence level parsing for Hindi.
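
Four of the listed features can be derived mechanically from shallow-parser output, as in the sketch below. The input format, the boundary tag names (B, I, E, S) and the feature names are assumptions for illustration, and suffix concatenation is left out because the abstract does not define it precisely.

```python
def chunk_features(chunks):
    """Derive per-token features from chunked input.

    chunks: list of (chunk_type, tokens, head_index_within_chunk).
    Returns one feature dict per token, in sentence order.
    """
    features = []
    for chunk_type, tokens, head_index in chunks:
        last = len(tokens) - 1
        for i, token in enumerate(tokens):
            if last == 0:
                boundary = "S"  # single-token chunk
            elif i == 0:
                boundary = "B"  # chunk-initial
            elif i == last:
                boundary = "E"  # chunk-final
            else:
                boundary = "I"  # chunk-internal
            features.append({
                "token": token,
                "chunk_type": chunk_type,
                "is_head": i == head_index,
                "boundary": boundary,
                "dist_to_end": last - i,
            })
    return features
```

Feature dicts like these would then be passed to a data-driven parser as additional columns alongside the word form and POS tag.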