Slavic Languages

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 3939 Experts worldwide, ranked by the ideXlab platform.

Janez Brest - One of the best experts on this subject based on the ideXlab platform.

  • Slavic languages in phrase-based statistical machine translation: a survey
    Artificial Intelligence Review, 2019
    Co-Authors: Mirjam Sepesy Maučec, Janez Brest
    Abstract:

    The demand for translations is increasing at a rate far beyond the capacity of professional translators. It is too difficult, time-consuming and expensive to translate everything from scratch in each language. Machine translation offers a solution, as it provides translation automatically. Until recently, statistical machine translation has proved to be one of the most successful approaches. However, a new approach to machine translation based on neural networks has emerged with promising results. The present paper concerns phrase-based statistical machine translation, an area that has been extensively studied in the literature. The translation system consists of many components built on the premise of probabilities. Each component is described separately. Although high-quality translation systems have been developed for certain language pairs, there is still a large number of languages that cause many translation errors. Languages with a rich morphology pose an especially difficult challenge for research. We address one group of morphologically rich languages: Slavic languages, which constitute a relatively homogeneous family of languages characterized by rich, inflectional morphology. The present paper offers a comprehensive survey of approaches to coping with Slavic languages in different aspects of statistical machine translation. We observe that the interest of the community in research of more difficult languages is increasing and we believe that the translation quality of those languages will reach the level of practical use in the near future.

Nikola Ljubesic - One of the best experts on this subject based on the ideXlab platform.

  • Exploring cross-language statistical machine translation for closely related South Slavic languages
    Empirical Methods in Natural Language Processing, 2014
    Co-Authors: Maja Popovic, Nikola Ljubesic
    Abstract:

    This work investigates the use of cross-language resources for statistical machine translation (SMT) between English and two closely related South Slavic languages, namely Croatian and Serbian. The goal is to explore the effects of translating from and into one language using an SMT system trained on another. For translation into English, the loss due to cross-translation is about 13% of BLEU, and for the other translation direction about 15%. The performance decrease for both languages in both translation directions is mainly due to lexical divergences. Several language adaptation methods are explored, and it is shown that very simple lexical transformations can already yield a small improvement, and that the most promising adaptation method is using a Croatian-Serbian SMT system trained on a very small corpus.

Mirjam Sepesy Maučec - One of the best experts on this subject based on the ideXlab platform.

  • Slavic languages in phrase-based statistical machine translation: a survey
    Artificial Intelligence Review, 2019
    Co-Authors: Mirjam Sepesy Maučec, Janez Brest

Maja Popovic - One of the best experts on this subject based on the ideXlab platform.

  • Exploring cross-language statistical machine translation for closely related South Slavic languages
    Empirical Methods in Natural Language Processing, 2014
    Co-Authors: Maja Popovic, Nikola Ljubesic

Erin Pappas - One of the best experts on this subject based on the ideXlab platform.