Natural Languages

The experts below are selected from a list of 309 experts worldwide, ranked by the ideXlab platform.

Eduardo G Altmann - One of the best experts on this subject based on the ideXlab platform.

  • Stochastic model for the vocabulary growth in natural languages
    Physical Review X, 2013
    Co-Authors: Martin Gerlach, Eduardo G Altmann
    Abstract:

    We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and on historical changes. The main feature of our model is the existence of two different classes of words: (i) a finite number of core words, which have higher frequency and do not affect the probability that a new word is used; and (ii) the remaining, virtually infinite number of noncore words, which have lower frequency and, once used, reduce the probability that a new word is used in the future. Our model relies on a careful analysis of the Google Ngram database of books published over the last centuries, and its main consequence is the generalization of Zipf's and Heaps' laws to two scaling regimes. We confirm that these generalizations yield the best simple description of the data among generic descriptive models and that the two free parameters depend only on the language, not on the database. From the point of view of our model, the main change on historical time scales is the composition of the specific words included in the finite list of core words, which we observe to decay exponentially in time at a rate of approximately 30 words per year for English.
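
    The two-class mechanism is straightforward to simulate. The Python sketch below is a minimal illustration under our own assumptions: core tokens are drawn with a fixed probability, and the chance of coining a new noncore word decays as a power of the current noncore vocabulary, which yields a Heaps-type sublinear growth. The functional form and all parameter values are illustrative, not the paper's.

        import random

        import numpy as np

        def simulate_vocabulary(steps=200_000, n_core=1_000, p_core=0.4,
                                decay=0.5, seed=0):
            """Two-class vocabulary growth (illustrative assumptions).

            - a token is a core word with probability p_core;
            - otherwise a *new* noncore word is coined with probability
              n_noncore**(-decay), so every new noncore word lowers the
              chance of further innovations;
            - otherwise an existing noncore word is repeated.
            """
            rng = random.Random(seed)
            n_noncore = 1
            vocab = []                      # vocabulary size after each token
            for _ in range(steps):
                if rng.random() >= p_core:  # noncore token
                    if rng.random() < n_noncore ** (-decay):
                        n_noncore += 1      # a word never seen before
                vocab.append(n_core + n_noncore)
            return np.array(vocab)

        N = simulate_vocabulary()
        t = np.arange(1, len(N) + 1)
        tail = len(N) // 10                 # late-time window
        lam = np.polyfit(np.log(t[-tail:]), np.log(N[-tail:]), 1)[0]
        print(f"final vocabulary: {N[-1]} words, late-time Heaps exponent ~ {lam:.2f}")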

Martin Gerlach - One of the best experts on this subject based on the ideXlab platform.

  • Stochastic model for the vocabulary growth in natural languages
    Physical Review X, 2013
    Co-Authors: Martin Gerlach, Eduardo G Altmann
    Abstract:

    We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and on historical changes. The main feature of our model is the existence of two different classes of words: (i) a finite number of core words, which have higher frequency and do not affect the probability that a new word is used; and (ii) the remaining, virtually infinite number of noncore words, which have lower frequency and, once used, reduce the probability that a new word is used in the future. Our model relies on a careful analysis of the Google Ngram database of books published over the last centuries, and its main consequence is the generalization of Zipf's and Heaps' laws to two scaling regimes. We confirm that these generalizations yield the best simple description of the data among generic descriptive models and that the two free parameters depend only on the language, not on the database. From the point of view of our model, the main change on historical time scales is the composition of the specific words included in the finite list of core words, which we observe to decay exponentially in time at a rate of approximately 30 words per year for English.
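
    The generalized Heaps' law with two scaling regimes can be checked against (database size, vocabulary size) counts by fitting a broken power law in log-log space. The piecewise form, parameter names, and toy data below are our own illustrative assumptions; the paper's exact expression differs in detail.

        import numpy as np
        from scipy.optimize import curve_fit

        def log_heaps(logM, logb, logk, lam):
            """Two-regime Heaps' law in log-log space: slope 1 (every token
            is new) below the crossover logb, slope lam < 1 above it."""
            logM = np.asarray(logM, dtype=float)
            return np.where(logM < logb,
                            logk + logM,
                            logk + logb + lam * (logM - logb))

        # Toy data standing in for real corpus counts.
        rng = np.random.default_rng(0)
        logM = np.linspace(2, 8, 30)        # database sizes 10^2 .. 10^8 tokens
        logN = log_heaps(logM, 3.7, -0.1, 0.55) + rng.normal(0, 0.02, logM.size)

        (logb, logk, lam), _ = curve_fit(log_heaps, logM, logN, p0=[4.0, 0.0, 0.5])
        print(f"crossover ~ 10^{logb:.2f} tokens, exponent above crossover ~ {lam:.2f}")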

L.A. Zadeh - One of the best experts on this subject based on the ideXlab platform.

  • The concept of a generalized constraint - a bridge from natural languages to mathematics
    NAFIPS 2005 - 2005 Annual Meeting of the North American Fuzzy Information Processing Society, 2005
    Co-Authors: L.A. Zadeh
    Abstract:

    The concept of a generalized constraint was introduced close to two decades ago. For a number of years, it lay dormant and unused. But then, in the mid-nineties, it found an important application as a basis for the methodology of computing with words (CW). More recently, an idea began to crystallize that the concept of a generalized constraint may serve an important function as a bridge from natural languages to mathematics. In this role, it may find many applications, ranging from the formalization of legal reasoning to the enhancement of Web intelligence and natural language understanding. The basis for this expectation is that, as we move further into the age of machine intelligence and mechanized reasoning, the problem of natural language understanding looms larger and larger in importance and visibility. Traditional approaches to natural language understanding are based on classical, Aristotelian, bivalent logic. So far, the use of traditional approaches has met with limited success. The principal problem is that, basically, natural languages are systems for describing perceptions, and as such are intrinsically imprecise in ways that put them beyond the reach of bivalent logic and probability theory.
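
    As a concrete illustration of a generalized constraint, the sketch below precisiates the proposition "most Swedes are tall" in the usual fuzzy-set style: "tall" constrains individual heights, and the fuzzy quantifier "most" constrains the relative sigma-count of tall individuals. The membership functions, their numeric breakpoints, and the toy height sample are our own assumptions, not values given by Zadeh.

        import numpy as np

        def trapezoid(x, a, b, c, d):
            """Standard trapezoidal membership function on [a, d]."""
            x = np.asarray(x, dtype=float)
            return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - c)), 0.0, 1.0)

        def mu_tall(height_cm):             # "tall" as a fuzzy subset of heights
            return trapezoid(height_cm, 165, 180, 250, 251)

        def mu_most(proportion):            # "most" as a fuzzy quantifier on [0, 1]
            return trapezoid(proportion, 0.6, 0.9, 1.0, 1.01)

        # Generalized constraint: the proportion of tallness in the
        # population is constrained by the fuzzy quantifier "most".
        heights = np.random.default_rng(0).normal(178, 8, 10_000)  # toy sample
        sigma_count = mu_tall(heights).mean()   # relative sigma-count of "tall"
        truth = mu_most(sigma_count)
        print(f"proportion tall = {sigma_count:.2f}, "
              f"truth('most Swedes are tall') = {truth:.2f}")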

  • Precisiated natural language (PNL) - toward an enlargement of the role of natural languages in scientific theories
    2004 IEEE International Conference on Fuzzy Systems (IEEE Cat. No.04CH37542), 2004
    Co-Authors: L.A. Zadeh
    Abstract:

    This work discusses precisiated natural language, which is a subset of a natural language. This subset is equipped with constraint-centered semantics (CSNL) and is translatable into what is called the generalized constraint language. The ways in which the concept of a precisiated natural language can serve to enlarge the role of natural languages in scientific theories are also discussed.

  • Precisiated natural language (PNL) - toward an enlargement of the role of natural languages in computation, deduction, definition and decision
    International Conference on Natural Language Processing and Knowledge Engineering 2003. Proceedings. 2003, 2003
    Co-Authors: L.A. Zadeh
    Abstract:

    It is a deep-seated tradition in science to view the use of natural languages in scientific theories as a manifestation of mathematical immaturity. In particular, a direct consequence is that existing scientific theories do not have the capability to operate on perception-based information. Such information is usually described in a natural language and is intrinsically imprecise, reflecting a fundamental limitation on the cognitive ability of humans to resolve detail and store information. In this paper we describe how the high expressive power of natural languages can be harnessed by constructing what is called a precisiated natural language (PNL). In essence, PNL is a subset of a natural language (NL): a subset which is equipped with constraint-centered semantics (CSNL) and is translatable into what is called the generalized constraint language (GCL).

  • Precisiated natural language - toward a radical enlargement of the role of natural languages in information processing, decision and control
    Proceedings of the 9th International Conference on Neural Information Processing 2002. ICONIP '02., 2002
    Co-Authors: L.A. Zadeh
    Abstract:

    It is a deep-seated tradition in science to view the use of natural languages in scientific theories as a manifestation of mathematical immaturity. The rationale for this tradition is that natural languages are lacking in precision. In a related way, the restricted expressive power of predicate-logic-based languages rules out the possibility of defining many basic concepts such as causality, resemblance, smoothness and relevance in realistic terms. In this instance, as in many others, the price of precision is over-idealization and lack of robustness. In a significant departure from existing methods, in the approach described in this talk, the high expressive power of natural languages is harnessed by constructing what is called a precisiated natural language (PNL). In essence, PNL is a subset of a natural language (NL): a subset which is equipped with constraint-centered semantics (CSNL) and is translatable into what is called the generalized constraint language (GCL).

Mohammad R K Mofrad - One of the best experts on this subject based on the ideXlab platform.

  • Comparing fifty natural languages and twelve genetic languages using word embedding language divergence (WELD) as a quantitative measure of language distance
    North American Chapter of the Association for Computational Linguistics, 2016
    Co-Authors: Ehsaneddin Asgari, Mohammad R K Mofrad
    Abstract:

    We introduce a new measure of distance between languages based on word embedding, called word embedding language divergence (WELD). WELD is defined as the divergence between the unified similarity distributions of words between languages. Using such a measure, we perform language comparison for fifty natural languages and twelve genetic languages. Our natural language dataset is a collection of sentence-aligned parallel corpora from Bible translations for fifty languages spanning a variety of language families. Although we use parallel corpora, which guarantees having the same content in all languages, interestingly in many cases languages within the same family cluster together. In addition to natural languages, we perform language comparison for the coding regions in the genomes of 12 different organisms (4 plants, 6 animals, and 2 human subjects). Our results confirm a significant high-level difference between the genetic language models of humans and animals versus those of plants. The proposed method is a step toward defining a quantitative measure of similarity between languages, with applications in language classification, genre identification, dialect identification, and evaluation of translations.

  • Comparing fifty natural languages and twelve genetic languages using word embedding language divergence (WELD) as a quantitative measure of language distance
    arXiv: Computation and Language, 2016
    Co-Authors: Ehsaneddin Asgari, Mohammad R K Mofrad
    Abstract:

    We introduce a new measure of distance between languages based on word embedding, called word embedding language divergence (WELD). WELD is defined as the divergence between the unified similarity distributions of words between languages. Using such a measure, we perform language comparison for fifty natural languages and twelve genetic languages. Our natural language dataset is a collection of sentence-aligned parallel corpora from Bible translations for fifty languages spanning a variety of language families. Although we use parallel corpora, which guarantees having the same content in all languages, interestingly in many cases languages within the same family cluster together. In addition to natural languages, we perform language comparison for the coding regions in the genomes of 12 different organisms (4 plants, 6 animals, and 2 human subjects). Our results confirm a significant high-level difference between the genetic language models of humans and animals versus those of plants. The proposed method is a step toward defining a quantitative measure of similarity between languages, with applications in language classification, genre identification, dialect identification, and evaluation of translations.
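
    A simplified reading of WELD can be sketched with numpy alone: estimate the distribution of pairwise word similarities within each language over an aligned vocabulary, then measure the Jensen-Shannon divergence between the two distributions. The toy embeddings, the binning, and the choice of Jensen-Shannon divergence are our assumptions; the paper defines the unified similarity distribution and its divergence precisely, and uses embeddings trained per language.

        import numpy as np

        def similarity_distribution(emb, bins):
            """Histogram of pairwise cosine similarities of the word vectors."""
            emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
            sims = emb @ emb.T
            iu = np.triu_indices(len(emb), k=1)   # each word pair once
            hist, _ = np.histogram(sims[iu], bins=bins)
            return hist / hist.sum()

        def jensen_shannon(p, q, eps=1e-12):
            """Jensen-Shannon divergence between two probability vectors."""
            p, q = p + eps, q + eps
            m = 0.5 * (p + q)
            kl = lambda a, b: np.sum(a * np.log(a / b))
            return 0.5 * kl(p, m) + 0.5 * kl(q, m)

        # Toy stand-ins for embeddings of two languages over an aligned
        # vocabulary (in practice: train embeddings per language on the
        # parallel corpora and align word indices via the translations).
        rng = np.random.default_rng(1)
        emb_a = rng.normal(size=(500, 100))
        emb_b = emb_a + rng.normal(scale=0.3, size=(500, 100))

        bins = np.linspace(-1, 1, 41)
        weld = jensen_shannon(similarity_distribution(emb_a, bins),
                              similarity_distribution(emb_b, bins))
        print(f"WELD-style divergence ~ {weld:.4f}")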

Rolf Schwitter - One of the best experts on this subject based on the ideXlab platform.

  • Controlled natural languages for knowledge representation
    International Conference on Computational Linguistics, 2010
    Co-Authors: Rolf Schwitter
    Abstract:

    This paper presents a survey of research in controlled natural languages that can be used as high-level knowledge representation languages. Over the past 10 years or so, a number of machine-oriented controlled natural languages have emerged that can be used as high-level interface languages to various kinds of knowledge systems. These languages are relevant to the area of computational linguistics since they have two very interesting properties: firstly, they look informal, like natural languages, and are therefore easier for humans to write and understand than formal languages; secondly, they are precisely defined subsets of natural languages and can be translated automatically (and often deterministically) into a formal target language and then be used for automated reasoning. We present and compare the most mature of these novel languages, show how they can balance the disadvantages of natural languages and formal languages for knowledge representation, and discuss how domain specialists can be supported in writing specifications in a controlled natural language.
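
    The deterministic translatability that the survey highlights can be shown with a toy fragment: each sentence pattern of the controlled language maps to exactly one first-order formula. The patterns below are our own illustration, not the grammar of ACE or any other specific CNL.

        import re

        # Each surface pattern maps deterministically to one formula.
        PATTERNS = [
            (re.compile(r"^Every (\w+) is an? (\w+)\.$"),
             "forall x ({0}(x) -> {1}(x))"),
            (re.compile(r"^No (\w+) is an? (\w+)\.$"),
             "forall x ({0}(x) -> ~{1}(x))"),
            (re.compile(r"^Some (\w+) is an? (\w+)\.$"),
             "exists x ({0}(x) & {1}(x))"),
        ]

        def translate(sentence):
            """Translate one controlled-language sentence to first-order logic."""
            for regex, template in PATTERNS:
                m = regex.match(sentence)
                if m:
                    return template.format(*m.groups())
            raise ValueError(f"not in the controlled fragment: {sentence!r}")

        for s in ["Every student is a person.",
                  "No student is a professor.",
                  "Some person is a student."]:
            print(f"{s:32} => {translate(s)}")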

  • Writing support for controlled natural languages
    Proceedings of the Australasian Language Technology Association Workshop 2008, 2008
    Co-Authors: Tobias Kuhn, Rolf Schwitter
    Abstract:

    In this paper we present interface techniques that support the writing process of machine-oriented controlled natural languages, which are well-defined and tractable fragments of English that can be translated unambiguously into a formal target language. Since these languages have grammatical and lexical restrictions, it is important to provide a text editor that assists the writing process using lookahead information derived from the grammar. We discuss the requirements for such a lookahead text editor and introduce the semantic wiki AceWiki as an application where this technology plays an important role. We investigate two different approaches to generating lookahead information dynamically while a text is written and compare the runtimes and practicality of these approaches in detail.
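
    The core of such a predictive editor is computing, from the grammar, the set of tokens that may legally continue a partial sentence. The sketch below does this by brute-force top-down expansion of a toy grammar; the grammar is our own illustrative assumption, and real systems such as AceWiki derive the same lookahead information from a chart parser instead.

        # Toy context-free grammar: nonterminals map to productions.
        GRAMMAR = {
            "S":  [["NP", "VP", "."]],
            "NP": [["every", "N"], ["a", "N"], ["PN"]],
            "VP": [["V", "NP"], ["is", "a", "N"]],
            "N":  [["student"], ["course"]],
            "PN": [["John"], ["Mary"]],
            "V":  [["attends"], ["teaches"]],
        }

        def next_tokens(prefix, form=("S",), depth=12):
            """Terminals that can extend `prefix` (a tuple of tokens)."""
            if depth == 0 or not form:
                return set()
            head, rest = form[0], form[1:]
            if head in GRAMMAR:                 # expand leftmost nonterminal
                out = set()
                for production in GRAMMAR[head]:
                    out |= next_tokens(prefix, tuple(production) + rest, depth - 1)
                return out
            if not prefix:                      # prefix fully consumed:
                return {head}                   # head is a lookahead candidate
            if prefix[0] == head:               # consume one matching token
                return next_tokens(prefix[1:], rest, depth)
            return set()                        # this branch cannot match

        print(sorted(next_tokens(())))                    # admissible sentence starts
        print(sorted(next_tokens(("every", "student"))))  # -> attends, is, teaches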

  • A comparison of three controlled natural languages for OWL 1.1
    OWL: Experiences and Directions, 2008
    Co-Authors: Rolf Schwitter, Kaarel Kaljurand, Anne Cregan, Catherine Dolbear, Glen Hart
    Abstract:

    At OWLED 2007 a task force was formed to work towards a common controlled natural language syntax for OWL 1.1. In this paper members of the task force compare three controlled natural languages (CNLs) — Attempto Controlled English (ACE), Ordnance Survey Rabbit (Rabbit), and Sydney OWL Syntax (SOS) — that have been designed to express the logical content of OWL 1.1 ontologies. The common goal of these three languages is to make OWL ontologies accessible to people with no training in formal logics. We briefly introduce the three CNLs and discuss a number of requirements for an OWL-compatible CNL that have emerged from the present work. We then summarise the similarities and differences of the three CNLs and make some preliminary recommendations for an OWL-compatible CNL.

  • Controlled natural language meets the Semantic Web
    Australian Language Technology Workshop 2004, 2004
    Co-Authors: Rolf Schwitter, Marc Tilbrook
    Abstract:

    In this paper we present PENG-D, a proposal for a controlled natural language that can be used for expressing knowledge about resources in the Semantic Web and for specifying ontologies in a human-readable way. After a brief overview of the main Semantic Web enabling technologies (and their deficiencies), we will show how statements and rules written in PENG-D are related to (a subset of) RDFS and OWL and how this knowledge can be translated into an expressive fragment of first-order logic. The resulting information can then be further processed by third-party reasoning services and queried in PENG-D.
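
    The mapping from controlled-language statements to Semantic Web triples and their logical readings can be illustrated with a toy precisiser. The surface patterns and vocabulary below are our own assumptions, not PENG-D's actual grammar.

        import re

        # PENG-D-style statements to (RDF triple, first-order reading).
        RULES = [
            (re.compile(r"^Every (\w+) is an? (\w+)\.$"),
             lambda m: ((m[1], "rdfs:subClassOf", m[2]),
                        f"forall x ({m[1]}(x) -> {m[2]}(x))")),
            (re.compile(r"^(\w+) is an? (\w+)\.$"),
             lambda m: ((m[1], "rdf:type", m[2]),
                        f"{m[2]}({m[1]})")),
        ]

        def precisiate(sentence):
            """Map one statement to an RDF triple plus its logical reading."""
            for regex, build in RULES:
                m = regex.match(sentence)
                if m:
                    return build(m)
            raise ValueError(f"outside the controlled fragment: {sentence!r}")

        for s in ["Every City is a Location.", "Sydney is a City."]:
            triple, fol = precisiate(s)
            print(f"{s:28} => {triple}  |  {fol}")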