Dialectology

The Experts below are selected from a list of 360 Experts worldwide ranked by ideXlab platform

Martijn Wieling - One of the best experts on this subject based on the ideXlab platform.

  • Dialectology for computational linguists
    Similar Languages, Varieties, and Dialects, 2021
    Co-Authors: John Nerbonne, Wilbert Heeringa, Jelena Prokic, Martijn Wieling
    Abstract:

    This paper provides an overview of computational work in Dialectology. We have published similar surveys in the not-too-distant past (Heeringa and Prokic, 2018; Wieling and Nerbonne, 2015), but these were aimed at dialectologists and general linguists, respectively. This article is aimed at computational linguists, so that we will focus less on the nuts and bolts of exploiting the computer in research on dialects (which is documented in the articles we cite) and more on background assumptions and emerging issues and opportunities.

  • Applying the Levenshtein Distance to Catalan dialects: A brief comparison of two dialectometric approaches
    2012
    Co-Authors: Esteve Valls, John Nerbonne, Martijn Wieling, Jelena Prokić, Esteve Clua, Maria-Rosa Lloret
    Abstract:

    In recent years, dialectometry has gained interest among Catalan dialectologists. As a consequence, a specific dialectometric approach has been developed at the University of Barcelona, which aims at increasing the accuracy of final groupings by means of discriminating the predictable components of the language from its unpredictable ones. Another popular method to obtain dialect distances is the Levenshtein distance (LD) which has never been applied to a Catalan corpus so far. The goal of this paper is to present the results of applying the LD to a corpus of Catalan linguistic data, and to compare the results from this analysis both with the results from Barcelona and the traditional classifications of Catalan Dialectology.

  • Quantitative social Dialectology: explaining linguistic variation geographically and socially
    PLOS ONE, 2011
    Co-Authors: Martijn Wieling, John Nerbonne, Harald R Baayen
    Abstract:

    In this study we examine linguistic variation and its dependence on both social and geographic factors. We follow dialectometry in applying a quantitative methodology and focusing on dialect distances, and social Dialectology in the choice of factors we examine in building a model to predict word pronunciation distances from the standard Dutch language to 424 Dutch dialects. We combine linear mixed-effects regression modeling with generalized additive modeling to predict the pronunciation distance of 559 words. Although geographical position is the dominant predictor, several other factors emerged as significant. The model predicts a greater distance from the standard for smaller communities, for communities with a higher average age, for nouns (as contrasted with verbs and adjectives), for more frequent words, and for words with relatively many vowels. The impact of the demographic variables, however, varied from word to word. For a majority of words, larger, richer and younger communities are moving towards the standard. For a smaller minority of words, larger, richer and younger communities emerge as driving a change away from the standard. Similarly, the strength of the effects of word frequency and word category varied geographically. The peripheral areas of the Netherlands showed a greater distance from the standard for nouns (as opposed to verbs and adjectives) as well as for high-frequency words, compared to the more central areas. Our findings indicate that changes in pronunciation have been spreading (in particular for low-frequency words) from the Hollandic center of economic power to the peripheral areas of the country, meeting resistance that is stronger wherever, for well-documented historical reasons, the political influence of Holland was reduced. Our results are also consistent with the theory of lexical diffusion, in that distances from the Hollandic norm vary systematically and predictably on a word by word basis.

  • Hierarchical spectral partitioning of bipartite graphs to cluster dialects and identify distinguishing features
    Workshop on Graph Based Methods for Natural Language Processing, 2010
    Co-Authors: Martijn Wieling, John Nerbonne
    Abstract:

    In this study we apply hierarchical spectral partitioning of bipartite graphs to a Dutch dialect dataset to cluster dialect varieties and determine the concomitant sound correspondences. An important advantage of this clustering method over other dialectometric methods is that the linguistic basis is simultaneously determined, bridging the gap between traditional and quantitative Dialectology. Besides showing that the results of the hierarchical clustering improve over the flat spectral clustering method used in an earlier study (Wieling and Nerbonne, 2009), the values of the second singular vector used to generate the two-way clustering can be used to identify the most important sound correspondences for each cluster. This is an important advantage of the hierarchical method as it obviates the need for external methods to determine the most important sound correspondences for a geographical cluster.

  • Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences in Dialectology
    Graph-based Methods for Natural Language Processing, 2009
    Co-Authors: Martijn Wieling, John Nerbonne
    Abstract:

    In this study we used bipartite spectral graph partitioning to simultaneously cluster varieties and sound correspondences in Dutch dialect data. While clustering geographical varieties with respect to their pronunciation is not new, the simultaneous identification of the sound correspondences giving rise to the geographical clustering presents a novel opportunity in dialectometry. Earlier methods aggregated sound differences and clustered on the basis of aggregate differences. The determination of the significant sound correspondences which co-varied with cluster membership was carried out on a post hoc basis. Bipartite spectral graph clustering simultaneously seeks groups of individual sound correspondences which are associated, even while seeking groups of sites which share sound correspondences. We show that the application of this method results in clear and sensible geographical groupings and discuss the concomitant sound correspondences.
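
    The co-clustering idea described in the two abstracts above can be illustrated with a minimal sketch. It assumes a small binary matrix whose rows are varieties and whose columns are sound correspondences, normalises it by row and column degree, and splits both sets by the sign of the second singular vectors. The function name, the toy matrix and the sign-based split are assumptions made for illustration; they are not taken from the cited papers, which may differ in detail.

```python
import numpy as np

def bipartite_cocluster(A):
    """Two-way split of rows (varieties) and columns (sound correspondences).

    A[i, j] > 0 means correspondence j occurs in variety i (hypothetical input).
    Returns (row_labels, col_labels, col_scores).
    """
    A = np.asarray(A, dtype=float)
    d_rows = A.sum(axis=1)  # degree of each variety
    d_cols = A.sum(axis=0)  # degree of each sound correspondence
    if np.any(d_rows == 0) or np.any(d_cols == 0):
        raise ValueError("every row and column needs at least one nonzero entry")

    # Degree-normalised matrix: D_rows^(-1/2) A D_cols^(-1/2)
    An = A / np.sqrt(d_rows)[:, None] / np.sqrt(d_cols)[None, :]
    U, s, Vt = np.linalg.svd(An, full_matrices=False)

    # The second left/right singular vectors carry the two-way partition.
    row_scores = U[:, 1] / np.sqrt(d_rows)
    col_scores = Vt[1, :] / np.sqrt(d_cols)

    # Simplification: split on sign rather than running k-means on the scores.
    row_labels = (row_scores > 0).astype(int)
    col_labels = (col_scores > 0).astype(int)
    return row_labels, col_labels, col_scores

# Toy data: two groups of varieties linked by a single shared correspondence.
toy = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
rows, cols, scores = bipartite_cocluster(toy)
```

    Because rows and columns are scored in the same space, each group of varieties comes with the correspondences that share its label, and the magnitude of the column scores gives one rough way to rank them. Applying the same split recursively to each resulting block would give a hierarchical variant.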

Harald R Baayen - One of the best experts on this subject based on the ideXlab platform.

  • Quantitative social Dialectology: explaining linguistic variation geographically and socially
    PLOS ONE, 2011
    Co-Authors: Martijn Wieling, John Nerbonne, Harald R Baayen
    Abstract:

    In this study we examine linguistic variation and its dependence on both social and geographic factors. We follow dialectometry in applying a quantitative methodology and focusing on dialect distances, and social Dialectology in the choice of factors we examine in building a model to predict word pronunciation distances from the standard Dutch language to 424 Dutch dialects. We combine linear mixed-effects regression modeling with generalized additive modeling to predict the pronunciation distance of 559 words. Although geographical position is the dominant predictor, several other factors emerged as significant. The model predicts a greater distance from the standard for smaller communities, for communities with a higher average age, for nouns (as contrasted with verbs and adjectives), for more frequent words, and for words with relatively many vowels. The impact of the demographic variables, however, varied from word to word. For a majority of words, larger, richer and younger communities are moving towards the standard. For a smaller minority of words, larger, richer and younger communities emerge as driving a change away from the standard. Similarly, the strength of the effects of word frequency and word category varied geographically. The peripheral areas of the Netherlands showed a greater distance from the standard for nouns (as opposed to verbs and adjectives) as well as for high-frequency words, compared to the more central areas. Our findings indicate that changes in pronunciation have been spreading (in particular for low-frequency words) from the Hollandic center of economic power to the peripheral areas of the country, meeting resistance that is stronger wherever, for well-documented historical reasons, the political influence of Holland was reduced. Our results are also consistent with the theory of lexical diffusion, in that distances from the Hollandic norm vary systematically and predictably on a word by word basis.

John Nerbonne - One of the best experts on this subject based on the ideXlab platform.

  • Dialectology for computational linguists
    Similar Languages, Varieties, and Dialects, 2021
    Co-Authors: John Nerbonne, Wilbert Heeringa, Jelena Prokic, Martijn Wieling
    Abstract:

    This paper provides an overview of computational work in Dialectology. We have published similar surveys in the not-too-distant past (Heeringa and Prokic, 2018; Wieling and Nerbonne, 2015), but these were aimed at dialectologists and general linguists, respectively. This article is aimed at computational linguists, so that we will focus less on the nuts and bolts of exploiting the computer in research on dialects (which is documented in the articles we cite) and more on background assumptions and emerging issues and opportunities.

  • The Handbook of Dialectology
    2018
    Co-Authors: Charles Boberg, John Nerbonne, Dominic Watt
    Abstract:

    Intro; Title Page; Copyright Page; Contents; List of Contributors
    Introduction
      1 The Origins of Dialect Variation and the Status of Dialectology
      2 Defining Dialects
      3 The Origins and Development of Dialectology
      4 The Present and Future State of Dialectology
      5 Rationale and Plan of This Book
      References
    Section 1 - Theory: Introduction
      References
    Chapter 1 Dialectology, Philology, and Historical Linguistics
      1.1 Introduction
      1.2 Dialect Awareness and Attitudes
      1.3 The Description of Dialects
      1.4 The Antiquarian Tradition
      1.5 Dialects in the Age of Prescriptivism
      1.6 The Denigration of Dialects
      1.7 From Philology to Linguistics
      1.8 Features of Indo‐European Studies and Comparative Philology
      1.9 The Dawn of Modern Dialectology: The Beginnings of a New Discipline
      1.10 Dialect Societies and Materials
      1.11 Dialect Studies
      1.12 Data Collection Methods
      1.13 Accessibility of Data
      1.14 Dialectology and General Linguistics
      1.15 Structuralism and Generativism
      1.16 Dialectometry
      1.17 The Rise of New Dialects
      1.18 Conclusion
      References
    Chapter 2 The Dialect Dictionary
      2.1 Introduction
      2.2 The User's Perspective: Meta-Lexicographical Considerations (Why, and for Whom)
      2.3 Macrostructural Considerations
      2.4 Onomasiological or Semasiological Arrangement
      2.5 Onomasiological Arrangements
      2.6 Semasiological Arrangement
      2.7 Microstructural Considerations
      2.8 Data Collection by Fieldwork
      2.9 Purposive Systematic Fieldwork
      2.10 Oral Investigation (Direct Method)
      2.11 Investigation by Correspondence (Indirect Method)
      2.12 The Questionnaire
      2.13 The Structure of the Questionnaire
      2.14 Question Types (+ Examples)
      2.15 How Much?
      2.16 New Technologies and Desiderata for the Future
      Notes
      References

  • Applying the Levenshtein Distance to Catalan dialects: A brief comparison of two dialectometric approaches
    2012
    Co-Authors: Esteve Valls, John Nerbonne, Martijn Wieling, Jelena Prokić, Esteve Clua, Maria-Rosa Lloret
    Abstract:

    In recent years, dialectometry has gained interest among Catalan dialectologists. As a consequence, a specific dialectometric approach has been developed at the University of Barcelona, which aims at increasing the accuracy of final groupings by means of discriminating the predictable components of the language from its unpredictable ones. Another popular method to obtain dialect distances is the Levenshtein distance (LD) which has never been applied to a Catalan corpus so far. The goal of this paper is to present the results of applying the LD to a corpus of Catalan linguistic data, and to compare the results from this analysis both with the results from Barcelona and the traditional classifications of Catalan Dialectology.

  • Quantitative social Dialectology: explaining linguistic variation geographically and socially
    PLOS ONE, 2011
    Co-Authors: Martijn Wieling, John Nerbonne, Harald R Baayen
    Abstract:

    In this study we examine linguistic variation and its dependence on both social and geographic factors. We follow dialectometry in applying a quantitative methodology and focusing on dialect distances, and social Dialectology in the choice of factors we examine in building a model to predict word pronunciation distances from the standard Dutch language to 424 Dutch dialects. We combine linear mixed-effects regression modeling with generalized additive modeling to predict the pronunciation distance of 559 words. Although geographical position is the dominant predictor, several other factors emerged as significant. The model predicts a greater distance from the standard for smaller communities, for communities with a higher average age, for nouns (as contrasted with verbs and adjectives), for more frequent words, and for words with relatively many vowels. The impact of the demographic variables, however, varied from word to word. For a majority of words, larger, richer and younger communities are moving towards the standard. For a smaller minority of words, larger, richer and younger communities emerge as driving a change away from the standard. Similarly, the strength of the effects of word frequency and word category varied geographically. The peripheral areas of the Netherlands showed a greater distance from the standard for nouns (as opposed to verbs and adjectives) as well as for high-frequency words, compared to the more central areas. Our findings indicate that changes in pronunciation have been spreading (in particular for low-frequency words) from the Hollandic center of economic power to the peripheral areas of the country, meeting resistance that is stronger wherever, for well-documented historical reasons, the political influence of Holland was reduced. Our results are also consistent with the theory of lexical diffusion, in that distances from the Hollandic norm vary systematically and predictably on a word by word basis.

  • Gabmap: a web application for Dialectology
    Dialectologia, 2011
    Co-Authors: John Nerbonne, R Colen, Charlotte Gooskens, Therese Leinonen, Peter Kleiweg
    Abstract:

    Gabmap is a web application aimed especially to facilitate explorations in quantitative Dialectology (or dialectometry) by enabling researchers in Dialectology to conduct computer-supported explorations and calculations even if they have relatively little computational expertise. Gabmap creates various views of dialect data, from histograms of characters used to spot coding errors, to alignments of phonetic transcriptions used in measuring pronunciation distance, to colored multidimensional scaling plots intended to illustrate quantitative results insightfully. Many analyses are accompanied by facilities allowing researchers to probe further, e.g. seeking the most important linguistic bases of an areal division, or examining the results of clustering for statistical reliability. These are also intended to inform the critical discussion of quantitative techniques, i.e. a comparison between quantitative analyses and non-quantitative (qualitative) work. For this reason Gabmap also includes support for qualitative analyses, such as facilities to map the occurrence of individual features. The software is in use, and the source code is openly available.
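
    Several abstracts in this list rely on pronunciation distances between transcriptions, most explicitly the Levenshtein distance applied to Catalan data. A minimal sketch of that measure follows, assuming unit costs for insertion, deletion and substitution, normalisation by the length of the longer string, and a plain average over shared words. The function names and the sample word lists are hypothetical; the cited papers and Gabmap itself may use refined segment costs or alignment-length normalisation instead.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, seg_a in enumerate(a, start=1):
        curr = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def normalized_distance(a, b):
    """Edit distance divided by the length of the longer transcription."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

def aggregate_distance(site_a, site_b):
    """Mean normalised distance over the words that both sites record."""
    shared = sorted(set(site_a) & set(site_b))
    if not shared:
        raise ValueError("the two sites share no words")
    total = sum(normalized_distance(site_a[w], site_b[w]) for w in shared)
    return total / len(shared)

# Hypothetical transcriptions for two sites (not real dialect data).
site_1 = {"milk": "melk", "house": "hus"}
site_2 = {"milk": "melek", "house": "hoys"}
distance = aggregate_distance(site_1, site_2)
```

    Computing this aggregate for every pair of sites yields the site-by-site distance matrix that clustering or multidimensional scaling can then take as input.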

Trevor Cohn - One of the best experts on this subject based on the ideXlab platform.

  • Continuous representation of location for geolocation and lexical Dialectology using mixture density networks
    Empirical Methods in Natural Language Processing, 2017
    Co-Authors: Afshin Rahimi, Timothy Baldwin, Trevor Cohn
    Abstract:

    We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical Dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical Dialectology, and evaluate it using the DARE dataset.

  • A neural model for user geolocation and lexical Dialectology
    Meeting of the Association for Computational Linguistics, 2017
    Co-Authors: Afshin Rahimi, Trevor Cohn, Timothy Baldwin
    Abstract:

    We propose a simple yet effective text-based user geolocation model based on a neural network with one hidden layer, which achieves state of the art performance over three Twitter benchmark geolocation datasets, in addition to producing word and phrase embeddings in the hidden layer that we show to be useful for detecting dialectal terms. As part of our analysis of dialectal terms, we release DAREDS, a dataset for evaluating dialect term detection methods.
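
    The mixture density network abstract above describes predicting a mixture of Gaussians over two-dimensional locations. A minimal sketch of the quantity such a model would typically be trained on, the negative log-likelihood of an observed location under the predicted mixture, is given below. It assumes diagonal covariances and treats coordinates as points on a plane; the parameter values stand in for network outputs and are made up for illustration, not taken from the cited paper.

```python
import math

def log_gaussian_2d(point, mean, sigma):
    """Log density of a 2-D Gaussian with diagonal covariance."""
    (x, y), (mx, my), (sx, sy) = point, mean, sigma
    return (-math.log(2 * math.pi * sx * sy)
            - 0.5 * ((x - mx) / sx) ** 2
            - 0.5 * ((y - my) / sy) ** 2)

def log_softmax(logits):
    """Numerically stable log of the softmax mixture weights."""
    top = max(logits)
    log_total = top + math.log(sum(math.exp(v - top) for v in logits))
    return [v - log_total for v in logits]

def mixture_nll(point, logits, means, sigmas):
    """Negative log-likelihood of a location under a Gaussian mixture."""
    terms = [lw + log_gaussian_2d(point, m, s)
             for lw, m, s in zip(log_softmax(logits), means, sigmas)]
    top = max(terms)  # log-sum-exp over the components
    return -(top + math.log(sum(math.exp(t - top) for t in terms)))

# Hypothetical network outputs: two components with equal weight.
logits = [0.0, 0.0]
means = [(0.0, 0.0), (10.0, 10.0)]
sigmas = [(1.0, 1.0), (2.0, 2.0)]
near = mixture_nll((0.5, -0.5), logits, means, sigmas)
far = mixture_nll((5.0, 5.0), logits, means, sigmas)
```

    A location close to one of the component means receives a lower loss than one lying between them, and the component widths give a direct handle on how uncertain the prediction is.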

Afshin Rahimi - One of the best experts on this subject based on the ideXlab platform.

  • Continuous representation of location for geolocation and lexical Dialectology using mixture density networks
    Empirical Methods in Natural Language Processing, 2017
    Co-Authors: Afshin Rahimi, Timothy Baldwin, Trevor Cohn
    Abstract:

    We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical Dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical Dialectology, and evaluate it using the DARE dataset.

  • A neural model for user geolocation and lexical Dialectology
    Meeting of the Association for Computational Linguistics, 2017
    Co-Authors: Afshin Rahimi, Trevor Cohn, Timothy Baldwin
    Abstract:

    We propose a simple yet effective text-based user geolocation model based on a neural network with one hidden layer, which achieves state of the art performance over three Twitter benchmark geolocation datasets, in addition to producing word and phrase embeddings in the hidden layer that we show to be useful for detecting dialectal terms. As part of our analysis of dialectal terms, we release DAREDS, a dataset for evaluating dialect term detection methods.