Native Language

The experts below were selected from a list of 360 experts worldwide, ranked by the ideXlab platform.

Andrea Lassmann - One of the best experts on this subject based on the ideXlab platform.

  • The Causal Impact of Common Native Language on International Trade: Evidence from a Spatial Regression Discontinuity Design
    The Economic Journal, 2015
    Co-Authors: Peter Egger, Andrea Lassmann
    Abstract:

    This paper studies the effect of sharing a common native language on international trade. Switzerland hosts three major native language groups which adjoin countries sharing the same native majority languages. In regions close to the internal language border the alternate major language is taught early on in school and not only understood but spoken by the residents. This setting allows for an assessment of the impact of common native rather than spoken language on transaction-level imports from neighbouring countries. Our findings point to an effect of common native language on extensive rather than on intensive margins of trade.

  • The Causal Impact of Common Native Language on International Trade: Evidence from a Spatial Regression Discontinuity Design
    Social Science Research Network, 2015
    Co-Authors: Peter Egger, Andrea Lassmann
    Abstract:

    This article studies the effect of sharing a common native language (CNL) on international trade. Switzerland hosts three major native language groups which adjoin countries sharing the same native majority languages. In regions close to the internal language border the alternate major language is taught early on in school and not only understood but spoken by the residents. This setting allows for an assessment of the impact of common native rather than spoken language on transaction-level imports from neighbouring countries. Our findings point to an effect of CNL on extensive rather than on intensive margins of trade.

  • The Causal Impact of Common Native Language on International Trade: Evidence from a Spatial Regression Discontinuity Design
    Research Papers in Economics, 2013
    Co-Authors: Peter Egger, Andrea Lassmann
    Abstract:

    This paper studies the causal effect of sharing a common native language on international trade. Switzerland is a multilingual country that hosts four official language groups, of which three are major (French, German, and Italian). These groups of native language speakers are geographically separated, with the corresponding regions bordering countries which share a majority of speakers of the same native language. All three main languages are understood and spoken by most Swiss citizens, especially those residing close to internal language borders in Switzerland. This unique setting allows for an assessment of the impact of common native (rather than spoken) language as a cultural aspect of language on trade from within country-pairs. We do so by exploiting the discontinuity in various international bilateral trade outcomes, based on Swiss transaction-level data, at historical language borders within Switzerland. The effect on various margins of imports is positive and significant. The results suggest that, on average, a common native language between regions biases the regional structure of the value of international imports towards them by 18 percentage points, and that of the number of import transactions by 20 percentage points. In addition, regions import 102 additional products from a neighboring country sharing a common native language compared to a different native language exporter. This effect is considerably lower than the overall estimate (using aggregate bilateral trade and no regression discontinuity design) of common official language on Swiss international imports in the same sample. The latter subsumes both the effect of common spoken language as a communication factor and of confounding economic and institutional factors, and is quantitatively well in line with the common official (spoken or native) language coefficient in many gravity model estimates of international trade.
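
    The border-discontinuity logic above lends itself to a compact illustration. Below is a minimal sketch of a local linear regression discontinuity estimate, assuming hypothetical transaction data with a signed border distance `dist` (negative on the side sharing the exporter's native language) and a log import outcome `y`; the column names, bandwidth, and specification are illustrative, not the authors' actual estimator.

        # Sketch of an RDD estimate of the jump in imports at the language border.
        # Assumes a pandas DataFrame with hypothetical columns `dist` and `y`.
        import pandas as pd
        import statsmodels.formula.api as smf

        def rdd_jump(df: pd.DataFrame, bandwidth: float = 25.0) -> float:
            """Local linear RDD: separate slopes on each side of dist == 0;
            the coefficient on `treated` is the discontinuity (the CNL effect)."""
            local = df[df["dist"].abs() <= bandwidth].copy()
            local["treated"] = (local["dist"] < 0).astype(int)  # same-native-language side
            fit = smf.ols("y ~ treated + dist + dist:treated", data=local).fit(cov_type="HC1")
            return fit.params["treated"]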

Peter Egger - One of the best experts on this subject based on the ideXlab platform.

  • The Causal Impact of Common Native Language on International Trade: Evidence from a Spatial Regression Discontinuity Design
    The Economic Journal, 2015
    Co-Authors: Peter Egger, Andrea Lassmann
    Abstract:

    This paper studies the effect of sharing a common native language on international trade. Switzerland hosts three major native language groups which adjoin countries sharing the same native majority languages. In regions close to the internal language border the alternate major language is taught early on in school and not only understood but spoken by the residents. This setting allows for an assessment of the impact of common native rather than spoken language on transaction-level imports from neighbouring countries. Our findings point to an effect of common native language on extensive rather than on intensive margins of trade.

  • The Causal Impact of Common Native Language on International Trade: Evidence from a Spatial Regression Discontinuity Design
    Social Science Research Network, 2015
    Co-Authors: Peter Egger, Andrea Lassmann
    Abstract:

    This article studies the effect of sharing a common native language (CNL) on international trade. Switzerland hosts three major native language groups which adjoin countries sharing the same native majority languages. In regions close to the internal language border the alternate major language is taught early on in school and not only understood but spoken by the residents. This setting allows for an assessment of the impact of common native rather than spoken language on transaction-level imports from neighbouring countries. Our findings point to an effect of CNL on extensive rather than on intensive margins of trade.

  • The Causal Impact of Common Native Language on International Trade: Evidence from a Spatial Regression Discontinuity Design
    Research Papers in Economics, 2013
    Co-Authors: Peter Egger, Andrea Lassmann
    Abstract:

    This paper studies the causal effect of sharing a common native language on international trade. Switzerland is a multilingual country that hosts four official language groups, of which three are major (French, German, and Italian). These groups of native language speakers are geographically separated, with the corresponding regions bordering countries which share a majority of speakers of the same native language. All three main languages are understood and spoken by most Swiss citizens, especially those residing close to internal language borders in Switzerland. This unique setting allows for an assessment of the impact of common native (rather than spoken) language as a cultural aspect of language on trade from within country-pairs. We do so by exploiting the discontinuity in various international bilateral trade outcomes, based on Swiss transaction-level data, at historical language borders within Switzerland. The effect on various margins of imports is positive and significant. The results suggest that, on average, a common native language between regions biases the regional structure of the value of international imports towards them by 18 percentage points, and that of the number of import transactions by 20 percentage points. In addition, regions import 102 additional products from a neighboring country sharing a common native language compared to a different native language exporter. This effect is considerably lower than the overall estimate (using aggregate bilateral trade and no regression discontinuity design) of common official language on Swiss international imports in the same sample. The latter subsumes both the effect of common spoken language as a communication factor and of confounding economic and institutional factors, and is quantitatively well in line with the common official (spoken or native) language coefficient in many gravity model estimates of international trade.

Mark Dras - One of the best experts on this subject based on the ideXlab platform.

  • Native Language Identification with Classifier Stacking and Ensembles
    Computational Linguistics, 2018
    Co-Authors: Shervin Malmasi, Mark Dras
    Abstract:

    Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble architectures such as classifier stacking have not been closely evaluated. We present a set of experiments using three ensemble-based models, testing each with multiple configurations and algorithms. This includes a rigorous application of meta-classification models for NLI, achieving state-of-the-art results on several large data sets, evaluated in both intra-corpus and cross-corpus modes.
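
    The stacking architecture the abstract refers to can be sketched briefly. The base learners, feature views, and meta-classifier below are plausible choices for NLI, not the exact configuration evaluated in the paper.

        # Sketch of classifier stacking for NLI: base learners over different
        # feature views, with a meta-classifier trained on their cross-validated
        # predictions (scikit-learn's StackingClassifier).
        from sklearn.ensemble import StackingClassifier
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        base_learners = [
            ("word_svm", make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())),
            ("char_svm", make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(2, 4)), LinearSVC())),
            ("word_nb", make_pipeline(TfidfVectorizer(), MultinomialNB())),
        ]
        stack = StackingClassifier(
            estimators=base_learners,
            final_estimator=LogisticRegression(max_iter=1000),
            cv=5,
        )
        # stack.fit(train_texts, train_l1_labels); stack.predict(test_texts)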

  • Unsupervised Text Segmentation Based on Native Language Characteristics
    Meeting of the Association for Computational Linguistics, 2017
    Co-Authors: Shervin Malmasi, Mark Dras, Mark Johnson, Magdalena Wolska
    Abstract:

    Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics, such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian model that incorporates appropriately compact language models and alternating asymmetric priors can achieve scores on the standard metrics around halfway to perfect segmentation.
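
    The scoring idea behind such a model can be illustrated on a toy case: pick the single boundary that maximizes the Dirichlet-multinomial marginal likelihood of the two resulting segments under compact unigram language models. The paper's model (multiple boundaries, alternating asymmetric priors) is richer; this sketch only shows the Bayesian scoring step.

        # Toy Bayesian segmentation: score each candidate boundary by the
        # Dirichlet-multinomial marginal likelihood of the two segments.
        from collections import Counter
        from math import lgamma

        def dm_log_marginal(tokens, vocab, alpha=0.1):
            """Log marginal likelihood of a segment under a symmetric
            Dirichlet(alpha) prior over a unigram language model."""
            counts, n, v = Counter(tokens), len(tokens), len(vocab)
            score = lgamma(v * alpha) - lgamma(v * alpha + n)
            for w in vocab:
                score += lgamma(alpha + counts[w]) - lgamma(alpha)
            return score

        def best_boundary(tokens):
            vocab = set(tokens)
            return max(range(1, len(tokens)),
                       key=lambda i: dm_log_marginal(tokens[:i], vocab)
                                   + dm_log_marginal(tokens[i:], vocab))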

  • Multilingual Native Language Identification
    Natural Language Engineering, 2017
    Co-Authors: Shervin Malmasi, Mark Dras
    Abstract:

    We present the first comprehensive study of Native Language Identification (NLI) applied to text written in languages other than English, using data from six languages. NLI is the task of predicting an author’s first language using only their writings in a second language, with applications in Second Language Acquisition and forensic linguistics. Most research to date has focused on English, but there is a need to apply NLI to other languages, not only to gauge its applicability but also to aid in teaching research for other emerging languages. With this goal, we identify six typologically very different sources of non-English second language data and conduct six experiments using a set of commonly used features. Our first two experiments evaluate our features and corpora, showing that the features perform well and at similar rates across languages. The third experiment compares non-native and native control data, showing that they can be discerned with 95 per cent accuracy. Our fourth experiment provides a cross-linguistic assessment of how the degree of syntactic data encoded in part-of-speech tags affects their efficiency as classification features, finding that most differences between first-language groups lie in the ordering of the most basic word categories. We also tackle two questions that have not previously been addressed for NLI. Other work in NLI has shown that ensembles of classifiers over feature types work well, and in our final experiment we use such an oracle classifier to derive an upper limit for classification accuracy with our feature set. We also present an analysis examining feature diversity, aiming to estimate the degree of overlap and complementarity between our chosen features employing an association measure for binary data. Finally, we conclude with a general discussion and outline directions for future work.
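
    The abstract does not name its association measure for binary data; a common choice for this kind of overlap analysis (assumed here, not confirmed by the abstract) is Yule's Q over the per-document correctness of classifiers trained on two different feature types.

        # Yule's Q between the correctness vectors of two classifiers:
        # +1 means the feature types err on the same texts (high overlap),
        # -1 means fully complementary errors.
        import numpy as np

        def yules_q(correct_a: np.ndarray, correct_b: np.ndarray) -> float:
            # correct_a, correct_b: boolean arrays over the evaluation set
            a = np.sum(correct_a & correct_b)    # both right
            b = np.sum(correct_a & ~correct_b)   # only A right
            c = np.sum(~correct_a & correct_b)   # only B right
            d = np.sum(~correct_a & ~correct_b)  # both wrong
            den = a * d + b * c
            return float(a * d - b * c) / float(den) if den else 0.0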

  • Oracle and Human Baselines for Native Language Identification
    Workshop on Innovative Use of NLP for Building Educational Applications, 2015
    Co-Authors: Shervin Malmasi, Joel Tetreault, Mark Dras
    Abstract:

    We examine different ensemble methods, including an oracle, to estimate the upper limit of classification accuracy for Native Language Identification (NLI). The oracle outperforms state-of-the-art systems by over 10%, and the results indicate that for many misclassified texts the correct class label receives a significant portion of the ensemble votes, often being the runner-up. We also present a pilot study of human performance for NLI, the first such experiment. While some participants achieved modest results on our simplified setup with 5 L1s, they did not outperform our NLI system, and this performance gap is likely to widen on the standard NLI setup.
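
    The oracle itself is simple to state: a text counts as correctly classified if any ensemble member predicts the true L1, which bounds what any vote-combination scheme over those members could achieve. A minimal sketch:

        # Oracle upper bound for an ensemble of NLI classifiers.
        import numpy as np

        def oracle_accuracy(member_predictions: np.ndarray, gold: np.ndarray) -> float:
            """member_predictions: (n_classifiers, n_texts) array of predicted labels;
            gold: (n_texts,) array of true L1 labels."""
            hit = (member_predictions == gold[None, :]).any(axis=0)
            return float(hit.mean())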

Shervin Malmasi - One of the best experts on this subject based on the ideXlab platform.

  • Portuguese Native Language Identification
    Processing of the Portuguese Language, 2018
    Co-Authors: Shervin Malmasi, Marcos Zampieri
    Abstract:

    This study presents the first Native Language Identification (NLI) study for L2 Portuguese. We used a subset of the NLI-PT dataset, containing texts written by speakers of five different native languages: Chinese, English, German, Italian, and Spanish. We explore the linguistic annotations available in NLI-PT to extract a range of (morpho-)syntactic features and apply NLI classification methods to predict the native language of the authors. The best results were obtained using an ensemble combination of the features, achieving 54.1% accuracy.

  • Native Language Identification with Classifier Stacking and Ensembles
    Computational Linguistics, 2018
    Co-Authors: Shervin Malmasi, Mark Dras
    Abstract:

    Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble architectures such as classifier stacking have not been closely evaluated. We present a set of experiments using three ensemble-based models, testing each with multiple configurations and algorithms. This includes a rigorous application of meta-classification models for NLI, achieving state-of-the-art results on several large data sets, evaluated in both intra-corpus and cross-corpus modes.

  • A Portuguese Native Language Identification Dataset
    Workshop on Innovative Use of NLP for Building Educational Applications, 2018
    Co-Authors: Iria Gayo, Marcos Zampieri, Shervin Malmasi
    Abstract:

    In this paper we present NLI-PT, the first Portuguese dataset compiled for Native Language Identification (NLI), the task of identifying an author’s first language based on their second language writing. The dataset includes 1,868 student essays written by learners of European Portuguese, native speakers of the following L1s: Chinese, English, Spanish, German, Russian, French, Japanese, Italian, Dutch, Tetum, Arabic, Polish, Korean, Romanian, and Swedish. NLI-PT includes the original student text and four different types of annotation: POS, fine-grained POS, constituency parses, and dependency parses. NLI-PT can be used not only in NLI but also in research on several topics in the field of Second Language Acquisition and educational NLP. We discuss possible applications of this dataset and present the results obtained for the first lexical baseline system for Portuguese NLI.
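
    A lexical baseline of the kind reported in the paper can be sketched as word n-gram features with a linear classifier; the exact feature settings and evaluation protocol of the published baseline may differ.

        # Sketch of a lexical NLI baseline for NLI-PT-style data.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        baseline = make_pipeline(CountVectorizer(ngram_range=(1, 2), min_df=2), LinearSVC())
        # essays: list of learner texts; l1_labels: their native languages
        # scores = cross_val_score(baseline, essays, l1_labels, cv=10)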

  • Unsupervised Text Segmentation Based on Native Language Characteristics
    Meeting of the Association for Computational Linguistics, 2017
    Co-Authors: Shervin Malmasi, Mark Dras, Mark Johnson, Magdalena Wolska
    Abstract:

    Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics, such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian model that incorporates appropriately compact language models and alternating asymmetric priors can achieve scores on the standard metrics around halfway to perfect segmentation.

Boris Katz - One of the best experts on this subject based on the ideXlab platform.

  • Predicting Native Language from Gaze
    Meeting of the Association for Computational Linguistics, 2017
    Co-Authors: Yevgeni Berzak, Chie Nakamura, Suzanne Flynn, Boris Katz
    Abstract:

    In an embodiment, a method includes presenting, on a display, sample text in a given language to a user. The method further includes recording eye fixation times for each word of the sample text and recording saccade times between each pair of fixations. The method further includes comparing features of the user's gaze pattern to features of the gaze patterns of a plurality of training readers, each of whom has a known native language. The method further generates a probability of at least one estimated native language of the user based on the results of the comparison.

  • Predicting Native Language from Gaze
    arXiv: Computation and Language, 2017
    Co-Authors: Yevgeni Berzak, Chie Nakamura, Suzanne Flynn, Boris Katz
    Abstract:

    A fundamental question in language learning concerns the role of a speaker's first language in second language acquisition. We present a novel methodology for studying this question: analysis of eye-movement patterns in second language reading of free-form text. Using this methodology, we demonstrate for the first time that the native language of English learners can be predicted from their gaze fixations when reading English. We provide analysis of classifier uncertainty and learned features, which indicates that differences in English reading are likely to be rooted in linguistic divergences across native languages. The presented framework complements production studies and offers new ground for advancing research on multilingualism.
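
    A minimal sketch of the prediction setup, assuming each reader is summarized by a fixed-length vector of gaze features (mean fixation duration, saccade length, and the like); the feature representation and classifier are illustrative assumptions, not the authors' exact pipeline.

        # Predicting a reader's L1 from aggregate gaze features.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        def l1_from_gaze_accuracy(X: np.ndarray, y: np.ndarray) -> float:
            """X: (n_readers, n_gaze_features); y: native-language labels."""
            clf = LogisticRegression(max_iter=1000)
            return float(cross_val_score(clf, X, y, cv=5).mean())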

  • Reconstructing Native Language Typology from Foreign Language Usage
    arXiv: Computation and Language, 2014
    Co-Authors: Yevgeni Berzak, Roi Reichart, Boris Katz
    Abstract:

    Linguists and psychologists have long been studying cross-linguistic transfer, the influence of native language properties on linguistic performance in a foreign language. In this work we provide empirical evidence for this process in the form of a strong correlation between language similarities derived from structural features in English as a Second Language (ESL) texts and equivalent similarities obtained from the typological features of the native languages. We leverage this finding to recover native language typological similarity structure directly from ESL text, and perform prediction of typological features in an unsupervised fashion with respect to the target languages. Our method achieves 72.2% accuracy on the typology prediction task, a result that is highly competitive with equivalent methods that rely on typological resources.
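
    The reconstruction step can be sketched as follows, under the assumption that each native language is represented by a vector of structural features aggregated from its speakers' ESL texts; the nearest-neighbour prediction rule is a simplification for illustration, not the authors' full method.

        # Recover typology from ESL-derived language similarities: predict each
        # language's typological features from its most similar other language.
        import numpy as np
        from sklearn.metrics.pairwise import cosine_similarity

        def predict_typology(esl_vectors: np.ndarray, typology: np.ndarray) -> np.ndarray:
            """esl_vectors: (n_languages, n_esl_features) from learners' English;
            typology: (n_languages, n_binary_typological_features)."""
            sim = cosine_similarity(esl_vectors)
            np.fill_diagonal(sim, -np.inf)   # exclude the language itself
            nearest = sim.argmax(axis=1)     # most ESL-similar other language
            return typology[nearest]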

  • Reconstructing Native Language Typology from Foreign Language Usage
    Conference on Computational Natural Language Learning, 2014
    Co-Authors: Yevgeni Berzak, Roi Reichart, Boris Katz
    Abstract:

    This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF - 1231216.