Simplification

14,000,000 Leading Edge Experts on the ideXlab platform

The experts below are selected from a list of 1,113,003 experts worldwide ranked by the ideXlab platform.

Mirella Lapata - One of the best experts on this subject based on the ideXlab platform.

  • Sentence Simplification with Deep Reinforcement Learning
    arXiv: Computation and Language, 2017
    Co-Authors: Xingxing Zhang, Mirella Lapata
    Abstract:

    Sentence simplification aims to make sentences easier to read and understand. Most recent approaches draw on insights from machine translation to learn simplification rewrites from monolingual corpora of complex and simple sentences. We address the simplification problem with an encoder-decoder model coupled with a deep reinforcement learning framework. Our model, which we call DRESS (shorthand for Deep REinforcement Sentence Simplification), explores the space of possible simplifications while learning to optimize a reward function that encourages outputs which are simple, fluent, and preserve the meaning of the input. Experiments on three datasets demonstrate that our model outperforms competitive simplification systems.
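
The reward described above trades off simplicity, fluency, and meaning preservation. A minimal Python sketch, assuming the reward is a weighted sum of three component scores in [0, 1]; the `toy_*` scorers and the `RewardWeights` values are hypothetical stand-ins for illustration, not the scorers or weights used in DRESS.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A scorer maps (source, output) to a value in [0, 1].
Scorer = Callable[[str, str], float]


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights, for illustration only.
    simplicity: float = 1.0
    relevance: float = 1.0
    fluency: float = 1.0


def toy_simplicity(source: str, output: str) -> float:
    """Toy proxy: relative length reduction, clipped to [0, 1]."""
    src_len = len(source.split())
    if src_len == 0:
        return 0.0
    return max(0.0, min(1.0, 1.0 - len(output.split()) / src_len))


def toy_relevance(source: str, output: str) -> float:
    """Toy proxy for meaning preservation: share of output words found in the source."""
    src_words = set(source.lower().split())
    out_words = output.lower().split()
    if not out_words:
        return 0.0
    return sum(w in src_words for w in out_words) / len(out_words)


def toy_fluency(source: str, output: str) -> float:
    """Toy proxy: penalize immediately repeated words (a real system would use a language model)."""
    words = output.lower().split()
    if len(words) < 2:
        return 1.0
    repeats = sum(a == b for a, b in zip(words, words[1:]))
    return 1.0 - repeats / (len(words) - 1)


def reward(source: str, output: str,
           weights: RewardWeights = RewardWeights(),
           scorers: Tuple[Scorer, Scorer, Scorer] = (toy_simplicity, toy_relevance, toy_fluency)) -> float:
    """Weighted sum of simplicity, relevance, and fluency scores."""
    s, r, f = (score(source, output) for score in scorers)
    return weights.simplicity * s + weights.relevance * r + weights.fluency * f


print(reward("the cat sat on the mat", "the cat sat"))
```

In a reinforcement learning setup, a score like this would serve as the scalar reward for each sampled output; swapping in stronger scorers only changes the `scorers` tuple.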

  • Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
    Empirical Methods in Natural Language Processing, 2011
    Co-Authors: Kristian Woodsend, Mirella Lapata
    Abstract:

    Text simplification aims to rewrite text into simpler versions, and thus make information accessible to a broader audience. Most previous work simplifies sentences using handcrafted rules aimed at splitting long sentences, or substitutes difficult words using a predefined dictionary. This paper presents a data-driven model based on quasi-synchronous grammar, a formalism that can naturally capture structural mismatches and complex rewrite operations. We describe how such a grammar can be induced from Wikipedia and propose an integer linear programming model for selecting the most appropriate simplification from the space of possible rewrites generated by the grammar. We show experimentally that our method creates simplifications that significantly reduce the reading difficulty of the input, while maintaining grammaticality and preserving its meaning.
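
The selection step above is solved in the paper with an integer linear program. As a stand-in, here is a minimal Python sketch that brute-forces a toy version of the same kind of choice: each hypothetical slot offers candidate rewrites (including keeping the original phrase) with an assumed simplicity score, and one candidate per slot is picked to maximize the total score under a word budget. The slot structure, scores, and budget constraint are assumptions for illustration; an ILP solver would replace the exhaustive search at realistic sizes.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Each candidate is (text, score); each slot lists alternatives for one source phrase.
Candidate = Tuple[str, float]


def select_rewrites(slots: Sequence[Sequence[Candidate]], max_words: int) -> Optional[List[str]]:
    """Pick one candidate per slot, maximizing total score subject to a word budget.

    Exhaustive search over all combinations; an ILP solver would handle
    the same objective and constraint without enumerating them.
    """
    best_choice, best_score = None, float("-inf")
    for choice in product(*slots):
        n_words = sum(len(text.split()) for text, _ in choice)
        if n_words > max_words:
            continue  # violates the length constraint
        score = sum(s for _, s in choice)
        if score > best_score:
            best_choice, best_score = choice, score
    return None if best_choice is None else [text for text, _ in best_choice]


# Hypothetical example: two slots with made-up candidates and scores.
slots = [
    [("The legislation", 0.2), ("The law", 0.8)],
    [("was subsequently ratified by parliament", 0.1),
     ("was later approved by parliament", 0.6),
     ("was approved", 0.5)],
]
print(select_rewrites(slots, max_words=5))  # ['The law', 'was approved']
```

Enumeration grows exponentially with the number of slots, which is exactly why a formulation an ILP solver can prune is preferable in practice.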

Chris Callison-Burch - One of the best experts on this subject based on the ideXlab platform.

  • Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification
    arXiv: Computation and Language, 2019
    Co-Authors: Reno Kriz, Joao Sedoc, Marianna Apidianaki, Carolina Zheng, Gaurav Kumar, Eleni Miltsakaki, Chris Callison-Burch
    Abstract:

    Sentence simplification is the task of rewriting texts so they are easier to understand. Recent research has applied sequence-to-sequence (Seq2Seq) models to this task, focusing largely on training-time improvements via reinforcement learning and memory augmentation. One of the main problems with applying generic Seq2Seq models to simplification is that these models tend to copy directly from the original sentence, resulting in outputs that are relatively long and complex. We aim to alleviate this issue through two main techniques. First, we incorporate content word complexities, as predicted with a leveled word complexity model, into our loss function during training. Second, we generate a large set of diverse candidate simplifications at test time, and rerank these to promote fluency, adequacy, and simplicity. Here, we measure simplicity through a novel sentence complexity model. These extensions allow our models to perform competitively with state-of-the-art systems while generating simpler sentences. We report standard automatic and human evaluation metrics.
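
The test-time reranking step can be pictured as scoring each candidate on several criteria and sorting by a combined score. A minimal Python sketch, assuming the fluency, adequacy, and simplicity scores were already computed by separate models; the `Candidate` fields, the linear combination, the default weights, and the example scores are hypothetical, not the paper's exact reranker.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    text: str
    fluency: float     # assumed precomputed, e.g. from a language model
    adequacy: float    # assumed precomputed, e.g. similarity to the source
    simplicity: float  # assumed precomputed, e.g. 1 - predicted sentence complexity


def rerank(candidates: List[Candidate], w_flu: float = 1.0,
           w_ade: float = 1.0, w_sim: float = 1.0) -> List[Candidate]:
    """Return candidates sorted by a weighted sum of the three scores, best first."""
    def combined(c: Candidate) -> float:
        return w_flu * c.fluency + w_ade * c.adequacy + w_sim * c.simplicity
    return sorted(candidates, key=combined, reverse=True)


cands = [
    Candidate("The committee postponed the vote.", 0.9, 0.9, 0.3),
    Candidate("The group delayed the vote.", 0.85, 0.8, 0.7),
    Candidate("Vote later.", 0.4, 0.3, 0.95),
]
print([c.text for c in rerank(cands)])
```

Raising `w_sim` pushes shorter, simpler outputs up the list at some cost in adequacy; generating a diverse candidate set beforehand, as the abstract describes, is not modeled here.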

  • Simple PPDB: A Paraphrase Database for Simplification
    Meeting of the Association for Computational Linguistics, 2016
    Co-Authors: Ellie Pavlick, Chris Callison-Burch
    Abstract:

    We release the Simple Paraphrase Database, a subset of the Paraphrase Database (PPDB) adapted for the task of text simplification. We train a supervised model to associate simplification scores with each phrase pair, producing rankings competitive with state-of-the-art lexical simplification models. Our new simplification database contains 4.5 million paraphrase rules, making it the largest available resource for lexical simplification.
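
A resource of scored paraphrase rules can drive simple lexical substitution: replace each word with its highest-scoring simpler paraphrase when that score clears a threshold. A minimal Python sketch; the in-memory `rules` dictionary, its scores, and the `threshold` parameter are hypothetical and do not reflect the actual Simple PPDB file format or scores. Real rules are phrase-level and context-sensitive, which this word-by-word sketch ignores.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical rules: word -> list of (simpler paraphrase, simplification score).
Rules = Dict[str, List[Tuple[str, float]]]


def simplify_tokens(tokens: Sequence[str], rules: Rules, threshold: float = 0.5) -> List[str]:
    """Replace each token with its best-scoring paraphrase if the score >= threshold."""
    out = []
    for tok in tokens:
        options = rules.get(tok.lower(), [])
        best = max(options, key=lambda pair: pair[1], default=None)
        if best is not None and best[1] >= threshold:
            out.append(best[0])
        else:
            out.append(tok)  # no confident simplification: keep the original word
    return out


rules: Rules = {
    "utilize": [("use", 0.95), ("employ", 0.4)],
    "commence": [("start", 0.9), ("begin", 0.85)],
    "individual": [("person", 0.45)],
}
sentence = "We utilize the tool to commence work for each individual"
print(" ".join(simplify_tokens(sentence.split(), rules)))
```

Here "individual" stays unchanged because its only candidate scores below the threshold, illustrating how the scores let a system skip low-confidence substitutions.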

  • Optimizing Statistical Machine Translation for Text Simplification
    Transactions of the Association for Computational Linguistics, 2016
    Co-Authors: Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch
    Abstract:

    Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification, taking advantage of large-scale paraphrases learned from bilingual texts and a small amount of manual simplifications with multiple references. Our work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.

  • Problems in Current Text Simplification Research: New Data Can Help
    Transactions of the Association for Computational Linguistics, 2015
    Co-Authors: Chris Callison-Burch, Courtney Napoles
    Abstract:

    Simple Wikipedia has dominated simplification research in the past 5 years. In this opinion paper, we argue that focusing on Wikipedia limits simplification research. We back up our arguments with corpus analysis and by highlighting statements that other researchers have made in the simplification literature. We introduce a new simplification dataset that is a significant improvement over Simple Wikipedia, and present a novel quantitative-comparative approach to study the quality of simplification data resources.

Horacio Saggion - One of the best experts on this subject based on the ideXlab platform.

  • Text Simplification
    The Oxford Handbook of Computational Linguistics (2nd edition), 2018
    Co-Authors: Horacio Saggion
    Abstract:

    Over the past decades, information has been made available to a broad audience thanks to the availability of texts on the Web. However, understanding the wealth of information contained in texts can pose difficulties for a number of people, including those with poor literacy, cognitive or linguistic impairment, or limited knowledge of the language of the text. Text simplification was initially conceived as a technology to simplify sentences so that they would be easier to process by natural-language processing components such as parsers. Nowadays, however, automatic text simplification is conceived as a technology to transform a text into an equivalent that is easier for a target user to read and understand. Text simplification concerns both the modification of the vocabulary of the text (lexical simplification) and the modification of the structure of its sentences (syntactic simplification). In this chapter, after briefly introducing the topic of text readability, we give an overview of past and recent methods to address these two problems. We also describe simplification applications and full systems, and outline language resources and evaluation approaches.
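
As one concrete instance of the readability measures the chapter introduces, the classic Flesch Reading Ease score for English is 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), where higher values mean easier text. A minimal Python sketch; the sentence splitter and the vowel-group syllable counter are crude heuristics assumed for illustration, so scores will differ from tools that use proper syllabification.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic: count groups of consecutive vowels (at least one per word)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


print(round(flesch_reading_ease("The cat sat on the mat."), 3))
```

A simplification system can use a score like this to check that its output reads more easily than its input, although such formulas ignore meaning and grammaticality entirely.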

  • RANLP - Automatic Text Simplification for Spanish: Comparative Evaluation of Various Simplification Strategies
    2015
    Co-Authors: Sanja Štajner, Iacer Calixto, Horacio Saggion
    Abstract:

    In this paper, we explore statistical machine translation (SMT) approaches to automatic text simplification (ATS) for Spanish. First, we compare the performance of the standard phrase-based (PB) and hierarchical (HIERO) SMT models on this specific task. In both cases, we build two models, one using the TS corpus with “light” simplifications and the other using the TS corpus with “heavy” simplifications. Next, we compare the two best systems with the state-of-the-art text simplification system for Spanish (Simplext). Our results, based on an extensive human evaluation, show that the SMT-based systems perform as well as or better than Simplext, despite the very small datasets used for training and tuning.

  • Making It Simplext: Implementation and Evaluation of a Text Simplification System for Spanish
    ACM Transactions on Accessible Computing, 2015
    Co-Authors: Horacio Saggion, Sanja Štajner, Stefan Bott, Simon Mille, Luz Rello, Biljana Drndarevic
    Abstract:

    The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts adapted to the specific needs of particular users. Most research in automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification, grounded in the analysis of a corpus manually simplified for people with special needs. We carried out an automatic evaluation of the system’s output, taking into account the interaction between three different modules dedicated to different simplification aspects. One evaluation is based on readability metrics for Spanish and shows that the system is able to reduce the lexical and syntactic complexity of the texts. We also show, by means of a human evaluation, that sentence meaning is preserved in most cases. Although ours is the first automatic text simplification system for Spanish to address different linguistic aspects, our results are comparable to the state of the art in English automatic text simplification.

  • Text Simplification Resources for Spanish
    Language Resources and Evaluation, 2014
    Co-Authors: Stefan Bott, Horacio Saggion
    Abstract:

    In this paper we present the development of a text simplification system for Spanish. Text simplification is the adaptation of a text for the special needs of certain groups of readers, such as language learners, people with cognitive difficulties, and elderly people, among others. There is a clear need for simplified texts, but manual production and adaptation of existing text is labour-intensive and costly. Automatic simplification is a field which attracts growing attention in Natural Language Processing, but, to the best of our knowledge, there are no existing simplification tools for Spanish. We present a corpus study which aims to identify the operations a text simplification system needs to carry out in order to produce an output similar to what human editors produce when they simplify news texts. We also present a first prototype for automatic simplification, which shows that the most important simplification operations can be successfully treated.

  • LREC - Text Simplification Tools for Spanish
    2012
    Co-Authors: Stefan Bott, Horacio Saggion, Simon Mille
    Abstract:

    In this paper we describe the development of a text simplification system for Spanish. Text simplification is the adaptation of a text to the special needs of certain groups of readers, such as language learners, people with cognitive difficulties and elderly people, among others. There is a clear need for simplified texts, but manual production and adaptation of existing texts is labour-intensive and costly. Automatic simplification is a field which attracts growing attention in Natural Language Processing, but, to the best of our knowledge, there are no simplification tools for Spanish. We present a prototype for automatic simplification, which shows that the most important structural simplification operations can be successfully treated with a rule-based approach that can potentially be improved by statistical methods. For the development of this prototype we carried out a corpus study which aims at identifying the operations a text simplification system needs to carry out in order to produce an output similar to what human editors produce when they simplify texts.

Wim Reddingius - One of the best experts on this subject based on the ideXlab platform.

  • Progressive Simplification of Polygonal Curves
    Computational Geometry, 2020
    Co-Authors: Kevin Buchin, Maximilian Konzack, Wim Reddingius
    Abstract:

    Simplifying polygonal curves at different levels of detail is an important problem with many applications. Existing geometric optimization algorithms are only capable of minimizing the complexity of a simplified curve for a single level of detail. We present an $O(n^3 m)$-time algorithm that takes a polygonal curve of $n$ vertices and produces a set of consistent simplifications for $m$ scales while minimizing the cumulative simplification complexity. This algorithm is compatible with distance measures such as the Hausdorff, the Fréchet and area-based distances, and enables simplification for continuous scaling in $O(n^5)$ time. To speed up this algorithm in practice, a technique is presented for efficiently constructing many so-called shortcut graphs under the Hausdorff distance, as well as a representation of the shortcut graph that enables us to find shortest paths in anticipated $O(n \log n)$ time on spatial data, improving over $O(n^2)$ time using existing algorithms. Experimental evaluation of these techniques on geospatial data shows that using shortcut graphs significantly improves progressive and non-progressive curve simplification, in terms of both running time and memory usage.

  • Progressive Simplification of Polygonal Curves
    arXiv: Computational Geometry, 2018
    Co-Authors: Kevin Buchin, Maximilian Konzack, Wim Reddingius
    Abstract:

    Simplifying polygonal curves at different levels of detail is an important problem with many applications. Existing geometric optimization algorithms are only capable of minimizing the complexity of a simplified curve for a single level of detail. We present an $O(n^3 m)$-time algorithm that takes a polygonal curve of $n$ vertices and produces a set of consistent simplifications for $m$ scales while minimizing the cumulative simplification complexity. This algorithm is compatible with distance measures such as the Hausdorff, the Fréchet and area-based distances, and enables simplification for continuous scaling in $O(n^5)$ time. To speed up this algorithm in practice, we present new techniques for constructing and representing so-called shortcut graphs. Experimental evaluation of these techniques on trajectory data shows that using shortcut graphs significantly improves progressive and non-progressive curve simplification, in terms of both running time and memory usage.
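
The shortcut graphs mentioned above come from the standard single-scale way of minimizing the number of vertices in a simplified curve: add an edge from vertex i to vertex j whenever the segment between them stays within a tolerance ε of every skipped vertex, then take a shortest path from the first vertex to the last. A minimal Python sketch of that baseline, assuming the error of a shortcut is the largest distance from a skipped vertex to the shortcut segment; it uses naive $O(n^3)$ construction plus breadth-first search, and is not the progressive $O(n^3 m)$ algorithm or the faster shortcut-graph techniques from the papers above.

```python
import math
from collections import deque
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the closed segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def shortcut_graph(curve: List[Point], eps: float) -> List[List[int]]:
    """Edge i -> j (i < j) iff every skipped vertex lies within eps of segment (i, j)."""
    n = len(curve)
    adj: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if all(point_segment_distance(curve[k], curve[i], curve[j]) <= eps
                   for k in range(i + 1, j)):
                adj[i].append(j)
    return adj


def simplify_curve(curve: List[Point], eps: float) -> List[Point]:
    """Fewest-vertex simplification: BFS shortest path from first to last vertex."""
    n = len(curve)
    if n <= 2:
        return list(curve)
    adj = shortcut_graph(curve, eps)
    parent = [-1] * n
    parent[0] = 0  # mark the start as visited
    queue = deque([0])
    while queue:
        u = queue.popleft()
        if u == n - 1:
            break
        for v in adj[u]:
            if parent[v] == -1:
                parent[v] = u
                queue.append(v)
    # Consecutive vertices are always connected, so the last vertex is reachable.
    path = [n - 1]
    while path[-1] != 0:
        path.append(parent[path[-1]])
    return [curve[i] for i in reversed(path)]


curve = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (4, 5), (5, 0)]
print(simplify_curve(curve, eps=0.5))
```

Because every edge has unit cost, breadth-first search suffices to find the path with the fewest vertices; the spike at (4, 5) survives because no shortcut around it stays within the tolerance.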

David Coeurjolly - One of the best experts on this subject based on the ideXlab platform.

  • A Generic and Parallel Algorithm for 2D Digital Curve Polygonal Approximation
    Journal of Real-Time Image Processing, 2011
    Co-Authors: Guillaume Damiand, David Coeurjolly
    Abstract:

    In this paper, we present a generic topological and geometrical framework that allows one to define and control several parallel algorithms for 2D digital curve approximation. The proposed technique is based on combinatorial map simplifications guided by geometrical criteria. We illustrate the genericity of the framework by defining three contour simplification methods: a polygonal approximation method based on an area deviation computation; a digital straight segment reconstruction method that guarantees a lossless representation; and a moment-preserving simplification method that simplifies the contours while preserving the geometrical moments of the image regions. Through a complete experimental evaluation, we demonstrate that the proposed methods can be efficiently implemented in a multi-threaded environment to simplify labeled image contours.
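
An area-deviation criterion like the one above can be illustrated sequentially, outside the combinatorial-map framework. A minimal Python sketch in the style of Visvalingam-Whyatt simplification (a swapped-in, well-known technique, not the authors' parallel method): repeatedly remove the interior vertex whose triangle with its two neighbours has the smallest area, as long as that area does not exceed a hypothetical threshold `max_area`. It handles an open polyline; a closed contour would need wrap-around neighbours.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of triangle abc via the 2D cross product."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def area_simplify(polyline: List[Point], max_area: float) -> List[Point]:
    """Drop the interior vertex with the smallest area deviation until all exceed max_area."""
    pts = list(polyline)
    while len(pts) > 2:
        # Area deviation caused by removing interior vertex i (1 <= i <= len - 2).
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1]) for i in range(1, len(pts) - 1)]
        k = min(range(len(areas)), key=areas.__getitem__)
        if areas[k] > max_area:
            break
        del pts[k + 1]  # areas[k] corresponds to vertex k + 1
    return pts


print(area_simplify([(0, 0), (1, 0.01), (2, 0), (3, 2), (4, 0)], max_area=0.1))
```

Recomputing every area on each pass makes this quadratic; a priority queue with neighbour updates is the usual speed-up, and the paper's framework instead distributes such local operations across threads.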