Text Processing


The Experts below are selected from a list of 360 Experts worldwide, ranked by the ideXlab platform.

Iryna Gurevych - One of the best experts on this subject based on the ideXlab platform.

  • Text Processing like humans do: visually attacking and shielding NLP systems
    arXiv: Computation and Language, 2019
    Co-Authors: Steffen Eger, Gözde Gül Şahin, Andreas Rückle, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych
    Abstract:

    Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios. We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual input perturbations demonstrate. We then investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. We then explore three shielding methods (visual character embeddings, adversarial training, and rule-based recovery) which substantially improve the robustness of the models. However, the shielding methods still fall behind performances achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.
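
To make the attack concrete, here is a minimal Python sketch of a character-level visual perturbation in the spirit of the abstract's "!d10t" example. The homoglyph table and the per-character replacement probability are illustrative assumptions, not the paper's actual perturbation method.

```python
import random

# Hypothetical look-alike table (mixes leet-speak digits and Cyrillic
# homoglyphs); purely illustrative.
HOMOGLYPHS = {
    "a": "а@4", "e": "е3", "i": "і1!", "o": "о0",
    "l": "1|", "s": "$5", "t": "7+",
}

def visually_perturb(text: str, p: float = 0.4, seed: int = 1) -> str:
    """Swap each character for a visual look-alike with probability p."""
    rng = random.Random(seed)
    return "".join(
        rng.choice(HOMOGLYPHS[c.lower()])
        if c.lower() in HOMOGLYPHS and rng.random() < p else c
        for c in text
    )

print(visually_perturb("idiot"))  # e.g. '!d1оt'
```

To a human reader the perturbed string is still trivially legible, which is exactly the asymmetry the paper exploits: the attack preserves meaning for people while scrambling the model's input characters.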

  • Text Processing like humans do: visually attacking and shielding NLP systems
    North American Chapter of the Association for Computational Linguistics, 2019
    Co-Authors: Steffen Eger, Gözde Gül Şahin, Andreas Rückle, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych
    Abstract:

    Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., “!d10t”) or as a writing style (“1337” in “leet speak”), among other scenarios. We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual perturbations demonstrate. We investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. We then explore three shielding methods (visual character embeddings, adversarial training, and rule-based recovery) which substantially improve the robustness of the models. However, the shielding methods still fall behind performances achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.
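
As a counterpart to the attack, here is a minimal sketch of the rule-based recovery idea named among the three shielding methods: fold visually similar characters back to a canonical ASCII form before the text reaches the model. The lookup table is an illustrative assumption; real recovery rules would be broader and context-sensitive.

```python
import unicodedata

# Hypothetical leet-speak mapping: each confusable symbol folds back
# to the letter it imitates.
CANONICAL = str.maketrans("@431!0|$57+", "aaeiiolsstt")

def recover(text: str) -> str:
    """Normalize look-alike characters back to plain ASCII letters."""
    # NFKD folds many visually confusable code points (fullwidth forms,
    # ligatures) onto base characters; drop the combining marks.
    folded = unicodedata.normalize("NFKD", text)
    folded = "".join(c for c in folded if not unicodedata.combining(c))
    return folded.translate(CANONICAL)

print(recover("!d10t"))  # 'idiot'
```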

John Richardson - One of the best experts on this subject based on the ideXlab platform.

  • SentencePiece: a simple and language-independent subword tokenizer and detokenizer for neural Text Processing
    arXiv: Computation and Language, 2018
    Co-Authors: Taku Kudo, John Richardson
    Abstract:

    This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine Translation. It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models directly from raw sentences, which allows us to make a purely end-to-end and language-independent system. We perform a validation experiment of NMT on English-Japanese machine translation, and find that it is possible to achieve comparable accuracy to direct subword training from raw sentences. We also compare the performance of subword training and segmentation with various configurations. SentencePiece is available under the Apache 2 license at https://github.com/google/sentencepiece.
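
For context, a minimal sketch of the open-source Python API: training a subword model directly on raw, untokenized text and then encoding with it. The corpus file name and vocabulary size are placeholder choices.

```python
import sentencepiece as spm

# Train directly on raw sentences -- no pre-tokenization required.
# "corpus.txt" and the vocabulary size are placeholders.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="m", vocab_size=8000)

sp = spm.SentencePieceProcessor(model_file="m.model")
pieces = sp.encode("Hello world.", out_type=str)  # subword strings
ids = sp.encode("Hello world.", out_type=int)     # vocabulary ids
print(pieces, ids)
```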

  • SentencePiece: a simple and language-independent subword tokenizer and detokenizer for neural Text Processing
    Empirical Methods in Natural Language Processing, 2018
    Co-Authors: Taku Kudo, John Richardson
    Abstract:

    This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine Translation. It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models directly from raw sentences, which allows us to make a purely end-to-end and language-independent system. We perform a validation experiment of NMT on English-Japanese machine translation, and find that it is possible to achieve comparable accuracy to direct subword training from raw sentences. We also compare the performance of subword training and segmentation with various configurations. SentencePiece is available under the Apache 2 license at https://github.com/google/sentencepiece.
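
The "detokenizer" half of the claim is worth illustrating: SentencePiece encodes whitespace as an ordinary symbol (the meta symbol '▁'), so decoding is a lossless inverse of encoding, with no language-specific detokenization rules. A small sketch, reusing the placeholder model trained above:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="m.model")  # placeholder model

text = "Hello world."
pieces = sp.encode(text, out_type=str)  # e.g. ['▁Hello', '▁world', '.']
# Whitespace survives as the '▁' meta symbol, so decoding
# reconstructs the original string exactly.
assert sp.decode(pieces) == text
```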

Tetsuya Nasukawa - One of the best experts on this subject based on the ideXlab platform.

  • Full-Text Processing: improving a practical NLP system based on surface information within the context
    International Conference on Computational Linguistics, 1996
    Co-Authors: Tetsuya Nasukawa
    Abstract:

    Rich information for resolving ambiguities in sentence analysis, including various context-dependent problems, can be obtained by analyzing a simple set of parsed trees of each sentence in a text, without constructing a precise model of the context through deep semantic analysis. Thus, processing a group of sentences together makes it possible to improve the accuracy of a practical natural language processing (NLP) system such as a machine translation system. In this paper, we describe a simple context model consisting of parsed trees of each sentence in a text, and its effectiveness for handling various problems in NLP such as the resolution of structural ambiguities, pronoun referents, and the focus of focusing subjects (e.g., "also" and "only"), as well as for adding supplementary phrases to some elliptical sentences.
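
A minimal sketch of the kind of lightweight context model described here: retain the parse of every sentence processed so far and resolve a pronoun by scanning recent parses for a surface-compatible candidate. The data structures and the agreement test are illustrative assumptions, not the paper's actual representation.

```python
from dataclasses import dataclass, field

@dataclass
class Noun:
    text: str
    number: str  # "sg" or "pl"

@dataclass
class ContextModel:
    # Parsed trees of each sentence seen so far, newest last; each
    # "parse" is reduced to its noun phrases for brevity.
    parses: list = field(default_factory=list)

    def add_sentence(self, nouns):
        self.parses.append(nouns)

    def resolve_pronoun(self, number):
        # Prefer the most recent noun phrase whose surface features agree.
        for nouns in reversed(self.parses):
            for noun in reversed(nouns):
                if noun.number == number:
                    return noun
        return None

ctx = ContextModel()
ctx.add_sentence([Noun("the printers", "pl"), Noun("the manual", "sg")])
ctx.add_sentence([Noun("the operator", "sg")])
print(ctx.resolve_pronoun("sg").text)  # -> the operator
print(ctx.resolve_pronoun("pl").text)  # -> the printers
```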

Steffen Eger - One of the best experts on this subject based on the ideXlab platform.

  • Text Processing like humans do: visually attacking and shielding NLP systems
    arXiv: Computation and Language, 2019
    Co-Authors: Steffen Eger, Gözde Gül Şahin, Andreas Rückle, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych
    Abstract:

    Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios. We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual input perturbations demonstrate. We then investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. We then explore three shielding methods (visual character embeddings, adversarial training, and rule-based recovery) which substantially improve the robustness of the models. However, the shielding methods still fall behind performances achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.

  • Text Processing like humans do: visually attacking and shielding NLP systems
    North American Chapter of the Association for Computational Linguistics, 2019
    Co-Authors: Steffen Eger, Gözde Gül Şahin, Andreas Rückle, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych
    Abstract:

    Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., “!d10t”) or as a writing style (“1337” in “leet speak”), among other scenarios. We consider this as a new type of adversarial attack in NLP, a setting to which humans are very robust, as our experiments with both simple and more difficult visual perturbations demonstrate. We investigate the impact of visual adversarial attacks on current NLP systems on character-, word-, and sentence-level tasks, showing that both neural and non-neural models are, in contrast to humans, extremely sensitive to such attacks, suffering performance decreases of up to 82%. We then explore three shielding methods (visual character embeddings, adversarial training, and rule-based recovery) which substantially improve the robustness of the models. However, the shielding methods still fall behind performances achieved in non-attack scenarios, which demonstrates the difficulty of dealing with visual attacks.

Gobinda Chowdhury - One of the best experts on this subject based on the ideXlab platform.

  • Natural Language Processing
    Language, 2003
    Co-Authors: Gobinda G. Chowdhury
    Abstract:

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems, such as text summarization, information extraction, and information retrieval, including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.