Language Script

The experts below are selected from a list of 198 experts worldwide, ranked by the ideXlab platform.

Johannes Ettl - One of the best experts on this subject based on the ideXlab platform.

  • Computerized patient identification for the EMBRACA clinical trial using real-time data from the PRAEGNANT network for metastatic breast cancer patients
    Breast Cancer Research and Treatment, 2016
    Co-Authors: Alexander Hein, Paul Gass, Christina Barbara Walter, Florin-andrei Taran, Andreas Hartkopf, Friedrich Overkamp, Hans-christian Kolberg, Peyman Hadji, Hans Tesch, Johannes Ettl
    Abstract:

    As breast cancer is a diverse disease, clinical trials are becoming increasingly diversified and are consequently being conducted in very small subgroups of patients, making study recruitment increasingly difficult. The aim of this study was to assess the use of data from a remote data entry system that serves a large national registry for metastatic breast cancer. The PRAEGNANT network is a real-time registry with an integrated biomaterials bank that was designed as a scientific study and as a means of identifying patients who are eligible for clinical trials, based on clinical and molecular information. Here, we report on the automated use of documented clinical data to identify patients with metastatic breast cancer for a clinical trial (EMBRACA). The patients’ charts were assessed by two independent physicians involved in the clinical trial and also by a computer program that tested patients for eligibility using a structured query language (SQL) script. In all, 326 patients from two study sites in the PRAEGNANT network were included in the analysis. On expert assessment, 120 of the 326 patients (37%) appeared to be eligible for inclusion in the EMBRACA study; on computer algorithm assessment, 129 appeared to be eligible. The sensitivity of the computer algorithm was 0.87 and its specificity was 0.88. Computer-based identification of patients for clinical trials therefore appears feasible. Given the instrument’s high specificity, it could be applied to a large cohort of patients while keeping the workload for reassessing patients limited.
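The screening step described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the table layout, column names, and eligibility criteria below are hypothetical and are not the PRAEGNANT data model or the EMBRACA protocol. The sketch runs a structured query to flag eligible patients, then scores those flags against an expert reference set using sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP).

```python
import sqlite3

# Hypothetical schema, for illustration only (not the PRAEGNANT data model).
SCHEMA = """
CREATE TABLE patients (
    id INTEGER PRIMARY KEY,
    her2_status TEXT,          -- 'positive' or 'negative'
    gbrca_mutation INTEGER,    -- 1 if a germline BRCA1/2 mutation is documented
    prior_chemo_lines INTEGER  -- cytotoxic lines given for advanced disease
);
"""

# Hypothetical eligibility rules, not the actual EMBRACA criteria.
ELIGIBILITY_QUERY = """
SELECT id FROM patients
WHERE her2_status = 'negative'
  AND gbrca_mutation = 1
  AND prior_chemo_lines <= 3;
"""


def screen(rows):
    """Load patient rows into an in-memory database and return the ids the query flags."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", rows)
    eligible = {pid for (pid,) in conn.execute(ELIGIBILITY_QUERY)}
    conn.close()
    return eligible


def sensitivity_specificity(predicted, reference, population):
    """Compare algorithm flags with expert flags over the whole assessed population."""
    tp = len(predicted & reference)
    fn = len(reference - predicted)
    fp = len(predicted - reference)
    tn = len(population - predicted - reference)
    return tp / (tp + fn), tn / (tn + fp)
```

For example, with five hypothetical records where the query flags patients 1 and 5 and the experts flag patients 1 and 2, the sketch reports a sensitivity of 0.5 and a specificity of 2/3.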

Umapada Pal - One of the best experts on this subject based on the ideXlab platform.

  • Language, Script, and Font Recognition
    Handbook of Document Image Processing and Recognition, 2014
    Co-Authors: Umapada Pal, Niladri Sekhar Dash
    Abstract:

    Automatic identification of a language within a text document containing multiple scripts and fonts is a challenging task, as it is linked not only with the shape, size, and style of the characters and symbols used to form the text but also with more crucial factors such as the form and size of pages, the layout of the written text, the spacing between text lines, the design of characters, the density of information, and the directionality of text composition. Therefore, successfully managing the various types of information involved in character, script, and language recognition requires an intelligent system that can deal elegantly with all these factors, along with secondary factors such as language identity, writing system, ethnicity, and anthropology. Owing to these complexities, identifying a script vis-à-vis a language has been a real challenge in optical character recognition (OCR) and information retrieval technology. Given the global upsurge of so-called minor and/or unknown languages, it has become a technological challenge to develop automatic or semiautomatic systems that can identify the language, vis-à-vis the script, in which a particular text document is composed. Bearing these issues in mind, this chapter addresses some of the methods and approaches developed so far for language, script, and font recognition in written text documents. The first section, after presenting a general overview of language, deals with the origin of language, the difficulties faced in language identification, and the existing approaches to language identification.
    The second section presents an overview of script, differentiates between single- and multiscript documents, describes script identification technologies and the challenges involved therein, focuses on the process of machine-printed script identification, and then addresses the issues involved in handwritten script identification. The third section defines font terminology, addresses the problems involved in font generation, discusses the phenomenon of font variation within a language, and presents strategies for font and style recognition. Thus, the chapter offers a panoramic view of these three basic components of OCR technology, covering the problems and issues involved, the milestones achieved so far, and the challenges that still lie ahead.

  • Piece-wise painting technique for line segmentation of unconstrained handwritten text: a specific study with Persian text documents
    Pattern Analysis and Applications, 2011
    Co-Authors: Alireza Alaei, P. Nagabhushan, Umapada Pal
    Abstract:

    Accurate line segmentation is the most important and difficult task in text document analysis, particularly when the document is composed of unconstrained handwritten text. To this end, this work proposes a painting scheme. Because handwritten Persian text poses the most critical challenges for text-line segmentation, the method was devised through an extensive study of cursive Persian script; in general, however, the proposed line segmentation algorithm is applicable to handwritten text in any language or script. The text block is vertically decomposed into parallel pipe structures called strips. Each row of each strip is painted with the average gray value of all pixels in that row of the strip. The painted strips are then converted into a two-tone image, which is smoothed. The white and black spaces in each strip of the smoothed image are analyzed to obtain a short separating line, termed a Piece-wise Potential Separating Line (PPSL), between two consecutive black spaces. The PPSLs are concatenated to produce the segmentation of text lines. Additional procedures handle certain anomalies that may occur. The scheme is validated by extensive experimentation: tested on 52 pages of Persian text documents containing a total of 823 lines, the algorithm achieved a correct line segmentation rate of 92.35%. The algorithm was also tested on two further datasets of 152 and 200 handwritten text pages in different languages. Comparison with various approaches from the recent literature demonstrated the efficiency and script independence of the proposed algorithm.

  • OCR error detection and correction of an inflectional Indian language script
    International Conference on Pattern Recognition, 1996
    Co-Authors: Bidyut B. Chaudhuri, Umapada Pal
    Abstract:

    This paper deals with an OCR error detection and correction technique for the script of a highly inflectional language such as Bangla (a major Indian language). This is the first report of its kind. Using two separate lexicons of root words and suffixes, candidate root-suffix pairs of each input word are detected, their grammatical agreement is tested, and the root or suffix part in which the error has occurred is noted. The correction is made on the corresponding erroneous part of the input string by a fast dictionary access technique. To do so, alternative strings are generated for an erroneous word. Among the alternative strings, those that satisfy root-suffix grammatical agreement and have the smallest Levenshtein-Damerau distance are chosen as the correct ones. The system has an accuracy of 75.61%.
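The painting-based line segmentation summarized in the Alaei, Nagabhushan and Pal abstract can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: it assumes a grayscale image given as a list of rows (0 = black, 255 = white), uses a fixed two-tone threshold of 128 (an assumption), omits the smoothing step and the anomaly-handling procedures, and reports each piece-wise separator as the middle row of a white gap between two black runs within a strip.

```python
from statistics import mean


def paint_strips(image, n_strips):
    """Split a grayscale image (list of rows, 0 = black, 255 = white) into at
    most n_strips vertical strips and paint every row of a strip with the mean
    gray value of that row segment. Returns one painted column per strip."""
    width = len(image[0])
    strip_w = -(-width // n_strips)  # ceiling division so no column is dropped
    painted = []
    for lo in range(0, width, strip_w):
        hi = min(lo + strip_w, width)
        painted.append([mean(row[lo:hi]) for row in image])
    return painted


def separating_rows(column, threshold=128):
    """Two-tone a painted strip (below threshold = black) and return the
    middle row of every white run lying between two black runs."""
    black = [value < threshold for value in column]
    seps, r, seen_black = [], 0, False
    while r < len(black):
        if black[r]:
            seen_black = True
            r += 1
            continue
        start = r
        while r < len(black) and not black[r]:
            r += 1
        if seen_black and r < len(black):  # white gap bounded above and below
            seps.append((start + r - 1) // 2)
    return seps


def segment_lines(image, n_strips=4, threshold=128):
    """Separator rows per strip; linking them across neighbouring strips
    approximates the piece-wise separating lines."""
    return [separating_rows(col, threshold) for col in paint_strips(image, n_strips)]
```

On an 8-row, 4-column synthetic image with ink in rows 1 to 2 and row 6, segment_lines(image, n_strips=2) places one separator at row 4 in each strip; a real implementation would also smooth the two-tone strips and join separators whose rows differ between neighbouring strips.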
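The correction step in the Chaudhuri and Pal abstract can be illustrated with a toy example. The lexicons, transliterated forms, and part-of-speech tags below are hypothetical stand-ins for the paper's Bangla root and suffix lexicons and its grammatical-agreement test. The sketch also simplifies the method: it brute-forces every agreeing root-suffix form instead of using the paper's fast dictionary access, and it skips locating the erroneous root or suffix part. The distance used is the restricted Damerau-Levenshtein variant (optimal string alignment), which counts insertions, deletions, substitutions, and adjacent transpositions.

```python
# Hypothetical toy lexicons (Latin transliteration); the tag stands in for
# the paper's grammatical-agreement test between root and suffix.
ROOTS = {"kor": "verb", "bol": "verb", "ghar": "noun"}
SUFFIXES = {"chhi": "verb", "be": "verb", "er": "noun"}


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    insertions, deletions, substitutions and adjacent transpositions cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def agreeing_forms() -> list[str]:
    """All root+suffix strings whose grammatical tags agree."""
    return [r + s for r, rt in ROOTS.items() for s, st in SUFFIXES.items() if rt == st]


def correct(word: str) -> list[str]:
    """Return the agreeing forms closest to word; a valid word maps to itself."""
    forms = agreeing_forms()
    best = min(osa_distance(word, f) for f in forms)
    return sorted(f for f in forms if osa_distance(word, f) == best)
```

Here the misrecognized form kobre (an adjacent transposition of r and b) maps back to korbe at distance 1, while a form already in the toy lexicon, such as bolbe, is returned unchanged.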

John Plaice - One of the best experts on this subject based on the ideXlab platform.

  • A multidimensional approach to typesetting
    2003
    Co-Authors: John Plaice, Yannis Haralambous, Paul Swoboda, C. A. Rowley
    Abstract:

    We propose to create a new model for multilingual computerized typesetting, in which each of language, script, font, and character is treated as a multidimensional entity, and all combine to form a multidimensional context. Typesetting is undertaken in a typographical space and becomes a multiple-stage process of preparing the input stream for typesetting, segmenting the stream into clusters or words, typesetting these clusters, and then recombining them. Each of the stages, including its respective algorithms, depends on the multidimensional context. This approach will support quality typesetting for a number of modern and ancient scripts. The paper and talk will show how these are to be implemented in .

  • An extensible approach to high-quality multilingual typesetting
    Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation, 2003
    Co-Authors: John Plaice, Yannis Haralambous, Craig Rowley
    Abstract:

    We propose to create and study a new model for the micro-typography part of automated multilingual typesetting. This new model will support quality typesetting for a number of modern and ancient scripts. The major innovation in the proposal is that the process is refined into four phases, each dependent on a multidimensional tree-structured context summarizing the current linguistic and cultural environment. The four phases are: preparing the input stream for typesetting; segmenting the stream into clusters (words); typesetting these clusters; and then recombining the clusters into a typeset text stream. The context is pervasive throughout the process; the algorithms used in each phase are context-dependent, as are the meanings of fundamental entities such as language, script, font, and character.
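The four-phase pipeline described above can be sketched as a chain of functions that each receive the context. This is a hypothetical illustration, not the authors' design: the flat Context dataclass stands in for their tree-structured multidimensional context, script labels are ISO 15924 codes, and the set of spaceless scripts and the per-character advance widths are placeholder assumptions. What the sketch shows is the central idea that every phase selects its behavior from the context.

```python
import unicodedata
from dataclasses import dataclass

# Placeholder assumptions for illustration only.
SPACELESS_SCRIPTS = {"Thai", "Hani"}       # scripts written without inter-word spaces
ADVANCE_EM = {"serif": 0.5, "mono": 0.6}   # crude per-character advance widths in ems


@dataclass(frozen=True)
class Context:
    """A flat stand-in for the paper's tree-structured multidimensional context."""
    language: str
    script: str
    font: str


def prepare(text: str, ctx: Context) -> str:
    # Phase 1: prepare the input stream (here only Unicode NFC normalization).
    return unicodedata.normalize("NFC", text)


def segment(stream: str, ctx: Context) -> list[str]:
    # Phase 2: split into clusters; spaceless scripts fall back to single
    # characters, a placeholder for real dictionary-based word breaking.
    return list(stream) if ctx.script in SPACELESS_SCRIPTS else stream.split()


def typeset_cluster(cluster: str, ctx: Context) -> tuple[str, float]:
    # Phase 3: give each cluster a width in ems chosen by the font dimension.
    return cluster, len(cluster) * ADVANCE_EM.get(ctx.font, 0.5)


def recombine(boxes: list[tuple[str, float]], ctx: Context) -> tuple[str, float]:
    # Phase 4: rejoin the boxes, inserting a space only where the script uses
    # one; the returned width excludes inter-word spaces.
    sep = "" if ctx.script in SPACELESS_SCRIPTS else " "
    return sep.join(c for c, _ in boxes), sum(w for _, w in boxes)


def typeset(text: str, ctx: Context) -> tuple[str, float]:
    stream = prepare(text, ctx)
    boxes = [typeset_cluster(c, ctx) for c in segment(stream, ctx)]
    return recombine(boxes, ctx)
```

For instance, typeset("hello world", Context("en", "Latn", "serif")) splits on the space and returns the rejoined text with a width of 5.0 ems, whereas a Context with script "Thai" treats each character as its own cluster and rejoins them without spaces.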

C. A. Rowley - One of the best experts on this subject based on the ideXlab platform.

  • A multidimensional approach to typesetting
    2003
    Co-Authors: John Plaice, Yannis Haralambous, Paul Swoboda, C. A. Rowley
    Abstract:

    We propose to create a new model for multilingual computerized typesetting, in which each of language, script, font, and character is treated as a multidimensional entity, and all combine to form a multidimensional context. Typesetting is undertaken in a typographical space and becomes a multiple-stage process of preparing the input stream for typesetting, segmenting the stream into clusters or words, typesetting these clusters, and then recombining them. Each of the stages, including its respective algorithms, depends on the multidimensional context. This approach will support quality typesetting for a number of modern and ancient scripts. The paper and talk will show how these are to be implemented in .

  • RIDE - An extensible approach to high-quality multilingual typesetting
    Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation, 1
    Co-Authors: John Plaice, Yannis Haralambous, C. A. Rowley
    Abstract:

    We propose to create and study a new model for the micro-typography part of automated multilingual typesetting. This new model will support quality typesetting for a number of modern and ancient scripts. The major innovation in the proposal is that the process is refined into four phases, each dependent on a multidimensional tree-structured context summarizing the current linguistic and cultural environment. The four phases are: preparing the input stream for typesetting; segmenting the stream into clusters (words); typesetting these clusters; and then recombining the clusters into a typeset text stream. The context is pervasive throughout the process; the algorithms used in each phase are context-dependent, as are the meanings of fundamental entities such as language, script, font, and character.

Bidyut B. Chaudhuri - One of the best experts on this subject based on the ideXlab platform.

  • ICPR - OCR error detection and correction of an inflectional Indian language script
    Proceedings of 13th International Conference on Pattern Recognition, 1996
    Co-Authors: Bidyut B. Chaudhuri
    Abstract:

    This paper deals with an OCR error detection and correction technique for the script of a highly inflectional language such as Bangla (a major Indian language). This is the first report of its kind. Using two separate lexicons of root words and suffixes, candidate root-suffix pairs of each input word are detected, their grammatical agreement is tested, and the root or suffix part in which the error has occurred is noted. The correction is made on the corresponding erroneous part of the input string by a fast dictionary access technique. To do so, alternative strings are generated for an erroneous word. Among the alternative strings, those that satisfy root-suffix grammatical agreement and have the smallest Levenshtein-Damerau distance are chosen as the correct ones. The system has an accuracy of 75.61%.
