Mathematical Formula

The Experts below are selected from a list of 28986 Experts worldwide ranked by ideXlab platform

Volker Sorge - One of the best experts on this subject based on the ideXlab platform.

  • Mathematical Formula identification and performance evaluation in pdf documents
    International Journal on Document Analysis and Recognition, 2014
    Co-Authors: Xiaoyan Lin, Zhi Tang, Liangcai Gao, Josef B Baker, Volker Sorge
    Abstract:

    An important initial step of Mathematical Formula recognition is to correctly identify the location of Formulae within documents. Previous work in this area has traditionally focused on image-based documents; however, given the prevalence and popularity of the PDF format for dissemination, alternatives to image-based approaches are increasingly being explored. In this paper, we investigate the use of both machine learning techniques and heuristic rules to locate the boundaries of both isolated and embedded Formulae within documents, based upon data extracted directly from PDF files. We propose four new features along with preprocessing and post-processing techniques for isolated Formula identification. Furthermore, we compare, analyse and extensively tune nine state-of-the-art learning algorithms for a comprehensive evaluation of our proposed methods. The evaluation is carried out over a ground-truth dataset, which we have made publicly available, together with an application adaptable fine-grained evaluation metric. Our experimental results demonstrate that the overall accuracies of isolated and embedded Formula identification are increased by 11.52 and 10.65 %, compared with our previously proposed Formula identification approach.

  • a text line detection method for Mathematical Formula recognition
    International Conference on Document Analysis and Recognition, 2013
    Co-Authors: Xiaoyan Lin, Zhi Tang, Liangcai Gao, Josef B Baker, Mohamed Alkalai, Volker Sorge
    Abstract:

    Text line detection is a prerequisite procedure of Mathematical Formula recognition, however, many incorrectly segmented text lines are often produced due to the two-dimensional structures of mathematics when using existing segmentation methods such as Projection Profiles Cutting or white space analysis. In consequence, Mathematical Formula recognition is adversely affected by these incorrectly detected text lines, with errors propagating through further processes. Aimed at Mathematical Formula recognition, we propose a text line detection method to produce reliable line segmentation. Based on the results produced by PPC, a learning based merging strategy is presented to combine incorrectly split text lines. In the merging strategy, the features of layout and text for a text line and those between successive lines are utilised to detect the incorrectly split text lines. Experimental results show that the proposed approach obtains good performance in detecting text lines from Mathematical documents. Furthermore, the error rate in Mathematical Formula identification is reduced significantly through adopting the proposed text line detection method.
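    The failure mode described in this abstract can be illustrated with a minimal sketch of projection profile cutting (PPC). This is not the authors' implementation: the binary-image representation and the names `horizontal_profile`, `cut_lines` and `min_gap` are assumptions made for illustration, and the learning-based merging step is not shown.

```python
from typing import List, Tuple


def horizontal_profile(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def cut_lines(image: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split an image into text lines wherever at least `min_gap`
    consecutive blank rows occur. Returns (top, bottom) row ranges,
    with `bottom` exclusive."""
    profile = horizontal_profile(image)
    lines: List[Tuple[int, int]] = []
    start = None  # first row of the line currently being collected
    gap = 0       # number of blank rows seen since the last ink row
    for y, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = y
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Close the line just after its last ink row.
                lines.append((start, y - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close a line that runs to the end, dropping trailing blank rows.
        lines.append((start, len(profile) - gap))
    return lines


# A toy "fraction": numerator, fraction bar and denominator,
# each separated by one blank row.
fraction = [
    [0, 1, 1, 0],  # numerator
    [0, 0, 0, 0],
    [1, 1, 1, 1],  # fraction bar
    [0, 0, 0, 0],
    [0, 1, 1, 0],  # denominator
]

split = cut_lines(fraction, min_gap=1)   # the fraction is cut into three "lines"
merged = cut_lines(fraction, min_gap=2)  # a larger gap threshold keeps it whole
```

    With `min_gap=1` the single formula is cut into three separate lines, which is the kind of incorrect split the abstract refers to. Raising a fixed threshold only helps in this toy case; the paper instead proposes learning when successive lines should be merged.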

  • faithful Mathematical Formula recognition from pdf documents
    Document Analysis Systems, 2010
    Co-Authors: Josef B Baker, Alan P Sexton, Volker Sorge
    Abstract:

    We present an approach to extracting Mathematical Formulae directly from PDF documents. We exploit both the perfect character information as well as additional font and spacing information available from a PDF document to ensure a faithful recognition of Mathematical expressions. The extracted information can be post-processed to produce suitable markup that can be re-inserted into the PDF documents in order to enable the handling of Mathematical Formulae by accessibility technology. Furthermore, we demonstrate how we recognise different types of Mathematical objects, such as relations, operators, etc., without reference to predefined knowledge or dictionary lookup, using character clustering and interspace and character font information alone, all of which contributes to our goal of reconstructing the intended semantics of a Formula from its presentation.

  • a linear grammar approach to Mathematical Formula recognition from pdf
    Calculemus '09 MKM '09 Proceedings of the 16th Symposium 8th International Conference. Held as Part of CICM '09 on Intelligent Computer Mathematics, 2009
    Co-Authors: Josef B Baker, Alan P Sexton, Volker Sorge
    Abstract:

    Many approaches have been proposed over the years for the recognition of Mathematical Formulae from scanned documents. More recently a need has arisen to recognise Formulae from PDF documents. Here we can avoid ambiguities introduced by traditional OCR approaches and instead extract perfect knowledge of the characters used in Formulae directly from the document. This can be exploited by Formula recognition techniques to achieve correct results and high performance. In this paper we revisit an old grammatical approach to Formula recognition, that of Anderson from 1968, and assess its applicability with respect to data extracted from PDF documents. We identify some problems of the original method when applied to common Mathematical expressions and show how they can be overcome. The simplicity of the original method leads to a very efficient recognition technique that not only is very simple to implement but also yields results of high accuracy for the recognition of Mathematical Formulae from PDF documents.

Xiaoyan Lin - One of the best experts on this subject based on the ideXlab platform.

  • Mathematical Formula identification and performance evaluation in pdf documents
    International Journal on Document Analysis and Recognition, 2014
    Co-Authors: Xiaoyan Lin, Zhi Tang, Liangcai Gao, Josef B Baker, Volker Sorge
    Abstract:

    An important initial step of Mathematical Formula recognition is to correctly identify the location of Formulae within documents. Previous work in this area has traditionally focused on image-based documents; however, given the prevalence and popularity of the PDF format for dissemination, alternatives to image-based approaches are increasingly being explored. In this paper, we investigate the use of both machine learning techniques and heuristic rules to locate the boundaries of both isolated and embedded Formulae within documents, based upon data extracted directly from PDF files. We propose four new features along with preprocessing and post-processing techniques for isolated Formula identification. Furthermore, we compare, analyse and extensively tune nine state-of-the-art learning algorithms for a comprehensive evaluation of our proposed methods. The evaluation is carried out over a ground-truth dataset, which we have made publicly available, together with an application adaptable fine-grained evaluation metric. Our experimental results demonstrate that the overall accuracies of isolated and embedded Formula identification are increased by 11.52 and 10.65 %, compared with our previously proposed Formula identification approach.

  • a text line detection method for Mathematical Formula recognition
    International Conference on Document Analysis and Recognition, 2013
    Co-Authors: Xiaoyan Lin, Zhi Tang, Liangcai Gao, Josef B Baker, Mohamed Alkalai, Volker Sorge
    Abstract:

    Text line detection is a prerequisite procedure of Mathematical Formula recognition, however, many incorrectly segmented text lines are often produced due to the two-dimensional structures of mathematics when using existing segmentation methods such as Projection Profiles Cutting or white space analysis. In consequence, Mathematical Formula recognition is adversely affected by these incorrectly detected text lines, with errors propagating through further processes. Aimed at Mathematical Formula recognition, we propose a text line detection method to produce reliable line segmentation. Based on the results produced by PPC, a learning based merging strategy is presented to combine incorrectly split text lines. In the merging strategy, the features of layout and text for a text line and those between successive lines are utilised to detect the incorrectly split text lines. Experimental results show that the proposed approach obtains good performance in detecting text lines from Mathematical documents. Furthermore, the error rate in Mathematical Formula identification is reduced significantly through adopting the proposed text line detection method.

  • Document Analysis Systems - Performance Evaluation of Mathematical Formula Identification
    2012 10th IAPR International Workshop on Document Analysis Systems, 2012
    Co-Authors: Xiaoyan Lin, Zhi Tang, Liangcai Gao, Xiaofan Lin
    Abstract:

    This paper presents a performance evaluation system for Mathematical Formula identification. First, a ground-truth dataset is constructed to facilitate the performance comparison of different Mathematical Formula identification algorithms. Statistics analysis of the dataset shows the diversities of the dataset to reflect the real-world documents. Second, a performance evaluation metric for Mathematical Formula identification is proposed, including the error type definitions and the scenario-adjustable scoring. The proposed metric enables in-depth analysis of Mathematical Formula identification systems in different scenarios. Finally, based on the proposed evaluation metric, a tool is developed to automatically evaluate Mathematical Formula identification results. It is worth noting that the ground-truth dataset and the evaluation tool are freely available for academic purpose.

  • Mathematical Formula identification in pdf documents
    International Conference on Document Analysis and Recognition, 2011
    Co-Authors: Xiaoyan Lin, Zhi Tang, Liangcai Gao, Xiaofan Lin
    Abstract:

    Recognizing Mathematical expressions in PDF documents is a new and important field in document analysis. It is quite different from extracting Mathematical expressions in image-based documents. In this paper, we propose a novel method by combining rule-based and learning-based methods to detect both isolated and embedded Mathematical expressions in PDF documents. Moreover, various features of Formulas, including geometric layout, character and context content, are used to adapt to a wide range of Formula types. Experimental results show satisfactory performance of the proposed method. Furthermore, the method has been successfully incorporated into a commercial software package for large-scale Chinese e-Book production.

Ko Ginbayashi - One of the best experts on this subject based on the ideXlab platform.

  • A Mathematical Formula to calculate the theoretical range of motion for total hip replacement.
    Journal of biomechanics, 2002
    Co-Authors: Fumihiro Yoshimine, Ko Ginbayashi
    Abstract:

    The reduced range of motion (ROM) resulting from total hip replacement (THR) leads to frequent prosthetic impingement, which may restrict activities of daily living and cause subluxation and dislocation. Therefore, knowing the ROM of THR is very important in clinical situations and in the design of prostheses. THR involves a pure ball and socket joint. We created a Mathematical Formula to calculate the theoretical ROM of THR limited by the prosthetic impingement. The ROM of THR is governed by the following five factors: (1) the prosthetic ROM (oscillation angle: obtained from company data), (2) cup abduction, (3) cup anterior opening, (4) the angle of the femoral neck component from the horizontal plane, and (5) the femoral neck anteversion. The last four factors can be obtained from anterior–posterior and axial X-rays and CT of the patient’s THR. The objective was to create Mathematical Formulas that could accurately and quickly calculate the ROM of THR. By entering the five values into a computer programmed with the Formulas, one could obtain the ROM for the THR. This reveals the effect on ROM of the oscillation angle and the interaction of ROM with cup abduction, anterior opening and neck anteversion. Furthermore, this would readily enable a clinical evaluation of the possibility of postoperative dislocation and help in postoperative rehabilitation. The calculated numerical values of ROM by these Mathematical Formulas were successfully compared with the ROMs obtained from 3-dimensional computer graphics (3D-CG).

Josef B Baker - One of the best experts on this subject based on the ideXlab platform.

  • Mathematical Formula identification and performance evaluation in pdf documents
    International Journal on Document Analysis and Recognition, 2014
    Co-Authors: Xiaoyan Lin, Zhi Tang, Liangcai Gao, Josef B Baker, Volker Sorge
    Abstract:

    An important initial step of Mathematical Formula recognition is to correctly identify the location of Formulae within documents. Previous work in this area has traditionally focused on image-based documents; however, given the prevalence and popularity of the PDF format for dissemination, alternatives to image-based approaches are increasingly being explored. In this paper, we investigate the use of both machine learning techniques and heuristic rules to locate the boundaries of both isolated and embedded Formulae within documents, based upon data extracted directly from PDF files. We propose four new features along with preprocessing and post-processing techniques for isolated Formula identification. Furthermore, we compare, analyse and extensively tune nine state-of-the-art learning algorithms for a comprehensive evaluation of our proposed methods. The evaluation is carried out over a ground-truth dataset, which we have made publicly available, together with an application adaptable fine-grained evaluation metric. Our experimental results demonstrate that the overall accuracies of isolated and embedded Formula identification are increased by 11.52 and 10.65 %, compared with our previously proposed Formula identification approach.

  • a text line detection method for Mathematical Formula recognition
    International Conference on Document Analysis and Recognition, 2013
    Co-Authors: Xiaoyan Lin, Zhi Tang, Liangcai Gao, Josef B Baker, Mohamed Alkalai, Volker Sorge
    Abstract:

    Text line detection is a prerequisite procedure of Mathematical Formula recognition, however, many incorrectly segmented text lines are often produced due to the two-dimensional structures of mathematics when using existing segmentation methods such as Projection Profiles Cutting or white space analysis. In consequence, Mathematical Formula recognition is adversely affected by these incorrectly detected text lines, with errors propagating through further processes. Aimed at Mathematical Formula recognition, we propose a text line detection method to produce reliable line segmentation. Based on the results produced by PPC, a learning based merging strategy is presented to combine incorrectly split text lines. In the merging strategy, the features of layout and text for a text line and those between successive lines are utilised to detect the incorrectly split text lines. Experimental results show that the proposed approach obtains good performance in detecting text lines from Mathematical documents. Furthermore, the error rate in Mathematical Formula identification is reduced significantly through adopting the proposed text line detection method.

  • faithful Mathematical Formula recognition from pdf documents
    Document Analysis Systems, 2010
    Co-Authors: Josef B Baker, Alan P Sexton, Volker Sorge
    Abstract:

    We present an approach to extracting Mathematical Formulae directly from PDF documents. We exploit both the perfect character information as well as additional font and spacing information available from a PDF document to ensure a faithful recognition of Mathematical expressions. The extracted information can be post-processed to produce suitable markup that can be re-inserted into the PDF documents in order to enable the handling of Mathematical Formulae by accessibility technology. Furthermore, we demonstrate how we recognise different types of Mathematical objects, such as relations, operators, etc., without reference to predefined knowledge or dictionary lookup, using character clustering and interspace and character font information alone, all of which contributes to our goal of reconstructing the intended semantics of a Formula from its presentation.

  • a linear grammar approach to Mathematical Formula recognition from pdf
    Calculemus '09 MKM '09 Proceedings of the 16th Symposium 8th International Conference. Held as Part of CICM '09 on Intelligent Computer Mathematics, 2009
    Co-Authors: Josef B Baker, Alan P Sexton, Volker Sorge
    Abstract:

    Many approaches have been proposed over the years for the recognition of Mathematical Formulae from scanned documents. More recently a need has arisen to recognise Formulae from PDF documents. Here we can avoid ambiguities introduced by traditional OCR approaches and instead extract perfect knowledge of the characters used in Formulae directly from the document. This can be exploited by Formula recognition techniques to achieve correct results and high performance. In this paper we revisit an old grammatical approach to Formula recognition, that of Anderson from 1968, and assess its applicability with respect to data extracted from PDF documents. We identify some problems of the original method when applied to common Mathematical expressions and show how they can be overcome. The simplicity of the original method leads to a very efficient recognition technique that not only is very simple to implement but also yields results of high accuracy for the recognition of Mathematical Formulae from PDF documents.

F U Hongguang - One of the best experts on this subject based on the ideXlab platform.

  • research and implementation of internet Mathematical Formula search engine based on latex
    Journal of Computer Applications, 2010
    Co-Authors: F U Hongguang
    Abstract:

    Search of Mathematical Formulas is very important for study and research. Nowadays widely used search engines, like Google and Baidu, have no such services. Except Formula images, Mathematical Formulas mainly exist on WWW pages in the form of LaTeX and MathML. There is no mature Formula search engine for LaTeX and MathML. This paper introduced algorithms of word segmentation, index and search based on LaTeX (MathML can be transformed to LaTeX) and implemented an efficient Mathematical Formula search engine based on Lucene.
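    The pipeline named in this abstract (word segmentation, indexing, search) can be sketched roughly as follows. This is only an illustration under stated assumptions: the paper builds on Lucene, whereas this sketch swaps in a plain in-memory inverted index, and the token pattern and the names `tokenize`, `build_index` and `search` are hypothetical, not taken from the paper.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# One token per LaTeX command (\frac), escaped symbol (\{), letter,
# run of digits, or any other single non-space character.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|[A-Za-z]|\d+|\S")


def tokenize(latex: str) -> List[str]:
    """Segment a LaTeX formula into tokens ("word segmentation")."""
    return TOKEN.findall(latex)


def build_index(formulas: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each token to the set of formula ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, latex in formulas.items():
        for token in tokenize(latex):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of formulas containing every token of the query.
    Token order and nesting are ignored in this simplified version."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


formulas = {
    1: r"\frac{a}{b}",
    2: r"a^2 + b^2 = c^2",
    3: r"\sqrt{a}",
}
index = build_index(formulas)
hits = search(index, r"\frac{a}")
```

    A bag-of-tokens match like this ignores structure, so a query for `\frac{a}{b}` would also match `\frac{b}{a}`; a real engine would need positional or structural information on top of the index.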