Scientific Publications - Explore the Science & Experts

The experts below are selected from a list of 145,719 experts worldwide, ranked by the ideXlab platform.

Xiaoyan Song - One of the best experts on this subject based on the ideXlab platform.

Exploring the feasibility and accuracy of latent semantic analysis based text mining techniques to detect similarity between patent documents and scientific publications

Scientometrics, 2010

Co-Authors: Tom Magerman, Bart Van Looy, Xiaoyan Song

Abstract:

In this study, we examine and validate the use of existing text mining techniques (based on the vector space model and latent semantic indexing) to detect similarities between patent documents and scientific publications. Clearly, experts involved in domain studies would benefit from techniques that allow similarity to be detected, and hence facilitate mapping, categorization and classification efforts. In addition, given current debates on the relevance and appropriateness of academic patenting, the ability to assess content-relatedness between sets of documents (in this case, patents and publications) might become relevant and useful. We list several options available to arrive at content-based similarity measures. Different options of a vector space model and latent semantic indexing approach have been selected and applied to the publications and patents of a sample of academic inventors (n = 6). We also validated the outcomes by using independently obtained validation scores of human raters. While we conclude that text mining techniques can be valuable for detecting similarities between patents and publications, our findings also indicate that the various options available to arrive at similarity measures vary considerably in terms of accuracy: some generally accepted text mining options, like dimensionality reduction and LSA, do not yield the best results when working with smaller document sets. Implications and directions for further research are discussed.
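
As a rough illustration of the vector space model mentioned in the abstract above, the following sketch computes TF-IDF weighted cosine similarity between one toy patent text and two toy publication texts. The texts, the function names, the tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, not the options evaluated in the paper. A latent semantic indexing variant would additionally apply a truncated SVD to the term-document matrix, which is not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only (assumption: a real pipeline
    # would also remove stop words and apply stemming).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF; one weighting option among many (assumption).
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy texts, not taken from the study.
docs = [
    "method for coating a stent with a polymer drug layer",  # patent
    "polymer coating of a stent for drug release",           # related paper
    "latent semantic indexing of text documents",            # unrelated paper
]
vecs = tfidf_vectors(docs)
print("patent vs related paper:  ", round(cosine(vecs[0], vecs[1]), 3))
print("patent vs unrelated paper:", round(cosine(vecs[0], vecs[2]), 3))
```

With real patent and publication sets, choices such as stemming, stop-word removal, term weighting, and the number of retained dimensions are the kinds of options whose effect on accuracy the abstract says varies considerably.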

Exploring the feasibility and accuracy of latent semantic analysis based text mining techniques to detect similarity between patent documents and scientific publications

Social Science Research Network, 2008

Co-Authors: Tom Magerman, Bart Van Looy, Xiaoyan Song

Abstract:

In this study, we examine and validate the use of existing text mining techniques (based on the vector space model and latent semantic indexing) to detect similarities between patent documents and scientific publications. Clearly, experts involved in domain studies would benefit from techniques that allow similarity to be detected, and hence facilitate mapping, categorization and classification efforts. In addition, given current debates on the relevance and appropriateness of academic patenting, the ability to assess content-relatedness between sets of documents (in this case, patents and publications) might become relevant and useful. We list several options available to arrive at content-based similarity measures. Different options of a vector space model and latent semantic indexing approach have been selected and applied to the publications and patents of a sample of academic inventors (n = 6). We also validated the outcomes by using independently obtained validation scores of human raters. While we conclude that text mining techniques can be valuable for detecting similarities between patents and publications, our findings also indicate that the various options available to arrive at similarity measures vary considerably in terms of accuracy: some generally accepted text mining options, like dimensionality reduction and LSA, do not yield the best results when working with smaller document sets. Implications and directions for further research are discussed.


Martin J O'Connor - One of the best experts on this subject based on the ideXlab platform.

A semantic-based method for extracting concept definitions from scientific publications: evaluation in the autism phenotype domain

Journal of Biomedical Semantics, 2013

Co-Authors: Saeed Hassanpour, Martin J O'Connor

Abstract:

Background: A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text.

A semantic-based method for extracting concept definitions from scientific publications: evaluation in the autism phenotype domain

Journal of Biomedical Semantics, 2013

Co-Authors: Saeed Hassanpour, Martin J O'Connor, Amar K Das

Abstract:

A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text. Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and compared the average rank of correctly identified rule definition or corresponding rule template using both our semantic-based approach and a standard term-based approach. We examined three separate scenarios: (1) the snippet of text contained a definition already in the knowledge base; (2) the snippet contained an alternative definition for a concept in the knowledge base; and (3) the snippet contained a definition not in the knowledge base. Our semantic-based approach had a higher average rank than the term-based approach for each of the three scenarios (scenario 1: 3.8 vs. 5.0; scenario 2: 2.8 vs. 4.9; and scenario 3: 4.5 vs. 6.2), with each comparison significant at the p-value of 0.05 using the Wilcoxon signed-rank test. Our work shows that leveraging existing domain knowledge in the information extraction of biomedical definitions significantly improves the correct identification of such knowledge within sentences. Our method can thus help researchers rapidly acquire knowledge about biomedical definitions that are specified and evolving within an ever-growing corpus of scientific publications.
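
The evaluation described in the abstract above compares, per text snippet, the rank at which each approach places the correct definition, and tests the paired difference with the Wilcoxon signed-rank test. The following is a minimal sketch of that kind of comparison, assuming lower rank numbers mean the correct definition appears nearer the top. The rank lists are hypothetical toy data, and the helper names are illustrative assumptions, not the authors' code. In practice a library routine such as `scipy.stats.wilcoxon` could be used instead of the hand-written version.

```python
from itertools import product
from statistics import mean


def signed_ranks(x, y):
    """Return (W+, W-, ranks) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, ranks


def exact_two_sided_p(ranks, observed):
    """Exact two-sided p-value by enumerating every sign assignment (small n only)."""
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w_pos = sum(r for r, s in zip(ranks, signs) if s)
        if min(w_pos, total - w_pos) <= observed:
            hits += 1
    return hits / 2 ** len(ranks)


# Hypothetical ranks of the correct definition for eight snippets.
semantic_ranks = [1, 2, 3, 1, 4, 2, 5, 3]
term_ranks = [3, 5, 4, 6, 4, 7, 9, 1]

w_plus, w_minus, ranks = signed_ranks(semantic_ranks, term_ranks)
p_value = exact_two_sided_p(ranks, min(w_plus, w_minus))

print("average rank (semantic, term):", mean(semantic_ranks), mean(term_ranks))
print("W+, W-, exact two-sided p:", w_plus, w_minus, p_value)
```

The exhaustive enumeration is only practical for small samples; for larger ones a normal approximation or a library implementation is the usual choice.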

