Patent Classification

The Experts below are selected from a list of 5235 Experts worldwide ranked by ideXlab platform

Meng Jung Shih - One of the best experts on this subject based on the ideXlab platform.

  • Patent Classification Using Ontology-Based Patent Network Analysis
    Pacific Asia Conference on Information Systems, 2010
    Co-Authors: Meng Jung Shih, Duenren Liu
    Abstract:

    Patent management is increasingly important for organizations to sustain their competitive advantage. The classification of patents is essential for patent management and industrial analysis. In this study, we propose a novel patent network-based classification method to analyze query patents and predict their classes. The proposed patent network, which contains various types of nodes that represent different features extracted from patent documents, is constructed based on the relationship metrics derived from patent metadata. The novel approach analyzes reachable nodes in the patent ontology network to calculate their relevance to query patents, after which it uses a modified k-nearest neighbor classifier to classify query patents. We evaluate the performance of the proposed approach on a test dataset of patent documents obtained from the United States Patent and Trademark Office (USPTO), and compare it with the performance of three conventional methods. The results demonstrate that the proposed patent network-based approach outperforms the conventional approaches.
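
    The abstract does not say how the k-nearest neighbor classifier is modified. As a rough illustration only, the sketch below assumes that each labeled patent already has a relevance score to the query (the values here are invented) and lets the k most relevant patents vote for their class, weighted by that score. The function name and data are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def knn_predict(relevance, labels, k=3):
    """Return the class with the highest summed relevance among the k most relevant patents.

    relevance: dict mapping patent id -> relevance to the query (invented values)
    labels:    dict mapping patent id -> known class label
    """
    # Keep the k patents most relevant to the query patent.
    top_k = sorted(relevance, key=relevance.get, reverse=True)[:k]
    # Each neighbour votes for its own class, weighted by its relevance score.
    votes = defaultdict(float)
    for patent_id in top_k:
        votes[labels[patent_id]] += relevance[patent_id]
    return max(votes, key=votes.get)

# Hypothetical relevance scores and class labels for four labeled patents.
relevance = {"p1": 0.9, "p2": 0.4, "p3": 0.35, "p4": 0.1}
labels = {"p1": "A", "p2": "B", "p3": "B", "p4": "A"}
predicted = knn_predict(relevance, labels, k=3)
print(predicted)
```

    In this toy data an unweighted majority vote over the three nearest patents would favor class B, while the relevance-weighted vote favors class A, which is one plausible reason to weight neighbors by relevance.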

  • Hybrid-Patent Classification Based on Patent-Network Analysis
    Journal of the Association for Information Science and Technology, 2010
    Co-Authors: Meng Jung Shih
    Abstract:

    Effective patent management is essential for organizations to maintain their competitive advantage. The classification of patents is a critical part of patent management and industrial analysis. This study proposes a hybrid-patent-classification approach that combines a novel patent-network-based classification method with three conventional classification methods to analyze query patents and predict their classes. The novel patent network contains various types of nodes that represent different features extracted from patent documents. The nodes are connected based on the relationship metrics derived from the patent metadata. The proposed classification method predicts a query patent's class by analyzing all reachable nodes in the patent network and calculating their relevance to the query patent. It then classifies the query patent with a modified k-nearest neighbor classifier. To further improve the approach, we combine it with content-based, citation-based, and metadata-based classification methods to develop a hybrid-classification approach. We evaluate the performance of the hybrid approach on a test dataset of patent documents obtained from the U.S. Patent and Trademark Office, and compare its performance with that of the three conventional methods. The results demonstrate that the proposed hybrid approach yields more accurate class predictions than the patent-network-based approach.

Eva Dhondt - One of the best experts on this subject based on the ideXlab platform.

  • Patent Classification on Subgroup Level Using Balanced Winnow
    Patent Information Retrieval, 2017
    Co-Authors: Eva Dhondt, Suzan Verberne, N H J Oostdijk, Lou Boves
    Abstract:

    In the past decade, research into automated patent classification has mainly focused on the higher levels of the International Patent Classification (IPC) hierarchy. The patent community has expressed a need for more precise classification to better aid current pre-classification and retrieval efforts (Benzineb and Guyot, Current challenges in patent information retrieval. Springer, New York, pp 239–261, 2011). In this chapter we investigate the three main difficulties associated with automated classification on the lowest level in the IPC, i.e. subgroup level. In an effort to improve classification accuracy on this level, we (1) compare flat classification with a two-step hierarchical system which models the IPC hierarchy and (2) examine the impact of combining unigrams with PoS-filtered skipgrams on both the subclass and subgroup levels. We present experiments on English patent abstracts from the well-known WIPO-alpha benchmark data set, as well as from the more realistic CLEF-IP 2010 data set. We find that the flat and hierarchical classification approaches achieve similar performance on a small data set but that the latter is much more feasible under real-life conditions. Additionally, we find that combining unigram and skipgram features leads to similar and highly significant improvements in classification performance (over unigram-only features) on both the subclass and subgroup levels, but only if sufficient training data is available.
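
    The two-step hierarchical setup mentioned in the abstract can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the chapter's system: the class labels, the term-weight profiles, and the simple additive scoring are all invented for illustration, and a real system would use trained classifiers at both levels.

```python
def classify_two_step(doc_tokens, subclass_profiles, subgroup_profiles):
    """Pick a subclass first, then choose only among that subclass's subgroups."""
    def score(profile):
        # Toy scoring: sum the profile weights of the tokens present in the document.
        return sum(profile.get(token, 0.0) for token in doc_tokens)

    # Step 1: choose the best-scoring subclass.
    best_subclass = max(subclass_profiles, key=lambda c: score(subclass_profiles[c]))
    # Step 2: restrict the candidates to subgroups below the chosen subclass.
    candidates = subgroup_profiles[best_subclass]
    best_subgroup = max(candidates, key=lambda g: score(candidates[g]))
    return best_subclass, best_subgroup

# Hypothetical labels and term weights (not real IPC codes).
subclass_profiles = {
    "S1": {"query": 1.0, "index": 0.8},
    "S2": {"tablet": 1.0, "dose": 0.9},
}
subgroup_profiles = {
    "S1": {"S1/a": {"index": 1.0}, "S1/b": {"compiler": 1.0}},
    "S2": {"S2/a": {"dose": 1.0}, "S2/b": {"coating": 1.0}},
}
result = classify_two_step(["database", "query", "index"], subclass_profiles, subgroup_profiles)
print(result)
```

    A flat classifier would instead score every subgroup directly. The hierarchical variant only scores the subgroups under one subclass, which keeps the second step small when the label set is very large.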

  • Text Representations for Patent Classification
    Computational Linguistics, 2013
    Co-Authors: Eva Dhondt, Suzan Verberne, Cornelis H A Koster, Lou Boves
    Abstract:

    With the increasing rate of patent application filings, automated patent classification is of rising economic importance. This article investigates how patent classification can be improved by using different representations of the patent documents. Using the Linguistic Classification System (LCS), we compare the impact of adding statistical phrases (in the form of bigrams) and linguistic phrases (in two different dependency formats) to the standard bag-of-words text representation on a subset of 532,264 English abstracts from the CLEF-IP 2010 corpus. In contrast to previous findings on classification with phrases in the Reuters-21578 data set, for patent classification the addition of phrases results in significant improvements over the unigram baseline. The best results were achieved by combining all four representations, and the second best by combining unigrams and lemmatized bigrams. This article includes extensive analyses of the class models (a.k.a. class profiles) created by the classifiers in the...

  • Using Skipgrams and PoS-Based Feature Selection for Patent Classification
    Computational Linguistics in the Netherlands, 2012
    Co-Authors: Eva Dhondt, Suzan Verberne, Cornelis H A Koster, N Weber, Lou Boves
    Abstract:

    Until recently, phrases were deemed suboptimal features for text classification because of their sparseness (Lewis 1992). In recent work (Koster et al. 2011, D’hondt et al. Forthcoming), however, it was found that for classifying English patent documents, combining phrasal and unigram representations leads to significantly better classification results, because phrases are better suited to catch the Multi-Word Terms (MWT) abundant in the terminology-rich technical patent texts. In this article, we consider the task of patent classification of English abstracts at the class level (about 120 classes) of the International Patent Classification (IPC). We compare (a) the impact of two types of phrases to capture meaningful information (bigrams and skipgrams); and (b) the impact of performing additional filtering of the classification features, based on their Part of Speech (PoS). For this purpose we performed a series of classification experiments using different phrasal text representations and feature selection to determine which representation is most beneficial to English patent classification. We further investigated which type of information (as captured by the PoS-filtered skipgrams) has the most impact during classification. The results show that combining unigrams and PoS-filtered skipgrams leads to a significant improvement in classification scores over the unigram baseline. Additional experiments show that the most important phrasal features are bigrams and that additional useful phrases can be captured by allowing at most 2 skips in the skipgram approach. Deeper analysis revealed that the noun-noun combinations and, to a lesser extent, the adjective-noun combinations are the most informative phrasal features for patent classification.
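
    To make the feature types in this abstract concrete, here is a small sketch of two-word skipgram extraction with a cap on the number of skipped tokens, followed by a PoS filter. It rests on assumptions: the abstract does not state that skipgrams are limited to word pairs, the PoS tags are assigned by hand per word, and the allowed tag pairs simply mirror the combinations the abstract reports as most informative. With `max_skips=0` the function yields plain bigrams.

```python
def skipgrams(tokens, max_skips=2):
    """Return ordered word pairs with at most max_skips tokens between the two words."""
    pairs = []
    for i, first in enumerate(tokens):
        # j - i - 1 is the number of tokens skipped between the pair members.
        for j in range(i + 1, min(i + 2 + max_skips, len(tokens))):
            pairs.append((first, tokens[j]))
    return pairs

def pos_filter(pairs, tags, allowed=frozenset({("NOUN", "NOUN"), ("ADJ", "NOUN")})):
    """Keep only pairs whose (tag, tag) combination is in the allowed set."""
    return [(a, b) for a, b in pairs if (tags[a], tags[b]) in allowed]

tokens = ["the", "rotary", "piston", "engine"]
# Hand-assigned tags for illustration; a real pipeline would use a PoS tagger.
tags = {"the": "DET", "rotary": "ADJ", "piston": "NOUN", "engine": "NOUN"}

bigrams = skipgrams(tokens, max_skips=0)
all_pairs = skipgrams(tokens, max_skips=2)
filtered = pos_filter(all_pairs, tags)
print(bigrams)
print(filtered)
```

    Allowing skips adds pairs such as ("rotary", "engine") that a bigram model misses, while the PoS filter drops pairs that start with the determiner.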

  • Patent Classification Experiments with the Linguistic Classification System LCS in CLEF-IP 2011
    CLEF (Notebook Papers Labs Workshop), 2011
    Co-Authors: Suzan Verberne, Eva Dhondt
    Abstract:

    We report the results of a series of classification experiments with the Linguistic Classification System LCS in the context of CLEF-IP 2011. We participated in the main classification task: classifying documents on the subclass level. We investigated (1) the use of different sections (abstract, description, metadata) from the patent documents; (2) adding dependency triples to the bag-of-words representation; (3) adding the WIPO corpus to the EPO training data; (4) the use of patent citations in the test data for reranking the classes; and (5) the threshold on the class scores for class selection. We found that adding full descriptions to abstracts gives a clear improvement; adding only the first 400 words of the description also improves classification, but to a lesser degree. Adding metadata (applicants, inventors and address) did not improve classification. Adding dependency triples to words gives a much higher recall at the cost of a lower precision, but this effect is largely due to the class selection threshold. We did not find an effect from adding the WIPO corpus, nor from reranking with patent citations. In future work, we plan to investigate whether there are other methods for reranking with patent citations that do give an improvement, because we feel that the citations may still give valuable information. Our most important finding, however, is the importance of the threshold on the class selection. For the current work, we only compared two values for the threshold, and the results are much better for 1.0 than for 0.5. The 0.5 threshold gives higher recall in all runs, which was the original motivation for submitting runs with a lower threshold. However, because of the much lower precision, the F-scores are lower. We think that there is still some improvement to be gained from proper tuning of the class selection threshold, and from the use of a flexible threshold (also taking into account the different text representations). This is part of our future work.
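
    The precision and recall trade-off described above follows directly from how the F-score is defined. The sketch below assumes the balanced F1 measure (the abstract only says "F-scores") and uses invented precision and recall values, not figures from the paper, to show how a run with higher recall can still end up with a lower F-score.

```python
def f1(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented values for illustration only.
strict_threshold = f1(precision=0.80, recall=0.60)  # fewer classes selected
loose_threshold = f1(precision=0.40, recall=0.75)   # more classes selected

print(round(strict_threshold, 3), round(loose_threshold, 3))
```

    Because the harmonic mean is pulled toward the smaller of the two values, a large drop in precision outweighs a moderate gain in recall.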

  • Patent Classification Experiments with the Linguistic Classification System LCS
    Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2010) CLEF-IP workshop, 2010
    Co-Authors: Suzan Verberne, Merijn Vogel, Eva Dhondt
    Abstract:

    In the context of the CLEF-IP 2010 classification task, we conducted a series of experiments with the Linguistic Classification System (LCS). We compared two document representations for patent abstracts: a bag-of-words representation and a syntactic/semantic representation containing both words and dependency triples. We evaluated two types of output: using a fixed cut-off on the ranking of the classes and using a flexible cut-off based on a threshold on the classification scores. Using the Winnow classifier, we obtained an improvement in classification scores when triples are added to the bag of words. However, our results are remarkably better on a held-out subset of the target data than on the 2,000-topic test set. The main findings of this paper are: (1) adding dependency triples to words has a positive effect on classification accuracy and (2) selecting classes by using a threshold on the classification scores, instead of returning a fixed number of classes per document, improves classification scores while at the same time lowering the number of classes that need to be judged manually by the professionals at the patent office.
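
    The two output types compared in this abstract can be sketched as two selection functions over a document's per-class scores. The scores, class labels, and the choice of an inclusive comparison (`>=`) are assumptions made for illustration; the abstract does not give these details.

```python
def select_top_n(scores, n):
    """Fixed cut-off: always return the n highest-scoring classes."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def select_by_threshold(scores, threshold):
    """Flexible cut-off: return every class whose score reaches the threshold."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [label for label in ranked if scores[label] >= threshold]

# Hypothetical classifier scores for one document.
scores = {"C1": 1.7, "C2": 1.1, "C3": 0.6, "C4": 0.2}

fixed = select_top_n(scores, n=3)
flexible = select_by_threshold(scores, threshold=1.0)
print(fixed)
print(flexible)
```

    With a fixed cut-off every document yields the same number of classes to review, whereas the threshold returns fewer classes when only a few score highly, which is consistent with the reduced manual judging effort the abstract mentions.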

Han Tong Loh - One of the best experts on this subject based on the ideXlab platform.

  • Pattern-Oriented Associative Rule-Based Patent Classification
    Expert Systems With Applications, 2010
    Co-Authors: Han Tong Loh
    Abstract:

    This paper proposes an innovative pattern-oriented associative rule-based approach to construct an automatic TRIZ-based patent classification system. Derived from associative rule-based text categorization, the new approach not only discovers the semantic relationship among features in a document by their co-occurrence, but also captures the syntactic information by manually generalized patterns. We choose 7 classes which address 20 of the 40 TRIZ Principles and perform experiments upon the binary set for each class. Compared with three currently popular classification algorithms (SVM, C4.5 and NB), the new approach shows some improvement. More importantly, this new approach has its own advantages, which are discussed in this paper as well.

  • Automatic Classification of Patent Documents for TRIZ Users
    World Patent Information, 2006
    Co-Authors: Han Tong Loh, Lixiang Shen
    Abstract:

    In contrast to traditional inventors, inventors using TRIZ are interested not only in searching for prior art in related fields, but also in analogous inventions in other fields that have solved the same Technical Contradiction by using the same method. To be useful for TRIZ users, patents need to be classified by the Contradiction they solve and the Inventive Principles they use instead of the fields in which they are involved. Most of the currently available automatic patent classification systems are based on technology-dependent schemes such as the IPC, and they cannot satisfy TRIZ users’ requirements. In this paper, an automatic patent classification for TRIZ users is proposed and explained in detail. In a preliminary study, patent documents were collected for 6 out of 40 Inventive Principles, and the proposed automatic classification was tested.
