Prior Art

The Experts below are selected from a list of 5736 Experts worldwide ranked by ideXlab platform

W. Bruce Croft - One of the best experts on this subject based on the ideXlab platform.

  • Transforming patents into prior-art queries
    Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09, 2009
    Co-Authors: Xiaobing Xue, W. Bruce Croft
    Abstract:

    Searching for prior-art patents is an essential step for the patent examiner to validate or invalidate a patent application. In this paper, we consider the whole patent as the query, which reduces the burden on the user and also makes many more potential search features available. We explore how to automatically transform the query patent into an effective search query, focusing especially on the effect of different patent fields. Experiments show that the background summary of a patent is the most useful source of terms for generating a query, even though most previous work used the patent claims.
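
As a rough illustration of the idea in the abstract above, the sketch below builds a query from the most frequent non-stopword terms of a single patent field. The field names, the tiny stopword list, and the helper `build_query` are hypothetical; the abstract does not describe the paper's actual term-selection or weighting scheme, so plain term frequency is used here as a stand-in.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "with", "that", "by", "on"}


def build_query(patent, field, k=5):
    """Return the k most frequent non-stopword terms of one patent field."""
    text = patent.get(field, "")
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy patent with two fields (invented text, not real patent data).
patent = {
    "background": "A wireless antenna improves signal range. The antenna uses a folded antenna element.",
    "claims": "An apparatus comprising an antenna element coupled to a receiver.",
}

print(build_query(patent, "background", k=2))
print(build_query(patent, "claims", k=2))
```

Swapping the `field` argument is enough to compare how different parts of the patent behave as a source of query terms, which is the kind of comparison the abstract reports.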

Gareth J F Jones - One of the best experts on this subject based on the ideXlab platform.

  • Studying machine translation technologies for large-data CLIR tasks: a patent prior-art search case study
    Information Retrieval, 2013
    Co-Authors: Walid Magdy, Gareth J F Jones
    Abstract:

    Prior-art search in patent retrieval is concerned with finding all existing patents relevant to a patent application. Since patents often appear in different languages, cross-language information retrieval (CLIR) is an essential component of effective patent search. In recent years machine translation (MT) has become the dominant approach to translation in CLIR. Standard MT systems focus on generating proper translations that are morphologically and syntactically correct. Development of effective MT systems of this type requires large training resources and high computational power for training and translation. This is an important issue for patent CLIR, where queries are typically very long, sometimes taking the form of a full patent application, meaning that query translation using MT systems can be very slow. However, in contrast to MT, the focus for information retrieval (IR) is on the conceptual meaning of the search words regardless of their surface form or the linguistic structure of the output. Thus much of the complexity of MT is not required for effective CLIR. We present an adapted MT technique specifically designed for CLIR. In this method, IR text pre-processing in the form of stop word removal and stemming is applied to the MT training corpus prior to the training phase. Applying this step leads to a significant decrease in the MT computational and training resource requirements. Experimental application of the new approach to the cross-language patent retrieval task from CLEF-IP 2010 shows the new technique to be up to 23 times faster than standard MT for query translations, while maintaining IR effectiveness statistically indistinguishable from standard MT when large training resources are used. Furthermore, the new method is significantly better than standard MT when only limited translation training resources are available, which can be a significant issue for translation in specialized domains. The new MT technique also enables patent document translation in a practical amount of time with a resulting significant improvement in the retrieval effectiveness.

  • Simple vs. sophisticated approaches for patent prior-art search
    European Conference on Information Retrieval, 2011
    Co-Authors: Walid Magdy, Patrice Lopez, Gareth J F Jones
    Abstract:

    Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state of the art in patent prior-art search. The first approach uses simple and straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques which try to model the steps taken by a patent examiner in patent search. Experiments show that the retrieval effectiveness using both techniques is statistically indistinguishable when patent applications contain some initial citations. However, the advanced search technique is statistically better when no initial citations are provided. Our findings suggest that time and effort can be saved by applying simple IR approaches when initial citations are provided.

  • United we fall, divided we stand: a study of query segmentation and PRF for patent prior art search
    Proceedings of the 4th workshop on Patent information retrieval - PaIR '11, 2011
    Co-Authors: Debasis Ganguly, Johannes Leveling, Gareth J F Jones
    Abstract:

    Previous research in patent search has shown that reducing queries by extracting a few key terms is ineffective, primarily because of the vocabulary mismatch between patent applications used as queries and existing patent documents. This finding has led to the use of full patent applications as queries in patent prior art search. In addition, standard information retrieval (IR) techniques such as query expansion (QE) do not work effectively with patent queries, principally because of the presence of noise terms in the massive queries. In this study, we take a new approach to QE for patent search. Text segmentation is used to decompose a patent query into self-coherent sub-topic blocks. Each of these much shorter sub-topic blocks, which is representative of a specific aspect or facet of the invention, is then used as a query to retrieve documents. Documents retrieved using the different resulting sub-queries or query streams are interleaved to construct a final ranked list. This technique can exploit the potential benefit of QE since the segmented queries are generally more focused and less ambiguous than the full patent query. Experiments on the CLEF-IP 2010 prior-art search task show that the proposed method outperforms the retrieval effectiveness achieved when using a single full patent application text as the query, and also demonstrate the potential benefits of QE in alleviating the vocabulary mismatch problem in patent search.

  • Applying the KISS principle for the CLEF-IP 2010 prior art candidate patent search task
    2010 Working Notes for CLEF Conference CLEF 2010, 2010
    Co-Authors: Walid Magdy, Gareth J F Jones
    Abstract:

    We present our experiments and results for the DCU CNGL participation in the CLEF-IP 2010 Candidate Patent Search Task. Our work applied standard information retrieval (IR) techniques to patent search. In addition, a very simple citation extraction method was applied to improve the results. This was our second consecutive participation in the CLEF-IP tasks. Our experiments in 2009 showed that many sophisticated approaches to IR do not improve the retrieval effectiveness for this task. For this reason we decided to apply only simple methods in 2010. These were demonstrated to be highly competitive with other participants. DCU submitted three runs for the Prior Art Candidate Search Task; two of these runs achieved the second and third ranks among the 25 runs submitted by nine different participants. Our best run achieved MAP of 0.203, recall of 0.618, and PRES of 0.523.
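
The interleaving step in the query-segmentation study listed above (ranked lists from several sub-queries merged into one final list) could look roughly like the following. This is only a sketch: round-robin merging with duplicate removal is an assumed strategy, since the abstract does not say how the lists are interleaved, and the function name `interleave` and the document IDs are invented for the example.

```python
from itertools import zip_longest


def interleave(ranked_lists, limit=None):
    """Round-robin merge of ranked lists, keeping the first occurrence of each ID.

    Assumes document IDs are non-None values such as strings.
    """
    merged, seen = [], set()
    # zip_longest yields rank 1 of every list, then rank 2, and so on,
    # padding exhausted lists with None.
    for tier in zip_longest(*ranked_lists):
        for doc_id in tier:
            if doc_id is None or doc_id in seen:
                continue
            seen.add(doc_id)
            merged.append(doc_id)
    return merged if limit is None else merged[:limit]


# One ranked list per sub-query (toy IDs).
runs = [["d1", "d2", "d3"], ["d2", "d4"], ["d5"]]
print(interleave(runs))
```

Each sub-query contributes its top document before any sub-query contributes its second, so no single facet of the invention dominates the head of the merged list.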

Carla P Gomes - One of the best experts on this subject based on the ideXlab platform.

  • Ranking structured documents: a large margin based approach for patent prior art search
    International Joint Conference on Artificial Intelligence (IJCAI), 2009
    Co-Authors: Yunsong Guo, Carla P Gomes
    Abstract:

    We propose an approach for automatically ranking structured documents applied to patent prior art search. Our model, SVM Patent Ranking (SVMPR), incorporates margin constraints that directly capture the specificities of patent citation ranking. Our approach combines patent domain knowledge features with meta-score features from several different general information retrieval methods. The training algorithm is an extension of the Pegasos algorithm with performance guarantees, effectively handling hundreds of thousands of patent-pair judgements in a high-dimensional feature space. Experiments on a homogeneous essential wireless patent dataset show that SVMPR performs on average 30%-40% better than many other state-of-the-art general-purpose information retrieval methods in terms of the NDCG measure at different cut-off positions.
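
For readers unfamiliar with the NDCG measure mentioned in the abstract above, here is a minimal sketch of NDCG at a cut-off position k. It assumes linear gain with a log2 rank discount, and it computes the ideal ordering from the same list of relevance grades; the abstract does not state which NDCG variant the paper uses, so both choices are assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the best possible ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance grades of a ranked result list, best rank first (toy data).
ranking = [1, 0, 1]
print(round(ndcg_at_k(ranking, 3), 4))
```

Reporting the value at several cut-offs (for example k = 3, 5, 10) shows how quickly relevant documents appear near the top of the ranking.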

Walid Magdy - One of the best experts on this subject based on the ideXlab platform.

  • Studying machine translation technologies for large-data CLIR tasks: a patent prior-art search case study
    Information Retrieval, 2013
    Co-Authors: Walid Magdy, Gareth J F Jones
    Abstract:

    Prior-art search in patent retrieval is concerned with finding all existing patents relevant to a patent application. Since patents often appear in different languages, cross-language information retrieval (CLIR) is an essential component of effective patent search. In recent years machine translation (MT) has become the dominant approach to translation in CLIR. Standard MT systems focus on generating proper translations that are morphologically and syntactically correct. Development of effective MT systems of this type requires large training resources and high computational power for training and translation. This is an important issue for patent CLIR, where queries are typically very long, sometimes taking the form of a full patent application, meaning that query translation using MT systems can be very slow. However, in contrast to MT, the focus for information retrieval (IR) is on the conceptual meaning of the search words regardless of their surface form or the linguistic structure of the output. Thus much of the complexity of MT is not required for effective CLIR. We present an adapted MT technique specifically designed for CLIR. In this method, IR text pre-processing in the form of stop word removal and stemming is applied to the MT training corpus prior to the training phase. Applying this step leads to a significant decrease in the MT computational and training resource requirements. Experimental application of the new approach to the cross-language patent retrieval task from CLEF-IP 2010 shows the new technique to be up to 23 times faster than standard MT for query translations, while maintaining IR effectiveness statistically indistinguishable from standard MT when large training resources are used. Furthermore, the new method is significantly better than standard MT when only limited translation training resources are available, which can be a significant issue for translation in specialized domains. The new MT technique also enables patent document translation in a practical amount of time with a resulting significant improvement in the retrieval effectiveness.

  • Simple vs. sophisticated approaches for patent prior-art search
    European Conference on Information Retrieval, 2011
    Co-Authors: Walid Magdy, Patrice Lopez, Gareth J F Jones
    Abstract:

    Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state of the art in patent prior-art search. The first approach uses simple and straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques which try to model the steps taken by a patent examiner in patent search. Experiments show that the retrieval effectiveness using both techniques is statistically indistinguishable when patent applications contain some initial citations. However, the advanced search technique is statistically better when no initial citations are provided. Our findings suggest that time and effort can be saved by applying simple IR approaches when initial citations are provided.

  • Applying the KISS principle for the CLEF-IP 2010 prior art candidate patent search task
    2010 Working Notes for CLEF Conference CLEF 2010, 2010
    Co-Authors: Walid Magdy, Gareth J F Jones
    Abstract:

    We present our experiments and results for the DCU CNGL participation in the CLEF-IP 2010 Candidate Patent Search Task. Our work applied standard information retrieval (IR) techniques to patent search. In addition, a very simple citation extraction method was applied to improve the results. This was our second consecutive participation in the CLEF-IP tasks. Our experiments in 2009 showed that many sophisticated approaches to IR do not improve the retrieval effectiveness for this task. For this reason we decided to apply only simple methods in 2010. These were demonstrated to be highly competitive with other participants. DCU submitted three runs for the Prior Art Candidate Search Task; two of these runs achieved the second and third ranks among the 25 runs submitted by nine different participants. Our best run achieved MAP of 0.203, recall of 0.618, and PRES of 0.523.
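
The IR-style pre-processing described in the machine translation study listed above (stop word removal and stemming applied to the MT training corpus before training) can be pictured with the sketch below. The stopword list and the suffix-stripping `crude_stem` function are simplified, hypothetical stand-ins for the language-specific resources and real stemmer a working system would use, and only the English side of one invented sentence pair is processed.

```python
# Hypothetical, minimal English resources for the example.
STOPWORDS = {"the", "a", "an", "of", "is", "to", "and", "in"}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence):
    """Lowercase, drop stopwords, and stem the remaining tokens."""
    tokens = sentence.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


# One side of a parallel training sentence (toy data).
source = "the antenna is receiving signals"
print(preprocess(source))
```

Shrinking the vocabulary and sentence length in this way is what makes the training data cheaper to process, at the cost of output that is no longer fluent text, which the abstract argues retrieval does not need.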

Suzan Verberne - One of the best experts on this subject based on the ideXlab platform.

  • Combining document representations for prior art retrieval
    CLEF (Notebook Papers Labs Workshop), 2011
    Co-Authors: Eva D'hondt, Suzan Verberne, Wouter Alink, Roberto Cornacchia
    Abstract:

    In this paper we report on our participation in the CLEF-IP 2011 prior art retrieval task. We investigated whether adding syntactic information in the form of dependency triples to a bag-of-words representation could lead to improvements in patent retrieval. In our experiments, we investigated this effect on the title, abstract and first 400 words of the description section. The experiments were conducted in the Spinque framework, with which we tried to optimize the combinations of text representation and document sections. We found that adding triples did not improve overall MAP scores compared to the baseline bag-of-words approach, but did result in slightly higher set recall scores. In future work we will extend our experiments to use all the text sections of the patent documents and fine-tune the mixture weights.

  • Re-ranking based on Syntactic Dependencies in Prior-Art Retrieval
    2010
    Co-Authors: Eva D'hondt, Suzan Verberne, Nelleke Oostdijk, Lou Boves
    Abstract:

    In this paper we present an experiment using syntax (in the form of dependency triplets) to rerank retrieval results in the patent domain. This work is a follow-up experiment to our participation in the first CLEF-IP track, which focussed on prior art retrieval. We first describe the work done in our participation in the CLEF-IP track and then go on to show why improving Mean Average Precision (MAP) is important to the patent searchers' community. We then introduce an additional reranking step to our bag-of-words (BOW) retrieval approach which is based on syntactic information. Using syntactic structures called dependency triplets as index terms, we perform a second retrieval step within the retrieved result sets and examine whether the ranking of the relevant documents (captured by the MAP score) can be improved for prior art search.

  • CLEF-IP 2010: Prior art retrieval using the different sections in patent documents
    CLEF-IP 2010. Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2010) CLEF-IP workshop, 2010
    Co-Authors: Eva D'hondt, Suzan Verberne
    Abstract:

    In this paper we describe our participation in the 2010 CLEF-IP Prior Art Retrieval task, where we examined the impact of information in different sections of patent documents, namely the title, abstract, claims, description and IPC-R sections, on the retrieval and re-ranking of patent documents. Using a standard bag-of-words approach in Lemur, we found that the IPC-R sections are the most informative for patent retrieval. We then performed a re-ranking of the retrieved documents using a logistic regression model trained on the retrieved documents in the training set. We found indications that the information contained in the text sections of the patent document can contribute to a better ranking of the retrieved documents. The official results have shown that among the nine groups that participated in the Prior Art Retrieval task we achieved the eighth rank in terms of both Mean Average Precision (MAP) and Recall.

  • CLEF (Working Notes) - Prior art retrieval using the claims section as a bag of words
    Lecture Notes in Computer Science, 2010
    Co-Authors: Suzan Verberne, Eva D'hondt
    Abstract:

    In this paper we describe our participation in the 2009 CLEF-IP task, which was targeted at prior-art search for topic patent documents. We opted for a baseline approach to get a feeling for the specifics of the task and the documents used. Our system retrieved patent documents based on a standard bag-of-words approach for both the Main Task and the English Task. In both runs, we extracted the claims sections from all English patents in the corpus and saved them in the Lemur index format with the patent IDs as DOCIDs. These claims were then indexed using Lemur's BuildIndex function. In the topic documents we also focused exclusively on the claims sections. These were extracted and converted to queries by removing stopwords and punctuation. We did not perform any term selection or query expansion. We retrieved 100 patents per topic using Lemur's RetEval function with the TF-IDF retrieval model. Compared to the other runs submitted to the track, we obtained good results in terms of nDCG (0.46) and moderate results in terms of MAP (0.054).
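
The claims-only bag-of-words pipeline in the abstract above can be approximated in a few lines of plain Python. This sketch does not use or reproduce Lemur: the scoring formula (term frequency times log(N/df)), the stopword list, the function names, and the patent IDs are all assumptions made for illustration, and Lemur's own TF-IDF model is likely to differ in detail.

```python
import math
import string
from collections import Counter

# Hypothetical, minimal stopword list for the example.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def to_terms(text):
    """Lowercase, strip punctuation, and drop stopwords (no term selection)."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [t for t in cleaned.split() if t not in STOPWORDS]


def rank(query_text, claims_by_id, top_n=100):
    """Rank patents by a simple TF-IDF score of their claims against the query."""
    docs = {doc_id: Counter(to_terms(text)) for doc_id, text in claims_by_id.items()}
    n_docs = len(docs)
    # Document frequency: number of claims sections containing each term.
    df = Counter(term for tf in docs.values() for term in tf)
    query_terms = set(to_terms(query_text))
    scores = {
        doc_id: sum(tf[t] * math.log(n_docs / df[t]) for t in query_terms if t in tf)
        for doc_id, tf in docs.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Keep at most top_n documents and drop those with no matching evidence.
    return [doc_id for doc_id, score in ranked[:top_n] if score > 0]


# Claims sections keyed by patent ID (toy data).
claims = {
    "EP1": "An antenna comprising a folded element.",
    "EP2": "A receiver coupled to a battery.",
    "EP3": "A folded antenna, and a receiver.",
}
print(rank("A folded antenna element.", claims))
```

The default `top_n=100` mirrors the 100 patents retrieved per topic in the abstract; everything else is a simplification.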