Jaccard Coefficient

The Experts below are selected from a list of 3855 Experts worldwide ranked by ideXlab platform

Javier Tejada-cárcamo - One of the best experts on this subject based on the ideXlab platform.

  • MICAI (Special Sessions) - Unilateral Weighted Jaccard Coefficient for NLP
    2015 Fourteenth Mexican International Conference on Artificial Intelligence (MICAI), 2015
    Co-Authors: Julio Santisteban, Javier Tejada-cárcamo
    Abstract:

    Similarity measures are essential for solving many pattern recognition problems, such as classification, clustering, and retrieval. Similarity measures are categorized according to both syntactic and semantic relationships. In this paper we present a novel similarity measure, the Unilateral Weighted Jaccard Coefficient (uwJaccard), which takes into consideration not only the space between two points but also the semantics between them in a distributional semantic model. The Unilateral Weighted Jaccard Coefficient provides a measure of uncertainty that can quantify the uncertainty between sentences such as "man bites dog" and "dog bites man".
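
    To see why the sentence pair in this abstract is a hard case, it may help to compute the plain (unweighted) Jaccard coefficient on word sets. The sketch below is not the uwJaccard measure from the paper; the helper name `jaccard` and the third example sentence are assumptions made for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Plain Jaccard coefficient |A & B| / |A | B|.

    Assumption: two empty sets are treated as identical (1.0).
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Bag-of-words view: word order is discarded when building the sets.
s1 = set("man bites dog".split())
s2 = set("dog bites man".split())
s3 = set("man bites cat".split())  # hypothetical extra sentence

print(jaccard(s1, s2))  # 1.0: same words, different order
print(jaccard(s1, s3))  # 0.5: 2 shared words out of 4 distinct words
```

    On word sets the two sentences are indistinguishable, which is the kind of gap a semantics-aware variant is meant to address.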

Li Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Jaccard Coefficient based bi clustering and fusion recommender system for solving data sparsity
    Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2019
    Co-Authors: Jiangfei Cheng, Li Zhang
    Abstract:

    Recommender systems, which recommend suitable items to users by predicting ratings for items, are now common and useful. The most widely used type, the collaborative filtering recommender system, suffers from the sparsity issue due to insufficient data. To cope with this issue, we propose a Jaccard Coefficient-based Bi-clustering and Fusion (JC-BiFu) method for recommender systems. JC-BiFu uses density peak clustering for both users and items, and then estimates missing values in the user-item rating matrix when finding similar users. Finally, we utilize both users and items to generate the final predictions. Experimental analysis shows that our approach can improve the performance of user recommendations at extreme levels of sparsity in the user-item rating matrix.

  • On Measuring Semantic Similarity of Business Process Models
    2009 International Conference on Interoperability for Enterprise Software and Applications China, 2009
    Co-Authors: Li Zhang
    Abstract:

    Identifying and integrating similar business processes within different organizations is an important task in the construction of a virtual enterprise. However, the same business process may be represented in different ways by different modelers even when they use the same modeling language. This is because different organizations use different terminologies, and the problem of semantic heterogeneity makes comparing business processes a tedious job. Therefore, semantic similarity computing techniques are employed to resolve ambiguity issues caused by the use of synonyms or homonyms. In particular, the idea of similarity propagation is introduced to pick out a mapping between corresponding activities and data, and the Hungarian algorithm is extended to reduce its time complexity. Then the similarity of whole models is measured based on the Jaccard Coefficient. Finally, an experiment is given to evaluate the method.

E. Magli - One of the best experts on this subject based on the ideXlab platform.

  • Analysis of SparseHash: an efficient embedding of set-similarity via sparse projections.
    arXiv: Data Structures and Algorithms, 2019
    Co-Authors: D. Valsesia, S. M. Fosson, C. Ravazzi, T. Bianchi, E. Magli
    Abstract:

    Embeddings provide compact representations of signals in order to perform efficient inference in a wide variety of tasks. In particular, random projections are common tools to construct Euclidean distance-preserving embeddings, while hashing techniques are extensively used to embed set-similarity metrics, such as the Jaccard Coefficient. In this letter, we theoretically prove that a class of random projections based on sparse matrices, called SparseHash, can preserve the Jaccard Coefficient between the supports of sparse signals, which can be used to estimate set similarities. Moreover, besides the analysis, we provide an efficient implementation and we test the performance in several numerical experiments, both on synthetic and real datasets.

  • SparseHash: Embedding Jaccard Coefficient between supports of signals
    2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2016
    Co-Authors: D. Valsesia, S. M. Fosson, C. Ravazzi, T. Bianchi, E. Magli
    Abstract:

    Embeddings provide compact representations of signals that can be used to perform inference in a wide variety of tasks. Random projections have been extensively used to preserve Euclidean distances or inner products of high-dimensional signals in low-dimensional representations. Different techniques based on hashing have been used in the past to embed set-similarity metrics such as the Jaccard Coefficient. In this paper we show that a class of random projections based on sparse matrices can be used to preserve the Jaccard Coefficient between the supports of sparse signals. Our proposed construction can therefore be used in a variety of tasks in machine learning and multimedia signal processing where the overlap between signal supports is a relevant similarity metric. We also present an application in the retrieval of similar text documents, where SparseHash improves over MinHash.
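
    For context on the baseline named in this abstract, the following is a minimal sketch of MinHash, a hashing technique that estimates the Jaccard coefficient from short signatures. It is not SparseHash. The function names, the seeded BLAKE2b hashing scheme, the signature length of 256, and the toy token sets are all assumptions made for illustration; input sets are assumed to be non-empty.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """One minimum hash value per seeded hash function (items must be non-empty)."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Two toy "documents" as token sets that share 100 of 200 distinct tokens.
doc_a = {f"w{i}" for i in range(0, 150)}
doc_b = {f"w{i}" for i in range(50, 200)}

true_j = len(doc_a & doc_b) / len(doc_a | doc_b)  # exactly 0.5
est_j = estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
print(true_j, est_j)  # the estimate should be close to the exact value
```

    Each signature position matches with probability equal to the true Jaccard coefficient, so a longer signature reduces the variance of the estimate at the cost of more hashing.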

Julio Santisteban - One of the best experts on this subject based on the ideXlab platform.

  • MICAI (Special Sessions) - Unilateral Weighted Jaccard Coefficient for NLP
    2015 Fourteenth Mexican International Conference on Artificial Intelligence (MICAI), 2015
    Co-Authors: Julio Santisteban, Javier Tejada-cárcamo
    Abstract:

    Similarity measures are essential for solving many pattern recognition problems, such as classification, clustering, and retrieval. Similarity measures are categorized according to both syntactic and semantic relationships. In this paper we present a novel similarity measure, the Unilateral Weighted Jaccard Coefficient (uwJaccard), which takes into consideration not only the space between two points but also the semantics between them in a distributional semantic model. The Unilateral Weighted Jaccard Coefficient provides a measure of uncertainty that can quantify the uncertainty between sentences such as "man bites dog" and "dog bites man".

Khairul Hafidh - One of the best experts on this subject based on the ideXlab platform.

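  • Memprediksi Masa Studi Mahasiswa Menggunakan Metode Jaccard Coefficient (Studi Kasus: Mahasiswa Program Studi Teknik Informatika Jurusan Teknik Elektro Fakultas Teknik Universitas Tanjungpura) [Predicting Students' Study Duration Using the Jaccard Coefficient Method (Case Study: Students of the Informatics Engineering Study Program, Department of Electrical Engineering, Faculty of Engineering, Universitas Tanjungpura)]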
    Jurnal Sistem dan Teknologi Informasi (JustIN), 2015
    Co-Authors: Khairul Hafidh
    Abstract:

    The rapid and competitive growth of education pushes every college to keep improving its quality, especially in light of the college accreditation assessment by BAN-PT. In Informatics, one of the undergraduate programs (S1) at the University of Tanjungpura, Pontianak, only 3 of the 35 students in the regular class of the 2010 batch graduated within 8 semesters. This shows that many Informatics students still take longer than the scheduled 8 semesters to complete their studies. One way to achieve the highest quality in a higher education system is to collect data on the main attributes of the learning experience that affect student achievement. Students likely to need more than 8 semesters to graduate are predicted with case-based reasoning, which finds the degree of similarity between the cases in the case base and new cases (the cases to be predicted). The main source of knowledge in a case-based reasoning system is the set of cases already stored in the case base. The cases supporting this research were obtained from interviews with graduates of the 2003 to 2010 batches. Case-based reasoning proceeds through four stages: Retrieve (searching for similar cases), Reuse (using the same solution as in the case base), Revise (revising the proposed solution), and Retain (storing the case). Case similarity was calculated using the Jaccard Coefficient. If the similarity value = 1, the new case is not stored in the case base, but if the similarity value < 1, the new case can be saved in the case base. Testing the system with the Jaccard Coefficient on 31 new cases yielded an accuracy of 80.65%. Keywords: Prediction, Period of Study, Case-Based Reasoning, Jaccard Coefficient
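
    The Retrieve, Reuse, and Retain stages described in this abstract, together with its storage rule (store the new case only when the similarity is below 1), can be sketched roughly as follows. This is an illustrative sketch, not the authors' system: the attribute names, labels, and function names are hypothetical, and the Revise stage is omitted because it would normally need an expert to confirm the outcome.

```python
def jaccard(a, b):
    """Jaccard coefficient of two attribute sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def retrieve(case_base, new_attrs):
    """Retrieve: return (similarity, case) for the most similar stored case."""
    best = max(case_base, key=lambda case: jaccard(case[0], new_attrs))
    return jaccard(best[0], new_attrs), best


def predict_and_retain(case_base, new_attrs):
    similarity, (_, label) = retrieve(case_base, new_attrs)
    prediction = label  # Reuse: adopt the solution of the closest case
    if similarity < 1.0:  # Retain: store only cases not already in the base
        case_base.append((new_attrs, prediction))
    return prediction, similarity


# Hypothetical case base: (set of attributes, known outcome).
case_base = [
    (frozenset({"gpa_low", "repeated_courses", "works_part_time"}), "more than 8 semesters"),
    (frozenset({"gpa_high", "no_repeats"}), "within 8 semesters"),
]

new_case = frozenset({"gpa_low", "repeated_courses"})
first = predict_and_retain(case_base, new_case)   # similarity 2/3, case is stored
second = predict_and_retain(case_base, new_case)  # similarity 1.0, case is not stored again
print(first, second, len(case_base))
```

    Storing the predicted label without revision keeps the sketch short; in a real system the retained outcome would presumably be the confirmed one.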
