Vector Space


Raymond J Mooney - One of the best experts on this subject based on the ideXlab platform.

  • Multi-Prototype Vector-Space Models of Word Meaning
    North American Chapter of the Association for Computational Linguistics, 2010
    Co-Authors: Joseph Reisinger, Raymond J Mooney
    Abstract:

    Current vector-space models of lexical semantics create a single "prototype" vector to represent the meaning of a word. However, due to lexical ambiguity, encoding word meaning with a single vector is problematic. This paper presents a method that uses clustering to produce multiple "sense-specific" vectors for each word. This approach provides a context-dependent vector representation of word meaning that naturally accommodates homonymy and polysemy. Experimental comparisons to human judgements of semantic similarity, both for isolated words and for words in sentential contexts, demonstrate the superiority of this approach over both prototype- and exemplar-based vector-space models.
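
    The clustering idea lends itself to a compact illustration. The following is a minimal Python sketch, not the authors' implementation: a word's context vectors are clustered with k-means, each cluster centroid serves as one sense-specific prototype, and word similarity is scored over the best-matching sense pair. The sense count and the k-means choice are assumptions for illustration.

    ```python
    # Minimal sketch of a multi-prototype word representation (illustrative;
    # the clustering algorithm and sense count are assumptions, not the
    # authors' exact configuration).
    import numpy as np
    from sklearn.cluster import KMeans

    def sense_vectors(context_vectors: np.ndarray, n_senses: int = 3) -> np.ndarray:
        """Cluster one word's context vectors; each centroid is a sense prototype."""
        km = KMeans(n_clusters=n_senses, n_init=10, random_state=0)
        km.fit(context_vectors)
        return km.cluster_centers_

    def max_sense_similarity(protos_a: np.ndarray, protos_b: np.ndarray) -> float:
        """Word similarity = best cosine match over the two words' sense pairs."""
        a = protos_a / np.linalg.norm(protos_a, axis=1, keepdims=True)
        b = protos_b / np.linalg.norm(protos_b, axis=1, keepdims=True)
        return float((a @ b.T).max())
    ```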

Chris Dyer - One of the best experts on this subject based on the ideXlab platform.

  • Improving Vector Space Word Representations Using Multilingual Correlation
    Conference of the European Chapter of the Association for Computational Linguistics, 2014
    Co-Authors: Manaal Faruqui, Chris Dyer
    Abstract:

    The distributional hypothesis of Harris (1954), according to which the meaning of words is evidenced by the contexts they occur in, has motivated several effective techniques for obtaining vector space semantic representations of words using unannotated text corpora. This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually. We evaluate the resulting word representations on standard lexical semantic evaluation tasks and show that our method produces substantially better semantic representations than monolingual techniques.
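
    A hedged sketch of the CCA step follows, assuming word vectors are available for a bilingual dictionary of translation pairs; the projection dimensionality and the use of scikit-learn's CCA are illustrative choices, not the paper's exact setup.

    ```python
    # Sketch of CCA-based multilingual refinement: given vectors for
    # translation pairs in two languages, CCA finds projections that
    # maximize cross-lingual correlation; the projected vectors serve
    # as the refined word representations. (k=40 is an assumption.)
    import numpy as np
    from sklearn.cross_decomposition import CCA

    def cca_refine(en_vecs: np.ndarray, fr_vecs: np.ndarray, k: int = 40):
        """en_vecs[i] and fr_vecs[i] belong to one translation pair."""
        cca = CCA(n_components=k)
        cca.fit(en_vecs, fr_vecs)
        en_proj, fr_proj = cca.transform(en_vecs, fr_vecs)
        return en_proj, fr_proj
    ```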

Joseph Reisinger - One of the best experts on this subject based on the ideXlab platform.

  • Multi-Prototype Vector-Space Models of Word Meaning
    North American Chapter of the Association for Computational Linguistics, 2010
    Co-Authors: Joseph Reisinger, Raymond J Mooney
    Abstract:

    Current vector-space models of lexical semantics create a single "prototype" vector to represent the meaning of a word. However, due to lexical ambiguity, encoding word meaning with a single vector is problematic. This paper presents a method that uses clustering to produce multiple "sense-specific" vectors for each word. This approach provides a context-dependent vector representation of word meaning that naturally accommodates homonymy and polysemy. Experimental comparisons to human judgements of semantic similarity, both for isolated words and for words in sentential contexts, demonstrate the superiority of this approach over both prototype- and exemplar-based vector-space models.

Guoli Peng - One of the best experts on this subject based on the ideXlab platform.

  • An Improved Focused Crawler Based on Semantic Similarity Vector Space Model
    Applied Soft Computing, 2015
    Co-Authors: Yajun Du, Xianjing Lv, Guoli Peng
    Abstract:

    Highlights: an improved retrieval model, the Semantic Similarity Vector Space Model (SSVSM); accurate prediction of the priorities of unvisited URLs for a given topic; guidance of focused crawlers toward downloading a large quantity of high-quality web pages.

    A focused crawler is topic-specific and aims to selectively collect web pages relevant to a given topic from the Internet. In many studies, the Vector Space Model (VSM) and the Semantic Similarity Retrieval Model (SSRM) use cosine similarity and semantic similarity, respectively, to compute similarities between web pages and the given topic. However, if there are no common terms between a web page and the given topic, the VSM will not obtain a proper topical similarity for the page. Conversely, if all of the terms between them are synonyms, the SSRM will also fail to obtain a proper topical similarity. To address these problems, this paper proposes an improved retrieval model, the Semantic Similarity Vector Space Model (SSVSM), which integrates the TF*IDF values of terms with the semantic similarities among terms to construct topic and document semantic vectors mapped to the same double-term set, and computes the cosine similarities between these semantic vectors as the topic-relevant similarities of documents, including the full texts and anchor texts of unvisited hyperlinks. The proposed model then predicts the priorities of the unvisited hyperlinks by integrating the full-text and anchor-text topic-relevant similarities. Experimental results demonstrate that this approach improves the performance of focused crawlers and outperforms focused crawlers based on Breadth-First, VSM, and SSRM.
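
    The scoring idea can be sketched in Python as below; the term-similarity function, the blending weight, and all names here are hypothetical illustrations rather than the paper's code. Topic and document are mapped onto their joint term set, TF*IDF weights are propagated through term-to-term semantic similarity, the resulting vectors are compared by cosine similarity, and full-text and anchor-text scores are blended into a URL priority.

    ```python
    # Illustrative sketch of the SSVSM scoring idea (all names and the
    # combination weight are assumptions, not the paper's code).
    import math

    def ssvsm_similarity(topic_tfidf: dict, doc_tfidf: dict, term_sim) -> float:
        """topic_tfidf/doc_tfidf map term -> TF*IDF; term_sim(a, b) in [0, 1]."""
        terms = sorted(set(topic_tfidf) | set(doc_tfidf))  # joint "double-term" set

        def semantic_vector(weights: dict) -> list:
            # A term missing on one side contributes through its most
            # semantically similar term on that side, scaled by TF*IDF.
            return [weights.get(t, 0.0) or max(
                        (w * term_sim(t, s) for s, w in weights.items()),
                        default=0.0)
                    for t in terms]

        a, b = semantic_vector(topic_tfidf), semantic_vector(doc_tfidf)
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def url_priority(full_text_sim: float, anchor_sim: float, alpha: float = 0.5) -> float:
        """Blend full-text and anchor-text similarities (alpha is assumed)."""
        return alpha * full_text_sim + (1 - alpha) * anchor_sim
    ```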

Haizhou Li - One of the best experts on this subject based on the ideXlab platform.

  • A Vector Space Modeling Approach to Spoken Language Identification
    IEEE Transactions on Audio Speech and Language Processing, 2007
    Co-Authors: Haizhou Li
    Abstract:

    We propose a novel approach to automatic spoken language identification (LID) based on vector space modeling (VSM). It is assumed that the overall sound characteristics of all spoken languages can be covered by a universal collection of acoustic units, which can be characterized by acoustic segment models (ASMs). A spoken utterance is then decoded into a sequence of ASM units. The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure that circumvents the need for phonetic transcription. Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector whose attributes represent the co-occurrence statistics of the acoustic units. As such, we can build a vector space classifier for LID. The proposed VSM approach leads to a discriminative classifier backend, which is demonstrated to give superior performance over a likelihood-based n-gram language modeling (LM) backend for long utterances. We evaluated the proposed VSM framework on the 1996 and 2003 NIST Language Recognition Evaluation (LRE) databases, achieving equal error rates (EERs) of 2.75% and 4.02% on the 1996 and 2003 LRE 30-s tasks, respectively, which represent some of the best results reported on these popular tasks.
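
    As a rough illustration of the backend, the sketch below treats ASM-decoded utterances as "documents" of unit labels and trains a discriminative classifier on their unit n-gram count vectors. The toy decodings, unit inventory, and the linear SVM are assumptions for illustration; the paper's actual feature and classifier choices may differ.

    ```python
    # Vector space LID backend sketch: utterances decoded into acoustic-unit
    # labels become n-gram count vectors, classified discriminatively.
    # (Toy data and the LinearSVC choice are illustrative assumptions.)
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Each "document" is one utterance's ASM unit-label sequence.
    utterances = ["a3 a7 a3 a9 a7", "a1 a1 a4 a2 a1", "a3 a9 a7 a3 a9"]
    languages = ["en", "zh", "en"]

    # Unigram + bigram co-occurrence statistics over units, then a linear classifier.
    clf = make_pipeline(
        CountVectorizer(analyzer="word", ngram_range=(1, 2)),
        LinearSVC(),
    )
    clf.fit(utterances, languages)
    print(clf.predict(["a3 a7 a9 a3"]))  # -> predicted language label
    ```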