Hashtag

The experts below are selected from a list of 13,737 experts worldwide, ranked by the ideXlab platform.

Aixin Sun - One of the best experts on this subject based on the ideXlab platform.

  • HSpam14: a collection of 14 million tweets for Hashtag-oriented spam research
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015
    Co-Authors: Surendra Sedhai, Aixin Sun
    Abstract:

    Hashtags facilitate information diffusion in Twitter by creating dynamic, virtual communities that aggregate information from all Twitter users. Because Hashtags give a tweet an additional channel through which it can reach users other than the author's own followers, they are targeted for spamming (e.g., Hashtag hijacking), particularly popular and trending Hashtags. Although much effort has been devoted to fighting email and web spam, few studies have addressed Hashtag-oriented spam in tweets. In this paper, we collected 14 million tweets that matched a set of trending Hashtags over a two-month period and then systematically annotated them as spam or ham (i.e., non-spam). We name the annotated dataset HSpam14. Our annotation process includes four major steps: (i) heuristic-based selection of tweets that are more likely to be spam, (ii) near-duplicate cluster-based annotation, which first groups similar tweets into clusters and then labels the clusters, (iii) reliable ham tweet detection, which labels tweets that are non-spam, and (iv) Expectation-Maximization (EM)-based label prediction for the remaining unlabeled tweets. One major contribution of this work is the HSpam14 dataset itself, which can be used for Hashtag-oriented spam research in tweets. Another contribution is the set of observations from a preliminary analysis of the dataset.

  • Tagging your tweets: a probabilistic modeling of Hashtag annotation in Twitter
    Conference on Information and Knowledge Management, 2014
    Co-Authors: Aixin Sun, Quan Yuan, Gao Cong
    Abstract:

    The adoption of Hashtags in major social networks, including Twitter, Facebook, and Google+, is strong evidence of their importance in facilitating information diffusion and social conversation. To understand the factors (e.g., user interest, posting time, and tweet content) that may affect Hashtag annotation in Twitter, and to capture the implicit relations between latent topics in tweets and their corresponding Hashtags, we propose two PLSA-style topic models of Hashtag annotation behavior in Twitter. The Content-Pivoted Model (CPM) assumes that tweet content guides the generation of Hashtags, while the Hashtag-Pivoted Model (HPM) assumes that Hashtags guide the generation of tweet content. Both models jointly incorporate user, time, Hashtag, and tweet content in a probabilistic framework. The PLSA-style models also let us test the impact of social factors on Hashtag annotation by introducing social network regularization into both models. We evaluate the proposed models using perplexity and demonstrate their effectiveness in two applications: retrospective Hashtag annotation and related Hashtag discovery. Our results show that HPM outperforms CPM in terms of perplexity and that both user and time are important factors affecting model performance. In addition, incorporating social network regularization does not improve model performance. Our experimental results also demonstrate the effectiveness of our models in both applications compared with baseline methods.

  • SIGIR - Hashtag recommendation for hyperlinked tweets
    Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, 2014
    Co-Authors: Surendra Sedhai, Aixin Sun
    Abstract:

    The presence of a hyperlink in a tweet is a strong indication that the tweet is more informative. In this paper, we study the problem of Hashtag recommendation for hyperlinked tweets (i.e., tweets containing links to Web pages). We argue that by recommending Hashtags for hyperlinked tweets, the functions of Hashtags, such as providing context for interpreting tweets, tweet categorization, and tweet promotion, can be extended to the linked documents. The proposed solution consists of two phases. In the first phase, we select candidate Hashtags through five schemes that consider similar tweets, similar documents, the named entities contained in the linked document, and the domain of the link. In the second phase, we formulate Hashtag recommendation as a learning-to-rank problem and adopt RankSVM to aggregate and rank the candidate Hashtags. Our experiments on a collection of 24 million tweets show that the proposed solution achieves promising results.

  • Will this #Hashtag be popular tomorrow?
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012
    Co-Authors: Aixin Sun, Gao Cong
    Abstract:

    Hashtags are widely used in Twitter to define a shared context for events or topics. In this paper, we aim to predict Hashtag popularity in the near future (i.e., the next day). Given a Hashtag that has the potential to be popular the next day, we construct a Hashtag profile from the tweets containing the Hashtag and extract both content and context features for popularity prediction. We model this prediction problem as a classification problem and evaluate the effectiveness of the extracted features and classification models.
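
Step (ii) of the HSpam14 annotation process groups near-duplicate tweets so that each cluster can be labeled once. The abstract does not say how similarity was measured, so the following Python sketch assumes Jaccard similarity over lowercased word and Hashtag tokens (with URLs stripped) and a greedy single-pass clustering with a hypothetical threshold of 0.7. The function names are illustrative and are not taken from the paper.

```python
import re


def tokens(tweet: str) -> frozenset[str]:
    """Lowercased word/#hashtag tokens. URLs are dropped (an assumption) so that
    copies of a spam tweet pointing at different links still match."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    return frozenset(re.findall(r"#?\w+", text))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(tweets: list[str], threshold: float = 0.7) -> list[list[int]]:
    """Greedy single pass (hypothetical): each tweet joins the first cluster whose
    representative (its first member) is at least `threshold` similar; otherwise
    it starts a new cluster. Returns clusters as lists of tweet indices."""
    reps: list[frozenset[str]] = []
    clusters: list[list[int]] = []
    for i, tweet in enumerate(tweets):
        toks = tokens(tweet)
        for rep, members in zip(reps, clusters):
            if jaccard(toks, rep) >= threshold:
                members.append(i)
                break
        else:  # no sufficiently similar cluster was found
            reps.append(toks)
            clusters.append([i])
    return clusters


sample = [
    "Win a free iPhone now #WorldCup http://spam.example/a",
    "win a FREE iphone now #WorldCup http://spam.example/b",
    "What a goal by the keeper tonight #WorldCup",
]
print(cluster_near_duplicates(sample))  # [[0, 1], [2]]
```

A greedy pass like this depends on input order and compares each tweet with every cluster representative; at the scale of millions of tweets, an index such as MinHash with locality-sensitive hashing would usually replace the inner loop.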

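The hyperlinked-tweet paper ranks candidate Hashtags with RankSVM, which turns ranking into classification over pairs of candidates: for each pair where one candidate is more relevant than the other, the model should score the more relevant one higher by a margin. The Python sketch below shows only that pairwise reduction, using a linear model trained by subgradient descent on the regularized pairwise hinge loss instead of a real SVM solver. The two features per candidate (a similar-tweet score and a similar-document score), the learning rate, and the epoch count are hypothetical.

```python
Candidate = tuple[list[float], int]  # (feature vector, relevance label)


def score(w: list[float], x: list[float]) -> float:
    """Linear scoring function w·x."""
    return sum(wk * xk for wk, xk in zip(w, x))


def train_pairwise_ranker(
    queries: list[list[Candidate]],
    epochs: int = 100,
    lr: float = 0.1,
    reg: float = 0.01,
) -> list[float]:
    """For every pair (i, j) of candidates of the same tweet with rel_i > rel_j,
    push w·(x_i - x_j) above 1 by minimizing reg/2 * ||w||^2 + max(0, 1 - w·(x_i - x_j))."""
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for candidates in queries:
            for xi, ri in candidates:
                for xj, rj in candidates:
                    if ri <= rj:
                        continue  # only pairs where i should outrank j
                    diff = [a - b for a, b in zip(xi, xj)]
                    violated = score(w, diff) < 1.0
                    # Subgradient step: L2 shrinkage plus hinge term if the margin is violated.
                    w = [
                        wk - lr * (reg * wk - (dk if violated else 0.0))
                        for wk, dk in zip(w, diff)
                    ]
    return w


def rank(w: list[float], named: list[tuple[str, list[float]]]) -> list[tuple[str, list[float]]]:
    """Sort (hashtag, features) pairs by descending score."""
    return sorted(named, key=lambda item: score(w, item[1]), reverse=True)


# Toy training data: two tweets, each with one relevant (1) and some irrelevant (0) candidates.
training = [
    [([0.9, 0.8], 1), ([0.2, 0.1], 0), ([0.4, 0.3], 0)],
    [([0.7, 0.9], 1), ([0.3, 0.2], 0)],
]
weights = train_pairwise_ranker(training)
print(rank(weights, [("#offtopic", [0.1, 0.2]), ("#worldcup", [0.8, 0.7])])[0][0])  # #worldcup
```

Only the relative order of scores matters, which is why the loss is defined on score differences rather than on absolute labels.
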
Paola Velardi - One of the best experts on this subject based on the ideXlab platform.

  • Hashtag sense clustering based on temporal similarity
    Computational Linguistics, 2017
    Co-Authors: Giovanni Stilo, Paola Velardi
    Abstract:

    Hashtags are creative labels used in micro-blogs to characterize the topic of a message or discussion. Regardless of the use for which they were originally intended, Hashtags cannot be used as a means to cluster messages with similar content. First, because Hashtags are created spontaneously and dynamically by users in multiple languages, the same topic can be associated with different Hashtags, and conversely, the same Hashtag may refer to different topics in different time periods. Second, unlike common words, Hashtags are hard to disambiguate because no sense catalogs (e.g., Wikipedia or WordNet) are available for them; furthermore, Hashtag labels are difficult to analyze, as they often consist of acronyms, concatenated words, and so forth. A common way to determine the meaning of Hashtags has been to analyze their context, but, as just pointed out, Hashtags can have multiple and variable meanings. In this article, we propose a temporal sense clustering algorithm based on the idea that semantically related Hashtags have similar and synchronous usage patterns.

  • EKAW - Temporal Semantics: Time-Varying Hashtag Sense Clustering
    Lecture Notes in Computer Science, 2014
    Co-Authors: Giovanni Stilo, Paola Velardi
    Abstract:

    Hashtags are creative labels used in micro-blogs to characterize the topic of a message or discussion. However, since Hashtags are created spontaneously and dynamically by users writing in multiple languages, the same topic can be associated with different Hashtags, and conversely, the same Hashtag may imply different topics in different time spans. Unlike common words, sense clustering for Hashtags is complicated by the fact that no sense catalogues (e.g., Wikipedia or WordNet) are available, and, furthermore, Hashtag labels are often obscure. In this paper we propose a sense clustering algorithm based on temporal mining. First, Hashtag time series are converted into strings of symbols using Symbolic Aggregate ApproXimation (SAX); then, Hashtags are clustered based on string similarity and temporal co-occurrence. Evaluation is performed on two reference datasets of semantically tagged Hashtags. We also evaluate the complexity of our algorithm, since efficiency is a crucial performance factor when processing large-scale data streams such as Twitter.

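The EKAW paper converts each Hashtag's time series into a string with Symbolic Aggregate ApproXimation (SAX) before clustering. Below is a minimal Python sketch of textbook SAX (z-normalization, Piecewise Aggregate Approximation, then equiprobable Gaussian breakpoints), assuming the series length is a multiple of the number of segments. The authors' parameter settings and string-similarity measure are not given in the abstract; `match_ratio` is a hypothetical stand-in for illustration only, and the hourly counts are made up.

```python
import bisect
import statistics


def sax(series: list[float], segments: int, alphabet_size: int = 4) -> str:
    """Textbook SAX word for `series` with `segments` letters from an alphabet of `alphabet_size`."""
    if len(series) % segments != 0:
        raise ValueError("this sketch requires len(series) to be a multiple of segments")
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # 1. z-normalize (a flat series maps to all zeros)
    z = [(x - mean) / std if std > 0 else 0.0 for x in series]
    # 2. Piecewise Aggregate Approximation: the mean of each equal-width frame
    width = len(series) // segments
    paa = [statistics.fmean(z[i:i + width]) for i in range(0, len(z), width)]
    # 3. breakpoints that cut N(0, 1) into `alphabet_size` equiprobable regions
    normal = statistics.NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    # 4. map each frame mean to a letter, 'a' being the lowest region
    return "".join(chr(ord("a") + bisect.bisect_right(breakpoints, v)) for v in paa)


def match_ratio(s: str, t: str) -> float:
    """Hypothetical similarity: share of aligned positions holding the same symbol."""
    if len(s) != len(t):
        raise ValueError("SAX words must have equal length")
    return sum(a == b for a, b in zip(s, t)) / len(s)


# Made-up hourly counts for three Hashtags over eight hours.
burst_late = sax([0, 0, 0, 0, 8, 8, 0, 0], segments=4)
burst_late_too = sax([1, 1, 1, 1, 9, 9, 1, 1], segments=4)
burst_early = sax([8, 8, 0, 0, 0, 0, 0, 0], segments=4)
print(burst_late, burst_late_too, burst_early)  # bbdb bbdb dbbb
print(match_ratio(burst_late, burst_early))  # 0.5
```

The two Hashtags that burst in the same time window receive the same SAX word even though their raw volumes differ, which illustrates why z-normalized symbolic strings suit the idea of grouping Hashtags with synchronous usage patterns.
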
David Chiu - One of the best experts on this subject based on the ideXlab platform.

  • A Hashtag recommendation system for Twitter data streams
    Computational Social Networks, 2016
    Co-Authors: Eriko Otsuka, Scott A. Wallace, David Chiu
    Abstract:

    Background: Twitter has evolved into a powerful communication and information-sharing tool used by millions of people around the world to post what is happening now. A Hashtag, a keyword prefixed with a hash symbol (#), is a Twitter feature for organizing tweets and facilitating effective search over a massive volume of data. In this paper, we propose an automatic Hashtag recommendation system that helps users find new Hashtags related to their interests on demand. Methods: For Hashtag ranking, we propose the Hashtag Frequency-Inverse Hashtag Ubiquity (HF-IHU) ranking scheme, a variation of the well-known TF-IDF that considers Hashtag relevance as well as data sparseness, one of the key challenges in analyzing microblog data. Our system is built on top of Hadoop, a leading platform for distributed computing, to provide scalable performance using MapReduce. Experiments on a large Twitter dataset demonstrate that our method yields Hashtags relevant to users' interests and that its recommendations are more stable and reliable than ranking tags by tweet content similarity. Results and conclusions: Our results show that HF-IHU can achieve over 30% Hashtag recall when asked to identify the top 10 relevant Hashtags for a particular tweet. Furthermore, our method outperforms kNN, k-popularity, and Naïve Bayes by 69%, 54%, and 17%, respectively, on recall of the top 200 Hashtags.

  • IDEAS - Design and evaluation of a Twitter Hashtag recommendation system
    Proceedings of the 18th International Database Engineering & Applications Symposium (IDEAS '14), 2014
    Co-Authors: Eriko Otsuka, Scott A. Wallace, David Chiu
    Abstract:

    Twitter has evolved into a powerful communication and information-sharing tool used by millions of people around the world to post what is happening now. A Hashtag, a keyword prefixed with a hash symbol (#), is a Twitter feature for organizing tweets and facilitating effective search over a massive volume of data. In this paper, we propose an automatic Hashtag recommendation system that helps users find new Hashtags related to their interests. We propose the Hashtag Frequency-Inverse Hashtag Ubiquity (HF-IHU) ranking scheme, a variation of the well-known TF-IDF that considers Hashtag relevance as well as data sparseness. Experiments on a large Twitter dataset demonstrate that our method yields Hashtags relevant to users' interests and that its recommendations are more stable and reliable than ranking tags by tweet content similarity. Our results show that HF-IHU can achieve over 30% Hashtag recall when asked to identify the top 10 relevant Hashtags for a particular tweet.

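The HF-IHU abstracts report Hashtag recall for the top 10 and top 200 recommendations. The exact evaluation protocol is not described there; the Python sketch below assumes the common definition of recall@k (the fraction of a tweet's ground-truth Hashtags that appear among its top k recommendations), averaged over test tweets. The sample rankings are made up.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant Hashtags found in the first k recommendations."""
    if not relevant:
        raise ValueError("recall is undefined for a tweet with no relevant Hashtags")
    hits = set(ranked[:k]) & relevant  # a set, so repeated recommendations count once
    return len(hits) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (ranked recommendations, ground-truth Hashtags) pairs."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores)


runs = [
    (["#nba", "#lakers", "#sports", "#music"], {"#lakers", "#kobe"}),  # 1 of 2 found
    (["#kobe", "#nba"], {"#kobe"}),                                    # 1 of 1 found
]
print(mean_recall_at_k(runs, k=3))  # 0.75
```
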
Gao Cong - One of the best experts on this subject based on the ideXlab platform.

  • Tagging your tweets: a probabilistic modeling of Hashtag annotation in Twitter
    Conference on Information and Knowledge Management, 2014
    Co-Authors: Aixin Sun, Quan Yuan, Gao Cong
    Abstract:

    The adoption of Hashtags in major social networks, including Twitter, Facebook, and Google+, is strong evidence of their importance in facilitating information diffusion and social conversation. To understand the factors (e.g., user interest, posting time, and tweet content) that may affect Hashtag annotation in Twitter, and to capture the implicit relations between latent topics in tweets and their corresponding Hashtags, we propose two PLSA-style topic models of Hashtag annotation behavior in Twitter. The Content-Pivoted Model (CPM) assumes that tweet content guides the generation of Hashtags, while the Hashtag-Pivoted Model (HPM) assumes that Hashtags guide the generation of tweet content. Both models jointly incorporate user, time, Hashtag, and tweet content in a probabilistic framework. The PLSA-style models also let us test the impact of social factors on Hashtag annotation by introducing social network regularization into both models. We evaluate the proposed models using perplexity and demonstrate their effectiveness in two applications: retrospective Hashtag annotation and related Hashtag discovery. Our results show that HPM outperforms CPM in terms of perplexity and that both user and time are important factors affecting model performance. In addition, incorporating social network regularization does not improve model performance. Our experimental results also demonstrate the effectiveness of our models in both applications compared with baseline methods.

  • SIGIR - Will this #Hashtag be popular tomorrow?
    Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '12), 2012
    Co-Authors: Aixin Sun, Gao Cong
    Abstract:

    Hashtags are widely used in Twitter to define a shared context for events or topics. In this paper, we aim to predict Hashtag popularity in the near future (i.e., the next day). Given a Hashtag that has the potential to be popular the next day, we construct a Hashtag profile from the tweets containing the Hashtag and extract both content and context features for popularity prediction. We model this prediction problem as a classification problem and evaluate the effectiveness of the extracted features and classification models.

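The Hashtag annotation paper compares CPM and HPM by perplexity on held-out data, where lower is better. A minimal Python sketch of the standard definition follows: perplexity = exp(-(sum of log p over all held-out tokens) / (number of held-out tokens)). It assumes each model has already assigned a probability to every held-out token; how CPM or HPM compute those probabilities is not shown here, and the probabilities below are made up.

```python
import math


def perplexity(token_probs_per_doc: list[list[float]]) -> float:
    """exp(-sum_d sum_n log p(w_dn) / sum_d N_d) over held-out documents."""
    total_log = 0.0
    total_tokens = 0
    for probs in token_probs_per_doc:
        for p in probs:
            if p <= 0.0:
                raise ValueError("every token probability must be positive")
            total_log += math.log(p)
        total_tokens += len(probs)
    if total_tokens == 0:
        raise ValueError("no held-out tokens")
    return math.exp(-total_log / total_tokens)


# A model that is, on average, as uncertain as a uniform choice among 4 words
print(round(perplexity([[0.5, 0.125]]), 6))  # 4.0
# A sharper model on the same number of tokens has lower perplexity
print(round(perplexity([[0.5, 0.5]]), 6))  # 2.0
```

Perplexity can be read as the effective number of equally likely choices the model faces per token, which is why a lower value indicates a better fit to held-out tweets.
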
Surendra Sedhai - One of the best experts on this subject based on the ideXlab platform.

  • HSpam14: a collection of 14 million tweets for Hashtag-oriented spam research
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015
    Co-Authors: Surendra Sedhai, Aixin Sun
    Abstract:

    Hashtags facilitate information diffusion in Twitter by creating dynamic, virtual communities that aggregate information from all Twitter users. Because Hashtags give a tweet an additional channel through which it can reach users other than the author's own followers, they are targeted for spamming (e.g., Hashtag hijacking), particularly popular and trending Hashtags. Although much effort has been devoted to fighting email and web spam, few studies have addressed Hashtag-oriented spam in tweets. In this paper, we collected 14 million tweets that matched a set of trending Hashtags over a two-month period and then systematically annotated them as spam or ham (i.e., non-spam). We name the annotated dataset HSpam14. Our annotation process includes four major steps: (i) heuristic-based selection of tweets that are more likely to be spam, (ii) near-duplicate cluster-based annotation, which first groups similar tweets into clusters and then labels the clusters, (iii) reliable ham tweet detection, which labels tweets that are non-spam, and (iv) Expectation-Maximization (EM)-based label prediction for the remaining unlabeled tweets. One major contribution of this work is the HSpam14 dataset itself, which can be used for Hashtag-oriented spam research in tweets. Another contribution is the set of observations from a preliminary analysis of the dataset.

  • SIGIR - Hashtag recommendation for hyperlinked tweets
    Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, 2014
    Co-Authors: Surendra Sedhai, Aixin Sun
    Abstract:

    The presence of a hyperlink in a tweet is a strong indication that the tweet is more informative. In this paper, we study the problem of Hashtag recommendation for hyperlinked tweets (i.e., tweets containing links to Web pages). We argue that by recommending Hashtags for hyperlinked tweets, the functions of Hashtags, such as providing context for interpreting tweets, tweet categorization, and tweet promotion, can be extended to the linked documents. The proposed solution consists of two phases. In the first phase, we select candidate Hashtags through five schemes that consider similar tweets, similar documents, the named entities contained in the linked document, and the domain of the link. In the second phase, we formulate Hashtag recommendation as a learning-to-rank problem and adopt RankSVM to aggregate and rank the candidate Hashtags. Our experiments on a collection of 24 million tweets show that the proposed solution achieves promising results.