Online Evaluation

The Experts below are selected from a list of 128,439 Experts worldwide, ranked by the ideXlab platform.

Pavel Serdyukov - One of the best experts on this subject based on the ideXlab platform.

  • SIGIR - Effective Online Evaluation for Web Search
    Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019
    Co-Authors: Alexey Drutsa, Pavel Serdyukov, Gleb Gusev, Eugene Kharitonov, Denis Kulemyakin, Igor Yashkov
    Abstract:

    We present a program that balances an overview of academic achievements in the field of Online Evaluation with unique industrial experience shared by leading researchers and engineers from global Internet companies. First, we cover the necessary background in mathematical statistics. This is followed by the foundations of the main Evaluation methods: A/B testing, interleaving, and observational studies. Then, we share rich industrial experience in constructing an experimentation pipeline and Evaluation metrics, emphasizing best practices and common pitfalls. A large part of our tutorial is devoted to modern, state-of-the-art techniques (including ones based on machine learning) that allow Online experimentation to be conducted efficiently. We invite software engineers, designers, analysts, and managers of web services and software products, as well as beginners, advanced specialists, and researchers, to learn how to make web service development effectively data-driven.
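
    As a minimal illustration of the statistical foundations this tutorial covers (a generic sketch, not material from the tutorial itself), the two-proportion z-test below is one common way to decide whether a binary metric such as click-through rate differs between the control and treatment arms of an A/B test. All numbers and names are hypothetical.

```python
import math

def two_proportion_ztest(clicks_a, users_a, clicks_b, users_b):
    """Two-sided z-test for a difference in click rates between
    A/B groups. Returns (z statistic, approximate p-value)."""
    p_a = clicks_a / users_a
    p_b = clicks_b / users_b
    # Pooled rate under the null hypothesis of no difference.
    p = (clicks_a + clicks_b) / (users_a + users_b)
    se = math.sqrt(p * (1 - p) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 10,000 users per arm, 520 vs. 570 clicks.
z, p = two_proportion_ztest(520, 10_000, 570, 10_000)
print(f"z = {z:.3f}, p = {p:.4f}")
```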

  • Online Evaluation for Effective Web Service Development
    arXiv: Human-Computer Interaction, 2018
    Co-Authors: Roman Budylin, Pavel Serdyukov, Alexey Drutsa, Gleb Gusev, Igor Yashkov
    Abstract:

    Today, the development of most leading web services and software products is guided by data-driven decisions based on Evaluation, ensuring a steady stream of updates in terms of both quality and quantity. Large Internet companies use Online Evaluation day to day and at large scale, and the number of smaller companies using A/B testing in their development cycle is also growing. Web development across the board strongly depends on the quality of experimentation platforms. In this tutorial, we give an overview of the state-of-the-art methods underlying everyday Evaluation pipelines at some of the leading Internet companies. Software engineers, designers, analysts, and service or product managers, whether beginners, advanced specialists, or researchers, can learn how to make web service development data-driven and how to do so effectively.
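
    Alongside A/B testing, these tutorials repeatedly mention interleaving as a core Online Evaluation method. The sketch below is a self-contained illustration of the well-known team-draft interleaving scheme, not code from the tutorial: two rankings are merged, each document is credited to the ranker that contributed it, and clicks on the merged list later decide which ranker wins.

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Team-draft interleaving: in each round a coin flip decides which
    ranker drafts first; each ranker then adds its highest-ranked document
    not already in the interleaved list. Returns the merged list and a map
    from document to the team ('a' or 'b') that contributed it."""
    interleaved, team = [], {}
    while True:
        rest_a = [d for d in ranking_a if d not in team]
        rest_b = [d for d in ranking_b if d not in team]
        if not rest_a and not rest_b:
            break
        order = [("a", rest_a), ("b", rest_b)]
        random.shuffle(order)  # coin flip: who drafts first this round
        for name, rest in order:
            # Recompute in case the other team just took our top document.
            rest = [d for d in rest if d not in team]
            if rest:
                team[rest[0]] = name
                interleaved.append(rest[0])
    return interleaved, team

def winner(team, clicked_docs):
    """Credit each click to the contributing team; more credits wins.
    Ties are broken arbitrarily in this sketch."""
    credits = {"a": 0, "b": 0}
    for doc in clicked_docs:
        if doc in team:
            credits[team[doc]] += 1
    return max(credits, key=credits.get)

merged, team = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d1", "d4"])
print(merged, "->", winner(team, ["d2", "d4"]))
```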

  • SIGIR - Challenges and Opportunities in Online Evaluation of Search Engines
    Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015
    Co-Authors: Pavel Serdyukov
    Abstract:

    Yandex is one of the largest Internet companies in Europe, operating Russia's most popular search engine and generating 58.6% of all search traffic in Russia (as of April 2015). Like all modern search engines, Yandex increasingly relies on Online Evaluation methods such as A/B tests and interleaving. These Online Evaluation methods test changes to the search engine by analyzing changes in the character of its interactions with users. There are several grand challenges in Online Evaluation, including the choice of an appropriate Online metric and the need to deal with the limited number of user interactions available to a search engine for experimentation. In my talk, I will give an overview of our latest research on improving the sensitivity of well-known Online metrics, on discovering more sensitive and robust Online metrics, and on scheduling and early stopping of Online experiments.
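
    The talk mentions early stopping of Online experiments. One classical mechanism for this, shown here as a generic sketch and not as Yandex's actual method, is Wald's sequential probability ratio test: it monitors a binary metric observation by observation and stops the experiment as soon as the evidence crosses a decision boundary.

```python
import math
import random

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for a Bernoulli rate,
    testing H0: p = p0 against H1: p = p1. Consumes observations one at
    a time and stops as soon as a decision boundary is crossed."""
    upper = math.log((1 - beta) / alpha)   # crossing here accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing here accepts H0
    llr = 0.0  # running log-likelihood ratio
    n = 0
    for x in observations:
        n += 1
        llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "no decision", n

# Simulated click stream whose true rate is 4% (so H1 is true here).
random.seed(0)
stream = (1 if random.random() < 0.04 else 0 for _ in range(50_000))
decision, n = sprt_bernoulli(stream, p0=0.03, p1=0.04)
print(decision, "after", n, "observations")
```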

Amr Huber - One of the best experts on this subject based on the ideXlab platform.

  • Offline and Online Evaluation of news recommender systems at swissinfo.ch
    Conference on Recommender Systems, 2014
    Co-Authors: Florent Garcin, Boi Faltings, Olivier Donatsch, Ayar Alazzawi, Christophe Bruttin, Amr Huber
    Abstract:

    We report on the live Evaluation of various news recommender systems conducted on the website swissinfo.ch. We demonstrate that there is a major difference between offline and Online accuracy Evaluations. In an offline setting, recommending the most popular stories is the best strategy, while in a live environment this strategy is the poorest. In the Online setting, context-tree recommender systems, which profile users in real time, improve the click-through rate by up to 35%; the visit length also increases by a factor of 2.5. Our experience holds important lessons for the Evaluation of recommender systems with offline data, as well as for the use of the click-through rate as a performance indicator.
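
    Since the paper's headline results are expressed as click-through-rate uplift, the short sketch below shows how such figures are computed. The click and impression counts here are invented for illustration and are not the paper's data.

```python
def ctr(clicks, impressions):
    """Click-through rate: fraction of shown recommendations clicked."""
    return clicks / impressions

def relative_uplift(ctr_baseline, ctr_variant):
    """Relative improvement of the variant over the baseline,
    e.g. 0.35 for the 'up to 35%' reported in the paper."""
    return (ctr_variant - ctr_baseline) / ctr_baseline

# Hypothetical numbers for illustration only.
base, variant = ctr(300, 100_000), ctr(405, 100_000)
print(f"baseline CTR={base:.4%}, variant CTR={variant:.4%}, "
      f"uplift={relative_uplift(base, variant):+.1%}")
```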

  • RecSys - Offline and Online Evaluation of news recommender systems at swissinfo.ch
    Proceedings of the 8th ACM Conference on Recommender systems - RecSys '14, 2014
    Co-Authors: Florent Garcin, Boi Faltings, Olivier Donatsch, Ayar Alazzawi, Christophe Bruttin, Amr Huber
    Abstract:

    We report on the live Evaluation of various news recommender systems conducted on the website swissinfo.ch. We demonstrate that there is a major difference between offline and Online accuracy Evaluations. In an offline setting, recommending the most popular stories is the best strategy, while in a live environment this strategy is the poorest. In the Online setting, context-tree recommender systems, which profile users in real time, improve the click-through rate by up to 35%; the visit length also increases by a factor of 2.5. Our experience holds important lessons for the Evaluation of recommender systems with offline data, as well as for the use of the click-through rate as a performance indicator.

Igor Yashkov - One of the best experts on this subject based on the ideXlab platform.

  • SIGIR - Effective Online Evaluation for Web Search
    Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019
    Co-Authors: Alexey Drutsa, Pavel Serdyukov, Gleb Gusev, Eugene Kharitonov, Denis Kulemyakin, Igor Yashkov
    Abstract:

    We present a program that balances an overview of academic achievements in the field of Online Evaluation with unique industrial experience shared by leading researchers and engineers from global Internet companies. First, we cover the necessary background in mathematical statistics. This is followed by the foundations of the main Evaluation methods: A/B testing, interleaving, and observational studies. Then, we share rich industrial experience in constructing an experimentation pipeline and Evaluation metrics, emphasizing best practices and common pitfalls. A large part of our tutorial is devoted to modern, state-of-the-art techniques (including ones based on machine learning) that allow Online experimentation to be conducted efficiently. We invite software engineers, designers, analysts, and managers of web services and software products, as well as beginners, advanced specialists, and researchers, to learn how to make web service development effectively data-driven.

  • Online Evaluation for Effective Web Service Development
    arXiv: Human-Computer Interaction, 2018
    Co-Authors: Roman Budylin, Pavel Serdyukov, Alexey Drutsa, Gleb Gusev, Igor Yashkov
    Abstract:

    Today, the development of most leading web services and software products is guided by data-driven decisions based on Evaluation, ensuring a steady stream of updates in terms of both quality and quantity. Large Internet companies use Online Evaluation day to day and at large scale, and the number of smaller companies using A/B testing in their development cycle is also growing. Web development across the board strongly depends on the quality of experimentation platforms. In this tutorial, we give an overview of the state-of-the-art methods underlying everyday Evaluation pipelines at some of the leading Internet companies. Software engineers, designers, analysts, and service or product managers, whether beginners, advanced specialists, or researchers, can learn how to make web service development data-driven and how to do so effectively.

Florent Garcin - One of the best experts on this subject based on the ideXlab platform.

  • Offline and Online Evaluation of news recommender systems at swissinfo.ch
    Conference on Recommender Systems, 2014
    Co-Authors: Florent Garcin, Boi Faltings, Olivier Donatsch, Ayar Alazzawi, Christophe Bruttin, Amr Huber
    Abstract:

    We report on the live Evaluation of various news recommender systems conducted on the website swissinfo.ch. We demonstrate that there is a major difference between offline and Online accuracy Evaluations. In an offline setting, recommending the most popular stories is the best strategy, while in a live environment this strategy is the poorest. In the Online setting, context-tree recommender systems, which profile users in real time, improve the click-through rate by up to 35%; the visit length also increases by a factor of 2.5. Our experience holds important lessons for the Evaluation of recommender systems with offline data, as well as for the use of the click-through rate as a performance indicator.

  • RecSys - Offline and Online Evaluation of news recommender systems at swissinfo.ch
    Proceedings of the 8th ACM Conference on Recommender systems - RecSys '14, 2014
    Co-Authors: Florent Garcin, Boi Faltings, Olivier Donatsch, Ayar Alazzawi, Christophe Bruttin, Amr Huber
    Abstract:

    We report on the live Evaluation of various news recommender systems conducted on the website swissinfo.ch. We demonstrate that there is a major difference between offline and Online accuracy Evaluations. In an offline setting, recommending the most popular stories is the best strategy, while in a live environment this strategy is the poorest. In the Online setting, context-tree recommender systems, which profile users in real time, improve the click-through rate by up to 35%; the visit length also increases by a factor of 2.5. Our experience holds important lessons for the Evaluation of recommender systems with offline data, as well as for the use of the click-through rate as a performance indicator.

Jöran Beel - One of the best experts on this subject based on the ideXlab platform.

  • Document Embeddings vs. Keyphrases vs. Terms: An Online Evaluation in Digital Library Recommender Systems.
    arXiv: Information Retrieval, 2019
    Co-Authors: Andrew Collins, Jöran Beel
    Abstract:

    Many recommendation algorithms are available to operators of digital library recommender systems, but their effectiveness in Online Evaluations is largely unreported. We compare a standard term-based recommendation approach to two promising approaches for related-article recommendation in digital libraries: document embeddings and keyphrases. We evaluate the consistency of their performance across multiple scenarios. Through our recommender-as-a-service Mr. DLib, we delivered 33.5M recommendations to users of Sowiport and Jabref over the course of 19 months, from March 2017 to October 2018. The effectiveness of the algorithms differs significantly between Sowiport and Jabref (Wilcoxon rank-sum test; p < 0.05). There is a ~400% difference in effectiveness between the best and worst algorithm in each scenario. The best-performing algorithm in Sowiport (terms) is the worst-performing in Jabref; the best-performing algorithm in Jabref (keyphrases) performs 70% worse in Sowiport than Sowiport's best algorithm (click-through rates: 0.1% for terms, 0.03% for keyphrases).
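
    The significance claim rests on a Wilcoxon rank-sum test. Below is a minimal sketch of that comparison using SciPy; the per-day CTR samples are fabricated for illustration and do not reproduce the study's data.

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-day CTRs (%) for two algorithms in one scenario;
# the real study compared effectiveness across Sowiport and Jabref.
ctr_terms      = [0.11, 0.09, 0.10, 0.12, 0.10, 0.11, 0.09]
ctr_keyphrases = [0.03, 0.04, 0.03, 0.02, 0.03, 0.04, 0.03]

# Mann-Whitney U is the two-sample Wilcoxon rank-sum test.
stat, p = mannwhitneyu(ctr_terms, ctr_keyphrases, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")  # p < 0.05 -> significant difference
```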

  • JCDL - Document Embeddings vs. Keyphrases vs. Terms for Recommender Systems: A Large-Scale Online Evaluation
    2019 ACM IEEE Joint Conference on Digital Libraries (JCDL), 2019
    Co-Authors: Andrew Collins, Jöran Beel
    Abstract:

    Many recommendation algorithms are available to operators of digital library recommender systems, but their effectiveness in Online Evaluations is largely unreported. We compare a standard term-based recommendation approach to two promising approaches for related-article recommendation in digital libraries: document embeddings and keyphrases. We evaluate the consistency of their performance across multiple scenarios. Through our recommender-as-a-service Mr. DLib, we delivered 33.5M recommendations to users of Sowiport and Jabref over the course of 19 months, from March 2017 to October 2018. The effectiveness of the algorithms differs significantly between Sowiport and Jabref (Wilcoxon rank-sum test; p < 0.05). There is a ~400% difference in effectiveness between the best and worst algorithm in each scenario. The best-performing algorithm in Sowiport (terms) is the worst-performing in Jabref; the best-performing algorithm in Jabref (keyphrases) performs 70% worse in Sowiport than Sowiport's best algorithm (click-through rates: 0.1% for terms, 0.03% for keyphrases).