Evaluator Effect

The experts below are selected from a list of 1,950 experts worldwide, ranked by the ideXlab platform.

Erik Frokjaer - One of the best experts on this subject based on the ideXlab platform.

  • A Study of the Evaluator Effect in Usability Testing
    Human-Computer Interaction, 2008
    Co-Authors: Kasper Hornbaek, Erik Frokjaer
    Abstract:

    The Evaluator Effect names the observation that usability Evaluators in similar conditions identify substantially different sets of usability problems. Yet little is known about the factors involved in the Evaluator Effect. We present a study of 50 novice Evaluators' usability tests and subsequent comparisons, in teams and individually, of the resulting usability problems. The same problems were analyzed independently by 10 human–computer interaction experts. The study shows an agreement between Evaluators of about 40%, indicating a substantial Evaluator Effect. Team matching of problems following the individual matching appears to improve the agreement, and Evaluators express greater satisfaction with the teams' matchings. The matchings of individuals, teams, and independent experts show Evaluator Effects of similar sizes; yet individuals, teams, and independent experts fundamentally disagree about which problems are similar. Previous claims in the literature about the Evaluator Effect are chall...
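
    As a rough illustration of the agreement figure quoted above, the following is a minimal sketch, assuming the any-two agreement measure: for each pair of Evaluators, the number of problems both report is divided by the number of problems either reports, and the result is averaged over all pairs. The code is hypothetical, not taken from the paper, and the problem identifiers are illustrative only.

    # Hypothetical sketch of the any-two agreement measure (intersection over
    # union, averaged over all pairs of evaluators); not the authors' analysis code.
    from itertools import combinations

    def any_two_agreement(problem_sets):
        """problem_sets: one set of problem identifiers per evaluator."""
        scores = []
        for a, b in combinations(problem_sets, 2):
            union = a | b
            if union:  # skip pairs where neither evaluator reported anything
                scores.append(len(a & b) / len(union))
        return sum(scores) / len(scores)

    # Three hypothetical evaluators; this toy data happens to come out at 40%.
    evaluators = [{"P1", "P2", "P3"}, {"P2", "P3", "P5"}, {"P1", "P3", "P4"}]
    print(f"any-two agreement: {any_two_agreement(evaluators):.0%}")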

Kasper Hornbaek - One of the best experts on this subject based on the ideXlab platform.

  • A Study of the Evaluator Effect in Usability Testing
    Human-Computer Interaction, 2008
    Co-Authors: Kasper Hornbaek, Erik Frokjaer
    Abstract:

    The Evaluator Effect names the observation that usability Evaluators in similar conditions identify substantially different sets of usability problems. Yet little is known about the factors involved in the Evaluator Effect. We present a study of 50 novice Evaluators' usability tests and subsequent comparisons, in teams and individually, of the resulting usability problems. The same problems were analyzed independently by 10 human–computer interaction experts. The study shows an agreement between Evaluators of about 40%, indicating a substantial Evaluator Effect. Team matching of problems following the individual matching appears to improve the agreement, and Evaluators express greater satisfaction with the teams' matchings. The matchings of individuals, teams, and independent experts show Evaluator Effects of similar sizes; yet individuals, teams, and independent experts fundamentally disagree about which problems are similar. Previous claims in the literature about the Evaluator Effect are chall...

Niels Jacobsen - One of the best experts on this subject based on the ideXlab platform.

  • What You Get Is What You See: Revisiting the Evaluator Effect in Usability Tests
    Behaviour & Information Technology, 2014
    Co-Authors: Morten Hertzum, Rolf Molich, Niels Jacobsen
    Abstract:

    Usability evaluation is essential to user-centred design; yet, Evaluators who analyse the same usability test sessions have been found to identify substantially different sets of usability problems. We revisit this Evaluator Effect by having 19 experienced usability professionals analyse video-recorded test sessions with five users. Nine participants analysed moderated sessions; 10 participants analysed unmoderated sessions. For the moderated sessions, participants reported an average of 33% of the problems reported by all nine of these participants and 50% of the subset of problems reported as critical or serious by at least one participant. For the unmoderated sessions, the percentages were 32% and 40%. Thus, the Evaluator Effect was similar for moderated and unmoderated sessions, and it was substantial for the full set of problems and still present for the most severe problems. In addition, participants disagreed in their severity ratings. As much as 24% (moderated) and 30% (unmoderated) of the problem...
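
    As a rough illustration of how the two percentages above can be computed, the following is a minimal sketch, assuming each participant's problem set is compared against the pooled set of problems reported by all participants and against the subset rated critical or serious by at least one participant. The code is hypothetical, not the authors' analysis, and all identifiers are illustrative only.

    # Hypothetical sketch: a participant's average share of the pooled problems
    # and of the severe subset; not taken from the study.
    def average_detection_rates(problem_sets, severe_problems):
        """problem_sets: one set of problem ids per participant;
        severe_problems: ids rated critical or serious by at least one participant."""
        pooled = set().union(*problem_sets)
        share_all = [len(p) / len(pooled) for p in problem_sets]
        share_severe = [len(p & severe_problems) / len(severe_problems)
                        for p in problem_sets]
        return (sum(share_all) / len(share_all),
                sum(share_severe) / len(share_severe))

    # Toy data for three participants (problem ids are illustrative only)
    reports = [{"P1", "P2", "P3"}, {"P2", "P4"}, {"P1", "P5"}]
    severe = {"P1", "P2"}
    avg_all, avg_severe = average_detection_rates(reports, severe)
    print(f"average share of all problems: {avg_all:.0%}")
    print(f"average share of severe problems: {avg_severe:.0%}")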

  • Usability Inspections by Groups of Specialists: Perceived Agreement in Spite of Disparate Observations
    Human Factors in Computing Systems, 2002
    Co-Authors: Morten Hertzum, Niels Jacobsen, Rolf Molich
    Abstract:

    Evaluators who examine the same system using the same usability evaluation method tend to report substantially different sets of problems. This so-called Evaluator Effect means that different evaluations point to considerably different revisions of the evaluated system. The first step in coping with the Evaluator Effect is to acknowledge its existence. In this study 11 usability specialists individually inspected a website and then met in four groups to combine their findings into group outputs. Although the overlap in reported problems between any two Evaluators averaged only 9%, the 11 Evaluators felt that they were largely in agreement. The Evaluators perceived their disparate observations as multiple sources of evidence in support of the same issues, not as disagreements. Thus, the group work increased the Evaluators' confidence in their individual inspections, rather than alerting them to the Evaluator Effect.

  • The Evaluator Effect: A Chilling Fact About Usability Evaluation Methods
    International Journal of Human-computer Interaction, 2001
    Co-Authors: Morten Hertzum, Niels Jacobsen
    Abstract:

    Computer professionals have a need for robust, easy-to-use usability evaluation methods (UEMs) to help them systematically improve the usability of computer artifacts. However, cognitive walkthrough (CW), heuristic evaluation (HE), and thinking-aloud study (TA), 3 of the most widely used UEMs, suffer from a substantial Evaluator Effect in that multiple Evaluators evaluating the same interface with the same UEM detect markedly different sets of problems. A review of 11 studies of these 3 UEMs reveals that the Evaluator Effect exists for both novice and experienced Evaluators, for both cosmetic and severe problems, for both problem detection and severity assessment, and for evaluations of both simple and complex systems. The average agreement between any 2 Evaluators who have evaluated the same system using the same UEM ranges from 5% to 65%, and none of the 3 UEMs is consistently better than the others. Although Evaluator Effects of this magnitude may not be surprising for a UEM as informal as HE, it is cer...

  • The Evaluator Effect During First-Time Use of the Cognitive Walkthrough Technique
    International Conference on Human-Computer Interaction, 1999
    Co-Authors: Morten Hertzum, Niels Jacobsen
    Abstract:

    While several studies have evaluated how well the cognitive walkthrough technique (CW) predicts the problems encountered in thinking-aloud studies (e.g. John and Mashyna 1997, Lewis et al. 1990), only Lewis et al. have assessed to what extent different Evaluators obtain the same results when evaluating the same interface. Data from Lewis et al. suggest that the variability in performance among Evaluators using CW is much lower than that of Evaluators using heuristic evaluation or thinking-aloud studies (Jacobsen et al. 1998, Nielsen 1994). One reason for this seemingly higher robustness of CW might be that it is quite a structured process. CW has, however, evolved considerably since the study of Lewis et al. Moreover, their data were limited in sample size and in applicability to actual CW Evaluators.

  • The Evaluator Effect in Usability Studies: Problem Detection and Severity Judgments
    Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 1998
    Co-Authors: Niels Jacobsen, Morten Hertzum, Bonnie E John
    Abstract:

    Usability studies are commonly used in industry and applied in research as a yardstick for other usability evaluation methods. Though usability studies have been studied extensively, one potential ...

Morten Hertzum - One of the best experts on this subject based on the ideXlab platform.

  • What You Get Is What You See: Revisiting the Evaluator Effect in Usability Tests
    Behaviour & Information Technology, 2014
    Co-Authors: Morten Hertzum, Rolf Molich, Niels Jacobsen
    Abstract:

    Usability evaluation is essential to user-centred design; yet, Evaluators who analyse the same usability test sessions have been found to identify substantially different sets of usability problems. We revisit this Evaluator Effect by having 19 experienced usability professionals analyse video-recorded test sessions with five users. Nine participants analysed moderated sessions; 10 participants analysed unmoderated sessions. For the moderated sessions, participants reported an average of 33% of the problems reported by all nine of these participants and 50% of the subset of problems reported as critical or serious by at least one participant. For the unmoderated sessions, the percentages were 32% and 40%. Thus, the Evaluator Effect was similar for moderated and unmoderated sessions, and it was substantial for the full set of problems and still present for the most severe problems. In addition, participants disagreed in their severity ratings. As much as 24% (moderated) and 30% (unmoderated) of the problem...

  • Usability Inspections by Groups of Specialists: Perceived Agreement in Spite of Disparate Observations
    Human Factors in Computing Systems, 2002
    Co-Authors: Morten Hertzum, Niels Jacobsen, Rolf Molich
    Abstract:

    Evaluators who examine the same system using the same usability evaluation method tend to report substantially different sets of problems. This so-called Evaluator Effect means that different evaluations point to considerably different revisions of the evaluated system. The first step in coping with the Evaluator Effect is to acknowledge its existence. In this study 11 usability specialists individually inspected a website and then met in four groups to combine their findings into group outputs. Although the overlap in reported problems between any two Evaluators averaged only 9%, the 11 Evaluators felt that they were largely in agreement. The Evaluators perceived their disparate observations as multiple sources of evidence in support of the same issues, not as disagreements. Thus, the group work increased the Evaluators' confidence in their individual inspections, rather than alerting them to the Evaluator Effect.

  • The Evaluator Effect: A Chilling Fact About Usability Evaluation Methods
    International Journal of Human-computer Interaction, 2001
    Co-Authors: Morten Hertzum, Niels Jacobsen
    Abstract:

    Computer professionals have a need for robust, easy-to-use usability evaluation methods (UEMs) to help them systematically improve the usability of computer artifacts. However, cognitive walkthrough (CW), heuristic evaluation (HE), and thinking-aloud study (TA), 3 of the most widely used UEMs, suffer from a substantial Evaluator Effect in that multiple Evaluators evaluating the same interface with the same UEM detect markedly different sets of problems. A review of 11 studies of these 3 UEMs reveals that the Evaluator Effect exists for both novice and experienced Evaluators, for both cosmetic and severe problems, for both problem detection and severity assessment, and for evaluations of both simple and complex systems. The average agreement between any 2 Evaluators who have evaluated the same system using the same UEM ranges from 5% to 65%, and none of the 3 UEMs is consistently better than the others. Although Evaluator Effects of this magnitude may not be surprising for a UEM as informal as HE, it is cer...

  • The Evaluator Effect During First-Time Use of the Cognitive Walkthrough Technique
    International Conference on Human-Computer Interaction, 1999
    Co-Authors: Morten Hertzum, Niels Jacobsen
    Abstract:

    While several studies have evaluated how well the cognitive walkthrough technique (CW) predicts the problems encountered in thinking-aloud studies (e.g. John and Mashyna 1997, Lewis et al. 1990), only Lewis et al. have assessed to what extent different Evaluators obtain the same results when evaluating the same interface. Data from Lewis et al. suggest that the variability in performance among Evaluators using CW is much lower than that of Evaluators using heuristic evaluation or thinking-aloud studies (Jacobsen et al. 1998, Nielsen 1994). One reason for this seemingly higher robustness of CW might be that it is quite a structured process. CW has, however, evolved considerably since the study of Lewis et al. Moreover, their data were limited in sample size and in applicability to actual CW Evaluators.

  • The Evaluator Effect in Usability Studies: Problem Detection and Severity Judgments
    Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 1998
    Co-Authors: Niels Jacobsen, Morten Hertzum, Bonnie E John
    Abstract:

    Usability studies are commonly used in industry and applied in research as a yardstick for other usability evaluation methods. Though usability studies have been studied extensively, one potential ...

M M Bekker - One of the best experts on this subject based on the ideXlab platform.

  • Managing the Evaluator Effect in User Testing
    International Conference on Human-Computer Interaction, 2003
    Co-Authors: Arnold P O S Vermeeren, I E H Van Kesteren, M M Bekker
    Abstract:

    If multiple Evaluators analyse the outcomes of a single user test, the agreement between their lists of identified usability problems tends to be limited. This is called the ‘Evaluator Effect’. In the present paper, three user tests, taken from various domains, are reported and their Evaluator Effects measured. In all three studies, the Evaluator Effect proved to be smaller than in Jacobsen et al.'s (1998) study, but still present. Through detailed analysis of the data, it was possible to identify various causes of the Evaluator Effect, ranging from inaccuracies in logging and mishearing verbal utterances to differences in interpreting user intentions. Suggested strategies for managing the Evaluator Effect are: doing a systematic and detailed data analysis with automated logging, discussing specific usability problems with other Evaluators, and having the entire data analysis done by multiple Evaluators.