Home is Where the Lab is: A Comparison of Online and Lab Data From a Time-sensitive Study of Interruption


  • Sandy J. J. Gould, University College London
  • Anna L. Cox, University College London
  • Duncan P. Brumby, University College London
  • Sarah Wiseman, University College London




Keywords: online experimentation, interruptions, multitasking, human performance


Experiments have been run online for some time with positive results, but questions remain about the kinds of tasks that can be successfully deployed to remotely situated online participants. Some tasks, such as menu selection, have worked well, but these do not represent the full gamut of tasks that interest HCI researchers. In particular, we wondered whether long-lasting, time-sensitive tasks that require continuous concentration could work online, given the confounding effects that might accompany the online deployment of such a task. We ran an archetypal interruption experiment both online and in the lab to investigate whether studies with these characteristics are more vulnerable to a loss of experimental control than the short, time-insensitive studies that make up the majority of previous online work. Statistical comparisons showed no significant differences in performance on a number of dimensions. However, there were data-quality issues that stemmed from participants misunderstanding the task. Our findings suggest that long-lasting experiments using time-sensitive performance measures can be run online, but that care must be taken when introducing participants to experimental procedures.




How to Cite

Gould, S. J. J., Cox, A. L., Brumby, D. P., & Wiseman, S. (2015). Home is Where the Lab is: A Comparison of Online and Lab Data From a Time-sensitive Study of Interruption. Human Computation, 2(1). https://doi.org/10.15346/hc.v2i1.4