Overview. Over the last two months, you identified some unmet user needs with regard to social networking, brainstormed and refined a series of design ideas, and then implemented them as a full-stack web app. Hooray! The final step of the design process is to test your app with potential end users to evaluate how successfully it addresses the needs and issues you identified all the way back in A1, as well as the other design goals you set yourself in subsequent assignments.

Purpose. This assignment will help you gain experience with planning and conducting user tests, and then reflecting on and synthesizing a set of observations to inform future improvements.

Your Tasks

In this assignment, you will conduct two user tests, each with a different participant. Ideally, your participants are members of the intended audience of your application, but they may be anyone of your choosing; the only restriction is that they cannot currently be taking 6.1040. Each test should last approximately one hour, comprising a period in which your participant works through a series of tasks you have set, followed by a debrief about their experience.

To plan your tests:

  1. Prepopulate Realistic Data. As we described at the beginning of this class, we should not expect end users to be designers. Thus, while dummy data (e.g., Lorem Ipsum, Alyssa P. Hackers, and Ben Bitdiddles) was suitable during your design and implementation phases, end users will have trouble understanding your application if it isn’t filled with realistic data appropriate for your domain.

    Richly populate your app with a diverse range of data to give users a vibrant impression of what it would be like to use your app at the peak of its usage and popularity (see the seeding sketch after this list for one way to script this).

  2. Formulate a task list. To make sure your user tests yield informative results, it can often be better to set your users specific tasks rather than let them explore your app in an open-ended way.

    Create a list of tasks that cover the key concepts of your app, focusing on the concepts that are particularly unique or important to your design. Each task should typically involve executing a sequence of user interface actions. You should include at least 5 tasks, ranging from simple one-action tasks to more complex multi-action tasks, that will test how easily the user can cross the gulfs of execution and evaluation discussed in lecture. Plan for these tasks to take roughly 40 minutes of your hour-long session.

    Format your task list as a table, where each row corresponds to a different task and the columns give: (1) a short title for the task; (2) a succinct instruction, in words that could be given to the user; and (3) a brief rationale for including the task, explaining why it is worth testing and what you hope to learn or uncover about your design by testing it with a user rather than executing it yourself as part of a cognitive walkthrough. Order the rows so that any application state required by later tasks has been correctly set up by earlier ones.
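
To make the first planning step concrete, here is a minimal seeding sketch. It assumes a TypeScript backend with a MongoDB database; the connection string, collection names, and sample records are hypothetical placeholders, so adapt them to your own stack and concepts.

```ts
// seed.ts: populate the deployed database with realistic sample content
// before user-test sessions. All names and records below are illustrative.
import { MongoClient } from "mongodb";

// Hypothetical connection string and database name; substitute your own.
const MONGO_URI = process.env.MONGO_URI ?? "mongodb://localhost:27017";
const DB_NAME = "social_app";

const users = [
  { username: "maya.chen", bio: "Grad student; posts hiking photos and trail reviews" },
  { username: "devon_r", bio: "Intramural soccer captain; writes weekly match recaps" },
  { username: "priya.k", bio: "Shares recipes and restaurant finds around Cambridge" },
];

const posts = [
  { author: "maya.chen", text: "Sunrise from Mt. Monadnock this weekend. Worth the 4am start.", likes: 12 },
  { author: "devon_r", text: "We finally beat the econ department 3-2! Full recap in comments.", likes: 5 },
  { author: "priya.k", text: "Best banh mi near Kendall? Collecting recommendations.", likes: 9 },
];

async function seed() {
  const client = new MongoClient(MONGO_URI);
  await client.connect();
  try {
    const db = client.db(DB_NAME);
    // Remove any Lorem Ipsum / Alyssa P. Hacker placeholders first.
    await db.collection("users").deleteMany({});
    await db.collection("posts").deleteMany({});
    await db.collection("users").insertMany(users);
    await db.collection("posts").insertMany(posts);
    console.log(`Seeded ${users.length} users and ${posts.length} posts.`);
  } finally {
    await client.close();
  }
}

seed().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Running a script like this once against your deployed database (e.g., with npx ts-node seed.ts) also makes it easy to reset to the same starting state before each of your two sessions.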

Conduct your studies by asking each participant to perform the tasks in your task list. Remember to obtain your participant’s consent, and brief them on their role and what to expect. Have them perform each task in the order defined in your table, and encourage them to think out loud. If they fall silent, prompt them to keep thinking out loud. If they get stuck, give them a chance to get unstuck first; only intervene if they are really unable to make progress. Try to say as little as possible, and avoid explaining the user interface to them.

Throughout the study, watch what your participant is doing, saying, and even feeling (e.g., facial expressions, sighing, and the like often signal frustration or other emotions). Take careful notes throughout. You might also consider capturing a screen recording, as well as audio/video of your participant (all with their consent, of course), for further analysis after the session is complete.

In the final 20 minutes of your hour session, debrief your participant to get their overall thoughts and impressions of your application. What did they think worked well, versus what could be improved? Dig into moments you noticed them hesitate, get confused, or get stuck—what did they find confusing, what were they hoping to do, how did they figure things out?

For each study, write a 300–500 word report that summarizes and analyzes key moments of participant behavior: what interesting or unexpected things did you notice, and why do you think they occurred? For instance, you might observe a participant struggle with a particular interface element or interaction flow; your analysis might then draw on your debriefing to describe what they were expecting to do, what the interface did instead, across which gulf the flow broke down, and so on. Aim for your report to be balanced between reporting positive and negative results.

Follow these summaries with 3–5 bullet points describing flaws or opportunities for improvement. Describe what the flaw or opportunity is, explain why it is currently occurring, and brainstorm ways that future designs and implementations might be able to address it. For each bullet, classify it by level (physical, linguistic, or conceptual) and degree of severity (minor, moderate, major, critical). For instance, minor issues introduce some friction that, while annoying, a participant can recover from and move past; critical issues, however, bottleneck participants so severely that they require your intervention to make further progress. Moderate and major issues fall along that spectrum.

Submission

Post your writeups to your portfolio. As before, feel free to structure the document as you please—just don’t put things into a PDF! And, remember to check that your writeup displays properly when viewed through your deployed, public URL.

Submit this Google Form to finalize your assignment submission, also by the deadline.

You must complete both steps for us to consider your assignment submitted.

Rubric

Each component is assessed as Excellent, Satisfactory, or Poor:

Prepopulated Data
  • Excellent: The deployed app is richly populated with realistic data that gives a vivid impression of real-world usage.
  • Satisfactory: The app is populated with plausible data, but this data is relatively terse or shallow in a way that hinders an impression of realistic use.
  • Poor: The deployed app contains several instances of dummy data or placeholder text that make it look like a prototype or toy.

Task List
  • Excellent: The task list exhaustively covers the key aspects of the application. Rationales are well-justified, succinctly but compellingly describing how insights might differ from cognitive walkthroughs.
  • Satisfactory: The task list covers a broad range of functionality and complexity. Rationales begin to offer some initial justification but could go further in hypothesizing insights that could be gained from having real participants perform the tasks (vs. heuristics, cognitive walkthroughs, etc.).
  • Poor: The task list misses important functionality, or spans only a limited range of complexity. Rationales poorly justify why it is necessary for a user to perform these tasks.

Study Reports
  • Excellent: Well-balanced reports that go beyond straightforward reporting of observations to richly analyze what caused interesting participant behavior. Analyses are well-grounded in evidence (e.g., participant quotes, your own observations and inferences, etc.).
  • Satisfactory: Well-balanced reports, but they focus primarily on reporting results with only preliminary analysis. Some evidence is provided but has an unclear connection to the observations/analyses or is otherwise uninformative.
  • Poor: Reports are unbalanced, overly focusing on either the positive or negative results, and/or miss several opportunities for analyzing results. Little to no evidence is provided to concretely ground observations/insights.

Design Flaws/Opportunities
  • Excellent: At least 3 compelling flaws/opportunities are bulleted, spanning different levels and severities. Every bullet conveys a crisp, descriptive definition with rich explanations of how the flaw manifested and ways to address it in the future. All bullets are grounded in evidence from the study results.
  • Satisfactory: At least 3 interesting flaws/opportunities are bulleted. Bullet points convey good descriptions, but are occasionally difficult to understand because they are too high-level. A more diverse range of levels or severities could have been explored. Explanations provide reasonable evidence, but are occasionally shallow in brainstorming future ideas.
  • Poor: 3 or fewer flaws/opportunities are bulleted. They cluster around specific levels/severities, or surface issues that are trivial or did not need a user study to identify. Definitions are vague, and explanations are shallow.

As in previous assignments, while rubric cells may not map to specific point scores, qualitative judgments correspond roughly to grades of A (9/10), B (8/10), C (7/10).

Advice

  • Build rapport. Like the interviews you constructed in A1, we recommend building rapport with your participant so that they feel comfortable thinking out loud and making mistakes in front of you. Emphasize that you are testing your application, not the participant—their performance (including when they get stuck, confused, etc.) does not reflect poorly on them.

  • Prompt thinking aloud. Thinking out loud will feel very strange to most participants, and they will be prone to falling silent. When they do, prompt them to keep talking. Remind them to tell you what they are thinking, what they are trying to do, and what questions come up as they attempt a task, and even to read out the things they see on the screen (which can be an important signal of the order in which participants read things on screen, including whether they miss certain things!).

  • Pre-decide where to help. You are going to be very tempted to jump in to help the participant every time they get stuck—resist that temptation. Instead, come up with a pre-determined set of criteria for when you will intervene, and try to stick to that. Of course, if participants are completely unable to make progress, that is a good time to intervene.