
Test your design intuition December 10, 2009

Posted by jeremyliew in A:B testing, UI.
1 comment so far

I’m an advocate of A:B testing of all elements of design and copy. However, that doesn’t mean that good design intuition can’t help advance the baseline from which you start your testing.

Which Test Won? has a list of real-world A:B tests run on different homepages, lead gen pages, search landing pages, etc., all with an eye to which version advanced the funnel best.

Which Dell Coupons Page Layout Resulted in More Sales? A/B Test by an Online Publisher

[Screenshots compared side by side: Version A vs. Version B of the TechTarget Dell coupons page]

Helpfully, it also analyzes the results and draws specific design conclusions.

I recommend that anyone in social media, social gaming, ecommerce or lead gen check the site out.

Which user experience research tools a startup should use, and when November 7, 2008

Posted by jeremyliew in A:B testing, product management, UI, usability.
6 comments

I recently posted about how usability testing can slow down launch but speed up success. But usability testing is just one of many elements of user experience research, with others including the ethnographic field studies made popular by Ideo, the A:B testing becoming standard for web 2.0, customer feedback, focus groups etc. With so many tools at your disposal, which user experience research tools should you use and when?

Jakob Nielsen recently posted about this topic, and concluded that it depends on what phase of product management you are in. For startups, my summary of his work is below:

Ideation: At the very beginning of the process you are looking at new ideas and opportunities. In this phase, aside from the founders’ vision, ethnographic field studies, focus groups, diary studies, surveys and data mining of web-wide behavior can all be useful. Most startups will not have access to proprietary user data from existing products to identify additional opportunities.

Pre-launch: Once you’ve settled on a product idea and are working towards (beta) launch, you want to improve design and functionality as much as possible to minimize risk and maximize the likelihood of a successful launch. In this phase, rely on tools such as card sorting, paper prototyping and usability studies, participatory design, desirability studies and field studies (including closed alpha launches to “friendlies”) to improve the user experience.

Post-launch: Once you’ve pushed the product out you will have live data that you can use to compare the product both to itself and to its competition. In this phase, usability benchmarking, online assessments, customer emails, surveys and A/B testing will be your primary tools.

Nielsen provides some additional frameworks to differentiate when to use different forms of user experience research in his post. The site is a good resource about user experience in general.

Tips on A:B testing November 4, 2008

Posted by jeremyliew in A:B testing, product management.
4 comments

If you’re doing A:B testing, you should read this paper and the accompanying presentation by Kohavi, Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO. It is a comprehensive primer on how to do A:B testing well.

Some nuggets of goodness culled from the paper in no particular order (some sound obvious but read the paper to get more context). It will take you 20 minutes and you’ll use the lessons you learn for years:

1. Agree on evaluation criteria UP FRONT (vs after the fact analysis)
2. Ensure sample size is sufficiently large to have high confidence in results (with small samples, testing a version against itself can show wide ranges in performance)
3. Be truly random, and consistent, when allocating users to groups (see the sketch after this list)
4. Ramp experiments from 1% of users to [50%] of users to get results fast [subject to day of week variations]. Auto-abort if the new version is significantly worse.
5. Account for: robots skewing results, “newness” effect, time of day, day of week
6. Understand why as well as what (e.g. is lower performance caused by slower load time? incompatible browser types or screen sizes? etc. If fixable, fix and retest.)
7. Integrate constant testing into culture and systems
8. Test all the time.
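
To make points 3 and 4 concrete, here’s a minimal sketch in Python of one common way to get assignment that is random across users but consistent for any given user: hash the user id together with the experiment name. The function and experiment names are made up for illustration; this is not code from the Kohavi paper.

import hashlib

def assign_cell(user_id, experiment_name, treatment_pct=1.0):
    # Hash user id + experiment name so assignment is random across users,
    # stable for the same user, and independent across experiments.
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10000 / 100.0  # uniform value in 0.00 .. 99.99
    return "treatment" if bucket < treatment_pct else "control"

# Ramping from 1% to 50% only requires raising treatment_pct; users already
# in the treatment stay there because the hash is stable.
print(assign_cell(user_id=12345, experiment_name="new_signup_flow", treatment_pct=1.0))

Keeping the experiment name in the hash is what lets several independent tests run at once without users being bucketed the same way in each of them.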

Discovered via Eric Ries.

Usability testing slows down launch but speeds up success October 10, 2008

Posted by jeremyliew in A:B testing, product management, UI, usability.
9 comments

I hate hearing the term “user error”. Good usability testing should eliminate most user error. I am a big proponent of A:B testing with live users. However, often a small usability test can quickly highlight any big problems before you go live so that you are working from a better starting point.

Many developers like to prototype quickly and push code out fast, and I am a fan of this. However, if taken to an extreme, it can lead to products based on incorrect assumptions. Noah Kagan notes that at Mint:

We did surveys, user testing and psychological profiles. This was extremely useful in identifying the types of users we may have on the site and especially for seeing how people use the site. I never really did this before and was AMAZED how people use the site vs. what I expected

As Noah points out, usability testing can be easy and cheap. What it requires is simple:

1. Determine what you are trying to test. This is usually a list of the form, “How easy is it to complete task X?”

2. Recruit representative users. If you’re testing a new user experience, Craigslist is fine for this. But make sure that your testers are truly representative. Your existing user base is another good place to find testers, but make sure that you’re not listening to just your loudest users. The key is to pre-qualify the users to ensure that they are “average”.

3. Do the test. First ask them what their first impressions of the site are. What captures their attention? What would they do first? Ask the users to speak out loud during the session, explaining what they are thinking at all times. Then ask them to complete the tasks that you have listed. Watch and listen. Note what they find easy, what they find confusing, and what they don’t find at all.

This can be incredibly frustrating for you. You’ll think that some things are “obvious” that are not, or you’ll be shocked to see how unfamiliar users are with your site, or even with how browsers work. Remember that your role is to learn, not teach. Don’t touch the screen, the keyboard or the mouse; don’t point out how to do anything (even when they are “doing it wrong”, even if it is a “basic mistake”). You can provide encouragement and reassurance, or ask questions about why they did something, but that is it. You’ll be surprised at what you see.

The key is to internalize that there is no such thing as “user error” and there are no “stupid users”. If users are having problems achieving the tasks that you laid out for them, then the fault lies with your site. You’ll need to review your UI.

I prefer to do these usability tests over WebEx with users at their own computers. This makes the interaction as natural as possible for the testers. As an added bonus, you can record both their screen and the phone call for later review.

Usability testing does not have to be a lot of work. You only need to test five users to uncover most usability problems.


The most striking truth of the curve is that zero users give zero insights.
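
To see the shape of the curve Nielsen is describing, here’s a quick back-of-the-envelope in Python using the model from his article, in which a single test user finds roughly 31% of the usability problems on average (the 31% is Nielsen’s published average, not something measured here):

problem_find_rate = 0.31  # Nielsen's average: one test user finds ~31% of problems

for n_users in range(6):
    found = 1 - (1 - problem_find_rate) ** n_users
    print(f"{n_users} users -> ~{found:.0%} of problems found")

# 0 users -> 0%, 5 users -> ~84%, which lines up with the "85%" figure quoted below.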

When you’ve completed your usability test, go back and make some changes, but then come back and test again:

You want to run multiple tests because the real goal of usability engineering is to improve the design and not just to document its weaknesses. After the first study with 5 users has found 85% of the usability problems, you will want to fix these problems in a redesign.

After creating the new design, you need to test again. Even though I said that the redesign should “fix” the problems found in the first study, the truth is that you think that the new design overcomes the problems. But since nobody can design the perfect user interface, there is no guarantee that the new design does in fact fix the problems. A second test will discover whether the fixes worked or whether they didn’t. Also, in introducing a new design, there is always the risk of introducing a new usability problem, even if the old one did get fixed.

Also, the second test with 5 users will discover most of the remaining 15% of the original usability problems that were not found in the first test. (There will still be 2% of the original problems left – they will have to wait until the third test to be identified.)

Finally, the second test will be able to probe deeper into the usability of the fundamental structure of the site, assessing issues like information architecture, task flow, and match with user needs. These important issues are often obscured in initial studies where the users are stumped by stupid surface-level usability problems that prevent them from really digging into the site.

This can all feel like overhead when all you want to do is launch. Trust me, it isn’t overhead. Getting this stuff closer to right the first time will only help you reach your goals faster.

How to implement reporting and analytics for your startup July 30, 2008

Posted by jeremyliew in A:B testing, analytics, product management, start-up, startup, startups.
2 comments

Andrew Chen has a good post on how a startup should think about implementing analytics that I think applies to companies of all sizes and is worth reading. He notes:

In general, a philosophy on the role of analytics within a startup is:

If you’re not going to do something about it, it may not be worth measuring.

(Similarly, if you want to act to improve something, you’ll want to measure it)

Don’t build metrics that aren’t going to be part of your day-to-day operations or don’t have potential to be incorporated as such. Building reports that no one looks at is just activity without accomplishment, and is a waste of time.

He goes on:

Metrics as a “product tax”
In fact, one way to view analytics is that they are a double-digit “tax” on your product development process because of a couple things:

* It takes engineers lots of time and development effort
* It produces numbers that people argue about
* It requires machines, serious infrastructure, its own software, etc
* Fundamentally, it slows down your feature development

As a rough estimate, I’ve found that it takes between 25-40% of your resources to do analytics REALLY well. So for every 3 engineers working on product features, you’d want to put 1 just on analytics. This may seem like a ton (and it is), but it throws off indispensable knowledge that you can’t get elsewhere, like:

* Validating your assumptions
* Pinpointing bottlenecks and key problems
* Creating the ability to predict/model your business to make future decisions
* It tells you which features actually are good and what features don’t matter

I recommend reading the whole thing.

One additional piece of advice that I’ve found helpful:

1. Ask the product owners to use Excel to mock up EXACTLY the reports that they would like to use, whether charts, tables or graphs, including time periods and mock data. This is way better than PRDs when it comes to reporting.
2. Go line by line through these reports with the product owners and ask them, “What decision will you make with this data?” If the answer is “none”, or if the report is for investigative or theoretical purposes rather than frequent operating decisions, cut it (see the sketch below).
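
As a toy illustration of point 2 (the report names and fields are hypothetical, not taken from Andrew Chen’s post), each mocked-up report can carry the operating decision it drives, and anything without one is a candidate to cut:

report_specs = [
    {"report": "Daily signups by channel", "decision": "Shift paid spend to the best channel"},
    {"report": "7-day retention by cohort", "decision": "Prioritize onboarding fixes if retention dips"},
    {"report": "All-time total pageviews", "decision": ""},  # no operating decision
]

build = [r["report"] for r in report_specs if r["decision"]]
cut = [r["report"] for r in report_specs if not r["decision"]]
print("Build:", build)
print("Cut (no operating decision):", cut)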

Happy analytics!

How many A:B tests do I have to run before it is meaningful? July 21, 2008

Posted by jeremyliew in A:B testing, product management.
2 comments

Inside Facebook has a good post about how to not screw up your A:B testing that is a useful reminder about how many tests you need to run before you know that the results are statistically significant.

The author notes:

How many [tests] do we need to declare a statistically significant difference between [a design leading to action success rate] of p1 and one of p2? This is readily calculable:

* number of samples required per cell = 2.7 * (p1*(1-p1) + p2*(1-p2))/(p1-p2)^2

(By the way, the pre-factor of 2.7 has a one-sided confidence level of 95% and power of 50% baked into it. These have to do with the risk of choosing to switch when you shouldn’t and not switching when you should. We’re not running drug trials here so these two choices are fine for our purposes. The above calculation will determine the minimum and also the maximum you need to run.)

Thus, if you did this number of tests and found that the difference in action success was greater than (p1-p2), then you would have a 95% confidence level that the design being tested is responsible for the increase in success rate, and you would move to a new best practice.

The author reminds developers to adhere to A:B testing best practices, including:

1. Running the two cells concurrently
2. Randomly assigning each user to a cell and making sure they stay in that cell for the duration of the test
3. Scheduling the test to neutralize time-of-day and day-of-week effects
4. Serving users from countries that are of interest

One thing that immediately emerges from this formula is that you don’t need that many tests to determine if a new design is working. For example, testing a design that anticipates increasing success from .5% to .575% only needs about 52k tests. For apps and websites that are at scale, this does not take very long.
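
Here’s a small Python sketch of that arithmetic, plugging the formula quoted above (with its 2.7 prefactor for 95% one-sided confidence and 50% power) into the 0.5% to 0.575% example:

def samples_per_cell(p1, p2, prefactor=2.7):
    # Samples needed in each cell to detect a change in success rate from p1 to p2.
    return prefactor * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

n = samples_per_cell(0.005, 0.00575)
print(f"~{n:,.0f} samples per cell")  # roughly 51,000 -- in line with the "about 52k" above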

The danger is that, because of the overhead of putting up and taking down tests, “bad” test designs stay up for too long, exposing too many users to a worse experience than usual. While some people assume A:B testing means splitting users into equal groups, there is no such requirement. I’d advise developers to size their test cells at x% of total traffic, where x% is a little more than what is needed to hit the minimum sample size calculated above over a week. This neutralizes time-of-day and day-of-week effects, minimizes the overhead of test setup, and ensures that not too many users are exposed to bad designs. It also allows multiple, independent tests to be run simultaneously.
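
Here’s a rough sketch of that sizing rule (the traffic numbers are invented for illustration): pick x so that one week of traffic at x% per cell clears the minimum from the formula above, with a little margin.

def cell_traffic_pct(required_per_cell, weekly_traffic, margin=1.1):
    # Percent of total traffic to send to each cell so that one week of
    # exposure clears the required sample size, with some headroom.
    return 100.0 * required_per_cell * margin / weekly_traffic

pct = cell_traffic_pct(required_per_cell=51_300, weekly_traffic=2_000_000)
print(f"Route ~{pct:.1f}% of traffic to each test cell for one week")  # ~2.8%

With two cells that’s under 6% of total traffic for the week, which leaves plenty of headroom to run several other tests at the same time.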