Personally, I have a university background in psychology, where I learned the importance of the internal and external validity of collected data, particularly in testing. I jumped out of my seat a few times and pushed a few marketing professors into a corner when they explained to me that "you validate quant with qual", or the reverse (it changed depending on the professor). All of which is to say that Jeff Sauro has just published a book that is available in France: Quantifying the User Experience: Practical Statistics for User Research. Needless to say, this one leaves no room for improvisation or rough guesswork!
To get a feel for Jeff Sauro's work, you can read his blog and find some very handy statistical tools on his site.
The book's table of contents:
- Introduction & How to Use this Book
- Visual Guide to What Test
- Skipping the formulas
- Quantifying User Research
- What is User Research?
- Usability Tests (lab and remote)
- Benchmarking
- Comparative Testing
- Qualitative Studies
- Surveys
- Requirements Gathering
- A/B Testing
- Questionnaires
- Using Inferential Statistics with Usability Data
- Sample Size, Normality and other statistical concerns
- Measuring Usability: Quantifiable Aspects of Usability
- Introduction: Metrics as independent to formative and summative tests
- Completion
- Time
- Satisfaction
- Errors
- Clicks / Page Views
- Combined Scores
- Problems Discovered
- How precise are our estimates: Confidence Intervals
- Confidence Interval = Twice the Margin of Error
- Confidence Intervals Provide Precision & Location
- Three Components of a Confidence Interval
- Confidence Level
- Variability
- Sample Size
- Confidence Interval for a Completion Rate
- Confidence Interval History
- Wald Interval: terribly inaccurate for small samples
- Exact Confidence Interval
- Adjusted-Wald: Add Two Successes & Two Failures
- Best Point Estimates for a Completion Rate
- How accurate are point estimates from small samples?
- Confidence Interval for a Problem Occurrence
- Confidence Interval for Rating Scales and other Continuous Data
- Confidence Interval for Task Time Data
- Mean or Median Task Time?
- The Geometric Mean
- Log Transforming Confidence Intervals for Task Time Data
- Confidence Interval for a Median
- Did we meet or exceed our goal?
- Introduction
- One-Tailed and Two-Tailed Tests
- Comparing a Completion Rate to a Benchmark
- Small Sample Test
- Mid-Probability
- Large Sample Test
- Comparing a Satisfaction Score to a Benchmark
- Do at Least 75% Agree? Converting Continuous Ratings to Discrete
- Disadvantages to Converting Continuous Ratings to Discrete
- Net Promoter Score
- Comparing a Task Time to a Benchmark
- Is there a statistical difference between products?
- Comparing two Means (Rating Scales & Task Times)
- 2-sample t-test (between subjects)
- Confidence Interval around the Difference
- Paired t-test (within subjects)
- Confidence Interval around the Difference
- Comparing Completion Rates
- Small Samples: Fisher Exact Test
- Large Samples: The N-1 2-proportion test
- Confidence Interval around the Difference
- Relationship between Chi-Square Tests and 2-proportion tests
- A/B Testing & Conversion Rates
- What Sample Sizes Do We Need? Part 1: Summative Usability Studies
- Introduction
- Why Do We Care?
- The Type of Usability Study Matters
- Basic Principles of Summative Sample Size Estimation
- Estimating Values
- Example 1: A Realistic Usability Testing Example Given Estimate of Variability
- Example 2: An Unrealistic Usability Testing Example
- Example 3: No Estimate of Variability
- Comparing Values
- Example 4: Comparison with a Benchmark
- Example 5: Within-Subjects Comparison of an Alternative
- Example 6: Between-Subjects Comparison of an Alternative
- Example 7: Where’s the Power?
- What Can I Do to Control Variability
- Sample Size Estimation for Binomial Confidence Intervals
- Binomial Sample Size Estimation for Small Samples
- Sample Size for Comparison with a Benchmark Proportion
- Sample Size Estimation for Proportions & Chi-Squared Tests
- What Sample Sizes Do We Need? Part 2: Problem Discovery
- Using a Probabilistic Model of Problem Discovery to Estimate Sample Sizes for Formative User Research
- The famous equation $P(x \ge 1) = 1 - (1 - p)^n$
- Deriving a sample size estimation equation from $1 - (1 - p)^n$
- Using the tables to plan sample sizes for formative user research
- Assumptions of the Binomial Probability Model
- Additional Applications of the Model
- Estimating the composite value of p for multiple problems or other events
- Adjusting small-sample composite estimates of p
- Estimating p
- Adjusting the Initial Estimate of p
- Using the Adjusted Estimate of p
- Investigating Sample Size Effectiveness
- Estimating the Number of Problems Available for Discovery
- What Affects the Value of p?
- Attitudinal Measurement with Questionnaires
- Scales, Labels and Points
- Post-Task Questionnaires
- ASQ, SMEQ, 1-question Likert
- Post-Test
- SUS, SUMI, PSSUQ, Homegrown scales
- Usability and Loyalty
- Net Promoter Scores and SUS
- Controversies in Measurement & Statistics
- Industrial versus Scientific: Purpose of statistics is to help in better decision making over the long run
- Multi-Point Scales
- p-values and NHST
- Parametric versus Non-Parametric Statistics
- Which confidence level
- When x=n or x=0 what confidence level do you use?
- Multiple testing versus omnibus testing
- 2 x 2 tables
- Final Thoughts on Statistics for User Research
- Appendix A: A Crash Course in Fundamental Statistical Concepts
- Central Tendency: Mean & Median
- Standard Deviation & Variance
- Population Parameters and Sample Statistics
- Standard Deviation
- Margin of Error
- Alpha
- Standard Error of the Mean
- Central Limit Theorem
- The normal distribution
- The Binomial Distribution
- Normal Approximation to the Binomial
- Introduction to Hypothesis Testing
- The Null and Alternative Hypothesis (Ho and Ha)
- Type I and Type II Errors
- Confidence and Power
- Making decisions from p-values
- If p is low reject the Ho
- One and Two Tailed Tests
- Mechanics of Test Statistics
- z statistics
- t-statistics
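To give a taste of the kind of calculation the problem-discovery chapter is about, here is a minimal Python sketch of the equation named in the table of contents, P(x ≥ 1) = 1 - (1 - p)^n, and of the sample size you get by solving it for n. This is my own illustration, not code from the book: the function names are hypothetical, and the values p = 0.31 and an 85% goal are arbitrary inputs chosen for the example.

```python
import math


def discovery_probability(p: float, n: int) -> float:
    """Chance that a problem affecting a proportion p of users
    is seen at least once among n participants: 1 - (1 - p)^n."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return 1 - (1 - p) ** n


def participants_needed(p: float, goal: float) -> int:
    """Smallest n such that 1 - (1 - p)^n >= goal.

    Solving the equation for n gives n = ln(1 - goal) / ln(1 - p),
    rounded up to a whole participant.
    """
    if not (0 < p < 1 and 0 < goal < 1):
        raise ValueError("p and goal must be strictly between 0 and 1")
    return math.ceil(math.log(1 - goal) / math.log(1 - p))


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not taken from the book).
    p, goal = 0.31, 0.85
    print(f"With 5 participants: {discovery_probability(p, 5):.3f}")
    print(f"Participants needed for {goal:.0%}: {participants_needed(p, goal)}")
```

The model assumes every participant has the same, independent chance p of running into the problem, which is exactly the kind of assumption the "Assumptions of the Binomial Probability Model" entry suggests the book examines.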
Another interesting post. Indeed, it is good to see a bit of rigor on the subject; all too often, the youth of the discipline is treated as carte blanche to skip formal testing.