What is a P-Value?
A p-value is a probability. Specifically, it is the probability of getting results at least as extreme as what you observed, assuming the null hypothesis is true. That single sentence contains a lot — let's unpack it carefully, because p-values are among the most misunderstood concepts in all of statistics.
1 What a P-Value Actually Is
Start with a concrete example. You flip a coin 100 times and get 60 heads. You suspect the coin is biased. The question is: if this coin were actually fair, how often would you expect to get 60 or more heads just by chance?
That probability is the p-value. If it's 0.03, it means: if the coin were fair, you'd only get 60+ heads 3% of the time by pure chance. That's unlikely enough that you might conclude the coin is probably biased.
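The coin example can be computed exactly with the binomial distribution, using nothing beyond the Python standard library. This is a minimal sketch; the function name binomial_p_value is mine, not a library call:

```python
from math import comb

def binomial_p_value(n, k, p=0.5):
    """One-sided p-value: the probability of k or more successes
    in n trials if the true success probability is p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# How often would a fair coin give 60 or more heads in 100 flips?
print(binomial_p_value(100, 60))  # ≈ 0.028
```

The exact answer comes out near 0.028, the ballpark "0.03" used above.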
The p-value is the probability of observing your data (or something more extreme) if the null hypothesis were true. It does NOT tell you the probability that the null hypothesis is true or false.
The null hypothesis is typically the boring, default assumption — "this coin is fair," "this drug has no effect," "these two groups are the same." You calculate how surprising your observed data would be if that boring assumption were correct.
2 Hypothesis Testing Context
P-values come from hypothesis tests. The setup always has the same structure: you state a null hypothesis (H₀, the default assumption) and an alternative hypothesis (H₁, what you're trying to show). You collect data, run a statistical test, and get a p-value.
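One concrete way to run such a test is a two-sample permutation test, sketched here in pure Python. The data, function name, and permutation count are illustrative assumptions, not a prescribed method:

```python
import random

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """H0: the two groups come from the same distribution.
    Estimate how often shuffling the group labels produces a mean
    difference at least as extreme as the one actually observed."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any real group structure
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical measurements from two groups:
print(permutation_test([12, 15, 14, 13, 16], [18, 21, 20, 19, 22]))
```

The returned fraction is the p-value: the share of label shufflings that look at least as extreme as the real data under the null hypothesis.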
The threshold 0.05 (or 5%) is a convention, not a law of nature. It was popularized by the statistician Ronald Fisher in 1925 and has stuck. Some fields use 0.01 (1%) or even 0.001 as stricter standards. Particle physics requires roughly 0.0000003 (a "five sigma" result) before declaring a new particle discovered.
3 How to Interpret p < 0.05
When p < 0.05, the result is called "statistically significant." This means: if the null hypothesis were true, you'd see results this extreme less than 5% of the time by chance. It suggests your result is unlikely to be pure noise.
When p > 0.05, the result is "not statistically significant." This does NOT mean the null hypothesis is true. It means the data doesn't provide strong enough evidence against it. Absence of evidence is not evidence of absence.
A result can be statistically significant but practically meaningless. With a large enough sample, tiny differences become statistically significant. A drug that reduces blood pressure by 0.1 mmHg might produce p = 0.001 in a 100,000-person trial — statistically significant, clinically irrelevant.
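The blood-pressure example can be checked numerically with a one-sample z-test. The population standard deviation of 10 mmHg and the helper name are my assumptions for illustration:

```python
import math

def z_test_p(sample_mean, mu0, sigma, n):
    """Two-sided p-value for a z-test of a sample mean against mu0,
    assuming the population standard deviation sigma is known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

# A 0.1 mmHg reduction, sd assumed to be 10 mmHg, 100,000 participants:
print(z_test_p(0.1, 0.0, 10, 100_000))  # ≈ 0.0016
```

The sheer sample size drives the p-value below 0.01 even though a 0.1 mmHg change is clinically meaningless, which is why effect sizes must be reported alongside p-values.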
4 What P-Values Do NOT Tell You
This is where most misunderstandings live. A p-value does NOT tell you:
The probability that the null hypothesis is true. p = 0.03 does not mean there is a 3% chance the drug has no effect. It means: if the drug had no effect, there would be a 3% chance of seeing your data. These sound similar but are logically different statements.
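The gap between those two statements becomes concrete with Bayes' rule. In this sketch, every number except the 0.03 is a made-up assumption; turning a p-value into P(H0 | data) requires a prior and an alternative model that the p-value alone does not supply:

```python
# All inputs here are hypothetical, chosen purely for illustration.
p_data_given_h0 = 0.03   # probability of the data if the drug does nothing
p_data_given_h1 = 0.10   # assumed probability of the data if the drug works
prior_h0 = 0.5           # assumed prior belief that the drug does nothing

# Bayes' rule: P(H0 | data) = P(data | H0) P(H0) / P(data)
posterior_h0 = (p_data_given_h0 * prior_h0) / (
    p_data_given_h0 * prior_h0 + p_data_given_h1 * (1 - prior_h0))
print(round(posterior_h0, 2))  # ≈ 0.23, not 0.03
```

Even with these favorable assumptions, the chance the null is true is about 23%, nowhere near the 3% a naive reading of the p-value suggests.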
The size or importance of the effect. A tiny, practically meaningless difference can produce a very small p-value with a large enough sample. Always report effect sizes alongside p-values.
That your result will replicate. The replication crisis in science partly stems from treating p < 0.05 as proof. A single study producing p = 0.04 is weak evidence. Replicated findings across multiple independent studies are strong evidence.
'The p-value is the probability that our results were due to chance.' This is wrong. The p-value assumes chance (the null hypothesis) and asks how extreme the data is under that assumption. It does not calculate the probability that chance explains your results.
5 P-Hacking and Why It Matters
P-hacking is manipulating analysis until p < 0.05 appears. This includes: collecting data until significance is reached and stopping there, trying multiple outcomes and only reporting the significant one, or excluding data points that hurt significance. Each practice inflates false positive rates dramatically.
If you run 20 statistical tests on random data, you'd expect about one to show p < 0.05 by pure chance (20 × 0.05 = 1 expected false positive). Reporting only that one test as a finding is p-hacking, even if unintentional.
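That multiplicity effect is easy to check. Under a true null hypothesis, a well-calibrated p-value is uniformly distributed on [0, 1], so drawing uniform random numbers simulates running null tests; the parameter names below are mine:

```python
import random

def chance_of_false_positive(n_tests=20, alpha=0.05,
                             n_experiments=10_000, seed=1):
    """Simulate batches of n_tests null tests and report how often
    at least one comes out 'significant' purely by chance."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(n_tests))
        for _ in range(n_experiments)
    )
    return hits / n_experiments

print(1 - 0.95 ** 20)              # analytic answer: ≈ 0.64
print(chance_of_false_positive())  # simulation: close to the analytic value
```

With 20 tests, there is roughly a 64% chance of at least one spurious "significant" result, which is why reporting only the winners is so misleading.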
The solution is pre-registration (stating hypotheses before collecting data), reporting all tests conducted, and treating p-values as one piece of evidence rather than a binary pass/fail gate.