What is a P-Value?
P-values are one of the most used and most misunderstood numbers in science. Researchers cite them constantly, journalists misreport them constantly, and even statisticians argue about them. Here's what they actually mean.
A p-value tells you how surprising your data would be if there were actually no effect. A small p-value (below 0.05 by convention) means results at least as extreme as yours would be unlikely if chance alone were at work, so you have evidence against the "nothing is happening" assumption. It does NOT tell you the probability that your hypothesis is correct.
Start with a concrete example
You flip a coin 100 times and get 63 heads. You suspect the coin is rigged. How do you decide whether this is evidence of a rigged coin, or just normal random variation?
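Here's the key question: if this coin were actually fair, how often would you expect to get 63 or more heads just by chance? It turns out the answer is about 0.6% of the time. That's the p-value: 0.006. (If you also count results just as lopsided in the other direction, 37 or fewer heads, it doubles to about 0.012.)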
What does that tell you? If the coin is fair, getting 63+ heads would be pretty unusual. It doesn't prove the coin is rigged. But it's evidence. How much evidence depends on how small the p-value is, and on the standards you've set in advance.
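The tail probability in the coin example can be checked by summing exact binomial probabilities. This is a minimal sketch using only the Python standard library; the function name `binomial_tail` is made up for illustration.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# One-sided p-value for 63 or more heads in 100 flips of a fair coin.
p_value = binomial_tail(100, 63)
print(f"P(63 or more heads | fair coin) = {p_value:.4f}")  # prints 0.0060
```

This is the one-sided tail. A two-sided version would also add the probability of 37 or fewer heads, which doubles the result because a fair coin is symmetric.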
The precise definition
The p-value is the probability of observing data at least as extreme as yours, assuming the null hypothesis is true.
Unpacking that, the null hypothesis is the boring default assumption: "this coin is fair," "this drug has no effect," "these two groups are the same." You never actually assume your hypothesis is true. You assume the opposite and ask whether your data would be surprising under that assumption.
"At least as extreme" means: not just the exact outcome you got, but anything equally surprising or more so. With the coin, it's not just P(exactly 63 heads), it's P(63 or more heads).
This framing matters. The p-value is not asking "is my hypothesis true?" It's asking "would my data be surprising if the null hypothesis were true?" Those are very different questions.
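One way to make "surprising if the null hypothesis were true" concrete is to simulate the null directly: flip a fair coin 100 times, repeat the experiment many times, and count how often the result is at least as extreme as 63 heads. The sketch below is a Monte Carlo estimate, not an exact calculation. The function name, the number of simulations, and the seed are all arbitrary choices for illustration.

```python
import random

def simulate_p_value(n_flips=100, observed_heads=63, n_sims=20_000, seed=0):
    """Estimate P(heads >= observed_heads) under a fair coin by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_sims):
        # Each of the n_flips random bits is one fair coin flip (1 = heads).
        heads = bin(rng.getrandbits(n_flips)).count("1")
        if heads >= observed_heads:
            hits += 1
    return hits / n_sims

estimate = simulate_p_value()
print(f"simulated one-sided p-value: {estimate:.4f}")
```

The estimate fluctuates from seed to seed, but it should land close to the exact tail probability. Note that nothing in the simulation ever assumes the coin is rigged; it only asks how a fair coin behaves.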
Why 0.05? And what does "statistically significant" mean?
By convention, a p-value below 0.05 is called "statistically significant." This threshold was proposed by statistician Ronald Fisher in 1925 and has stuck around mostly through inertia. It means: if the null hypothesis were true, results this extreme would happen less than 5% of the time by chance.
Is 5% the right cutoff? Not necessarily. Particle physics uses about 0.0000003 (five sigma) before declaring a new particle discovered. Some medical studies use the stricter 0.01. Some fields use 0.10. The cutoff should depend on the stakes involved and how costly it would be to be wrong in either direction.
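The link between a "sigma" level and a p-value threshold can be sketched with the standard normal distribution. This assumes a one-sided tail, which is the convention behind the five-sigma figure. The function name is hypothetical.

```python
from math import erfc, sqrt

def one_sided_p_from_sigma(z: float) -> float:
    """Upper-tail probability of a standard normal beyond z standard deviations."""
    return 0.5 * erfc(z / sqrt(2))

for z in (1.645, 2.326, 5.0):
    print(f"{z:>5} sigma -> p = {one_sided_p_from_sigma(z):.2e}")
```

Under this convention, roughly 1.645 sigma corresponds to 0.05, 2.326 sigma to 0.01, and 5 sigma to about 0.0000003.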
When a result clears the threshold and gets called statistically significant, it doesn't mean the effect is large, important, or definitely real. It means: if there were no effect, this outcome would be unlikely. With a large enough sample, even a tiny, practically meaningless effect becomes statistically significant.
What p-values cannot tell you
This is where people consistently go wrong, including in published research papers.
A p-value of 0.03 does not mean there is a 3% chance the null hypothesis is true. It means that if the null hypothesis were true, you'd see results this extreme 3% of the time. Subtle but critically different.
It also doesn't tell you the probability that your alternative hypothesis is true. It doesn't measure the size or importance of an effect. It doesn't indicate that a result will replicate. And it doesn't mean much on its own without knowing the sample size, the effect size, and whether this was one test or the 20th test in the same dataset.
Running 20 tests on random data and reporting only the one that shows p < 0.05 is a form of p-hacking, and it's a genuine problem in research. If you run enough tests, you'll eventually find significance by chance. At a 5% threshold, each test has a 5% chance of a false positive even when nothing is happening, so across 20 independent tests the chance of at least one false positive is about 64%.
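How quickly false positives pile up across multiple tests can be sketched with a short calculation. It assumes the tests are independent and that the null hypothesis is true in every one of them; real analyses often have correlated tests, so treat this as an illustration rather than a rule. The function name is made up for this example.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 5, 20, 100):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:>3} tests -> {rate:.1%} chance of at least one false positive")
```

With 20 tests at the 0.05 threshold, the chance of at least one spurious "significant" result is roughly 64%.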
How to use p-values properly
Report the actual p-value, not just whether it crossed a threshold. "p = 0.043" is more informative than "p < 0.05."
Always report effect sizes alongside p-values. A drug that reduces blood pressure by 0.1 mmHg might produce p = 0.001 in a 100,000-person trial. Statistically overwhelming. Clinically meaningless. Effect size tells you whether the result matters, not just whether it exists.
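To see how sample size alone can drive a p-value down, here is a rough sketch that holds the effect fixed at 0.1 mmHg and varies only the number of participants. It assumes a paired (before/after) design analyzed with a normal approximation, and it assumes the standard deviation of individual changes is 10 mmHg. That figure is a hypothetical value chosen for illustration, not a number from any trial.

```python
from math import erfc, sqrt

def two_sided_z_test_p(mean_change: float, sd: float, n: int) -> float:
    """Two-sided p-value for a mean change, using a normal approximation."""
    z = mean_change / (sd / sqrt(n))  # effect divided by its standard error
    return erfc(abs(z) / sqrt(2))

ASSUMED_SD = 10.0  # hypothetical spread of individual changes, in mmHg
for n in (100, 10_000, 100_000):
    p = two_sided_z_test_p(0.1, ASSUMED_SD, n)
    print(f"n = {n:>7,}: effect = 0.1 mmHg, p = {p:.4f}")
```

The effect size is identical in every row. Only the sample size changes, yet the p-value moves from nowhere near significant to well below 0.01.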
Pre-register your hypotheses before collecting data when possible. Deciding what you're looking for after seeing the results dramatically inflates false positive rates.
Treat a single significant p-value as interesting, not conclusive. Results that replicate across independent studies are evidence. Single studies are hints.