Type I and Type II Errors: A Researcher's Guide

Every hypothesis test can be wrong in two ways. A Type I error happens when you reject a null hypothesis that's actually true, declaring an effect that isn't there. A Type II error happens when you fail to reject a null hypothesis that's actually false, missing an effect that's really present. The first is a false positive. The second is a false negative. Understanding both, and the tradeoff between them, is what separates a researcher who runs tests from one who understands what the results mean.

This guide explains both error types in plain terms, shows how they connect to alpha, beta, and statistical power, walks through the tradeoff that links them, and works through what each error would mean in a real study. We'll also cover how sample size affects your chances of each error, and how to write about these risks the way committees and reviewers expect. The framework here builds directly on how a hypothesis test is set up and how to read a p-value, so it helps to be comfortable with both first.

Quick Answer

Type I error (false positive). Rejecting a true null hypothesis. You conclude there's an effect when there isn't one. The probability of a Type I error is alpha, the significance level you set, usually 0.05.

Type II error (false negative). Failing to reject a false null hypothesis. You miss a real effect. The probability of a Type II error is beta.

Statistical power. The probability of correctly detecting a real effect, equal to 1 minus beta. Higher power means a smaller chance of a Type II error.

The tradeoff. For a fixed sample size, lowering alpha to reduce false positives raises beta and increases false negatives. A larger sample is the way out: it lets you keep alpha low and still drive beta down.

The Two Ways a Test Can Be Wrong

A hypothesis test ends in one of two decisions: reject the null hypothesis, or fail to reject it. The truth, which you never observe directly, is also one of two states: the null is true, or it's false. Put those together and there are four possible outcomes. Two are correct. Two are errors.

When the null is true and you fail to reject it, you're right. When the null is false and you reject it, you're right again. The trouble comes in the other two cells. When the null is true but you reject it anyway, that's a Type I error. When the null is false but you fail to reject it, that's a Type II error. This table lays out all four.

Your decision	Null is actually true	Null is actually false
Fail to reject the null	Correct (true negative)	Type II error (false negative)
Reject the null	Type I error (false positive)	Correct (true positive)

The two correct outcomes need no defense. The two errors are the heart of the matter, because every study carries some risk of each, and the design choices you make determine how large those risks are.

Type I Error: The False Positive

A Type I error is a false alarm. You ran your test, the result looked significant, and you concluded there was an effect. But in reality the null hypothesis was true, and what you saw was just an unlucky sample. The effect you reported doesn't exist.

The probability of making a Type I error is alpha, the significance level you choose before running the test. When you set alpha at 0.05, you're accepting a 5% chance of a false positive whenever the null is actually true. That's the direct meaning of the threshold. Set alpha at 0.01 instead, and you cut the false-positive rate to 1%, but as we'll see, that comes at a cost.
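To see what "a 5% chance of a false positive" means in practice, here is a minimal Python sketch that simulates many studies in which the null is true by construction. It assumes a two-sample z-test with a known standard deviation of 1 and normally distributed data. The function names and simulation settings (30 per group, 10,000 simulated studies) are illustrative choices, not part of any standard package.

```python
import random
from statistics import NormalDist, fmean

def two_sample_z_pvalue(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(alpha, n_per_group=30, n_sims=10_000, seed=1):
    """Share of simulated studies that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Both groups are drawn from the same population, so the null is true.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sample_z_pvalue(a, b) < alpha:
            rejections += 1  # a false positive
    return rejections / n_sims

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: simulated false-positive rate = {type_i_error_rate(alpha):.3f}")
```

The simulated rate should land close to whichever alpha you pass in, which is the sense in which alpha is the false-positive rate.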

Type I errors are the ones science worries about most visibly. A published finding that later fails to replicate can be a Type I error: a result that reached significance in one sample even though no real effect existed. This is why fields with high stakes, like drug approval, often demand stricter alpha levels. A false positive that puts an ineffective treatment on the market is a costly mistake.

Type II Error: The False Negative

A Type II error is a miss. The effect was real, but your test failed to detect it. You ran the analysis, the result didn't reach significance, and you failed to reject the null hypothesis even though it was false. The effect was there; your study just couldn't see it.

The probability of a Type II error is beta. Unlike alpha, you don't set beta directly. It depends on several things: the true size of the effect, your sample size, the variability in your data, and the alpha level you chose. Smaller effects, smaller samples, and noisier data all push beta up, making a miss more likely.
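A small calculation makes those dependencies concrete. This Python sketch computes beta for a two-sided, two-sample z-test using the normal approximation, assuming normally distributed data with a known common standard deviation. The scale (a true difference of 5 points with an SD of 15, 60 per group) is hypothetical. Changing one input at a time shows each factor pushing beta up.

```python
from statistics import NormalDist

def type_ii_error_rate(mean_diff, sd, n_per_group, alpha=0.05):
    """Approximate beta for a two-sided, two-sample z-test with equal group sizes.

    Assumes normally distributed data with a known common standard deviation.
    """
    z = NormalDist()  # standard normal
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # How many standard errors the true difference sits away from zero.
    shift = abs(mean_diff) / (sd * (2 / n_per_group) ** 0.5)
    # Power: probability the test statistic lands in either rejection region.
    power = (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
    return 1 - power

# Hypothetical baseline: a 5-point true difference on a scale with SD 15.
baseline = {"mean_diff": 5, "sd": 15, "n_per_group": 60, "alpha": 0.05}
scenarios = {
    "baseline": {},
    "smaller effect (3 points)": {"mean_diff": 3},
    "smaller sample (30 per group)": {"n_per_group": 30},
    "noisier data (SD 25)": {"sd": 25},
    "stricter alpha (0.01)": {"alpha": 0.01},
}
for label, change in scenarios.items():
    beta = type_ii_error_rate(**{**baseline, **change})
    print(f"{label}: beta = {beta:.2f}")
```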

Type II errors get less attention than false positives, but they matter just as much. A real treatment effect that a study fails to detect can stall a promising line of research. An underpowered study that misses a true effect contributes a "no effect found" result to the literature, which can be just as misleading as a false positive. The difference is that false negatives are quieter, so they're easier to overlook.

Writing Up Your Results Section?

Describing error risks, power, and what your findings can and can't support is exactly where careful statistical writing matters. Editor World's editors hold advanced degrees and read this kind of analysis every day, so they catch an overstated claim or a missing power justification before your reviewer does. Start with a free sample edit of your first 300 words and work with an editor who knows your field.


Alpha, Beta, and Statistical Power

Three quantities describe the error structure of any hypothesis test. They're worth keeping straight, because reviewers expect you to know them.

Alpha (the significance level)

Alpha is the probability of a Type I error: the chance of rejecting a true null. You set it in advance, almost always at 0.05, sometimes at 0.01 for stricter fields. Alpha is entirely under your control, because it's just the threshold you choose for significance.

Beta (the Type II error rate)

Beta is the probability of a Type II error: the chance of failing to reject a false null. You don't set beta directly. It falls out of your study design, mainly the effect size, the sample size, and the variability in the data. A common target is to keep beta at or below 0.20.

Power (1 minus beta)

Statistical power is the probability of correctly detecting a real effect, which is 1 minus beta. If beta is 0.20, power is 0.80. By convention, a power of 0.80 is treated as the minimum acceptable level for a well-designed study, meaning an 80% chance of catching a true effect of the size you expect. Power analysis, done before data collection, tells you the sample size you need to reach that level.
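A priori power analysis for comparing two group means often comes down to one formula. The Python sketch below uses the common normal-approximation formula given in the docstring, where d is the standardized effect size (Cohen's d). It's an approximation: dedicated tools that use the t distribution, such as G*Power or statsmodels, return slightly larger numbers (64 per group rather than 63 for d = 0.5, for example).

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test of means.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    rounded up, where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile matching the target power (1 - beta)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.2, 1 / 3, 0.5, 0.8):
    print(f"d = {d:.2f}: about {n_per_group(d)} per group for 80% power at alpha = 0.05")
```

Smaller effects need sharply larger samples, because the required n grows with 1/d². Asking for a stricter alpha or higher power also raises n.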

These three quantities are linked. Alpha is the false-positive rate. Beta is the false-negative rate. Power is your ability to detect what's really there. Designing a study well means managing all three at once, not just picking an alpha and hoping for the best.

The Tradeoff Between the Two Errors

Here's the part that trips people up. For a fixed sample size, the two error types pull against each other. If you lower alpha to reduce the chance of a false positive, you make significance harder to reach, which raises beta and increases the chance of a false negative. Tighten the screws against Type I errors, and Type II errors slip up.

Think of alpha as the bar a result has to clear to count as significant. Raise the bar (lower alpha), and fewer false positives get through, but more true effects also fail to clear it. Lower the bar (higher alpha), and you catch more true effects, but more false positives sneak in too. With a fixed sample, you can't reduce both risks at once by adjusting alpha. You're trading one for the other.

The way out of the tradeoff is sample size. A larger sample reduces the variability of your estimate, which lets you detect smaller effects with the same alpha. That means you can hold alpha steady at 0.05 and still drive beta down by collecting more data, or even tighten alpha without giving up power. This is why power analysis matters: it tells you how large a sample you need to keep both error rates acceptably low. Increasing the sample is the most direct way to improve on both fronts at once.
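The tradeoff and its resolution both show up in a few lines of Python. This sketch assumes a two-sided, two-sample z-test and a hypothetical true effect of 0.4 standard deviations, and it prints beta for two alpha levels at two sample sizes.

```python
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha):
    """Normal-approximation power of a two-sided, two-sample test for effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # true effect measured in standard errors
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

d = 0.4  # hypothetical true effect, in standard-deviation units
for n in (50, 150):
    for alpha in (0.05, 0.01):
        beta = 1 - power_two_sample(d, n, alpha)
        print(f"n = {n:>3} per group, alpha = {alpha}: beta = {beta:.2f}")
```

Under these assumptions, moving from alpha = 0.05 to 0.01 raises beta at both sample sizes, which is the tradeoff. Tripling the sample lowers beta so much that the stricter alpha at 150 per group still misses fewer true effects than the looser alpha at 50 per group.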

A Worked Example: Risk Tolerance by Gender

Let's ground both errors in the running example used throughout this series. Fisher and Yao (2017) studied gender differences in financial risk tolerance. Suppose you're testing whether men and women differ in mean risk tolerance, with the null hypothesis that they don't.

Here's what each error would mean in that study.

Type I error (false positive). You conclude that men and women differ in risk tolerance, and you report a significant gender effect. But in truth there's no difference in the population. Your significant result came from an unlucky sample. Other researchers may build on a finding that isn't real.
Type II error (false negative). There really is a gender difference in the population, but your test comes back non-significant, so you fail to reject the null and report no significant difference. The effect existed; your study missed it, perhaps because the sample was too small to detect a modest gap.

Now suppose the true difference is small, say a third of a standard deviation. With a sample of 60 per group and alpha at 0.05, your power to detect it is only about 0.45, which means roughly a 55% chance of a Type II error: worse than a coin flip on whether you catch the effect at all. Increase the sample to 200 per group, and power climbs above 0.90, cutting the false-negative risk sharply. The alpha stayed at 0.05 the whole time. Only the sample size changed. That's the tradeoff resolved by data rather than by loosening the threshold.
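The power figures in this example come from a standard calculation you can reproduce. This Python sketch uses the normal approximation for a two-sided, two-sample test at alpha = 0.05, and it treats the one-third-SD gender gap as an assumed true effect, not a value reported by Fisher and Yao (2017). An exact t-test calculation gives very slightly lower power at these sample sizes.

```python
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided, two-sample test of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # true effect measured in standard errors
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

d = 1 / 3  # assumed true gender gap in risk tolerance, in SD units
for n in (60, 200):
    p = power_two_sample(d, n)
    print(f"{n} per group: power = {p:.2f}, chance of a Type II error = {1 - p:.2f}")
```

Only n changes between the two lines of output; alpha and the effect size stay fixed.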

This is also why a non-significant result from a small study tells you so little. A failure to reject the null could mean there's no effect, or it could mean there is one and the study was underpowered to find it. Without a power analysis, you can't tell which.

How to Reduce Each Type of Error

You have real levers over both error rates. The trick is knowing which lever moves which error, and at what cost.

To reduce Type I errors, lower alpha. Setting alpha at 0.01 instead of 0.05 cuts the false-positive rate, which is common in fields where a false alarm is especially costly. The cost is higher beta unless you also raise the sample size.
To reduce Type II errors, increase power. The cleanest way is a larger sample. You can also reduce measurement error, use a more sensitive design, or focus on detecting a meaningful minimum effect size.
To reduce both at once, increase the sample size. A bigger sample shrinks the standard error of your estimate, so you can hold alpha where it is (or tighten it) and still push beta down.
Run a power analysis before collecting data. A priori power analysis tells you the sample size needed to hit your target power (usually 0.80) for the smallest effect you care about. This is now expected in most dissertation proposals and grant applications.
Correct for multiple comparisons. Running many tests inflates the overall Type I error rate. Methods like the Bonferroni correction adjust alpha to keep the familywise error rate (the chance of at least one false positive across the whole set of tests) under control.
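The multiple-comparisons point can be made concrete in a few lines. This Python sketch computes the familywise error rate for m independent tests when every null is true, 1 - (1 - alpha)^m, and applies a Bonferroni correction (reject only when p ≤ alpha/m) to a set of hypothetical p-values. The independence assumption applies only to the first formula; Bonferroni keeps the familywise rate at or below alpha even when the tests are correlated.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests, all nulls true."""
    return 1 - (1 - alpha) ** m

def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: reject each null only if p <= alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

for m in (1, 5, 10, 20):
    print(f"{m:>2} tests at alpha = 0.05: familywise error rate = {familywise_error_rate(0.05, m):.2f}")

# Hypothetical p-values from five tests in one study.
p_values = [0.003, 0.012, 0.030, 0.041, 0.200]
print(bonferroni_reject(p_values))
```

With five tests the per-test threshold drops to 0.01, so four of these five p-values that would pass at 0.05 on their own no longer count as significant.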

One of the most common weaknesses flagged in methods sections is the absence of a power analysis. A study that reports a non-significant result without any account of its power leaves the reader unable to tell a true null from a missed effect. Many of these power calculations rest on the standard normal distribution, the same distribution that underlies much of inferential statistics.

Which Error Is Worse?

There's no universal answer. It depends entirely on the consequences in your specific context, and a good researcher reasons about the costs rather than treating one error as always more serious.

In medical testing, a Type I error on a diagnostic test (telling a healthy person they're sick) leads to unnecessary worry and follow-up, while a Type II error (telling a sick person they're healthy) can be fatal. There, false negatives are often the graver risk. In the courtroom analogy, convicting an innocent person is a Type I error, and letting a guilty person go free is a Type II error; most legal systems are deliberately built to make the first kind rarer, even at the cost of more of the second.

In your own research, the honest move is to state which error is more costly for your question and design accordingly. If a missed effect would stall important treatment, prioritize power. If a false positive would send a field down a dead end, prioritize a strict alpha. Naming the tradeoff explicitly is a mark of careful methodology, and reviewers notice when it's missing.

Quick Self-Check Before You Submit

Run your methods and results sections against this checklist before a committee or reviewer sees them.

Did you state your alpha level explicitly, and justify it if it isn't the conventional 0.05?
Did you report a power analysis, ideally a priori, that justifies your sample size?
For any non-significant result, did you address whether the study had enough power to detect the effect you expected?
If you ran multiple tests, did you correct for the inflated Type I error rate?
Did you avoid treating a non-significant result as proof that no effect exists?
Where relevant, did you note which error type is more costly for your question?

If any of those is missing, it's worth addressing before submission, because power and error-rate reasoning are among the first things a careful reviewer checks. This is where a subject-matter editor adds real value, since the person reviewing your work needs to recognize an underpowered design or an overstated null result when they see one. Editor World lets you choose your own editor by field across dissertation editing and journal article editing, so the editor reading your methods understands the standards your discipline holds you to.

Frequently Asked Questions

What is the difference between a Type I and Type II error?

A Type I error is a false positive: you reject a null hypothesis that's actually true, concluding an effect exists when it doesn't. A Type II error is a false negative: you fail to reject a null hypothesis that's actually false, missing an effect that's really present. The probability of a Type I error is alpha, the significance level you set in advance. The probability of a Type II error is beta, which depends on the effect size, sample size, and variability in the data. Both are risks every hypothesis test carries.

What is alpha and what is beta?

Alpha is the probability of a Type I error, the chance of rejecting a true null hypothesis. You set it in advance, almost always at 0.05 and sometimes at 0.01. Beta is the probability of a Type II error, the chance of failing to reject a false null. Unlike alpha, you don't set beta directly. It depends on the true effect size, the sample size, the variability in the data, and the alpha level you chose. A common goal is to keep beta at or below 0.20.

What is statistical power?

Statistical power is the probability of correctly detecting a real effect, equal to 1 minus beta. If beta is 0.20, power is 0.80. By convention, a power of 0.80 is treated as the minimum acceptable level for a well-designed study, meaning an 80% chance of detecting a true effect of the size you expect. A power analysis done before data collection tells you the sample size required to reach your target power, and it's now expected in most dissertation proposals and grant applications.

Why is there a tradeoff between Type I and Type II errors?

For a fixed sample size, lowering alpha to reduce false positives makes significance harder to reach, which raises beta and increases the chance of a false negative. Alpha acts as the bar a result has to clear to count as significant. Raising that bar lets fewer false positives through but also causes more true effects to be missed. The two error rates pull against each other when the sample is held constant. The most direct way to reduce both at once is to increase the sample size: a larger sample shrinks the standard error, so you can tighten alpha and still keep beta low.

How does sample size affect Type I and Type II errors?

Sample size has no direct effect on the Type I error rate, because alpha is set by the researcher regardless of sample size. It has a strong effect on the Type II error rate. A larger sample reduces the variability of the estimate, which increases power and lowers beta, making a real effect easier to detect. That's why a larger sample is the standard way out of the tradeoff: beta falls while alpha stays fixed, or you can tighten alpha without giving up power.

Which error is worse, Type I or Type II?

Neither is universally worse. It depends on the consequences in the specific context. In medical diagnosis, a Type II error that tells a sick person they're healthy can be fatal, so false negatives are often the graver risk. In drug approval, a Type I error that puts an ineffective treatment on the market is especially costly, so false positives are guarded against with strict alpha levels. A careful researcher states which error is more costly for the question and designs the study accordingly, prioritizing either a strict alpha or high power as appropriate.

Page last reviewed: June 2026. Content reviewed and edited by the Editor World editorial team. Editor World, founded in 2010 by Patti Fisher, PhD, provides professional human-only editing, proofreading, and writing services for graduate students, academics, and researchers worldwide. 100% human editing, no AI at any stage. BBB A+ accredited since 2010 with 5.0 / 5 Google Reviews and 5.0 / 5 Facebook Reviews. More than 100 million words edited for over 8,000 clients in 65+ countries. Recommended by the Boston University Economics Department.