Test Pyramid + A Complete Map of Test Types

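"What is the difference between smoke and sanity?" and "Are regression and acceptance the same thing?" are among the questions candidates most often mix up in QA interviews. This article uses diagrams to lay out every test type: how much of each to write, when to run it, who writes it, and why.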

The Classic Test Pyramid (Mike Cohn)

flowchart TB
    subgraph Pyramid
    E2E["🔺 E2E Tests<br>Few - 5%<br>Slow - 1-30 min<br>Spans the whole stack"]
    Int["▱ Integration Tests<br>Some - 25%<br>Medium - 5-30s<br>Multiple components interacting"]
    Unit["⬛ Unit Tests<br>Many - 70%<br>Fast - <100ms<br>Pure function logic"]
    end

    E2E --> Int
    Int --> Unit

    style E2E fill:#ef4444,color:#fff
    style Int fill:#f59e0b,color:#fff
    style Unit fill:#10b981,color:#fff

Layer	Share	Speed	Catches	Written by
Unit	70%	< 100ms	Logic errors	Dev
Integration	25%	5-30s	Component contracts	Dev + QA
E2E	5%	1-30min	User flows	QA

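Why a Pyramid

Wide base: unit tests are plentiful, run fast, catch bugs early, and are cheap to fix.
Narrow top: E2E tests are slow, expensive, and prone to flakiness, so they cover only the critical path.

An inverted pyramid (many E2E tests, few unit tests) is development hell.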
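The shape of a suite can be checked from its test counts. The sketch below is a minimal illustration, assuming the counts per layer are already known; the function names `suite_shape` and `is_inverted` and the sample numbers are hypothetical, not part of any tool.

```python
def suite_shape(counts: dict[str, int]) -> dict[str, float]:
    """Return each layer's share of the whole suite, rounded to two decimals."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("suite has no tests")
    return {layer: round(n / total, 2) for layer, n in counts.items()}


def is_inverted(counts: dict[str, int]) -> bool:
    """True when E2E tests outnumber unit tests, i.e. the pyramid is upside down."""
    return counts.get("e2e", 0) > counts.get("unit", 0)


# Illustrative counts only (assumed, not measured from a real project).
healthy = {"unit": 700, "integration": 250, "e2e": 50}
inverted = {"unit": 0, "integration": 20, "e2e": 100}

print(suite_shape(healthy))   # {'unit': 0.7, 'integration': 0.25, 'e2e': 0.05}
print(is_inverted(inverted))  # True
```

Comparing the computed shares with the 70/25/5 guideline gives a quick signal, but the guideline is a starting point rather than a pass/fail threshold.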

Smoke / Sanity / Regression / Acceptance at a Glance

flowchart LR
    Build[Code arrives] --> Smoke[Smoke Test<br>Does it still run?]
    Smoke -->|pass| Sanity[Sanity Test<br>Does the change work?]
    Sanity -->|pass| Reg[Regression<br>Is the old stuff intact?]
    Reg -->|pass| Accept[Acceptance<br>Is it what users asked for?]
    Accept -->|pass| Release[Release]

    Smoke -->|fail| Stop1[Build unusable]
    Sanity -->|fail| Stop2[Change is broken]
    Reg -->|fail| Stop3[Existing features affected]
    Accept -->|fail| Stop4[Requirements not met]

    style Smoke fill:#06b6d4,color:#fff
    style Sanity fill:#a855f7,color:#fff
    style Reg fill:#10b981,color:#fff
    style Accept fill:#f59e0b,color:#fff

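Type	What it runs	When	Duration
Smoke	Basic availability (login / home page / checkout happy path)	Every incoming build	5-15 min
Sanity	The changed feature plus anything directly related	After a module changes	15-30 min
Regression	All existing features	Before a release	1-4 hr
Acceptance	User Story / Acceptance Criteria	Sprint end	30 min - 2 hr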
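The gate sequence in the flowchart can be sketched as a loop that stops at the first failing stage. This is a minimal sketch, assuming each stage is exposed as a callable returning `True` on success; `run_gates` and `VERDICTS` are hypothetical names.

```python
from typing import Callable

# Gate order and failure verdicts follow the flowchart: smoke, sanity, regression, acceptance.
VERDICTS = {
    "smoke": "build unusable",
    "sanity": "change is broken",
    "regression": "existing features affected",
    "acceptance": "requirements not met",
}


def run_gates(checks: dict[str, Callable[[], bool]]) -> str:
    """Run the gates in order and stop at the first one that fails."""
    for gate, verdict in VERDICTS.items():
        if not checks[gate]():
            return f"stopped at {gate}: {verdict}"
    return "release"


# Stand-in checks (assumed); a real pipeline would invoke the actual suites here.
all_green = {gate: (lambda: True) for gate in VERDICTS}
broken_regression = {**all_green, "regression": lambda: False}

print(run_gates(all_green))          # release
print(run_gates(broken_regression))  # stopped at regression: existing features affected
```

Ordering the cheap gates first means a broken build is rejected in minutes instead of after a multi-hour regression run.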

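Telling Smoke from Sanity

These two are the easiest to confuse:

Smoke: "This build runs" (broad and shallow; confirms basic availability)
Sanity: "This change works" (narrow and deep; confirms the change is correct)

Example:

A dev changes the checkout feature and pushes the code.

Smoke: open the home page, log in, add to cart, check out. The full happy path, end to end.
Sanity: checkout form field validation, credit card input, coupons, invoices.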
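One way to keep the two suites separate is to tag each check and select by tag at run time. Test runners usually provide this directly (pytest markers with `pytest -m smoke`, for instance); the sketch below is a hand-rolled standard-library version, and every check name in it is hypothetical.

```python
from collections import defaultdict

REGISTRY = defaultdict(list)  # tag -> names of the checks carrying that tag


def tagged(*tags):
    """Register a check under one or more suite tags."""
    def decorator(fn):
        for tag in tags:
            REGISTRY[tag].append(fn.__name__)
        return fn
    return decorator


# Hypothetical checks for the checkout example; bodies omitted.
@tagged("smoke")
def home_page_loads(): ...

@tagged("smoke")
def user_can_log_in(): ...

@tagged("smoke", "sanity")
def checkout_happy_path(): ...

@tagged("sanity")
def coupon_is_applied(): ...

@tagged("sanity")
def invoice_fields_validate(): ...


print(REGISTRY["smoke"])   # ['home_page_loads', 'user_can_log_in', 'checkout_happy_path']
print(REGISTRY["sanity"])  # ['checkout_happy_path', 'coupon_is_applied', 'invoice_fields_validate']
```

The smoke list stays broad and short, while the sanity list goes deep on the module that changed.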

Modern Variant: Testing Trophy

flowchart TB
    subgraph Trophy
    E2E2["🔺 E2E<br>Few"]
    Int2["▱▱▱ Integration<br>Many - the workhorse"]
    Unit2["▱ Unit<br>Some"]
    Static["⬜ Static<br>Base - linter/type"]
    end

    style E2E2 fill:#ef4444,color:#fff
    style Int2 fill:#10b981,color:#fff
    style Unit2 fill:#06b6d4,color:#fff
    style Static fill:#a855f7,color:#fff

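Proposed by Kent C. Dodds, and a good fit for front-end React applications:

Static (TypeScript / ESLint) forms the base
Integration (React Testing Library) becomes the workhorse
Unit tests shrink (still written for pure functions)
E2E stays small

Rationale: a front-end "unit" is rarely truly isolated, so integration tests give a better ROI.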

Modern Variant: Testing Honeycomb

flowchart TB
    subgraph Honeycomb
    UI["◇ UI tests<br>Few"]
    Implementation["◇◇◇ Integration tests<br>The workhorse"]
    Detail["◇ Implementation detail tests<br>Few"]
    end

    style UI fill:#ef4444,color:#fff
    style Implementation fill:#10b981,color:#fff
    style Detail fill:#06b6d4,color:#fff

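Proposed by Spotify, and a good fit for microservices:

Lots of integration tests (interactions between services)
A few unit tests (pure logic)
A few E2E tests (critical flows)

Key point: the pyramid is not the only answer. It depends on your stack and your pain points.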

Eight Families of Test Types at a Glance

mindmap
  root((Test types))
    Functional
      Unit
      Integration
      E2E
      System
      Acceptance
    Non-functional
      Performance
      Security
      Accessibility
      Usability
      Compatibility
    Timing
      Smoke
      Sanity
      Regression
      Confirmation
    Technique
      Black box
      White box
      Gray box
    Stage
      Static analysis
      Dynamic
      Manual
      Automation
    Exploratory
      Session-based
      Charter-driven
      Pair testing
    Production
      Canary
      A/B
      Feature flag
      Synthetic monitoring
    Specialized
      Chaos
      Mutation
      Property-based
      Snapshot / Visual

Functional Testing

Unit Test

# Pure function, no external dependencies
def test_calculate_discount():
    assert calculate_discount(100, 0.1) == 90

Speed: fastest (milliseconds)
Scope: a single function
External dependencies are mocked
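When the function under test does have an external dependency, the unit test replaces it with a mock. A minimal sketch using the standard library's `unittest.mock`; `price_after_discount` and the `rate_source` object are hypothetical stand-ins, not code from a real project.

```python
from unittest.mock import Mock


def price_after_discount(price: int, rate_source) -> int:
    """Pricing logic; the discount percentage comes from an injected dependency."""
    percent_off = rate_source.percent_off()  # assumed to be a remote call in production
    return price * (100 - percent_off) // 100


def test_price_after_discount_uses_rate():
    rate_source = Mock()
    rate_source.percent_off.return_value = 10  # mocked: no real service is contacted
    assert price_after_discount(100, rate_source) == 90
    rate_source.percent_off.assert_called_once()


test_price_after_discount_uses_rate()
```

Because nothing leaves the process, a test like this runs in milliseconds and fails only when the logic itself is wrong.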

Integration Test

# Multiple components interacting
def test_create_order_via_api(db, mock_payment):
    resp = client.post('/orders', json={...})
    assert resp.status_code == 201
    assert Order.objects.count() == 1
    mock_payment.assert_called_once()

Speed: medium (seconds)
Scope: multiple components
DB / Redis / internal APIs use real connections; external third parties are mocked
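The "real database, mocked third party" split can be tried without any framework. The sketch below assumes an in-memory SQLite database as the real store and a `Mock` as the payment gateway; `create_order` and the `orders` schema are hypothetical.

```python
import sqlite3
from unittest.mock import Mock


def create_order(conn: sqlite3.Connection, payment, item: str, amount: int) -> int:
    """Charge through the payment gateway, then persist the order."""
    payment.charge(amount)
    cur = conn.execute("INSERT INTO orders (item, amount) VALUES (?, ?)", (item, amount))
    conn.commit()
    return cur.lastrowid


def test_create_order_persists_and_charges():
    conn = sqlite3.connect(":memory:")  # real SQL engine, throwaway database
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, amount INTEGER)")
    payment = Mock()  # the third-party gateway stays mocked

    order_id = create_order(conn, payment, "book", 300)

    assert order_id == 1
    assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 1
    payment.charge.assert_called_once_with(300)
    conn.close()


test_create_order_persists_and_charges()
```

The test exercises real SQL, so schema and query mistakes surface here, while the payment provider is never contacted.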

E2E (System Test)

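// Spans the whole stack (Playwright Test)
import { test, expect } from '@playwright/test';

test('user can checkout', async ({ page }) => {
  await page.goto('/');
  await page.click('text=Add to cart');
  // ... rest of the flow
  await expect(page).toHaveURL('/order-success');
});

Speed: slow (minutes)
Scope: the whole system
Everything is real, end to end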

Acceptance Test

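Verifies that the business accepts the result.

User Story: customers who buy 3 or more items get 10% off
Acceptance Criteria:
  - 2 items: no discount
  - 3 items: 10% off
  - 5 items: 10% off (discounts do not stack)
  - Items from different categories count toward the total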
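Acceptance criteria translate naturally into a table-driven check, one row per criterion. A minimal sketch, assuming prices are integer cents and that the discount depends only on the item count; `order_total` is a hypothetical implementation written for this example.

```python
def order_total(unit_prices: list[int]) -> int:
    """Total in cents: 10% off when the order has three or more items."""
    subtotal = sum(unit_prices)
    if len(unit_prices) >= 3:
        return subtotal * 90 // 100
    return subtotal


# One row per acceptance criterion: (description, unit prices in cents, expected total)
CRITERIA = [
    ("2 items: no discount", [1000, 1000], 2000),
    ("3 items: 10% off", [1000, 1000, 1000], 2700),
    ("5 items: still 10% off, no stacking", [1000] * 5, 4500),
    ("items from different categories count together", [500, 1500, 1000], 2700),
]

for description, prices, expected in CRITERIA:
    assert order_total(prices) == expected, description
print("all acceptance criteria pass")
```

Keeping the descriptions next to the data lets a PM read the table and confirm it matches the story before sign-off.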

Confirmation Test

After a bug is fixed, confirm that it is really fixed. Unlike regression testing, a confirmation test targets that specific bug.

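Non-functional Testing

Performance Test

Further reading: Introduction to Performance Testing

Subtypes:

Load Test (normal volume)
Stress Test (above-normal volume, to find the breaking point)
Spike Test (sudden burst)
Soak Test (long duration, to find memory leaks)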
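The four subtypes differ mainly in the traffic shape they apply. The sketch below expresses each as a list of target requests per second, one entry per minute; the multipliers (a linear ramp for stress, a 10x burst for spike) are illustrative assumptions, and `load_profile` is a hypothetical helper rather than part of any load-testing tool.

```python
def load_profile(kind: str, baseline: int, minutes: int) -> list[int]:
    """Target requests per second for each minute of the run."""
    if kind == "load":    # steady, normal traffic
        return [baseline] * minutes
    if kind == "stress":  # ramp up by one baseline per minute to look for the breaking point
        return [baseline * (m + 1) for m in range(minutes)]
    if kind == "spike":   # normal traffic with one sudden burst in the middle
        return [baseline * 10 if m == minutes // 2 else baseline for m in range(minutes)]
    if kind == "soak":    # same shape as load; the difference is a much larger `minutes`
        return [baseline] * minutes
    raise ValueError(f"unknown profile: {kind}")


print(load_profile("stress", 100, 4))  # [100, 200, 300, 400]
print(load_profile("spike", 100, 5))   # [100, 100, 1000, 100, 100]
```

Load and soak share a shape on purpose: what changes is the duration, since leaks only show up after hours of steady traffic.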

Security Test

Usability Test

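Real users operate the product
Observe friction points
A/B test variants
User journey map

Compatibility Test

Cross-browser (Chrome / Firefox / Safari / Edge)
Cross-device (iOS / Android / Windows / Mac)
Across screen sizes
Across networks (4G / WiFi / slow connections)
Across languages / regions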

Technique

flowchart LR
    Tech[Test techniques] --> BB[Black Box<br>No view of the code]
    Tech --> WB[White Box<br>Full view of the code]
    Tech --> GB[Gray Box<br>Partial knowledge]

    BB --> BB1[Equivalence partitioning]
    BB --> BB2[Boundary value analysis]
    BB --> BB3[Decision table]
    BB --> BB4[State transition]

    WB --> WB1[Statement coverage]
    WB --> WB2[Branch coverage]
    WB --> WB3[Path coverage]

    GB --> GB1[API + DB, structure known]
    GB --> GB2[Detox gray-box]

    style BB fill:#06b6d4,color:#fff
    style WB fill:#10b981,color:#fff
    style GB fill:#a855f7,color:#fff
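Two of the black-box techniques in the diagram, boundary value analysis and equivalence partitioning, reduce to picking inputs systematically. A minimal sketch for an integer field with a valid range; the range 1 to 99 and both helper names are assumptions made for illustration.

```python
def boundary_values(low: int, high: int) -> list[int]:
    """Values just outside, on, and just inside each edge of the valid range [low, high]."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def partition_representatives(low: int, high: int) -> dict[str, int]:
    """One representative per equivalence class: below, inside, and above the range."""
    return {"below": low - 10, "valid": (low + high) // 2, "above": high + 10}


# Hypothetical rule: a quantity field accepts 1 to 99.
print(boundary_values(1, 99))            # [0, 1, 2, 98, 99, 100]
print(partition_representatives(1, 99))  # {'below': -9, 'valid': 50, 'above': 109}
```

Nine inputs cover the cases most likely to hide an off-by-one error, without reading a line of the implementation.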

Shift-Left vs Shift-Right

flowchart LR
    Old[Traditional] --> O1[Dev writes] --> O2[QA tests] --> O3[Release] --> O4[Prod]

    SL[Shift-Left] --> SL1[Spec review] --> SL2[Dev + Test written together] --> SL3[CI catches early]

    SR[Shift-Right] --> SR1[Canary deploy] --> SR2[A/B test] --> SR3[Production monitoring]

    style Old fill:#9ca3af,color:#fff
    style SL fill:#06b6d4,color:#fff
    style SR fill:#a855f7,color:#fff

Shift-Left (move left: catch early)

Spec review
TDD / BDD
Pair programming
Pre-commit hooks
Early unit tests

Bugs are caught during development.

Shift-Right (move right: test in prod)

Canary release (try on 1% of traffic)
Blue / Green
Feature flags
A/B testing
Real-user monitoring
Synthetic monitoring

The real environment catches what development cannot.

Modern QA does both.

Production Testing

flowchart TD
    Prod[Production Testing] --> P1[Synthetic monitoring<br>Run the happy path on a schedule]
    Prod --> P2[Canary deploy<br>Try 1% of traffic first]
    Prod --> P3[Feature flag<br>Enable internally first]
    Prod --> P4[A/B test<br>Control group]
    Prod --> P5[Chaos engineering<br>Break things on purpose]
    Prod --> P6[Real-user monitoring<br>Watch real metrics]

    style Prod fill:#ef4444,color:#fff

At large companies such as Netflix, testing in production is routine.
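Sending 1% of traffic to a canary needs a stable way to decide which users belong to that 1%. One common approach is hash-based bucketing, sketched below under the assumption that each request carries a user ID; `in_canary` is a hypothetical helper, and real deployments often do this at the load balancer or in a feature-flag service instead.

```python
import hashlib


def in_canary(user_id: str, percent: float = 1.0) -> bool:
    """Deterministically place about `percent`% of users in the canary group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # 0..9999
    return bucket < percent * 100


# Synthetic user IDs, used only to show the resulting share.
users = [f"user-{i}" for i in range(100_000)]
share = sum(in_canary(u) for u in users) / len(users)
print(f"canary share: {share:.2%}")  # close to 1%; the exact value depends on the hash
```

Hashing the ID instead of rolling a random number keeps each user on the same side across requests, so one person does not flip between the old and new versions.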

Specialized Types

Mutation Testing

Tests how good your tests are.

A tool changes (mutates) your code and checks whether your tests catch the change.

// Original code
function isAdult(age) { return age >= 18; }

// Mutant 1
function isAdult(age) { return age > 18; }  // >= changed to >

// If your tests still pass, they are not good enough
// Tools: Stryker (JS), PIT (Java), mutmut (Python)
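The same idea can be reproduced by hand in a few lines. The sketch below writes the mutant manually, whereas a mutation tool would generate it, and compares a weak suite with one that includes the boundary value; both suite functions are hypothetical.

```python
def is_adult(age: int) -> bool:
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    return age > 18  # mutation: >= replaced with >


def weak_suite(fn) -> bool:
    """Only checks values far from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary values, where the original and the mutant disagree."""
    return weak_suite(fn) and fn(18) is True and fn(17) is False


for name, suite in [("weak", weak_suite), ("strong", strong_suite)]:
    # A mutant is "killed" when the suite passes on the original but fails on the mutant.
    killed = suite(is_adult) and not suite(is_adult_mutant)
    print(f"{name} suite: mutant {'killed' if killed else 'survived'}")
# weak suite: mutant survived
# strong suite: mutant killed
```

A surviving mutant points at a missing assertion, here the check at exactly 18.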

Property-based Testing

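Instead of writing concrete cases, you write the "rule": a property that must hold for every input.

from hypothesis import given, strategies as st

def reverse(lst):
    # Function under test, defined here so the example runs on its own
    return lst[::-1]

@given(st.lists(st.integers()))
def test_reverse_twice_is_identity(lst):
    assert reverse(reverse(lst)) == lst

By default, Hypothesis generates 100 cases per test.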
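To see what a property-based runner does conceptually, the loop can be written by hand with the standard library. This is a simplified sketch, not how Hypothesis works internally: it has no shrinking and no smart generation, and `check_property` is a hypothetical helper.

```python
import random


def reverse(items: list[int]) -> list[int]:
    return items[::-1]


def check_property(prop, runs: int = 100, seed: int = 0) -> None:
    """Call `prop` with `runs` random integer lists; raise on the first counterexample."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(runs):
        size = rng.randint(0, 20)
        case = [rng.randint(-1000, 1000) for _ in range(size)]
        if not prop(case):
            raise AssertionError(f"property failed for {case!r}")


check_property(lambda xs: reverse(reverse(xs)) == xs)  # holds for every generated list

try:
    check_property(lambda xs: reverse(xs) == xs)  # false claim: every list is a palindrome
except AssertionError as exc:
    print(exc)  # reports the first generated list that breaks the claim
```

A real library adds shrinking on top of this, reducing a failing input to the smallest case that still fails, which is most of the practical value.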

Visual Regression

Compares screenshots for differences.

await expect(page).toHaveScreenshot('homepage.png');

Tools: Playwright snapshot / Percy / Applitools / Chromatic
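Under the hood, a visual comparison comes down to counting differing pixels and checking the result against a tolerance. The sketch below models an image as a flat list of RGB tuples, an assumption made to keep the example dependency-free; real tools decode PNGs and usually add anti-aliasing tolerance. The `max_diff_ratio` parameter is modeled loosely on tolerance options such as Playwright's `maxDiffPixelRatio`.

```python
Pixel = tuple[int, int, int]


def diff_ratio(baseline: list[Pixel], current: list[Pixel]) -> float:
    """Fraction of pixels that differ between two same-sized images."""
    if len(baseline) != len(current):
        raise ValueError("images must have the same dimensions")
    changed = sum(1 for a, b in zip(baseline, current) if a != b)
    return changed / len(baseline)


def matches_baseline(baseline: list[Pixel], current: list[Pixel],
                     max_diff_ratio: float = 0.01) -> bool:
    """True when the share of changed pixels stays within the tolerance."""
    return diff_ratio(baseline, current) <= max_diff_ratio


white = (255, 255, 255)
red = (255, 0, 0)
baseline = [white] * 100
current = [white] * 97 + [red] * 3  # 3 of 100 pixels changed

print(diff_ratio(baseline, current))        # 0.03
print(matches_baseline(baseline, current))  # False
```

The tolerance is the knob that trades false alarms from rendering noise against missed regressions.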

Snapshot Testing

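Saves a component's rendered output to a file and diffs against it whenever the output changes.

// Jest + react-test-renderer
import renderer from 'react-test-renderer';
expect(renderer.create(<Button />).toJSON()).toMatchSnapshot();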
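The mechanism is simple enough to sketch with the standard library: write the output on the first run, compare on later runs. `matches_snapshot` and the dictionary standing in for a rendered component are hypothetical; Jest's own snapshot format and update workflow differ.

```python
import json
import tempfile
from pathlib import Path


def matches_snapshot(name: str, value, directory: Path) -> bool:
    """Store `value` on the first run; on later runs compare against the stored copy."""
    path = directory / f"{name}.snap.json"
    serialized = json.dumps(value, indent=2, sort_keys=True)
    if not path.exists():
        path.write_text(serialized, encoding="utf-8")  # first run records the snapshot
        return True
    return path.read_text(encoding="utf-8") == serialized


with tempfile.TemporaryDirectory() as tmp:
    snapshots = Path(tmp)
    button = {"tag": "button", "class": "primary", "text": "Buy"}
    print(matches_snapshot("button", button, snapshots))  # True  (snapshot written)
    print(matches_snapshot("button", button, snapshots))  # True  (unchanged)
    button["class"] = "secondary"
    print(matches_snapshot("button", button, snapshots))  # False (output changed)
```

A failing snapshot only says that something changed; a person still has to decide whether the change is a bug or an intended update.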

Chaos Engineering

Break things on purpose and see whether the system holds up.

Randomly kill pods
Cut the network
Add latency
Overload the DB

Tools: Chaos Monkey / Litmus / Chaos Mesh
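At the code level, the same principle can be practiced by injecting faults into a dependency and checking that the caller recovers. This is a small-scale sketch, not how the tools above operate (they act on infrastructure); `inject_failures`, `call_with_retry`, and `fetch_profile` are hypothetical.

```python
import time


def inject_failures(fn, failures: int):
    """Wrap `fn` so its first `failures` calls raise ConnectionError."""
    remaining = failures

    def wrapper(*args, **kwargs):
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise ConnectionError("injected fault")
        return fn(*args, **kwargs)

    return wrapper


def call_with_retry(fn, attempts: int = 3, delay: float = 0.0):
    """Retry on ConnectionError, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay)


def fetch_profile() -> dict:
    return {"user": "alice"}  # stand-in for a real service call


print(call_with_retry(inject_failures(fetch_profile, failures=2)))  # {'user': 'alice'}
```

With two injected faults and three attempts the call succeeds; raising `failures` to three exposes what happens when the retry budget runs out.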

How to Choose

flowchart TD
    Q[What do you need to test?] --> Q1{Is the business<br>logic correct?}
    Q --> Q2{Do components<br>work together?}
    Q --> Q3{User<br>experience?}
    Q --> Q4{Is it fast?}
    Q --> Q5{Is it secure?}

    Q1 --> Unit
    Q2 --> Integration
    Q3 --> E2E
    Q4 --> Perf[Performance]
    Q5 --> Sec[Security]

    style Unit fill:#10b981,color:#fff
    style Integration fill:#f59e0b,color:#fff
    style E2E fill:#ef4444,color:#fff
    style Perf fill:#a855f7,color:#fff
    style Sec fill:#06b6d4,color:#fff

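QA's Role at Each Layer

Unit: written by devs. QA watches coverage and pushes for more tests.
Integration: devs and QA together. QA reviews the test scenarios.
E2E: QA leads. Devs help occasionally.
Smoke: automated; designed by QA, run by CI.
Regression: maintained and automated by QA; run at release.
Acceptance: led by QA, signed off by the PM.
Performance / Security / A11y: a dedicated QA engineer or a specialist.
Exploratory: depends on the individual QA engineer's skill.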

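Anti-patterns

flowchart TD
    Anti[Pyramid anti-patterns] --> A1["Inverted pyramid: 100 E2E tests, 0 unit tests"]
    Anti --> A2["E2E at 50%: the suite takes half a day"]
    Anti --> A3["Watching only the coverage % number"]
    Anti --> A4["Smoke suite too detailed: it turns into regression"]
    Anti --> A5["Regression suite unmaintained: it decays with every run"]
    Anti --> A6["Unit and integration tests mixed together"]
    Anti --> A7["No shift-right at all"]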

    style A1 fill:#ef4444,color:#fff
    style A2 fill:#ef4444,color:#fff
    style A3 fill:#ef4444,color:#fff
    style A4 fill:#ef4444,color:#fff
    style A5 fill:#ef4444,color:#fff
    style A6 fill:#ef4444,color:#fff
    style A7 fill:#ef4444,color:#fff

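Interview Questions

Q: Explain the Test Pyramid

Good answer:

From bottom to top: unit (the most), integration, E2E (the fewest). The base is wide because those tests are fast, cheap, and catch logic bugs; the top is narrow because those tests are slow, expensive, and prone to flakiness. The ratio is roughly 70/25/5. That is the classic form, though. Modern variants include the Testing Trophy (front end) and the Honeycomb (microservices). The point is to pick the ratio that fits your stack and your pain points.

Q: What is the difference between smoke and sanity?

Good answer:

Smoke is broad and shallow: it confirms the build is still usable. Sanity is narrow and deep: it confirms the feature that just changed is correct. Smoke runs on every build; sanity runs after a module changes.

Q: How do you decide between a unit test and an E2E test?

Good answer:

It depends on the type of bug you want to catch. Pure logic or pure functions: unit. Contracts between components or services: integration. A complete user flow: E2E. The rule: catch it at the lowest layer that can catch it, and reserve E2E for the critical path.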

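Five Takeaways for QA

The pyramid is a starting point, not dogma
How much of each type to write depends on your stack and your pain points
Keeping smoke, sanity, and regression distinct is a sign of a healthy process
Do both shift-left and shift-right
Designing the test mix that fits your team matters more than memorizing terms

Final Thoughts

Knowing the framework of test types is not about showing off terminology; it helps you decide what to write, what to run, and what to skip. If the team has no unit tests, push the devs to start writing them. If there are too many E2E tests, convert some to integration tests. If regression keeps getting slower, split it into a smoke suite and the main suite. Theory only has value when it is used to solve real problems.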
