7.1 — Error Handling Patterns — Retry, Fallback & Alert
Your workflow will break. Not maybe — it WILL. APIs go down, rate limits get hit, data arrives in unexpected formats, network connections drop. The difference between an amateur automation and a production system is how it handles failure. This lesson teaches you the three pillars of error handling in n8n: retry, fallback, and alert.
Why Error Handling Matters
WITHOUT error handling:
Workflow runs → API returns 500 → Workflow stops → Data lost
→ You don't know it failed until the client complains 3 days later
WITH error handling:
Workflow runs → API returns 500 → Retry 3 times → Still failing
→ Fallback to alternative service → Alert you on Slack
→ Data preserved → Client never notices
The Three Pillars
┌─────────────────────────────────────────┐
│ ERROR HANDLING │
│ │
│ 1. RETRY → Try again (transient) │
│ 2. FALLBACK → Use alternative (down) │
│ 3. ALERT → Notify human (critical) │
│ │
│ Order: Retry first → Fallback second │
│ → Alert always │
└─────────────────────────────────────────┘
Pillar 1: Retry
When to Retry
| Error Type | Should Retry? | Why |
|---|---|---|
| 500 Server Error | Yes | Server is temporarily overloaded |
| 429 Rate Limited | Yes (with delay) | Wait for rate limit window to reset |
| Timeout | Yes | Network blip or slow response |
| Connection Refused | Yes (limited) | Server may be restarting |
| 400 Bad Request | No | Your request is wrong — retrying won't fix it |
| 401 Unauthorized | No | Your credentials are wrong |
| 404 Not Found | No | Resource doesn't exist |
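The decision table above can be sketched as a small helper. This is a plain-JavaScript sketch of the logic, not an n8n API — in an n8n Code node you would read the status code from the failed node's error details; here it is a plain parameter:

```javascript
// Sketch of the retry decision table: transient errors (5xx, 429,
// timeouts) are worth retrying; 4xx client errors are not.
function shouldRetry(statusCode, isTimeout = false) {
  if (isTimeout) return true;           // network blip or slow response
  if (statusCode === 429) return true;  // rate limited — retry with delay
  if (statusCode >= 500) return true;   // server-side, likely transient
  return false;                         // 400/401/404 won't fix themselves
}

console.log(shouldRetry(500)); // transient server error → retry
console.log(shouldRetry(401)); // bad credentials → don't retry
```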
n8n Built-In Retry
Any node → Settings (gear icon):
☑ Retry On Fail: Yes
Max Tries: 3
Wait Between Tries (ms): 2000
This retries the same node up to 3 times with 2-second gaps.
Works for: HTTP Request, Email, any external API call.
Exponential Backoff (Advanced)
For rate-limited APIs, retrying at a constant interval just keeps hitting the same limit. Use exponential backoff — double the wait after each failed attempt:
Attempt 1: Wait 1 second
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds
This prevents hammering a rate-limited API.
Implementation in n8n (Function node before retry):
const attempt = $json.retry_count || 0;
const waitMs = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, 8s
// Pass wait_seconds to a Wait node configured with a dynamic wait time
return [{
  json: {
    ...$json,
    retry_count: attempt + 1,
    wait_seconds: waitMs / 1000
  }
}];
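One refinement worth knowing: if several workflow executions hit the rate limit at the same moment, plain doubling makes them all retry in lockstep. Adding random "jitter" spreads the retries out. A minimal sketch (the helper name and the 30-second cap are assumptions, not n8n features):

```javascript
// Exponential backoff with "full jitter": wait a random amount
// between 0 and the doubled ceiling, so parallel executions don't
// all retry against the same API at the same instant.
function backoffMs(attempt, baseMs = 1000, capMs = 30000) {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(Math.random() * ceiling); // anywhere in [0, ceiling)
}

// Ceilings grow 1s, 2s, 4s, 8s … capped at 30s:
for (let attempt = 0; attempt < 4; attempt++) {
  console.log(`attempt ${attempt}: wait up to ${backoffMs(attempt)}ms`);
}
```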
Retry with Circuit Breaker
If an API keeps failing, stop hammering it:
[HTTP Request] → Error?
│
├── Attempt 1 failed → Wait 2s → Retry
├── Attempt 2 failed → Wait 4s → Retry
├── Attempt 3 failed → STOP RETRYING
│ │
│ ▼
│ [Circuit Open: Skip this API for 5 minutes]
│ │
│ ▼
│ [Fallback or Alert]
│
└── Success → Continue workflow
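The circuit-breaker state above can be sketched in plain JavaScript. In n8n you could persist this state between executions with `$getWorkflowStaticData('global')` in a Code node; here, as an illustration under that assumption, it lives in a plain closure:

```javascript
// Minimal circuit-breaker sketch: after maxFailures consecutive
// failures, the circuit "opens" and callers skip the API until the
// cooldown elapses, then it allows a fresh try.
function createBreaker(maxFailures = 3, cooldownMs = 5 * 60 * 1000) {
  const state = { failures: 0, openedAt: null };
  return {
    // true while the circuit is open → skip the API, go to fallback/alert
    isOpen(now = Date.now()) {
      if (state.openedAt === null) return false;
      if (now - state.openedAt >= cooldownMs) {
        state.failures = 0;      // cooldown over → allow a fresh attempt
        state.openedAt = null;
        return false;
      }
      return true;
    },
    recordFailure(now = Date.now()) {
      state.failures += 1;
      if (state.failures >= maxFailures) state.openedAt = now; // trip it
    },
    recordSuccess() {
      state.failures = 0;
      state.openedAt = null;
    },
  };
}

const breaker = createBreaker();
breaker.recordFailure();
breaker.recordFailure();
breaker.recordFailure();       // third failure trips the circuit
console.log(breaker.isOpen()); // true → skip this API for 5 minutes
```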
Pillar 2: Fallback
Fallback Patterns
Pattern A: Alternative Service
Primary: Send email via SMTP
Fallback: Send email via Gmail API
Fallback 2: Send email via SendGrid
[SMTP Node] → Error?
│
├── Success → Continue
│
└── Failed → [Gmail Node] → Error?
│
├── Success → Continue
│
└── Failed → [SendGrid HTTP Request] → Error?
│
├── Success → Continue
└── Failed → [Alert: All email services down]
Pattern B: Degraded Output
Primary: Get full enrichment (Hunter + Wappalyzer + WHOIS)
Fallback: Return partial data (whatever succeeded)
[HTTP: Hunter API] → Error? → Set hunter_data = null
[HTTP: Wappalyzer] → Error? → Set wappalyzer_data = null
[HTTP: WHOIS] → Error? → Set whois_data = null
[Merge all results]
[IF: All null → Alert]
[ELSE: Continue with partial data + note what's missing]
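The merge step of Pattern B can be sketched as follows. The field names (`hunter_data`, etc.) come from the flow above; in n8n this logic would sit in a Code node after the Merge node:

```javascript
// Sketch of degraded output: keep whatever enrichment calls
// succeeded, record which ones came back null, and flag the
// all-failed case so the workflow can route to an alert.
function mergeEnrichment(results) {
  const missing = Object.keys(results).filter((k) => results[k] === null);
  return {
    ...results,
    missing,                               // e.g. ['whois_data']
    complete: missing.length === 0,
    allFailed: missing.length === Object.keys(results).length,
  };
}

const merged = mergeEnrichment({
  hunter_data: { emails: 3 },
  wappalyzer_data: { tech: ['Shopify'] },
  whois_data: null,                        // WHOIS call failed → null
});
console.log(merged.missing);   // ['whois_data']
console.log(merged.allFailed); // false → continue with partial data
```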
Pattern C: Queue for Later
Primary: Process immediately
Fallback: Save to retry queue, process later
[HTTP Request] → Error?
│
├── Success → Process normally
│
└── Failed → [Google Sheets: Add to "Retry Queue" sheet]
Columns: payload, error_message, timestamp, retry_count
[Separate workflow: Schedule Trigger every 30 min]
→ Read "Retry Queue" sheet
→ Retry each failed item
→ If success: Remove from queue
→ If still failing: Increment retry_count
→ If retry_count > 5: Move to "Dead Letter" sheet + Alert
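The routing decision in that retry workflow can be sketched as a small function. The `succeeded` flag stands in for the result of the re-attempt, and the action names are illustrative, not n8n terms:

```javascript
// Sketch of retry-queue routing: a re-attempted item either leaves
// the queue, stays with an incremented retry_count, or moves to the
// dead-letter sheet (plus an alert) after 5 exhausted retries.
const MAX_RETRIES = 5;

function routeQueueItem(item, succeeded) {
  if (succeeded) return { action: 'remove' };       // done, drop from queue
  const retryCount = (item.retry_count || 0) + 1;
  if (retryCount > MAX_RETRIES) {
    return { action: 'dead_letter', alert: true };  // give up + notify human
  }
  return { action: 'requeue', retry_count: retryCount };
}

console.log(routeQueueItem({ retry_count: 2 }, false)); // requeue, now attempt 3
console.log(routeQueueItem({ retry_count: 5 }, false)); // dead letter + alert
```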
Implementing Fallback in n8n
n8n Error Workflow Pattern:
1. On the node that might fail:
Settings → On Error → "Continue" (don't stop the workflow)
2. After the node, add an IF node:
Condition: {{$json.error}} exists (with "On Error: Continue", n8n passes the error details through on the item's json)
3. Route:
Error path → Fallback nodes
Success path → Normal processing
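The routing step can be sketched in plain JavaScript. The item shape is an assumption based on the pattern above — with "On Error: Continue", failed items carry an `error` field on their json:

```javascript
// Sketch of the IF-node routing: split items into an error path
// (fallback) and a success path (normal processing) based on the
// presence of an 'error' field.
function routeItems(items) {
  const failed = items.filter((i) => i.json.error !== undefined);
  const ok = items.filter((i) => i.json.error === undefined);
  return { failed, ok };
}

const { failed, ok } = routeItems([
  { json: { id: 1 } },
  { json: { id: 2, error: { message: '500 Server Error' } } },
]);
console.log(failed.length); // 1 → fallback path
console.log(ok.length);     // 1 → normal processing
```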
Pillar 3: Alert
What to Alert On
| Severity | When | Alert Channel | Response Time |
|---|---|---|---|
| Critical | All retries + fallbacks failed, data loss risk | WhatsApp + Slack + Email | Immediate |
| Warning | Fallback activated, degraded but working | Slack + Email | Within 1 hour |
| Info | Retry succeeded, temporary blip | Slack only | No action needed |
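The severity table above maps naturally to a routing helper. The channel names are assumptions standing in for the corresponding n8n nodes:

```javascript
// Sketch of severity-based alert routing: each level fans out to
// the channels from the table above; unknown levels escalate to the
// loudest route rather than failing silently.
const ALERT_ROUTES = {
  critical: ['whatsapp', 'slack', 'email'], // immediate response
  warning: ['slack', 'email'],              // within 1 hour
  info: ['slack'],                          // no action needed
};

function channelsFor(severity) {
  return ALERT_ROUTES[severity] || ALERT_ROUTES.critical; // unknown → loudest
}

console.log(channelsFor('warning')); // ['slack', 'email']
console.log(channelsFor('info'));    // ['slack']
```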
Alert Message Template
n8n WORKFLOW ALERT
Severity: 🔴 CRITICAL / 🟡 WARNING / 🔵 INFO
Workflow: [Workflow Name]
Node: [Node that failed]
Error: [Error message]
Time: [Timestamp]
Attempts: [X retries exhausted]
Fallback: [Activated / Not available]
Data affected: [X items / specific item ID]
Action needed: [What the human should do]
Link to execution: [n8n execution URL]
Slack Alert Node Configuration
[Slack Node]
Channel: #automation-alerts
Message:
🔴 *WORKFLOW FAILURE*
*Workflow:* {{$workflow.name}}
*Node:* {{$node["HTTP Request"].name}}
*Error:* {{$json.error.message}}
*Time:* {{$now.format('YYYY-MM-DD HH:mm:ss')}}
*Action:* Check the API status and retry manually if needed
*Execution:* {{$execution.url}}
Email Alert (If No Slack)
[Gmail Node]
To: your-email@gmail.com
Subject: [ALERT] n8n Workflow Failed — {{$workflow.name}}
Body:
Workflow "{{$workflow.name}}" failed at node "{{$json.failed_node}}".
Error: {{$json.error_message}}
Time: {{$now.toISO()}}
Items affected: {{$json.items_count}}
Please check the n8n dashboard:
{{$execution.url}}
Complete Error Handling Template
Here's a production-ready pattern combining all three pillars:
[Trigger] → [Process Data] → [HTTP Request: External API]
│
┌─────┴─────┐
│ Settings: │
│ On Error: │
│ Continue │
│ Retry: 3 │
│ Wait: 2000 │
└─────┬─────┘
│
[IF: Error?]
│ │
NO (Success) YES (All retries failed)
│ │
[Continue] [Fallback: Alternative API]
│
[IF: Fallback Error?]
│ │
NO (OK) YES (Both failed)
│ │
[Continue [Queue for later]
+ Warning] │
[CRITICAL Alert:
Slack + Email]
Practice Lab
Task 1: Build a Retry Workflow Create a workflow that calls an HTTP endpoint. Simulate failure by using a URL that doesn't exist. Configure retry with 3 attempts and 2-second wait. After all retries fail, send yourself an email alert with the error details.
Task 2: Fallback Chain Build a workflow that tries to send an email via SMTP. If SMTP fails (simulate by using wrong credentials), fall back to Gmail API. If that also fails, log the unsent email to Google Sheets with all details for manual sending later.
Task 3: Complete Error Handling System Create a "canary" workflow that runs every hour, checks if 3 critical APIs are responding (Google Sheets, Gmail, and any other API you use). If any fail, send a Slack/email alert. This is your early warning system.
Pakistan Case Study
Meet Arslan — runs automation services for 6 small businesses in Karachi.
His nightmare scenario (before error handling):
- Client's Shopify order sync workflow broke silently on a Friday night
- API credentials had expired — every call returned 401
- 47 orders came in over the weekend — none were synced
- Client discovered on Monday that no WhatsApp confirmations were sent
- 12 customers had called the client's helpline confused about their orders
- Arslan spent 6 hours manually processing the backlog
- Client demanded a discount on his retainer
His error handling overhaul:
Workflow 1 (Order Sync): Added 3-retry + queue fallback + WhatsApp alert
- If Shopify API fails → retry 3x → queue to "retry sheet" → alert Arslan
- If WhatsApp API fails → retry 3x → fallback to SMS → fallback to email → alert
Workflow 2 (Canary Monitor): Runs every 30 minutes
- Pings all critical APIs (Shopify, WATI, Google Sheets, Stripe)
- If any fail → immediate WhatsApp alert to Arslan
- Daily summary: "All 4 APIs healthy" or "WATI had 2 blips at 3 PM"
Workflow 3 (Dead Letter Queue): Runs every hour
- Checks "retry sheet" for items that failed earlier
- Re-attempts each item
- If 5 retries exhausted → moves to "dead letter" sheet → alerts Arslan
Results after implementing error handling:
- Undetected failures: 3-4/month → 0
- Average detection time: 2-3 days → 2 minutes
- Client complaints about automation: 2/month → 0 in 6 months
- Manual intervention needed: 5 hours/week → 20 minutes/week
- Arslan's reputation: "The automation guy whose stuff just works"
- Raised his retainer by 25% after demonstrating the reliability improvement
His rule: "Every workflow I build now has retry + fallback + alert before I show it to any client. It takes 15 extra minutes to add, and saves hours of firefighting."
Key Takeaways
- Every production workflow MUST have error handling — failure is guaranteed
- The three pillars: Retry (try again), Fallback (use alternative), Alert (notify human)
- Retry works for transient errors (500, 429, timeout) — NOT for 400/401/404
- Use exponential backoff for rate-limited APIs (1s, 2s, 4s, 8s intervals)
- Fallback patterns: alternative service, degraded output, or queue for later
- Alert severity levels: Critical (immediate), Warning (1 hour), Info (no action)
- Include execution URL in alerts so you can jump straight to the failed workflow
- A "canary" monitoring workflow catches problems before they affect clients
- Dead letter queues prevent permanent data loss from repeated failures
- Error handling takes 15 minutes to add but saves hours of incident response
Next lesson: Workflow monitoring with Slack and email notifications — building a real-time dashboard for all your automations.
Quiz: Error Handling Patterns — Retry, Fallback & Alert
4 questions to test your understanding. Score 60% or higher to pass.