7.1 — Error Handling Patterns — Retry, Fallback & Alert
Your workflow will break. Not maybe — it WILL. APIs go down, rate limits get hit, data arrives in unexpected formats, network connections drop. The difference between an amateur automation and a production system is how it handles failure. This lesson teaches you the three pillars of error handling in n8n: retry, fallback, and alert.
Why Error Handling Matters
WITHOUT error handling:
Workflow runs → API returns 500 → Workflow stops → Data lost
→ You don't know it failed until the client complains 3 days later
WITH error handling:
Workflow runs → API returns 500 → Retry 3 times → Still failing
→ Fallback to alternative service → Alert you on Slack
→ Data preserved → Client never notices
The Three Pillars
┌─────────────────────────────────────────┐
│ ERROR HANDLING │
│ │
│ 1. RETRY → Try again (transient) │
│ 2. FALLBACK → Use alternative (down) │
│ 3. ALERT → Notify human (critical) │
│ │
│ Order: Retry first → Fallback second │
│ → Alert always │
└─────────────────────────────────────────┘
Pillar 1: Retry
When to Retry
| Error Type | Should Retry? | Why |
|---|---|---|
| 500 Server Error | Yes | Server is temporarily overloaded |
| 429 Rate Limited | Yes (with delay) | Wait for rate limit window to reset |
| Timeout | Yes | Network blip or slow response |
| Connection Refused | Yes (limited) | Server may be restarting |
| 400 Bad Request | No | Your request is wrong — retrying won't fix it |
| 401 Unauthorized | No | Your credentials are wrong |
| 404 Not Found | No | Resource doesn't exist |
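The decision table above can be sketched as a small helper. This is a plain-JavaScript sketch of the logic, not an n8n API — in an n8n Code node you would read the status code from the failed node's error details; here it is a plain parameter:

```javascript
// Sketch of the retry decision table: transient errors (5xx, 429,
// timeouts) are worth retrying; 4xx client errors are not.
function shouldRetry(statusCode, isTimeout = false) {
  if (isTimeout) return true;           // network blip or slow response
  if (statusCode === 429) return true;  // rate limited — retry with delay
  if (statusCode >= 500) return true;   // server-side, likely transient
  return false;                         // 400/401/404 won't fix themselves
}

console.log(shouldRetry(500)); // transient server error → retry
console.log(shouldRetry(401)); // bad credentials → don't retry
```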
n8n Built-In Retry
Any node → Settings (gear icon):
☑ Retry On Fail: Yes
Max Tries: 3
Wait Between Tries (ms): 2000
This retries the same node up to 3 times with 2-second gaps.
Works for: HTTP Request, Email, any external API call.
Exponential Backoff (Advanced)
For rate-limited APIs, retrying at a constant interval just keeps hitting the same limit. Use exponential backoff — double the wait after each failed attempt:
Attempt 1: Wait 1 second
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds
This prevents hammering a rate-limited API.
Implementation in n8n (Function node before retry):
const attempt = $json.retry_count || 0;
const waitMs = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, 8s
// Pass wait_seconds to a Wait node configured with a dynamic wait time
return [{
  json: {
    ...$json,
    retry_count: attempt + 1,
    wait_seconds: waitMs / 1000
  }
}];
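One refinement worth knowing: if several workflow executions hit the rate limit at the same moment, plain doubling makes them all retry in lockstep. Adding random "jitter" spreads the retries out. A minimal sketch (the helper name and the 30-second cap are assumptions, not n8n features):

```javascript
// Exponential backoff with "full jitter": wait a random amount
// between 0 and the doubled ceiling, so parallel executions don't
// all retry against the same API at the same instant.
function backoffMs(attempt, baseMs = 1000, capMs = 30000) {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(Math.random() * ceiling); // anywhere in [0, ceiling)
}

// Ceilings grow 1s, 2s, 4s, 8s … capped at 30s:
for (let attempt = 0; attempt < 4; attempt++) {
  console.log(`attempt ${attempt}: wait up to ${backoffMs(attempt)}ms`);
}
```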
Retry with Circuit Breaker
If an API keeps failing, stop hammering it:
[HTTP Request] → Error?
│
├── Attempt 1 failed → Wait 2s → Retry
├── Attempt 2 failed → Wait 4s → Retry
├── Attempt 3 failed → STOP RETRYING
│ │
│ ▼
│ [Circuit Open: Skip this API for 5 minutes]
│ │
│ ▼
│ [Fallback or Alert]
│
└── Success → Continue workflow
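The circuit-breaker state above can be sketched in plain JavaScript. In n8n you could persist this state between executions with `$getWorkflowStaticData('global')` in a Code node; here, as an illustration under that assumption, it lives in a plain closure:

```javascript
// Minimal circuit-breaker sketch: after maxFailures consecutive
// failures, the circuit "opens" and callers skip the API until the
// cooldown elapses, then it allows a fresh try.
function createBreaker(maxFailures = 3, cooldownMs = 5 * 60 * 1000) {
  const state = { failures: 0, openedAt: null };
  return {
    // true while the circuit is open → skip the API, go to fallback/alert
    isOpen(now = Date.now()) {
      if (state.openedAt === null) return false;
      if (now - state.openedAt >= cooldownMs) {
        state.failures = 0;      // cooldown over → allow a fresh attempt
        state.openedAt = null;
        return false;
      }
      return true;
    },
    recordFailure(now = Date.now()) {
      state.failures += 1;
      if (state.failures >= maxFailures) state.openedAt = now; // trip it
    },
    recordSuccess() {
      state.failures = 0;
      state.openedAt = null;
    },
  };
}

const breaker = createBreaker();
breaker.recordFailure();
breaker.recordFailure();
breaker.recordFailure();       // third failure trips the circuit
console.log(breaker.isOpen()); // true → skip this API for 5 minutes
```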
Pillar 2: Fallback
Fallback Patterns
Pattern A: Alternative Service
Primary: Send email via SMTP
Fallback: Send email via Gmail API
Fallback 2: Send email via SendGrid
[SMTP Node] → Error?
│
├── Success → Continue
│
└── Failed → [Gmail Node] → Error?
│
├── Success → Continue
│
└── Failed → [SendGrid HTTP Request] → Error?
│
├── Success → Continue
└── Failed → [Alert: All email services down]
Pattern B: Degraded Output
Primary: Get full enrichment (Hunter + Wappalyzer + WHOIS)
Fallback: Return partial data (whatever succeeded)
[HTTP: Hunter API] → Error? → Set hunter_data = null
[HTTP: Wappalyzer] → Error? → Set wappalyzer_data = null
[HTTP: WHOIS] → Error? → Set whois_data = null
[Merge all results]
[IF: All null → Alert]
[ELSE: Continue with partial data + note what's missing]
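The merge step of Pattern B can be sketched as follows. The field names (`hunter_data`, etc.) come from the flow above; in n8n this logic would sit in a Code node after the Merge node:

```javascript
// Sketch of degraded output: keep whatever enrichment calls
// succeeded, record which ones came back null, and flag the
// all-failed case so the workflow can route to an alert.
function mergeEnrichment(results) {
  const missing = Object.keys(results).filter((k) => results[k] === null);
  return {
    ...results,
    missing,                               // e.g. ['whois_data']
    complete: missing.length === 0,
    allFailed: missing.length === Object.keys(results).length,
  };
}

const merged = mergeEnrichment({
  hunter_data: { emails: 3 },
  wappalyzer_data: { tech: ['Shopify'] },
  whois_data: null,                        // WHOIS call failed → null
});
console.log(merged.missing);   // ['whois_data']
console.log(merged.allFailed); // false → continue with partial data
```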
Pattern C: Queue for Later
Primary: Process immediately
Fallback: Save to retry queue, process later
[HTTP Request] → Error?
│
├── Success → Process normally
│
└── Failed → [Google Sheets: Add to "Retry Queue" sheet]
Columns: payload, error_message, timestamp, retry_count
[Separate workflow: Schedule Trigger every 30 min]
→ Read "Retry Queue" sheet
→ Retry each failed item
→ If success: Remove from queue
→ If still failing: Increment retry_count
→ If retry_count > 5: Move to "Dead Letter" sheet + Alert
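The routing decision in that retry workflow can be sketched as a small function. The `succeeded` flag stands in for the result of the re-attempt, and the action names are illustrative, not n8n terms:

```javascript
// Sketch of retry-queue routing: a re-attempted item either leaves
// the queue, stays with an incremented retry_count, or moves to the
// dead-letter sheet (plus an alert) after 5 exhausted retries.
const MAX_RETRIES = 5;

function routeQueueItem(item, succeeded) {
  if (succeeded) return { action: 'remove' };       // done, drop from queue
  const retryCount = (item.retry_count || 0) + 1;
  if (retryCount > MAX_RETRIES) {
    return { action: 'dead_letter', alert: true };  // give up + notify human
  }
  return { action: 'requeue', retry_count: retryCount };
}

console.log(routeQueueItem({ retry_count: 2 }, false)); // requeue, now attempt 3
console.log(routeQueueItem({ retry_count: 5 }, false)); // dead letter + alert
```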
Implementing Fallback in n8n
n8n Error Workflow Pattern:
1. On the node that might fail:
Settings → On Error → "Continue" (don't stop the workflow)
2. After the node, add an IF node:
Condition: {{$json.error}} exists (with "On Error: Continue", n8n passes the error details through on the item's json)
3. Route:
Error path → Fallback nodes
Success path → Normal processing
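The routing step can be sketched in plain JavaScript. The item shape is an assumption based on the pattern above — with "On Error: Continue", failed items carry an `error` field on their json:

```javascript
// Sketch of the IF-node routing: split items into an error path
// (fallback) and a success path (normal processing) based on the
// presence of an 'error' field.
function routeItems(items) {
  const failed = items.filter((i) => i.json.error !== undefined);
  const ok = items.filter((i) => i.json.error === undefined);
  return { failed, ok };
}

const { failed, ok } = routeItems([
  { json: { id: 1 } },
  { json: { id: 2, error: { message: '500 Server Error' } } },
]);
console.log(failed.length); // 1 → fallback path
console.log(ok.length);     // 1 → normal processing
```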
Pillar 3: Alert
What to Alert On
| Severity | When | Alert Channel | Response Time |
|---|---|---|---|
| Critical | All retries + fallbacks failed, data loss risk | WhatsApp + Slack + Email | Immediate |
| Warning | Fallback activated, degraded but working | Slack + Email | Within 1 hour |
| Info | Retry succeeded, temporary blip | Slack only | No action needed |
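The severity table above maps naturally to a routing helper. The channel names are assumptions standing in for the corresponding n8n nodes:

```javascript
// Sketch of severity-based alert routing: each level fans out to
// the channels from the table above; unknown levels escalate to the
// loudest route rather than failing silently.
const ALERT_ROUTES = {
  critical: ['whatsapp', 'slack', 'email'], // immediate response
  warning: ['slack', 'email'],              // within 1 hour
  info: ['slack'],                          // no action needed
};

function channelsFor(severity) {
  return ALERT_ROUTES[severity] || ALERT_ROUTES.critical; // unknown → loudest
}

console.log(channelsFor('warning')); // ['slack', 'email']
console.log(channelsFor('info'));    // ['slack']
```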
Alert Message Template
n8n WORKFLOW ALERT
Severity: 🔴 CRITICAL / 🟡 WARNING / 🔵 INFO
Workflow: [Workflow Name]
Node: [Node that failed]
Error: [Error message]
Time: [Timestamp]
Attempts: [X retries exhausted]
Fallback: [Activated / Not available]
Data affected: [X items / specific item ID]
Action needed: [What the human should do]
Link to execution: [n8n execution URL]
Slack Alert Node Configuration
[Slack Node]
Channel: #automation-alerts
Message:
🔴 *WORKFLOW FAILURE*
*Workflow:* {{$workflow.name}}
*Node:* {{$node["HTTP Request"].name}}
*Error:* {{$json.error.message}}
*Time:* {{$now.format('YYYY-MM-DD HH:mm:ss')}}
*Action:* Check the API status and retry manually if needed
*Execution:* {{$execution.url}}
Email Alert (If No Slack)
[Gmail Node]
To: your-email@gmail.com
Subject: [ALERT] n8n Workflow Failed — {{$workflow.name}}
Body:
Workflow "{{$workflow.name}}" failed at node "{{$json.failed_node}}".
Error: {{$json.error_message}}
Time: {{$now.toISO()}}
Items affected: {{$json.items_count}}
Please check the n8n dashboard:
{{$execution.url}}
Complete Error Handling Template
Here's a production-ready pattern combining all three pillars:
[Trigger] → [Process Data] → [HTTP Request: External API]
│
┌─────┴─────┐
│ Settings: │
│ On Error: │
│ Continue │
│ Retry: 3 │
│ Wait: 2000 │
└─────┬─────┘
│
[IF: Error?]
│ │
NO (Success) YES (All retries failed)
│ │
[Continue] [Fallback: Alternative API]
│
[IF: Fallback Error?]
│ │
NO (OK) YES (Both failed)
│ │
[Continue [Queue for later]
+ Warning] │
[CRITICAL Alert:
Slack + Email]
Practice Lab
Task 1: Build a Retry Workflow Create a workflow that calls an HTTP endpoint. Simulate failure by using a URL that doesn't exist. Configure retry with 3 attempts and 2-second wait. After all retries fail, send yourself an email alert with the error details.
Task 2: Fallback Chain Build a workflow that tries to send an email via SMTP. If SMTP fails (simulate by using wrong credentials), fall back to Gmail API. If that also fails, log the unsent email to Google Sheets with all details for manual sending later.
Task 3: Complete Error Handling System Create a "canary" workflow that runs every hour, checks if 3 critical APIs are responding (Google Sheets, Gmail, and any other API you use). If any fail, send a Slack/email alert. This is your early warning system.
Pakistan Case Study
Meet Arslan — runs automation services for 6 small businesses in Karachi.
His nightmare scenario (before error handling):
- Client's Shopify order sync workflow broke silently on a Friday night
- API credentials had expired — every call returned 401
- 47 orders came in over the weekend — none were synced
- Client discovered on Monday that no WhatsApp confirmations were sent
- 12 customers had called the client's helpline confused about their orders
- Arslan spent 6 hours manually processing the backlog
- Client demanded a discount on his retainer
His error handling overhaul:
Workflow 1 (Order Sync): Added 3-retry + queue fallback + WhatsApp alert
- If Shopify API fails → retry 3x → queue to "retry sheet" → alert Arslan
- If WhatsApp API fails → retry 3x → fallback to SMS → fallback to email → alert
Workflow 2 (Canary Monitor): Runs every 30 minutes
- Pings all critical APIs (Shopify, WATI, Google Sheets, Stripe)
- If any fail → immediate WhatsApp alert to Arslan
- Daily summary: "All 4 APIs healthy" or "WATI had 2 blips at 3 PM"
Workflow 3 (Dead Letter Queue): Runs every hour
- Checks "retry sheet" for items that failed earlier
- Re-attempts each item
- If 5 retries exhausted → moves to "dead letter" sheet → alerts Arslan
Results after implementing error handling:
- Undetected failures: 3-4/month → 0
- Average detection time: 2-3 days → 2 minutes
- Client complaints about automation: 2/month → 0 in 6 months
- Manual intervention needed: 5 hours/week → 20 minutes/week
- Arslan's reputation: "The automation guy whose stuff just works"
- Raised his retainer by 25% after demonstrating the reliability improvement
His rule: "Every workflow I build now has retry + fallback + alert before I show it to any client. It takes 15 extra minutes to add, and saves hours of firefighting."
Key Takeaways
- Every production workflow MUST have error handling — failure is guaranteed
- The three pillars: Retry (try again), Fallback (use alternative), Alert (notify human)
- Retry works for transient errors (500, 429, timeout) — NOT for 400/401/404
- Use exponential backoff for rate-limited APIs (1s, 2s, 4s, 8s intervals)
- Fallback patterns: alternative service, degraded output, or queue for later
- Alert severity levels: Critical (immediate), Warning (1 hour), Info (no action)
- Include execution URL in alerts so you can jump straight to the failed workflow
- A "canary" monitoring workflow catches problems before they affect clients
- Dead letter queues prevent permanent data loss from repeated failures
- Error handling takes 15 minutes to add but saves hours of incident response
Next lesson: Workflow monitoring with Slack and email notifications — building a real-time dashboard for all your automations.
Quiz: Error Handling Patterns — Retry, Fallback & Alert
4 questions to test your understanding. Score 60% or higher to pass.