API Specification
This document defines the REST APIs, authentication mechanisms, and cloud platform specifications for the EsoCore IoT system using Django REST Framework.
Cloud Platform Architecture
Any capable backend can fulfill these roles. The following concrete model assumes Django + Django REST Framework for speed and maintainability.
Django Models / Core Data Model
- Device: id, serial, model, keys, firmware, last_seen, site/customer, tags, status
- Asset: asset_id, site/customer, type (door/cnc/robot/etc.), make, model
- Telemetry Packet (ingest ledger): device_id, upload_id, checksum, status
- Timeseries (partitioned):
  - TelemetryPoint: ts, device_id, metric, value, unit, meta (JSON)
  - TelemetryWindow: ts, device_id, metric, stats (rms/max/std/bandpower)
- System Events:
  - SystemEvent: ts, device_id, event_type, severity, description, duration, resolved
  - EventLog: comprehensive audit trail with full context and metadata
- Asset Events: cycle_open, cycle_close, alerts (over‑torque, vibration)
- Event Workflows: automated response rules triggered by system events
- Notification Queue: pending alerts with delivery status and retries
- Config Profile: sampling rates, window sizes, thresholds per device type
- Firmware Bundle: version, hash, channel (stable/beta), rollout policy
- Alert Rule & Notification: thresholds, conditions, recipients, channels
- Customer Portal Models: Site, UserDeviceRole, Dashboards
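The TelemetryWindow entry above stores per-window statistics (rms/max/std/bandpower). As a minimal sketch of how the first three could be computed from raw samples, assuming population standard deviation and the plain maximum (the source does not say which variants are used, nor whether the device or the server computes them), one might write the following. The function name `window_stats` is hypothetical, and bandpower is omitted because it needs a spectral estimate.

```python
import math
import statistics


def window_stats(samples):
    """Summarize one window of samples as rms, max, and std.

    Assumptions: 'std' is the population standard deviation and 'max' is
    the largest raw value (not the largest absolute value).
    """
    if not samples:
        raise ValueError("window must contain at least one sample")
    # Root mean square: square root of the mean of squared samples.
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return {
        "rms": rms,
        "max": max(samples),
        "std": statistics.pstdev(samples),
    }
```

The resulting dictionary maps directly onto the `stats` field of a TelemetryWindow row.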
REST API (Device-Facing)
Authentication Endpoint
POST /api/iot/v1/auth/handshake
Returns nonce & server time (optional)
Telemetry Upload
POST /api/iot/v1/telemetry/batch
Headers:
- X-Device-Id: Unique device identifier
- X-Auth-Key: Device authentication key
- Idempotency-Key: UUIDv4 for duplicate detection
- Content-SHA256: Request body hash
Body: gzip/zstd JSON batch
Response:
{
  "accepted": 1250,
  "duplicates": 3,
  "rejected": 0,
  "next_hint": "backoff_ms: 5000"
}
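A device-side request for this endpoint could be assembled roughly as follows. This is a sketch under stated assumptions: the `records` envelope key, the `Content-Encoding` and `Content-Type` headers, and the choice to hash the compressed bytes (rather than the uncompressed JSON) are not specified in this document.

```python
import gzip
import hashlib
import json
import uuid


def build_batch_request(device_id, auth_key, records):
    """Return (headers, body) for POST /api/iot/v1/telemetry/batch."""
    # Hypothetical envelope: a JSON object with a single "records" array.
    raw = json.dumps({"records": records}, separators=(",", ":")).encode("utf-8")
    body = gzip.compress(raw)
    headers = {
        "X-Device-Id": device_id,
        "X-Auth-Key": auth_key,
        # One UUIDv4 per batch; a retry of the same batch should reuse it.
        "Idempotency-Key": str(uuid.uuid4()),
        # Assumption: the hash covers the body exactly as sent on the wire.
        "Content-SHA256": hashlib.sha256(body).hexdigest(),
        "Content-Encoding": "gzip",
        "Content-Type": "application/json",
    }
    return headers, body
```

Hashing the wire bytes lets the server verify integrity before spending CPU on decompression. Hashing the uncompressed JSON would work too, as long as both sides agree.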
Configuration Management
GET /api/iot/v1/config
Returns device configuration (sampling rates, thresholds, NTP servers, endpoints)
OTA Management
GET /api/iot/v1/ota/check
Returns firmware version and signed download URL
POST /api/iot/v1/ota/report
Device reports OTA update status
Authentication Methods
- Django API Key/Secret per Device (using djangorestframework-api-key), or
- OAuth2 Client Credentials for central key rotation
- JWT Authentication for user sessions
Sync Logic & Protocol
Authentication
- Device API Key/Secret (provisioned once), HMAC on body, and TLS
- Key rotation via OTA updates
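The "HMAC on body" step can be illustrated with the standard library. This is a minimal sketch assuming HMAC-SHA256 with a hex-encoded signature; the document does not name the hash function, the encoding, or the header that would carry the signature.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Device side: compute a hex HMAC-SHA256 over the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_body(secret: bytes, body: bytes, signature: str) -> bool:
    """Server side: recompute the HMAC and compare in constant time."""
    expected = sign_body(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

TLS protects the payload in transit. The HMAC additionally ties the body to the provisioned device secret, so a leaked `X-Auth-Key` alone is not enough to forge uploads.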
Data Batching
- Batch size: ~1–5 MB or 5–10k records per POST
- Compression: gzip or zstd
Idempotency & Reliability
- Idempotency-Key header (UUIDv4) + server‑side deduplication
- Exponential backoff with jitter
- Priority queuing: newest data first, then backfill
Clock Synchronization
- NTP at boot and every 12 hours
- Firmware tolerance: ±2 min skew
- Server re-stamping: available on request
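The skew tolerance and re-stamping rules above might be applied as in the sketch below. It assumes the server compares the device's reported clock against its own at upload time and, when asked, shifts record timestamps by the measured offset. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=2)  # firmware tolerance from the spec


def check_skew(device_time, server_time, max_skew=MAX_SKEW):
    """Return (within_tolerance, offset), where offset = server - device."""
    offset = server_time - device_time
    return abs(offset) <= max_skew, offset


def restamp(ts, offset):
    """Shift a device timestamp onto the server's clock."""
    return ts + offset
```

For example, a device running three minutes fast yields an offset of minus three minutes, which is outside the tolerance, and `restamp` moves its records back onto server time.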
Event Processing & Workflow Automation
Event Ingestion Pipeline
System Event → Event Parser → Severity Classification → Workflow Trigger → Notification Queue → Delivery
Workflow Examples
Event Type | Trigger Condition | Automated Actions |
---|---|---|
connectivity_lost | Duration > 5 minutes | Email to maintenance team, escalate if > 30 min |
power_outage | Severity = critical | Immediate SMS + email, create maintenance ticket |
safe_shutdown_initiated | Any occurrence | Critical alert + incident creation, prepare site visit |
supercap_low | Voltage < 80% rated | Maintenance alert, check backup power system |
safety_edge_triggered | Frequency > threshold | Safety audit alert, check alignment |
emergency_stop_activated | Any activation | Immediate safety team notification + incident log |
light_curtain_breach | Unexpected breach | Security alert if outside operating hours |
server_unreachable | Retry count > 5 | Notify IT team, check server status |
sensor_fault | Any sensor failure | Schedule maintenance, disable affected alerts |
tamper_detected | Security event | Instant security team alert, log incident |
sd_error | Storage failure | Emergency notification, prepare replacement |
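A few rows of the table above can be expressed as data-driven rules. The sketch below is framework-neutral and only illustrates the trigger logic: the event field names (`duration_s`, `severity`, `retry_count`) and the action identifiers are hypothetical, and only four of the eleven event types are shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    condition: Callable[[dict], bool]  # trigger condition from the table
    actions: List[str]                 # automated actions, by identifier


RULES: Dict[str, Rule] = {
    "connectivity_lost": Rule(
        lambda e: e.get("duration_s", 0) > 5 * 60, ["email_maintenance"]
    ),
    "power_outage": Rule(
        lambda e: e.get("severity") == "critical",
        ["sms", "email", "create_ticket"],
    ),
    "emergency_stop_activated": Rule(
        lambda e: True, ["notify_safety_team", "log_incident"]
    ),
    "server_unreachable": Rule(
        lambda e: e.get("retry_count", 0) > 5, ["notify_it"]
    ),
}


def actions_for(event: dict) -> List[str]:
    """Return the actions triggered by one system event (empty if none)."""
    rule = RULES.get(event.get("event_type"))
    if rule is None or not rule.condition(event):
        return []
    actions = list(rule.actions)
    # Table row: connectivity_lost escalates once the outage exceeds 30 min.
    if (event.get("event_type") == "connectivity_lost"
            and event.get("duration_s", 0) > 30 * 60):
        actions.append("escalate")
    return actions
```

Keeping the rules in a table-like structure means new event types can be added without touching the dispatch logic.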
Django Workflow Integration
- Model Workflows: Auto-create maintenance tickets from critical events (using django-workflows or custom logic)
- Email Templates: Customizable notifications with event context (Django templates + django-mailer)
- Notification Hooks: Slack/Teams/webhook integrations (django-notifications or custom webhooks)
- Escalation Rules: Time-based escalation chains (L1 → L2 → L3 support)
- Conditional Logic: Event severity, time of day, on-call schedules (Django signals + custom logic)
- Batch Processing: Group related events to prevent alert fatigue (Celery tasks)
Event Correlation
- Pattern Detection: Identify recurring issues across device fleet
- Root Cause Analysis: Link cascading events (power → connectivity → sync failure)
- Anomaly Detection: Machine learning on event patterns for predictive alerting
Admin & UX Features
Dashboards
Per device & per site:
- Live status (online/offline), last sync, SD health, firmware version
- Trends: ambient/stator temperature, RH, torque/current, vibration RMS/bandpower, acoustics level/features
- Asset cycle counters & daily/weekly histograms
Alerting System
- Email/Slack/Webhooks for thresholds (e.g., over‑torque, rising vibration, elevated acoustics/ultrasound events)
- Device offline notifications
- Sync backlog alerts
Data Management
- Data Explorer: filter by site/door/device/metric/time, CSV/Parquet export
- Device Provisioning UI: create Device, print claim QR (Device ID + bootstrap token), see first‑seen time & firmware channel
Access Control
- RBAC: Customer can see only their sites/devices
Storage & Performance
Database Design
- Partitioning: tables by month for telemetry; indexes on (device_id, ts) and (metric, ts) (using django-partitions)
- External TSDB (optional): TimescaleDB/ClickHouse via Django integration for heavy analytics
- Cold storage: optional S3 for raw vibration windows (using django-storages)
- Database: SQLite for development, PostgreSQL/MySQL for production
Device Provisioning & Security
Manufacturing Process
- Flash firmware: inject unique device_id, API key/secret, and initial config
- QR code generation: for technician claiming process
Device Claiming
- Technician process: scan QR → portal links device to customer/site → server enables the key
- Security: device remains inactive until properly claimed
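The claiming flow above could be modeled server-side as in the sketch below. It assumes the QR code encodes `device_id:bootstrap_token` (the actual payload format is not given in this document) and that a device can be claimed only once. All names are hypothetical.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DeviceRecord:
    device_id: str
    bootstrap_token: str
    active: bool = False          # key stays disabled until claimed
    site: Optional[str] = None


def qr_payload(rec: DeviceRecord) -> str:
    """Hypothetical QR content: device ID and bootstrap token."""
    return f"{rec.device_id}:{rec.bootstrap_token}"


def claim(registry: Dict[str, DeviceRecord], payload: str, site: str) -> bool:
    """Link a device to a site and enable its key; False if rejected."""
    # Split on the last colon so a device ID containing ':' still parses
    # (assumes the token itself contains no colon).
    device_id, _, token = payload.rpartition(":")
    rec = registry.get(device_id)
    if rec is None or rec.active:
        return False
    if not hmac.compare_digest(rec.bootstrap_token.encode(), token.encode()):
        return False
    rec.site = site
    rec.active = True
    return True
```

Rejecting a second claim keeps a photographed QR code from being used to move an already deployed device to another customer.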
Security Features
- Mutual TLS (optional tier): pin server CA; rotate keys annually; OTA key roll supported
- Secure boot & OTA: signed images, A/B partition, automatic rollback on watchdog timeout
Sync & Reliability Details
Offline Buffer Management
- Minimum capacity: 30 days at nominal rates (size calculated during design)
- Resume capability: maintain upload cursor per file + per record offset
- Recovery: on reconnect, resume at last acknowledged position
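The resume behavior above amounts to persisting, per buffer file, the offset of the next unacknowledged record. The sketch below shows the logic in Python for readability only; it is not firmware code, and the JSON cursor file and the class name are assumptions.

```python
import json
import os


class UploadCursor:
    """Tracks the next record offset to upload for each buffer file."""

    def __init__(self, path):
        self.path = path
        self.positions = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.positions = json.load(fh)

    def next_offset(self, filename):
        """Offset of the first record not yet acknowledged by the server."""
        return self.positions.get(filename, 0)

    def ack(self, filename, records_accepted):
        """Advance the cursor only after the server acknowledged a batch."""
        self.positions[filename] = self.next_offset(filename) + records_accepted
        # Write to a temp file and rename so a power loss mid-write
        # cannot leave a truncated cursor behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(self.positions, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, self.path)
```

Because the cursor moves only on acknowledgement, a crash between upload and `ack` causes a re-send, which the server-side deduplication is expected to absorb.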
Duplicate Protection
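- Server storage: (device_id, batch_id, record_hash) retained for 30 days
- Idempotency: at-least-once delivery plus server-side deduplication, giving effectively exactly-once processing within the retention window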
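The deduplication key above can be illustrated with an in-memory store. This is a sketch only: a production deployment would more likely use a database table with a unique constraint, and the canonical-JSON hashing scheme and the class name are assumptions.

```python
import hashlib
import json
import time

RETENTION_S = 30 * 24 * 3600  # 30-day retention from the spec


def record_hash(record: dict) -> str:
    """Hash a record over a canonical JSON form (sorted keys)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupStore:
    def __init__(self, retention_s=RETENTION_S, clock=time.time):
        self._seen = {}  # (device_id, batch_id, record_hash) -> first-seen time
        self._retention_s = retention_s
        self._clock = clock

    def ingest(self, device_id, batch_id, records):
        """Count new vs. already-seen records, remembering the new ones."""
        now = self._clock()
        # Drop keys older than the retention window.
        self._seen = {
            k: t for k, t in self._seen.items() if now - t < self._retention_s
        }
        accepted = duplicates = 0
        for rec in records:
            key = (device_id, batch_id, record_hash(rec))
            if key in self._seen:
                duplicates += 1
            else:
                self._seen[key] = now
                accepted += 1
        return {"accepted": accepted, "duplicates": duplicates}
```

The returned counts line up with the `accepted` and `duplicates` fields of the telemetry upload response.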
Backpressure Control
- Server response: next_hint: backoff_ms for rate limiting
- Dynamic configuration: server can reduce device sampling via the /config endpoint
Error Handling
HTTP Status Codes
- 200: Successful data ingestion
- 400: Invalid request format
- 401: Authentication failure
- 409: Duplicate data (idempotency key conflict)
- 429: Rate limiting active
- 500: Server error
- 503: Service unavailable (maintenance mode)
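One way a device might react to these status codes is sketched below. The mapping is an assumption rather than part of the specification: in particular, treating 409 like a success (the batch is already stored, so the cursor can advance) and retrying on any unlisted code are design choices. The action names are hypothetical.

```python
# Hypothetical client actions keyed by HTTP status code.
STATUS_ACTIONS = {
    200: "advance_cursor",           # batch ingested
    409: "advance_cursor",           # duplicate: server already has it
    400: "quarantine_batch",         # malformed: retrying will not help
    401: "stop_and_reauthenticate",  # credentials rejected
    429: "retry_with_backoff",       # honor the server's backoff hint
    500: "retry_with_backoff",
    503: "retry_with_backoff",
}


def action_for_status(status: int) -> str:
    """Pick the client action for a response; unknown codes are retried."""
    return STATUS_ACTIONS.get(status, "retry_with_backoff")
```

Separating retryable from non-retryable failures keeps a single malformed batch from blocking the upload queue indefinitely.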
Retry Logic
- Exponential backoff: starting at 1s, max 300s
- Jitter: ±25% to prevent thundering herd
- Circuit breaker: fail fast after consecutive failures
- Fallback: local storage with sync on recovery
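The retry parameters above (1 s start, 300 s cap, ±25% jitter) and the circuit breaker can be sketched as follows. The doubling factor, clamping the delay after jitter so it never exceeds the cap, and the breaker threshold of 5 are assumptions, since the document does not fix them.

```python
import random


def backoff_delay(attempt, base_s=1.0, max_s=300.0, jitter=0.25, rng=random.random):
    """Delay in seconds before retry number `attempt` (0-based)."""
    # Cap the exponent so very large attempt counts cannot overflow.
    nominal = min(max_s, base_s * (2 ** min(attempt, 32)))
    # rng() in [0, 1) maps to a factor in [1 - jitter, 1 + jitter).
    factor = 1.0 + jitter * (2.0 * rng() - 1.0)
    return min(max_s, nominal * factor)


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; any success resets it."""

    def __init__(self, threshold=5):  # threshold is a hypothetical default
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def record(self, success):
        self.failures = 0 if success else self.failures + 1
```

While the breaker is open the device would skip upload attempts and keep writing to local storage, which is the fallback described in the last bullet.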