System Design

Architecture

A deep dive into how SciDx Remote Execution connects local Python environments to remote Kubernetes compute through ZeroMQ message routing, token-based authentication, and on-demand pod provisioning.

System Overview

Rexec enables a local Python client (inside a Jupyter Notebook or script) to transparently execute functions on a remote Kubernetes cluster. The architecture has five logical layers:

Client Layer

The ndp_ep or scidx-rexec Python library running on the user's machine. Serializes function objects and sends them over ZeroMQ.

NDP EP API

FastAPI gateway. Provides the broker address and spawn API URL to authenticated clients. Also provides dataset discovery.

ZeroMQ Broker

Stateless message router deployed inside the cluster. Routes function-call frames between N clients and M dedicated server pods.

Server Deploy API

FastAPI service that provisions and tears down per-user rexec-server pods via the Kubernetes API.

Rexec Server Pods

Per-user worker pods running inside the cluster. Connect to the broker's internal address and execute dispatched Python functions.

Auth API

External identity provider. All components validate Bearer tokens against the same AUTH_API_URL endpoint.

Full System Diagram

Rexec Full Architecture Diagram

Source: reference/svg/rexec-arc.svg

ZeroMQ Transport Layer

Rexec uses ZeroMQ as its transport layer for low-latency, high-throughput message passing between clients and server pods.

Socket Pattern: ROUTER–DEALER

┌─────────────────────────────────────────────────────────┐
│                   ZeroMQ Broker Pod                     │
│                                                         │
│   ROUTER frontend          ROUTER backend               │
│   (NodePort 30001)         (ClusterIP 5560)             │
│        │                         │                      │
│        ▼                         ▼                      │
│   [routing table] ←────────→ [routing table]            │
└─────────────────────────────────────────────────────────┘
         │                         │
         ▼                         ▼
   Client (DEALER)          Server Pod (DEALER)
   tcp://<node>:30001       tcp://rexec-broker-internal-ip:5560
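Because both broker sockets are ROUTERs, the broker keeps its own routing state rather than relying on a built-in proxy. A minimal, socket-free sketch of that bookkeeping — the exact frame layout and the idle-worker queue are assumptions for illustration, not the actual broker implementation:

```python
from collections import deque

def route_to_worker(idle_workers: deque, request: list[bytes]) -> list[bytes]:
    """Pick an idle worker identity and build the backend ROUTER message.

    `request` arrives from the frontend ROUTER as
    [client-identity, empty, token, fn, args]; prefixing a worker
    identity lets the backend ROUTER deliver it, and carrying the
    client identity inside the payload lets the reply route back.
    """
    worker = idle_workers.popleft()  # hand work to the next idle worker
    return [worker, b""] + request

def route_to_client(reply: list[bytes]) -> list[bytes]:
    """Strip the worker envelope so the frontend ROUTER can route home.

    `reply` arrives from the backend as
    [worker-identity, empty, client-identity, empty, status, result].
    """
    return reply[2:]  # [client-identity, empty, status, result]
```

In a real broker these two functions sit inside a poll loop over the frontend and backend sockets, with workers re-queued as idle after each reply.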

Message Frame Structure (outdated; needs updating to reflect the current output-stream ZMQ message structure)

Client → Broker → Server
[client-identity] [empty] [token] [function-dill] [args-dill]
Server → Broker → Client
[client-identity] [empty] [status] [result-dill]
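In Python terms, the framing above can be sketched as follows. This uses pickle to keep the sketch stdlib-only (the real system serializes with dill, which also handles lambdas and closures), and note that the [client-identity] frame is added by the ROUTER socket itself, not by user code:

```python
import pickle  # rexec uses dill; pickle keeps this sketch dependency-free

def build_request(token: bytes, fn, args: tuple) -> list[bytes]:
    """Frames a client DEALER sends; the broker's ROUTER prepends [client-identity]."""
    return [b"", token, pickle.dumps(fn), pickle.dumps(args)]

def execute_request(frames: list[bytes]) -> list[bytes]:
    """Server-pod side: unpack, run, and frame the reply as [empty, status, result]."""
    _empty, _token, fn_blob, args_blob = frames
    try:
        result = pickle.loads(fn_blob)(*pickle.loads(args_blob))
        return [b"", b"OK", pickle.dumps(result)]
    except Exception as exc:
        return [b"", b"ERROR", pickle.dumps(repr(exc))]
```

The status frame lets the client distinguish a successful result from a serialized remote exception without deserializing the payload first.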

Port Reference

Port              Service Name                 Type            Who Connects
30001 (external)  rexec-broker-external-ip     NodePort        External clients (users)
30002 (external)  rexec-broker-external-ip     NodePort        Management / control
5560 (internal)   rexec-broker-internal-ip     ClusterIP       Server pods (in-cluster)
/rexec            rexec-server-deployment-api  Ingress (HTTP)  NDP EP API (internal)
/api              ndp-ep-api                   Ingress (HTTP)  External clients (users)

Broker Internals

The broker is deployed as a single Kubernetes Deployment with a replicaCount of 1 by default. It exposes two Kubernetes Services:

rexec-broker-external-ip  (NodePort)
  └── port 30001/TCP → broker frontend (client-facing)
  └── port 30002/TCP → broker control

rexec-broker-internal-ip  (ClusterIP)
  └── port 5560/TCP  → broker backend (server pod-facing)

Server Pod Lifecycle

 Sysadmin pre-deploys all three components to Kubernetes (via Helm chart):
 ┌─────────────────────────────────────────────────────────────────────────┐
 │  Kubernetes                                                             │
 │  [NDP EP API pod]   [Deploy API pod]   [Broker pod]                    │
 └─────────────────────────────────────────────────────────────────────────┘
 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
User            [NDP EP API pod]     [Deploy API pod]      Kubernetes
 │                      │                    │                    │
 │── GET /api/rexec ───▶│                    │                    │
 │   Bearer token       │── POST /spawn ────▶│                    │
 │                      │   token            │── create ns ──────▶│
 │                      │                    │── create pod ─────▶│  [Server pod]
 │                      │                    │                    │
 │                      │                    │◀── pod running ────│
 │ ◀──────────────────────── {broker_url} ───│                    │
 │                      │                    │  [Server pod] connects to [Broker pod]:5560
 │                      │                    │                    │
 │── ZMQ connect ───────────────────────────────────────────────▶ [Broker pod]
 │   (broker_url:30001) │                    │                    │
 │── send(fn) ──────────────────────────────────────────────────▶ [Broker pod]
 │                      │                    │                      ──▶ [Server pod]
 │                      │                    │                          [executes fn]
 │◀── result ─────────────────────────────────────────────────── [Broker pod]
 │                      │                    │                    │
 
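From the client's perspective, the sequence above is two HTTP calls followed by a ZMQ connect. A sketch of that flow, with the HTTP helpers injected so the logic stays transport-agnostic — the endpoint paths and response keys (`spawn_url`, `broker_url`) are illustrative guesses, not the documented API contract:

```python
def provision_session(http_get, http_post, token: str) -> str:
    """Walk the spawn sequence: EP API -> Deploy API -> broker address.

    http_get/http_post are caller-supplied helpers standing in for real
    authenticated HTTP requests (e.g. a requests.Session wrapper).
    """
    # 1. Ask the NDP EP API for the Deploy (spawn) API address.
    info = http_get("/api/rexec", token)
    # 2. Ask the Deploy API to provision a per-user server pod.
    spawned = http_post(info["spawn_url"] + "/spawn", token)
    # 3. The returned broker URL is what the client ZMQ-connects to.
    return spawned["broker_url"]
```

Once `broker_url` is returned, the client opens its DEALER socket against port 30001 and the spawned server pod is already attached to the broker backend on 5560.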

Authentication Flow

Every component in the stack validates the user's Bearer token independently against the configured AUTH_API_URL. This means the identity provider is the single source of truth for all access decisions.

Token validation happens at 3 independent points:

1. NDP EP API   → validates token + (optionally) group membership before returning Rexec Server Spawn API addr
2. Deploy API   → validates token + (optionally) group membership before provisioning server pod
3. ZMQ Broker   → validates token on connection handshake before routing messages
⚠️
All components must use the same AUTH_API_URL. A token issued by IDP-A will be rejected by a component configured to validate against IDP-B.
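Each of the three validation points boils down to the same check: forward the Bearer token to AUTH_API_URL and treat anything other than a success response as a rejection. A minimal stdlib sketch — the `/validate` path is a hypothetical placeholder, since the real endpoint depends on the identity provider:

```python
import urllib.error
import urllib.request

def validate_token(token: str, auth_api_url: str) -> bool:
    """Check a Bearer token against the identity provider.

    The endpoint path below is an assumption; only the Bearer-header
    convention and the shared AUTH_API_URL come from the architecture.
    """
    req = urllib.request.Request(
        f"{auth_api_url}/validate",  # hypothetical validation path
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # 401/403 etc. -> token rejected
```

Because every component calls the same AUTH_API_URL, revoking a token at the identity provider cuts off API access, pod spawning, and broker routing at once.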

Kubernetes Network Topology

External Network
┌──────────────────────────────────────────────────────────────┐
│ User Laptop / Jupyter                                        │
│   ndp_ep client ──→ HTTPS /api/*    ──→ [Ingress] NDP EP API │
│   scidx_rexec   ──→ TCP  :30001     ──→ [NodePort] Broker    │
└──────────────────────────────────────────────────────────────┘

Kubernetes Cluster
┌──────────────────────────────────────────────────────────────┐
│  Namespace: rexec                                            │
│  ┌────────────────┐   ┌──────────────────────────────────┐   │
│  │  NDP EP API    │──▶│  Deploy API                      │   │
│  │  (Ingress /api)│   │  (Ingress /rexec)                │   │
│  └────────────────┘   └──────────────────────────────────┘   │
│                                   │ kubectl                  │
│  Namespace: rexec-broker          ▼                          │
│  ┌────────────────┐  Namespace: rexec-server-<user>          │
│  │  Broker Pod    │  ┌───────────────────────────────────┐   │
│  │  :30001 (ext)  │◀─│  rexec-server Pod                 │   │
│  │  :5560  (int)  │─▶│  (ephemeral, per-user)            │   │
│  └────────────────┘  └───────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘

External Auth
┌──────────────────────────────────────────────────────────────┐
│  AUTH_API_URL  (e.g. idp-test.nationaldataplatform.org)      │
│  ← All 3 in-cluster components validate tokens here          │
└──────────────────────────────────────────────────────────────┘

Design Decision: Why ZeroMQ?

✅ Low latency

ZeroMQ's lock-free queuing and minimal framing overhead keep round-trip times in the single-millisecond range for typical function payloads.

✅ Language agnostic

ZeroMQ has bindings for 30+ languages. Future server implementations in Julia, R, or Rust are straightforward.

✅ N:M routing

A single ROUTER–DEALER broker naturally handles N clients talking to M server pods without any coordination overhead.

✅ No message queue infrastructure

No Kafka, RabbitMQ, or Redis required. The broker itself is the queue — one fewer service to operate.

Design Decision: Why Per-User Server Pods?

✅ Isolation

Each user gets a dedicated Python process. One user's long-running job or crash cannot affect another user's session.

✅ Resource limits

Kubernetes resource requests/limits can be applied per pod, giving fine-grained CPU/memory accounting per user.

✅ Custom environments

Different users can get different container images (e.g., different Python packages) by configuring the Deploy API.

⚖️ Trade-off: startup time

Pod provisioning takes 5–30s. For interactive workflows this is acceptable; for sub-second latency, a pre-warmed pool approach would be needed.

Scalability Considerations