# Horizontal Scaling
ANIP services scale horizontally by running multiple stateless replicas behind a load balancer, sharing a PostgreSQL database. No cluster-wide reconfiguration is needed to add or remove replicas.
## Architecture
Any replica can handle any request. Coordination happens through lease tables in PostgreSQL:
- Exclusive invocation locks prevent duplicate execution of the same capability for the same principal across replicas
- Leader election ensures only one replica generates checkpoints at a time
- Shared audit logging: all replicas write to the same audit table
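The exclusive invocation locks above can be sketched as a single atomic upsert against a lease table. This is an illustrative sketch, not ANIP's actual schema or SQL: SQLite stands in for PostgreSQL so the example is self-contained, and the table and column names are invented for the example.

```python
import sqlite3
import time

# SQLite stands in for PostgreSQL in this sketch; the real runtime
# would issue equivalent statements against the shared Postgres database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invocation_locks ("
    "key TEXT PRIMARY KEY, holder TEXT, expires_at REAL)"
)

def acquire_invocation_lock(conn, key, holder, ttl=60.0):
    """Atomically claim the lock for `key` unless another replica
    already holds an unexpired lease. Returns True on success."""
    now = time.time()
    cur = conn.execute(
        """
        INSERT INTO invocation_locks (key, holder, expires_at)
        VALUES (?, ?, ?)
        ON CONFLICT (key) DO UPDATE
            SET holder = excluded.holder, expires_at = excluded.expires_at
            WHERE invocation_locks.expires_at < ?
        """,
        (key, holder, now + ttl, now),
    )
    conn.commit()
    # 1 row touched means we inserted a fresh lease or took over an
    # expired one; 0 means a live lease blocked us.
    return cur.rowcount == 1

# Replica 1 wins the lock; replica 2 is blocked until the lease expires.
assert acquire_invocation_lock(conn, "invoke:alice:send_email", "replica-1")
assert not acquire_invocation_lock(conn, "invoke:alice:send_email", "replica-2")
```

Because the whole claim is one statement, two replicas racing for the same invocation cannot both see "no live lease" and both win; the database serializes the upserts.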
## Setup
The only change from single-instance to cluster is the storage DSN:
**Python**

```python
service = ANIPService(
    service_id="my-service",
    capabilities=[...],
    storage="postgres://user:pass@host:5432/anip",
    trust="signed",
    key_path="/etc/anip-keys",
    checkpoint_policy=CheckpointPolicy(interval_seconds=60),
    authenticate=...,
)
```

**TypeScript**

```typescript
const service = createANIPService({
  serviceId: "my-service",
  capabilities: [...],
  storage: "postgres://user:pass@host:5432/anip",
  trust: "signed",
  keyPath: "/etc/anip-keys",
  checkpointPolicy: { intervalSeconds: 60 },
  authenticate: ...,
});
```

**Go**

```go
svc, _ := service.New(service.Config{
	ServiceID:        "my-service",
	Capabilities:     capabilities,
	Storage:          "postgres://user:pass@host:5432/anip",
	Trust:            "signed",
	KeyPath:          "/etc/anip-keys",
	CheckpointPolicy: service.CheckpointPolicy{IntervalSeconds: 60},
	Authenticate:     authenticate,
})
```

**Java**

```java
ANIPService service = new ANIPService(new ServiceConfig()
    .setServiceId("my-service")
    .setCapabilities(capabilities)
    .setStorage("postgres://user:pass@host:5432/anip")
    .setTrust("signed")
    .setKeyPath("/etc/anip-keys")
    .setCheckpointPolicy(new CheckpointPolicy().setIntervalSeconds(60))
    .setAuthenticate(authenticate));
```

**C#**

```csharp
var service = new AnipService(new ServiceConfig {
    ServiceId = "my-service",
    Capabilities = capabilities,
    Storage = "postgres://user:pass@host:5432/anip",
    Trust = "signed",
    KeyPath = "/etc/anip-keys",
    CheckpointPolicy = new CheckpointPolicy { IntervalSeconds = 60 },
    Authenticate = authenticate,
});
```
The runtime creates all required tables automatically on first connection. No manual database setup is needed — just point it at an empty PostgreSQL database.
## Signing key distribution
All replicas must use the same signing key material. Options:
### Kubernetes Secret (recommended)
```yaml
volumes:
  - name: anip-keys
    secret:
      secretName: anip-signing-key
containers:
  - name: anip
    volumeMounts:
      - name: anip-keys
        mountPath: /etc/anip-keys
        readOnly: true
    env:
      - name: ANIP_KEY_PATH
        value: /etc/anip-keys
      - name: ANIP_STORAGE
        value: postgres://user:pass@postgres:5432/anip
```
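If the key files already exist on disk, the Secret referenced above can be created directly from them (the local `./anip-keys` directory here is an assumption; use wherever your key files live):

```shell
kubectl create secret generic anip-signing-key \
  --from-file=./anip-keys
```

Each file in the directory becomes one entry in the Secret, so the mounted `/etc/anip-keys` directory mirrors the local one.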
### KMS-backed
With AWS KMS, GCP Cloud KMS, or HashiCorp Vault, the key material never leaves the KMS boundary. A custom KeyManager implementation can delegate signing to the external service.
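As a sketch of what such a delegation can look like (the `KeyManager` method names and the `kms_client` interface here are assumptions for illustration, not ANIP's actual API):

```python
from abc import ABC, abstractmethod

class KeyManager(ABC):
    """Assumed shape of the runtime's key-manager hook; the real
    interface may differ."""

    @abstractmethod
    def sign(self, payload: bytes) -> bytes: ...

class KmsKeyManager(KeyManager):
    """Delegates signing to an external KMS so private key material
    never leaves the KMS boundary. `kms_client` is any object with a
    sign(key_id, message) method; a real AWS KMS or GCP Cloud KMS
    client can be wrapped to fit this shape."""

    def __init__(self, kms_client, key_id: str):
        self._kms = kms_client
        self._key_id = key_id

    def sign(self, payload: bytes) -> bytes:
        # One network round-trip to the KMS; only the signature comes back.
        return self._kms.sign(self._key_id, payload)
```

Because every replica talks to the same KMS key, signatures verify identically no matter which replica produced them.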
## What the runtime handles automatically
When you switch to PostgreSQL, the runtime handles all cluster coordination for you. There is nothing extra to configure:
- Checkpoints are generated by one replica at a time (automatic leader election). If that replica goes down, another takes over on the next tick. No manual intervention.
- Audit retention runs safely on all replicas simultaneously — cleaning up expired entries is idempotent.
- Duplicate prevention — if the same invocation request hits two replicas simultaneously, only one executes it. The runtime uses short-lived locks in PostgreSQL to prevent double execution.
If you have long-running capability handlers (over 60 seconds), increase the lock timeout:
```python
ANIPService(
    ...,
    storage="postgres://...",
    exclusive_ttl=120,  # seconds, default is 60
)
```
Otherwise, no cluster configuration is needed beyond the PostgreSQL connection string.
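The automatic leader election described above follows the same lease pattern as the invocation locks: a single well-known row that the current leader renews on every tick, and that any replica may take over once the lease expires. A minimal sketch, with SQLite standing in for PostgreSQL and invented table names (not ANIP's actual schema):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkpoint_leader ("
    "id INTEGER PRIMARY KEY, holder TEXT, expires_at REAL)"
)

LEASE_TTL = 90.0  # illustrative; should exceed the checkpoint interval

def try_become_leader(conn, replica_id):
    """Claim or renew the single leader lease: the current holder can
    always renew, and any replica can take over once the lease expires."""
    now = time.time()
    cur = conn.execute(
        """
        INSERT INTO checkpoint_leader (id, holder, expires_at)
        VALUES (1, ?, ?)
        ON CONFLICT (id) DO UPDATE
            SET holder = excluded.holder, expires_at = excluded.expires_at
            WHERE checkpoint_leader.holder = excluded.holder
               OR checkpoint_leader.expires_at < ?
        """,
        (replica_id, now + LEASE_TTL, now),
    )
    conn.commit()
    return cur.rowcount == 1

assert try_become_leader(conn, "replica-1")      # first claim wins
assert try_become_leader(conn, "replica-1")      # holder renews each tick
assert not try_become_leader(conn, "replica-2")  # live lease blocks takeover
```

A replica generates checkpoints only on ticks where `try_become_leader` returns True; if the leader dies and stops renewing, its lease expires and another replica's next tick succeeds, which is the failover behavior described above.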
## What stays the same
Scaling from one replica to many changes nothing about the protocol surface:
- Same 9 HTTP endpoints
- Same manifest, same signature
- Same delegation tokens (verified by any replica)
- Same audit log (shared in PostgreSQL)
- Same checkpoints (generated by elected leader)
- Same JWKS (same key material)
Clients and agents don't know or care how many replicas exist behind the load balancer.
## Next steps
- Configuration — Storage, auth, and trust setup
- Observability — Logging, metrics, and tracing hooks
- Deployment guide — Full deployment reference