Platform Engineering at Scale: Banking Edition

Platform engineering in banking is different. You’re not just optimizing for speed — you’re optimizing for speed within a complex web of regulations, security requirements, and legacy systems. Here’s what I’ve learned doing this at scale at BTPN.

The Banking Reality

In banking, you can’t just kubectl apply and call it a day. Every deployment needs:

Change management approval
Security scanning (SAST, DAST, container scanning)
Compliance validation
Audit trail generation
Rollback verification plan

A simple config change can require sign-off from three different teams. The platform needs to handle this complexity without slowing engineers down.

Self-Service with Guardrails

The core philosophy: make the right thing the easy thing, and the wrong thing impossible.

Our platform lets engineers provision infrastructure, deploy services, and manage configurations through self-service. But every action flows through automated compliance checks:

deploy:
  checks:
    - sonarqube: quality_gate
    - vault: secrets_rotation
    - trivy: vulnerability_scan
    - opa: compliance_policy

If any check fails, the deployment is blocked. But here’s the key: these checks run automatically, in parallel, and complete in under 2 minutes. Engineers get fast feedback without waiting for manual reviews.
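As an illustration only (this is not the platform's actual code), here is a minimal TypeScript sketch of the "run in parallel, block on any failure" behavior. The `Check` and `CheckResult` types and the `runChecks` and `isDeployable` functions are hypothetical names; real checks would call out to SonarQube, Vault, Trivy, and OPA instead of returning canned values.

```typescript
// Hypothetical types: illustrative names, not a real platform API.
type Check = { name: string; run: () => Promise<boolean> };
type CheckResult = { name: string; passed: boolean; detail: string };

// Run one check and never let it throw: an error counts as a failed check,
// so a broken scanner blocks the deployment instead of silently passing.
async function runCheck(check: Check): Promise<CheckResult> {
  try {
    const passed = await check.run();
    return { name: check.name, passed, detail: passed ? "ok" : "check reported a failure" };
  } catch (err) {
    return { name: check.name, passed: false, detail: `check errored: ${String(err)}` };
  }
}

// Start every check at once and wait for all of them to finish.
// Total wall-clock time is roughly that of the slowest check, not the sum.
async function runChecks(checks: Check[]): Promise<CheckResult[]> {
  return Promise.all(checks.map(runCheck));
}

// The deployment is allowed only when every single check passed.
function isDeployable(results: CheckResult[]): boolean {
  return results.every((r) => r.passed);
}
```

Collecting a result for every check, rather than stopping at the first failure, means an engineer sees all problems in one pass instead of fixing them one at a time.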

The Audit Trail Problem

“Can you prove that deployment X was approved by the right people?”

In banking, this question comes up regularly — during audits, incident reviews, and regulatory examinations. Our platform generates a complete audit trail for every action:

Who triggered it
What changed
When it happened
Which approvals were obtained
Which compliance checks passed/failed

This isn’t just a log file — it’s structured, queryable, and retained for 7 years (a regulatory requirement).
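To show what "structured and queryable" can mean in practice, here is a rough TypeScript sketch. The `AuditRecord` shape, its field names, and the helper functions are assumptions made for illustration; only the five recorded facts and the 7-year retention period come from the description above.

```typescript
// Hypothetical record shape covering the five facts listed above.
interface AuditRecord {
  deploymentId: string;
  triggeredBy: string; // who triggered it
  change: string; // what changed
  timestamp: string; // when it happened (ISO 8601, UTC)
  approvals: { approver: string; role: string }[]; // which approvals were obtained
  checks: { name: string; passed: boolean }[]; // which checks passed or failed
}

// Answer "was this deployment approved by the right people?" by listing
// the required roles that have no recorded approval. Empty means fully approved.
function missingApprovals(record: AuditRecord, requiredRoles: string[]): string[] {
  const granted = new Set(record.approvals.map((a) => a.role));
  return requiredRoles.filter((role) => !granted.has(role));
}

// Names of the compliance checks that failed for this record.
function failedChecks(record: AuditRecord): string[] {
  return record.checks.filter((c) => !c.passed).map((c) => c.name);
}

// Earliest date the record may be purged, assuming retention is counted
// in calendar years from the event timestamp.
function retentionExpiry(record: AuditRecord, years: number = 7): Date {
  const expiry = new Date(record.timestamp);
  expiry.setUTCFullYear(expiry.getUTCFullYear() + years);
  return expiry;
}
```

Because approvals and check outcomes are typed fields rather than free text in a log line, an auditor's question becomes a function call instead of a grep.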

Integration with Legacy Systems

Not everything runs on Kubernetes. Banks have mainframes, on-premise databases, and proprietary systems that predate cloud computing.

Our platform needs to deploy to OpenShift containers AND configure firewall rules on legacy network appliances. The abstraction layer we built treats everything as a “target” with a common interface:

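// Common contract that every deployment target implements.
// Config, DeployResult, and ValidationResult are platform types not shown here.
interface DeploymentTarget {
  name: string;
  type: "openshift" | "vm" | "network" | "database";
  // Apply the desired state to the target.
  deploy(config: Config): Promise<DeployResult>;
  // Check the config before anything is changed.
  validate(config: Config): ValidationResult;
  // Revert a previous deployment by its ID.
  rollback(deploymentId: string): Promise<void>;
}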

Engineers don’t need to know the implementation details of each target type. They define their desired state, and the platform handles the rest.
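A small sketch of how one target might plug into that interface. Everything beyond the interface itself is assumed for illustration: the shapes of `Config`, `DeployResult`, and `ValidationResult`, the in-memory `FakeNetworkTarget` class, and the `deployTo` helper are hypothetical. A real network target would talk to the appliance's management API.

```typescript
// Assumed shapes for the types the interface references.
type Config = { service: string; version: string };
type DeployResult = { deploymentId: string; target: string };
type ValidationResult = { valid: boolean; errors: string[] };

interface DeploymentTarget {
  name: string;
  type: "openshift" | "vm" | "network" | "database";
  deploy(config: Config): Promise<DeployResult>;
  validate(config: Config): ValidationResult;
  rollback(deploymentId: string): Promise<void>;
}

// In-memory stand-in for a firewall or other network target.
class FakeNetworkTarget implements DeploymentTarget {
  name: string;
  type = "network" as const;
  applied = new Map<string, Config>(); // deployments currently in effect

  constructor(name: string) {
    this.name = name;
  }

  validate(config: Config): ValidationResult {
    const errors = config.version.trim() === "" ? ["version is required"] : [];
    return { valid: errors.length === 0, errors };
  }

  async deploy(config: Config): Promise<DeployResult> {
    const deploymentId = `${this.name}-${config.service}-${config.version}`;
    this.applied.set(deploymentId, config);
    return { deploymentId, target: this.name };
  }

  async rollback(deploymentId: string): Promise<void> {
    this.applied.delete(deploymentId);
  }
}

// Caller code depends only on the interface, never on the target type.
async function deployTo(target: DeploymentTarget, config: Config): Promise<DeployResult> {
  const check = target.validate(config);
  if (!check.valid) {
    throw new Error(`${target.name}: ${check.errors.join("; ")}`);
  }
  return target.deploy(config);
}
```

`deployTo` would work unchanged for an OpenShift or database target, which is the point of the abstraction: validation always runs first, and the caller never branches on `type`.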

Measuring What Matters

We track DORA metrics religiously:

Deployment Frequency: How often do we ship?
Lead Time for Changes: How long from commit to production?
Change Failure Rate: What percentage of deployments cause incidents?
Time to Restore Service: How fast can we recover from failures?

In the last year, our platform helped reduce lead time by 70% while maintaining a change failure rate below 2%.
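For readers who want to compute these four numbers themselves, here is a simplified TypeScript sketch. The `Deployment` record and its field names are hypothetical, the averages are plain means (many teams prefer medians or percentiles), and time to restore is measured from the deployment time, which is a simplification.

```typescript
// Hypothetical record of one production deployment.
interface Deployment {
  committedAt: string; // ISO 8601 time of the commit
  deployedAt: string; // ISO 8601 time it reached production
  causedIncident: boolean;
  restoredAt?: string; // ISO 8601 time service was restored, if an incident occurred
}

const HOUR_MS = 60 * 60 * 1000;

function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function hoursBetween(start: string, end: string): number {
  return (Date.parse(end) - Date.parse(start)) / HOUR_MS;
}

// Deployment Frequency: deployments per day over the observed period.
function deploymentsPerDay(deployments: Deployment[], periodDays: number): number {
  return periodDays > 0 ? deployments.length / periodDays : 0;
}

// Lead Time for Changes: mean hours from commit to production.
function meanLeadTimeHours(deployments: Deployment[]): number {
  return mean(deployments.map((d) => hoursBetween(d.committedAt, d.deployedAt)));
}

// Change Failure Rate: fraction of deployments that caused an incident.
function changeFailureRate(deployments: Deployment[]): number {
  if (deployments.length === 0) return 0;
  return deployments.filter((d) => d.causedIncident).length / deployments.length;
}

// Time to Restore Service: mean hours from the failing deployment to recovery.
function meanTimeToRestoreHours(deployments: Deployment[]): number {
  const durations: number[] = [];
  for (const d of deployments) {
    if (d.causedIncident && d.restoredAt !== undefined) {
      durations.push(hoursBetween(d.deployedAt, d.restoredAt));
    }
  }
  return mean(durations);
}
```

A change failure rate "below 2%" corresponds to `changeFailureRate(...) < 0.02` in this sketch.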

The Hardest Part

Platform engineering at scale is 20% technology and 80% organizational change. Convincing teams to standardize their workflows, adopt shared tools, and trust the platform is harder than writing any amount of code.

We treat the platform like a product:

Roadmap presentations to engineering leadership
Regular feedback sessions with platform users
Office hours for hands-on support
NPS surveys to measure satisfaction

If you’re building a platform in a regulated industry, invest in relationships. The best platform in the world is useless if no one uses it.