Platform Engineering, DevOps, DevEx, Architecture

Platform Engineering: Building Internal Developer Platforms

Build self-service infrastructure that accelerates development: golden paths, developer portals, and reducing cognitive load at scale.

Azynth Team
15 min read

Platform engineering is DevOps evolved. Instead of every team managing infrastructure, platform teams build self-service tools that let developers ship faster without becoming infrastructure experts.

What is Platform Engineering?

Traditional DevOps:

  • "You build it, you run it"
  • Every team owns their infrastructure
  • Duplicate work across teams
  • Cognitive overload

Platform Engineering:

  • Platform team provides self-service tools
  • Developers use golden paths
  • Centralized best practices
  • Reduced cognitive load

Example: Instead of each team figuring out Kubernetes, CI/CD, and monitoring on its own, the platform provides a "deploy my app" button that handles everything.

The Core Problem

At scale, infrastructure becomes a bottleneck:

10 product teams × 8 engineers = 80 developers

Each team needs:

  • Kubernetes cluster
  • CI/CD pipelines
  • Database provisioning
  • Monitoring setup
  • Secret management
  • Log aggregation

Without platform: 80 engineers distracted by infrastructure

With platform: 80 engineers building features

Platform engineering = force multiplier
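
To put a rough number on that distraction, here is a small back-of-the-envelope sketch. The 30% ops share and the 40-hour week are illustrative assumptions, not measurements; plug in your own figures.

```python
# Rough estimate of engineering time absorbed by infrastructure work.
# ops_share_percent and hours_per_week are illustrative assumptions.


def infra_hours_per_week(
    teams: int,
    engineers_per_team: int,
    ops_share_percent: int,
    hours_per_week: int = 40,
) -> int:
    """Return total weekly hours spent on infrastructure across all teams."""
    engineers = teams * engineers_per_team
    return engineers * hours_per_week * ops_share_percent // 100


if __name__ == "__main__":
    # 10 teams x 8 engineers, each spending an assumed 30% of the week on ops.
    print(infra_hours_per_week(10, 8, 30))
```

Under those assumptions, the 80 developers above would spend 960 hours a week on infrastructure instead of features.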

Golden Paths: The Foundation

A golden path is the "opinionated but flexible" way to do something:

Bad: "Here's kubectl, good luck"

    # Developer has to know:
    kubectl create namespace my-app
    kubectl apply -f deployment.yaml
    kubectl apply -f service.yaml
    kubectl apply -f ingress.yaml
    # Plus: secrets, configmaps, RBAC, network policies...

Good: Golden path with CLI

    # Platform-provided tool
    platform deploy \
      --app my-service \
      --image ghcr.io/company/my-service:v1.2.3 \
      --env production \
      --replicas 3

    # Behind the scenes:
    # - Creates namespace
    # - Applies standard manifests
    # - Configures monitoring/logging
    # - Sets up secrets from vault
    # - Configures autoscaling

The developer gets about 80% of what they need with almost no infrastructure knowledge.
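
The "applies standard manifests" step could look something like the sketch below: a function that turns the four CLI inputs into a Kubernetes Deployment manifest. This is a minimal illustration, not the real tool; `render_deployment` and the `<app>-<env>` namespace naming are hypothetical choices.

```python
import json
from typing import Any


def render_deployment(
    app: str, image: str, env: str, replicas: int = 3
) -> dict[str, Any]:
    """Build a Kubernetes Deployment manifest from the golden-path inputs."""
    labels = {"app": app, "env": env}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": app,
            # Hypothetical convention: one namespace per app and environment.
            "namespace": f"{app}-{env}",
            "labels": labels,
        },
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": app, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    manifest = render_deployment(
        "my-service", "ghcr.io/company/my-service:v1.2.3", "production"
    )
    print(json.dumps(manifest, indent=2))
```

The developer supplies four values; labels, selectors, and naming conventions stay in one place that the platform team owns.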

Building a Platform CLI

    // platform-cli: Deploy command
    package cmd

    import (
        "fmt"

        "github.com/spf13/cobra"

        "platform/internal/k8s"
        "platform/internal/monitoring"
        "platform/internal/vault"
    )

    // generateManifests and configureServiceMesh are assumed to be defined
    // elsewhere in this package; the flags are assumed to be registered on
    // deployCmd (for example in an init function).
    var deployCmd = &cobra.Command{
        Use:   "deploy",
        Short: "Deploy an application to Kubernetes",
        RunE: func(cmd *cobra.Command, args []string) error {
            app := cmd.Flag("app").Value.String()
            image := cmd.Flag("image").Value.String()
            env := cmd.Flag("env").Value.String()
            replicas := cmd.Flag("replicas").Value.String()

            // 1. Generate manifests from templates
            manifests := generateManifests(app, image, env, replicas)

            // 2. Fetch secrets from Vault
            secrets, err := vault.GetSecrets(app, env)
            if err != nil {
                return fmt.Errorf("fetching secrets for %s (%s): %w", app, env, err)
            }

            // 3. Apply to Kubernetes
            if err := k8s.Apply(manifests, secrets); err != nil {
                return fmt.Errorf("applying manifests for %s: %w", app, err)
            }

            // 4. Configure monitoring
            if err := monitoring.Setup(app, env); err != nil {
                return fmt.Errorf("configuring monitoring for %s: %w", app, err)
            }

            // 5. Update service mesh
            if err := configureServiceMesh(app, env); err != nil {
                return fmt.Errorf("configuring service mesh for %s: %w", app, err)
            }

            fmt.Printf("✅ %s deployed to %s\n", app, env)
            fmt.Printf("🔗 https://%s.%s.company.com\n", app, env)
            return nil
        },
    }

Key principles:

  • Sensible defaults (replicas=3, autoscaling enabled)
  • Escape hatches for advanced use cases
  • Error messages that suggest solutions
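
A minimal sketch of how the first and third principles might fit together: defaults are applied unless the developer overrides them, and bad input produces an error that names the valid choices. The option names, the environment list, and `resolve_config` itself are hypothetical.

```python
# Hypothetical defaults and environments for illustration only.
DEFAULTS = {"replicas": 3, "autoscaling": True}
KNOWN_ENVS = ("staging", "production")


class DeployConfigError(ValueError):
    """Raised with a message that tells the developer how to fix the input."""


def resolve_config(app: str, env: str, **overrides) -> dict:
    """Merge developer overrides onto platform defaults, validating both."""
    if env not in KNOWN_ENVS:
        raise DeployConfigError(
            f"Unknown environment '{env}'. Try one of: {', '.join(KNOWN_ENVS)}."
        )
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise DeployConfigError(
            f"Unknown option(s): {', '.join(unknown)}. "
            f"Supported options: {', '.join(sorted(DEFAULTS))}."
        )
    # Later keys win, so overrides replace defaults.
    return {"app": app, "env": env, **DEFAULTS, **overrides}


if __name__ == "__main__":
    print(resolve_config("my-service", "staging"))
    print(resolve_config("my-service", "production", replicas=5))
```

Calling `resolve_config("my-service", "prod")` raises an error that lists `staging` and `production`, so the developer does not need to look up the valid values.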

Developer Portal with Backstage

Backstage (created at Spotify, now an open-source CNCF project) is the de facto standard for developer portals:

    # catalog-info.yaml - Service definition
    apiVersion: backstage.io/v1alpha1
    kind: Component
    metadata:
      name: payment-service
      description: Payment processing service
      annotations:
        github.com/project-slug: company/payment-service
        pagerduty.com/integration-key: abc123
        grafana/dashboard-selector: "service:payment-service"
    spec:
      type: service
      lifecycle: production
      owner: payments-team
      system: checkout
      providesApis:
        - payment-api
      consumesApis:
        - user-api
        - fraud-detection-api
      dependsOn:
        - resource:postgres-payments
        - resource:redis-cache

Backstage shows:

  • Service ownership and dependencies
  • Live deployment status
  • Recent deployments and rollbacks
  • On-call rotation
  • Runbooks and documentation
  • Metrics and logs (embedded Grafana)
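
Ownership and dependency views are only as good as the catalog files behind them, so a platform team might lint `catalog-info.yaml` in CI. The sketch below checks an already-parsed Component entity for the fields Backstage requires for that kind; `missing_fields` is a hypothetical helper, and YAML parsing is left out to keep it dependency-free.

```python
# Fields a Backstage Component entity is expected to carry.
REQUIRED_PATHS = (
    ("apiVersion",),
    ("kind",),
    ("metadata", "name"),
    ("spec", "type"),
    ("spec", "lifecycle"),
    ("spec", "owner"),
)


def missing_fields(entity: dict) -> list[str]:
    """Return dotted paths of required fields absent from a parsed entity."""
    missing = []
    for path in REQUIRED_PATHS:
        node = entity
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


if __name__ == "__main__":
    entity = {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": "payment-service"},
        "spec": {"type": "service", "lifecycle": "production"},
    }
    # This entity has no owner, so the portal could not show who runs it.
    print(missing_fields(entity))
```

Failing the build when the list is non-empty keeps unowned services out of the portal.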

Custom Backstage Plugin

    // packages/plugin-platform/src/components/DeployButton.tsx
    import React from 'react';
    import { useEntity } from '@backstage/plugin-catalog-react';
    import { Button } from '@material-ui/core';

    export const DeployButton = () => {
      // The catalog entity for the page this button is rendered on
      const { entity } = useEntity();

      const handleDeploy = async () => {
        const response = await fetch('/api/platform/deploy', {
          method: 'POST',
          // Declare the JSON body so the backend parses it correctly
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            service: entity.metadata.name,
            environment: 'staging',
            image: 'latest',
          }),
        });

        if (response.ok) {
          alert('Deployment started!');
        } else {
          alert(`Deployment failed (HTTP ${response.status})`);
        }
      };

      return (
        <Button variant="contained" color="primary" onClick={handleDeploy}>
          Deploy to Staging
        </Button>
      );
    };

Infrastructure Provisioning: Terraform Modules

Standardize infrastructure with reusable modules:

    # modules/service/main.tf
    module "service" {
      source = "github.com/company/terraform-modules//service"

      name        = var.service_name
      environment = var.environment

      # Defaults provided by platform
      container_image = var.image
      replicas        = var.replicas
      cpu_request     = "100m"
      memory_request  = "256Mi"

      # Auto-configured
      monitoring_enabled = true
      logging_enabled    = true
      tracing_enabled    = true

      # Service mesh integration
      istio_enabled = true

      # Database (optional)
      database = var.needs_database ? {
        engine  = "postgres"
        version = "15"
        size    = "db.t3.medium"
      } : null
    }

Developer usage:

    # teams/payments/main.tf
    module "payment_service" {
      source = "../../modules/service"

      service_name   = "payment-service"
      environment    = "production"
      image          = "ghcr.io/company/payment-service:v1.2.3"
      replicas       = 5
      needs_database = true
    }

The platform handles the complexity: networking, security groups, IAM roles, logging, and monitoring.

Self-Service Database Provisioning

    # Platform API for database provisioning
    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    # terraform, vault, slack, generate_terraform_config, and
    # create_database_dashboard are platform-internal helpers that are not
    # shown here; they are not published libraries.

    app = FastAPI()

    class DatabaseRequest(BaseModel):
        team: str
        service: str
        engine: str  # postgres, mysql, mongodb
        environment: str
        size: str = "small"  # small, medium, large

    @app.post("/api/databases/provision")
    async def provision_database(req: DatabaseRequest):
        # Validate request
        if req.engine not in ["postgres", "mysql", "mongodb"]:
            raise HTTPException(status_code=400, detail="Invalid database engine")

        # Generate Terraform config
        tf_config = generate_terraform_config(
            team=req.team,
            service=req.service,
            engine=req.engine,
            env=req.environment,
            size=req.size,
        )

        # Apply Terraform
        result = terraform.apply(tf_config)

        # Store connection info in Vault
        vault_path = f"database/{req.team}/{req.service}/{req.environment}"
        connection_string = result['outputs']['connection_string']
        vault.write(path=vault_path, data={"connection_string": connection_string})

        # Create monitoring dashboard
        create_database_dashboard(req.service, req.environment)

        # Send notification
        slack.send(
            channel=f"#{req.team}",
            message=f"✅ Database provisioned for {req.service} ({req.environment})",
        )

        return {
            "status": "provisioned",
            "endpoint": result['outputs']['endpoint'],
            "vault_path": vault_path,
        }

From the developer's perspective:

    curl -X POST https://platform.company.com/api/databases/provision \
      -H "Content-Type: application/json" \
      -d '{
        "team": "payments",
        "service": "payment-service",
        "engine": "postgres",
        "environment": "staging"
      }'

5 minutes later: database ready, secrets in Vault, monitoring configured.
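
The parts of this flow that need no cloud access, input validation and the Vault path convention, are easy to isolate and test on their own. The following is a dependency-free sketch under that assumption; `validate` and `vault_path` are hypothetical helper names, and the allowed sizes come from the comment in the request model.

```python
from dataclasses import dataclass

ALLOWED_ENGINES = ("postgres", "mysql", "mongodb")
ALLOWED_SIZES = ("small", "medium", "large")


@dataclass(frozen=True)
class DatabaseRequest:
    team: str
    service: str
    engine: str
    environment: str
    size: str = "small"


def validate(req: DatabaseRequest) -> list[str]:
    """Return human-readable problems; an empty list means the request is OK."""
    errors = []
    if req.engine not in ALLOWED_ENGINES:
        errors.append(
            f"engine '{req.engine}' is not supported; "
            f"choose one of: {', '.join(ALLOWED_ENGINES)}"
        )
    if req.size not in ALLOWED_SIZES:
        errors.append(
            f"size '{req.size}' is not supported; "
            f"choose one of: {', '.join(ALLOWED_SIZES)}"
        )
    return errors


def vault_path(req: DatabaseRequest) -> str:
    """Build the Vault path where the connection string is stored."""
    return f"database/{req.team}/{req.service}/{req.environment}"


if __name__ == "__main__":
    req = DatabaseRequest("payments", "payment-service", "postgres", "staging")
    print(validate(req), vault_path(req))
```

Keeping these as pure functions means the API handler shrinks to orchestration, and the rules developers actually hit can be unit-tested without Terraform or Vault.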

Reducing Cognitive Load

Platform engineering is about reducing decisions developers must make:

Example: CI/CD Pipeline

Without platform:

    # Developer writes from scratch (100+ lines)
    name: CI/CD
    on: [push]
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - name: Run tests
            run: npm test
          # ... 20 more steps
      build:
        # ... another 30 lines
      deploy:
        # ... another 50 lines

With platform:

    # .platform.yaml
    pipeline: nodejs-service  # Platform-provided template
    tests: npm test
    deploy_to:
      - staging
      - production

Platform generates the full pipeline.
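
As a rough sketch of that generation step, the function below expands a parsed `.platform.yaml` into an ordered job graph: test, then build, then one deploy job per environment, each waiting on the previous one. The job names and `run` strings are placeholders, not the output of any real template engine.

```python
def expand_pipeline(config: dict) -> dict:
    """Expand a minimal platform config into a full, ordered job graph."""
    jobs = {
        "test": {"needs": [], "run": config["tests"]},
        # Placeholder build step; a real template would be language-specific.
        "build": {"needs": ["test"], "run": "build container image"},
    }
    previous = "build"
    for env in config.get("deploy_to", []):
        job_name = f"deploy-{env}"
        # Each environment deploys only after the previous stage succeeds.
        jobs[job_name] = {"needs": [previous], "run": f"deploy to {env}"}
        previous = job_name
    return {"name": config["pipeline"], "jobs": jobs}


if __name__ == "__main__":
    config = {
        "pipeline": "nodejs-service",
        "tests": "npm test",
        "deploy_to": ["staging", "production"],
    }
    for name, job in expand_pipeline(config)["jobs"].items():
        print(f"{name}: needs={job['needs']} run={job['run']!r}")
```

The developer states intent in a handful of lines; ordering and promotion rules live in the template, where the platform team can change them for every service at once.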

Measuring Platform Success

Track platform adoption and impact:

    # Platform metrics
    metrics = {
        # Adoption
        "teams_using_platform": 85,       # % of teams
        "services_on_platform": 120,

        # Velocity
        "avg_deployment_time": "8 min",   # vs 45 min before
        "deployments_per_day": 450,       # vs 120 before

        # Quality
        "incident_mttr": "12 min",        # vs 45 min before
        "deployment_success_rate": 0.98,

        # Developer satisfaction
        "nps_score": 65,                  # Net Promoter Score
        "support_tickets_per_week": 8,    # vs 40 before
    }
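
Before-and-after pairs are easier to compare as improvement factors. This small sketch computes them for the four metrics above that come with a baseline; `improvement_factor` is a hypothetical helper, and the numbers are the illustrative ones from the block above.

```python
def improvement_factor(
    before: float, after: float, higher_is_better: bool = False
) -> float:
    """Return how many times better `after` is than `before`."""
    if before <= 0 or after <= 0:
        raise ValueError("before and after must be positive")
    return after / before if higher_is_better else before / after


# (before, after, higher_is_better), using the example figures above.
COMPARISONS = {
    "avg_deployment_time_min": (45, 8, False),
    "deployments_per_day": (120, 450, True),
    "incident_mttr_min": (45, 12, False),
    "support_tickets_per_week": (40, 8, False),
}

if __name__ == "__main__":
    for name, (before, after, higher) in COMPARISONS.items():
        factor = improvement_factor(before, after, higher)
        print(f"{name}: {factor:.1f}x better")
```

With these figures, deployment time improves by a factor of about 5.6 and support tickets by 5, while deployment frequency and MTTR both improve by 3.75.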

Anti-Patterns to Avoid

1. Building Too Early

Don't build a platform before you have 3+ teams doing duplicate work.

2. No Escape Hatches

    # BAD: Can't override defaults
    platform deploy --app my-service

    # GOOD: Advanced users can customize
    platform deploy \
      --app my-service \
      --cpu-limit 2000m \
      --custom-manifest overrides.yaml
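
One way an escape hatch like `--custom-manifest` might work internally is a recursive merge of the developer's overrides onto the platform defaults, so only the overridden keys change. This is a sketch under that assumption; `deep_merge` and the sample values are hypothetical.

```python
import copy


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a new dict with overrides applied recursively on top of base."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    platform_defaults = {
        "replicas": 3,
        "resources": {"limits": {"cpu": "500m", "memory": "512Mi"}},
    }
    # Equivalent of an overrides.yaml that only raises the CPU limit.
    overrides = {"resources": {"limits": {"cpu": "2000m"}}}
    print(deep_merge(platform_defaults, overrides))
```

The memory limit and replica count keep their defaults, so advanced users change exactly what they need without forking the whole manifest.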

3. Forcing Migration

Evangelize, don't mandate. Show value first.

4. Ignoring Feedback

Platform teams ARE product teams. Listen to your users (developers).

Real-World Example: Shopify

Shopify's platform team provides:

  • Shipit: Deploy any service to Kubernetes
  • Railgun: CI/CD pipeline generator
  • Vault integration: Automatic secret management
  • Dev environments: Spin up full stack in one command

Result:

  • 2000+ engineers using platform
  • 10,000+ deployments per day
  • New service deployed in <30 mins

Getting Started

Phase 1: Identify Pain Points

  • Survey developers: what's slowing you down?
  • Common answer: deployment, database setup, secrets

Phase 2: Build One Golden Path

  • Start small: standardize deployments
  • Get 3 teams using it
  • Iterate based on feedback

Phase 3: Expand

  • Add more golden paths (databases, queues, caches)
  • Build developer portal
  • Automate toil

Phase 4: Scale

  • Self-service everything
  • Treat platform as a product
  • Measure and optimize

Conclusion

Platform engineering is not about controlling developers—it's about empowering them. The best platforms are invisible: developers ship features without thinking about infrastructure.

Signs you need platform engineering:

  • Multiple teams duplicating infrastructure work
  • Deployments take hours/days
  • Developers spending >30% time on ops

Benefits:

  • 5-10x faster deployments
  • Reduced cognitive load
  • Consistent best practices
  • Happier developers

Ready to build your internal platform? Let's talk about your engineering organization.
