IaC Lessons from Identity Platforms

By April 2023, managing identity infrastructure across multiple environments had become increasingly complex. Manual configuration across development, staging, and production was error-prone and time-consuming. The move to Terraform-based Infrastructure as Code promised consistency but brought its own challenges.

The Manual Configuration Problem

Before we adopted IaC for identity infrastructure, deployments typically involved:

  • Manual configuration through admin consoles
  • Screenshot-based “documentation”
  • Copying settings between environments
  • Hoping configuration details are remembered correctly
  • Gradual environment drift over time

This approach has critical flaws:

  • No audit trail: Changes aren’t tracked or versioned
  • Error-prone: Manual processes lead to inconsistencies
  • Not reproducible: Recreation requires extensive documentation
  • Compliance challenges: Difficult to verify consistency
  • Security risks: Manual processes increase misconfiguration risk

Why Terraform for Identity Infrastructure

Terraform offers compelling advantages:

  • Declarative approach: Define desired state, not procedural steps
  • State management: Track deployments and detect drift
  • Environment consistency: Identical code creates identical environments
  • Version control: Infrastructure changes follow code review processes
  • Provider ecosystem: Many identity platforms offer Terraform providers

Early Terraform Configurations

Our first attempt was straightforward: basic OIDC client configurations and a SAML identity provider.

# main.tf
terraform {
  required_version = ">= 1.0"
  required_providers {
    identity = {
      source  = "identity-provider/identity"
      version = "~> 0.15.0"  # Pre-v1.0 provider
    }
  }
}

provider "identity" {
  client_id      = var.provider_client_id
  client_secret  = var.provider_client_secret
  environment_id = var.environment_id
  region         = var.provider_region
}

# OIDC Application for API services
resource "identity_application" "api_client" {
  environment_id = var.environment_id
  name           = "${var.environment}-api-client"
  description    = "OIDC client for ${var.environment} API services"
  enabled        = true

  oidc_options {
    type                       = "SINGLE_PAGE_APP"
    grant_types                = ["AUTHORIZATION_CODE", "REFRESH_TOKEN"]
    response_types             = ["CODE"]
    token_endpoint_auth_method = "NONE"
    redirect_uris              = var.api_redirect_uris
    post_logout_redirect_uris  = var.api_logout_uris
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Service     = "api"
  }
}

# OIDC Application for mobile clients
resource "identity_application" "mobile_client" {
  environment_id = var.environment_id
  name           = "${var.environment}-mobile-client"
  description    = "OIDC client for ${var.environment} mobile applications"
  enabled        = true

  oidc_options {
    type                            = "NATIVE_APP"
    grant_types                     = ["AUTHORIZATION_CODE", "REFRESH_TOKEN"]
    response_types                  = ["CODE"]
    token_endpoint_auth_method      = "NONE"
    redirect_uris                   = var.mobile_redirect_uris
    support_unsigned_request_object = true
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Service     = "mobile"
  }
}

# Identity Provider Configuration
resource "identity_provider" "corporate_saml" {
  environment_id = var.environment_id
  name           = "${var.environment}-enterprise-saml"
  description    = "Enterprise SAML identity provider"
  enabled        = true

  saml_options {
    # Variable names match the declarations in variables.tf and the tfvars files
    idp_entity_id        = var.saml_entity_id
    sso_service_endpoint = var.saml_sso_endpoint
    sso_binding          = "HTTP_POST"
    sign_request         = true

    verification_certificate = file("${path.module}/certificates/${var.environment}-saml.crt")
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Type        = "saml"
  }
}

The Power of Environment Variables

Managing different environments required careful variable organisation:

# variables.tf
variable "environment" {
  description = "Environment name"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "environment_id" {
  description = "Identity platform environment ID"
  type        = string
  sensitive   = true
}

variable "provider_client_id" {
  description = "Provider admin client ID"
  type        = string
  sensitive   = true
}

variable "provider_client_secret" {
  description = "Provider admin client secret"
  type        = string
  sensitive   = true
}

# Referenced by the provider block in main.tf
variable "provider_region" {
  description = "Identity platform region"
  type        = string
}

variable "api_redirect_uris" {
  description = "Allowed redirect URIs for API client"
  type        = list(string)
}

# Referenced by the API client in main.tf; not set in the tfvars files,
# so it defaults to no post-logout redirect URIs
variable "api_logout_uris" {
  description = "Allowed post-logout redirect URIs for API client"
  type        = list(string)
  default     = []
}

variable "mobile_redirect_uris" {
  description = "Allowed redirect URIs for mobile client"
  type        = list(string)
}

variable "saml_entity_id" {
  description = "SAML identity provider entity ID"
  type        = string
}

variable "saml_sso_endpoint" {
  description = "SAML SSO endpoint URL"
  type        = string
}

Environment-specific configurations:

# environments/dev.tfvars
environment = "dev"
environment_id = "dev-environment-id"
api_redirect_uris = [
  "https://dev-api.example.com/auth/callback",
  "https://dev-api.example.com/auth/silent-callback"
]
mobile_redirect_uris = [
  "com.example.app.dev://auth/callback"
]
saml_entity_id = "https://dev.example.com/saml"
saml_sso_endpoint = "https://dev-sso.example.com/saml/login"

# environments/production.tfvars
environment = "production"
environment_id = "prod-environment-id"
api_redirect_uris = [
  "https://api.example.com/auth/callback",
  "https://api.example.com/auth/silent-callback"
]
mobile_redirect_uris = [
  "com.example.app://auth/callback"
]
saml_entity_id = "https://example.com/saml"
saml_sso_endpoint = "https://sso.example.com/saml/login"

State Management Lessons

One of Terraform’s biggest benefits is state management, but it also introduced new challenges:

Remote State with Locking

# backend.tf
terraform {
  backend "s3" {
    bucket         = "terraform-state-identity"
    key            = "identity/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locking"
  }
}

Environment Isolation

Each environment needed its own state file:

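# Initialise the backend for one environment at a time. Switching to a
# different state key in the same working directory needs -reconfigure,
# otherwise init stops and asks whether to migrate the existing state.
terraform init -reconfigure -backend-config="key=identity/dev/terraform.tfstate"
terraform init -reconfigure -backend-config="key=identity/staging/terraform.tfstate"
terraform init -reconfigure -backend-config="key=identity/production/terraform.tfstate"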

The Power of Environment Cloning

One of Terraform’s biggest advantages was the ability to create identical environments:

# Create a new staging environment identical to dev
terraform workspace new staging
terraform apply -var-file=environments/staging.tfvars

This capability proved invaluable for:

  • Testing: Spin up identical environments for testing new configurations
  • Disaster recovery: Quickly recreate production environment if needed
  • Development: Give developers their own isolated identity environments
  • Compliance: Ensure staging exactly matches production for audit purposes
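
One caveat when cloning this way is that nothing ties the selected workspace to the -var-file passed on the command line, so staging values can end up applied in the wrong workspace. A minimal guard might look like the sketch below. It assumes Terraform 1.4 or later (for the built-in terraform_data resource) and workspaces named exactly like the environment values; the file name and the workspace_guard label are hypothetical.

```hcl
# guard.tf (hypothetical file name)
# Fails the plan when the selected workspace and the tfvars file disagree,
# e.g. workspace "dev" combined with -var-file=environments/staging.tfvars.
resource "terraform_data" "workspace_guard" {
  lifecycle {
    precondition {
      condition     = terraform.workspace == var.environment
      error_message = "Workspace '${terraform.workspace}' does not match environment '${var.environment}'."
    }
  }
}
```

With the S3 backend, non-default workspaces store their state under an env:/<workspace>/ prefix in front of the configured key, so this guard matters most when workspaces and per-environment tfvars files are used together.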

The Third-Party Provider Challenge

Using a pre-1.0 Terraform provider for our identity platform taught us painful lessons about bleeding-edge tooling:

Contract Changes

The provider was actively developed, which meant breaking changes between minor versions:

# Version 0.12.0
resource "identity_application" "app" {
  # ... configuration
  oidc_options {
    pkce_enforcement = "OPTIONAL"  # This field existed
  }
}

# Version 0.15.0
resource "identity_application" "app" {
  # ... configuration
  oidc_options {
    # pkce_enforcement field was removed
    pkce_required = true  # Replaced with this boolean
  }
}

Misaligned Values and Resource Removal

The most painful lesson came when provider changes led to misaligned state values. Terraform detected differences and decided to “fix” them by removing and recreating resources.

The scenario that haunts us:

  1. A provider update changed how certain values were represented
  2. Terraform detected drift between state and actual configuration
  3. terraform plan showed it would destroy and recreate the OIDC clients
  4. We ran terraform apply in the development environment
  5. All client IDs and client secrets were regenerated

The Client ID/Secret Problem

On identity platforms like ours, OIDC client IDs and secrets are generated by the identity provider when a client is created and cannot be “set” to specific values; they must be accepted as-is. When Terraform recreated our OIDC clients:

  • New client IDs: All applications needed to be updated with new IDs
  • New client secrets: All backend services needed new secrets
  • Broken authentication: Applications couldn’t authenticate until updated
  • Manual secret distribution: Had to update secrets across all services

Thankfully, this happened in development. But it taught us that some things are non-replaceable in identity infrastructure.

Protecting Critical Resources

After the client recreation incident, we implemented safeguards:

# Protect critical OIDC clients from accidental deletion
resource "identity_application" "production_api_client" {
  # ... configuration

  lifecycle {
    prevent_destroy = true
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Service     = "api"
    Critical    = "true"
  }
}

# Use data sources for existing critical resources
data "identity_application" "existing_mobile_client" {
  environment_id = var.environment_id
  name           = "production-mobile-client"
}

# Reference existing client instead of managing it
locals {
  mobile_client_id = data.identity_application.existing_mobile_client.id
}

Secrets Management Integration

The client secret regeneration incident forced us to integrate proper secrets management:

# Store generated secrets in AWS Secrets Manager
resource "aws_secretsmanager_secret" "oidc_client_secrets" {
  for_each    = toset(["api", "mobile", "web"])
  name        = "identity/${var.environment}/${each.key}-client-secret"
  description = "OIDC client secret for ${each.key} in ${var.environment}"

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Service     = each.key
  }
}

# Assumes the OIDC clients are declared as identity_application.clients with
# for_each over the same keys ("api", "mobile", "web")
resource "aws_secretsmanager_secret_version" "oidc_client_secrets" {
  for_each  = aws_secretsmanager_secret.oidc_client_secrets
  secret_id = each.value.id
  secret_string = jsonencode({
    client_id     = identity_application.clients[each.key].oidc_options[0].client_id
    client_secret = identity_application.clients[each.key].oidc_options[0].client_secret
  })
}

Monitoring and Drift Detection

We implemented monitoring to catch configuration drift:

# CloudWatch alarm for Terraform plan changes. TerraformPlanChanges is a
# custom metric: a scheduled job has to publish it for the alarm to fire.
resource "aws_cloudwatch_metric_alarm" "terraform_drift" {
  alarm_name          = "identity-terraform-drift-${var.environment}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "TerraformPlanChanges"
  namespace           = "Identity/Terraform"
  period              = 3600
  statistic           = "Maximum"
  threshold           = 0
  alarm_description   = "Terraform plan detected changes in identity infrastructure"
  alarm_actions       = [aws_sns_topic.alerts.arn] # SNS topic defined elsewhere

  dimensions = {
    Environment = var.environment
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

Lessons Learned

Pre-v1.0 Provider Risks

The Risk: Third-party providers that haven’t reached v1.0 can have breaking changes that cause resource recreation.

The Mitigation:

  • Pin provider versions exactly: version = "= 0.15.0" rather than "~> 0.15.0", which still accepts any 0.15.x patch release
  • Test provider updates in isolated environments
  • Always run terraform plan and carefully review changes
  • Identify non-replaceable resources and protect them

State Management is Critical

The Learning: Terraform state is the source of truth, but it can become misaligned with reality.

Best Practices Developed:

  • Use remote state with locking from day one
  • Implement state file backups
  • Regularly run terraform apply -refresh-only (the replacement for the deprecated terraform refresh) to sync state with reality
  • Monitor for state drift
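
The first two practices can themselves be expressed in Terraform. The sketch below assumes the AWS provider v4 or later and reuses the bucket and table names from the backend.tf shown earlier; the resource labels are hypothetical. These resources would normally live in a separate bootstrap configuration, because the backend has to exist before terraform init can use it.

```hcl
# Keep every previous version of the state file so a bad apply can be rolled back.
resource "aws_s3_bucket_versioning" "state" {
  bucket = "terraform-state-identity"

  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table for the S3 backend, which expects a string partition key named LockID.
resource "aws_dynamodb_table" "state_lock" {
  name         = "terraform-state-locking"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```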

Environment Parity is Powerful

The Benefit: Being able to create identical environments revolutionised our development process.

Applications:

  • Feature testing in production-like environments
  • Disaster recovery scenarios
  • Developer environment provisioning
  • Compliance demonstrations

Some Resources Are Non-Replaceable

The Reality: In identity systems, certain values (client IDs, secrets, certificates) cannot be “set” - they’re generated and must be accepted.

Protection Strategies:

  • Use prevent_destroy lifecycle rules
  • Import existing critical resources instead of creating new ones
  • Implement approval workflows for production changes
  • Maintain manual backups of critical configuration

The Bigger Picture

Moving to Infrastructure as Code for our identity platform was transformative, but it required learning to work with imperfect tools. The Terraform provider ecosystem is powerful but comes with risks, especially for pre-v1.0 providers.

The incident with client recreation taught us that infrastructure as code isn’t just about automation - it’s about understanding the implications of every change and protecting the resources that cannot be recreated.

Six months later, we had a robust identity infrastructure deployment process that could recreate any environment from code, but we also had deep respect for the power and risks of infrastructure automation.

But that’s a story for another chapter.


What experiences have you had with third-party Terraform providers? How do you protect critical resources from accidental recreation?