IaC Lessons from Identity Platforms
By April 2023, managing identity infrastructure across multiple environments had become increasingly complex. Manual configuration across development, staging, and production environments was error-prone and time-consuming. The move to Terraform-based Infrastructure as Code promised consistency, but brought its own challenges.
The Manual Configuration Problem
Before adopting IaC for identity infrastructure, deployment processes typically involve:
- Manual configuration through admin consoles
- Screenshot-based “documentation”
- Copying settings between environments
- Hoping configuration details are remembered correctly
- Gradual environment drift over time
This approach has critical flaws:
- No audit trail: Changes aren’t tracked or versioned
- Error-prone: Manual processes lead to inconsistencies
- Not reproducible: Recreation requires extensive documentation
- Compliance challenges: Difficult to verify consistency
- Security risks: Manual processes increase misconfiguration risk
Why Terraform for Identity Infrastructure
Terraform offers compelling advantages:
- Declarative approach: Define desired state, not procedural steps
- State management: Track deployments and detect drift
- Environment consistency: Identical code creates identical environments
- Version control: Infrastructure changes follow code review processes
- Provider ecosystem: Many identity platforms offer Terraform providers
Early Terraform Configurations
Our first attempt was straightforward - basic OIDC client configurations:
# main.tf
terraform {
  required_version = ">= 1.0"

  required_providers {
    identity = {
      source  = "identity-provider/identity"
      version = "~> 0.15.0" # Pre-v1.0 provider
    }
  }
}

provider "identity" {
  client_id      = var.provider_client_id
  client_secret  = var.provider_client_secret
  environment_id = var.environment_id
  region         = var.provider_region
}

# OIDC Application for API services
resource "identity_application" "api_client" {
  environment_id = var.environment_id
  name           = "${var.environment}-api-client"
  description    = "OIDC client for ${var.environment} API services"
  enabled        = true

  oidc_options {
    type                       = "SINGLE_PAGE_APP"
    grant_types                = ["AUTHORIZATION_CODE", "REFRESH_TOKEN"]
    response_types             = ["CODE"]
    token_endpoint_auth_method = "NONE"
    redirect_uris              = var.api_redirect_uris
    post_logout_redirect_uris  = var.api_logout_uris
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Service     = "api"
  }
}

# OIDC Application for mobile clients
resource "identity_application" "mobile_client" {
  environment_id = var.environment_id
  name           = "${var.environment}-mobile-client"
  description    = "OIDC client for ${var.environment} mobile applications"
  enabled        = true

  oidc_options {
    type                            = "NATIVE_APP"
    grant_types                     = ["AUTHORIZATION_CODE", "REFRESH_TOKEN"]
    response_types                  = ["CODE"]
    token_endpoint_auth_method      = "NONE"
    redirect_uris                   = var.mobile_redirect_uris
    support_unsigned_request_object = true
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Service     = "mobile"
  }
}

# Identity Provider Configuration
resource "identity_provider" "corporate_saml" {
  environment_id = var.environment_id
  name           = "${var.environment}-enterprise-saml"
  description    = "Enterprise SAML identity provider"
  enabled        = true

  saml_options {
    # Variable names match the declarations in variables.tf
    idp_entity_id            = var.saml_entity_id
    sso_service_endpoint     = var.saml_sso_endpoint
    sso_binding              = "HTTP_POST"
    sign_request             = true
    verification_certificate = file("${path.module}/certificates/${var.environment}-saml.crt")
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Type        = "saml"
  }
}
The Power of Environment Variables
Managing different environments required careful variable organisation:
# variables.tf
variable "environment" {
  description = "Environment name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "environment_id" {
  description = "Identity platform environment ID"
  type        = string
  sensitive   = true
}

variable "provider_client_id" {
  description = "Provider admin client ID"
  type        = string
  sensitive   = true
}

variable "provider_client_secret" {
  description = "Provider admin client secret"
  type        = string
  sensitive   = true
}

# Referenced by the provider block in main.tf
variable "provider_region" {
  description = "Identity platform region"
  type        = string
}

variable "api_redirect_uris" {
  description = "Allowed redirect URIs for API client"
  type        = list(string)
}

# Referenced by post_logout_redirect_uris in main.tf
variable "api_logout_uris" {
  description = "Allowed post-logout redirect URIs for API client"
  type        = list(string)
}

variable "mobile_redirect_uris" {
  description = "Allowed redirect URIs for mobile client"
  type        = list(string)
}

variable "saml_entity_id" {
  description = "SAML identity provider entity ID"
  type        = string
}

variable "saml_sso_endpoint" {
  description = "SAML SSO endpoint URL"
  type        = string
}
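The validation block already used for the environment variable can also guard the redirect URI lists, which are easy to get wrong when copying values between environments. A minimal sketch, assuming API redirect URIs should always be https (that rule is an assumption, not something the original configuration enforced); it would replace the plain api_redirect_uris declaration:

```hcl
variable "api_redirect_uris" {
  description = "Allowed redirect URIs for API client"
  type        = list(string)

  validation {
    # regex() raises an error when there is no match; can() turns that error
    # into false, and alltrue() requires every URI in the list to pass.
    condition     = alltrue([for uri in var.api_redirect_uris : can(regex("^https://", uri))])
    error_message = "All API redirect URIs must start with https://."
  }
}
```

The same rule would not suit mobile_redirect_uris, which use a custom scheme such as com.example.app://.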
Environment-specific configurations:
# environments/dev.tfvars
environment    = "dev"
environment_id = "dev-environment-id"

api_redirect_uris = [
  "https://dev-api.example.com/auth/callback",
  "https://dev-api.example.com/auth/silent-callback"
]

mobile_redirect_uris = [
  "com.example.app.dev://auth/callback"
]

saml_entity_id    = "https://dev.example.com/saml"
saml_sso_endpoint = "https://dev-sso.example.com/saml/login"

# environments/production.tfvars
environment    = "production"
environment_id = "prod-environment-id"

api_redirect_uris = [
  "https://api.example.com/auth/callback",
  "https://api.example.com/auth/silent-callback"
]

mobile_redirect_uris = [
  "com.example.app://auth/callback"
]

saml_entity_id    = "https://example.com/saml"
saml_sso_endpoint = "https://sso.example.com/saml/login"
State Management Lessons
One of Terraform’s biggest benefits is state management, but it also introduced new challenges:
Remote State with Locking
# backend.tf
terraform {
  backend "s3" {
    bucket         = "terraform-state-identity"
    key            = "identity/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locking"
  }
}
Environment Isolation
Each environment needed its own state file:
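# Initialise the backend for one environment at a time.
# -reconfigure switches an already-initialised working directory to the new key
# without attempting to migrate the existing state.
terraform init -reconfigure -backend-config="key=identity/dev/terraform.tfstate"
terraform init -reconfigure -backend-config="key=identity/staging/terraform.tfstate"
terraform init -reconfigure -backend-config="key=identity/production/terraform.tfstate"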
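An alternative to passing the key inline is a partial backend configuration file per environment. The sketch below assumes a hypothetical backends/ directory and file name; only the state key changes between environments, while the bucket, region, encryption, and lock table stay in backend.tf:

```hcl
# backends/dev.s3.tfbackend (hypothetical path and file name)
# Values given via -backend-config take precedence over the key in backend.tf.
key = "identity/dev/terraform.tfstate"
```

It would be selected with terraform init -reconfigure -backend-config=backends/dev.s3.tfbackend, and the staging and production files would differ only in the key value.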
The Power of Environment Cloning
One of Terraform’s biggest advantages was the ability to create identical environments:
# Create a new staging environment identical to dev
terraform workspace new staging
terraform apply -var-file=environments/staging.tfvars
This capability proved invaluable for:
- Testing: Spin up identical environments for testing new configurations
- Disaster recovery: Quickly recreate production environment if needed
- Development: Give developers their own isolated identity environments
- Compliance: Ensure staging exactly matches production for audit purposes
The Third-Party Provider Challenge
Using pre-1.0 Terraform providers for identity platforms teaches painful lessons about bleeding-edge tooling:
Contract Changes
The provider was actively developed, which meant breaking changes between minor versions:
# Version 0.12.0
resource "identity_application" "app" {
  # ... configuration

  oidc_options {
    pkce_enforcement = "OPTIONAL" # This field existed
  }
}

# Version 0.15.0
resource "identity_application" "app" {
  # ... configuration

  oidc_options {
    # pkce_enforcement field was removed
    pkce_required = true # Replaced with this boolean
  }
}
Misaligned Values and Resource Removal
The most painful lesson came when provider changes led to misaligned state values. Terraform detected differences and decided to “fix” them by removing and recreating resources.
The scenario that haunts us:
- Provider update changed how certain values were represented
- Terraform detected drift between state and actual configuration
- Terraform plan showed it would destroy and recreate OIDC clients
- We ran terraform apply in the development environment
- All client IDs and client secrets were regenerated
The Client ID/Secret Problem
In OIDC, client IDs and secrets are issued once and cannot be “set” to specific values. They’re generated by the identity provider and must be accepted as-is. When Terraform recreated our OIDC clients:
- New client IDs: All applications needed to be updated with new IDs
- New client secrets: All backend services needed new secrets
- Broken authentication: Applications couldn’t authenticate until updated
- Manual secret distribution: Had to update secrets across all services
Thankfully, this happened in development. But it taught us that some things are non-replaceable in identity infrastructure.
Protecting Critical Resources
After the client recreation incident, we implemented safeguards:
# Protect critical OIDC clients from accidental deletion
resource "identity_application" "production_api_client" {
  # ... configuration

  lifecycle {
    prevent_destroy = true
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Service     = "api"
    Critical    = "true"
  }
}

# Use data sources for existing critical resources
data "identity_application" "existing_mobile_client" {
  environment_id = var.environment_id
  name           = "production-mobile-client"
}

# Reference existing client instead of managing it
locals {
  mobile_client_id = data.identity_application.existing_mobile_client.id
}
Secrets Management Integration
The client secret regeneration incident forced us to integrate proper secrets management:
# Store generated secrets in AWS Secrets Manager
resource "aws_secretsmanager_secret" "oidc_client_secrets" {
  for_each = toset(["api", "mobile", "web"])

  name        = "identity/${var.environment}/${each.key}-client-secret"
  description = "OIDC client secret for ${each.key} in ${var.environment}"

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Service     = each.key
  }
}

# Assumes the OIDC clients are declared as identity_application.clients
# with for_each over the same keys ("api", "mobile", "web")
resource "aws_secretsmanager_secret_version" "oidc_client_secrets" {
  for_each = aws_secretsmanager_secret.oidc_client_secrets

  secret_id = each.value.id
  secret_string = jsonencode({
    client_id     = identity_application.clients[each.key].oidc_options[0].client_id
    client_secret = identity_application.clients[each.key].oidc_options[0].client_secret
  })
}
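The secret versions above read from identity_application.clients[each.key], which implies the clients are declared once with for_each rather than as the separate api_client and mobile_client resources shown earlier. Changing a resource's address that way normally makes Terraform plan a destroy and a create, which is exactly the regeneration problem described above. A sketch of how moved blocks (available from Terraform 1.1) could carry the existing state across, assuming a clients resource keyed by service name:

```hcl
# Hypothetical refactor: the individually named clients become instances of
# identity_application.clients. Each moved block tells Terraform that the old
# address and the new address are the same real object, so nothing is recreated.
moved {
  from = identity_application.api_client
  to   = identity_application.clients["api"]
}

moved {
  from = identity_application.mobile_client
  to   = identity_application.clients["mobile"]
}
```

With these in place, the plan should report the resources as moved rather than as a destroy and create pair, which is worth confirming before applying.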
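Monitoring and Drift Detection
We implemented monitoring to catch configuration drift. The alarm watches a custom metric, TerraformPlanChanges, so it depends on a scheduled job that runs terraform plan and publishes a value above zero whenever the plan is not empty: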
# CloudWatch alarm for Terraform plan changes
resource "aws_cloudwatch_metric_alarm" "terraform_drift" {
  alarm_name          = "identity-terraform-drift-${var.environment}"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "TerraformPlanChanges"
  namespace           = "Identity/Terraform"
  period              = 3600
  statistic           = "Maximum"
  threshold           = 0
  alarm_description   = "Terraform plan detected changes in identity infrastructure"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    Environment = var.environment
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}
Lessons Learned
Pre-v1.0 Provider Risks
The Risk: Third-party providers that haven’t reached v1.0 can have breaking changes that cause resource recreation.
The Mitigation:
- Pin provider versions strictly: version = "= 0.15.0" (not ~> 0.15.0)
- Test provider updates in isolated environments
- Always run terraform plan and carefully review changes
- Identify non-replaceable resources and protect them
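As a concrete illustration of the first point, here is the required_providers block from main.tf with an exact pin. The provider source is the placeholder name used throughout this post:

```hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    identity = {
      source = "identity-provider/identity"
      # Exact pin: only 0.15.0 satisfies this constraint.
      # "~> 0.15.0" would also accept 0.15.1, 0.15.2, and any later 0.15.x.
      version = "= 0.15.0"
    }
  }
}
```

Committing the .terraform.lock.hcl file alongside this adds a second safeguard, since Terraform then keeps selecting the recorded version until someone deliberately runs terraform init -upgrade.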
State Management is Critical
The Learning: Terraform state is the source of truth, but it can become misaligned with reality.
Best Practices Developed:
- Use remote state with locking from day one
- Implement state file backups
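- Regular terraform refresh to sync state with reality (on current Terraform versions the standalone command is deprecated in favour of terraform apply -refresh-only)
- Monitor for state drift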
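For state file backups, one common option with an S3 backend is bucket versioning, which keeps every previous version of the state object. A minimal sketch, assuming the state bucket is configured from a separate bootstrap configuration and that AWS provider v4 or later is in use; the resource label is illustrative:

```hcl
# Keeps prior versions of terraform.tfstate so that a bad apply or a
# corrupted state file can be rolled back to an earlier object version.
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = "terraform-state-identity" # bucket name from backend.tf

  versioning_configuration {
    status = "Enabled"
  }
}
```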
Environment Parity is Powerful
The Benefit: Being able to create identical environments revolutionised our development process.
Applications:
- Feature testing in production-like environments
- Disaster recovery scenarios
- Developer environment provisioning
- Compliance demonstrations
Some Resources Are Non-Replaceable
The Reality: In identity systems, certain values (client IDs, secrets, certificates) cannot be “set” - they’re generated and must be accepted.
Protection Strategies:
- Use prevent_destroy lifecycle rules
- Import existing critical resources instead of creating new ones
- Implement approval workflows for production changes
- Maintain manual backups of critical configuration
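Importing an existing client, rather than letting Terraform create a new one, can be expressed in configuration with an import block on Terraform 1.5 or later (older versions use the terraform import command instead). This is a sketch only: the ID string is a placeholder, because the import ID format is defined by each provider and is not shown anywhere in this post:

```hcl
# Adopt the existing application into state on the next apply.
# The target resource block must already exist in the configuration.
import {
  to = identity_application.production_api_client
  id = "<provider-specific import ID>" # placeholder, format depends on the provider
}
```

The next terraform plan should then list the resource as an import with no destroy or create actions; any attribute differences it reports are the ones to reconcile in code before applying.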
The Bigger Picture
Moving to Infrastructure as Code for our identity platform was transformative, but it required learning to work with imperfect tools. The Terraform provider ecosystem is powerful but comes with risks, especially for pre-v1.0 providers.
The incident with client recreation taught us that infrastructure as code isn’t just about automation - it’s about understanding the implications of every change and protecting the resources that cannot be recreated.
Six months later, we had a robust identity infrastructure deployment process that could recreate any environment from code, but we also had deep respect for the power and risks of infrastructure automation.
But that’s a story for another chapter.
What experiences have you had with third-party Terraform providers? How do you protect critical resources from accidental recreation?