Automating Terraform provider upgrades with GitHub Agentic Workflows

Terraform provider upgrades often turn into a manual research exercise.

You check the registry, read changelogs, scan the codebase for deprecated resources, update the provider version, add moved blocks, run a plan, and hope nothing breaks.

Using GitHub Agentic Workflows, a Terraform MCP server, and reusable agents and skills, much of that preparation can be automated safely.

The workflow can:

  • detect new provider versions
  • analyse Terraform code for deprecated resources
  • apply safe migrations
  • generate upgrade documentation
  • open a draft pull request for review

Engineers no longer spend time preparing upgrades; they review the PR.

The Real Problem with Terraform Provider Upgrades

Provider upgrades always look trivial at first.

Updating a version constraint takes seconds. Understanding the impact of that change on a real infrastructure codebase is where the work begins.

In practice, provider upgrades usually involve a surprising amount of research:

  • reading provider changelogs
  • checking upgrade guides
  • identifying removed resources
  • spotting deprecated arguments
  • verifying migration steps

Most teams handle this manually.

The upgrade eventually happens, but rarely immediately. Provider versions slowly drift behind.

It’s not usually intentional. The project starts clean – versions pinned, pipelines working, deployments stable. Over time the provider ecosystem moves forward while the codebase stays still.

Then one day someone decides it’s time to upgrade.

That’s when a simple task becomes something more involved.

A typical upgrade process ends up looking something like this:

  1. Check the Terraform registry
  2. Review changelogs and upgrade notes
  3. Identify breaking changes
  4. Update Terraform code
  5. Add moved blocks for renamed resources
  6. Run terraform plan and validate
  7. Write migration notes
  8. Create a pull request

Even when everything goes smoothly, that process usually takes 30 to 60 minutes.

And that’s assuming nothing unexpected appears in the plan output.

If you’ve upgraded the AzureRM provider before, you’ll know how quickly things can snowball. Miss one breaking change and the next pipeline run becomes a debugging session.

A Different Approach: Agentic Workflows

Recently I started experimenting with GitHub Agentic Workflows to automate parts of this process.

The goal wasn’t full automation. Infrastructure changes still need review. Instead the aim was to remove the repetitive preparation work while keeping governance and visibility intact.

Agentic Workflows extend GitHub Actions by allowing workflow logic to be described in Markdown. More importantly, workflows can delegate tasks to agents and external tools.

That changes what a workflow can realistically do.

Instead of executing a fixed set of steps, the workflow can:

  • analyse repository code
  • query provider metadata
  • retrieve upgrade documentation
  • reason about breaking changes
  • decide whether to propose changes or raise an issue

All while still respecting the normal pull request workflow.

For Terraform provider upgrades, this turns out to be a good fit.

GitHub Workflow: How It Works

The upgrade workflow itself lives in a Markdown file:

.github/workflows/terraform-upgrade.md

It defines:

  • triggers
  • permissions
  • tool integrations
  • safety controls
  • imported agents and skills

At a high level the workflow process looks like this:

  1. Workflow runs weekly or manually
  2. Terraform MCP server analyses provider versions
  3. Imported Terraform agent evaluates upgrade compatibility
  4. The agent updates the code when it is safe to do so
  5. Documentation is generated
  6. A draft pull request is created

Nothing merges automatically.

Everything still goes through code review.

The workflow file itself looks like this:

---
on:
  schedule:
    - cron: "0 9 * * 1"  # weekly, Mondays at 09:00 UTC
  workflow_dispatch:
    inputs:
      terraform_version:
        description: 'Target Terraform version to upgrade to'
        required: false
        type: string
      provider_upgrade:
        description: 'Upgrade Terraform providers'
        required: false
        type: boolean
        default: true
      create_pr:
        description: 'Create a pull request with changes'
        required: false
        type: boolean
        default: true

engine: copilot

permissions:
  pull-requests: read
  issues: read
  contents: read

tools:
  github:
    toolsets: [default, pull_requests]

mcp-servers:
  terraform:
    container: "hashicorp/terraform-mcp-server:0.3.3"
    env:
      TF_LOG: "INFO"
    allowed: ["*"]
  

safe-outputs:
  create-pull-request:
    title-prefix: "[terraform] "
    labels: [terraform, automation]
    draft: true
  create-issue:
    title-prefix: "[terraform] "
    labels: [terraform, recommendations]
  update-issue:

network:
  allowed:
    - defaults
    - github
    - terraform

imports:
  - thomast1906/github-copilot-agent-skills/.github/agents/terraform-provider-upgrade.agent.md@main
  - thomast1906/github-copilot-agent-skills/.github/skills/terraform-provider-upgrade/SKILL.md@main
---

# Terraform Upgrade Workflow

Perform safe Terraform and provider upgrades for this repository.

## Workflow Context

**Trigger**: This workflow runs weekly on Mondays at 9 AM UTC, or can be triggered manually.

**Inputs**:
- `terraform_version`: Optional target Terraform version (e.g., "1.7.0"). If not specified, check for latest stable version.
- `provider_upgrade`: Whether to upgrade providers (default: true)
- `create_pr`: Whether to create a pull request with changes (default: true). If false, create an issue with recommendations instead.

## Your Task

Perform Terraform upgrades according to your imported expertise and methodology. Use the workflow inputs to guide your work.

**Important:** Only modify files under the `terraform/` directory. Do NOT create or modify any files under `.github/`.

In addition to version and breaking-change analysis, you must always perform a deprecation scan.

### Required deprecation handling

1. Detect deprecated Terraform/provider arguments, blocks, and resources used in this repository.
2. Attempt safe auto-fixes for deprecations when there is a clear, low-risk replacement.
3. If an auto-fix is not safe or requires human input, keep the current code and document the recommendation.

### PR/Issue output requirements

When complete, create a pull request (if `create_pr` is true) or an issue (if false) with findings and changes.

Your output must always include a **Deprecations** section with:
- Deprecated item found (argument/resource/block)
- File and resource context
- Recommended replacement
- Auto-fix status: `applied` or `manual-action-required`
- Reason when not auto-fixed

If no deprecations are found, explicitly state: **"No deprecated arguments/resources detected."**

The Markdown body explains the upgrade methodology, but the real intelligence comes from the imported agent and skill.

Screenshot showing GitHub Agentic workflow for Terraform upgrade

Terraform MCP Server

A key part of this workflow is the Terraform MCP server.

Instead of relying on static knowledge or scraping documentation, the workflow can query real Terraform data during execution.

The MCP server provides capabilities such as:

  • reading provider versions from Terraform files
  • checking the latest versions in the Terraform registry
  • retrieving upgrade guides
  • analysing breaking changes

Effectively the workflow has access to the same information an engineer would normally gather manually.
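To make the first of those capabilities concrete, here is a minimal sketch of reading provider version constraints out of Terraform source. It is not how the Terraform MCP server is implemented; the function name `read_provider_constraints` and the sample configuration are assumptions for illustration, and a real tool would use a proper HCL parser rather than a regular expression.

```python
import re

# Hypothetical sample mirroring a typical versions.tf; not copied from the repository.
VERSIONS_TF = '''
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.75.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.5.0"
    }
  }
}
'''

# Matches `name = { ... version = "constraint" ... }` entries that contain no nested braces.
PROVIDER_RE = re.compile(
    r'(\w+)\s*=\s*\{[^{}]*?version\s*=\s*"([^"]+)"[^{}]*?\}', re.S
)


def read_provider_constraints(hcl_text: str) -> dict[str, str]:
    """Return {provider_name: version_constraint} found in the given HCL text."""
    return {name: constraint for name, constraint in PROVIDER_RE.findall(hcl_text)}


if __name__ == "__main__":
    print(read_provider_constraints(VERSIONS_TF))
```

The regex approach is deliberately naive: it only handles the common object-style `required_providers` entries, which is enough to show the idea.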

Architecture Overview

The key design choice in this setup is separating orchestration from expertise.

The GitHub workflow itself stays intentionally thin. Its job is simply to coordinate the process and invoke the Terraform upgrade agent.

All Terraform-specific knowledge lives elsewhere:

  • A Terraform upgrade agent
  • A reusable Terraform upgrade skill
  • The Terraform MCP server providing infrastructure awareness

Instead of embedding hundreds of lines of instructions directly in the workflow, the upgrade logic lives in a reusable skill repository. If the upgrade methodology improves later, the agent or skill can be updated once and every workflow that depends on it benefits.

For platform teams managing dozens or hundreds of Terraform repositories, this separation becomes extremely valuable.

Reusable Agents and Skills

One of the most powerful parts of this setup is the ability to import agents and skills into the workflow.

Instead of embedding all Terraform upgrade knowledge inside the workflow itself, the workflow imports an expert agent from a dedicated repository:

https://github.com/thomast1906/github-copilot-agent-skills

Inside that repository are two key components:

Terraform Provider Upgrade Agent

https://github.com/thomast1906/github-copilot-agent-skills/blob/main/.github/agents/terraform-provider-upgrade.agent.md

This agent contains guidance on:

  • upgrade strategies
  • provider compatibility
  • resource migrations
  • state migration patterns

Terraform Upgrade Skill

https://github.com/thomast1906/github-copilot-agent-skills/blob/main/.github/skills/terraform-provider-upgrade/SKILL.md

The skill defines the methodology used during upgrades, including:

  • detecting deprecated arguments
  • identifying safe migrations
  • generating moved blocks
  • producing upgrade documentation

The workflow coordinates the process, but the agent and skill contain the Terraform expertise.

For platform teams managing multiple Terraform repositories, this becomes particularly useful. Improvements to the upgrade methodology can be made once and reused everywhere.

What the Workflow Execution Actually Looks Like

One interesting aspect of agent-driven workflows is the execution logs.

Instead of simple step output, you see the reasoning process as the agent analyses the repository.

The workflow begins by identifying Terraform files:

Find all Terraform files
$ find ... -name "*.tf"

✓ List directory terraform (14 files)

✓ Read terraform/versions.tf (23 lines)
✓ Read terraform/main.tf (66 lines)
✓ Read terraform/sql.tf (59 lines)
✓ Read terraform/app.tf (42 lines)
✓ Read terraform/network.tf (45 lines)

Next it searches for deprecated resources:

Perfect! I've identified the issue. The current configuration uses deprecated `azurerm_sql_*` resources that have been removed in azurerm v4.x. Let me now search for the new resource documentation and create a comprehensive upgrade plan:
● terraform-search_providers
└ Available Documentation (top matches) for resources in Terraform provider has...
● terraform-search_providers
└ Available Documentation (top matches) for resources in Terraform provider has...
● terraform-search_providers
└ Available Documentation (top matches) for resources in Terraform provider has...
● terraform-get_provider_details
└ {"agentInstructions":"The payload was too large for an MCP response. The comp...
● terraform-get_provider_details

Then the agent queries the Terraform registry:

Latest provider versions detected:

Random: 3.8.1
AzureRM: 4.63.0

Once the versions are known, the agent retrieves documentation for the upgrade path.

Now let me get the upgrade guide for AzureRM provider from 3.x to 4.x

At this point the agent has enough context to analyse the upgrade.

Real Example: AzureRM Provider Upgrade

I tested this workflow on a repository using:

azurerm ~> 3.75.0

The latest available version was:

4.63.0
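Going from a 3.x constraint to 4.63.0 crosses a major version boundary, which is the signal that breaking changes are likely. A minimal sketch of that classification is below. The helper names `parse_version` and `classify_upgrade` are hypothetical, and treating the number inside the constraint as the "current" version is a simplifying assumption (the lock file holds the version actually in use).

```python
import re


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract MAJOR.MINOR.PATCH from a version or a constraint such as '~> 3.75.0'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def classify_upgrade(current: str, latest: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for the jump from current to latest."""
    cur, new = parse_version(current), parse_version(latest)
    if new <= cur:
        return "none"
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"


if __name__ == "__main__":
    print(classify_upgrade("~> 3.75.0", "4.63.0"))  # AzureRM in this example
    print(classify_upgrade("~> 3.5.0", "3.8.1"))    # Random in this example
```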

The workflow detected several deprecated resources:

azurerm_sql_server
azurerm_sql_database
azurerm_sql_firewall_rule

These resources were removed in AzureRM 4.x and replaced with MSSQL equivalents.
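For a sense of what that detection involves, here is a simplified sketch that scans Terraform source for the three resource types above. The rename table comes from this example only; the function name `find_deprecated` and the sample input are assumptions, and the agent itself works from the provider's upgrade guide rather than a hard-coded list.

```python
import re

# Renames taken from the example in this post; a real scan would need the full
# list from the provider's upgrade guide.
RENAMED_RESOURCES = {
    "azurerm_sql_server": "azurerm_mssql_server",
    "azurerm_sql_database": "azurerm_mssql_database",
    "azurerm_sql_firewall_rule": "azurerm_mssql_firewall_rule",
}

# Matches the opening line of a resource block: resource "<type>" "<name>"
RESOURCE_RE = re.compile(r'^\s*resource\s+"([^"]+)"\s+"([^"]+)"', re.M)

# Hypothetical input for illustration.
SAMPLE = '''
resource "azurerm_sql_server" "example" {
  name = local.sql_server_name
}

resource "azurerm_resource_group" "example" {
  name = "rg-example"
}
'''


def find_deprecated(hcl_text: str) -> list[tuple[str, str, str]]:
    """Return (old_type, resource_name, replacement_type) for each deprecated resource."""
    return [
        (rtype, name, RENAMED_RESOURCES[rtype])
        for rtype, name in RESOURCE_RE.findall(hcl_text)
        if rtype in RENAMED_RESOURCES
    ]


if __name__ == "__main__":
    for old, name, new in find_deprecated(SAMPLE):
        print(f"{old}.{name} -> {new}.{name}")
```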

Instead of simply flagging the issue, the agent automatically migrated them.

Example:

Before

resource "azurerm_sql_server" "example" {
  name = local.sql_server_name
}

After

resource "azurerm_mssql_server" "example" {
  name = local.sql_server_name
}

moved {
  from = azurerm_sql_server.example
  to   = azurerm_mssql_server.example
}

The same migration pattern was applied to the database and firewall rule resources.

The key detail here is the moved block.

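This tells Terraform to update state references instead of destroying and recreating infrastructure. A move between two different resource types relies on Terraform 1.8 or later and on the provider supporting that move, so check that terraform plan reports a move rather than a destroy and create.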

For database infrastructure that’s the difference between a safe upgrade and downtime.
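The rename-plus-moved pattern is mechanical enough to sketch in a few lines. The version below is an illustration under stated assumptions, not the agent's actual logic: `migrate_resource` is a hypothetical helper, it only rewrites the resource declaration, and it does not update references such as `azurerm_sql_server.example.id` elsewhere in the code.

```python
import re


def migrate_resource(hcl_text: str, old_type: str, new_type: str) -> str:
    """Rename resources of old_type to new_type and append a moved block for each."""
    pattern = re.compile(
        rf'^(\s*resource\s+)"{re.escape(old_type)}"(\s+"([^"]+)")', re.M
    )
    names = [m.group(3) for m in pattern.finditer(hcl_text)]
    if not names:
        return hcl_text  # nothing to migrate

    # Swap the resource type while keeping the original whitespace and name.
    renamed = pattern.sub(
        lambda m: f'{m.group(1)}"{new_type}"{m.group(2)}', hcl_text
    )
    moved_blocks = [
        f"moved {{\n  from = {old_type}.{name}\n  to   = {new_type}.{name}\n}}\n"
        for name in names
    ]
    return renamed.rstrip("\n") + "\n\n" + "\n".join(moved_blocks)


if __name__ == "__main__":
    before = 'resource "azurerm_sql_server" "example" {\n  name = local.sql_server_name\n}\n'
    print(migrate_resource(before, "azurerm_sql_server", "azurerm_mssql_server"))
```

Keeping the `from` and `to` addresses side by side in the output makes the state move easy to spot during review.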

What This Actually Saves

Before automating the process, a typical upgrade usually involved something like this:

Step                            Time
Reviewing changelog             10-15 min
Identifying breaking changes    ~10 min
Updating Terraform code         ~10 min
Writing migrations              ~10 min
Validation and plan checks      5-10 min

That easily adds up to 30-60 minutes per upgrade.
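As a quick sanity check on that figure, the per-step estimates can be summed directly. The sketch below treats each "~10 min" entry as exactly 10 minutes, which is my simplification.

```python
# (low, high) minutes per step, from the estimates above; "~10 min" is treated as 10-10.
STEPS = {
    "Reviewing changelog": (10, 15),
    "Identifying breaking changes": (10, 10),
    "Updating Terraform code": (10, 10),
    "Writing migrations": (10, 10),
    "Validation and plan checks": (5, 10),
}


def total_range(steps: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return the (low, high) total in minutes across all steps."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range(STEPS)
    print(f"{low}-{high} minutes")
```

The steps come to 45 to 55 minutes when everything goes smoothly, which sits inside the 30-60 minute range.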

With the workflow in place, the process changes.

The workflow performs the research and prepares the migration. Engineers review the generated pull request.

In practice the human effort becomes roughly five minutes of review.

Pull Request Generated by the Agentic Workflow

The workflow produced a draft pull request which you can view here:

https://github.com/thomast1906/github-agentic-workflows-terraform-upgrade/pull/7

The PR contains:

  • provider version updates
  • resource migrations
  • moved blocks
  • documentation
  • validation steps

It also created a migration document automatically.

Example snippet from the generated documentation:

Terraform Provider Upgrade: AzureRM v3.75.0 → v4.63.0
Date: 2026-03-10
Upgrade Type: Major Version (Breaking Changes)

Summary
Upgraded HashiCorp AzureRM provider from ~> 3.75.0 to ~> 4.63.0 and Random provider from ~> 3.5.0 to ~> 3.8.0. This major version upgrade included automatic migration of deprecated SQL resources to their modern MSSQL equivalents with proper state migration using moved blocks.

What Changed
Version Updates
AzureRM Provider: ~> 3.75.0 → ~> 4.63.0 (major version upgrade)
Random Provider: ~> 3.5.0 → ~> 3.8.0 (minor version upgrade, non-breaking)
Resource Migrations Applied
Migrated azurerm_sql_server → azurerm_mssql_server
Migrated azurerm_sql_database → azurerm_mssql_database
Migrated azurerm_sql_firewall_rule → azurerm_mssql_firewall_rule
Breaking Changes Handled
✅ 1. azurerm_sql_server → azurerm_mssql_server
Files Modified: terraform/sql.tf

Changes Applied:

Updated resource type from azurerm_sql_server to azurerm_mssql_server
Added moved block for automatic state migration
All arguments remain compatible (no schema changes required)
Argument Mappings:

✅ name - No change
✅ resource_group_name - No change
✅ location - No change
✅ version - No change
✅ administrator_login - No change
✅ administrator_login_password - No change
✅ identity block - No change
✅ tags - No change
Default Values: No new default value changes that affect existing behavior.

Documentation: azurerm_mssql_server

✅ 2. azurerm_sql_database → azurerm_mssql_database
Files Modified: terraform/sql.tf

Changes Applied:

Updated resource type from azurerm_sql_database to azurerm_mssql_database
Replaced resource_group_name, location, server_name with server_id
Mapped edition="Basic" + requested_service_objective_name="Basic" to sku_name="Basic"
Added moved block for automatic state migration
Argument Mappings:

✅ name → name (no change)
❌ resource_group_name (removed) + location (removed) + server_name (removed)
✅ server_id (new) - Now uses: azurerm_mssql_server.example.id
❌ edition (removed) + requested_service_objective_name (removed)
✅ sku_name (new) - Mapped from edition + requested_service_objective_name → "Basic"
✅ tags - No change
SKU Mapping:

Old: edition = "Basic" + requested_service_objective_name = "Basic"
New: sku_name = "Basic"
Default Values: No new default value changes that affect existing behavior.

Documentation: azurerm_mssql_database
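The database argument mapping described in that snippet can be expressed as a small transformation. This is a sketch under assumptions: `migrate_database_args` is a hypothetical helper working on plain dictionaries, and the SKU table contains only the one mapping documented above, so anything else is rejected for manual review rather than guessed.

```python
# Only the mapping documented in the generated notes is included; other
# edition/service-objective pairs are deliberately left unmapped.
SKU_MAP = {("Basic", "Basic"): "Basic"}

# Arguments the generated notes list as removed on the new resource.
REMOVED = (
    "resource_group_name",
    "location",
    "server_name",
    "edition",
    "requested_service_objective_name",
)


def migrate_database_args(old: dict[str, object], server_ref: str) -> dict[str, object]:
    """Translate azurerm_sql_database arguments to their azurerm_mssql_database form."""
    key = (old.get("edition"), old.get("requested_service_objective_name"))
    if key not in SKU_MAP:
        raise ValueError(f"no known sku_name mapping for {key}; needs manual review")
    new = {k: v for k, v in old.items() if k not in REMOVED}
    new["server_id"] = server_ref      # replaces resource_group_name/location/server_name
    new["sku_name"] = SKU_MAP[key]     # replaces edition + requested_service_objective_name
    return new


if __name__ == "__main__":
    old_args = {
        "name": "db",
        "resource_group_name": "rg",
        "location": "uksouth",
        "server_name": "srv",
        "edition": "Basic",
        "requested_service_objective_name": "Basic",
        "tags": {},
    }
    print(migrate_database_args(old_args, "azurerm_mssql_server.example.id"))
```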

You can see the generated markdown “Terraform Provider Upgrade: AzureRM v3.75.0 → v4.63.0” in the PR here:

https://github.com/thomast1906/github-agentic-workflows-terraform-upgrade/blob/8810c46e6099e5d86b3d63e7ddd7d5ec56b5623e/TERRAFORM_UPGRADE_BREAKING_CHANGES.md

The workflow also performs a full deprecation scan, reporting whether deprecated arguments were auto-fixed or require manual intervention.
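The shape of that report follows the fields required by the workflow prompt. Here is a rough sketch of how such a section could be rendered. The `Deprecation` dataclass, the `render_deprecations` function, and the exact Markdown layout are my assumptions; only the field list, the two status values, and the "no deprecations" sentence come from the workflow itself.

```python
from dataclasses import dataclass


@dataclass
class Deprecation:
    item: str          # deprecated argument, resource, or block
    context: str       # file and resource context
    replacement: str   # recommended replacement
    auto_fixed: bool
    reason: str = ""   # why it was not auto-fixed, if applicable


def render_deprecations(findings: list[Deprecation]) -> str:
    """Render a Deprecations section for a PR or issue body."""
    lines = ["## Deprecations", ""]
    if not findings:
        lines.append('**"No deprecated arguments/resources detected."**')
        return "\n".join(lines)
    for f in findings:
        status = "applied" if f.auto_fixed else "manual-action-required"
        lines.append(
            f"- {f.item} ({f.context}): use {f.replacement}; auto-fix status: `{status}`"
        )
        if not f.auto_fixed and f.reason:
            lines.append(f"  - Reason: {f.reason}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_deprecations([
        Deprecation("azurerm_sql_server", "terraform/sql.tf", "azurerm_mssql_server", True),
    ]))
```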

Safety Controls with Safe Outputs

AI modifying infrastructure code can sound risky.

GitHub Agentic Workflows address this through Safe Outputs.

The workflow explicitly defines which actions the agent is allowed to perform.

For example:

safe-outputs:
  create-pull-request:
    title-prefix: "[terraform] "
    labels: [terraform, automation]
    draft: true
  create-issue:
    title-prefix: "[terraform] "
    labels: [terraform, recommendations]
  update-issue:

This ensures:

  • Pull requests are always drafts
  • Human review is required
  • Nothing is merged automatically

If the upgrade is too complex, the workflow simply creates an issue with recommendations instead.
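Conceptually, this behaves like an allow-list plus a simple decision between a draft PR and an issue. The sketch below is only a mental model, not how GitHub Agentic Workflows enforce Safe Outputs internally; `choose_output`, `validate_request`, and the `safe_to_auto_fix` flag are hypothetical names.

```python
# Output types declared under safe-outputs in the workflow above.
ALLOWED_OUTPUTS = {"create-pull-request", "create-issue", "update-issue"}


def choose_output(create_pr: bool, safe_to_auto_fix: bool) -> str:
    """Pick a draft PR when changes are wanted and safe, otherwise an issue."""
    if create_pr and safe_to_auto_fix:
        return "create-pull-request"
    return "create-issue"


def validate_request(action: str) -> str:
    """Reject any action that the workflow has not declared as a safe output."""
    if action not in ALLOWED_OUTPUTS:
        raise PermissionError(f"{action!r} is not a declared safe output")
    return action


if __name__ == "__main__":
    print(validate_request(choose_output(create_pr=True, safe_to_auto_fix=True)))
    print(validate_request(choose_output(create_pr=True, safe_to_auto_fix=False)))
```

The point of the model is that anything outside the declared list, such as merging a pull request, is simply not available to the agent.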

Why This Matters for Platform Engineering Teams

Infrastructure maintenance is a constant part of platform engineering.

Provider upgrades are necessary but rarely prioritised.

The result is provider drift.

Eventually a repository ends up several major versions behind and the upgrade becomes much more complex.

Automating the analysis and preparation of upgrades changes that dynamic.

Instead of reactive upgrades, teams can:

  • detect provider updates early
  • review proposed upgrades via pull requests
  • upgrade incrementally
  • keep infrastructure code closer to current versions

The workflow effectively prepares the work.

Engineers review and approve it.

A Note on Safety

This obviously doesn’t remove the need to review changes carefully.

Provider upgrades can introduce subtle behavioural differences, particularly around default values or resource lifecycle behaviour.

The workflow prepares the upgrade and migrations, but human review remains essential.

Automation like this works best when it reduces preparation work while keeping engineers in the approval loop.

Practical Takeaways

A few lessons from building and testing this workflow:

  • Separate workflow orchestration from domain expertise
  • Use imported agents and skills for reusable engineering knowledge
  • Always generate documentation as part of upgrades
  • Keep upgrades behind pull requests
  • Run upgrade analysis on a schedule to prevent provider drift

The most useful pattern I’ve found is letting AI prepare infrastructure changes, while humans remain responsible for validation and approval.

Final Thoughts

Terraform provider upgrades aren’t glamorous work, but they matter.

Left unmanaged, provider drift slowly increases risk across infrastructure codebases.

What I like about this workflow is that it doesn’t try to replace engineering judgement.

Instead it automates the repetitive parts of the process:

  • checking versions
  • analysing changelogs
  • identifying deprecated resources
  • preparing migrations
  • generating documentation

The heavy lifting is handled by the workflow, the Terraform upgrade agent, and the upgrade skill.

Engineers review the result.

In practice that turns a 45-minute upgrade task into a five-minute pull request review.

For something every infrastructure team has to do regularly, that’s a pretty good trade-off.

I also recommend checking out my other blog posts on GitHub Agentic Workflows.
