Azure Monitor – Taking the Logging and alerting deployment from ARM To PowerShell

Azure Monitor is an all-in-one, comprehensive solution for collecting, analyzing and acting on telemetry from your cloud and/or on-premises environments.

Along with this, we use both Log Analytics & Application Insights to collect and analyze telemetry data, which lets us view real-time metrics on how the application/environment is currently performing. Application Insights also helps to proactively identify issues that may put stress on the application.

Within this blog, I will give an overview of Azure Monitor and explain how we moved our Azure Monitor deployment from ARM to PowerShell, including some pain points from the original ARM deployment and how the move to PowerShell mitigated them.

Image Ref: docs.microsoft.com

What types of data are collected?

The data collected by Azure Monitor is one of two types: metrics or logs. In short, metrics are numerical values that describe some aspect of a system at a point in time, whereas logs are records that you analyze with your own queries to pull out the data you need.

Metrics are viewed using Metrics Explorer:

Image Ref: docs.microsoft.com

Logs are viewed using Log Analytics:

Image Ref: docs.microsoft.com

What types of data do we collect?

Within our environment we collect data from a number of resources. This gives us a visual representation of the environment along with the ability to alert on abnormal metrics and various activities. Examples of data sources include:-

  • Guest OS
  • Azure Resource
  • Application
  • Azure Subscription

What do we alert on?

Numerous alerts have been created from the logs/metrics we ingest. These include (only a small sample):-

Metric Alerts (Log Analytics)

  • Barracuda memory/CPU
  • Metrics alerts (Azure monitor)
  • Burstable CPU credits

Log Search Alerts

  • Domain controller events (including new domain user, domain admin changes)
  • AD Sync events
  • Virtual Machine (heartbeat, configuration changes, failed logins, vulnerability/rule alerts etc)
  • Microsoft Antimalware events (malware detected/removed)
  • RDP events (login out of hours)
  • VPN events (login out of hours)

Azure Activity Logs

  • Various alerts re. changing security/authorisation settings

Classic Alerts

  • Network Security Group changes
  • Azure backups
  • Azure service health

How was it originally deployed?

We run our deployments in an Azure DevOps pipeline. Around 70% of the above was combined into a single ARM template of 950+ lines of code, which used numerous techniques including:-

  • Arrays of variables for alerts
  • Multiple copy loops to deploy resources
  • Numerous concat() calls within the code for various operations
  • Idempotent code, so it can be reused and manipulated at run time

Let's look at some of the code samples. First, a snippet of the array used for creating an alert with a metric trigger; below is the AV check alert:

"alerts_withmetrictrigger": [
      {
        "Name": "av_check",
        "SearchQuery": "ProtectionStatus | where ProtectionStatusRank !in (150, 550) | summarize (TimeGenerated, ProtectionStatusRank) = argmax(TimeGenerated, ProtectionStatusRank) by Computer",
        "SearchCategory": "Alert",
        "DisplayName": "AV Check",
        "Description": "AV Check",
        "Severity": "critical",
        "ThresholdOperator": "gt",
        "ThresholdValue": 0,
        "Schedule": {
          "Interval": 15,
          "TimeSpan": 60
        },
        "MetricsTrigger": {
          "TriggerCondition": "Total",
          "Operator": "gt",
          "Value": 15
        },
        "ThrottleMinutes": 0,
        "AzNsNotification": {
          "GroupIds": [ "[variables('actiongroup_resourceid')]" ],
          "CustomEmailSubject": "AV Check"
        }
      }
    ]

And this is the resource that deploys the array:

{
      "copy": {
        "name": "savedsearch_copy_withmetrictrigger",
        "count": "[length(variables('alerts_withmetrictrigger'))]"
      },
      "name": "[concat(parameters('workspace_name'),'/',variables('alerts_withmetrictrigger')[copyIndex('savedsearch_copy_withmetrictrigger')].Name,'_savedsearch')]",
      "type": "Microsoft.OperationalInsights/workspaces/savedSearches",
      "apiVersion": "[variables('workspace_apiversion')]",
      "dependsOn": [ "[variables('workspace_resourceid')]" ],
      "properties": {
        "etag": "*",
        "query": "[variables('alerts_withmetrictrigger')[copyIndex('savedsearch_copy_withmetrictrigger')].SearchQuery]",
        "displayName": "[variables('alerts_withmetrictrigger')[copyIndex('savedsearch_copy_withmetrictrigger')].DisplayName]",
        "category": "[variables('alerts_withmetrictrigger')[copyIndex('savedsearch_copy_withmetrictrigger')].SearchCategory]"
      }
    }

Along with this ARM deployment, some alerting was added manually because there was no time available to code it into an ARM template. This included:

Metric Alerts (Log Analytics)

  • Barracuda memory/CPU
  • Metrics alerts (Azure monitor)
  • Burstable CPU credits

Classic Alerts

  • Network Security Group changes
  • Azure backups
  • Azure service health

What was the issue?

The ARM template had built up to 950+ lines of JSON. Over time this would become a maintenance problem, because reviewing any change meant tracing through the code for multiple references and/or arrays.

Interestingly, the ARM template deployment of Log Analytics would not redeploy incrementally after the initial deployment, which seems to be a bug! So within our pipeline we wrapped it in a try/catch, so that later steps could still use the Log Analytics Workspace ID & Key.

Any new alerts were added manually and then retrofitted into the original code for future deployments, which wasn't cool and certainly wasn't DevOps.

6 major changes wanted:

  • A more readable format for alerts to be viewed and updated
  • Ability to redeploy LogAnalytics X times
  • Deploy alerts without redeploying LogAnalytics template
  • Automate the manual changes I had mentioned above (Classic Alerts & additional Metric Alerts)
  • Idempotent code
  • Lessen the amount of code used, wanting to reduce the current 950+ lines of JSON

Noting that our pipeline deployment is heavily PowerShell focused, let's see what we can do to make the above changes using PowerShell 😊

Time for change…. The DevOps lifecycle – it’s all about “continuous”

Reviewing the available PowerShell modules: between the initial deployment of our ARM Log Analytics & alerting template and now, a number of modules have been released that assist in the redevelopment. What are they?

Az.OperationalInsights – For the Log Analytics Deployment
Az.Monitor – For the Azure Monitor piece (alerting); this is the module that ships the New-AzScheduledQueryRule* cmdlets used later in this post

The 6 changes wanted, were they achieved?

Yes! Let's look at how each was met

Ability to redeploy LogAnalytics X times, Deploy alerts without redeploying LogAnalytics template and automate any manual changes previously mentioned

From one ARM template, the deployment is now split into three PowerShell tasks. This is a more streamlined process, with the ability to run any of the three tasks on its own or simultaneously, or even to take a snippet of a task and run just that. The tasks are split into:-

Task 1:- Log Analytics:- Log Analytics Deployment
  • Creates Log Analytics WorkSpace
  • Adds Solutions to WorkSpace from a PowerShell array
  • Adds Windows Events from a PowerShell array
  • Enables IIS Log Collection
  • Enables Linux Syslog Collection
  • Configures Storage Insights
  • Creates Action Group

Task 2:- Monitor Alerts:- Alerting configuration Deployment
  • Creates Log Analytics Alerts
  • Creates Log Metric Alerts
  • Creates VM Metric Alerts

Task 3:- Service Health Notifications:-
Deploys Service Health notification alerts along with alerts on multiple resource type changes. I have blogged separately about how to automate Service Health Notifications and about enabling alerting for Azure Recovery Services Vault.

Any of the above tasks and/or subtasks can be run simultaneously! This also completed the required change "Automate the manual changes I had mentioned above (Classic Alerts & additional Metric Alerts)".
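As a rough sketch of how the first part of Task 1 can be made safe to run any number of times, the function below checks for the workspace before creating it and then applies solutions and Windows event logs from PowerShell arrays. This is not the script from our pipeline: the function name, parameter names, default solution and event log lists, and the SKU are all assumptions for illustration. It needs the Az.OperationalInsights module and an authenticated session.

```powershell
# Sketch only: requires Az.OperationalInsights and a prior Connect-AzAccount.
# Function name, parameters and default values are hypothetical.
function Deploy-LogAnalyticsWorkspace {
    param(
        [Parameter(Mandatory)] [string] $ResourceGroupName,
        [Parameter(Mandatory)] [string] $WorkspaceName,
        [Parameter(Mandatory)] [string] $Location,
        [string[]] $Solutions = @('Security', 'Updates'),     # illustrative list
        [string[]] $EventLogs = @('Application', 'System')    # illustrative list
    )

    # Create the workspace only when it does not exist yet, so reruns are harmless
    $workspace = Get-AzOperationalInsightsWorkspace -ResourceGroupName $ResourceGroupName `
        -Name $WorkspaceName -ErrorAction SilentlyContinue
    if (-not $workspace) {
        $workspace = New-AzOperationalInsightsWorkspace -ResourceGroupName $ResourceGroupName `
            -Name $WorkspaceName -Location $Location -Sku 'PerGB2018'
    }

    # Enable each solution from the array
    foreach ($solution in $Solutions) {
        Set-AzOperationalInsightsIntelligencePack -ResourceGroupName $ResourceGroupName `
            -WorkspaceName $WorkspaceName -IntelligencePackName $solution -Enabled $true | Out-Null
    }

    # Collect errors and warnings from each Windows event log in the array
    foreach ($log in $EventLogs) {
        New-AzOperationalInsightsWindowsEventDataSource -ResourceGroupName $ResourceGroupName `
            -WorkspaceName $WorkspaceName -Name "eventlog-$log" -EventLogName $log `
            -CollectErrors -CollectWarnings -Force | Out-Null
    }

    return $workspace
}
```

It would be called as, for example, Deploy-LogAnalyticsWorkspace -ResourceGroupName 'rg-monitoring' -WorkspaceName 'la-demo' -Location 'westeurope', where those three values are placeholders.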

A more readable format for alerts to be viewed and updated

The alerts themselves were moved to a YAML file, which looks beautiful and is easier to read and/or update.

In the file, Alerts is the top-level key and each entry under it is a mapping identified by its Name, with each Name being unique within Alerts.
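The original screenshot is not reproduced here, so the following is only a sketch of what one entry in such a file might look like. The key names are inferred from the $alert.xxxxxx properties used by the PowerShell loop later in this post; the values are carried over from the ARM array, and the Severity and ThresholdOperator values are assumptions about what the cmdlets expect.

```yaml
# Sketch of an alerts file; key names inferred from the PowerShell loop, values illustrative
Alerts:
  - Name: av_check
    Description: AV Check
    EmailSubject: AV Check
    Severity: "0"                  # assumed equivalent of "critical" in the ARM array
    FrequencyInMinutes: 15
    TimeWindowInMinutes: 60
    ThresholdOperator: GreaterThan # assumed equivalent of "gt"
    Threshold: 0
    query: >-
      ProtectionStatus
      | where ProtectionStatusRank !in (150, 550)
      | summarize (TimeGenerated, ProtectionStatusRank) = argmax(TimeGenerated, ProtectionStatusRank) by Computer
```

Adding another alert is then a matter of appending another block under Alerts with a new, unique Name.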

Looking at the old ARM template equivalent, there is a huge difference in layout and readability.

"alerts_withmetrictrigger": [
      {
        "Name": "av_check",
        "SearchQuery": "ProtectionStatus | where ProtectionStatusRank !in (150, 550) | summarize (TimeGenerated, ProtectionStatusRank) = argmax(TimeGenerated, ProtectionStatusRank) by Computer",
        "SearchCategory": "Alert",
        "DisplayName": "AV Check",
        "Description": "AV Check",
        "Severity": "critical",
        "ThresholdOperator": "gt",
        "ThresholdValue": 0,
        "Schedule": {
          "Interval": 15,
          "TimeSpan": 60
        },
        "MetricsTrigger": {
          "TriggerCondition": "Total",
          "Operator": "gt",
          "Value": 15
        },
        "ThrottleMinutes": 0,
        "AzNsNotification": {
          "GroupIds": [ "[variables('actiongroup_resourceid')]" ],
          "CustomEmailSubject": "AV Check"
        }
      }
    ]

As you can see, YAML uses indentation to represent document structure (as opposed to JSON, which uses brackets and braces). 

Using PowerShell, manipulating the data is easier, and it can be changed and handled in multiple ways at run time! Let's have a look at an example.

The file is loaded into PowerShell at run time using the ConvertFrom-Yaml cmdlet (available, for example, from the powershell-yaml module).

The line below shows the YAML being loaded into PowerShell:

$yaml = (Get-Content -Path $AlertsYAMLPath) -join "`n" | ConvertFrom-Yaml

Now the fun begins: we can work with the data item by item, for example by referencing an alert's Name.
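A minimal sketch of that kind of access is shown below. So that it runs without any extra module, it builds by hand the sort of structure ConvertFrom-Yaml typically returns (a hashtable whose Alerts key holds a list of hashtables). The second alert name and the Severity values are made up for illustration.

```powershell
# Stand-in for the object ConvertFrom-Yaml returns: a hashtable whose
# 'Alerts' key holds a list of hashtables (values here are illustrative).
$yaml = @{
    Alerts = @(
        @{ Name = 'av_check';  Description = 'AV Check';  Severity = '0' },
        @{ Name = 'rdp_login'; Description = 'RDP login'; Severity = '2' }   # hypothetical alert
    )
}

# Reference the name of the first alert
$firstName = $yaml.Alerts[0].Name

# List every alert name
$allNames = $yaml.Alerts | ForEach-Object { $_.Name }

# Pick out a single alert by its unique name
$avCheck = $yaml.Alerts | Where-Object { $_.Name -eq 'av_check' }

Write-Output "First alert: $firstName"
Write-Output "All alerts:  $($allNames -join ', ')"
Write-Output "AV severity: $($avCheck.Severity)"
```

Because each Name is unique within Alerts, filtering on it returns exactly one alert, which is handy when only a single alert needs to be redeployed.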

What does this mean?

We can work with the data, which makes alerting easier: I can select exactly what I want from the ingested YAML and, as mentioned, manipulate it (covered later when making the code idempotent).

Looking at this in action, you can see multiple references to $alert.xxxxxx:

foreach ($alert in $yaml.Alerts) {

        Write-Output "Configuring Alert ($($alert.Name))..."

        # Build the pieces of the scheduled query rule from the alert's YAML values
        $source = New-AzScheduledQueryRuleSource -Query $alert.query -DataSourceId $la_workspace.ResourceId -WarningAction Ignore
        $schedule = New-AzScheduledQueryRuleSchedule -FrequencyInMinutes $alert.FrequencyInMinutes -TimeWindowInMinutes $alert.TimeWindowInMinutes -WarningAction Ignore
        $trigger_condition = New-AzScheduledQueryRuleTriggerCondition -ThresholdOperator $alert.ThresholdOperator -Threshold $alert.Threshold -WarningAction Ignore
        $azns_actiongroup = New-AzScheduledQueryRuleAznsActionGroup -EmailSubject $alert.EmailSubject -ActionGroup $actiongroup -WarningAction Ignore
        $action = New-AzScheduledQueryRuleAlertingAction -AznsAction $azns_actiongroup -Severity $alert.Severity -Trigger $trigger_condition -WarningAction Ignore
        # Point the action at the action group's resource ID
        $action.AznsAction.ActionGroup[0] = $actiongroup.Id

        # Create the alert rule in the workspace's resource group and location
        New-AzScheduledQueryRule `
            -ResourceGroupName $la_workspace.ResourceGroupName `
            -Location $la_workspace.Location `
            -Action $action `
            -Enabled $true `
            -Description $alert.Description `
            -Schedule $schedule `
            -Source $source `
            -Name $alert.Name `
            -WarningAction Ignore | Out-Null

}

Idempotent Code

Idempotent code: everyone loves code that you can reuse easily. It saves time and cost, and it's just super handy!

Let's look at a query taken from the YAML input: a Log Analytics query for RDP logins to our jumphost virtual machines.

Notice the % characters in the query? Each pair wraps a variable name that we have defined in the environment variables.

During run time quite a number of environment variables are loaded in, including virtual machine names, IP addressing, domain names etc.

So each %Variable_Name% is an environment variable reference; each new environment deployed will have the same variable names but different values. A second form, %!Variable_Name%, is resolved from a PowerShell variable instead (for example $all_vms below).

Idempotent, let's go!

A little RegEx magic wrapped in a function: we take each variable_name and replace it with the actual value while the YAML is being loaded into PowerShell at run time.

Function Variable-Replace {
    param([string]$Line)
    $result = $Line
    # %!name% is replaced with the value of the PowerShell variable $name.
    # This runs first so the %name% pattern below cannot match across two tokens on one line.
    $result = [Regex]::Replace($result, '%!(.*?)%', { param($match) (Get-Variable $match.Groups[1].Value).Value })
    # %name% is replaced with the value of the environment variable 'name'
    $result = [Regex]::Replace($result, '%([^!]*?)%', { param($match) (Get-Item "Env:\$($match.Groups[1])").Value })
    Return $result
}

# Load YAML, substituting variables line by line before parsing
Try {
    $all_vms = (Get-AzVm -ResourceGroupName $env:main_resource_group).Name -join ','
    $yaml = (Get-Content -Path $AlertsYAMLPath | ForEach-Object { Variable-Replace $_ }) -join "`n" | ConvertFrom-Yaml
} Catch {
    Throw "Could not load YAML. Error: $($_.Exception.Message)"
}

So when the alert code is run, it uses the run-time environment variables. Pretty cool!
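To make the substitution concrete, here is a small self-contained sketch. The function is repeated so the snippet runs on its own, and it handles %!name% tokens before %name% tokens so the two patterns cannot match across each other. The variable names, their values and the sample line are invented for illustration and are not taken from our YAML.

```powershell
# Copy of the replacement function so this snippet is self-contained
function Variable-Replace {
    param([string]$Line)
    $result = $Line
    # %!name% -> value of the PowerShell variable $name
    $result = [Regex]::Replace($result, '%!(.*?)%', { param($match) (Get-Variable $match.Groups[1].Value).Value })
    # %name% -> value of the environment variable 'name'
    $result = [Regex]::Replace($result, '%([^!]*?)%', { param($match) (Get-Item "Env:\$($match.Groups[1])").Value })
    return $result
}

# Hypothetical values standing in for what the pipeline would load
$env:jumphost_name = 'jump01'
$all_vms = 'vm01,vm02'

# A made-up line containing one token of each kind
$line = 'Computer == "%jumphost_name%" // all VMs: %!all_vms%'
$resolved = Variable-Replace $line

Write-Output $resolved
```

With the values above, the resolved line reads Computer == "jump01" // all VMs: vm01,vm02. Deploying to a different environment only requires different variable values; the YAML itself stays unchanged.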

Lessen the amount of code used, wanting to reduce the current 950+ lines of JSON

The last of the changes: as mentioned previously, the ARM template had become quite bloated and ended up with 950+ lines of JSON to look at.

With the newly refactored PowerShell scripts, this is down to around 300 lines of code, including the ability to run snippets of the code.

Comparing the two, creating an alert with PowerShell takes a fraction of the code that the ARM template needed. Some difference!

The end result…

  • A streamlined deployment
  • Deployment of a single area (mainly alerting) is now achievable
  • Better code – everyone loves good looking code!?

Keep up to date with what is out there in relation to Azure & Automation; it is continuously updating!
