A Cloud Architect’s Guide to Azure SQL Service Selection and Pricing

For cloud architects, designing the right database platform in Azure is never just a question of “where should SQL Server live?” The real challenge lies in matching the correct Azure SQL deployment option with workload requirements while keeping cost predictable and optimized.

Azure SQL offers multiple deployment models, pricing levers, and architectural trade-offs. This guide unpacks those details, giving architects a framework to design solutions that balance performance, availability, and cost.


Azure SQL Deployment Models: Choosing the Right Service

Azure SQL exists in three primary forms, each targeting different modernization or migration scenarios:

Azure SQL Database (Single Database or Elastic Pools)

What It Is

  • Fully managed database service.
  • Ideal for new apps, SaaS solutions, or when you don’t need full SQL Server feature parity.
  • Target Workloads: Modern cloud-native apps, SaaS multi-tenant apps, microservices.
  • Service Nature: Microsoft manages patching, backups, HA, and DR.
  • Granularity: Provisioned as a single logical database or grouped into elastic pools to share resources across multiple databases.
  • Architectural Fit: Best when apps can tolerate some redesign and do not depend on server-level features (SQL Agent, cross-database queries).

Pros

  • No patching, backups, or HA configuration needed → Microsoft handles it.
  • Automatic scalability with options like serverless compute.
  • High availability built-in (99.99% SLA).
  • Geo-replication is a couple of clicks away.
  • Elastic pools save costs when workloads vary.
  • Lowest administrative overhead → focus shifts from infrastructure to schema design and performance.

Cons

  • Feature gaps vs. on-prem SQL (no SQL Agent, limited cross-database queries, no CLR).
  • Limited control (can’t access OS, registry, or certain trace flags).
  • Migration friction → legacy apps that rely on SQL Server features might break.

Azure SQL Managed Instance (MI)

What It Is

  • Target Workloads: Enterprise apps requiring high SQL Server compatibility.
  • Service Nature: Managed PaaS with near 100% SQL Server feature parity.
  • Key Features: SQL Agent, linked servers, cross-database queries, CLR, Service Broker.
  • Architectural Fit: Lift-and-shift scenarios from on-prem SQL Server without major refactoring.

Pros

  • Instance-level features supported (SQL Agent, cross-database queries, Service Broker, CLR, linked servers).
  • Easier migration → lift-and-shift many existing apps.
  • Same PaaS perks as Azure SQL Database → backups, patching, HA, geo-replication, and scaling.
  • Native VNET support → better isolation and security for enterprises.

Cons

  • Higher cost than Azure SQL Database.
  • Longer deployment times (can take hours to provision).
  • Less flexibility than SQL on VM (e.g., OS-level customization not possible).
  • Scaling isn’t instantaneous – downtime may be needed to resize.

SQL Server on Azure Virtual Machines (IaaS)

What It Is

  • Target Workloads: Legacy workloads, OS-level dependencies, or when granular control over patching/registry/config is required.
  • Service Nature: IaaS: full control of the VM OS, SQL Server, and patching.
  • Architectural Fit: Transitional strategy or last resort when dependencies block PaaS adoption.

Pros

  • Full control over SQL Server and OS (registry edits, custom trace flags, SSRS/SSIS, third-party agents).
  • Feature parity with on-prem SQL → nothing is missing.
  • Migration simplicity → true lift-and-shift; just move your VM.
  • Choice of HA/DR strategy (AlwaysOn AG, Failover Cluster, log shipping).

Cons

  • You manage everything → patching, backups, HA, DR, monitoring.
  • High operational overhead → DBAs become sysadmins again.
  • Scaling is manual and disruptive (resize VM, manage storage IOPS, etc.).
  • More expensive in the long run when you include ops and licensing overhead.

Tip: Start with a PaaS-first mindset (SQL Database or Managed Instance). Default to VMs only if blockers exist (custom extensions, OS-level agents).


Azure SQL Options: Side-by-Side Comparison

| Feature / Factor | Azure SQL Database | Azure SQL Managed Instance | SQL Server on Azure VMs |
|---|---|---|---|
| Service Type | PaaS (database-level) | PaaS (instance-level) | IaaS (full VM + SQL Server) |
| Admin Overhead | Lowest (Microsoft handles HA, patching, backups) | Low (Microsoft manages infra; you manage schema & jobs) | Highest (you manage OS, SQL, HA/DR, patching, backups) |
| Compatibility | Limited (no SQL Agent, limited cross-DB) | Near 100% with on-prem SQL | Full parity (exact same as on-prem) |
| Scalability | Serverless, auto-scale, elastic pools | Scale up/down (some downtime) | Manual VM resize; downtime |
| HA / DR | Built-in, automatic 99.99% SLA | Built-in, automatic 99.99% SLA | You design and manage AlwaysOn, Failover Clusters, etc. |
| Networking | Public endpoint or Private Link | Native VNET integration | Full VNET and OS-level networking |
| Best For | Modern apps, SaaS, dev/test, greenfield workloads | Migrating existing apps needing SQL Agent / cross-DB queries | Legacy apps, full OS control, custom SQL features |
| Cost | Lowest overall | Moderate (higher than SQL DB) | Highest (infra + admin + licensing) |

Deployment Methods, Service Tiers, and Compute Tiers

One reason Azure SQL pricing can feel complex is the number of layers that influence cost. Your bill depends not only on the pricing model (DTU or vCore) but also on the deployment method, service tier, and compute tier you choose.

Deployment Methods

Azure SQL supports three deployment approaches:

  • Single Database – A dedicated database with isolated resources. Best for applications requiring their own database or where workloads are independent.
  • Elastic Pool – A group of databases sharing a common resource pool. Ideal when you have multiple small databases with varying usage patterns.
  • Managed Instance – A fully managed SQL Server instance hosting one or more databases, preserving instance-scoped features. Best for lift-and-shift migrations that need broad SQL Server compatibility.

Service Tiers

Service tiers define performance, availability, and resilience:

  • General Purpose (Standard) – Balanced performance and cost. Uses remote storage and is sufficient for most production workloads.
  • Business Critical (Premium) – Built on SSD-based local storage with low latency and higher resilience (Always On availability groups). Best for mission-critical OLTP workloads.
  • Hyperscale – A distributed architecture scaling up to 100 TB. Ideal for massive OLTP or hybrid transactional/analytical systems. Offers fast scaling, rapid backups, and read-scale out replicas.

Compute Tiers

Compute is billed in one of two ways:

  • Provisioned Compute – Fixed number of vCores allocated and always running. Best for steady, predictable workloads that require consistent performance.
  • Serverless Compute – Automatically scales with demand and can auto-pause during idle times. Billed per second of usage, making it excellent for dev/test or production workloads with unpredictable demand.

Provisioned vs. Serverless Compute Tiers

Within the General Purpose and Hyperscale service tiers of Azure SQL Database, you have two choices for compute: Provisioned or Serverless. Each model is designed for different workload patterns.

Provisioned Compute

  • How it works: You pre-allocate a fixed number of vCores that are always running.
  • Best for: Predictable, steady workloads with consistent CPU demand (e.g., production OLTP systems).
  • Pricing: You pay for the compute at the provisioned rate, regardless of whether the database is actively used.
  • Advantages:
    • Predictable performance.
    • Easier to estimate monthly costs.
    • Ideal for 24×7 production workloads.

Serverless Compute

  • How it works: Compute scales automatically with workload demand and is billed per vCore-second. The database can auto-pause during idle periods and resume on activity.
  • Best for: Bursty, intermittent, or unpredictable workloads (e.g., development/test databases, SaaS apps with variable usage).
  • Pricing:
    • Billed by the second for actual vCore usage.
    • You also pay for storage and backup regardless of activity.
  • Advantages:
    • Significant cost savings for infrequently used databases.
    • Auto-pause eliminates unnecessary compute costs.
    • Ideal for dev/test or seasonal apps.
  • Considerations:
    • Cold-start latency when resuming from auto-pause.
    • Less predictable monthly cost if workload patterns fluctuate heavily.
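
To make this concrete, here's a minimal sketch of provisioning a serverless database with the Az.Sql PowerShell module. The resource group, server, and database names are placeholders, and the capacity numbers are illustrative, not recommendations:

# Requires the Az.Sql module and an existing logical server (names are hypothetical)
$params = @{
    ResourceGroupName       = 'rg-data'          # placeholder resource group
    ServerName              = 'sql-srv-01'       # placeholder logical server
    DatabaseName            = 'db-burst'
    Edition                 = 'GeneralPurpose'   # serverless is available in GP and Hyperscale
    ComputeModel            = 'Serverless'
    ComputeGeneration       = 'Gen5'
    VCore                   = 4                  # upper bound the database can scale to
    MinimumCapacity         = 0.5                # floor while idle
    AutoPauseDelayInMinutes = 60                 # auto-pause after an hour of inactivity
}
New-AzSqlDatabase @params

Billing then tracks per-second vCore usage between the floor and the cap; storage continues to accrue while the database is paused.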

Architectural Guidance

  • Use Provisioned when your database is part of a business-critical production system that needs consistent throughput.
  • Use Serverless for non-production or spiky workloads where compute demand is irregular, and cost efficiency outweighs absolute performance consistency.
  • Both tiers are available in General Purpose (balanced performance) and Hyperscale (for very large databases with auto-scale capability).

Tip: Think of these decisions as layers. Choose the deployment method (Single, Elastic Pool, Managed Instance), the service tier (General Purpose, Business Critical, Hyperscale), and then the compute tier (Provisioned or Serverless). Together, these define your performance profile and cost structure.


Pricing Models in Depth

Azure SQL has two distinct billing approaches:

DTU-Based Model (Simplified)

  • Bundles compute + storage + I/O into Database Transaction Units.
  • Pros: Easy to size small workloads, predictable flat pricing.
  • Cons: Limited transparency and flexibility; not ideal for scaling or enterprise workloads.

vCore-Based Model (Flexibility and Transparency)

  • Separates compute, memory, and storage, priced per virtual core.
  • Licensing Advantage: Supports Azure Hybrid Benefit, allowing reuse of existing SQL Server licenses with Software Assurance.
  • Performance Tiers:
    • General Purpose – Balanced, built on remote storage.
    • Business Critical – Premium SSD, low latency, built-in HA.
    • Hyperscale – Cloud-native architecture with distributed log and page servers; supports up to 100 TB.

Another key difference: in the vCore model, costs are itemized. You pay separately for compute, storage, I/O, and backup retention. This provides transparency and control but also requires more planning.

One of the most valuable features for enterprises is the ability to apply the Azure Hybrid Benefit. If you already own SQL Server licenses with Software Assurance, you can bring them to Azure and save 25–30% on vCore costs. It’s a cloud equivalent of “bring your own license,” similar to how you can bring Windows Server licenses to Azure VMs.
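
As a hedged sketch, here's how Azure Hybrid Benefit can be applied to an existing vCore database with Set-AzSqlDatabase; 'BasePrice' tells Azure you're bringing your own license, while 'LicenseIncluded' reverts to pay-as-you-go licensing. The names below are placeholders:

# Requires Az.Sql; resource names are hypothetical
Set-AzSqlDatabase -ResourceGroupName 'rg-data' -ServerName 'sql-srv-01' `
    -DatabaseName 'db-prod' -LicenseType 'BasePrice'   # BYOL via Azure Hybrid Benefit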

In short: vCores deliver granularity and cost optimization opportunities, making them the preferred model for complex or enterprise workloads that demand control and transparency.

It’s important to note that vCores are not a direct replacement for DTUs. Instead, the vCore model represents the next level of Azure SQL Database, designed for more complex scenarios where organizations require greater control over the allocation of CPU, memory, and storage resources.

Tip: For enterprise-grade deployments, vCore is the default choice. It aligns with on-prem licensing models, offers reserved capacity discounts, and provides transparency into resource allocation.


Azure SQL Deployment Options: Architecture & Pricing Comparison

| Service | Service Type | Best-Fit Workloads | Pricing Model(s) | Performance Tiers | Key Strengths | Key Considerations |
|---|---|---|---|---|---|---|
| Azure SQL Database (Single DB / Elastic Pool) | PaaS | Cloud-native apps, SaaS multi-tenant DBs, microservices | DTU (legacy) or vCore | GP, BC, Hyperscale, Serverless | Fully managed; auto-scale; elastic pools for cost efficiency | Limited SQL Server feature set (no SQL Agent, limited cross-DB features) |
| Azure SQL Managed Instance (MI) | PaaS (near-full SQL Server compatibility) | Lift-and-shift enterprise apps, monolithic DBs needing SQL Agent/cross-DB joins | vCore only | GP, BC, Hyperscale (preview in some regions) | High compatibility; automatic patching/HA; Hybrid Benefit support | Higher base cost vs SQL DB; some network isolation/latency considerations |
| SQL Server on Azure Virtual Machines (IaaS) | IaaS | Legacy workloads with OS-level dependencies, apps requiring custom agents or specific SQL configs | Pay-as-you-go or Reserved VM pricing + SQL License (License Included or Hybrid Benefit) | Depends on VM size and storage | Full control (OS, registry, SQL versioning); easy migration path | Full responsibility for patching, backups, HA/DR; more ops overhead |
| Azure SQL Hyperscale (subset of SQL DB/MI) | PaaS (distributed architecture) | Large OLTP or analytical workloads >1TB scaling up to 100TB | vCore | Hyperscale only | Cloud-native architecture; instant scale-out read replicas; rapid auto-grow | Higher cost if workload doesn’t need hyperscale; feature differences vs BC |

Cost Components and Architectural Impact

When planning an Azure SQL deployment, architects must account for multiple pricing levers:

  1. Compute – Scales by vCores; reserved capacity can cut cost by up to 33%.
  2. Storage – Performance tier matters (standard vs premium SSDs, hyperscale distributed).
  3. Backup Storage – 7–35 days included; Long Term Retention adds Blob storage charges.
  4. Licensing – Hybrid Benefit reduces compute costs by up to 55%.
  5. HA/DR – Geo-replication and zone redundancy add cost but may be non-negotiable for SLAs.

Service Selection Framework for Architects

When designing, use these guiding questions:

  • Is workload cloud-native and isolated per app? → Azure SQL Database (single/elastic pool).
  • Is workload monolithic or requires server-level features? → Managed Instance.
  • Is workload highly legacy-dependent with OS-level needs? → SQL on VM.
  • Does workload need to scale toward 100 TB, beyond standard tier storage limits? → Hyperscale.
  • Do SLAs demand low latency and multiple replicas? → Business Critical tier.
  • Is budget licensing-sensitive? → Leverage Hybrid Benefit and reserved capacity.

Cost Optimization Strategies for Architects

Designing for performance is only half the story; true architectural success in Azure SQL also requires cost optimization. Here are the most impactful strategies cloud architects should apply:

  • Leverage Azure Hybrid Benefit for Licensing Savings
    If you own on-premises SQL Server licenses with Software Assurance, apply Azure Hybrid Benefit to reduce vCore costs. This is particularly valuable for production workloads running in provisioned compute tiers, where dedicated resources are always billed.
  • Right-Size Workloads with Telemetry
    Continuously monitor performance using Azure Monitor, Query Performance Insight, and Advisor recommendations. These tools help you identify underutilized resources, high-cost queries, or inefficient scaling patterns so you can adjust compute, storage, or tier accordingly.
  • Choose the Correct Service Tier for the Workload
    Don’t overprovision. Use General Purpose/Standard for most workloads, and only choose Business Critical/Premium or Hyperscale when latency, throughput, or database size requirements justify the added cost.
  • Use Serverless Compute for Variable or Intermittent Demand
    In General Purpose and Hyperscale tiers, serverless compute automatically scales based on demand and can auto-pause when idle. This ensures you only pay for compute when it’s used, making it ideal for dev/test, SaaS with unpredictable workloads, or seasonal applications.
  • Consolidate Databases with Elastic Pools or Multi-Tenant Models
    Running many small, underutilized databases separately increases both compute and administrative overhead. Elastic pools allow databases with varying usage patterns to share a common pool of resources, improving efficiency and lowering costs (see the sketch after this list).
  • Commit to Reserved Capacity for Predictable Workloads
    If you know your workload will run continuously, commit to 1- or 3-year reserved capacity. This can reduce compute costs by up to 30–33% compared to pay-as-you-go.
  • Optimize Backup and Retention Policies
    Automated backups are included, but long-term retention adds extra cost. Define retention policies that match business and compliance needs, and periodically prune old backups to avoid unnecessary storage charges.
  • Plan Geo-DR Strategically
    High availability and geo-replication provide resilience but come at a cost since each replica incurs additional compute. Evaluate whether geo-replicas are required for all workloads, or if backup-based disaster recovery is sufficient for less critical databases.
  • Leverage Dev/Test Pricing for Non-Production
    For development and testing environments, use discounted Dev/Test subscriptions to save significantly on licensing and compute.
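
As promised in the elastic pools item above, here's a minimal Az.Sql sketch. The server, pool, and database names are placeholders and the pool sizing is illustrative:

# Create a vCore-based pool, then move an existing database into it (names are hypothetical)
New-AzSqlElasticPool -ResourceGroupName 'rg-data' -ServerName 'sql-srv-01' `
    -ElasticPoolName 'pool-smallapps' -Edition 'GeneralPurpose' `
    -ComputeGeneration 'Gen5' -VCore 4
Set-AzSqlDatabase -ResourceGroupName 'rg-data' -ServerName 'sql-srv-01' `
    -DatabaseName 'db-app1' -ElasticPoolName 'pool-smallapps'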

Closing Thoughts

For cloud architects, Azure SQL pricing is an architectural decision as much as it is a financial one. Choosing between Database, Managed Instance, or VM-based SQL requires evaluating workload architecture, licensing posture, SLAs, and growth trajectory.

The good news: Microsoft has matured Azure SQL into a flexible platform. With the right design approach, PaaS-first mindset, workload alignment, and proactive cost governance, architects can deliver resilient, performant, and cost-effective SQL services that scale with the business.

Key takeaway: Don’t default to “lift and shift”. Evaluate features, tiers, and pricing levers deliberately. Cost optimization starts at design.

Thank you for stopping by. ✌️

Understanding Azure’s Special IP Address — 168.63.129.16

In the depths of Azure’s networking fabric exists an unchanging, quietly critical IP address: 168.63.129.16. Though rarely seen in dashboards or diagrams, this address is fundamental to nearly every virtual machine and service deployed in Microsoft Azure. It’s the hidden backbone that connects your workloads to essential Azure platform services, from DNS and DHCP to load balancer health probes and guest agent communication.

This post demystifies 168.63.129.16, explaining its purpose, behavior, use cases, and best practices for architects and developers designing around it.

What Is 168.63.129.16?

168.63.129.16 is a special, reserved virtual public IP address owned by Microsoft and used across all Azure regions and national clouds. It acts as a gateway between Azure virtual machines and the Azure platform, enabling key infrastructure services that keep virtual networks and workloads functioning properly.

Because every customer defines their own private address space in Azure, this IP provides a consistent, platform-controlled endpoint for all system-level communication. It’s not tied to any specific tenant, subscription, or region; it’s a global constant within Azure’s control plane.


Core Functions of 168.63.129.16

The 168.63.129.16 address serves multiple roles simultaneously within Azure’s virtual network infrastructure. Let’s break down its major functions.


1. Azure DHCP and DNS Services

This IP facilitates:

  • DHCP leases – It allows VMs to obtain dynamic IP addresses within Azure virtual networks.
  • DNS resolution – It acts as a DNS virtual server, offering name resolution for VMs that do not use a custom DNS server. Azure filters DNS results so that only internal hostnames within your resources are resolved, maintaining network isolation.

💡 Tip: If you prefer to use a custom DNS solution, you can block outbound traffic to this IP on ports 53 (UDP/TCP) or create an outbound NSG rule using the AzurePlatformDNS service tag.
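
For example, here's a hedged Az.Network sketch of that outbound rule (NSG name, resource group, and priority are placeholders):

# Deny outbound Azure-provided DNS so workloads must use a custom resolver
$nsg = Get-AzNetworkSecurityGroup -ResourceGroupName 'rg-net' -Name 'nsg-app'   # placeholder names
$nsg | Add-AzNetworkSecurityRuleConfig -Name 'Deny-AzurePlatformDNS' `
    -Description 'Force workloads to use custom DNS' -Access Deny -Protocol '*' `
    -Direction Outbound -Priority 4000 -SourceAddressPrefix '*' -SourcePortRange '*' `
    -DestinationAddressPrefix 'AzurePlatformDNS' -DestinationPortRange 53 |
    Set-AzNetworkSecurityGroup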


2. Azure Load Balancer Health Probes

The Azure Load Balancer uses 168.63.129.16 to perform health probe checks on backend pool members. These probes determine whether an instance is healthy and eligible to receive traffic.

By default, NSGs include a rule that allows this communication via the AzureLoadBalancer service tag. Blocking this traffic can cause probe failures and remove VMs from load balancer rotation.


3. Azure VM Agent and WireServer Communication

168.63.129.16 also represents the WireServer, a core Azure component that handles communication between the VM Agent and the Azure platform.

Through this channel, the VM Agent:

  • Signals the VM’s “Ready” state to the Azure Fabric Controller.
  • Manages VM extensions (installation, updates, and removal).
  • Sends health and telemetry data.
  • Exchanges guest heartbeat messages for PaaS roles.

Required Ports

  • 80/TCP – Primary communication for WireServer and metadata queries.
  • 32526/TCP – Used for VM Agent heartbeat and internal signaling.

This communication is not subject to NSGs or user-defined routes, ensuring that Azure management traffic always succeeds. However, these ports must be open locally on the VM for proper function.


4. Instance Metadata Service (IMDS)

Azure VMs also rely on the Azure Instance Metadata Service (IMDS) – an API endpoint that provides runtime details about the VM such as SKU, region, networking configuration, and tags. Unlike the services above, IMDS is served from its own well-known non-routable address, 169.254.169.254, rather than 168.63.129.16.

Example – PowerShell:

Invoke-RestMethod -Headers @{"Metadata"="true"} -Method GET `
-Uri "http://169.254.169.254/metadata/instance?api-version=2021-02-01" | ConvertTo-Json -Depth 64

Example – Linux:

curl -s -H "Metadata:true" --noproxy "*" \
"http://169.254.169.254/metadata/instance?api-version=2021-02-01" | jq

IMDS is used by automation scripts, configuration management tools, and extensions to dynamically adjust VM configurations or retrieve identity tokens.
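
One common use, sketched below, is requesting a managed identity token from inside the VM; the resource URI shown (ARM) is just an example:

# Requires a managed identity assigned to the VM; runs only from inside the VM
$token = Invoke-RestMethod -Headers @{ Metadata = 'true' } -Method GET -Uri `
    'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/'
$token.access_token   # bearer token for ARM calls, no stored secret required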


Security Model and Accessibility

Despite being a public IP, 168.63.129.16 is not routable from the internet.
Only Azure’s internal platform infrastructure can originate traffic from this address.

It behaves as a virtual IP of the host node, meaning it:

  • Cannot be directly accessed externally.
  • Bypasses user-defined routes.
  • Is safe to allow within local firewalls.

Blocking this IP on your VM firewall can cause:

  • Load balancer health probe failures.
  • DNS or DHCP issues.
  • Broken VM Agent or extension installations.
  • Incorrect guest health reporting.

Best Practice: Always allow outbound communication from VMs to 168.63.129.16 on ports 80/TCP, 32526/TCP, and optionally 53/TCP/UDP if using Azure-provided DNS.
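
On a Windows guest, a hedged sketch of an explicit allow rule; traffic to this address is permitted by default, so this mainly guards against overly broad deny rules added later:

# Explicitly allow outbound platform traffic to the WireServer address
New-NetFirewallRule -DisplayName 'Allow Azure WireServer' -Direction Outbound `
    -RemoteAddress 168.63.129.16 -Protocol TCP -RemotePort 80, 32526 -Action Allow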


Common Misconceptions

| Misconception | Reality |
|---|---|
| “It’s a public IP, so it’s unsafe.” | It’s public in scope, but only Azure can source traffic from it, not the internet. |
| “I can change or override it.” | The IP is fixed and managed by Azure, not customizable or replaceable. |
| “NSGs or UDRs can block it.” | Not true. Communication to this IP bypasses NSGs and UDRs at the platform level. |
| “It’s region-specific.” | It’s identical in all Azure regions, providing a universal endpoint for platform communication. |

Design & Troubleshooting Tips for Architects

  1. Always Whitelist 168.63.129.16 in VM Firewalls
    Outbound traffic must be allowed so that platform services such as DNS, the VM Agent, and Load Balancer health probes function correctly.
  2. Avoid Using It in Custom Network Routing
    Because it’s a host virtual IP, it doesn’t adhere to user-defined routes. Don’t attempt to NAT, proxy, or filter this address in network appliances.
  3. Validate Connectivity for Diagnostics
    Use Test-NetConnection to confirm the VM can reach the platform endpoints:
    Test-NetConnection -ComputerName 168.63.129.16 -Port 80
    Test-NetConnection -ComputerName 168.63.129.16 -Port 32526
    Successful responses confirm VM Agent connectivity.
  4. Be Aware of DNS Dependencies
    If using a custom DNS solution, ensure Azure-provided DNS via 168.63.129.16 is properly disabled or redirected.
  5. Monitor Extension Failures
    Many extension or guest agent issues trace back to blocked communication with this IP. Always verify connectivity before deeper troubleshooting.

Why This IP Is Special

The beauty of 168.63.129.16 lies in its consistency, isolation, and necessity.
It’s not just an address; it’s the nerve link between every Azure VM and the underlying platform services that keep Azure stable, secure, and self-healing.

From DHCP leases and DNS queries to health checks and metadata services, this single IP silently powers the operations that ensure Azure workloads function predictably and securely.


Next Steps Checklist

  • Allow outbound access to 168.63.129.16:80, 32526, and 53 if using Azure DNS.
  • Verify VM Agent health via Test-NetConnection or traceroute.
  • Don’t modify or route traffic through this IP – treat it as a platform constant.
  • Use AzurePlatformDNS or AzureLoadBalancer service tags when refining NSG rules.
  • Educate your security and network teams – blocking this IP breaks essential Azure operations.

Final Thoughts

The 168.63.129.16 address is one of those invisible Azure building blocks that most engineers never think about until something breaks.

Understanding how this IP works helps you architect resilient, compliant, and secure solutions in Azure. Think of it as Azure’s heartbeat: quiet, constant, and indispensable.

Thank you for stopping by. ✌️

Azure Cost Optimization Deep Dive: Real-World Techniques for Smarter Cloud Spend

If you’ve ever opened your Azure bill and felt a mild heart attack coming on, you’re not alone. Cloud costs can balloon faster than a DevOps sprint backlog. The good news? Azure gives you powerful tools and strategies to rein things in, if you know where to look.

Whether you’re an Azure Administrator, DevOps engineer, or cloud architect, these cost optimization techniques will help you stop wasting money, tighten governance, and keep finance from showing up at your desk with “just a few questions.”


Why Azure Cost Optimization Matters

Let’s start with the “why.” Azure’s pay-as-you-go model is both a blessing and a curse. It offers flexibility and also a thousand ways to quietly overspend.

  • Financial Efficiency: Every unused VM, oversized disk, or forgotten test environment translates directly into wasted budget.
  • Resource Utilization: Azure has “infinite” resources, but your credit card doesn’t. Tracking usage ensures you’re not paying for compute that’s just sitting idle.
  • Governance and Compliance: Cost policies keep your organization’s cloud usage within approved standards and help you survive the next audit with dignity intact.

1. Pick the Right Pricing Model — Because Pay-As-You-Go Isn’t Always a Bargain

Pay-As-You-Go is great for experimentation but expensive for production workloads. Instead, explore these options:

  • Azure Reservations: Commit for one or three years and save up to 72%. Perfect for steady-state workloads like SQL databases or domain controllers that never sleep.
  • Spot VMs: Grab unused capacity at up to 90% off, but beware: Azure can pull the plug at any time. Ideal for batch jobs, rendering, or other “it’s okay if it fails” tasks (see the sketch after this list).
  • Azure Hybrid Benefit: Bring your existing Windows Server and SQL Server licenses and save up to 55%. It’s like reusing your own coffee mug at Starbucks, you still get your caffeine fix, but for less money (and a little eco-friendly credit, too).
  • Savings Plans for Compute: Commit to a steady hourly rate for one or three years for discounts up to 65%, across multiple VM types and regions.
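
As referenced in the Spot VMs item above, here's a minimal Az.Compute sketch; the VM name and size are placeholders, and -MaxPrice -1 means "pay up to the pay-as-you-go rate and never evict on price":

# Spot VM configuration (placeholder names); evicted VMs are deallocated, not deleted
$vmConfig = New-AzVMConfig -VMName 'vm-batch-01' -VMSize 'Standard_D2s_v5' `
    -Priority 'Spot' -MaxPrice -1 -EvictionPolicy 'Deallocate'
# Pass $vmConfig through the usual Set-AzVMOperatingSystem / Set-AzVMSourceImage /
# New-AzVM pipeline to create the VM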

2. Right-Size Everything — Stop Paying for VMs That Could Bench-Press a Truck

Over-provisioning is one of the most common (and costly) cloud sins. Many admins choose high-spec VMs “just in case” and forget to dial them back.

  • Use Azure Advisor to identify underutilized resources.
  • Scale down or switch to lower-tier VM series based on actual metrics.
  • If performance metrics still look good after a week, congratulations: you just trimmed your Azure fat.

3. Shut Down or Delete Unused Resources — Not Everything Needs to Stay Running

Old dev environments, orphaned disks, unused storage accounts — they all add up.
Use Azure Cost Management + Billing or Azure Resource Graph to spot these zombie resources and reclaim your budget.

Pro tip: Implement auto-shutdown schedules for non-production VMs. Think of it as “turning off the lights when you leave the room,” but for your cloud.
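
Here's a hedged sketch of that idea: deallocate any running VM tagged as Dev, intended to run on a schedule (for example, from an Automation runbook). The Environment tag convention is an assumption:

# Stop (deallocate) running non-production VMs; tag name/value are hypothetical
Get-AzVM -Status |
    Where-Object { $_.Tags['Environment'] -eq 'Dev' -and $_.PowerState -eq 'VM running' } |
    ForEach-Object { Stop-AzVM -ResourceGroupName $_.ResourceGroupName -Name $_.Name -Force -NoWait }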


4. Automate VM Autoscaling — Match Power to Demand

Azure’s autoscaling feature dynamically adjusts your VM count based on load. It’s the cloud equivalent of cruise control — smooth, efficient, and fuel-saving.

  • Scale up during business hours or traffic spikes.
  • Scale down when things quiet down.
  • Combine with Azure Monitor metrics like CPU or memory utilization for precision scaling.

Result: consistent performance, lower bills, and fewer weekend alerts.


5. Tag and Organize Like a Pro — Because “ResourceGroup1” Isn’t Helpful

Tags are your best friend for financial visibility. Use key-value pairs like:

  • Environment: Production
  • Department: Finance
  • Project: ERPModernization

This allows you to allocate costs accurately, automate governance policies, and keep finance from sending cryptic “Who owns this resource?” emails.

Pair this with Azure Policy to enforce tagging standards and prevent untracked sprawl.
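
A quick sketch with Update-AzTag (the resource ID is a placeholder); -Operation Merge preserves any tags already on the resource:

$resourceId = '/subscriptions/<sub-id>/resourceGroups/rg-erp/providers/Microsoft.Web/sites/erp-web'   # placeholder
Update-AzTag -ResourceId $resourceId -Operation Merge -Tag @{
    Environment = 'Production'
    Department  = 'Finance'
    Project     = 'ERPModernization'
}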


6. Use Storage Tiers Wisely — Don’t Store Cold Data in a Hot Tier

Azure Storage offers three main tiers:

  • Hot: For data you need constantly.
  • Cool: For infrequently accessed data.
  • Archive: For long-term retention, super cheap but takes time to retrieve.

Use Azure Blob lifecycle management to automatically move data to cheaper tiers as it ages. Because paying hot-tier prices for archived logs is like buying concert tickets for a show that ended last year.
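
A hedged Az.Storage sketch of such a policy, with placeholder account and prefix names: cool data after 30 days, archive it after 180.

# Build a lifecycle rule, then apply it to the storage account (names are hypothetical)
$action = Add-AzStorageAccountManagementPolicyAction -BaseBlobAction TierToCool -DaysAfterModificationGreaterThan 30
$action = Add-AzStorageAccountManagementPolicyAction -InputObject $action -BaseBlobAction TierToArchive -DaysAfterModificationGreaterThan 180
$filter = New-AzStorageAccountManagementPolicyFilter -PrefixMatch 'logs/'
$rule   = New-AzStorageAccountManagementPolicyRule -Name 'age-out-logs' -Action $action -Filter $filter
Set-AzStorageAccountManagementPolicy -ResourceGroupName 'rg-data' -StorageAccountName 'stlogs01' -Rule $rule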


7. Leverage Azure Dev/Test Pricing — Save Up to 65% on Non-Production Environments

If you’re a Visual Studio subscriber, you get discounted rates for dev/test workloads.
Apply this to VMs, App Service, SQL Database, and even AKS. It’s one of the easiest wins in Azure cost optimization and yet many teams forget to enable it.


8. Monitor Continuously — Don’t Wait for the Bill Shock

Set up Azure Budgets and Alerts to keep tabs on spending. You can:

  • Define thresholds by subscription or resource group.
  • Get notified when spending approaches budget limits.
  • Automate remediation (like shutting down specific VMs) using Azure Logic Apps.

Continuous monitoring is the difference between proactive management and reactive damage control, and your CFO will thank you for it.
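
For instance, a hedged sketch of a monthly budget with an 80% alert using New-AzConsumptionBudget; the name, amount, scope, and email are placeholders:

# Monthly budget scoped to a resource group, alerting at 80% of $500
New-AzConsumptionBudget -Name 'budget-rg-app' -ResourceGroupName 'rg-app' `
    -Amount 500 -Category Cost -TimeGrain Monthly -StartDate (Get-Date -Day 1).Date `
    -ContactEmail 'finops@domain.com' -NotificationKey 'warn80' `
    -NotificationThreshold 80 -NotificationEnabled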


9. Combine and Conquer — Hybrid Benefit + Reservations = Massive Savings

By stacking Azure Hybrid Benefit with Reservations, you can achieve up to 80% cost reduction compared to pay-as-you-go.
This combo works wonders for long-running workloads like databases or virtual desktops. Just make sure your licensing and terms align before flipping the switch.


10. Regional Pricing Differences — The “Hidden” Discount

Azure pricing isn’t uniform across regions. Running a workload in East US 2 might be 20–25% cheaper than East US.
If compliance allows, choose the cheaper region but always check latency and data residency requirements first.
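
You can even script the comparison against the public, unauthenticated Azure Retail Prices API; a hedged sketch, with an illustrative SKU filter:

# Compare a SKU's pay-as-you-go rate across two regions
$filter = "serviceName eq 'Virtual Machines' and skuName eq 'D4s v5' and priceType eq 'Consumption'"
$prices = Invoke-RestMethod -Uri "https://prices.azure.com/api/retail/prices?`$filter=$filter"
$prices.Items | Where-Object { $_.armRegionName -in @('eastus', 'eastus2') } |
    Select-Object armRegionName, meterName, retailPrice | Sort-Object retailPrice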


11. Automate Cost Governance — Let Azure Policy Be Your Budget Bouncer

Use Azure Policy to enforce rules such as:

  • Restricting VM sizes to approved lists.
  • Enforcing tagging on new resources.
  • Blocking expensive SKUs in dev environments.

It’s like having a strict bouncer who refuses entry to anything not on the approved list, except this time your cloud bill thanks you instead of your Friday night plans.
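
A hedged sketch of the first rule, assigning the built-in "Allowed virtual machine size SKUs" policy; the scope and SKU list are placeholders, and the GUID is that policy's documented built-in definition ID (verify in your tenant):

# Look up the built-in definition by its documented ID, then assign it to a dev subscription
$def = Get-AzPolicyDefinition -Id '/providers/Microsoft.Authorization/policyDefinitions/cccc23c7-8427-4f53-ad12-b6a63eb452b3'
New-AzPolicyAssignment -Name 'dev-vm-sizes' -Scope '/subscriptions/<dev-sub-id>' `
    -PolicyDefinition $def `
    -PolicyParameterObject @{ listOfAllowedSKUs = @('Standard_B2s', 'Standard_D2s_v5') }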


12. Real-Time Cost Anomaly Detection — Catch Surprises Before They Bite

Cost anomaly detection in Microsoft Cost Management can flag unexpected spending spikes, such as a misconfigured script launching 200 VMs at 3 AM. (Don’t laugh, it’s happened. 😐)

Early alerts mean you can fix the problem before it drains your monthly budget faster than a test run gone rogue.


13. Understand Cost Per Unit — The Secret Weapon of FinOps Teams

Don’t just look at total costs; analyze cost per user, per transaction, or per environment.
This “unit economics” approach helps you identify profitable workloads, adjust pricing models, and eliminate waste.


14. The Big Picture: Automate, Audit, Repeat

Azure cost optimization isn’t a one-and-done activity. It’s a continuous cycle of monitoring, analyzing, and fine-tuning.

Use native tools:

  • Azure Cost Management + Billing
  • Azure Advisor
  • Azure Pricing Calculator
  • Azure Resource Graph
  • Azure Monitor Workbooks

Together, they give you the full visibility and control needed to run a lean, efficient, and predictable Azure environment.


Final Thoughts: Treat Azure Like a Utility Bill

Think of Azure like your home electricity. You wouldn’t leave the lights, oven, and hairdryer running 24/7, right? (At least, we hope not.)
Similarly, unused or oversized cloud resources quietly burn through budget until you notice, usually during end-of-month reconciliation.

So take charge:

  • Schedule regular cost audits.
  • Review Azure Advisor recommendations.
  • Automate shutdowns and budget alerts.
  • Educate your teams on the real cost of every deployed resource.

Thank you for stopping by. ✌️

Generate Multi-Subscription Azure Cost Reports Using REST API and PowerShell

Managing cloud costs is like trying to diet at a buffet. Tempting services everywhere, and one bad decision can blow your budget wide open. So when I was tasked with a breakdown of Azure usage across 50+ subscriptions for the month of June, I knew this wasn’t going to be a quick Azure Portal copy-paste job.

Instead, I rolled up my sleeves and built a PowerShell script that uses the Azure REST API to automatically:

  • Query all accessible subscriptions
  • Fetch usage-based cost data for a given time range
  • Export it into a clean Excel report

And I made it smart enough to handle throttling too. Here’s how it all came together.

Goals

  • Pull Azure cost data from multiple subscriptions
  • Offer flexible time range selection (this month, last month, custom, etc.)
  • Authenticate securely with Entra ID (Service Principal)
  • Export to Excel in a way leadership can digest (bonus points if it opens without errors)

Authentication with Entra ID

I created a Service Principal and assigned it the “Global Billing Reader” role at the billing account level. The script uses the client_credentials flow to authenticate and obtain an access token.

Yes, I temporarily stored the client secret in a plain text variable $clientSecretPlain = 'ENTER_SECRET' because I was still prototyping. Don’t judge me. But for production? Use Key Vault or a managed identity.

Handling Throttling (429 Errors)

Azure’s APIs like to throw shade when you hit them too hard. I added retry logic with exponential backoff and jitter.

PowerShell Script

# Author: Kumaran Alagesan

# Requires: ImportExcel module (Install-Module -Name ImportExcel)
# Authenticate using Entra Application (Service Principal)

$clientId = 'ENTER_APP_ID'
$tenantId = 'ENTER_Tenant_ID'
$clientSecretPlain = 'ENTER_SECRET'

# Get access token using Service Principal
$body = @{
    grant_type    = "client_credentials"
    client_id     = $clientId
    client_secret = $clientSecretPlain
    scope         = "https://management.azure.com/.default"
}
$tokenResponse = Invoke-RestMethod -Method Post -Uri "https://login.microsoftonline.com/$tenantId/oauth2/v2.0/token" -Body $body -ContentType "application/x-www-form-urlencoded"
if (-not $tokenResponse.access_token) {
    Write-Host "Failed to acquire token. Check credentials." -ForegroundColor Red
    exit 1
}
$token = @{ accessToken = $tokenResponse.access_token }


$selection = $null
while (-not $selection) {
    $selection = Read-Host "Select time range: `n1) This month`n2) Last month`n3) This quarter`n4) Last quarter`n5) This year`n6) Last 6 months`n7) Last 12 months`n8) Custom`nEnter number"
    if ($selection -notmatch '^[1-8]$') {
        Write-Host "Invalid selection. Please enter a number from the list (1-8)." -ForegroundColor Yellow
        $selection = $null
    }
}

$today = Get-Date
switch ($selection) {
    '1' { # This month
        $startDate = Get-Date -Year $today.Year -Month $today.Month -Day 1
        $endDate = $today
    }
    '2' { # Last month
        $lastMonth = $today.AddMonths(-1)
        $startDate = Get-Date -Year $lastMonth.Year -Month $lastMonth.Month -Day 1
        $endDate = (Get-Date -Year $lastMonth.Year -Month $lastMonth.Month -Day 1).AddMonths(1).AddDays(-1)
    }
    '3' { # This quarter
        $quarter = [math]::Ceiling($today.Month / 3)
        $startMonth = (($quarter - 1) * 3) + 1
        $startDate = Get-Date -Year $today.Year -Month $startMonth -Day 1
        $endDate = $today
    }
    '4' { # Last quarter
        $currentQuarter = [math]::Ceiling($today.Month / 3)
        if ($currentQuarter -eq 1) {
            $lastQuarterYear = $today.Year - 1
            $lastQuarter = 4
        } else {
            $lastQuarterYear = $today.Year
            $lastQuarter = $currentQuarter - 1
        }
        $startMonth = (($lastQuarter - 1) * 3) + 1
        $startDate = Get-Date -Year $lastQuarterYear -Month $startMonth -Day 1
        $endDate = (Get-Date -Year $lastQuarterYear -Month $startMonth -Day 1).AddMonths(3).AddDays(-1)
    }
    '5' { # This year
        $startDate = Get-Date -Year $today.Year -Month 1 -Day 1
        $endDate = $today
    }
    '6' { # Last 6 months
        $startDate = $today.AddMonths(-5)
        $startDate = Get-Date -Year $startDate.Year -Month $startDate.Month -Day 1
        $endDate = $today
    }
    '7' { # Last 12 months
        $startDate = $today.AddMonths(-11)
        $startDate = Get-Date -Year $startDate.Year -Month $startDate.Month -Day 1
        $endDate = $today
    }
    '8' { # Custom
        $startDate = Read-Host "Enter start date (yyyy-MM-dd)"
        $endDate = Read-Host "Enter end date (yyyy-MM-dd)"
        try {
            $startDate = [datetime]::ParseExact($startDate, 'yyyy-MM-dd', $null)
            $endDate = [datetime]::ParseExact($endDate, 'yyyy-MM-dd', $null)
        } catch {
            Write-Host "Invalid date format. Exiting." -ForegroundColor Red
            exit 1
        }
    }
}

$startDateStr = $startDate.ToString("yyyy-MM-dd")
$endDateStr = $endDate.ToString("yyyy-MM-dd")

# Set headers for REST calls using the service principal token
$headers = @{
    'Authorization' = "Bearer $($token.accessToken)"
    'Content-Type'  = 'application/json'
}

# Get all subscriptions
$subsUrl = "https://management.azure.com/subscriptions?api-version=2020-01-01"
$subscriptions = Invoke-RestMethod -Uri $subsUrl -Headers $headers -Method Get | Select-Object -ExpandProperty value

Write-Host "Fetching cost data for $($subscriptions.Count) subscriptions: " -NoNewline

$totalCost = 0
$results = @()

foreach ($sub in $subscriptions) {
    $costQueryBody = @{
        type       = "Usage"
        timeframe  = "Custom"
        timePeriod = @{
            from = $startDateStr
            to   = $endDateStr
        }
        dataSet    = @{
            granularity = "None"
            aggregation = @{
                totalCost = @{
                    name     = "Cost"
                    function = "Sum"
                }
            }
        }
    } | ConvertTo-Json -Depth 10

    $costUrl = "https://management.azure.com/subscriptions/$($sub.subscriptionId)/providers/Microsoft.CostManagement/query?api-version=2024-08-01"

    $maxRetries = 7
    $retryDelay = 5
    $attempt = 0
    $success = $false

    while (-not $success -and $attempt -lt $maxRetries) {
        try {
            $costData = Invoke-RestMethod -Uri $costUrl -Headers $headers -Method Post -Body $costQueryBody

            $subscriptionCost = 0
            if ($costData.properties.rows -and $costData.properties.rows.Count -gt 0) {
                $subscriptionCost = $costData.properties.rows[0][0]
            }

            $results += [PSCustomObject]@{
                'Subscription Name' = $sub.displayName
                'Total Cost'        = [math]::Round([double]$subscriptionCost, 2)
            }

            $totalCost += $subscriptionCost
            Write-Host "." -NoNewline
            $success = $true
        }
        catch {
            if ($_.Exception.Response.StatusCode.value__ -eq 429 -and $attempt -lt ($maxRetries - 1)) {
                # Add random jitter to delay
                $jitter = Get-Random -Minimum 1 -Maximum 5
                $sleepTime = $retryDelay + $jitter
                Write-Host "`n429 received, retrying in $sleepTime seconds..." -ForegroundColor Yellow
                Start-Sleep -Seconds $sleepTime
                $retryDelay *= 2
                $attempt++
            }
            else {
                Write-Host "x" -NoNewline
                Write-Host "`nError getting cost for subscription $($sub.displayName): $($_.Exception.Message)" -ForegroundColor Red
                $success = $true
            }
        }
    }
}

# Export results to Excel
$excelPath = Join-Path -Path $PSScriptRoot -ChildPath ("AzureCostReport_{0}_{1}.xlsx" -f $startDateStr, $endDateStr)
if ($results.Count -gt 0) {
    # Do not pre-format 'Total Cost' as string; keep as number for Excel formatting

    # Check if file is locked
    $fileLocked = $false
    if (Test-Path $excelPath) {
        try {
            $stream = [System.IO.File]::Open($excelPath, 'Open', 'ReadWrite', 'None')
            $stream.Close()
        } catch {
            $fileLocked = $true
        }
    }
    if ($fileLocked) {
        Write-Host "Excel file is open or locked: $excelPath. Please close it and run the script again." -ForegroundColor Red
    } else {
        $results | Export-Excel -Path $excelPath -WorksheetName 'CostReport' -AutoSize -TableName 'CostSummary' -Title "Azure Cost Report ($startDateStr to $endDateStr)" -TitleBold -ClearSheet
        Write-Host "Excel report saved to: $excelPath"
        # Optionally open the file
        if ($IsWindows) {
            Start-Sleep -Seconds 2
            Invoke-Item $excelPath
        }
    }
}

If you want to email the output as a table in the body to a mailbox, you can replace the ‘Export results to Excel’ section with the code below. Yup! I know Send-MailMessage is obsolete; ideally I’d run this script in an Azure Automation account and grant the identity app permissions to send email. I’ll cover that in a later post.

# Prepare HTML table for email
if ($results.Count -gt 0) {
    # Add $ symbol to each Total Cost value
    $resultsWithDollar = $results | ForEach-Object {
        $_ | Add-Member -NotePropertyName 'Total Cost ($)' -NotePropertyValue ('$' + [math]::Round([double]$_.('Total Cost'), 2)) -Force
        $_
    }

    $htmlTable = $resultsWithDollar | Select-Object 'Subscription Name', 'Total Cost ($)' | ConvertTo-Html -Property 'Subscription Name', 'Total Cost ($)' -Head "<style>table{border-collapse:collapse;}th,td{border:1px solid #ccc;padding:5px;}</style>" -Title "Azure Cost Report"
    $htmlBody = @"
<h2>Azure Cost Report ($startDateStr to $endDateStr)</h2>
$htmlTable
<p><b>Total Cost (all subscriptions):</b> $([string]::Format('${0:N2}', [math]::Round([double]$totalCost,2)))</p>
<p style='color:gray;font-size:small;'>This is an automatically generated email - Please do not reply.</p>
"@

    # Email parameters (update these as needed)
    $smtpServer = "smtp.domain.com"
    $smtpPort = 587
    $from = "alerts@domain.com"
    $to = "emailaddress@domain.com"
    $subject = "Azure Cost Report ($startDateStr to $endDateStr)"

    Send-MailMessage -From $from -To $to -Subject $subject -Body $htmlBody -BodyAsHtml -SmtpServer $smtpServer -Port $smtpPort
    Write-Host "Cost report sent via email to $to"
} else {
    Write-Host "No results to send."
}

What You’ll Get

The final Excel report displays each subscription’s name alongside its total cost for your chosen time period. Whether you’re reviewing it manually or feeding it into FinOps tools, the format is designed for quick analysis and clean presentation.

Practical Applications

| Scenario | How It Helps |
|---|---|
| Automation and scheduling | Supports routine reporting via scheduled tasks or DevOps flows |
| Multi-subscription environments | Consolidates cost data across departments or teams |
| Governance and FinOps | Enables proactive budget tracking and reporting |

With just a PowerShell script and the Azure Cost Management API, you can unlock instant insights into your cloud spend across all Azure subscriptions. Whether you’re part of a DevOps team, driving FinOps initiatives, or simply managing cloud budgets, this automation makes cost visibility one less thing to worry about.

Lessons Learned

  • Azure Cost Management API is powerful, but throttling is real.
  • Microsoft will be retiring the Consumption Usage Details API at some point in the future and does not recommend that you take a new dependency on this API.
  • Export-Excel is a lifesaver, especially when you want your report to actually be readable.

Room for Improvement

  • Add Azure MeterCategory per subscription in the email report to give a better idea of where the cost usage is
  • Move secrets to Azure Key Vault or use Managed Identity
  • Add monthly trend analysis and forecasting
  • Push the data to Power BI for richer dashboards

Final Thoughts

This script is now my go-to tool for quickly generating Azure cost reports across environments. It’s flexible, reliable, and gives my leadership team the visibility they need to make informed decisions, without logging into the portal.

Because let’s face it: if you’re managing Azure at scale, you shouldn’t be clicking through billing blades. You should be scripting your way to clarity.

Keep those costs in check, one API call at a time.

Thanks for stopping by. ✌

RBAC vs. ABAC in Azure: Why You Need Both for Cloud Access Control That Actually Works

Let’s cut to the chase: cloud access control isn’t just a checkmark on your compliance list anymore. It’s a daily battlefield. With global teams, hybrid workloads, and rising security risks, who can do what and under what conditions is now a core pillar of IT strategy.

If you’re working in Azure, you’ve likely heard of RBAC (Role-Based Access Control) and ABAC (Attribute-Based Access Control). But what you may not know is that these aren’t mutually exclusive; they’re better together.

Let’s unpack what each model does, where they shine (and struggle), and how to combine them for airtight, scalable access governance in Azure.

What is Azure Role-Based Access Control (RBAC)?

Azure RBAC helps you control access by assigning roles to security principals (users, groups, service principals, or managed identities) at a specific scope (subscription, resource group, or resource).

Each role is a bundle of permissions, think of them as job descriptions for Azure resources.

Example RBAC Use Cases

  • A user who can manage only virtual machines in the Dev subscription.
  • A group assigned the Reader role at the resource group level.
  • An app given Contributor access to only one storage account.

RBAC works well when your access needs are role-based and relatively straightforward. But as organizations scale and become more dynamic, things can get messy fast.
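
The second use case above, as a hedged one-liner with Az.Resources (the group object ID is a placeholder):

# Reader role for a group, scoped to a single resource group
New-AzRoleAssignment -ObjectId '<group-object-id>' -RoleDefinitionName 'Reader' `
    -ResourceGroupName 'rg-dev'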

Where RBAC Falls Short

RBAC starts to creak when:

  • You need to create roles for every unique mix of region, team, and resource.
  • You end up with a Frankenstein monster of roles like:
    • VP - Europe
    • Manager - Asia
    • SalesRep - NorthAmerica - Junior
  • You have hierarchical or multi-tenant data structures that don’t fit RBAC’s flat model.

The result? Role sprawl, administrative pain, and security gaps.

What is Azure Attribute-Based Access Control (ABAC)?

ABAC adds contextual smarts to access control. Instead of relying solely on roles, it factors in attributes of:

  • The user (e.g., department = HR)
  • The resource (e.g., tag = Project:Alpine)
  • The environment (e.g., access during business hours only)

In Azure, ABAC is implemented through role assignment conditions that filter RBAC permissions.

ABAC in Action

  • “Chandra can read blobs only if they’re tagged with Project=Cascade.”
  • “Support engineers can impersonate users only during a help session.”
  • “Users can access data only in their assigned region or cost center.”

This kind of fine-grained access is powerful, flexible, and crucial in multi-tenant, regulated, or fast-moving environments.
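
Here's a hedged sketch of the first example as an Azure role assignment condition (requires Az.Resources). The condition string follows the documented ABAC format for blob index tags, but treat the exact expression, IDs, and scope as illustrative:

# Reader on blobs only when the blob carries tag Project=Cascade
$condition = @'
(
 (!(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'}))
 OR
 (@Resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags:Project<$key_case_sensitive$>] StringEquals 'Cascade')
)
'@
New-AzRoleAssignment -ObjectId '<user-object-id>' -RoleDefinitionName 'Storage Blob Data Reader' `
    -Scope '/subscriptions/<sub-id>/resourceGroups/rg-data' `
    -Condition $condition -ConditionVersion '2.0'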

RBAC + ABAC: Not a Choice – A Collaboration

Here’s the mindset shift: RBAC and ABAC are not competing models. They’re complementary.

RBAC defines what actions are allowed.
ABAC defines under what conditions those actions are allowed.

By combining the two, you can:

  • Keep your role structure simple and understandable.
  • Layer on access conditions that reflect real-world business rules.

Common Hybrid Patterns

| Scenario | RBAC Role | ABAC Condition |
|---|---|---|
| Multi-tenant app | Tenant Admin | Only for tenant_id=X |
| Regional access | Sales Manager | Region = “North America” |
| Subscription tiers | Premium User | Access feature only if plan=premium |
| File access | Editor | Only owner=user_id or shared_with=user_id |
| Support scenarios | Support Agent | Impersonation allowed if user_in_session=true |

Best Practices for RBAC and ABAC in Azure

Let’s bring it home with the golden rules:

RBAC Best Practices

  • Least Privilege Always: Grant only the permissions needed—nothing more.
  • Limit Subscription Owners: Three max. The fewer, the safer.
  • Use PIM for Just-in-Time Access: With Microsoft Entra PIM, elevate access temporarily.
  • Assign Roles to Groups: Not individuals. Makes scaling and auditing easier.
  • Avoid Wildcards in Custom Roles: Be explicit with Actions and DataActions.
  • Script with Role IDs, Not Names: Avoid breakage from renamed roles.
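
For the last point, a hedged sketch: pin the assignment to the role definition GUID instead of the display name. The GUID shown is Reader's documented built-in ID; confirm in your tenant with Get-AzRoleDefinition -Name 'Reader':

# Role IDs survive display-name changes; the scope below is a placeholder
New-AzRoleAssignment -ObjectId '<group-object-id>' `
    -RoleDefinitionId 'acdd72a7-3385-48ef-bd42-f606fba81ae7' `
    -Scope '/subscriptions/<sub-id>'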

ABAC Best Practices

  • Tag Strategically: Use meaningful tags like Project, Environment, or Classification to enable ABAC.
  • Use Conditions to Reduce Role Sprawl: Filter access with precision.
  • Start Small: Pilot with blob storage conditions before scaling ABAC elsewhere.
  • Don’t Replace RBAC: Use ABAC as a filter, not a replacement.

Recap: When to Use What

| Feature | RBAC | ABAC | RBAC + ABAC |
|---|---|---|---|
| Simplicity | ✅ | ⚠️ | ✅ |
| Contextual Flexibility | ❌ | ✅ | ✅ |
| Scalability | ⚠️ (sprawl risk) | ✅ | ✅ |
| Multi-Tenant Scenarios | ⚠️ | ✅ | ✅ |
| Least Privilege Enforcement | ✅ | ✅ | ✅✅ |

Final Thoughts

RBAC gives you structure. ABAC gives you nuance. In Azure, using both gives you power and precision.

Don’t fall into the “either/or” trap. The real magic happens when you combine the predictability of RBAC with the intelligence of ABAC to build access models that scale with your business.

Thanks for stopping by. ✌