Azure Data Factory Monitoring

Azure Data Factory Monitoring Best Practices

• As a best practice for Azure Data Factory monitoring, logs should be captured systematically.
• By default, Azure retains pipeline run logs for a maximum of 45 days; after 45 days the ADF run history is no longer accessible.
• Route your diagnostic logs to a storage account for auditing or manual inspection. You can use the diagnostic settings to specify the retention time in days (see the sketch after this list).
• Configure a Log Analytics workspace to analyze the logs using queries.
• Add the Azure Data Factory Analytics service pack from the Azure Marketplace (https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft.azuredatafactoryanalytics?tab=overview).
• It provides a one-click monitoring solution across data factories, with a built-in dashboard for quick access to ADF log metrics.
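As a concrete sketch of the diagnostic-settings point above, the snippet below routes pipeline and trigger run logs to a Log Analytics workspace and a storage account with the azure-mgmt-monitor Python SDK. All resource IDs and the 90-day retention are illustrative placeholders, not a definitive configuration.

# Minimal sketch: route ADF run logs to Log Analytics and storage.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

SUBSCRIPTION_ID = "<subscription-id>"                      # placeholder
ADF_ID = ("/subscriptions/<subscription-id>/resourceGroups/<rg>"
          "/providers/Microsoft.DataFactory/factories/<factory-name>")

client = MonitorManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

client.diagnostic_settings.create_or_update(
    resource_uri=ADF_ID,
    name="adf-diagnostics",
    parameters={
        "workspace_id": "<log-analytics-workspace-resource-id>",  # placeholder
        "storage_account_id": "<storage-account-resource-id>",    # placeholder
        "logs": [
            {"category": "PipelineRuns", "enabled": True,
             "retention_policy": {"enabled": True, "days": 90}},
            {"category": "TriggerRuns", "enabled": True,
             "retention_policy": {"enabled": True, "days": 90}},
        ],
        "metrics": [
            {"category": "AllMetrics", "enabled": True,
             "retention_policy": {"enabled": True, "days": 90}},
        ],
    },
)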
Main Types of Alerts

• Metric Alerts
• Metric alert rules specify a target criterion for a metric on the resource to be monitored. When the condition is met and the alert rule fires, notifications are sent to an action group.

• Here are a few more attributes of metric alerts (a metrics-query sketch follows this list):

• Monitoring and alerting happen on the current state of the resource, in near real time.
• Mostly based on performance, usage, and status (Success/Failure/Cancelled) metrics.
• Can be checked from the Metrics section in the Azure portal.
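For example, the same failed-run metric shown in the portal's Metrics section can be read programmatically with the azure-monitor-query package. This is a minimal sketch; the resource ID is a placeholder, and PipelineFailedRuns is the ADF metric name assumed here.

# Minimal sketch: read the failed-pipeline-run metric for the last hour.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

ADF_ID = "<data-factory-resource-id>"         # placeholder

client = MetricsQueryClient(DefaultAzureCredential())
response = client.query_resource(
    ADF_ID,
    metric_names=["PipelineFailedRuns"],      # assumed ADF metric name
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.TOTAL],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            if point.total:                   # only minutes with failures
                print(metric.name, point.timestamp, point.total)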
Log Analytics Alerts

• These alerts are triggered based on log searches that are automatically run at periodic intervals. Advanced alerting for non-fatal errors, warnings, and business-logic errors can be created in Azure Monitor and Log Analytics.
• A few more attributes of Log Analytics alerts are (see the query sketch after this list):
• Provides long-term storage of logs (the default ADF logging period is 45 days), which enables more sophisticated analytics.
• For example, trend analysis using historical comparisons of pipeline performance and activities.
• Ability to merge various metrics and observe the relationships between them.
• Able to analyze all types of logs, including any custom logs written for a specific business case.
• Can be checked from the Log Analytics workspace in the Azure portal.
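The sketch below runs the kind of Kusto (KQL) search such an alert rule would evaluate, using the azure-monitor-query package. It assumes ADF diagnostics flow to the workspace in resource-specific mode (so the ADFPipelineRun table exists); the workspace ID is a placeholder.

# Minimal sketch: failed pipeline runs per pipeline over the last day.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"   # placeholder

KQL = """
ADFPipelineRun
| where Status == 'Failed'
| summarize FailedRuns = count() by PipelineName
| order by FailedRuns desc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, KQL, timespan=timedelta(days=1))

for table in response.tables:
    for row in table.rows:
        print(dict(zip(table.columns, row)))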
Azure Data Factory Alerts

• We can implement native ADF Alerts as part of an Azure implementation for a client.
• We can create two alert rules: one to monitor pipeline failures and the other for trigger failures.

• Metric: Failed Pipeline Run
• Severity: 0 (critical) to 4 (verbose)
• Dimension: select the pipelines and failure types to be associated
• Alert Logic: greater than or equal to 1 (threshold count), based on Count aggregation
• Evaluation Period: over the last 1 min
• Frequency: every 1 min
• Notification: configure Email/SMS/Voice and use an action group for notifications to be sent (a rule-creation sketch follows this list)
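A minimal sketch of creating the pipeline-failure rule above with the azure-mgmt-monitor SDK; the rule name, scopes, severity, and action group ID are illustrative placeholders, and PipelineFailedRuns is assumed to be the underlying metric.

# Minimal sketch: metric alert on failed pipeline runs.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    MetricAlertResource, MetricAlertSingleResourceMultipleMetricCriteria,
    MetricCriteria, MetricAlertAction,
)

SUBSCRIPTION_ID = "<subscription-id>"            # placeholder
client = MonitorManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

rule = MetricAlertResource(
    location="global",                           # metric alert rules are global
    description="Alert on any failed ADF pipeline run",
    severity=1,                                  # 0 (critical) .. 4 (verbose)
    enabled=True,
    scopes=["<data-factory-resource-id>"],       # placeholder
    evaluation_frequency="PT1M",                 # every 1 min
    window_size="PT1M",                          # over the last 1 min
    criteria=MetricAlertSingleResourceMultipleMetricCriteria(
        all_of=[MetricCriteria(
            name="FailedRuns",
            metric_name="PipelineFailedRuns",    # assumed metric name
            operator="GreaterThanOrEqual",
            threshold=1,
            time_aggregation="Count",
        )],
    ),
    actions=[MetricAlertAction(action_group_id="<action-group-resource-id>")],
)

client.metric_alerts.create_or_update("<rg>", "adf-failed-pipeline-runs", rule)

An analogous rule on the trigger-failure metric would cover the second alert mentioned above.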
Approach
• Enabling Azure Monitor plus Log Analytics
• Developing an Azure Data Factory monitoring tool with the SDK (a sketch follows the links below)
• https://docs.microsoft.com/en-us/azure/data-factory/monitor-programmatically
• https://www.bluegranite.com/blog/monitoring-azure-data-factory-v2-using-power-bi
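Following the monitor-programmatically pattern linked above, this is a minimal sketch of querying recent pipeline runs with the azure-mgmt-datafactory SDK; the subscription, resource group, and factory names are placeholders.

# Minimal sketch: list pipeline runs updated in the last 24 hours.
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

now = datetime.now(timezone.utc)
filters = RunFilterParameters(last_updated_after=now - timedelta(days=1),
                              last_updated_before=now)

runs = adf_client.pipeline_runs.query_by_factory("<rg>", "<factory-name>", filters)
for run in runs.value:
    print(run.pipeline_name, run.status, run.run_start, run.message)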
Custom logs of your data pipelines and how to build a Data Catalog

• ID: ID of the log. This is an auto-generated value (see Constraint).
• ID_TRACK_PROCESSING: ID (in the track_processing table) of the table to ingest that triggered the execution of the job.
• SCHEMA_NAME & TABLE_NAME: Schema and table name of the table being inserted/processed.
• PRIMARY_KEYS: Primary keys of the table, if any, when these are used to perform the merge.
• STATUS: Process status (Success or Failed).
• RUN_DT: Timestamp of when the job was started.
• TIME_TAKEN: Time needed by the job to finish.
• CREATED_BY_ID: Identifies the tool that created the log (Azure Data Factory in our example).
• CREATED_TS: Timestamp of when the log was created.
• DATABRICKS_JOB_URL: URL at which the code and stages of every step of the execution can be found.
• DATAFACTORY_JOB_URL: URL of the ADF pipeline that identified the job as finished.
• LAST_DSTS: Latest timestamp of the table.
• LIVE_ROWS: Number of rows in the table after the execution of the job.
• REPLICATION_ROWS: Number of rows inserted/processed in the latest execution (for a FULL LOAD, equal to LIVE_ROWS).
• COLUMNS: Structure (column names and types) of the table after the ingestion job.
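To make the schema concrete, here is a small, hypothetical helper that assembles one row of this custom log table; the function name and defaults are illustrative only.

# Minimal sketch: build one custom-log record (ID is auto-generated by the DB).
from datetime import datetime, timezone

def build_log_record(track_id, schema, table, status, started_at,
                     live_rows, replication_rows, columns,
                     databricks_url, adf_url, primary_keys=None):
    now = datetime.now(timezone.utc)
    return {
        "ID_TRACK_PROCESSING": track_id,
        "SCHEMA_NAME": schema,
        "TABLE_NAME": table,
        "PRIMARY_KEYS": primary_keys,           # None when no merge keys
        "STATUS": status,                       # 'Success' or 'Failed'
        "RUN_DT": started_at,
        "TIME_TAKEN": (now - started_at).total_seconds(),
        "CREATED_BY_ID": "Azure Data Factory",
        "CREATED_TS": now,
        "DATABRICKS_JOB_URL": databricks_url,
        "DATAFACTORY_JOB_URL": adf_url,
        "LAST_DSTS": now,                       # latest timestamp of the table
        "LIVE_ROWS": live_rows,
        "REPLICATION_ROWS": replication_rows,   # equals LIVE_ROWS on FULL LOAD
        "COLUMNS": columns,                     # e.g. {"id": "int", "name": "string"}
    }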
Data Catalog
Pricing
• https://azure.microsoft.com/en-in/pricing/details/monitor/
