As a successor to one of my previous posts, I would like to share some additional KQL queries which might help you during the troubleshooting sessions of your Sitecore application:
Availability Results:
availabilityResults | where timestamp > ago(7d) | summarize avg(toint(success)) * 100 by bin(timestamp, 1h), name | order by timestamp asc | render timechart
Get unique users per day past 90 days:
union pageViews,customEvents | where timestamp > ago(90d) | summarize Users=dcount(user_Id) by bin(timestamp, 1d) | render timechart
Get 500 errors in dependencies line graph past 90 days:
dependencies | where timestamp > ago(90d) | where resultCode in (“500”, “502”) | summarize count() by bin(timestamp, 1d) | render timechart
Get 404 errors in requests line graph past 90 days:
requests | where timestamp > ago(90d) | where client_Type != “Browser” | where resultCode == 404 | summarize failedCount=sumif(itemCount, success == false) by bin(timestamp, 1d), resultCode | extend request='Overall’ | render timechart
Get server exceptions line graph past 90 days:
exceptions | where timestamp > ago(90d) | where client_Type != “Browser” | summarize count() by bin(timestamp, 1d) | render timechart
Get all exceptions bar chart past 90 days:
exceptions | where timestamp > ago(90d) | summarize count = sum(itemCount) by bin(timestamp, 4h), problemId, tostring(customDimensions.InstanceName) | order by timestamp desc | render barchart
Clientside and react errors line graph past 90 days:
exceptions | where timestamp > ago(90d) | where client_Type == “Browser” | summarize count() by bin(timestamp, 1d) | render timechart
Get WARN total count line graph past 90 days:
traces | where timestamp > ago(90d) | where severityLevel == 2 | parse message with * “WARN " TrimmedMessage | summarize count() by bin(timestamp, 1d) | render timechart
Get Trace ERROR total count line graph past 90 days:
traces | where timestamp > ago(90d) | where severityLevel == 3 | parse message with * “ERROR " TrimmedMessage | summarize count() by bin(timestamp, 1d) | render timechart
Get pageviews line graph past 90 days:
pageViews | where timestamp > ago(90d) | summarize count() by bin(timestamp, 1d) | render timechart
Check your CD(s) for errors:
traces | where timestamp > ago(1d) | where message contains “ERROR” | where (customDimensions).Role contains “CD” | project timestamp, message, (customDimensions).InstanceName | order by timestamp desc
To create a better overview on ERRORS you could extend the timespan and render the query using, for example, a timechart.
traces | where timestamp > ago(144h) | where message contains “ERROR” | where (customDimensions).Role contains “CD” | extend localTime = timestamp + 1h | summarize Amount_Of_Errors=dcount(message) by bin(localTime, 1h) | order by localTime asc | render timechart
Check your CM(s) for errors:
traces | where timestamp > ago(1d) | where message contains “ERROR” | where (customDimensions).Role contains “CM” | project timestamp, message, (customDimensions).InstanceName | order by timestamp desc
To create a better overview on ERRORS you could extend the timespan and render the query using, for example, a timechart.
traces | where timestamp > ago(144h) | where message contains “ERROR” | where (customDimensions).Role contains “CM” | extend localTime = timestamp + 1h | summarize Amount_Of_Errors=dcount(message) by bin(localTime, 1h) | order by localTime asc | render timechart
Check all instances for errors: * please note I extend to use a localTime in some of the querys
traces | where timestamp > ago(1d) | extend localTime = timestamp + 1h | where message contains “ERROR” | where message !contains “INFO” and message !contains “WARN” | project localTime, message, (customDimensions).InstanceName | order by localTime desc
To create a better overview on ERRORS you could extend the timespan and render the query using, for example, a timechart.
traces | where timestamp > ago(144h) | where message !contains “INFO” and message !contains “WARN” and message !contains “Could not update device detection” | extend localTime = timestamp + 1h | summarize Amount_Of_Errors=dcount(message) by bin(localTime, 1h) | order by localTime asc | render timechart
Check your environment on exceptions:
exceptions | where timestamp > ago(14d) | extend localTime = timestamp + 1h | project localTime, (customDimensions).MachineName, assembly, outerMessage
Check your environment on server side exceptions:
exceptions | where timestamp > ago(7d) | where client_Type != “Browser” | summarize Total = count() by problemId | order by Total | project Total, problemId | take 10
Get server exceptions total count:
exceptions | where timestamp > ago(7d) | where client_Type != “Browser” | summarize count()
Get client side exceptions:
exceptions | where timestamp > ago(7d) | where client_Type == “Browser” | summarize Total = count() by problemId | order by Total | project Total, problemId | take 10
Get client side exceptions total count:
exceptions | where timestamp > ago(7d) | where client_Type == “Browser” | summarize count()
Get WARN count by occurrence:
traces | where timestamp > ago(7d) | where severityLevel == 2 | parse message with * “WARN " TrimmedMessage | project TrimmedMessage, customDimensions.Role | summarize Total = count() by TrimmedMessage, tostring(customDimensions_Role) | order by Total | project Total, customDimensions_Role, TrimmedMessage | take 10
Get WARN total count:
traces | where timestamp > ago(7d) | where severityLevel == 2 | parse message with * “WARN " TrimmedMessage | summarize count()
Get Trace ERROR count by occurrence:
traces | where timestamp > ago(7d) | where severityLevel == 3 | parse message with * “ERROR " TrimmedMessage | project TrimmedMessage, customDimensions.Role | summarize Total = count() by TrimmedMessage, tostring(customDimensions_Role) | order by Total | project Total, customDimensions_Role, TrimmedMessage | take 10
Get TRACE ERROR total count:
traces | where timestamp > ago(7d) | where severityLevel == 3 | parse message with * “ERROR " TrimmedMessage | summarize count()
Check your environment on log entry’s with severity > 3:
traces | where severityLevel >= 3
What are the most hit Urls:
requests | where timestamp > ago(14d) | summarize count() by url | order by count_ desc | project url, count_
Give me all 404 status codes:
requests | where timestamp > ago(14d) | extend localTime = timestamp + 1h | where resultCode == “404” | order by localTime desc
and summarize the Urls including a count
requests | where timestamp > ago(14d) | where resultCode == “404” | summarize count() by url | order by count_ desc | project url, count_
Sitecore depends a lot on correct license configuration, keep misconfigurations/issues under your radar:
traces | where timestamp > ago(7d) | where message contains “license” and severityLevel > 1
How many unique sessions hit my environment:
pageViews | where timestamp > ago(7d) | summarize TotalUniqueSessions = dcount(session_Id) by bin(timestamp, 1d) | order by timestamp asc | render barchart kind=default
Top 10 - Most requested pageviews:
pageViews | where timestamp > ago(7d) | summarize count() by url | order by count_ desc | project url, count_ | take 10
Top 10 - Most requested pageviews (combined UA and City):
pageViews | where timestamp > ago(7d) | summarize count() by url, client_Browser, client_City, client_CountryOrRegion | order by count_ desc | project url, count_, client_Browser, client_City, client_CountryOrRegion | take 10
Top 10 - 404 resultCodes:
requests | where resultCode == 404 | where timestamp > ago(7d) | summarize count() by url | order by count_ desc | take 10
Top 10 - 500 Errors in dependencies:
dependencies | where resultCode in (“500”, “502”) | where timestamp > ago(7d) | summarize count() by name, resultCode, target, type | project name, resultCode, target, type, count_ | order by count_ desc | take 10
CPU Usage:
performanceCounters | where timestamp > ago(@{variables(‘timestamp’)}) | where category == ‘Processor’ | where counter == ‘% Processor Time’ | where instance == ‘_Total’ | summarize avg(value) by bin(timestamp, 1h), cloud_RoleInstance | order by timestamp asc | render timechart