System Dashboards

System Menu

The System menu button is visible only for users with the role System Administrator.

From the main menu, click the System menu button.

Figure 1: System dashboards menu

System Health

Precondition: The System menu is opened.

Click on the Health menu button.

Figure 2: System Health dashboard

The System Health dashboard has the following sections:

Alerting Status

Alerting Status is shown at the top of the dashboard.

Figure 3: System Health dashboard - Alerting status

This is a list of all heartbeat alerts and their current status configured based on the heartbeat charts above. All instruments described above are also configured to show the alerting status.

Alerting status can be one of the following:

On the heartbeat tables the status is binary:

ANALYSER STATUS

Expand the ANALYSER STATUS panel.

Figure 4: System Health dashboard - Analyser status

The execution table shows a list of all analyzers currently running in the system and their status calculated according to their last execution time. Each analyzer is responsible for a specific group of tasks and runs asynchronously on a configured time interval.
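The status calculation described above can be pictured with a minimal sketch. It assumes an analyzer counts as late once more than a configurable number of intervals have passed since its last execution. The status names `OK` and `LATE`, the `grace` factor and the function name are hypothetical and are not taken from the dashboard.

```python
from datetime import datetime, timedelta, timezone


def analyzer_status(last_run: datetime, interval: timedelta, now: datetime,
                    grace: float = 2.0) -> str:
    """Return "OK" if the analyzer ran within `grace` intervals, else "LATE".

    The status names and the grace factor are assumptions for illustration.
    """
    overdue_after = interval * grace
    return "OK" if now - last_run <= overdue_after else "LATE"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(analyzer_status(now - timedelta(minutes=4), timedelta(minutes=5), now))   # OK
print(analyzer_status(now - timedelta(minutes=30), timedelta(minutes=5), now))  # LATE
```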

HEARTBEAT STATUS

Expand the HEARTBEAT STATUS panel.

Figure 5: System Health dashboard - Heartbeat status graphics

Linux and Windows Gateways

The list of graphics includes:

Figure 6: System Health - Heartbeat status - Gateway status

The selected time period is Last 1 hour (UTC).

DB SYNC Timer

Figure 7: System Health - Heartbeat status - DB SYNC timer

This is an asynchronous timer-based function in Azure that performs data synchronization and other similar tasks. It also sends heartbeats to the back-end services to check that everything in the cloud works correctly, independently of the gateways on the client premises.

Linux and Windows adapters

For each gateway and gateway module, there is a dedicated chart showing the history of the number of heartbeats sent by it.

The time period is defined by the selected dashboard period.

The number of heartbeats sent at each interval corresponds to the number of machines that depend on the module. For a gateway, this number is always 1 (the gateway itself).

System notifications are configured on these charts to automatically report a heartbeat loss.
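The relation between the expected heartbeat count and a heartbeat loss can be sketched as follows. This is only an illustration: the function names are hypothetical, and the rule "fewer heartbeats than expected in an interval means a loss" is an assumption about how the notifications are evaluated.

```python
def expected_heartbeats(is_gateway: bool, dependent_machines: int) -> int:
    """A gateway always reports 1; a module reports one heartbeat per dependent machine."""
    return 1 if is_gateway else dependent_machines


def intervals_with_loss(counts: list[int], expected: int) -> list[int]:
    """Return the indexes of intervals that received fewer heartbeats than expected."""
    return [i for i, count in enumerate(counts) if count < expected]


# Hypothetical module with 3 dependent machines, observed over 5 intervals.
expected = expected_heartbeats(is_gateway=False, dependent_machines=3)
print(intervals_with_loss([3, 3, 2, 0, 3], expected))  # [2, 3]
```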

TELEMETRY

Expand the TELEMETRY panel.

Archiving Progress

The panel shows the progress of the telemetry archiving process. The process runs daily: it archives the telemetry database to a dedicated archive DB, then deletes all archived data from the telemetry DB to free up space.

Figure 8: System Health - Telemetry - Archiving progress

PowerBI Embedded Capacity ON

Figure 9: System Health - Telemetry - PowerBI Embedded Capacity

PowerBI is used for report generation. If PowerBI Embedded Capacity stays turned ON for a long time, an alert is generated. If the capacity is not turned OFF soon, high costs are to be expected.

IoT Hub Data Usage(1 Minute)

Figure 10: System Health - Telemetry - IoT Hub data usage (1min)

This chart shows:

IoT Hub Telemetry(1 Minute)

Figure 11: System Health - Telemetry - IoT Hub telemetry (1min)

The chart shows the telemetry sent to IoT Hub, the telemetry send attempts, and the difference between the two.

External Sensors Calculated Tags Counts

Figure 12: System Health - Telemetry - External sensors calculated tags counts

Events Ingested per Gateway(stacked)

Figure 13: System Health - Telemetry - Events ingested per gateway

This chart shows:

Linux MTConnect Data

Figure 14: System Health - Telemetry - Linux MTConnect

The chart shows Edge MTConnect data for the selected time period.

Windows MTConnect Data

Figure 15: System Health - Telemetry - Windows MTConnect

LOG STATISTICS

This section shows information about the system logs for the period selected in the dashboard (usually the last 24 hours).

It contains the following instruments:

Errors and Warnings

Figure 16: System Health - Log Statistics - Error and Exceptions

This graphic shows the number of errors and warnings across system components for the chosen period. The counts are shown in a table whose cells are colored by threshold: green, yellow, orange and red. If the error count is in the red range, the errors should be mitigated.
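A minimal sketch of threshold-based coloring is shown here. The boundary values 10, 50 and 100 are invented for illustration; the thresholds actually configured in the dashboard are not stated in this guide.

```python
from bisect import bisect_right

# Hypothetical boundaries: green < 10 <= yellow < 50 <= orange < 100 <= red.
THRESHOLDS = [10, 50, 100]
COLORS = ["green", "yellow", "orange", "red"]


def color_for(count: int) -> str:
    """Map an error or warning count to the color of its threshold range."""
    return COLORS[bisect_right(THRESHOLDS, count)]


for count in (0, 10, 75, 250):
    print(count, color_for(count))
# 0 green
# 10 yellow
# 75 orange
# 250 red
```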

Errors and Exceptions List

The errors and exceptions list is a table of the errors and exceptions for the chosen period. Each row shows the timestamp when the error or exception occurred, the resource that generated the message, and the message text.

Warnings List

The warnings list is a table of the warnings for the chosen period. Each row shows the timestamp when the warning occurred, the resource that generated the message, and the message text.

REQUESTS & DEPENDENCIES

This section shows information about requests made from, to and between different system components. A dependency is an external component that is called by a specific component. It's typically a service called using HTTP, or a database, or a file system.

The section contains the following instruments:

Requests Duration

Figure 17: System Health - Requests & Dependencies - Request duration

This chart shows the duration of requests to specific URLs for the chosen period. These statistics usually represent the response duration of parts of our internal and integration APIs, or of other similar HTTP endpoints.

The chart is provided with a summary table for each shown metric where we can find the minimum, maximum and average value of each series shown on the chart.
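The summary table can be thought of as a simple per-series aggregation. The sketch below assumes each series is a list of request durations; the endpoint names are hypothetical.

```python
from statistics import mean


def summarize(series: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Compute the minimum, maximum and average of each non-empty series."""
    return {
        name: {"min": min(values), "max": max(values), "avg": mean(values)}
        for name, values in series.items()
        if values
    }


# Hypothetical request durations in milliseconds per endpoint.
durations = {
    "/api/machines": [120.0, 80.0, 100.0],
    "/api/tags": [40.0, 60.0],
}
for name, row in summarize(durations).items():
    print(name, row)
# /api/machines {'min': 80.0, 'max': 120.0, 'avg': 100.0}
# /api/tags {'min': 40.0, 'max': 60.0, 'avg': 50.0}
```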

Requests Failed

Figure 18: System Health - Requests & Dependencies - Requests failed

This chart shows the number of failed requests to specific URLs for the chosen period. These statistics usually represent the count of failed responses at a specific moment for parts of our internal and integration APIs, or for other similar HTTP endpoints. The chart is provided with a summary table for each shown metric where we can find the minimum, maximum and average value of each series shown on the chart.

Dependencies Duration by Target

Figure 19: System Health - Requests & Dependencies - Requests duration by target

This chart shows the duration of calls to specific dependency URLs for the chosen period. The URLs are usually, but not always, external APIs not managed within our platform. These statistics usually represent the response duration of parts of those APIs, or of other similar HTTP endpoints.

The chart data is grouped by the endpoint being called.

The chart is provided with a summary table for each shown metric where we can find the minimum, maximum and average value of each series shown on the chart.

Dependencies Duration by Type

Figure 20: System Health - Requests & Dependencies - Requests duration by type

This chart is the same as the Dependencies Duration by Target. The difference is that endpoints are grouped by Type (HTTP, SQL, IoT Hub, and others).

The chart is provided with a summary table for each shown metric where we can find the minimum, maximum and average value of each series shown on the chart.

PERFORMANCE

SQL Server Load (DTU)

Figure 21: System Health - Performance - SQL Server load (DTU)

This graphic shows the load on the SQL Server databases, measured in DTUs used, for the chosen period. DTU stands for Database Transaction Unit, an Azure SQL specific measure of how many resources a database is using.

SQL Server Data Space Used

Figure 22: System Health - Performance - SQL Server data space used

This graphic shows the percentage of SQL Server Data Space used for ConfigDB and ReportingDB during the selected time period.

SQL Server Successful Connections

Figure 23: System Health - Performance - SQL Server Successful connections

This graphic shows the number of successful SQL Server connections for the selected time interval.

SQL Server Failed Connections

Figure 24: System Health - Performance - SQL Server Failed connections

This graphic shows the number of failed SQL Server connections for the selected time interval.

Service Bus Queues

Figure 25: System Health - Performance - Service bus queues

This chart shows the number of messages in service bus queues across time for the chosen period in the dashboard.

Active queues are those whose messages are constantly consumed: different system components retrieve and process the messages from them. If the number of messages rises too much, some component is processing them slowly. If the number rises constantly, some component is not running at all.

Dead-lettered queues correspond to specific active queues and contain messages whose processing has failed a certain number of times. These queues should always contain zero items. Any message here indicates a problem that should be mitigated.

The chart is provided with a summary table for each shown metric where we can find the minimum, maximum, average and current number of messages within each queue for the chosen period.
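The reading of the queue chart described above can be sketched as a small classifier. The rules, the `backlog_limit` parameter and the returned labels are assumptions made for illustration; the guide does not state how the real notifications are evaluated.

```python
def queue_state(counts: list[int], backlog_limit: int) -> str:
    """Classify an active queue from its message counts over the chosen period."""
    rising_constantly = len(counts) > 1 and all(
        later > earlier for earlier, later in zip(counts, counts[1:])
    )
    if rising_constantly:
        return "consumer not running"
    if max(counts, default=0) > backlog_limit:
        return "consumer slow"
    return "healthy"


def dead_letter_ok(counts: list[int]) -> bool:
    """A dead-lettered queue is healthy only if it stayed empty the whole period."""
    return all(count == 0 for count in counts)


print(queue_state([1, 2, 3, 4], backlog_limit=100))   # consumer not running
print(queue_state([5, 150, 20], backlog_limit=100))   # consumer slow
print(queue_state([3, 0, 2], backlog_limit=100))      # healthy
print(dead_letter_ok([0, 0, 1]))                      # False
```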

VM Activity

All virtual machine metrics are evaluated and persisted once per minute.

This section shows performance information and important event logs of the virtual machines in the system for the period selected in the dashboard.

CPU Usage

Figure 26: System Health - VM Activity - CPU usage

The charts show the usage of CPU by two virtual machines – Influx DB Server VM and Grafana VM.

Memory Usage

Figure 27: System Health - VM Activity - Memory usage

The charts show the usage of memory by two virtual machines – Influx DB Server VM and Grafana VM.

Influx Application Logs

This table contains Influx application log messages by timestamp.

It lists the most important entries from the Event Log of the Time series DB Server virtual machine for the period chosen in the dashboard. Each message should be carefully reviewed and all identified problems should be mitigated.

Grafana Application Logs

This table contains Grafana application log messages by timestamp.

It lists the most important entries from the Event Log of the Visualization virtual machine for the period chosen in the dashboard. If some counts are too high, they should be reviewed, and any problem found should be mitigated.

SYSTEM INFORMATION

This section shows a table of all system components with the following columns: Group, Name, Version, Reported Time and Version Date.

Figure 28: System Health - System Information

Influx DB Metrics

Precondition: The System menu is opened.

Click on the InfluxDB Metrics menu button.

Figure 29: InfluxDB Metrics dashboard

HTTP Queries

Figure 30: InfluxDB Metrics - HTTP Queries

This graphic shows the number of queries executed for the selected time interval.

HTTP Errors

Figure 31: InfluxDB Metrics - HTTP Errors

The graphic shows the number of failed queries for the selected time interval. The errors are split into two types: server errors and client errors.

Points Read

Figure 32: InfluxDB Metrics - Points Read

The chart shows the number of points read from the database during the selected time interval.

Points Written

Figure 33: InfluxDB Metrics - Points Written

The chart shows the number of points written to the database over time. The data is split into two types: failed and successful writes. For a more focused view, the failed series is negated and is shown towards the negative end of the Y-axis.
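Negating the failed series simply means flipping the sign of each failed-write count before plotting, so that successes grow upwards and failures grow downwards from zero. A minimal sketch with hypothetical sample counts:

```python
def mirrored_series(successful: list[int], failed: list[int]) -> tuple[list[int], list[int]]:
    """Return the successful counts unchanged and the failed counts negated for plotting."""
    return list(successful), [-count for count in failed]


ok, failed = mirrored_series([100, 120, 90], [0, 5, 2])
print(ok)      # [100, 120, 90]
print(failed)  # [0, -5, -2]
```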

HTTP Reads Duration (99th %)

Figure 34: InfluxDB Metrics - HTTP Reads duration (99th %)

The chart shows the duration, in nanoseconds, of the slowest 1% of all read queries for the selected time interval.

HTTP Writes Duration (99th %)

Figure 35: InfluxDB Metrics - HTTP Writes Duration (99th %)

The chart shows the duration, in nanoseconds, of the slowest 1% of all write queries for the selected time interval.
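To make the 99th percentile concrete, the sketch below computes it with the nearest-rank method: 99% of the durations are less than or equal to the returned value, and only the slowest 1% reach or exceed it. The nearest-rank method is an assumption chosen for simplicity; the metric shown in the dashboard may be estimated differently. The helper names are hypothetical.

```python
import math


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Return the smallest value such that at least pct percent of values are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def ns_to_ms(nanoseconds: float) -> float:
    """Convert a duration from nanoseconds to milliseconds."""
    return nanoseconds / 1_000_000


# Hypothetical durations: 1 ns, 2 ns, ..., 100 ns.
durations_ns = list(range(1, 101))
print(percentile_nearest_rank(durations_ns, 99))  # 99
print(ns_to_ms(2_500_000))                        # 2.5
```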

Number of Series

Figure 36: InfluxDB Metrics - Number of series

The chart shows the number of series per database for the selected time interval.

Runtime

Figure 37: InfluxDB Metrics - Runtime

The chart shows the following runtime statistics over time:

Health Check

Precondition: The System menu is opened.

Click on the Health Check menu button.

Purpose: The Health Check dashboard shows, in a table, the execution status of the HealthCheckAPI requests, which are executed every hour. The requests check some basic functions of the Notification server and the Persister server.

Figure 38: System - Health Check dashboard

Statuses from the tests can be:

Heartbeat History

Precondition: The System menu is opened.

Click on the Heartbeat history menu button.

Purpose: On this dashboard, an administrator can search for missing heartbeats (2 or 3) in the selected time period. The returned result is grouped by adapter. Each machine sends a heartbeat every minute.

Figure 39: System - Heartbeat history dashboard

The first chart shows, for each adapter, the total downtime caused by interruptions. For example, FanucRobotAdapter has 4 min of interruptions in the last 3 hours (the time interval is selected in the upper right).

Figure 40: Heartbeat history - Chart Amount of downtime by adapter

For each adapter, a table shows when its interruptions occurred and which machines were affected. The table lists the time when the heartbeat stopped (Heartbeat Stopped), the time delta of the interruption, the time when the heartbeat was restored (Heartbeat Restored), and the list of affected machines.

Figure 41: Heartbeat history - FanucRobotAdapter with interruptions and list of affected machines
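The search for missing heartbeats can be sketched as a scan over consecutive heartbeat timestamps. The sketch assumes one heartbeat per minute, as stated above, and treats the last heartbeat before a gap as the stop time and the first heartbeat after it as the restore time. The function name and the exact definition of "stopped" are assumptions, not taken from the dashboard.

```python
from datetime import datetime, timedelta


def find_interruptions(heartbeats: list[datetime], min_missing: int = 2,
                       interval: timedelta = timedelta(minutes=1)):
    """Return (stopped, restored, delta) for each gap that skips at least min_missing heartbeats."""
    ordered = sorted(heartbeats)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        missing = (current - previous) // interval - 1
        if missing >= min_missing:
            gaps.append((previous, current, current - previous))
    return gaps


# Hypothetical heartbeats: 10:00, 10:01, 10:02, then nothing until 10:06 and 10:07.
start = datetime(2024, 1, 1, 10, 0)
beats = [start + timedelta(minutes=m) for m in (0, 1, 2, 6, 7)]
gaps = find_interruptions(beats)
print(len(gaps))                                   # 1
print(sum((gap[2] for gap in gaps), timedelta()))  # 0:04:00
```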

Value Analysis

Precondition: The System menu is opened.

Click on the Value Analysis menu button.

Purpose: The dashboard shows tag value statistics for the selected time interval (3). The statistics are shown after selecting a machine from the machine list (1) and a tag from the machine tags list (2).

The tag description is shown at the top: ID, Name, Display name and Value type.

Tag Analytics are grouped into two panels: Simple and Advanced.

Figure 42: Value analysis - selected tag for machine

Expand the Simple analytics panel:

Figure 43: Value Analysis - Simple analytics - 1

Figure 44: Value Analysis - Simple analytics - 2

  1. Number of values – this panel shows the number of values, grouped by 1 h, for the given period.

Figure 45: Simple value analytics - Number of values

  2. Telemetry Stats – this panel shows one of the following statistics for the selected time interval, chosen from a drop-down: Average time for values in sec, Elapsed time, Count of values and Count distinct.

Figure 46: Simple value analytics - Telemetry stats

  3. Most frequent values – this panel shows the most frequent values by day, grouped by 1 min, for the selected period.

Figure 47: Simple value analytics - Most frequent values

  4. Aggregations – this panel shows the common aggregations (Min, Max, Mean, Standard deviation, Coefficient of variation).
    • Coefficient of variation = Standard deviation/Average

Figure 48: Simple value analytics - Aggregations

  5. Distribution of values – this panel shows all values for the given tag and period, together with the standard deviation and coefficient of variation of the tag values.

Figure 49: Simple value analytics - Distributions of values

  6. Variations – this panel shows the common variations for the given series of values (Min, Max, Average, Moving Average, Exponential moving average).

Figure 50: Simple value analytics – Variations
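The aggregations and variations listed above can be illustrated with a short sketch. It uses the population standard deviation, a trailing window for the moving average, and a smoothing factor `alpha` for the exponential moving average. These choices, and all function names, are assumptions; the dashboard may use different definitions (for example the sample standard deviation).

```python
from statistics import mean, pstdev


def aggregations(values: list[float]) -> dict[str, float]:
    """Min, max, mean, standard deviation and coefficient of variation of a series."""
    average = mean(values)
    deviation = pstdev(values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": average,
        "stddev": deviation,
        # Coefficient of variation = Standard deviation / Average
        "cv": deviation / average if average else float("nan"),
    }


def moving_average(values: list[float], window: int) -> list[float]:
    """Average of each trailing window of the given size."""
    return [mean(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


def exponential_moving_average(values: list[float], alpha: float) -> list[float]:
    """Each point blends the new value (weight alpha) with the previous average."""
    result = [values[0]]
    for value in values[1:]:
        result.append(alpha * value + (1 - alpha) * result[-1])
    return result


stats = aggregations([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(stats["mean"], stats["stddev"], stats["cv"])       # 5.0 2.0 0.4
print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 3))      # [2.0, 3.0, 4.0]
print(exponential_moving_average([1.0, 2.0, 3.0], 0.5))  # [1.0, 1.5, 2.25]
```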

Expand the Advanced analytics panel:

Figure 51: Advanced analytics - 1

Figure 52: Advanced analytics - 2

  1. Calculated percentage of values that fall below a particular value – this panel calculates the value from the series at every percentile/quantile. A value at percentile X is greater than X% of the values in the series.

Figure 53: Advanced analytics - Calculated percentage of values that fall below a particular value

  2. Outliers – this panel shows values that lie at an abnormal distance from the other values, either extremely high or extremely low.

Figure 54: Advanced analytics - Outliers

  3. Outlier trend – this panel shows the outlier trend for the selected period. The graphic shows the tag values together with lines marking the high outlier and low outlier values.

Figure 55: Advanced analytics - Outliers trend

  4. Values changing in time – this panel shows the change between consecutive values (how each value changed relative to the value before it) for the given tag and period.

Figure 56: Advanced analytics - values changing in time

  5. Values Histogram – the histogram panel is a graphical representation of the distribution of numerical data, here the values of the specified tag. It groups values into buckets (sometimes also called bins) and then counts how many values fall into each bucket. The Min and Max values for the selected period are also shown.

Figure 57: Advanced analytics - Values histogram

  6. Values Heatmap – the heatmap panel is a graphical representation of the distribution of numerical data, here the values of the specified tag. It groups values into buckets (sometimes also called bins) and then counts how many values fall into each bucket. The heatmap is like a histogram over time, where each time slice represents its own histogram. Instead of using bar height to represent frequency, it uses cells and colors each cell in proportion to the number of values in the bucket.

Figure 58: Advanced analytics - Values heatmap
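The calculations behind several of the advanced panels can be sketched in a few lines. The outlier rule used here (values further than 1.5 times the interquartile range from the quartiles) is a common convention and only an assumption; the guide does not state which rule the dashboard applies. The function names and the equal-width bucketing are likewise illustrative.

```python
from statistics import quantiles


def percent_below(values: list[float], threshold: float) -> float:
    """Percentage of values that fall strictly below the threshold."""
    return 100 * sum(value < threshold for value in values) / len(values)


def outlier_bounds(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Low and high outlier limits from the quartiles (assumed IQR rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Values lying outside the low and high outlier limits."""
    low, high = outlier_bounds(values, k)
    return [value for value in values if value < low or value > high]


def changes(values: list[float]) -> list[float]:
    """Difference between each value and the value before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def histogram(values: list[float], buckets: int) -> list[int]:
    """Count the values falling into equal-width buckets between min and max."""
    low, high = min(values), max(values)
    width = (high - low) / buckets or 1.0  # avoid a zero width when all values are equal
    counts = [0] * buckets
    for value in values:
        index = min(int((value - low) / width), buckets - 1)
        counts[index] += 1
    return counts


sample = [10, 11, 12, 13, 14, 15, 16, 17, 100]
print(outlier_bounds(sample))            # (6.0, 22.0)
print(outliers(sample))                  # [100]
print(percent_below([1, 2, 3, 4], 3))    # 50.0
print(changes([1, 4, 2]))                # [3, -2]
print(histogram(list(range(11)), 5))     # [2, 2, 2, 2, 3]
```

A heatmap row can be produced the same way by computing one histogram per time slice, provided all slices share the same bucket boundaries.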