Using Data Science to Detect System Log Anomalies

With the significant increase in computing devices in recent years, the amount of data transmitted and stored has grown dramatically. System logs have therefore become an essential artifact for anomaly detection techniques, as they record significant system states and events, helping developers debug unexpected behavior.

In industry, it is common to record detailed software runtime information in system logs, allowing developers and support engineers to analyze system behavior. The rich information recorded by the logs allows developers to conduct a variety of system management tasks, such as diagnosing errors and crashes, ensuring application security, and anomaly detection.

Anomaly detection with machine learning plays an important role in communities such as data science, machine learning, computer vision, and statistics, and it is probably the most common approach to formal, reliable analysis of system logs, because it can reveal where a process execution went wrong.

This field, which looks for abnormal system behavior by looking at log data, allows developers to find and resolve issues in a timely manner.

Basic concepts

When a data instance behaves differently from what the system expects, it is called an anomaly. The purpose of anomaly detection is to identify all such instances in the data. Anomalies are also called abnormalities, novelties, deviations, or discrepancies in the data mining and statistics literature.

3 Types of anomalies

Deviations can be caused by errors in the data, but are sometimes indicative of a new, previously unknown, underlying process. Now, let's look at the three types of classification into which anomalies are divided: point anomalies, contextual anomalies, and collective anomalies.

  • Most works in the literature focus on point anomalies, which generally represent an irregularity or random deviation that may have no particular interpretation.
  • A contextual anomaly, also known as a conditional anomaly, is a data instance that is anomalous only in a specific context, for example, a timestamp, a money-spending pattern, the occurrence of events in system logs, or any attribute used to describe normal behavior.
  • Collective anomalies are instances that individually appear normal but, when observed as a group, exhibit unusual characteristics.
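For point anomalies in particular, a minimal detector can be sketched as a z-score test over a numeric metric extracted from logs. The metric name and threshold below are illustrative assumptions, not taken from any specific system:

```python
import statistics

def zscore_point_anomalies(values, threshold=3.0):
    """Flag values whose z-score (distance from the mean, in
    standard deviations) exceeds the given threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing can be anomalous
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical request latencies (ms) parsed from log lines; 95 is the outlier.
latencies = [12, 11, 13, 12, 14, 11, 13, 12, 95, 12]
print(zscore_point_anomalies(latencies, threshold=2.0))  # -> [95]
```

Contextual and collective anomalies need richer features than a single global mean, but the same "score, then threshold" pattern still applies.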

Anomaly detectors

Modern corporate applications mostly consist of many microservices. Software as a Service (SaaS) applications, for example, are used over the web and run in the cloud. Partly because of their distributed nature, organizing and observing the performance of these complex applications has become very difficult. When something goes wrong, such as poor performance, it may not be easy to recognize and fix the root cause. A data-science monitoring solution can recognize and classify irregular system behavior, and thus improve performance by helping fix the underlying issues.

Setting accurate expectations

Before deploying a system for anomaly detection, it is essential to set accurate expectations. In his book, Baron Schwartz describes what an ideal anomaly detector would do, the common misconceptions surrounding it, its implications and functions, and what a real anomaly detector can actually do.

An ideal anomaly detector application would:

  • Automatically detect unfamiliar activity in the system
  • Forecast major issues with full accuracy
  • Provide a simple analysis of root causes, so that the relevant teams know precisely what to fix to resolve the issues

In practice, it is not possible to build a 100% accurate anomaly detector. There will always be false positives in its analysis, and some findings may have no relation to performance symptoms at all. Experts must often establish the connection by combining the detector's analysis with their own domain knowledge.

Applications: Robust anomalies

Anomaly detection can be applied in many contexts, including log-based detection that must stay robust as the log data evolves. One such log-based approach is LogRobust.

LogRobust seeks robust and accurate detection, recognizing that real-world log data changes constantly. Because of this instability in the logging data, the effectiveness of existing anomaly detection methods is considerably reduced.

The LogRobust architecture adopts an attention-based Bi-LSTM neural network to handle unstable log sequences. Since different log events have different impacts on the classification result, the attention mechanism was introduced into the Bi-LSTM model to assign different weights to log events. In addition, the impact of noise in the data is also reduced, since noisy events tend to be of lesser importance and are therefore likely to receive little attention.
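The weighting idea behind the attention mechanism can be illustrated with a small dependency-free sketch. In LogRobust the per-event relevance scores come from a learned layer; here they are hand-picked for illustration, and this is not the LogRobust implementation itself, only the core idea: a softmax turns scores into weights, so a noisy event with a low score contributes little to the sequence representation:

```python
import math

def attention_weights(scores):
    """Softmax over per-event relevance scores: higher score -> larger weight."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sequence_summary(event_vectors, scores):
    """Combine per-event vectors into one sequence vector, weighted by attention."""
    weights = attention_weights(scores)
    dim = len(event_vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, event_vectors))
            for i in range(dim)]

# Three toy log-event embeddings; the last event is "noisy" and scored low.
events = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
scores = [2.0, 1.5, -1.0]  # hypothetical outputs of a learned scoring layer
print(attention_weights(scores))  # last weight is ~0.03: near-ignored
```

The same softmax-and-weighted-sum step sits inside the real model; the Bi-LSTM supplies the event vectors and the scoring layer is trained end to end.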

Applications: Sentiment analysis

Anomaly detection can also be used in the context of sentiment analysis.

In his work, Hudan Studiawan (2020) proposes a new sentiment analysis technique based on deep learning to check for anomalous activity in operating system (OS) logs. The problem is framed as two-class sentiment analysis: positive and negative sentiment.

Studiawan used a deep learning technique that provides high accuracy and generalizes to previously unseen data. Specifically, a Gated Recurrent Unit (GRU) model is used to detect sentiment in operating system log messages.

In real-life operating system logs, negative messages are far less numerous than positive ones, causing class imbalance. To balance the two sentiment classes, the Tomek links method is used. A balanced dataset produces a better deep learning model, which can therefore detect anomalous activity more accurately.
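The Tomek links idea can be sketched in plain Python: a Tomek link is a pair of opposite-class instances that are each other's nearest neighbors, and undersampling removes the majority-class member of each link, cleaning the class boundary. The 1-D values below are toy stand-ins for log-message features; real use would work on learned embeddings:

```python
def nearest_neighbor(idx, points):
    """Index of the closest other point (1-D distance here for simplicity)."""
    return min((j for j in range(len(points)) if j != idx),
               key=lambda j: abs(points[j] - points[idx]))

def remove_tomek_links(points, labels, majority):
    """Drop the majority-class member of every Tomek link."""
    drop = set()
    for i in range(len(points)):
        j = nearest_neighbor(i, points)
        # A Tomek link: mutual nearest neighbors with different labels.
        if labels[i] != labels[j] and nearest_neighbor(j, points) == i:
            drop.add(i if labels[i] == majority else j)
    return ([p for k, p in enumerate(points) if k not in drop],
            [l for k, l in enumerate(labels) if k not in drop])

points = [0.0, 0.1, 0.45, 0.5, 1.0]           # toy 1-D message features
labels = ["pos", "pos", "pos", "neg", "neg"]  # "pos" is the majority class
print(remove_tomek_links(points, labels, majority="pos"))
# -> ([0.0, 0.1, 0.5, 1.0], ['pos', 'pos', 'neg', 'neg'])
```

The positive instance at 0.45 forms a link with the negative one at 0.5, so only the majority-class member is removed, reducing the imbalance at the boundary.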

Conclusion

Finally, unsupervised methods are widely used when labeled data is unavailable. Several deep learning frameworks that address the challenges of unsupervised anomaly detection have been proposed and shown to produce state-of-the-art performance.

Kengo Tajiri (2020) proposes a method of monitoring ICT (Information and Communication Technology) systems for continuous anomaly detection, considering that the dimensionality of the input vectors changes frequently. Anomaly detection methods based on autoencoders, which train a model to describe "normality", are promising for monitoring the state of such systems.
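The reconstruction-error principle behind autoencoder-based monitoring can be sketched without a neural network: learn a profile of "normal" metric vectors, here deliberately simplified to per-feature means as a stand-in for a trained autoencoder, then flag new vectors whose reconstruction error exceeds a threshold. All numbers below are illustrative:

```python
import math

def fit_normal_profile(train_vectors):
    """Learn a per-feature mean from normal-only training data
    (a trivial stand-in for an autoencoder's reconstruction)."""
    n = len(train_vectors)
    dim = len(train_vectors[0])
    return [sum(v[i] for v in train_vectors) / n for i in range(dim)]

def reconstruction_error(profile, vector):
    """Euclidean distance between a vector and its 'reconstruction'."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile, vector)))

def is_anomalous(profile, vector, threshold):
    return reconstruction_error(profile, vector) > threshold

# Hypothetical [cpu, memory] utilization vectors from a healthy system.
profile = fit_normal_profile([[0.2, 0.3], [0.25, 0.35], [0.15, 0.25]])
print(is_anomalous(profile, [0.9, 0.95], threshold=0.3))   # spike: flagged
print(is_anomalous(profile, [0.22, 0.28], threshold=0.3))  # normal: not flagged
```

A real autoencoder replaces the mean profile with a learned encoder/decoder, but the monitoring loop, score the reconstruction error and compare it to a threshold, is the same.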

There is a great need to develop both general-purpose and custom anomaly detection techniques. These methodologies must be adapted to keep pace with technological changes that can introduce new vulnerabilities into various systems. Anomaly detection techniques need to be efficient enough to capture the small number of outliers in large data streams, and smart enough to find anomalies over both short and long time periods.