Blue Elephant-Operation and Maintenance Data Center (HCS version)

Artificial Intelligence for IT Operations (AIOps) platforms combine big data, AI and machine learning, and other technologies to support all major IT operations functions with proactive, personalized, and dynamic insights. Major banks, public security agencies and power grids across the country, and large industrial enterprises all run very large information network architectures and related application systems, and network and business incidents occur in their daily operation.

Service provider information: Beijing Hailian Xixun Technology Co., Ltd.
Invoicing subject: Beijing Hailian Xixun Technology Co., Ltd.
Payment method: License
Product details
Product Highlights
  • Knowledge graph: abstracts an ontology model of operation and maintenance objects from the CMDB, builds a resource graph, and forms a dynamic resource portrait.
  • Trend prediction: reduces dimensionality through principal component analysis and similar methods, classifies and labels the data with automatic classifiers, and then trains an indicator trend prediction model.
  • Container load prediction: uses the correlation of load changes between containers for more accurate prediction, modeling the load data as a graph to predict the load at the next moment.
  • Dynamic baseline: time series data enters the real-time data warehouse, which has management models for data sources and indicators. According to the constructed indicator model, historical and real-time data are preprocessed and checked with an anomaly detection method suited to their characteristics.
  • Network traffic anomaly detection: builds a network graph, integrates links at all levels and time series data, and applies a deep adversarial network alignment framework.
Product parameters
Delivery method: License
Delivery SLA: 60 calendar days
Service supervision: Involved (if you purchase goods that involve service supervision, enter the buyer center after purchase to submit your requirements and complete acceptance promptly)
Invoicing subject: Beijing Hailian Xixun Technology Co., Ltd.
Operating system: Linux/Windows/Android/iOS/Unix/Others
Version: V1.0
Release date: 2022-05-05
Category: Log analysis
Product description

Major banks, public security agencies and power grids across the country, and large industrial enterprises all run very large information network architectures and related application systems. Network and business failures occur in their daily operation, and analyzing and locating those failures manually can no longer keep pace with increasingly fast-moving business needs. The ever-expanding scale of IT systems, increasingly complex system architectures, and massive volumes of IT operation and maintenance data all burden operation and maintenance personnel who rely on traditional DevOps practices. In 2017, the IT research and consulting firm Gartner proposed the concept of AIOps in a report, defining it as follows: Artificial Intelligence for IT Operations (AIOps) platforms combine big data, AI and machine learning, and other technologies to support all major IT operations functions with proactive, personalized, and dynamic insights. Since then, the operation and maintenance industry has widely come to regard AIOps as the direction for solving operation and maintenance problems. Drawing on more than 10 years of in-depth experience in IT operation and maintenance for public security agencies, our company has developed its own AI-empowered data middle-end product, Blue Elephant.


Building on three-dimensional real-time monitoring, full-coverage resource management, and multi-theme data warehouses, Hailian applies data, algorithms, and models to the tasks and processes of automated operation and maintenance. Treating big data as the frontier of innovation, competition, and productivity, Hailian launched its data middle-end product, Blue Elephant.


The public component layer of the data center uses mainstream tools such as Flume for system integration and data collection, and data preprocessing tools such as Kylin to build the customer's (Party A's) data warehouse. It provides offline, real-time, and graph computing capabilities, supports data modeling and knowledge graph generation, and offers a series of algorithm libraries. At the service support layer, data sets are built for operation and maintenance, and the learning library for anomaly detection, trend prediction, and root cause analysis is continuously expanded to better support intelligent data services.


 


Application performance monitoring provides end-to-end performance monitoring and analysis of business systems by actively monitoring the real experience of end users and the operating status of application systems. It helps enterprises discover and handle anomalies in every part of the business system early, so that application systems reach their expected service levels. Application performance monitoring covers each core link of the business application system through four modules: mobile, browser, application, and host. These modules provide mobile app user experience management, web page user experience management, back-end application performance monitoring and analysis, and host performance monitoring, respectively.


Data is collected through an open public interface adaptation platform that provides unified authentication, data access, and API encapsulation. Within this framework, any third-party monitoring data, including third-party application and network monitoring data, can be connected flexibly. Because the data from each third-party platform is heterogeneous, a flexible data mapping tool is provided so that the data can be stored in the data center's CMDB, indicator library, and alarm library in a unified standard format.
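
The mapping step can be illustrated with a minimal sketch. The field names, the `FIELD_MAP` table, and the `map_record` helper are all hypothetical, chosen only for illustration; this listing does not describe the actual mapping tool or the unified schema.

```python
# Hypothetical mapping from one third-party alarm format to a unified schema.
# None of these field names come from the product itself.
FIELD_MAP = {
    "hostName": "resource_name",
    "sev": "severity",
    "msg": "message",
    "ts": "timestamp",
}


def map_record(record, field_map, defaults=None):
    """Rename source fields to unified names; unmapped source fields are dropped."""
    unified = dict(defaults or {})
    for source_key, unified_key in field_map.items():
        if source_key in record:
            unified[unified_key] = record[source_key]
    return unified


raw_alarm = {
    "hostName": "db-01",
    "sev": "critical",
    "msg": "disk full",
    "ts": 1651708800,
    "vendorId": 42,  # not in FIELD_MAP, so it is not carried over
}
alarm = map_record(raw_alarm, FIELD_MAP, defaults={"source": "third_party_monitor"})
```

Keeping one mapping table per third-party source is one simple way to absorb heterogeneous formats without changing the storage schema.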


 


We provide a flexible data access framework that can connect to any third-party system and pull data from third-party monitoring systems to build the customer's (Party A's) data warehouse. The platform provides offline, real-time, and graph computing capabilities and supports data modeling and knowledge graph generation. On top of this, an AI empowerment engine is dedicated to building models for operation and maintenance anomaly detection, root cause analysis, capacity planning, service portraits, and similar tasks, in order to provide intelligent data services.


 


The data warehouse adopts the snowflake model. Data models in the DWD layer are built from the data in the ODS layer, for example a fact table of indicator data together with its dimension tables. Theme libraries are then built in the DWS layer for each theme to support large-screen displays of those themes.


 


The knowledge graph is built in two steps. First, an ontology model of operation and maintenance objects is abstracted from the CMDB. Second, a resource graph is built from the ontology model to reflect the relationships between resources, and time series data such as indicators is introduced to build a time series knowledge graph that forms a dynamic resource portrait.


 


For trend prediction, we take the historical indicator data in the data warehouse, reduce its dimensionality with principal component analysis and similar methods, extract features, and classify and label the data with automatic classifiers to build a training data set. A model is then trained to obtain an indicator trend prediction model, and the accuracy of its output is improved through weight optimization. For example, we found that different types of container workloads, such as computing containers and access containers, have different load patterns. We use automatic classifiers to distinguish the container types, predict the load for each, and handle containers with unclear characteristics through weighted adjustment. The first benefit of forecasting is insight into changing trends, which makes operations forward-looking. It also assists in fault discovery and early warning.


 


Container load prediction: In an online business system deployed with a microservice architecture, related services are deployed in different containers. A business flow causes the load indicators of the upstream and downstream containers in its call chain to change in step. We therefore use the correlation of load changes between containers for more accurate load prediction: the load data is modeled as a graph in which containers are vertices and the edges between vertices represent the load correlation between containers. Because business containers are deployed, scaled out, and scaled in dynamically, this correlation is dynamic and the topology between nodes cannot be obtained in advance. We therefore built a correlation learning model that generates a multivariate time series graph G, and then built a load prediction model based on spectral graph convolution. The model first transforms G into a spectral-domain representation to learn patterns between series, then transforms that representation into the frequency domain to learn patterns within each series, and finally predicts the load at the next moment through a fully connected (FC) layer.
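
The idea of treating containers as vertices and load correlation as edges can be sketched as follows. This is a simplified illustration only: it uses a plain Pearson correlation with a fixed threshold as a stand-in for the learned correlation model described above, and it does not implement the spectral graph convolution. The container names, the sample loads, and the `build_load_graph` helper are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length load series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_a * var_b)


def build_load_graph(loads, threshold=0.8):
    """Return {(u, v): weight} for container pairs whose loads move together."""
    edges = {}
    for u, v in combinations(sorted(loads), 2):
        weight = pearson(loads[u], loads[v])
        if abs(weight) >= threshold:
            edges[(u, v)] = weight
    return edges


# Hypothetical CPU load samples for three containers.
loads = {
    "gateway": [1, 2, 3, 4, 5],
    "orders": [2, 4, 6, 8, 10],   # downstream of gateway, moves in step
    "batch": [3, 1, 4, 1, 5],     # unrelated workload
}
graph = build_load_graph(loads)
```

Recomputing the edges over a sliding window would be one way to reflect the dynamic nature of the correlation as containers scale out and in.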


 


Dynamic baseline: Time series data enters the real-time data warehouse, which has a management model for data sources and indicators. According to the constructed indicator model, historical and real-time time series data are preprocessed and divided into stationary, periodic, and fluctuating types. An anomaly detection method suited to each type is then selected, abnormal points are detected, and the results are written to the anomaly library for feedback. For fluctuating data, anomalies are difficult to detect with year-on-year comparison, period-over-period comparison, or fixed thresholds, so deep learning models are used instead.
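
A minimal sketch of the classification step follows. The criteria used here (coefficient of variation for stationary series, lag autocorrelation for periodic series), the thresholds, and the `DETECTOR_BY_TYPE` table are assumptions made for illustration; the listing does not state how the product distinguishes the three types.

```python
from statistics import mean, pstdev


def autocorr(series, lag):
    """Autocorrelation of a series at the given lag."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[i] - mu) * (series[i - lag] - mu)
              for i in range(lag, len(series)))
    return num / denom


def classify_series(series, period, cv_threshold=0.05, acf_threshold=0.6):
    """Label a series as 'stationary', 'periodic', or 'fluctuating'."""
    mu, sigma = mean(series), pstdev(series)
    # Nearly flat relative to its level: treat as stationary.
    if sigma == 0 or (mu != 0 and sigma / abs(mu) < cv_threshold):
        return "stationary"
    # Strong self-similarity at the candidate period: treat as periodic.
    if autocorr(series, period) >= acf_threshold:
        return "periodic"
    return "fluctuating"


# Assumed pairing of series type and detection method.
DETECTOR_BY_TYPE = {
    "stationary": "fixed_threshold",
    "periodic": "period_over_period",
    "fluctuating": "deep_learning_model",
}

flat = [100, 101, 99, 100, 101, 99] * 4
cyclic = [0, 10, 20, 10] * 6
irregular = [5, 40, 12, 3, 60, 8, 25, 70, 2, 33, 9, 55]
labels = [classify_series(s, period=4) for s in (flat, cyclic, irregular)]
```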


 


For device anomaly detection, we implemented a method based on a single indicator. First, features are extracted with an interval fluctuation rate-of-change method. Spatial and temporal features are then fused through a convolutional neural network and a long short-term memory (LSTM) network to enrich the feature representation of the data. Finally, a classification and judgment module performs the anomaly detection: it uses fully connected layers to extract nonlinear relationships and applies the k-σ rule to determine the status of the device.
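
The k-σ rule at the end of this pipeline can be sketched on its own. The sketch assumes the rule is applied to a list of numeric scores and flags values more than k standard deviations from the mean; the CNN and LSTM stages are not reproduced, and the `k_sigma_anomalies` name and sample data are hypothetical.

```python
from statistics import mean, pstdev


def k_sigma_anomalies(values, k=3.0):
    """Return indices of values more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]


# Twenty normal readings followed by one spike.
scores = [10.0] * 20 + [100.0]
anomalous = k_sigma_anomalies(scores, k=3.0)
```

Note that a large outlier inflates the mean and standard deviation it is judged against, so in practice the baseline statistics are often computed from a window of data known to be normal.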


 


For network traffic anomaly detection, we first build a network graph that integrates links at all levels and time series data. We then designed a deep adversarial network alignment framework, which uses deep adversarial learning to discover complex network mappings that align the embedding distributions of two networks. The learned mapping function (the generator) is then used to perform efficient nearest-neighbor node alignment, which in turn allows network traffic anomalies to be detected efficiently.


 


For service anomaly detection, we predict first and then detect. Features from the service call graph, which captures time series topological relationships, and the log text graph, which captures the semantic relationships in logs, are fused to obtain a service prediction model that predicts the service calls most likely to occur in the next time slice. In parallel, we train an unsupervised anomaly detection model on historical indicator data to discover anomalies in service indicators. During real-time detection, the service prediction model predicts the next service likely to be called, and detection is targeted at the predicted services. This avoids most of the unnecessary log inspection, detects anomalies more effectively, and generates alarms.

Sales target

All users

Product pricing

Step pricing

Product price = quantity in interval 1 * unit price of interval 1 + quantity in interval 2 * unit price of interval 2.

For example: the call rate is 0.3 yuan/minute for the part that does not exceed 3 minutes, and 0.2 yuan/minute for the part that exceeds 3 minutes; if the user calls for 8 minutes, the fee charged is 3*0.3+5*0.2=1.9 yuan.
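
The step pricing arithmetic above can be checked with a short sketch. The `step_price` function and the `TIERS` table are illustrative names, not part of the marketplace; `Decimal` is used so that the currency amounts are exact.

```python
from decimal import Decimal

# (upper bound of the interval, unit price); None means "no upper bound".
TIERS = [(3, "0.3"), (None, "0.2")]


def step_price(quantity, tiers):
    """Charge each interval's share of the quantity at that interval's price."""
    total = Decimal("0")
    lower = 0
    for upper, unit_price in tiers:
        if quantity <= lower:
            break  # nothing left to charge in higher intervals
        top = quantity if upper is None else min(quantity, upper)
        total += (top - lower) * Decimal(unit_price)
        lower = top
    return total


fee = step_price(8, TIERS)  # 3 * 0.3 + 5 * 0.2
```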

Tier pricing

Product price = order quantity * unit price of the range to which the quantity belongs.

For example: the call rate is 0.3 yuan/minute for no more than 3 minutes, and 0.2 yuan/minute if it exceeds 3 minutes; if the user calls for 2 minutes, the fee charged is 2*0.3=0.6 yuan; if the user calls for 8 minutes, the fee charged is 8*0.2=1.6 yuan.
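
For comparison, tier pricing applies a single unit price to the whole quantity. The sketch below reproduces the two figures in the example; `tier_price` and `TIERS` are illustrative names, not part of the marketplace.

```python
from decimal import Decimal

# (upper bound of the range, unit price); None means "no upper bound".
TIERS = [(3, "0.3"), (None, "0.2")]


def tier_price(quantity, tiers):
    """Charge the whole quantity at the unit price of the range it falls in."""
    for upper, unit_price in tiers:
        if upper is None or quantity <= upper:
            return quantity * Decimal(unit_price)
    raise ValueError("quantity is outside every pricing range")


short_call = tier_price(2, TIERS)  # 2 * 0.3
long_call = tier_price(8, TIERS)   # 8 * 0.2
```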

Linear pricing

Product price = order quantity * unit price.

Simple pricing

The product price is the price displayed on the page.

Service support
User Guide
Operation and maintenance data platform V1.0 operation manual.docx
Platform support scope
After-sales service time: 5 * 8 hours
After-sales service content: cooperation consultation, product use and implementation services
Service hotline: (010) 5638 6788
Service email: market@mt-hirisun.com
Contact address: Room 701, 7th Floor, Building 5, Fengchuang Science and Technology Park, No. 18 Kechuang 13th Street, Beijing Economic and Technological Development Zone, Beijing
Open source statement

Click to view the "Open Source Software Statement"

Goods and Services Agreement

By ordering this product you agree to the Goods and Services Agreement