Workload Characterization of Autonomic DBMSs Using Statistical and Data Mining Techniques

Date

2008-09

Publisher

Addis Ababa University

Abstract

Autonomic configuration is one of the most important components of an autonomic system, and Database Management Systems (DBMSs) are an area where it is especially needed. For a DBMS to reconfigure itself as its external workload changes, it must be able to detect the workload and classify it into its dominant category, mainly DSS (Decision Support Systems) or OLTP (Online Transaction Processing). Previous research in this area proposed a methodology for workload classification, but its tests used a limited set of algorithms and only one commercial DBMS. This thesis develops a model with which an autonomic DBMS can identify and characterize the type of workload acting upon it, and it identifies the database status variables that are most affected by changing workloads. This matters for a self-configuring autonomic DBMS, because it must reconfigure itself according to the workload changes it detects. Two algorithms were selected for database workload classification: hierarchical clustering and classification and regression trees (CART). Workloads were generated by running queries and transactions from the TPC benchmarks, and their costs were measured in terms of the status variables of the selected DBMS (MySQL). The selected algorithms then use these costs to determine whether a workload is DSS or OLTP. After extensive experiments and analysis, we found that not all DBMS status variables are equally important for classifying the collected workloads. In fact, some of the variables have no significant relevance beyond increasing the classification complexity. We have identified these variables and list them in this thesis.
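
The abstract does not give the exact variable set or tree settings, so the following is only a minimal Python sketch under stated assumptions. It treats the cost of one workload run as the change in MySQL counters between two SHOW GLOBAL STATUS snapshots taken before and after the run, and it shows a single CART step: choosing the status variable and threshold that minimize the weighted Gini impurity of the two child nodes. The helper names (status_delta, gini, best_split), the chosen variables (Com_insert, Sort_rows) and all numbers are hypothetical illustrations, not the thesis's data or results.

```python
# Hypothetical sketch: variable names, thresholds and numbers are illustrative,
# not taken from the thesis.
from typing import Dict, List, Tuple


def status_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Cost of one workload run: the change in each counter between two
    SHOW GLOBAL STATUS snapshots taken before and after the run."""
    return {name: after[name] - before.get(name, 0) for name in after}


def gini(labels: List[str]) -> float:
    """Gini impurity of a set of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[str, int] = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows: List[Dict[str, float]], labels: List[str]) -> Tuple[str, float, float]:
    """One CART step: return (variable, threshold, weighted Gini impurity)
    for the split that makes the two child nodes as pure as possible."""
    best = ("", 0.0, float("inf"))
    n = len(rows)
    for var in rows[0]:
        values = sorted({r[var] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[var] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[var] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (var, t, score)
    return best


# Toy per-run costs (counter deltas) with assumed labels: the DSS-like runs
# sort many rows and insert little; the OLTP-like runs do the opposite.
runs = [
    {"Com_insert": 2, "Sort_rows": 85000},
    {"Com_insert": 0, "Sort_rows": 120000},
    {"Com_insert": 950, "Sort_rows": 300},
    {"Com_insert": 1200, "Sort_rows": 150},
]
labels = ["DSS", "DSS", "OLTP", "OLTP"]
print(best_split(runs, labels))  # ('Com_insert', 476.0, 0.0)
```

A full CART would apply best_split recursively to each child until nodes are pure or too small, then prune. A status variable that never yields a useful split adds only complexity, which mirrors the abstract's observation that some variables contribute little to classification. Note that differencing is only meaningful for cumulative counters, not for gauge-style status variables.
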
Although both selected algorithms classify the collected workloads well, hierarchical clustering has the additional advantage of showing the degree of correlation among clusters, which can be useful for detecting database workload shifts.
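
The abstract credits hierarchical clustering with showing the degree of correlation among clusters. A minimal Python sketch of why this is possible: agglomerative clustering records the distance at which each pair of clusters is merged (the heights of a dendrogram), so runs that merge early at a small distance are closely related, while a late, large-distance merge separates workload groups. The sketch swaps in average linkage with Euclidean distance as assumptions, since the abstract does not say which linkage or distance the thesis used. The function name average_linkage and the two-feature toy vectors are hypothetical.

```python
# Hypothetical sketch: naive agglomerative clustering with average linkage and
# Euclidean distance; the thesis's linkage and distance choices are not stated.
import math
from typing import List, Tuple

Vector = List[float]
Merge = Tuple[Tuple[int, ...], Tuple[int, ...], float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points: List[Vector]) -> List[Merge]:
    """Repeatedly merge the two closest clusters and return the merge history.
    Each entry is (cluster_a, cluster_b, distance), where distance is the mean
    pairwise distance between members, i.e. the dendrogram height of the merge."""
    clusters: List[Tuple[int, ...]] = [(i,) for i in range(len(points))]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
                d = sum(euclidean(points[p], points[q]) for p, q in pairs) / len(pairs)
                if best is None or d < best[2]:
                    best = (i, j, d)
        i, j, d = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        # Drop the two merged clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy runs, each described by two features assumed to be pre-scaled to [0, 1]
# (e.g. share of write statements, share of sorted rows).
points = [[0.02, 0.90], [0.01, 0.95], [0.80, 0.05], [0.85, 0.02]]
for a, b, dist in average_linkage(points):
    print(a, b, round(dist, 3))
```

The naive pairwise search is slow for many runs, which is acceptable for a sketch; SciPy's scipy.cluster.hierarchy.linkage (for example with method="average") computes the same kind of merge history efficiently.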

Keywords

Workload Characterization