Workload Characterization of Autonomic DBMSs using Statistical and Data mining techniques
Date
2008-09
Publisher
Addis Ababa University
Abstract
Autonomic configuration is one of the most important components of an autonomic system.
Database Management Systems (DBMSs) are one of the areas where autonomic configuration
is highly required. For a DBMS to reconfigure itself in response to changing external workloads, it
should be able to detect and classify the workloads into their dominant categories, mainly
DSS (Decision Support Systems) and OLTP (Online Transaction Processing). Previous
research in this area has proposed a methodology for classifying workloads, but
the tests were performed using a limited set of algorithms and on only one commercial DBMS.
In this thesis, a model is developed with which an autonomic DBMS can identify and characterize
the type of workload acting upon it, and the database status variables that
are most affected by changing workloads are identified. This is important for a
self-configuring autonomic DBMS because it needs to reconfigure itself based on the
workload changes it identifies. Two algorithms are selected for database workload classification:
hierarchical clustering and classification and regression trees (CART). Database workloads are
generated by running TPC benchmark queries and transactions, and the costs of
these workloads are measured in terms of the status variables of the selected DBMS (MySQL).
These costs are then used to determine whether a workload is DSS or OLTP using the selected
classification algorithms.
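The measurement and clustering steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the choice of status variables, the sample values, the min-max scaling, and the average-linkage rule are all assumptions made for illustration, and the helper names (workload_cost, normalise, agglomerate) are hypothetical. The variable names are real MySQL status counters, but the abstract does not say which ones were used.

```python
from itertools import combinations
from math import sqrt

def workload_cost(before, after):
    """Cost of a workload: change in each status counter while it ran."""
    return {name: after[name] - before[name] for name in before}

def normalise(rows):
    """Min-max scale each column to [0, 1] so large counters do not dominate."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def distance(a, b):
    """Euclidean distance between two cost vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points, k=2):
    """Average-linkage agglomerative clustering down to k clusters.

    Returns the clusters (lists of sample indices) and the merge heights.
    """
    clusters = [[i] for i in range(len(points))]
    heights = []

    def linkage(c1, c2):
        total = sum(distance(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the two clusters that are closest on average.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        heights.append(linkage(clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters, heights

# Assumed status variables; the thesis may have used a different set.
VARIABLES = ["Com_select", "Com_insert", "Com_update",
             "Handler_read_rnd_next", "Sort_rows"]

# Made-up counter deltas: samples 0-2 mimic OLTP runs, 3-5 mimic DSS runs.
SAMPLES = [
    [120, 300, 280, 900, 10],
    [150, 340, 310, 1100, 0],
    [110, 290, 260, 800, 20],
    [12, 0, 1, 950000, 40000],
    [9, 1, 0, 1200000, 52000],
    [15, 0, 0, 870000, 35000],
]

clusters, heights = agglomerate(normalise(SAMPLES), k=2)
example_cost = workload_cost({"Com_select": 100}, {"Com_select": 220})
```

In practice the before and after snapshots would come from the server's status counters (for example via SHOW GLOBAL STATUS), and the merge heights give a rough indication of how far apart the groups are.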
After a set of extensive experiments and analyses, we found that not all DBMS status
variables are equally important in classifying the collected workloads. In fact, some of the
variables have no significant relevance and only increase the classification
complexity. We have identified these variables and listed them in this thesis. Even though both
of the selected classification algorithms are good at classifying the collected workloads,
the hierarchical clustering algorithm has the additional advantage of showing the degree of
correlation among clusters. This can be important in the area of database workload shift
detection.
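One way to see why some status variables add little to the classification is to score each variable by the best single split it offers, using the Gini impurity criterion that classification and regression trees rely on. The sketch below is an illustration under assumptions, not the procedure used in the thesis: the variables, the sample values, and the helper names (gini, best_split, rank_variables) are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Best binary split on one variable: (threshold, weighted Gini impurity)."""
    distinct = sorted(set(values))
    best = (None, gini(labels))  # no split at all is the baseline
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

def rank_variables(names, rows, labels):
    """Rank variables by the impurity reduction of their best single split."""
    parent = gini(labels)
    gains = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        _, impurity = best_split(column, labels)
        gains.append((name, parent - impurity))
    return sorted(gains, key=lambda item: -item[1])

# Assumed variables and made-up counter deltas, for illustration only.
NAMES = ["Com_select", "Sort_rows", "Threads_connected"]
ROWS = [
    [120, 10, 8], [150, 0, 12], [110, 20, 9],       # OLTP-like runs
    [12, 40000, 9], [9, 52000, 8], [15, 35000, 12],  # DSS-like runs
]
LABELS = ["OLTP", "OLTP", "OLTP", "DSS", "DSS", "DSS"]

ranking = rank_variables(NAMES, ROWS, LABELS)
```

In this made-up data, Com_select and Sort_rows each separate the two classes completely, while Threads_connected yields no impurity reduction and would only add complexity to a classifier.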
Keywords
Workload Characterization