• Rough set (RS) theory, introduced by Zdzisław Pawlak in his seminal 1982 paper (Pawlak 1982), is extensively used for pattern recognition.
• RS has emerged as a powerful mathematical tool for handling uncertainty, that is, indiscernibility between objects in a set.
• The utility of RS is also well recognized in various knowledge discovery processes.
• In computer science, a rough set is a formal approximation of a crisp set (i.e., a conventional set) in terms of a pair of sets that give the lower and the upper approximation of the original set.
• In the standard version of rough set theory (Pawlak 1991), the lower- and upper-approximation sets are crisp sets, but in other variations the approximating sets may be fuzzy sets.
• It is a formal theory derived from fundamental research on the logical properties of information systems.
• Rough set theory has been used as a methodology for database mining and knowledge discovery in relational databases.
• In its abstract form, it is a new area of uncertainty mathematics closely related to fuzzy theory.
• The rough set approach can be used to discover structural relationships within imprecise and noisy data. Rough sets and fuzzy sets are complementary generalizations of classical sets: the approximation spaces of rough set theory are sets with multiple memberships, while fuzzy sets are concerned with partial memberships.
• The rapid development of these two approaches provided a basis for "soft computing," a term initiated by Lotfi A. Zadeh. Along with rough sets, soft computing includes at least fuzzy logic, neural networks, probabilistic reasoning, belief networks, machine learning, evolutionary computing, and chaos theory.
• Goals of Rough Set Theory –
• The main goal of rough set analysis is the induction of (learning of) approximations of concepts. Rough sets constitute a sound basis for KDD, offering mathematical tools to discover patterns hidden in data.
• It can be used for feature selection, feature extraction, data reduction, decision rule generation, and pattern extraction (templates, association rules), etc.
• It identifies partial or total dependencies in data, eliminates redundant data, and provides approaches to handling null values, missing data, dynamic data, and more.
A survey on rough set theory and its applications
• Proposed by Professor Pawlak in 1982, rough set theory is an important mathematical tool for dealing with imprecise, inconsistent, and incomplete information and knowledge.
• Originating from a simple information model, the basic idea of rough set theory can be divided into two parts.
• The first part is to form concepts and rules through the classification of a relational database.
• The second part is to discover knowledge through classification by equivalence relations and through the approximation of the target concept.
• As a theory of data analysis and processing, rough set theory is a mathematical tool for dealing with uncertain information, following probability theory, fuzzy set theory, and evidence theory.
• Owing to its novel ideas, unique method, and ease of operation, rough set theory has become an important tool in the field of intelligent information processing.
• It has been widely used in machine learning, knowledge discovery, data mining, decision support and analysis, etc.
• In 1992, the first International Workshop on rough set theory was held in Poland, and rough set theory was recognized as a new research topic in computer science by the ACM in 1995.
• 2. Basic concepts of rough sets
• One of the main research problems of rough sets is the approximation of sets; the other is the design of algorithms for analyzing and reasoning about the related data. Some basic concepts of rough set theory are reviewed as follows.
• Consider a simple knowledge representation scheme in which a finite set of objects is described by a finite set of attributes. Formally, it can be defined by an information system S expressed as the 4-tuple
• S = 〈U, R, V, f〉, R = C ∪ D,
• where U is a finite nonempty set of objects, R is a finite nonempty set of attributes, and the subsets C and D are called the condition attribute set and the decision attribute set, respectively.
• V = ∪_{a∈R} V_a, where V_a is the set of values of attribute a with card(V_a) > 1, and f: U × R → V is an information (or description) function that assigns to each object the value of each of its attributes.
• In Table 1, U = {x1, x2, …, x6} is a finite nonempty set, also called a universe, and R = {Headache, Myalgia, Temperature, Flu} is a finite nonempty set, also called an attribute set.
• Definition 1
• (Indiscernibility relation) [5] Given a subset of the attribute set B ⊆ R, an indiscernibility relation ind(B) on the universe U can be defined as follows:
• ind(B) = {(x, y) | (x, y) ∈ U², ∀b ∈ B (b(x) = b(y))}
• Table 1. An information table [5].

  Individual   Headache   Myalgia   Temperature   Flu
  x1           Yes        Yes       Normal        No
  x2           Yes        Yes       High          Yes
  x3           Yes        Yes       Very high     Yes
  x4           No         Yes       Normal        No
  x5           No         No        High          No
  x6           No         Yes       Very high     Yes

• The indiscernibility relation is an equivalence relation. The equivalence class of an object x is denoted by [x]_ind(B), or simply [x]_B or [x] if no confusion arises. The pair (U, ind(B)) is called an approximation space.
• Definition 2
• (Upper and lower approximation sets) [5] Given an information system S = 〈U, R, V, f〉, for a subset X ⊆ U and B ⊆ R, its lower and upper approximation sets are defined, respectively, by
• B_*(X) = {x ∈ U | [x]_B ⊆ X} and B^*(X) = {x ∈ U | [x]_B ∩ X ≠ ∅}.
• The lower approximation B_*(X) collects the objects that certainly belong to X, the upper approximation B^*(X) collects the objects that possibly belong to X, and their difference is the boundary region of X.
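To make Definitions 1 and 2 concrete, here is a minimal Python sketch that computes the ind(B)-classes and the lower and upper approximations for Table 1. The helper names (equivalence_classes, lower_approximation, upper_approximation) are our own, not from [5]; we take B = {Temperature} and the target concept X = {x | Flu = Yes}.

```python
# A minimal sketch (helper names are our own, not from the cited source)
# computing ind(B)-classes and the approximations of Definition 2 for Table 1.

TABLE = {
    "x1": {"Headache": "Yes", "Myalgia": "Yes", "Temperature": "Normal",    "Flu": "No"},
    "x2": {"Headache": "Yes", "Myalgia": "Yes", "Temperature": "High",      "Flu": "Yes"},
    "x3": {"Headache": "Yes", "Myalgia": "Yes", "Temperature": "Very high", "Flu": "Yes"},
    "x4": {"Headache": "No",  "Myalgia": "Yes", "Temperature": "Normal",    "Flu": "No"},
    "x5": {"Headache": "No",  "Myalgia": "No",  "Temperature": "High",      "Flu": "No"},
    "x6": {"Headache": "No",  "Myalgia": "Yes", "Temperature": "Very high", "Flu": "Yes"},
}

def equivalence_classes(universe, B):
    """Partition U by ind(B): x ~ y iff b(x) == b(y) for every b in B."""
    classes = {}
    for x in universe:
        key = tuple(TABLE[x][b] for b in B)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def lower_approximation(X, classes):
    """B_*(X): union of the equivalence classes entirely contained in X."""
    return set().union(*(c for c in classes if c <= X))

def upper_approximation(X, classes):
    """B^*(X): union of the equivalence classes that intersect X."""
    return set().union(*(c for c in classes if c & X))

U = sorted(TABLE)
B = ["Temperature"]                              # condition subset B ⊆ R
X = {x for x in U if TABLE[x]["Flu"] == "Yes"}   # target concept X = {x2, x3, x6}
classes = equivalence_classes(U, B)

print(sorted(map(sorted, classes)))              # [['x1','x4'], ['x2','x5'], ['x3','x6']]
print(sorted(lower_approximation(X, classes)))   # ['x3', 'x6']            -- certainly Flu
print(sorted(upper_approximation(X, classes)))   # ['x2','x3','x5','x6']   -- possibly Flu
```

With B = {Temperature}, objects x2 and x5 are indiscernible yet differ on Flu, so they fall into the boundary region B^*(X) \ B_*(X): this is exactly the roughness of X with respect to B.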
Decision Tree Classification Algorithm
• Introduction
• Decision Trees are a type of Supervised Machine Learning (that is, you specify what the input is and what the corresponding output is in the training data) in which the data is continuously split according to a certain parameter.
• The tree can be explained by two entities, namely decision nodes and leaves. The leaves are the decisions or final outcomes, and the decision nodes are where the data is split.
• Decision Tree is a Supervised learning technique that can be used for both Classification and Regression problems, but it is mostly preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• In a Decision tree, there are two kinds of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make decisions and have multiple branches, whereas Leaf nodes are the outputs of those decisions and do not contain any further branches.
• The decisions or tests are performed on the basis of the features of the given dataset.
• It is a graphical representation for obtaining all the possible solutions to a problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts at the root node, which expands into further branches and constructs a tree-like structure.
• To build the tree, we use the CART algorithm, which stands for Classification And Regression Tree algorithm.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
• [Diagram omitted: the general structure of a decision tree.]
• Note: A decision tree can handle categorical data (YES/NO) as well as numeric data.
• Decision Tree Terminologies
• Root Node: The root node is where the decision tree starts. It represents the entire dataset, which is further divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further once a leaf node is reached.
• Splitting: Splitting is the process of dividing a decision node (or the root node) into sub-nodes according to the given conditions.
• Branch/Sub-Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/Child node: The root node of the tree is called the parent node, and the other nodes are called child nodes.
• Example: Suppose a candidate has a job offer and wants to decide whether to accept it or not.
• To solve this problem, the decision tree starts with the root node (the Salary attribute, selected by an Attribute Selection Measure, ASM).
• The root node splits further into the next decision node (distance from the office) and one leaf node, based on the corresponding labels.
• The next decision node further splits into one decision node (cab facility) and one leaf node.
• Finally, that decision node splits into two leaf nodes (Accept offer and Decline offer). [Diagram omitted: the job-offer decision tree; a hand-coded sketch of this tree follows the list of advantages and disadvantages below.]
• Advantages of the Decision Tree
• It is simple to understand, as it follows the same process a human follows when making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps in thinking through all the possible outcomes of a problem.
• It requires less data cleaning compared to other algorithms.
• Disadvantages of the Decision Tree
• A decision tree may contain many layers, which makes it complex.
• It may have an overfitting issue, which can be mitigated using the Random Forest algorithm.
• With more class labels, the computational complexity of the decision tree may increase.
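The job-offer tree from the example above can be written out directly as nested if/else tests, which makes the decision node / leaf node distinction explicit. This is a hand-coded illustration only: the thresholds and argument names are assumptions, since the text gives no concrete values.

```python
# A hand-coded version of the job-offer decision tree described above.
# Thresholds and parameter names are illustrative assumptions.

def evaluate_offer(salary, distance_km, cab_facility):
    # Root node: test the Salary attribute.
    if salary < 50_000:                  # assumed threshold
        return "Decline offer"           # leaf node
    # Decision node: distance from the office.
    if distance_km > 30:                 # assumed threshold
        # Decision node: cab facility.
        if cab_facility:
            return "Accept offer"        # leaf node
        return "Decline offer"           # leaf node
    return "Accept offer"                # leaf node

print(evaluate_offer(salary=60_000, distance_km=40, cab_facility=True))  # Accept offer
```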
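In practice the tree is not written by hand but learned from data with CART, as described above. The following minimal sketch uses scikit-learn's DecisionTreeClassifier, an assumed library choice (the text names only the CART algorithm), on a tiny dataset invented purely for illustration; max_depth acts as a simple pruning-like control against overfitting.

```python
# A minimal sketch of training a CART-style classifier with scikit-learn.
# The library choice and the toy dataset are assumptions for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [salary_in_thousands, distance_km, cab_facility (0/1)]
X = [
    [40, 10, 0],
    [60, 40, 1],
    [60, 40, 0],
    [70,  5, 0],
    [45, 25, 1],
    [80, 35, 1],
]
y = ["Decline", "Accept", "Decline", "Accept", "Decline", "Accept"]

# CART-style tree using Gini impurity as the attribute selection measure (ASM).
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# Inspect the learned decision nodes and leaves as text.
print(export_text(clf, feature_names=["salary_k", "distance_km", "cab"]))
print(clf.predict([[65, 20, 0]]))  # e.g. ['Accept'] on this toy data
```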