Customer Behavior Analysis Using Rough Set Approach

The customer relationship management (CRM) is a business methodology used to build long term profitable customers by analyzing customer needs and behaviors. The customer behavior is analyzed by choosing important attributes in the customer database. The customers are then segmented into groups according to their attribute values. The rules are generated using rule induction algorithms to describe the customers in each group. These rules can be used by the entrepreneur to predict the behavior of their new customers and to vary the attraction process for existing customers. In this paper a new rule algorithm has been proposed based on the concepts of rough set theory. Its performance has been compared with LEM2 (Learning from Examples Module, version 2) algorithm, an existing rough set based rule induction algorithm. Real data set of the customer transaction is used for analysis. Recency(R), Frequency (F), Monetary (M) and Payment (P) are the attributes chosen for analyzing customer data. The proposed algorithm on average achieves 0.439% increase in sensitivity, 0.007% increase in specificity, 0.151% increase in accuracy, 0.014% increase in positive predictive value, 0.218% increase in negative predictive value and 0.228% increase in F-measure when compared to LEM2 algorithm.


Introduction
Customer relationship management (CRM) technology is a mediator between customer management activities in all stages of a relationship (initiation, maintenance and termination) and business performance [41]. It helps industries gain insight into the behavior of customers and their value, so that the enterprise can increase its profit by acting according to customer characteristics. CRM is classified into operational and analytical. Operational CRM refers to the automation of business processes, whereas analytical CRM refers to the analysis of customer characteristics and behaviors. Analytical CRM helps the entrepreneur to discriminate between customers and decide marketing activities accordingly [30]. It consists of four phases, namely customer identification, customer attraction, customer retention and customer development. Customer identification is the process in which the customers are grouped and their characteristics are analyzed. Customer attraction is the process of motivating customers to buy again through customer service, coupon distribution, direct mailing and discounts. Customer retention is the process in which the customers' needs are satisfied by introducing new products and resolving their complaints. Customer development involves the expansion of transaction intensity, transaction value and individual customer profitability. Customer identification is the most important phase in analytical CRM because once a customer is identified correctly, that customer can be retained and developed further. The customer identification phase consists of customer segmentation and target customer analysis. Customer segmentation divides the customers into a predefined number of customer groups. Target customer analysis examines customer behavior or characteristics in each customer group. It helps the entrepreneur to vary the attraction process for existing customers and to predict the behavior of new customers [30]. Data mining techniques are good at extracting and identifying useful information and knowledge from enormous customer databases, and at supporting different CRM decisions. The application of data mining techniques in CRM is an emerging trend in the global economy [2].
Data mining is a collection of techniques for efficient automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases. These patterns are used in an enterprise's decision making process [19]. The tasks that can be performed in data mining are clustering, association rule mining, rule induction and classification [21]. Clustering is an unsupervised classification used to group data with similar characteristics. It produces clusters for the given input data such that data within one cluster are more similar to each other than to data in other clusters. Association rule mining produces dependency rules which predict the occurrence of an attribute based on the occurrence of other attributes in the database. Rule induction belongs to supervised learning, where data are already clustered into groups, and it generates rules by finding regularities in the data in each cluster. Rules are in the form of an If-Then condition. The If part is called the antecedent and the Then part is called the consequent. The antecedent contains conditional variables and the consequent contains a single decision variable. The conditional variables are the attributes in the given data and the decision variable is the cluster number assigned to the data by the clustering algorithm. The rules generated for a cluster constitute the rule set for that cluster. Each data item in the cluster should be described by at least one rule in the rule set of that cluster. This property of a rule induction algorithm is called completeness. Each rule in the rule set of a cluster should be satisfied only by data in that cluster, that is, no rule should be satisfied by any data in other clusters. This property of a rule induction algorithm is called consistency. The sets of data satisfied by the rules are not mutually exclusive, because a data item can be described by any number of rules. Classification, on the other hand, also generates rules for describing the data in each cluster. The main difference between classification and rule induction is that classification rules are mutually exclusive, which means each data item in the database is described by exactly one rule [2]. Rule induction is preferred over classification rules for describing the characteristics of the data because, in a real data set, each data item has to be described by all of its possible combinations of attribute values, so one rule per data item is not sufficient. Clustering and rule induction are the data mining techniques used for customer segmentation and target customer analysis, respectively, in the customer identification phase of CRM.
In this paper, an improved rule induction algorithm based on rough set theory has been developed to generate rules for clustered customer data. The proposed algorithm has been compared with LEM2, a rough set based approach. The rest of the paper is organized as follows. Section 2 gives an overview of customer relationship management, clustering algorithms, rule induction algorithms and the LEM2 algorithm. Section 3 proposes an improved rule induction algorithm based on the rough set approach. Section 4 compares the prediction results obtained using the rule induction algorithms. Finally, Section 5 concludes which rule induction algorithm is best according to the criteria chosen for comparison.

Related Works
Customer Relationship Management comprises a set of processes and enabling systems supporting a business strategy to build long term, profitable relationships with specific customers [27]. It is a philosophy of business operation for acquiring and retaining customers, increasing customer value, loyalty and retention, and implementing customer-centric strategies [30]. It is important in every business because business is customer centric. It consists of identifying, attracting, retaining and developing customers. Customer identification requires a comprehensive understanding of enterprise customers [12]. It includes target customer analysis and customer segmentation. Clustering algorithms are used for customer segmentation and rule induction algorithms are used for target customer analysis.

Customer Relationship Management
Customer segmentation gives a quantifiable way to analyze the customer data and distinguish the customers based on their purchase behavior [40].It is the process of dividing customers into homogeneous groups on the basis of common attributes [37].It is typically done by applying some form of cluster analysis to obtain a set of segments [5].In this way the customers can be grouped into different categories for which the marketing people can employ targeted marketing and thus retain the customers.Target customer analysis is used to analyze the customers in each cluster or segment so as to predict the new customer to the appropriate cluster.The customers are segmented and then rules are generated to describe them.These rules can be used to classify the new customers to the appropriate cluster who have similar purchase characteristics.The customer identification is followed by customer attraction which motivates each segment of customers in different way.Customer retention and customer development deals with retaining the existing customers and maximizing the customer purchase value respectively [30].
The attributes which describe the purchasing behavior of the customers are first chosen before customer segmentation because it requires a comprehensive understanding of enterprise customers [12].RFM model is used to identify and represent the customer characteristics by three attributes namely Recency (R), Frequency (F) and Monetary (M).R indicates the interval between the time that the latest consuming behavior happens and present.F indicates the number of transactions that the customer has done in a particular interval of time.M indicates the total value of the customer's transaction amount in a particular interval of time [40].
In [7] RFM method, K-means clustering algorithm and LEM2 are used to obtain the classification rules.According to [23], customers with the same pattern of purchasing are only clustered and RFM is used to calculate the value of each cluster.Tsai and Chiu (2004) in [36] proposed a market segmentation methodology based on product specific variables such as items purchased and the associative monetary transactional history of customers and they used RFM to analyze the relative profitability of each customer's cluster.In [38] customer behavior is identified using RFM model and grey correlation model is used for customer targeting.Yeh et al. (2009) in [43] extended the traditional RFM model by including two parameters, time since the first purchase and churn probability.In [20] RFM analysis along with K-means clustering is used to study customer's fluctuations over different time frames.In [25]- [26] customer lifetime value (CLV) is calculated using RFM.In [9], [34] WRFM (Weighted RFM) is used instead of RFM.In this weights were assigned to R, F, and M depending on characteristics of the industry.Stone (1995), suggested for placing the highest weight on the Frequency, followed by the Recency, with the lowest weight on the Monetary measure [34].In Chuang and Shen (2008), Monetary had the most value and Recency had the least value [9].
The attributes chosen to describe the customer behavior and the weightage of the attributes will differ from domain to domain.Here, the RFMP model which has four attributes R, F, M and P with equal weights is used.RFMP model is the modified RFM model where the payment details of the customers are considered.P indicates the average time interval between payment and purchase date.Payment detail of the customer is an important attribute because any two customers with same R, F, M value but different P value cannot be treated equally by the company.The customers are segmented using their consuming behavior via RFMP attributes.This ensures that the standards which cluster customer value are not established subjectively, so that the clustering standards are established objectively based on RFMP attributes [7].
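As an illustration of the four RFMP attributes defined above, the following minimal Python sketch derives them from raw transaction records. The record layout `(customer, purchase_date, amount, payment_date)`, the function name `rfmp` and the sample data are assumptions made for this example; the source does not describe how its customer database is stored.

```python
from collections import defaultdict
from datetime import date


def rfmp(transactions, today):
    """Return {customer: (R, F, M, P)} for hypothetical transaction tuples
    of the form (customer, purchase_date, amount, payment_date)."""
    by_customer = defaultdict(list)
    for customer, purchased, amount, paid in transactions:
        by_customer[customer].append((purchased, amount, paid))

    result = {}
    for customer, rows in by_customer.items():
        # R: days between the latest purchase and the present
        recency = (today - max(p for p, _, _ in rows)).days
        # F: number of transactions in the interval
        frequency = len(rows)
        # M: total transaction amount in the interval
        monetary = sum(a for _, a, _ in rows)
        # P: average number of days between purchase and payment
        payment = sum((paid - p).days for p, _, paid in rows) / len(rows)
        result[customer] = (recency, frequency, monetary, payment)
    return result


# Made-up sample data, for illustration only
transactions = [
    ("c1", date(2011, 1, 10), 200.0, date(2011, 1, 20)),
    ("c1", date(2011, 3, 1), 100.0, date(2011, 3, 5)),
    ("c2", date(2011, 2, 15), 50.0, date(2011, 2, 15)),
]
profile = rfmp(transactions, today=date(2011, 3, 31))
```

Two customers with identical R, F and M values would still differ in the fourth component if one of them pays later on average, which is the distinction the P attribute is meant to capture.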
The clustering algorithms for customer segmentation and rule induction algorithms for target customer analysis are discussed in section 2.2 and 2.3 respectively.

Clustering Algorithms
The customers are segmented using clustering based on their important attributes such as R, F, M and P. Clustering is an unsupervised classification in which there are no predefined classes. Each data item in the data set is assigned to one of the output classes depending on its distance to other data. The data within each class form a cluster, and the number of clusters is equal to the number of output classes. The clustering technique produces clusters in which the data have high intra-cluster similarity and low inter-cluster similarity. The similarity is measured in terms of the distance between the data. For a numerical data set, the distance between two data items can be calculated using the Euclidean, Manhattan or Minkowski distance.
Euclidean distance is given by

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Manhattan distance is given by

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

Minkowski distance is given by

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

In the above equations, n indicates the number of attributes in the given data, x and y are data in the data set, x_i and y_i are their values for the i-th attribute, and d(x, y) is the distance between x and y. The Minkowski distance reduces to the Manhattan distance when p = 1 and to the Euclidean distance when p = 2. The Euclidean distance squares the per-attribute differences, so a large difference in one attribute contributes disproportionately, whereas the Manhattan distance sums the absolute differences, so every attribute contributes in proportion to its variation. In our real data set all the attributes R, F, M and P are equally weighted, so the variation in every attribute is to be treated equally. Thus the Manhattan distance is used instead of the Euclidean distance.
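The relationship between the three distances can be checked numerically. The sketch below is illustrative only; the function names and the two sample RFMP vectors are assumptions, not values from the paper's data set.

```python
import math


def manhattan(x, y):
    """Sum of absolute per-attribute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """Square root of the sum of squared per-attribute differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def minkowski(x, y, p):
    """Generalised distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Two made-up customers described by (R, F, M, P)
x = (1, 2, 3, 4)
y = (2, 4, 3, 0)

d_manhattan = manhattan(x, y)   # 1 + 2 + 0 + 4
d_euclidean = euclidean(x, y)   # sqrt(1 + 4 + 0 + 16)
```

In this example the difference of 4 in the last attribute accounts for 4 of the 7 units of Manhattan distance but 16 of the 21 units under the Euclidean square root, which illustrates why a single attribute can dominate the Euclidean measure.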
Clustering is mainly classified into hierarchical and partitioning algorithms. The hierarchical algorithms are further subdivided into agglomerative and divisive. Agglomerative clustering treats each data point as a singleton cluster and then successively merges clusters until all points have been merged into a single cluster. Divisive clustering places all data points in a single cluster and successively splits the clusters until one data point remains in each cluster. Partitioning algorithms partition the data set into a predefined number k of clusters [14]. The K-means algorithm is one of the most commonly used clustering algorithms [7]. It is a partitioning clustering algorithm which partitions the database D of n objects into a set of k clusters. The output differs when the initial centers of the clusters are varied. The distance between objects in the same cluster is smaller than the distance between objects in different clusters. Each object is placed in exactly one of the k non-overlapping clusters [1]. The steps in the K-means algorithm are as follows:
1. Initialize the centers of the k clusters randomly.
2. Calculate the distance between each object and the k cluster centers using the Manhattan distance formula given by Equation 1.
3. Assign each object to the nearest cluster center.
4. Calculate the center of each cluster as the mean value of the objects assigned to it.
5. Repeat steps 2 to 4 until the objects assigned to the clusters do not change.
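The K-means steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the initial centers are drawn from the data points themselves, an empty cluster keeps its previous center, and the `seed` and `max_iter` parameters are added only to make the example reproducible and bounded.

```python
import random


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster tuples in `points` into k groups using the Manhattan distance."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as initial centers (assumption)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: assign every object to its nearest center
        new_labels = [
            min(range(k), key=lambda c: manhattan(p, centers[c]))
            for p in points
        ]
        # Step 5: stop when the assignment no longer changes
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: recompute each center as the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its old center (assumption)
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers


# Made-up two-dimensional data with two obvious groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
```

Because the result depends on the initial centers, different seeds can number the clusters differently even when the grouping itself is the same.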

Rule Induction Algorithms
Rule induction algorithms are used to generate rules that describe the characteristics of the customers in each segment. Decision trees (DT), artificial neural networks (ANN), genetic algorithms (GA) and rough set theory (RST) are used to produce rules [39]. A DT is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a cluster number [21]. An ANN is a large number of highly interconnected processing elements (neurons) that uses a mathematical model, computational model or non-linear statistical data modeling tools for information processing to capture and represent complex input/output relationships [7]. GA, which were formally introduced in the United States in the 1970s by John Holland at the University of Michigan, are search algorithms applied to solve problems on a computer based on the mechanics of natural selection and the process of natural evolution [22], [29]. In DT, too many instances lead to large decision trees and decrease the classification accuracy rate. In ANN, the number of hidden neurons, the number of hidden layers and the training parameters need to be determined, and training times are long. Moreover, an ANN acts as a black box, which leads to inconsistency of the outputs, and building it is a trial-and-error process. GA also have drawbacks such as slow convergence, a brute-force computing method, a large computation time and less stability. The advantages of rough set theory are that it does not require any preliminary or additional parameters about the data, it is less expensive in time to generate rules, it can handle large amounts of data, and it yields understandable and stable decision rules [31]- [33]. Rough set theory can be used to make decisions in any underlying business [42]. In the experimental results of [7], the accuracy rate of LEM2 is higher than that of DT and ANN.
RST, introduced by Pawlak in 1982, is a knowledge discovery tool that can be used to help induce logical patterns hidden in massive data. Some of the applications of RST in the field of knowledge discovery are dimensionality reduction, clustering, rule induction and discretization. The LERS (Learning from Examples using Rough Sets) concept was developed for rule induction. The basic algorithms based on LERS are LEM1, LEM2 and AQ. The LEM1 algorithm computes a global covering of attributes for producing rules. The LEM2 algorithm, on the other hand, computes a local covering and then converts it into a rule set, so it gives better results than LEM1. The AQ algorithm, developed by R. S. Michalski, generates a cover for each concept by computing stars and selecting single complexes from them for the cover. In the worst case the time complexity of computing conjuncts of partial stars is O(nm), where n is the number of attributes and m is the number of data in the data set. So for a large data set, AQ is not efficient when compared to LEM2. LEM2 of LERS is the most frequently used since it gives better results [8], [10], [14], [16] and [18]. Extensions of LEM2 are MLEM2 and LEM3. MLEM2 extends the capability of LEM2 by inducing rules from data with both symbolic and numerical attributes, including data with missing attribute values. It produces rule sets with the smallest number of rules but needs an additional tool to simplify conditions using numerical attributes [17]. LEM3 is based on incremental learning of production rules from examples, so its memory space requirement is minimal, but it uses the same rule generating procedure as LEM2 [6]. The variable precision rough set model (VPRS), introduced in [44], is a generalization of the original rough set data analysis in the direction of relaxing the strict boundaries of equivalence classes. It assumes that rules are only valid within a certain part of the population, and it is able to cope with measurement errors. Approaches based on VPRS are discussed in [3], [28] and [35]. An extension of rough set theory based on the dominance principle is discussed in [13]. This method is mainly based on substituting the indiscernibility relation by a dominance relation in the rough approximation of decision classes. However, the decision rules induced from the lower approximations of the Dominance-based Rough Set Approach (DRSA) are sometimes weak in that only a few objects support them. For this reason, a variant of DRSA, called VC-DRSA, has been proposed in [4]. It allows some inconsistency in the lower approximations of sets through a parameter called the consistency level. It is more general than the classic functional or relational model and is more understandable for users because of its natural syntax and because it takes the inconsistency of real life into account. The problem domain considered in this paper has complete and consistent data, so the work concentrates on the algorithms based on LERS, and the LEM2 algorithm has been taken for comparison.

LEM2 Algorithm
It is a rule induction algorithm based on rough set theory.It is used to find regularities hidden in the data and express in terms of rules.The clustering algorithm output is given as an input so that rules are generated for each cluster.
Rules are in the form of

if (attribute-1, value-1) and (attribute-2, value-2) and ... and (attribute-n, value-n) then (decision, value)

In the database, each row is called a case and each column is called an attribute. Attributes are independent variables and the decision is a single dependent variable. Here, Recency, Frequency, Monetary and Payment are the attributes and the cluster number is the decision variable. The set of all cases labeled by the same decision value is called a concept. A case x is covered by a rule r if and only if every condition (attribute-value pair) of r is satisfied by the corresponding attribute value for x. A concept C is completely covered by a rule set R if and only if for every case x from C there exists a rule r from R such that r covers x. R contains a set of rules for each decision value. R is complete if and only if every concept from the data set is completely covered by R. A rule r is consistent if and only if for every case x covered by r, x is a member of the concept C indicated by r. R is consistent if and only if every rule from R is consistent with the data set. Rule induction produces a complete and consistent rule set [18].
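The definitions of coverage, consistency and completeness can be made concrete with a small Python sketch. The rule representation (a dictionary of conditions plus a decision value), the key name `cluster` and the discretised values `low` and `high` are assumptions chosen for illustration; the source does not specify how its attribute values are encoded.

```python
def covers(rule, case):
    """A rule covers a case when every attribute-value condition holds."""
    conditions, _decision = rule
    return all(case[a] == v for a, v in conditions.items())


def is_consistent(rule, cases):
    """Every case covered by the rule must belong to the rule's concept."""
    _conditions, decision = rule
    return all(c["cluster"] == decision for c in cases if covers(rule, c))


def is_complete(rules, cases):
    """Every case must be covered by at least one rule for its own concept."""
    return all(
        any(covers(r, c) and r[1] == c["cluster"] for r in rules)
        for c in cases
    )


# Hypothetical discretised cases
cases = [
    {"R": "low", "F": "high", "M": "high", "P": "low", "cluster": 1},
    {"R": "low", "F": "high", "M": "low", "P": "low", "cluster": 1},
    {"R": "high", "F": "low", "M": "low", "P": "high", "cluster": 2},
]
rules = [({"F": "high"}, 1), ({"R": "high"}, 2)]
bad_rule = ({"M": "low"}, 1)  # also covers the cluster-2 case
```

Here `rules` is both complete and consistent for the three cases, while `bad_rule` is inconsistent because it covers a case from a different concept.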
A block of an attribute-value pair t = (a, v), denoted [t], is the set of all examples that for attribute a have value v. A concept, described by the value w of decision d, is denoted [(d, w)], and it is the set of all examples that have value w for decision d. Let B be a concept and let T be a set of attribute-value pairs. Concept B depends on a set T if and only if

$$\emptyset \neq [T] = \bigcap_{t \in T} [t] \subseteq B$$

Set T is a minimal complex of concept B if and only if B depends on T and T is minimal, that is, B does not depend on any proper subset of T. Let τ be a nonempty collection of nonempty sets of attribute-value pairs. Set τ is a local covering of B if and only if the following three conditions are satisfied:
1. each member of τ is a minimal complex of B,
2. $\bigcup_{T \in \tau} [T] = B$, and
3. τ is minimal (τ has the smallest possible number of members).

For each concept B, the LEM2 algorithm induces production rules by computing a local covering τ. Any set T, a minimal complex which is a member of τ, is computed from attribute-value pairs selected from the set T(G) of attribute-value pairs relevant to the current goal G, i.e., pairs whose blocks have nonempty intersection with G. The initial goal G is equal to the concept, and it is then iteratively updated by subtracting from G the set of examples described by the set of minimal complexes computed so far. Attribute-value pairs from T(G) are selected as the most relevant on the basis of the maximum cardinality of [t] ∩ G and, if a tie occurs, on the basis of the smallest cardinality of [t]. The last condition is equivalent to the maximal conditional probability of goal G given attribute-value pair t. For a set X, |X| denotes the cardinality of X [15]. The procedure of LEM2 is as follows:

Proposed Algorithm
In the LEM2 algorithm, the rule set generated for each cluster is complete and consistent, but it does not contain all the consistent rules for a cluster, because once a consistent rule is discovered, the objects satisfying that rule are eliminated and rules are discovered only for the remaining objects. As a result, fewer rules are produced for a particular cluster, and consequently the chance of predicting a customer to the correct cluster is reduced. In order to overcome this disadvantage, the proposed rule induction algorithm produces all the consistent and complete rules for the objects in the cluster. The target cluster is the cluster for which rules are generated. The remaining clusters are the clusters other than the target cluster. A block of an attribute-value pair t = (a, v), denoted [t], is the set of all examples that for attribute a have value v. A block of n attribute-value pairs t_1 = (a_1, v_1), t_2 = (a_2, v_2), ..., t_n = (a_n, v_n), denoted [t_1, t_2, ..., t_n], is the set of all examples that for attribute a_1 have value v_1, for attribute a_2 have value v_2, and so on, and for attribute a_n have value v_n. A block of size 1 has one attribute-value pair. A block of size n has n attribute-value pairs. For a set X, |X| denotes the cardinality of X. The procedure for the improved rule induction algorithm is as follows:

begin
    U - set of all objects in the data set
    B - set of all objects in the target cluster
    C := U - B (set of all objects in U but not in B)
    G := B;

[...] clustering algorithm to segment the 3,278 customers into three groups or clusters. The number of clusters required is specified by the business people, according to the number of different schemes to be introduced as promotional activities. Here the company requires three clusters, so we segment the customers into three clusters. As a result, cluster1 contains 1,114 customers, cluster2 contains 1,064 customers and cluster3 contains 1,100 customers. LEM2 and the proposed rule induction algorithm are used to generate rules for the training data (two-thirds of each cluster). The test data (the remaining one-third of each cluster) are given as input to LEM2 and the proposed rule induction algorithm, which predict the cluster value according to the rules generated from the training data. The training and test data are mutually exclusive. In the training data, cluster1 contains 743 customers, cluster2 contains 709 customers and cluster3 contains 733 customers. In the test data, cluster1 contains 371 customers, cluster2 contains 355 customers and cluster3 contains 367 customers. The performance criteria for prediction using the rule induction algorithms are false positive (FP), false negative (FN), true positive (TP), true negative (TN), sensitivity, specificity, accuracy, precision or positive predictive value (PPV), negative predictive value (NPV) and F-measure.
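The training and test counts quoted above are consistent with taking two-thirds of each cluster, rounded to the nearest whole customer, as training data. The short Python sketch below reproduces the counts and shows one way to draw a mutually exclusive random split; the rounding rule, the helper name `split_cluster` and the use of a seeded shuffle are assumptions, since the source only reports the resulting sizes.

```python
import random

cluster_sizes = {"cluster1": 1114, "cluster2": 1064, "cluster3": 1100}

# (train, test) sizes per cluster, assuming two-thirds rounded to nearest
split_sizes = {}
for name, n in cluster_sizes.items():
    train = round(n * 2 / 3)
    split_sizes[name] = (train, n - train)


def split_cluster(members, rng):
    """Randomly split one cluster into disjoint training and test lists."""
    shuffled = list(members)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * 2 / 3)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)  # fixed seed only so the example is repeatable
train_ids, test_ids = split_cluster(range(cluster_sizes["cluster1"]), rng)
```

Splitting each cluster separately keeps the proportion of the three clusters the same in the training and test data.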
False Positive (FP) is the number of objects that do not belong to a cluster but are allocated to it. False Negative (FN) is the number of objects that belong to a cluster but are not allocated to it. True Positive (TP) is the number of objects that are correctly predicted to their actual cluster. True Negative (TN) is the number of objects that do not belong to a cluster and are not allocated to it [19]. Sensitivity is also called the true positive rate or recall. It relates to the test's ability to identify positive results and measures the proportion of actual positives which are correctly identified as such. Specificity relates to the ability of the test to identify negative results and measures the proportion of actual negatives which are correctly identified. Accuracy is defined as the proportion of the sum of TP and TN against all positive and negative results. Positive predictive value, or precision, is defined as the proportion of TP against all the positive results (both TP and FP). Negative predictive value is defined as the proportion of TN against all the negative results (both TN and FN) [11]. The F-measure can be used as a single measure of the performance of the test. It is the harmonic mean of precision and recall [24]. The formulas are given below:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (5)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (6)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7)$$

$$\text{PPV} = \frac{TP}{TP + FP} \qquad (8)$$

$$\text{NPV} = \frac{TN}{TN + FN} \qquad (9)$$

$$\text{F-measure} = \frac{2 \cdot \text{PPV} \cdot \text{Sensitivity}}{\text{PPV} + \text{Sensitivity}} \qquad (10)$$

It is observed that the K-means clustering algorithm produces nearly 1000 customers in each cluster. So, LEM2 and the proposed rule induction algorithm are run repeatedly, with the training data (two-thirds) and test data (one-third) chosen randomly from the data set each time such that the training and test data are mutually exclusive. Many runs reproduce values already seen in earlier runs, so the twenty runs which produce different values are presented in Table 1. The performance criteria for prediction are calculated for all twenty cases. Table 1 shows the false positives, false negatives, true positives and true negatives produced by the rule induction algorithms for all twenty cases. The objective of a rule induction algorithm is to minimize false positives and false negatives and to maximize true positives and true negatives. From Table 1 it is observed that the proposed rule induction algorithm has the minimum FP, minimum FN, maximum TP and maximum TN in all twenty cases when compared to LEM2. Sensitivity, specificity, accuracy, PPV, NPV and F-measure are calculated using formulas 5 to 10 respectively for each algorithm in all twenty cases. The output is tabulated in Table 2 and Table 3. The objective of a rule induction algorithm is to maximize sensitivity, specificity, accuracy, PPV, NPV and F-measure. From Tables 2 and 3, it is observed that the proposed rule induction algorithm has a value equal to or greater than that of LEM2 in all twenty cases. The proposed algorithm on average achieves a 0.439% increase in sensitivity, 0.007% increase in specificity, 0.151% increase in accuracy, 0.014% increase in positive predictive value, 0.218% increase in negative predictive value and 0.228% increase in F-measure when compared to the LEM2 algorithm. The percentage increase in each performance criterion might seem small, but in a real data set, where customers number in the thousands rather than the hundreds, the proposed algorithm gives a significant improvement over LEM2. For example, the average accuracy obtained using LEM2 is 99.622% and that of the proposed algorithm is 99.773%. For 3,278 customers, LEM2 predicts 3265 customers correctly (i.e. 99.622*3278/100) and the proposed algorithm predicts 3270 correctly (i.e. 99.773*3278/100). Here five more customers are predicted correctly using the proposed algorithm when compared to LEM2. LEM2 produces only a minimal set of rules, whereas the proposed rule induction algorithm produces all the possible consistent rules to describe the records in the cluster. Thus the proposed algorithm characterizes the customers in each cluster clearly by producing all the consistent rules while eliminating redundant or duplicate rules. Since the number of rules describing the customers is increased, the prediction accuracy is also improved. This statement is supported experimentally by comparing the performance measures. Hence the chance of judging a customer wrongly is reduced and schemes are allotted to customers correctly, which helps the business to improve its customer lifetime value. Though the proposed algorithm produces more rules than LEM2, its computational complexity is m times less than that of the LEM2 algorithm, where m indicates the number of attributes considered for analysis. True Positive, True Negative, False Positive and False Negative are the parameters required to calculate the performance criteria of prediction. The complexity of calculating these parameters is linear with respect to the number of generated rules. The number of rules generated for each cluster or segment is very small when compared to the number of customers dealt with, so this calculation complexity is negligible when compared to the complexity of the rule induction algorithm. Thus, the proposed algorithm is an improved algorithm in terms of cost benefit analysis.
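The six performance measures and the accuracy arithmetic quoted above can be reproduced with a few lines of Python. The confusion counts passed to `metrics` below are made up for illustration and are not taken from Table 1; only the two average accuracies and the customer total come from the text, and the counts are truncated to whole customers as in the text.

```python
def metrics(tp, fp, fn, tn):
    """Return the six performance measures from one cluster's confusion counts."""
    sensitivity = tp / (tp + fn)              # recall, true positive rate
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    ppv = tp / (tp + fp)                      # precision
    npv = tn / (tn + fn)
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "f_measure": f_measure,
    }


# Hypothetical counts for one cluster of a 1,093-customer test set
m = metrics(tp=360, fp=2, fn=11, tn=720)

# Worked arithmetic from the text: correctly predicted customers out of 3,278
customers = 3278
correct_lem2 = int(99.622 * customers / 100)      # truncated to whole customers
correct_proposed = int(99.773 * customers / 100)
extra_correct = correct_proposed - correct_lem2
```

The difference of 0.151 percentage points in average accuracy corresponds to five additional correctly predicted customers at this data set size.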

Conclusion
Customer relationship management is a technology which helps the entrepreneur to improve business volume by improving customer relationships. Customer identification is the important phase in CRM. It involves segmenting the customers and analyzing their behavior for further customer attraction, retention and development. In this paper, the clustering technique of data mining has been used for customer segmentation and rule induction has been used for describing customer behavior in each segment. The entrepreneur can employ different benefit schemes for customers in different clusters or segments, so classifying a customer to the correct cluster plays an important role in CRM. For a good rule induction algorithm, the customers' behavior in each cluster should be correctly characterized so that new customers are predicted to the appropriate cluster. The performance evaluation criteria are chosen based on the prediction accuracy of the rule induction algorithm. The proposed algorithm on average achieves a 0.439% increase in sensitivity, 0.007% increase in specificity, 0.151% increase in accuracy, 0.014% increase in positive predictive value, 0.218% increase in negative predictive value and 0.228% increase in F-measure when compared to the LEM2 algorithm. It has been shown that the time complexity of LEM2 is m times that of the proposed algorithm, where m indicates the number of attributes chosen for analysis. Thus, it is evident from the results that the proposed algorithm is an improved rule induction algorithm which gives better prediction performance and requires less computation than the LEM2 algorithm.

3. τ is minimal (τ has the smallest possible number of members)

    T := ∅;
    T(G) := {t | [t] ∩ G ≠ ∅};
    while T = ∅ or [T] ⊄ B
    begin
        select a pair t ∈ T(G) such that |[t] ∩ G| is maximum; if a tie occurs, select a pair t ∈ T(G) with the smallest cardinality of [t]; if another tie occurs, select the first pair;
        T := T ∪ {t};
        G := [t] ∩ G;
        T(G) := {t | [t] ∩ G ≠ ∅};
        T(G) := T(G) − T;
    end {while};
    for each t ∈ T do
        if [T − {t}] ⊆ B then T := T − {t};
    τ := τ ∪ {T};

[...] run exactly |d| times, where |d| is the number of decision classes. The number of decision classes indicates the number of clusters produced by the K-means algorithm. The while loop (G ≠ ∅) is performed at most n times because we may have the whole set as the upper approximation to every decision class. Here n is the number of objects in the training set. To select a pair t ∈ T(G) as the best one, we have to iterate n * m times so that all possible pairs of attributes and values are examined. Here m is four, which indicates the number of attributes in the training set. T contains m elements at most and τ contains n elements at most, so the computational complexity of the for loop (for each t ∈ T) is m * n. Therefore the total computational complexity of LEM2 is equal to O(|d| * n * (n * m) * (m * n)), which simplifies to O(|d| * m² * n³).
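The LEM2 steps above can be sketched in Python as follows. This is a minimal illustration of LEM2 as it is commonly described in the LERS literature, not the implementation evaluated in the paper. It assumes consistent data (otherwise the inner loop may run out of candidate pairs), breaks remaining ties by sorted order of the attribute-value pairs, and uses made-up discretised cases; the function and variable names are hypothetical.

```python
def lem2(cases, attributes, concept):
    """Return a local covering of `concept` (a set of case indices) as a
    list of minimal complexes, each a list of (attribute, value) pairs."""

    def block(t):
        a, v = t
        return {i for i, c in enumerate(cases) if c[a] == v}

    def block_of(pairs):
        covered = set(range(len(cases)))
        for t in pairs:
            covered &= block(t)
        return covered

    all_pairs = sorted({(a, c[a]) for c in cases for a in attributes})
    B = set(concept)
    G = set(B)
    covering = []
    while G:
        T = []
        candidates = [t for t in all_pairs if block(t) & G]
        while not T or not block_of(T) <= B:
            # maximum |[t] ∩ G|; tie: smallest |[t]|; tie: first pair
            t = max(candidates,
                    key=lambda p: (len(block(p) & G), -len(block(p))))
            T.append(t)
            G = block(t) & G
            candidates = [p for p in all_pairs if block(p) & G and p not in T]
        # drop conditions that are not needed to stay inside the concept
        for t in list(T):
            rest = [s for s in T if s != t]
            if rest and block_of(rest) <= B:
                T = rest
        covering.append(T)
        G = B - set().union(*(block_of(S) for S in covering))
    # drop complexes that the remaining complexes make redundant
    for T in list(covering):
        others = [S for S in covering if S is not T]
        if others and set().union(*(block_of(S) for S in others)) == B:
            covering = others
    return covering


# Hypothetical discretised RFMP cases; the last field is the cluster number
cases = [
    {"R": "low", "F": "high", "M": "high", "P": "low", "cluster": 1},
    {"R": "low", "F": "high", "M": "low", "P": "low", "cluster": 1},
    {"R": "high", "F": "low", "M": "high", "P": "high", "cluster": 1},
    {"R": "high", "F": "low", "M": "low", "P": "high", "cluster": 2},
    {"R": "high", "F": "high", "M": "low", "P": "high", "cluster": 2},
]
attrs = ["R", "F", "M", "P"]
rules_c1 = lem2(cases, attrs, {0, 1, 2})
rules_c2 = lem2(cases, attrs, {3, 4})
```

Each returned complex reads as one rule: for cluster 2 the single complex corresponds to "if (M, low) and (P, high) then (cluster, 2)". Other consistent rules for the same cluster may exist but are not reported once the cases they would describe are already covered, which is the behaviour the proposed algorithm is intended to change.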

Table 1: FP, FN, TP and TN for rule induction algorithms

Table 2: Sensitivity, specificity and accuracy for rule induction algorithms

Table 3: PPV, NPV and F-measure for rule induction algorithms