Frequent Pattern Mining Approach for a Mobile Web Service Environment Using Service Utility

Mobile web service pattern mining is an emerging field in which utility plays an important role. Utility may be the profit, cost or price of an item; in the case of mobile web services, the access preference of a service is considered its utility. With the help of utility mining, one can extract highly interesting frequent patterns of mobile web services. In previous related studies, most approaches use utility as an essential parameter for discovering interesting patterns, but they also generate a large number of uninteresting patterns. Another problem is computational time: because no filtration is applied, computation takes too long. In this paper, an efficient approach, Utility Based Frequent Pattern Mining (UBFPM), is proposed. It extracts utility-based frequent patterns with high filtration in less computing time. The experimental results show that the proposed approach performs well in terms of execution efficiency and memory utilization.


Introduction
Knowledge discovery is used to extract useful rules or patterns from data sets. To discover knowledge from massive data, various data mining techniques are used. Frequent pattern mining is one of the most important approaches for uncovering hidden knowledge in massive data. For frequent pattern mining, Apriori was proposed [1]; the algorithm has been shown to have two limitations: it requires multiple database scans and generates a large number of candidate itemsets. To solve this problem, Han et al. proposed the FP-Growth algorithm [10], which can discover all frequent patterns with only two database scans. Frequent pattern mining can be applied to transactional data as well as sequential data [2], and it has many application areas, one of which is mining knowledge from mobile service access sequences. Smartphones are now widely used to perform online transactions, retrieve information, exchange social messages, make video calls, chat, etc. Services that are accessed from smartphones or laptops are called mobile web services. These web services are lightweight applications used to perform a specific task, such as booking a ticket through a mobile app or sending a message through WhatsApp. A particular user may access a series of services at different times, either at different locations or at a single location. Data mining techniques are used to extract interesting patterns from these services. Using sequential pattern mining [2], [17], web service sequences can be extracted; these sequences help characterize the behavior of a specific user. The generated web service patterns are used in fields such as user behavior analysis, finding the most frequently accessed services, planning new services and promoting business.
Figure 1 shows a simple scenario of mobile web service sequence generation. In Figure 1, different mobile web services are accessed by mobile users at different locations. Home, Hospital, Restaurant, Park, etc., are the locations, and WhatsApp, News, Facebook, Chat, etc., are mobile web services, denoted by S1, S2, S3 and so on. IDs 1, 2, 3, ..., n represent different service access sequences, such as ID 1 = {S5, S3, S1, S6}, ID 2 = {S2, S1, S3}, and so on [16], [17]. Traditional sequential pattern mining approaches only consider the items that are purchased in sequence; they do not include any constraint or factor such as the price, profit or preference of items. Yet items with low frequency may sometimes be important. For example, assume there exists a pattern <gmail, facebook> in a sequence, and that it is a low-frequency pattern in the sequence database; a purely frequency-based approach would discard it even if it is important. To handle this, Yun et al. proposed a new research approach, namely weighted sequential pattern mining [29], in which different weights are assigned to items according to their importance. Weighted frequent pattern mining [31] considers the importance of items, and share frequent pattern mining [7] represents the occurrences of items in transactions as non-binary values. In these frameworks, patterns with high weight values can be extracted even if they occur infrequently. Lan et al. [13] proposed an approach for finding weighted sequential patterns and applied it to traditional transactions and items. Some items with high profit but low count may not be discovered in a sequence database by traditional sequential pattern mining approaches. To address this problem, Ahmed et al. proposed a new research approach, namely utility sequential pattern mining, which considers not only the quantities and timestamps of items in a sequence, but also the individual profits of items in a quantitative sequence database [5]. Utility plays an important role in finding the most valuable patterns, and utility mining [23] has emerged as one of the most valuable research topics in the frequent pattern mining field. In utility mining, each item has an external utility, such as profit or price, and an internal utility, which indicates the non-binary value of the item in a transactional sequence [32].
2. The UBFPM (Utility Based Frequent Pattern Mining) approach has been proposed for finding interesting mobile web service patterns.

3. The proposed approach improves execution efficiency in finding utility-based patterns.
4. In the experiments, various real and synthetic datasets are used to compare the performance of the proposed approach with state-of-the-art algorithms. The results show that the proposed UBFPM approach performs well in terms of execution time, memory consumption and the number of generated candidates.
The roadmap of the paper is as follows: the next section briefly reviews the work related to this study. Preliminaries and problems are defined in Section 3. Section 4 introduces our approach. Section 5 presents the experimental results on multiple datasets. The last section presents the conclusions.

Definitions and Related Work
To clearly describe the frequent pattern mining approach for mobile web services, a set of relevant terms and related studies are discussed in this section, including the problem description and relevant definitions.

Description of the Problem and Definitions
Let us assume the mobile web service access sequence database given in Table 1, in which each row consists of two fields: ID and service sequence. There are six mobile web services in the dataset, denoted S1 to S6. We also assume the utility value of each mobile web service shown in Table 2. Utility measures how useful (i.e., how profitable) a service is; here, the access preference of a service is taken as its utility value. These values are randomly generated for the experiment and assumed in the example. We adopt definitions similar to those presented in previous works [5], [6], [11], [16], [20], [22], [23], [31]. Let I = {S1, S2, …, Sm} be a set of mobile web services. An itemset X containing k items is called a k-itemset, and its length is k. The utility value of a mobile web services set X is the average utility of the services it contains:
$$U_X = \frac{\sum_{i=1}^{|X|} U_x(i)}{|X|}$$
where $U_x(i)$ is the utility value of the mobile web service that appears in position i of X.
For example, let X = (S1 S3 S5). Based on Table 2, the utility values of S1, S3 and S5 are 0.9, 0.2 and 0.5, respectively, and |X| = 3; therefore, U_X = (0.9 + 0.2 + 0.5)/3 ≈ 0.533.
Definition 4, Utility of a sequence: Let p = (X1 X2 X3 … Xn). The utility value U_p of a service sequence p (such as sequence ID 2) is the sum of the utility values of all mobile web services sets belonging to p, divided by the cardinality of p [11]:
$$U_p = \frac{\sum_{i=1}^{|p|} U_{X_i}}{|p|}$$
where $U_{X_i}$ is the utility value of the mobile web services set $X_i$ that appears in position i of p.
For example, based on Table 2, the utility values of the services in sequence <S4S6S2> are 0.8, 0.6 and 0.4, respectively, and the sequence contains 3 mobile web services; therefore, U<S4S6S2> = (0.8 + 0.6 + 0.4)/3 = 0.6.
Definition 5, Utility of a subsequence: The utility value U_r of a mobile web services subsequence r is the sum of the utility values of all mobile web services sets in r, divided by the number of services sets in r [11]. That is,
$$U_r = \frac{\sum_{S \in r} U_S}{|r|}$$
where |r| is the number of mobile web services sets in the subsequence r, and $U_S$ is the utility value of a mobile web services set S in r. For example, in Tables 1 and 2, the sixth sequence <S1S2><S4S6> consists of two services sets, {S1S2} and {S4S6}, whose utilities are 0.65 and 0.7, respectively; therefore, U<S1S2><S4S6> = (0.65 + 0.7)/2 = 0.675.
Definition 6, Sequence max utility (SMU) and total sequence max utility (TSMU): The sequence max utility of a sequence p, $SMU_p$, is the maximum utility value among all mobile web services in p [11]. The total SMU of a database D, TSMU, is the sum of the SMU values of all sequences in D [11]. That is,
$$SMU_p = \max_{S_j \in p} U_{S_j}, \qquad TSMU = \sum_{k=1}^{|D|} SMU_{p_k}$$
where |D| is the cardinality of the mobile web services access database D.
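The averaging in Definitions 3 to 5 can be checked with a short Python sketch. It hard-codes the six service utilities quoted from Table 2 in the examples above and represents a sequence as a list of services sets; the function names are illustrative assumptions, not part of UBFPM.

```python
from statistics import mean

# Service utilities as quoted from Table 2 in the worked examples above.
UTILITY = {"S1": 0.9, "S2": 0.4, "S3": 0.2, "S4": 0.8, "S5": 0.5, "S6": 0.6}


def set_utility(services):
    """Utility of a services set X: the average utility of its services."""
    return mean(UTILITY[s] for s in services)


def sequence_utility(sequence):
    """Utility of a (sub)sequence: the average of its set utilities (Definitions 4 and 5)."""
    return mean(set_utility(element) for element in sequence)


# The three worked examples from the text.
examples = {
    "(S1 S3 S5)": set_utility(["S1", "S3", "S5"]),
    "<S4S6S2>": sequence_utility([["S4", "S6", "S2"]]),
    "<S1S2><S4S6>": sequence_utility([["S1", "S2"], ["S4", "S6"]]),
}
for name, value in examples.items():
    print(name, round(value, 3))
```

Called on (S1 S3 S5), <S4S6S2> and <S1S2><S4S6>, these functions return approximately 0.533, 0.6 and 0.675, matching the examples. Because every set is averaged first, a sequence of single-service sets gives the same result as one set holding the same services.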
For example, to find the USV of the sequence with ID 4, i.e., <S2S3>, we take t = 3, because p4 is a subsequence of the sequences with IDs 1, 4 and 7 (see Definition 1).
Definition 8, Utility frequent sequential pattern: A subsequence r is called a utility frequent sequential pattern (FSP) if USV ≥ min_uti, where min_uti is a predefined minimum utility threshold.
For example, in Table 1, service S2 appears 7 times, and its utility in Table 2 is 0.4. Its USV is therefore (0.4 + 0.4 + 0.4 + 0.4 + 0.4 + 0.4 + 0.4)/8 = 35%. If min_uti = 25%, the sequence <S2> is an FSP.
Definition 9, Utility sequence upper bound: The utility sequence upper bound of a subsequence r, denoted SUB, is the sum of the SMU values of the sequences in D that include r, divided by the TSMU of D [11]:
$$SUB_r = \frac{\sum_{p_k \in D,\ r \sqsubseteq p_k} SMU_{p_k}}{TSMU}$$
For example, in Table 1, the sequence <S5> is a subsequence of IDs 1, 2, 3, 8, 9 and 10; therefore, SUB<S5> is the sum of the SMU values of these six sequences divided by TSMU.
Definition 10, Utility frequent sequence upper bound pattern: A subsequence r is called a utility frequent sequence upper bound pattern (FSUBP) if SUB ≥ min_uti.
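Definitions 6 to 10 can be combined into a small Python sketch that scores a candidate subsequence against a database. Two assumptions should be stated plainly: the database `db` below is a hypothetical toy example, not Table 1, and because the formula of Definition 7 is not reproduced in this text, `usv` assumes USV is the sum of U_r over the sequences containing r divided by TSMU, which reproduces the <S2> example above if TSMU = 8.

```python
from statistics import mean

# Service utilities quoted from Table 2 in the examples above.
UTILITY = {"S1": 0.9, "S2": 0.4, "S3": 0.2, "S4": 0.8, "S5": 0.5, "S6": 0.6}


def utility(seq):
    """Definitions 3-5: average of the per-set average utilities."""
    return mean(mean(UTILITY[s] for s in element) for element in seq)


def smu(seq):
    """Definition 6: largest single-service utility in the sequence."""
    return max(UTILITY[s] for element in seq for s in element)


def contains(p, r):
    """True if r is a subsequence of p: each set of r is included in a distinct
    set of p, preserving order (greedy leftmost matching)."""
    j = 0
    for element in p:
        if j < len(r) and set(r[j]) <= set(element):
            j += 1
    return j == len(r)


def usv(r, db):
    """ASSUMED form of USV: sum of U_r over sequences containing r, divided by TSMU."""
    tsmu = sum(smu(p) for p in db)
    return sum(utility(r) for p in db if contains(p, r)) / tsmu


def sub(r, db):
    """Definition 9: sum of SMU over sequences containing r, divided by TSMU."""
    tsmu = sum(smu(p) for p in db)
    return sum(smu(p) for p in db if contains(p, r)) / tsmu


def classify(r, db, min_uti):
    """Definitions 8 and 10: FSP if USV >= min_uti, FSUBP if SUB >= min_uti."""
    return {"FSP": usv(r, db) >= min_uti, "FSUBP": sub(r, db) >= min_uti}


# Hypothetical three-sequence database (not Table 1).
db = [[["S2"], ["S3"]], [["S1", "S2"], ["S4"]], [["S5"], ["S6"]]]
print(classify([["S2"]], db, 0.5))
```

Here <S2> gets USV = 0.8/1.9 and SUB = 1.3/1.9, so it is an FSUBP but not an FSP at min_uti = 0.5. Since U_r is an average of utilities of services in p, it never exceeds SMU_p; under the assumed USV form this gives SUB ≥ USV, so every FSP is also an FSUBP and a pattern failing the SUB test can be discarded.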
Problem statement: Mobile web service frequent pattern mining is a new application of frequent pattern mining as well as of mobile computing. If a particular user visits multiple locations over a full day (24 hours), we obtain a sequence dataset describing these visits, together with the details of the services linked to the user. Based on this information, new knowledge and facts can be extracted. A utility-based approach can be used to discover frequent patterns from mobile web service sequences; in this setting, the access preference of a mobile web service serves as its utility value. For example, suppose a user accessed a particular service 20 times in a complete day; this can be expressed as a utility value of 0.2. For these reasons, we propose a utility-based approach that reduces the large number of generated candidates. The problem is to find the complete set of frequent service patterns in database D.

Frequent Pattern Mining
The process of extracting sets of items or subsequences that occur frequently in a dataset is known as frequent pattern mining. Many studies have been conducted to mine frequent patterns from transactional databases.
Apriori was the first approach to mine frequent itemsets from transactional databases [1]. To obtain better results than Apriori, the FP-Growth method was later developed [10]. FP-Growth requires only two database scans, which improves the efficiency of the algorithm. Several types of databases, such as sequential, incremental and stream databases, are used for frequent pattern mining [31]. The relative importance of items in databases can be captured through weighted frequent pattern mining [32]. In this method, the weight of a pattern is calculated by dividing the sum of the weights of its items by the pattern length.

Utility Mining
In transactional databases, the profit, weight, importance or performance of an item can be considered its utility value [13], [17]. Utility was first used by Chan et al. in transactional databases [8]. To prune the search space, an estimation method named Umining was used [24]. A level-wise search method uses an item-discarding approach to reduce candidate generation [14]. Various information required for utility mining is maintained in a tree structure known as the Huc-Tree [6]. Because utility alone does not satisfy the downward closure property, discovering high utility itemsets requires a model that restores it; this is done by the transaction-weighted utilization model, which is based on the Apriori algorithm [15]. Apriori-based utility mining approaches use multiple database scans for candidate generation. To avoid multiple database scans, Ahmed et al. proposed another method, Incremental High Utility Pattern (IHUP) mining, which uses the FP-Tree concept [5].
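The transaction-weighted utilization (TWU) idea mentioned above can be illustrated with a minimal Python sketch in the style of an Apriori-based two-phase search. The transactions, quantities and unit profits are hypothetical toy data. The sketch only shows why TWU, the total utility of the transactions that contain an itemset, restores downward closure: it never underestimates an itemset's utility and can only shrink as the itemset grows.

```python
from itertools import combinations

# Hypothetical toy data: item -> purchased quantity (internal utility) per transaction.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profit of each item (external utility).
PROFIT = {"a": 5, "b": 2, "c": 1}


def transaction_utility(t):
    """Total utility of one transaction."""
    return sum(q * PROFIT[i] for i, q in t.items())


def utility(itemset):
    """Exact utility of an itemset, summed over the transactions that contain it."""
    return sum(
        sum(t[i] * PROFIT[i] for i in itemset)
        for t in TRANSACTIONS
        if set(itemset).issubset(t)
    )


def twu(itemset):
    """TWU: total utility of the transactions that contain the itemset (an upper bound)."""
    return sum(transaction_utility(t) for t in TRANSACTIONS if set(itemset).issubset(t))


def high_utility_itemsets(min_util):
    items = sorted({i for t in TRANSACTIONS for i in t})
    candidates, level = [], [(i,) for i in items]
    while level:
        # Phase 1: TWU is anti-monotone, so an itemset below min_util and all of
        # its supersets can be pruned without computing exact utilities.
        survivors = [x for x in level if twu(x) >= min_util]
        candidates.extend(survivors)
        # Apriori-style join of survivors that share all but their last item.
        level = [a + (b[-1],) for a, b in combinations(survivors, 2) if a[:-1] == b[:-1]]
    # Phase 2: compute exact utilities of the remaining candidates.
    return {x: utility(x) for x in candidates if utility(x) >= min_util}


print(high_utility_itemsets(10))
```

With min_util = 10, {a, b} is pruned in phase one because its TWU is 9, and the result is {('a',): 15, ('a', 'c'): 11}. Note that {b} and {c} survive phase one (TWU 14 and 16) but fail the exact check, which is the cost of an upper bound that is loose.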
To enhance the performance of utility mining and to find high utility itemsets, Tseng et al. proposed the UP-Growth algorithm [22], which includes several strategies for mining. Its revised version, UP-Growth+ [23], further decreases overestimated utilities. Yun et al. proposed MU-Growth [31], a tree-based high utility itemset mining algorithm that reduces the number of candidates.

Utility Based Frequent Pattern Extraction
High utility items may occur with low frequency but still be important. For example, in a transaction, a gold item may have a low frequency but a high value. Transactional association rule mining approaches [1] use 0 or 1 to indicate whether an item is absent or present in a transaction. Such traditional approaches are not sufficient for items with high utility but low frequency, so utility-based approaches are useful for frequent pattern extraction. To fulfill a business objective, Chan et al. proposed the idea of top-K patterns [8]. To discover valuable frequent itemsets, weighted itemset mining has been proposed [28]. Yun et al. also use an upper bound model to handle the downward closure property [28]. Later, various studies have been proposed to enhance the performance of weighted itemset mining [3], [4], [5], [9], [26], [27].

Utility Based Sequential Pattern Extraction
A sequential dataset contains transactions and timestamps; it consists of transaction IDs, consumer details and lists of purchased items, from which frequent patterns can be generated. Agrawal et al. proposed the AprioriAll, AprioriSome and DynamicSome algorithms for sequential pattern mining [2]. Generalized Sequential Patterns (GSP) [21] and PrefixSpan [19] were later developed to improve execution efficiency in sequential pattern mining. Yun et al. proposed various approaches to find weighted sequential patterns in sequential databases [30]. Shie et al. proposed a valuable pattern mining approach to discover high utility itemsets in different shopping websites using quantities and profits [20]. Next, Ahmed et al. proposed a new research approach, high utility sequential pattern mining, which considers the order of itemsets together with their quantities and profits. In traditional sequential pattern mining, a pattern is counted only once per sequence, even if the subsequence appears multiple times in that sequence. Following this convention, the maximum utility is a more suitable estimate of the utility of a subsequence in quantitative sequences [5], [11]. Extracting consumers' purchase behavior is the main use of sequential pattern mining. The max utility concept may be more appropriate for finding high utility sequential patterns in various real-life problems, such as deriving high utility business policies or finding the most accessed services based on their preferences [11]. Lan et al. proposed a sequence utility upper bound model for generating patterns [12]. This model did not adopt any special strategy for the high utility sequential pattern mining task, so many unpromising subsequences still need to be generated. In addition, the USpan approach spends a great deal of execution time on its LQS-Tree structure [25]. Thus, reducing the utility upper bound of subsequences during mining is quite important. The aim of this study is to develop an efficient approach to extract frequent patterns from mobile web service sequences.

FSP-1 and FSUBP-1
In this step, the sequence database is scanned and the SMU value of each sequence is calculated, as shown in Table 3.
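This first step can be sketched in Python as a single scan that computes each sequence's SMU, accumulates TSMU, and at the same time adds the sequence's SMU to every service it contains, counting a sequence once per service in line with the max-utility convention. Dividing by TSMU then gives the SUB of every 1-subsequence, from which the FSUBP-1 patterns are selected. The utilities are those quoted from Table 2; the database and threshold below are hypothetical, and the name `level_one` is illustrative.

```python
from collections import defaultdict

# Service utilities quoted from Table 2 in the examples above.
UTILITY = {"S1": 0.9, "S2": 0.4, "S3": 0.2, "S4": 0.8, "S5": 0.5, "S6": 0.6}


def level_one(db, min_uti):
    """One scan: SMU per sequence, TSMU, and SUB of every single service."""
    smu_values = []
    sub_acc = defaultdict(float)
    for seq in db:
        # A set, so each sequence contributes at most once per service.
        services = {s for element in seq for s in element}
        smu = max(UTILITY[s] for s in services)
        smu_values.append(smu)
        for s in services:
            sub_acc[s] += smu
    tsmu = sum(smu_values)
    fsubp1 = {s: acc / tsmu for s, acc in sub_acc.items() if acc / tsmu >= min_uti}
    return smu_values, tsmu, fsubp1


# Hypothetical database (not Table 1).
db = [[["S2"], ["S3"]], [["S1", "S2"], ["S4"]], [["S5"], ["S6"]]]
smu_values, tsmu, fsubp1 = level_one(db, 0.4)
print(smu_values, tsmu, sorted(fsubp1))
```

For this toy database the SMU values are 0.4, 0.9 and 0.6, TSMU is 1.9, and S1, S2 and S4 pass the 0.4 threshold.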

Modified SMU and Sequence Generation
Four services, S2, S4, S5 and S6, are above the minimum utility threshold. These services are used to generate the next-level sequential patterns and to modify the sequences and their SMU values. The modified values are shown in Table 5.
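The modification step can be sketched as follows, assuming (as an illustration, not as the paper's exact procedure) that services outside FSUBP-1 are deleted from every sequence, that sets or sequences left empty are dropped, and that SMU is recomputed over the remaining services. The promising set {S2, S4, S5, S6} is the one named in the text; the database is hypothetical.

```python
# Service utilities quoted from Table 2 in the examples above.
UTILITY = {"S1": 0.9, "S2": 0.4, "S3": 0.2, "S4": 0.8, "S5": 0.5, "S6": 0.6}


def prune_database(db, promising):
    """Keep only promising services, drop emptied sets and sequences, recompute SMU."""
    modified = []
    for seq in db:
        kept = [[s for s in element if s in promising] for element in seq]
        kept = [element for element in kept if element]
        if kept:
            new_smu = max(UTILITY[s] for element in kept for s in element)
            modified.append((kept, new_smu))
    return modified


# Hypothetical database (not Table 1).
db = [[["S2"], ["S3"]], [["S1", "S2"], ["S4"]], [["S5"], ["S6"]]]
for sequence, new_smu in prune_database(db, {"S2", "S4", "S5", "S6"}):
    print(sequence, new_smu)
```

Removing S1 lowers the SMU of the second sequence from 0.9 to 0.8, which tightens the SUB of every pattern it supports. The pruning is safe because any pattern containing a removed service is supported by no more sequences than that service, so its SUB cannot reach the threshold either.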

Postfix Sequence Generation
Postfix sequences are generated according to the FSUBP-1 patterns. In a postfix sequence, only the services that follow an FSUBP-1 pattern are considered. Table 6 shows the postfix sequences of the FSUBP-1 pattern <S4>.
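A Python sketch of postfix generation for a single FSUBP-1 service follows. It assumes, as a simplification, that the postfix starts at the set after the first occurrence of the service (any remaining services in the same set are ignored) and that empty postfixes are discarded. Each postfix keeps its sequence ID so that the SMU of the original sequence can still be looked up. The database and the name `postfix_database` are hypothetical.

```python
def postfix_database(db, service):
    """For each sequence containing `service`, keep the sets after its first occurrence."""
    projected = []
    for seq_id, seq in enumerate(db, start=1):
        for pos, element in enumerate(seq):
            if service in element:
                rest = seq[pos + 1:]
                if rest:  # an empty postfix cannot support any longer pattern
                    projected.append((seq_id, rest))
                break  # only the first occurrence defines the postfix
    return projected


# Hypothetical database (not Table 1).
db = [[["S4"], ["S2"], ["S6"]], [["S2"], ["S4"]], [["S1", "S4"], ["S5", "S6"]]]
print(postfix_database(db, "S4"))
```

Sequence 2 contributes nothing because S4 is its last set, while sequences 1 and 3 yield [["S2"], ["S6"]] and [["S5", "S6"]].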

Recursively Generating FSP-n and FSUBP-n
The above steps are applied recursively to generate FSP-n and FSUBP-n patterns. Based on the FSUBP-1 pattern S4, the following subsequence-2 patterns can be generated.
In a similar way, all the FSP-n and FSUBP-n patterns can be generated.The algorithm UBFPM for generating utility frequent sequential patterns is shown below.
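The recursive procedure can be illustrated with a compact, PrefixSpan-style Python sketch; it is not the authors' Algorithm 1. Assumptions: patterns grow only by appending one service as a new set (itemset extensions are omitted), postfixes follow the simplified first-occurrence rule, a branch is pruned when its SUB falls below min_uti because SUB cannot increase as a pattern grows, and USV uses the assumed form (sum of U_r over supporting sequences divided by TSMU). The name `mine` and the toy database are hypothetical.

```python
# Service utilities quoted from Table 2 in the examples above.
UTILITY = {"S1": 0.9, "S2": 0.4, "S3": 0.2, "S4": 0.8, "S5": 0.5, "S6": 0.6}


def mine(db, min_uti):
    """Return {pattern: scores} for every FSUBP pattern (patterns are tuples of services)."""
    smu = [max(UTILITY[s] for e in seq for s in e) for seq in db]
    tsmu = sum(smu)
    results = {}

    def grow(prefix, projected):
        # projected: (sequence index, postfix) pairs supporting the current prefix
        support = {}
        for idx, postfix in projected:
            for s in {s for e in postfix for s in e}:
                support.setdefault(s, []).append(idx)
        for s in sorted(support):
            sub = sum(smu[i] for i in support[s]) / tsmu
            if sub < min_uti:
                continue  # SUB cannot grow with longer patterns, so prune the branch
            pattern = prefix + (s,)
            u_r = sum(UTILITY[x] for x in pattern) / len(pattern)
            usv = len(support[s]) * u_r / tsmu  # ASSUMED form of USV
            results[pattern] = {"SUB": sub, "USV": usv, "FSP": usv >= min_uti}
            # Postfix database of the new pattern: sets after the first occurrence of s.
            next_projected = []
            for idx, postfix in projected:
                for pos, e in enumerate(postfix):
                    if s in e:
                        if postfix[pos + 1:]:
                            next_projected.append((idx, postfix[pos + 1:]))
                        break
            grow(pattern, next_projected)

    grow((), list(enumerate(db)))
    return results


# Hypothetical database (not Table 1).
db = [[["S2"], ["S4"]], [["S4"], ["S2"]], [["S2"], ["S4"], ["S6"]]]
for pattern, scores in mine(db, 0.4).items():
    print(pattern, scores)
```

With min_uti = 0.4 the sketch reports <S2>, <S2><S4> and <S4>; every pattern involving S6 is pruned because S6 occurs in only one sequence, whose SMU is a third of TSMU.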

Experiments
In this section, we evaluate the performance of our UBFPM algorithm. The experiments were performed on a Pentium Dual-Core 3.3 GHz processor with 8 GB of memory, using the Java programming language under the Windows 7 operating system. The simulation was performed on both synthetic and real databases. The performance of the proposed UBFPM approach is compared with state-of-the-art pattern mining approaches, namely IHUP [5], UP-Growth [22], UP-Growth+ [23] and MU-Growth [31].

Experiment on Synthetic Dataset
In this experiment, we use the public IBM data generator (Site 2) to produce the mobile web service sequence data. The parameters of the IBM data generator were the average number of transactions per sequence S, the average number of services per transaction T, the average length of maximal potentially frequent service sets I, the total number of distinct mobile web services N, and the total number of sequences D. For each generated service sequence dataset, a corresponding utility table was also produced, in which a utility value in the range 0.0 to 1.0 was randomly assigned to each service. The simulation model used to generate the service utilities was similar to that used by Liu et al. [15]. Figure 2 shows the distribution of the utility values of all the mobile web services generated by the simulation model in the utility table.

Performance Comparison on the Synthetic Dataset
Figure 3 shows the experimental results of the performance evaluation on a synthetic dataset. Figures 3(a) and (b) present the total execution time. Figures 3(c) and (d) present the number of FSUBP patterns for a fixed data size (200k) and for varied data set sizes, respectively. In Figure 3, the proposed UBFPM approach has the best performance in terms of total execution time as well as the lowest memory consumption. The other approaches generate more FSUBP patterns, while UBFPM generates fewer. In Figures 3(a) and (b), UBFPM takes less time than the state-of-the-art approaches because those approaches use a tree-based pruning strategy: tree-based pruning must first spend time constructing the tree and only then prunes based on the minimum utility threshold. In terms of execution time, the approach is most advantageous when the minimum utility threshold is below 0.60%. As seen in Figure 3(a), when the minimum utility threshold increases from 0.20% to 0.60%, the execution times of the approaches differ. When the minimum utility threshold is higher than 0.60%, this variation decreases, and above 1% the execution times are approximately the same. The same holds for different data sizes. As Figure 3(b) shows, for a small data size (100k) all algorithms take approximately the same execution time, but when the number of sequences increases (to about 200k or more), the previous algorithms take more time, while UBFPM performs well.

Performance Comparison on Real Datasets
Figure 4 presents the experimental results of the compared approaches under varied minimum utility values. The kosarak and retail datasets are used for this comparison, each with different minimum utility thresholds. In Figure 4(a), UBFPM has the best runtime among all approaches on the kosarak dataset. The results show that the proposed approach remains suitable as the minimum utility threshold increases from smaller to larger values. In addition, the approach generates the fewest FSUBP patterns. Figure 4(b) shows another comparison on the retail dataset: when the minimum utility threshold increases from 5000 to 20000, the execution time decreases, and all approaches have good execution times for minimum utility thresholds above 20000. Figure 4 also shows that MU-Growth performs well on both datasets.

Memory Usage
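Figure 5 shows the memory usage of the different approaches on the different datasets. UBFPM always uses less memory than the other algorithms, because those algorithms must reserve a very large amount of memory to store candidate itemsets during execution, while UBFPM does not. Figures 5(a) and (b) show memory consumption on the kosarak and retail datasets, respectively. Figure 5(a) indicates that when the minimum utility increases from 1000000 to 3000000, memory usage gradually decreases. Memory usage also decreases for the other approaches, but UBFPM frees more memory space for execution. Figure 5(b) shows memory usage for the retail dataset: when the minimum utility increases from 5000 to 25000, memory usage decreases.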

Discussion
The above experiments show that the proposed UBFPM approach outperforms the current state-of-the-art algorithms. To mine interesting patterns, almost all existing algorithms first generate candidate itemsets and then compute the exact utility of each candidate to identify interesting patterns. The UBFPM approach does not generate candidate sets, as it stores only the postfix sequences of services. The experimental results show that UBFPM extracts frequent patterns faster than the state-of-the-art approaches [5], [22], [23], [31]. Figures 3, 4 and 5 show that UBFPM reduces both execution time and memory usage. The number of frequent patterns generated by the proposed approach was clearly smaller than that of the existing algorithms, mainly because the maximum utility value in a service sequence is a more suitable upper bound for any subsequence of that sequence. Mobile devices and services are becoming increasingly popular for various applications, and effective data mining techniques that meet users' requirements for both reliability and timeliness on resource-limited mobile devices remain a crucial challenge. Regarding reliability, our proposed approach is

Conclusion
In this paper, we propose an efficient approach, named UBFPM, for extracting frequent service patterns using the preference of a service as its utility. We also propose an algorithm based on postfix sequence generation over service sequences. More accurate upper bounds are also computed to enhance the filtering of service sequences. The proposed approach can discover frequent FSUBP patterns and sequential FSP patterns of service sequences. These patterns are useful for mobile web service users and business analysts: if a service provider knows the frequent patterns of sequences beforehand, it can make better-informed decisions to enhance its business. The experimental results show that UBFPM outperforms previously developed approaches, and the approach can simplify the prediction and maintenance of mobile web services. In future work, we will attempt to handle the dynamic maintenance of utility-based sequential patterns when sequences are modified.

Figure 1: Mobile web service sequence generation scenario

Figure 2: The utility-value distribution in the synthetic dataset

Figure 3: Total execution time (a, b) and number of FSUBP patterns (c, d) on the synthetic dataset

Figure 4: Execution time comparison on real datasets

Figure 5: Memory usage on the (a) kosarak and (b) retail datasets

Table 1: Service sequence dataset

Table 3: SMU values of sequences
After obtaining SMU and TSMU, we generate the subsequence-1 SUB and USV values of all services, which are shown in

Algorithm 1: Algorithm of generating utility frequent sequential patterns
Input: Web service sequence database D, utility values and min_util threshold