Differences in Buyer Journey between High- and Low-Value Customers of E-Commerce Business

Knowledge of high-value customers makes it possible to take decisions that ensure the profitability of the company. By analyzing and optimizing the buyer's journey, companies can better understand their customers and optimize marketing costs in a way that generates a higher return on investment. The primary objective of this paper is to define the current state of multichannel attribution and, based on the literature, to study and analyze data on the buyer's journey of high- and low-value customers of a selected e-commerce business. To accomplish this objective, we retrieved top conversion paths from Google Merchandise Store, an e-commerce website selling Google-branded goods, and analyzed them using Markov chains and heuristic models. We found a difference between high- and low-value customers in the marketing channels through which they were acquired before the purchase. Moreover, the journeys of high-value customers consist of more interactions than those of low-value customers.


Introduction
Approximately 96% of website visitors are not ready to purchase a product during their first website visit (Site 4). Instead, between the first visit and the conversion (purchase), visitors move through a process called the buyer's journey. The buyer's journey is the sequence of steps a customer takes when moving from the awareness stage through the decision-making phase to the purchase phase [17]. Buyer's journey mapping models all interactions of the customer with the brand in order to improve them, which should increase sales and customer satisfaction [23]. Thanks to progress in digital advertising and technological innovation, companies can track the customer's digital footprint at a granular level and thereby learn about the customer's behavior. Moreover, they can measure the impact on conversions of exposing the customer to particular marketing channels [11].
Companies usually do not rely on a single marketing channel to acquire customers. Several marketing channels work together to accomplish the company's goals, and a measure of importance should be assigned to each of them. Attribution modeling is a set of rules by which the credit for a conversion or purchase is assigned to particular marketing channels [5], [20]. In our previous research [10], we defined a problem connected to evaluating the utility of marketing channels in a sales cycle. Even when surveyed, customers often find it difficult to determine which channels they interacted with along their buyer's journey. Attribution models can address this issue because each touchpoint between the customer and the company is evaluated. Szulc et al. (Site 8) claim that attribution modeling helps to optimize the allocation of the marketing budget, supports marketing budgeting, enables more precise planning of marketing campaigns, improves the accuracy of cost-per-acquisition calculations and helps to optimize payments to affiliate partners. The latest study by Econsultancy [8] found that 66% of client-side marketers and 70% of agencies are ambivalent about their attribution models. This number has risen by 62% on a year-to-year basis.
Current web analytics tools (such as Google Analytics) offer several heuristic models for determining the merit of each marketing channel. They are called heuristic because they follow strict rules when assigning credit to marketing channels [5], [12], [20]:
• Last-click (100% of the credit is assigned to the last channel before the conversion),
• First-click (100% of the credit is assigned to the first channel the customer interacted with),
• Linear model (an equal amount of credit is assigned to all channels the customer interacted with during the journey),
• Time-decay (the highest value is assigned to the last channel or campaign, and the assigned value decreases towards the first channel),
• Position-based,
• Custom model.
Barajas et al. [4], Anderl et al. [1], Sterne [27], Abhishek et al. (Site 1) and Bryl (Site 3) report that heuristic attribution models are not suitable for attribution purposes. Barajas et al. [4] claim that heuristic models assign a value to each displayed and converting channel but ignore how the user would hypothetically have behaved without being exposed to the advertisement. Abhishek et al. (Site 1) state that heuristic models are not data-driven. Anderl et al. [1] point out that heuristic models are not accurate, while more sophisticated attribution approaches have found their place in managerial practice. Sterne [27] argues that heuristic models are artificial and based on conjecture, and that their use in analytics is therefore not accurate. Schultz and Dellnitz [26] claim that heuristic models are not objective because of their limited predictive accuracy, especially when single-touch models (first touch or last touch) are used. Bryl (Site 3) claims that heuristic models are problematic because there are so many of them that choosing the one that suits the company's data requires a managerial decision.
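The rule-based character of the heuristic models listed above can be illustrated with a short sketch. It is not taken from Google Analytics or any cited tool: the function name `attribute`, the model labels and the step-based `half_life` parameter of the time-decay rule are assumptions made for illustration only (analytics tools typically decay by elapsed time rather than by step). Position-based and custom models are omitted because the text does not specify their weights.

```python
from collections import defaultdict


def attribute(path, model="linear", half_life=2.0):
    """Split the credit for one conversion (1.0) across the channels of a path.

    `path` is the ordered list of channels before the purchase.
    `half_life` is a hypothetical parameter: the number of steps after
    which the time-decay weight halves.
    """
    n = len(path)
    if n == 0:
        return {}
    if model == "last_click":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "first_click":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "linear":
        weights = [1.0 / n] * n
    elif model == "time_decay":
        # The last touchpoint gets the largest raw weight; earlier ones decay.
        raw = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
        total = sum(raw)
        weights = [w / total for w in raw]
    else:
        raise ValueError(f"unknown model: {model}")

    credit = defaultdict(float)
    for channel, weight in zip(path, weights):
        credit[channel] += weight  # a channel seen twice accumulates credit
    return dict(credit)


path = ["Social network", "Paid search", "Organic search", "Direct", "Direct"]
for name in ("last_click", "first_click", "linear", "time_decay"):
    print(name, attribute(path, name))
```

Whatever the data look like, each rule fixes the weights in advance, which is the property the cited authors criticize.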
Several studies have offered more data-driven approaches to attribution to overcome the weaknesses of heuristic models. Yadagiri et al. [31] and Nissar and Yeung [20] use the Shapley value, a game-theoretic concept, in their nonparametric approach to attribution. In his thesis, Rentola [22] used two models: binary logistic regression to classify customers as converters or non-converters (purchasers/non-purchasers), and a logistic regression model with bootstrap aggregation. Shao and Li [24] used bagged logistic regression and a probabilistic model. Li and Kannan [17] used a hierarchical Bayesian model. Geyik et al. [11] developed their attribution algorithm MTA to solve two problems: spending capability calculation for a sub-campaign and return-on-investment calculation for a sub-campaign (more in [11]). Wooff and Anderson [30] offer an attribution mechanism based on appropriate time-weighting of clicks using sequential analysis. A hidden Markov model was used in the studies conducted by Abhishek et al. (Site 1), and Markov chain models developed by Berger and Nasr [6] are appropriate for modeling customer relationships with people or organizations that have already purchased from the firm. Blattberg and Deighton [7] formulated a model based on Markov chains for customer lifetime value to support decision-making on the optimal balance between customer acquisition and retention spending. Pfeifer and Carraway [21] identified and demonstrated flexibility and a probabilistic character as advantages of Markov chain models for modeling customer relationships. EsmaeiliGookeh and Tarokh [9] documented the frequent use of Markov chain models in modeling customer lifetime value. Scholz [25] demonstrated the modeling of clickstreams (sequences of click events within exactly one session of an online store user) using Markov chains. In our opinion, the most advanced model is provided by Zheng et al. [32]. It is a three-stage choice model of a customer selecting a company within a whole industry.
However, collecting the data required by the model might be challenging, as it uses data that are not commonly available to every company. Data collection might therefore take so long that the results are no longer relevant by the time the analysis is complete (this also applies to Zheng's study, which uses older data). The behavior of online customers changes quickly, so nearly real-time analysis is necessary for the results to remain relevant.
There are many approaches to attribution. We favor the Markov chain model proposed by Anderl et al. [1], [2], [3], which is discussed in the following parts of our study. Anderl et al. [1] and the follow-up study by Anderl et al. [3] use a higher-order Markov chain model to attribute value to marketing channels. They propose that, for practical reasons, the third order is the most efficient for calculating the contribution of particular marketing channels. Anderl et al. [1] further report that a Markov model meets the following criteria: objectivity, predictive accuracy, robustness, interpretability, versatility and algorithmic efficiency (details in [1]). Anderl et al. [3] also state that heuristic models undervalue display advertising, pay-per-click campaigns, social media and e-mail activities, whereas Markov chains distribute value across channels more evenly. Based on these criteria, we selected Markov chains as a suitable method for our analysis. Note that the objective of this study is not to propose a new attribution model. We also selected this method based on the following criteria:
• The export of buyers' journeys (consisting of the marketing channels that customers used to come to the website before the purchase) is among the standard features of Google Analytics Free. This means that our analysis can be reproduced by a company of any size and budget.
• Attribution analysis using Markov chains can be easily executed in The R Project software (Site 7) with a couple of lines of code using the package ChannelAttribution (Site 2). The package also allows users to compare the results with the standard heuristic models. The data exported from Google Analytics almost exactly matches the structure supported by this package.
Our goal is to determine the differences between high- and low-value customers (transactions with high and low order value). The contribution of our paper to the existing literature does not lie in a new attribution method. According to our literature review, no study has analyzed the customer journeys of customers with high- and low-value transactions using Markov chains. Moreover, the literature review showed that, for marketing attribution, the Markov chain is the most suitable method given the available data sources and tools. Building on the contributions of other authors, we adopted this method to examine the problem we set. The contribution of our study therefore lies in findings on how the buyer's journey changes depending on whether the customer makes a transaction with a high or a low purchase amount. In particular, we reveal differences in how various marketing channels (social media, organic search, paid search and others) are used before the purchase, as well as changes in the length of the buyer's journey. So far, we have not been able to find a study that addresses a similar issue; the contribution therefore also lies in opening a new topic, with the objective of motivating other authors to extend the knowledge in this area. As Sterne [27] claims, the ability to distinguish these two types of customers is crucial for the profitability of the company. Low-value customers might produce a negative customer lifetime value through possible returns of goods, above-standard use of customer service and other costs. This might result in a higher cost of achieving the desired level of customer satisfaction without hurting the customer experience and the company's word-of-mouth. When looking at customer lifetime value, one might argue that high-value customers are not necessarily those who spend a lot in a single purchase. High-value customers might add their incremental value through small regular purchases. We accept this claim and consider it one of the limitations of our study. On the other hand, Barker [5] discusses the situation in which e-commerce customers spend less on their first purchase (as a test of the store's quality) and, once the store proves to be trustworthy, are open to making higher-value purchases. We agree with this point and consider the high-value purchases in our data to be proof of the store's quality and, therefore, of its acceptance by customers who can bring high value in terms of profit to the company. Another reason for separating the behavior of low- and high-value customers is the primary goal of a business, which is to generate a profit, and therefore to spend the marketing budget on customers with the highest possible value for the company.

Markov Chain and its Use for Attribution Modeling
A Markov chain is a probabilistic model that represents the dependences between successive observations of a random variable. A Markov model is a stochastic process. The problem of finding the optimal path in a stochastic process has long been studied in graph theory [18]. Consider the set of states S = {s1, s2, ..., sr}. The state at time t is denoted Xt. The process starts in one of the states from S and moves from one state to another; each such transition is called a step. A particular sequence of states X1, X2, ..., Xt is called a trajectory. If the chain is in state si, then in the next step it moves to state sj with probability pij, and this probability does not depend on the states the chain was in before the current one. The probabilities pij are called transition probabilities. The process can also stay in the current state, which occurs with probability pii. An initial probability distribution defined on S specifies the starting state. In a Markov chain, the initial state is usually a particular state [11]. In a Markov chain, the next state depends solely on the current state, and previous states are not taken into account. This property defines a first-order Markov chain [22]. Anderl et al. [1] suggest using the first or the second order for attribution because these models can be easily compared to heuristic models. For our study, the set of states is represented by the available marketing channels.
The trajectory describes the sequence of marketing channels the customer is in touch with during the buyer's journey.
The transition probability is the probability of a customer using a given marketing channel in the next step of the journey.
A Markov transition matrix is also given. The transition matrix is a square matrix (all possible states appear in both the rows and the columns) that contains the transition probabilities from one state to another [13]. For our study, the transition matrix gives the probabilities of the customer moving from one marketing channel to another. The transition matrix can be expressed mathematically as follows:

$$P = (p_{ij}), \qquad p_{ij} = P(X_{t+1} = s_j \mid X_t = s_i), \qquad 0 \le p_{ij} \le 1, \qquad \sum_{j=1}^{r} p_{ij} = 1,$$

where i, j ∈ {1, 2, …, r}, each transition probability lies between 0 and 1, and the probabilities in each row sum to 1 [1]. The transition matrix can be displayed as a transition diagram. The transition diagram is a graphical representation of the Markov chain and is equivalent to the transition matrix. Each node represents a possible state (marketing channel), and the edges represent transition probabilities [14]. Anderl [2] suggests using a removal effect for attribution modeling. The removal effect is based on the probability of reaching the conversion from the start state when one of the states (si) is removed from the model. As the removal effect reflects the change in conversion rate when the given state si is removed, the value (or importance) of the given marketing channel can be determined. If N conversions are generated without a particular channel, comparing N with the number of conversions in the full model shows how much the removed channel changes the total number of conversions (Site 3). The Markov chain described in this section defines the methodical framework used in the analysis conducted in the following parts of the study.
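The following sketch estimates first-order transition probabilities from converting paths and derives removal effects from them. It is an illustration under stated assumptions, not the implementation used by the ChannelAttribution package: the function names are hypothetical, the toy paths are invented, and transitions into a removed channel are treated as lost journeys (the journey never reaches the conversion), which is one common way to operationalize the removal effect.

```python
from collections import defaultdict

START, CONVERSION = "Start", "Conversion"


def transition_probs(paths):
    """Estimate p_ij from (channel list, occurrences) pairs."""
    counts = defaultdict(lambda: defaultdict(float))
    for channels, occurrences in paths:
        states = [START, *channels, CONVERSION]
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += occurrences
    # Normalize each row so that its probabilities sum to 1.
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def conversion_prob(probs, removed=None, tol=1e-12, max_iter=10_000):
    """Probability of reaching Conversion from Start (iterative solution).

    Assumption: any transition into `removed` ends the journey unconverted.
    """
    value = defaultdict(float)
    value[CONVERSION] = 1.0
    for _ in range(max_iter):
        delta = 0.0
        for state, row in probs.items():
            if state == removed:
                continue
            new = sum(p * value[nxt] for nxt, p in row.items() if nxt != removed)
            delta = max(delta, abs(new - value[state]))
            value[state] = new
        if delta < tol:
            break
    return value[START]


def removal_effects(paths):
    """Relative drop in conversion probability when each channel is removed."""
    probs = transition_probs(paths)
    full = conversion_prob(probs)
    return {
        channel: 1.0 - conversion_prob(probs, removed=channel) / full
        for channel in probs
        if channel != START
    }


def attribution_shares(paths):
    """Normalize removal effects so that the shares sum to 1."""
    effects = removal_effects(paths)
    total = sum(effects.values())
    return {channel: effect / total for channel, effect in effects.items()}


# Invented toy data: (path, number of occurrences)
toy_paths = [(["Referral", "Direct"], 3), (["Direct"], 1)]
print(transition_probs(toy_paths))
print(removal_effects(toy_paths))
print(attribution_shares(toy_paths))
```

Multiplying each share by the total number of conversions (or by total revenue) gives the credit attributed to each channel.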

Objectives and Methods
The primary objective of this paper is to define the current state of multichannel attribution and, based on the literature study, to analyze data on the buyer's journey of high- and low-value customers of a selected e-commerce business. By decomposing the primary objective, we set partial objectives. The first partial objective is to determine the current state of the use of attribution modeling. The second partial objective is to analyze the multichannel paths of a selected company using Markov chains and heuristic models, and to compare the results describing differences in the buyers' journeys. Based on the primary objective of the paper, on the necessity to distinguish between low- and high-value customers discussed by Sterne [27], and on the claims made by Barker [5], we developed the following working hypothesis:
H1: There is a difference in the buyer's journey taken by high- and low-value customers in terms of the marketing channels used when conducting product research and making a purchase.
To achieve the objectives mentioned above and to test the hypothesis, we analyzed data from Google's e-commerce website, Google Merchandise Store. Google Merchandise Store sells apparel, bags, drinkware, electronics, accessories, and other items branded by Google or related to Google's products (Site 6). The e-commerce data was retrieved from the web analytics platform Google Analytics. Data about the transactions of Google Merchandise Store are freely available in the Demo Account provided by the Google Analytics platform. Top conversion paths were analyzed using both heuristic models and Markov chains as defined in the previous section of the study. In addition, elements of descriptive statistics were used (table, bar chart, pie chart, radar chart). The data was analyzed using The R Project for Statistical Computing (Site 7), Google Analytics (Site 5) and MS Excel.

Analysis of the Buyers' Journeys
As mentioned in the previous sections, the data for Google Merchandise Store was exported from Google Analytics for the date range from 1 August 2016 to 31 March 2017. We analyzed 16,330 transactions representing purchases on the company's e-commerce website. The data set was structured as follows:
1. The path consisting of the channels used to visit the website before the conversion, i.e. the interactions (e.g. Social media > Paid search > Organic search > Direct > Direct).
2. The number of path occurrences (how many times exactly the same path appeared).
3. The value (in USD) generated by customers who took a particular path.
Kim et al. [15] and Kumar et al. [16] consider customer lifetime value to be a value driver for a company, as it sums up all purchases made by a customer during the relationship with the company. We were not interested in this view for the buyer's journey analysis. Instead, we focused on the amount spent in a single transaction (purchase). During a customer's relationship with a company, there might be both higher- and lower-value purchases. However, we assume that even if the customer has purchased from the company before, the nuances preceding a high-value order differ from those preceding a low-value order (as a low-value order does not require as much decision-making as a high-value one). As our goal is to understand these nuances, and as we are aware of the effect of repeat purchases, we selected, from all the data we had at our disposal, the company data set with the lowest share of repeat purchasers. The data from Google Merchandise Store contained less than 25% purchases from repeat customers, so the previous relationship with the company has a less significant impact on decision-making before the purchase. However, we expect repeat purchasers to affect the buyers' journeys in one significant way: a previous purchase might eliminate the decision-making and therefore shorten the buyer's journey itself. In other words, the buyer's journey might consist of fewer interactions (website visits) before the purchase.
To separate customers into two groups (high-value and low-value, based on the amount spent during a single transaction), we divided transaction value into quantiles. The results are presented in Table 1. As there was a significant difference in value between the 80% quantile and the 100% quantile, we decomposed this range into another five quantiles to analyze transaction value at a more granular level. There is a dramatic increase in transaction value between the 99% quantile and the 100% quantile. For the purpose of our analysis, we consider low-value customers to be those between the 0% and 40% quantiles (0.01-110.06 USD) and high-value customers to be those between the 80% and 100% quantiles (238-17,855.5 USD). In the case of high-value customers (transactions), we applied the 80:20 Pareto principle [29] and extracted the 20% of transactions that account for approximately 80% of revenue. We did not use the data in the quantiles between 41% and 80% in order to truly distinguish between the value generated by the two groups. We also wanted to keep the numbers of high- and low-value transactions used for the analysis at least similar. We used this approach instead of separating customers by average order value in order to avoid small differences between the most profitable low-value customers and the least profitable high-value customers. After dividing transactions into the groups mentioned above, we exported top conversion path data for both groups of customers. The high-value category included 2025 transactions with a path length of 2 or more interactions (the number of marketing channels used before the purchase). The low-value category included 3503 transactions. Google Analytics was set so that the conversion journey covers no more than 30 days before the transaction. From the exported data, we were able to determine the following set of states: S = {Direct, Organic search, Paid search, Referral, Social network, Display advertising, Other advertising}. First of all, it is vital to define the meaning of particular marketing channels:
• Direct - a user typed the URL address of the store into the browser, used a bookmark or visited the website from a mobile application (e.g. the Facebook mobile application).
• Referral - a user landed on the website through an outbound link on a previously visited website.
Following the package ChannelAttribution (Site 2), we added two additional states to the set of states: Start (representing the start of the buyer's journey) and Conversion (representing the purchase). Thus, the final set of states was as follows: S = {Start, Conversion, Direct, Organic search, Paid search, Referral, Social network, Display advertising, Other advertising}. We did not include the Null state (journeys without purchases), as Google Analytics does not track these paths, and implementing such tracking and collecting the data would take additional time.
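The quantile-based split into low- and high-value transactions described earlier in this section can be sketched as follows. This is a minimal illustration, assuming one transaction value per list element; the function name `split_by_value` and the sample values are hypothetical, and the study itself performed this step with R, Google Analytics and MS Excel. The sketch also reports the revenue share of the high-value group, so the Pareto assumption can be checked against actual data.

```python
from statistics import quantiles


def split_by_value(values, low_q=40, high_q=80):
    """Split transaction values at the low_q and high_q percentiles.

    Returns the low-value group (at or below the low_q percentile), the
    high-value group (at or above the high_q percentile), the two cut
    points, and the revenue share of the high-value group. Transactions
    in between are left out, as in the approach described above.
    """
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cuts
    low_cut, high_cut = cuts[low_q - 1], cuts[high_q - 1]
    low = [v for v in values if v <= low_cut]
    high = [v for v in values if v >= high_cut]
    revenue_share_high = sum(high) / sum(values)
    return low, high, (low_cut, high_cut), revenue_share_high


# Invented sample values in USD, not data from the study
sample = [12.5, 30.0, 45.9, 80.0, 110.0, 150.0, 199.0, 240.0, 560.0, 3200.0]
low, high, cuts, share = split_by_value(sample)
print(cuts, len(low), len(high), round(share, 3))
```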
After splitting customers into groups based on their value, we used heuristic models and the Markov chain model to find out how credit is assigned to particular marketing channels. The results are presented in Figure 1 and Figure 2. Comparing the two figures, it is clear that all attribution models assign the majority of credit to direct traffic. The second most frequent driver of conversions is referral traffic, especially in the awareness phase (as can be seen from the first-touch model). Differences can be spotted when comparing the credit for organic and paid search: more credit is assigned to these channels for low-value customers. As we work with the assumption that during the first purchase the customer buys only a few goods to try the seller [5], it is natural that users perform more searches during the initial research and evaluation of a potential seller. In both cases, other channels are not significant drivers of conversions. Here we can see that the amount of money spent on a purchase might affect which information sources customers use when looking for information about the product they intend to purchase. As presented in the theoretical background of this paper, heuristic models have certain limitations. For this reason, we will not use them for a deeper analysis of customers. Instead, we use Markov chains and their particular features (transition matrix, transition graph) to understand both types of customers in detail. Table 2 and Table 3 provide a comparison of the transition matrices (the probabilities of moving from one state to another) of high- and low-value customers. Comparing the transition probabilities in these matrices reveals several differences. First of all, high-value customers are more likely to convert (make a purchase) when visiting the website from a social network or a referral source.
Assuming that high-value customers might be repeat customers, we consider it natural that these customers trust a particular brand more across all marketing channels before the purchase itself. For low-value customers, the most probable transition (apart from the purchase) is from any source to a direct visit. Low-value customers show a higher level of engagement via this channel than high-value ones. Low-value customers are also more likely to use a search engine as their next step (marketing channel). When analyzing the differences between the groups of customers, we also looked at the length of the buyer's journey itself (understood as the number of interactions with marketing channels, i.e. website visits, between transactions). The median number of interactions was slightly higher for high-value customers (6 vs. 5). The same holds for the maximum number of interactions (49 vs. 36). The boxplot in Figure 5 provides a visual comparison of the two groups. The interquartile range is also higher for high-value customers. This finding suggests that high spenders probably want to be more certain about their purchases, so they visit the website multiple times before the transaction itself.
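The comparison of journey lengths can be reproduced from the exported paths with a few lines of code. The sketch below is an assumption-laden illustration: it presumes paths in the "Channel > Channel" text form shown in the data description, weights each path by its number of occurrences, and uses hypothetical function names and invented sample data; it does not reproduce the figures reported above.

```python
from statistics import median, quantiles


def path_lengths(paths):
    """Expand (path string, occurrences) pairs into one length per journey."""
    lengths = []
    for path, occurrences in paths:
        n_interactions = len(path.split(" > "))
        lengths.extend([n_interactions] * occurrences)
    return lengths


def summarize(lengths):
    """Median, maximum and interquartile range of journey lengths."""
    q1, _, q3 = quantiles(lengths, n=4, method="inclusive")
    return {"median": median(lengths), "max": max(lengths), "iqr": q3 - q1}


# Invented sample data, not taken from the study
sample = [
    ("Direct > Direct", 3),
    ("Social network > Paid search > Organic search > Direct > Direct", 1),
]
print(summarize(path_lengths(sample)))
```

Running the summary separately for the high-value and the low-value export yields the quantities compared in the boxplot.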

Based on the results of our analysis, we can conclude that there is a difference (although slight) in the buyer's journey between high-value and low-value customers, and thus we can accept our working hypothesis H1.
Using Markov chains, we were able to confirm almost all of the criteria listed by Anderl et al. [1], with the exception of predictive accuracy, as we did not use any predictive modeling (e.g. the ROC curves discussed by Anderl et al. [1]). However, we were able to confirm objectivity in comparison with heuristic models (as these are not data-driven). We also met the criterion of robustness, as we obtained similar results each time we ran the analysis. The criterion of interpretability was also met, as we were able to draw conclusions from the results. Versatility and algorithmic efficiency were confirmed when we ran the algorithm on two datasets: we were able to use the method on multiple datasets, and we obtained the results in a short amount of time.
Limitations of the study:
• We were not able to segment customers and distinguish between first-time purchasers and repeat customers. The results might be affected, as the behavior of these two groups is probably different (as mentioned above).
• The first-order Markov chain assumes that the probability of moving to the next step depends only on the current position. However, there are certainly customers who remember their previous touchpoints with the company. These customers can be captured and evaluated by a higher-order Markov chain (one that looks back more steps). The choice of the right Markov order will be the focus of our future studies.
• There is no Null state. As we omitted the journeys without a purchase, the model was not fully optimized for accuracy. On the other hand, journeys that have not ended in a conversion yet might end in a conversion in the future.
• We took into account only touchpoints resulting in a visit to the company's website. Some of the customers to whom an advertisement is displayed do not click on it. However, these customers might remember the advertisement and later search for the company or visit the website directly.
• For the buyer's journey, we analyzed only data containing touchpoints made by customers within 30 days before the conversion. However, there might be buyers' journeys that are longer than 30 days.
• As mentioned in the description of the states, some of the states might also represent other states (e.g. the direct state might consist of visits from mobile apps, bookmarks and possibly offline advertisement). This lowers the accuracy of the attribution modeling.
• We were not able to verify whether the results of our study are right, as we have no power to change processes in Google Merchandise Store. However, the results will be presented to the stakeholders to point out the detected differences between customers.

Conclusions
For companies, it is valuable to know how to acquire high-value customers in order to prioritize activities that bring more financial output to the company.The analysis of the buyer's journey is one of the steps when mining this insight.Attribution modeling allows companies to determine which marketing channels are capable of acquiring customers to fine-tune the buyer's journey and optimize marketing cost.The primary objective of this paper was to define the current state of multichannel attribution and, based on the literature, study and analyze the data regarding the

Figure 1: Conversion attribution in high-value transactions (our processing)

Figure 3 and Figure 4 provide a visual representation of the transition matrices. For low-value customers (provided we accept the assumption of [5] mentioned earlier in the paper), a company should adjust more marketing channels to acquire these customers.

Figure 5: Comparison of number of interactions between high- and low-value clients (our processing)

Table 1: Transaction value divided into quantiles