A semantic query approach to personalized e-Catalogs service system
Donglin Chen, Xiaofei Li, Yueling Liang, Jun Zhang

With the emergence of e-Catalogs, commodity queries in distributed environments have become increasingly common in e-commerce. However, e-Catalogs are often autonomous and heterogeneous, so integrating and querying them effectively is a delicate and time-consuming task. An electronic catalog contains rich semantics associated with products, which makes it a challenging domain for ontology application. Ontology is concerned with the nature and relations of being, and it can play a crucial role in e-commerce as a formalization of e-Catalogs. A user personalized catalog ontology aims to capture a user's interests in a working domain, and it forms the basis for providing personalized e-Catalog services.
 
In this paper, we present an ontological model of e-Catalogs and design a semantic personalized e-Catalog service system (SPECSS), which matches the user personalized catalog ontology (UPCO) against the domain e-Catalog ontology (DECO) on the basis of ontology integration. We focus on four key technologies: generation of the user personalized catalog ontology, construction of the domain and local e-Catalog ontologies, semantic match between UPCO and DECO, and an e-Catalog semantic query system over heterogeneous catalog databases.


Manual definition based on the classification standards
Martin Hepp proposed a method for generating a domain catalog ontology in [13]: he published an OWL-based eCl@ss domain catalog ontology and used the parent-child structure of the classification standard to obtain the is-kind relation between the concepts of the catalog ontology. Hyunja Lee [14] manually enriched the semantics of an e-Catalog ontology by adding product properties and units of measure (UOMs), identified products, classification schemes, properties and UOMs as the key concepts, and in [2] developed a framework in which the various types of relationships in a product ontology are exploited for score propagation. At least five deficiencies remain in this line of work:


(1) The extraction of e-Catalog ontologies from distributed enterprise product catalogs (databases) has not been considered;

(2) The syntactic and semantic integration problems of distributed e-Catalogs have not been studied;

(3) The integration of international classification standards has not yet been researched;

(4) The domain e-Catalog ontology is defined manually and includes product properties, but the extraction of individuals is not addressed;

(5) The domain e-Catalog ontology service supports customer queries only and does not consider personalized e-Catalog services.

[15] designed IPIS, a retrieval service system based on metadata semantics. However, this method is not suited to the dynamically changing product information of e-commerce scenarios.

Automatic or semi-automatic generation methods of catalog ontologies
These methods extract a local catalog ontology from enterprise product information systems (such as ERP) or from Web site information. Obrst, Wray and Liu [16] discuss the main challenges of building and aligning ontologies for products and services in B2B e-commerce scenarios. [17] designed DOME, an architecture for enterprise e-Catalog ontology management, and proposed building standardized local e-Catalog ontologies from enterprise information systems by mapping them to the UNSPSC classification standard. This method establishes only a centralized e-Catalog structure; it omits the syntactic and semantic integration of distributed e-Catalogs and the integration of international classification standards.

E-Catalog Ontology Integration
E-Catalog integration comprises syntactic integration and semantic integration. We need to integrate international product classification standards with enterprise e-Catalogs or product classification databases. There are more than 25 e-Catalog classification standards in the world. [18] provided a catalog management method for the various classification standards. XML-based standard integration can resolve syntactic and structural heterogeneity; for example, CEN/ISSS (European Committee for Standardization/Information Society Standardization System) proposed an integration of e-Catalog standards [3]. Corcho and Gómez-Pérez [19] also show how multiple standards for classifying products and services can be integrated using ontological mappings, and sketch a prototype implementation based on the WebODE platform. Catalog ontologies based on different international product classification standards are heterogeneous [20], so the key to e-Catalog semantic integration is the integration of catalog ontologies. There are currently three methods of ontology integration [21]: (1) schema integration based on ontology properties; (2) integration of ontology concepts; and (3) multi-layer structural integration of ontologies (individuals, properties, concepts), for example the semantic similarity product match algorithms proposed by our research group, which are based on ontology concepts in combination with properties [22], [23]. E-Catalog integration can draw on [24]. [25] applied the mapping of ontology concepts and properties to integrate the Global Data Synchronization Network (GDSN) and EPCglobal catalog ontologies.

Personalized e-Catalog Services
For unstructured data, a keyword search engine such as Google.com or Baidu.com can do a very valuable job. However, this technique does not utilize the semantics available in structured data. Moreover, it has many problems with syntax, even though the semantics of typed e-Catalogs are clear; ranked keyword search such as XRANK [10], for example, would cause many problems. Parametric search, which also exploits structured and typed data, aims to find the right alternatives when there is no perfect match [11]: iteratively, the user can soften or skip some search conditions. The most problematic deficit of this technology is that there is no deterministic way, indeed no confirmed way at all, to find the best alternative. The user never knows when it is best to terminate the search, or with which result. In short, traditional keyword-based retrieval cannot satisfy massive, heterogeneous, personalized catalog services. [26] introduced metasearch engines, but this method is a passive service. [27] provided an intelligent catalog recommendation method that maps customer requirements to product categories. [28] and [29] researched personalized catalog ontology services: [28] proposed a personalized e-Catalog model based on customer interests, and [29] presented WebCatalog, a personalized catalog service community. [30] designed an enterprise e-Catalog based on customer behavior. The knowledge representation and acquisition of the client catalog thus become the key problems. To address them, a K-clustering algorithm and an e-Catalog segmentation approach are described in [31], and [32] described a customer segmentation method based on brand, product and price. In [33], the author studied personalized catalog services for one-to-one marketing using association rules and CART.
In recent years, personalized ontologies (also known as private ontologies, as in [27]) have been introduced into e-Catalog services. Peter Haase put forward a theory of personalized ontology learning based on user access and interest coordination [34]. In distributed systems, domain ontologies and personalized knowledge ontologies share concepts [35]. Applying personalized ontologies to personalized e-Catalog services therefore has important theoretical and practical significance.
To meet this requirement, SPECSS focuses on personalized catalog services and e-Catalog ontology construction in order to provide semantic e-Catalog queries. We build the user personalized catalog ontology from consumers' behavior and match it with the domain e-Catalog ontology through a semantic match model. The match result set is then used by the semantic query system.

Methods of Building E-Catalog Ontology
This study lays a foundation for personalized e-Catalog services by constructing two catalog ontologies. One is the user personalized catalog ontology, which provides users' personal information and preferences; the other is the domain e-Catalog ontology, which provides a standard e-Catalog knowledge base.

User Personalized Catalog Ontology
To satisfy customers' personalized requirements, we need more information about the customers. Constructing a domain e-Catalog ontology from a semantic dictionary and international classification standards (such as eCl@ss and UNSPSC) is not enough on its own. Customers sometimes cannot describe what they have in mind; to understand their latent intentions, we need a user e-Catalog ontology. Based on consumer behavior, we propose a personalized approach to building the user personalized catalog ontology (UPCO).
First, build the user ontology backbone (UOB) from users' personal information and preferences;
Second, use the catalog information extracting module to extract user catalog information from the user's purchase history, search keywords, browsed catalogs and feedback, and from expert recommendations;
Third, use the web resource semantic processing module to process web resources according to the user catalog ontology information, for example by classifying and formatting them, so that personalized services can be provided conveniently.
From the personalized user ontology we can export personalized catalog information, which can be sorted and used to classify users' interests. The process is described in Figure 1.
In the SPECSS project, we establish a UPCO that is general across application domains. To apply it to a specific area, we build the corresponding catalog ontologies for that field according to the general UPCO framework and combine users' personalized requests with e-Catalog semantic queries. This raises the representation of users' interests from the keyword level to the knowledge level and links keywords by their semantics, because the UPCO describes users' interests from the customers' point of view and is closer to what users really think. SPECSS organizes a group of keywords expressing users' interests through the UPCO. When users submit a semantic query, it is no longer a simple keyword match: the query takes users' personal preferences and information into account and tightly couples users with products, so that the system can improve the precision and recall of semantic queries and can sort the query results more easily.
Figure 2 shows the user personalized catalog ontology framework, in which we describe user information, user preferences, and the product concepts, properties and individuals that users are interested in, including product area, brand and quality authentication. Users are associated with products through the property hasPreference, for which we reserve a weight interface that indicates how much attention a user pays to the different properties of a product, as shown in Figure 3.

Domain e-Catalog Ontology
Generation of the domain e-Catalog ontology (DECO): the DECO is generated in four steps: ① extract the core concepts and properties of the domain e-Catalog ontology according to the UNSPSC and eCl@ss standards, WordNet and the semantic catalog dictionary; ② construct a DECO model; ③ define the DECO and store it in the catalog warehouse through the user-defined DECO subsystem; ④ obtain the standardized DECO through the e-Catalog ontology pruning subsystem, in combination with WordNet and the semantic catalog dictionary (see Figure 4).


Generation of the local e-Catalog ontology: local e-Catalog ontologies are built from heterogeneous distributed databases. First, we analyze the schema of the catalog database, extract its ER model and, after defining the mapping rules, convert the ER model into an initial local e-Catalog ontology. Then the standardization module standardizes the initial local e-Catalog ontology, as shown in Figure 5.


Integration of the initial domain e-Catalog ontology and the local e-Catalog ontologies to obtain the domain e-Catalog ontology (DECO): we studied the semi-automatic, database schema-based generation of domain catalogs and the Chinese e-Catalogs Semantic Dictionary in [36], so only the framework is given here; in [37], our group built the mapping rules from the relational database schema to the local ontology.
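As an illustration of how a relational schema might be converted into an initial local e-Catalog ontology, the following minimal Python sketch applies three mapping rules. The rules (table to class, foreign key to object property, other column to datatype property), the `Table` type and the sample tables are assumptions made for this example; they are not the rules defined in [37].

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A relational table: column name -> SQL type, plus its foreign keys."""
    name: str
    columns: dict
    foreign_keys: dict = field(default_factory=dict)  # column -> referenced table


def schema_to_ontology(tables):
    """Apply three assumed mapping rules and return an initial local ontology."""
    ontology = {"classes": [], "datatype_properties": [], "object_properties": []}
    for table in tables:
        # Rule 1 (assumed): every table becomes an ontology class.
        ontology["classes"].append(table.name)
        for column, sql_type in table.columns.items():
            if column in table.foreign_keys:
                # Rule 2 (assumed): a foreign key becomes an object property
                # that links the two classes.
                target = table.foreign_keys[column]
                ontology["object_properties"].append((table.name, "has" + target, target))
            else:
                # Rule 3 (assumed): any other column becomes a datatype property.
                ontology["datatype_properties"].append((table.name, column, sql_type))
    return ontology


# Hypothetical catalog schema with two tables.
notebook = Table("Notebook", {"memory_gb": "INTEGER", "brand_id": "INTEGER"}, {"brand_id": "Brand"})
brand = Table("Brand", {"name": "VARCHAR"})
local_ontology = schema_to_ontology([notebook, brand])
```

The resulting dictionary is only an intermediate structure; a standardization step such as the one shown in Figure 5 would still be needed before integration.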

Semantic Match Based on Ontology
One critical step of semantic match is calculating the semantic match degree between the terms of ontology concepts. Many methods for calculating conceptual semantic match in e-commerce scenarios have been proposed [38]. Common calculation methods and models are: (1) the identifier-based method [39], which uses word formation to find the semantic match degree between concepts and primarily reflects the linguistic similarity of the two concepts; (2) the synonym dictionary-based method [40], which organizes all concepts into a tree hierarchy according to a synonym dictionary, so that there is exactly one path between any two nodes, and takes the length of this path as a measure of the semantic distance between the two concepts; (3) the feature match-based model [41], which calculates the semantic match of concepts from their sets of properties; and (4) the semantic relationship-based model [42], also known as the semantic distance-based model, which calculates the semantic match of concepts from hierarchy information and is mainly used within a single ontology.
In this paper, we need to calculate the semantic match between UPCO and DECO. We adopt the identifier-based method to calculate the semantic match of concept names, property feature-based clustering analysis to calculate the semantic match of concept properties, and the property value-based method to calculate the semantic match of concept individuals; finally, we obtain the ontology semantic match result sets by integrating these results. The approach proceeds as follows. First, the semantic match module calculates the concept-based semantic match degree; if the value is no less than the set threshold, the pair becomes candidate 1. Second, the module calculates the property-based semantic match degree, which yields candidate 2 or no candidate; candidate 3 is obtained in the same way from the individual-based semantic match. Finally, the candidates are integrated to obtain the match result sets. Figure 6 shows this process.
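The candidate selection and integration steps can be sketched in Python as follows. The threshold value, the level weights and the use of a weighted mean for the integration step are assumptions chosen for illustration; the text does not specify how the candidates are combined.

```python
def collect_candidates(scores, threshold=0.6):
    """Keep a level's score only if it reaches the threshold (assumed value)."""
    return {level: score for level, score in scores.items() if score >= threshold}


def integrate(candidates, weights):
    """Assumed integration rule: weighted mean over the levels that produced a candidate."""
    if not candidates:
        return 0.0
    total_weight = sum(weights[level] for level in candidates)
    return sum(weights[level] * score for level, score in candidates.items()) / total_weight


# Hypothetical match degrees for one UPCO/DECO concept pair.
scores = {"concept": 0.9, "property": 0.5, "individual": 0.7}
weights = {"concept": 0.5, "property": 0.3, "individual": 0.2}

candidates = collect_candidates(scores)      # the property level falls below the threshold
final_degree = integrate(candidates, weights)
```

In this sketch a level that misses the threshold simply contributes nothing, which mirrors the case in which candidate 2 "does not exist".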

Concept-based Semantic Match
In this paper, we adopt a synonym set-based approach, which makes use of the Semantic Catalog Dictionary, to calculate the semantic match degree of concepts in the UPCO and DECO. If two ontology concepts have the same or similar characters, they usually have the same or similar meanings. However, if the naming rules of the DECO and the UPCO are inconsistent, the characters of semantically identical concepts may be completely different, and the match degree may be zero. Therefore, before the identifier-based calculation of the semantic match of concept names, we semantically expand each ontology concept into a concept set according to its synonym set, taken from the ontology definitions (the Same As relationship) and WordNet. That is, in addition to calculating the semantic match degree of the concepts C1 and C2 themselves, we calculate the semantic match degree of the concepts in their synonym sets.
The algorithm is as follows: calculate the identifier-based semantic match degree of each element ci of the synonym set of C1 and each element cj of the synonym set of C2, and then take the maximum value over all pairs (ci, cj) as the semantic match degree of C1 and C2.
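A minimal Python sketch of this maximum-over-synonym-pairs rule is given below. The identifier-based measure is stood in for by the character-based ratio of `difflib.SequenceMatcher` from the Python standard library, which is an assumption; the synonym table is likewise a made-up example rather than content of the Semantic Catalog Dictionary.

```python
from difflib import SequenceMatcher


def identifier_match(a, b):
    """String similarity in [0, 1]; a stand-in for the identifier-based measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def concept_match(c1, c2, synonyms):
    """Maximum identifier match over the synonym sets of c1 and c2."""
    set1 = {c1} | synonyms.get(c1, set())
    set2 = {c2} | synonyms.get(c2, set())
    return max(identifier_match(a, b) for a in set1 for b in set2)


# Hypothetical synonym table (Same As relations plus dictionary entries).
synonyms = {"Notebook": {"Laptop", "PortableComputer"}}

direct = identifier_match("Notebook", "Laptop")           # names alone look dissimilar
expanded = concept_match("Notebook", "Laptop", synonyms)  # synonym expansion finds the match
```

The example shows why the expansion matters: the two names share few characters, yet the synonym set of one concept contains the other, so the expanded match degree reaches its maximum.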

Property-based Semantic Match
The property-based semantic match method calculates the semantic match degrees of datatype properties and object properties separately, smd(P1, P2) and smo(P1, P2), then sets a weight for each of the two kinds of properties and finally integrates them to obtain the property-based semantic match. This method was developed in our earlier work [21]; its motivation and definition are as follows. The motivation is to take into account both the degree of specificity of the properties, based on the fundamental idea that a property that is used very frequently is generally less specific than a property assigned to only a few e-Catalogs, and the weights given by users in the UPCO. That is, we modify the weight setting method; before doing so, we need to normalize the properties, for which we adopt the k-medoids algorithm, as follows:
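The exact weighting formula is not shown in this text. As a rough illustration of the two ideas described above, the Python sketch below combines smd and smo with weights and derives a specificity weight that decreases as a property is used by more e-Catalogs. The IDF-style form of `specificity`, the default weights and the sample catalogs are all assumptions, not the definition from [21], and the k-medoids normalization step is omitted.

```python
import math


def specificity(property_name, catalogs):
    """Assumed IDF-style weight: properties used by fewer e-Catalogs score higher."""
    used_by = sum(1 for properties in catalogs if property_name in properties)
    return math.log((1 + len(catalogs)) / (1 + used_by)) + 1.0


def property_match(smd, smo, w_datatype=0.5, w_object=0.5):
    """Weighted combination of the datatype and object property match degrees."""
    return w_datatype * smd + w_object * smo


# Hypothetical e-Catalogs, each described by the set of properties it uses.
catalogs = [{"hasBrand", "hasMemory"}, {"hasBrand"}, {"hasBrand", "hasWarranty"}]

common = specificity("hasBrand", catalogs)     # used by every catalog
rare = specificity("hasWarranty", catalogs)    # used by a single catalog
combined = property_match(smd=0.8, smo=0.4, w_datatype=0.75, w_object=0.25)
```

User-given weights from the UPCO could replace or scale `w_datatype` and `w_object`; how the two sources of weight interact is left open here.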

Individual-based Semantic Match
To find the products a user prefers, we need products similar to the user's preferences, that is, we calculate the instance similarity between DECO individuals and UPCO individuals. We calculate the semantic match of individuals by the property value-based method.
The semantic match degree of property values is calculated with the linguistics-based method. The input parameters C1 and C2 are the property values of two products, such as the GlobalBrand values lenov and Lenov_China in Figure 7.
The individual semantic match of the two products is then calculated by comparing the semantic match degrees of several groups of properties.
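The property value-based comparison can be sketched in Python as follows. The use of `difflib.SequenceMatcher` as the linguistic measure and the plain average over shared properties are assumptions made for this example; the hasMemory values are invented, while the GlobalBrand values are the ones mentioned above.

```python
from difflib import SequenceMatcher


def value_match(v1, v2):
    """Linguistic similarity of two property values (assumed: character-based ratio)."""
    return SequenceMatcher(None, str(v1).lower(), str(v2).lower()).ratio()


def individual_match(ind1, ind2):
    """Average value match over the properties that both individuals define."""
    shared = sorted(set(ind1) & set(ind2))
    if not shared:
        return 0.0
    return sum(value_match(ind1[p], ind2[p]) for p in shared) / len(shared)


# Two product individuals described by property values.
deco_item = {"GlobalBrand": "lenov", "hasMemory": 1}
upco_item = {"GlobalBrand": "Lenov_China", "hasMemory": 1}

degree = individual_match(deco_item, upco_item)
```

Properties defined by only one of the two individuals are ignored in this sketch; a weighted average using the UPCO preference weights would be a natural refinement.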

Figure 7: The Property Values of Individuals
The semantic match degree of C1 and C2 is:

Designing Personalized e-Catalog Service System
To sum up, implementing distributed e-Catalog semantic query involves three steps. First, personalized catalog ontologies, that is, e-Catalog ontologies customized for individual consumers, are built. Second, we build domain e-Catalog ontologies (DECO) and establish local e-Catalog ontologies (LECO) extracted from heterogeneous databases. Third, the semantic match module matches the two kinds of ontologies with the match algorithm and generates match result sets. The semantic match result sets are the repositories used by the query reasoning and expanding module in SPECSS, as well as the basis for the semantic tagging of information resources.
Based on distributed e-Catalogs, the structure of SPECSS is divided into four parts:


User personalized catalog ontology: a customized e-Catalog ontology, extracted from web resources, the user ontology backbone and user catalog information;

Domain e-Catalog ontology: the foundation of e-Catalog semantic query, which reflects the semantic mapping relationship between the e-Catalog databases and the domain e-Catalog;

Semantic match: if the domain e-Catalog ontology is the woods, the UPCOs are leaves; we must therefore match each "leaf" within the "woods" to obtain the shared concepts, properties and individuals;

E-Catalog semantic query engine: receives the query conditions that users enter in the querying interface and translates them into an ontology description that the semantic query engine can understand. It then returns the query result sets by means of the reasoning and expanding module and interacts with the distributed databases.
The basic principle of distributed semantic query based on e-Catalog ontologies is as follows. Users enter keywords, phrases, sentences or paragraphs (the users' queries, Uq) in the user querying interface. The query generator module of SPECSS translates Uq into an ontology description. The query reasoning and expanding module reasons over and expands this description using the semantic match result sets, and then outputs semantic queries (Sq) in the form of SPARQL [43]. Furthermore, the query disassembling module decomposes Sq over the local e-Catalog ontologies so that data can finally be extracted from the distributed e-Catalog databases; at this point, Sq must be rewritten into SQL for the different e-Catalog databases, after which the query results are produced. The query combining and filtering module combines the distributed results and filters out duplicate and invalid results. The semantic packing module repacks the SQL query results to generate the final result set in the form of an ontology and recommends it to users. Figure 8 depicts the semantic query process. Sq denotes the query statements generated from Uq in the form of SPARQL, for example Sq = select ?a from DECO.

Query generating module
This module is responsible for receiving users' queries from the querying interface and converting Uq into unified, identifiable SPARQL.
Example 1: "Show me a list of laptops, made by IBM, with at least 1GB of memory, 80GB HDD and more than 1 year warranty"
First, SPECSS establishes DECO and LECO with the above approach and combines them with UPCO to generate the ontology descriptions in the WSML language [44], as follows:

    namespace {_"http://example.org/",
        tasks _"http://example.org/ontologies/tasks/"}
    goal GetBrandInformation
        annotations
            dc#description hasValue "Describes the desire of getting brand information of IBM notebook product"
        endAnnotations
    ontology ComputerProducts {_"http://example.org/ontologies/products/ComputerProducts"}
        concept Notebook
            hasManufactory ofType manufactory
            hasBrand ofType brand
            hasMemory ofType xsd#integer
            hasHD ofType xsd#integer
            hasWarranty ofType xsd#decimal
    importsOntology User {_"http://example.org/ontologies/user/"}
        concept user
            hasPreference ofType userPreference
            hasInformation ofType userInformation

Some of them are well-known from regular relational algebra, others are slightly modified to reproduce SPARQL semantics.
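To make the conversion of Example 1 into SPARQL concrete, the Python sketch below renders a SELECT query from exact-match conditions and numeric filters. The `build_sparql` helper, the `cp:` prefix with its `#`-terminated namespace and the choice of filter operators are assumptions for illustration; only the concept and property names come from the WSML description.

```python
# Assumed prefix: the ComputerProducts ontology IRI with a '#' separator appended.
PREFIX = "PREFIX cp: <http://example.org/ontologies/products/ComputerProducts#>"


def build_sparql(concept, equals, filters):
    """Render a SELECT query from exact-match triples and numeric filters."""
    lines = [PREFIX, "SELECT ?item WHERE {", f"  ?item a cp:{concept} ."]
    for prop, value in equals.items():
        # Exact-match condition, e.g. the brand named in the user's query.
        lines.append(f'  ?item cp:{prop} "{value}" .')
    for prop, (op, value) in filters.items():
        # Numeric condition: bind the value to a variable, then constrain it.
        var = "?" + prop
        lines.append(f"  ?item cp:{prop} {var} .")
        lines.append(f"  FILTER ({var} {op} {value})")
    lines.append("}")
    return "\n".join(lines)


# Example 1: IBM laptops, at least 1 GB memory, 80 GB HDD, more than 1 year warranty.
query = build_sparql(
    "Notebook",
    {"hasBrand": "IBM"},
    {"hasMemory": (">=", 1), "hasHD": (">=", 80), "hasWarranty": (">", 1)},
)
```

The resulting string could be passed to any SPARQL engine; treating the brand as a plain literal rather than an individual of the concept brand is a simplification.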

Query reasoning and expanding module
Compared with traditional query, the most distinctive characteristic of semantic query is that it introduces reasoning and expansion into the processing of users' queries. This module reasons over and expands the users' queries to obtain the Sq associated with Uq. When recommending semantic information, we first query the visit records in order to quickly find the goods of interest that correlate strongly with what the user wants.
In Example 1, hundreds of thousands of IBM notebook models may satisfy the conditions; returning all of them to the user would greatly reduce system efficiency and user satisfaction. SPECSS therefore extracts user preferences and basic information from the personalized user ontology to expand the user's query. For example, the user may be a university student who prefers a metal shell and a blue wave appearance. Taking the income of university students into account, we limit ?price to below 6000 RMB, prioritize blue for ?color, and require a metal material for ?shell.
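The expansion of a query with profile-derived constraints can be sketched in Python as follows. The rule table, the profile attribute names and the `expand_query` helper are hypothetical, and the sketch treats every preference as a hard constraint, whereas the text above only prioritizes the colour.

```python
def expand_query(constraints, user_profile, preference_rules):
    """Add constraints implied by the user's profile without overriding explicit ones."""
    expanded = dict(constraints)
    for attribute, value in user_profile.items():
        implied = preference_rules.get((attribute, value), {})
        for prop, condition in implied.items():
            # setdefault keeps a condition the user stated explicitly.
            expanded.setdefault(prop, condition)
    return expanded


# Hypothetical rules derived from the user personalized catalog ontology.
rules = {
    ("occupation", "university student"): {"price": ("<", 6000)},
    ("preferredColor", "blue"): {"color": ("=", "blue")},
    ("preferredShell", "metal"): {"shell": ("=", "metal")},
}
profile = {
    "occupation": "university student",
    "preferredColor": "blue",
    "preferredShell": "metal",
}

expanded = expand_query({"brand": ("=", "IBM")}, profile, rules)
```

Because explicit conditions take precedence, a user who states a higher price limit in the query keeps it, and the profile only fills in what the query leaves unspecified.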

Query breaking module
To reduce the complexity of the search, Sq is further broken down into several atomic queries, each of which is then processed in the LECOs; this is the work of the query breaking module. The main problems this module needs to solve are how to determine which LECOs are appropriate and how to locate them quickly. We consider that these problems can be solved by using operating system file-finding algorithms and by adding namespaces to the mapping table.
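One way to picture the namespace-based routing is the Python sketch below, which splits a query into atomic constraints and groups them by the LECO registered for each property's namespace. The mapping table contents, the `prefix:property` naming scheme and the `break_query` helper are assumptions; the file-finding algorithms mentioned above are not modelled.

```python
def break_query(constraints, mapping_table):
    """Split a query into atomic constraints grouped by the LECO owning each property."""
    routed = {}
    for prop, condition in constraints.items():
        namespace = prop.split(":", 1)[0]      # e.g. "hw" in "hw:hasMemory"
        leco = mapping_table.get(namespace)
        if leco is None:
            continue                            # no LECO is registered for this namespace
        routed.setdefault(leco, []).append((prop, condition))
    return routed


# Hypothetical mapping table from namespace to local e-Catalog ontology.
mapping_table = {"hw": "leco_hardware", "sale": "leco_sales"}
constraints = {
    "hw:hasMemory": (">=", 1),
    "sale:hasWarranty": (">", 1),
    "hw:hasHD": (">=", 80),
}

atomic_queries = break_query(constraints, mapping_table)
```

A dictionary lookup keeps the positioning step constant-time per constraint, which is the main attraction of storing namespaces in the mapping table.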

Query re-writing module
Reasoning and expanding are applied to queries in the middle of the process, but the final results come from the relational databases. Therefore, rewriting the semantic query into query terms for the relational database is a necessary step. The rewriting of e-Catalog semantic queries is based on the principle of building LECOs, which is

Semantic packing module
This module packs the query results from the relational databases into semantic ontologies and then submits them to the combining and filtering results module, in order to obtain the final results returned to users.
Here, SPECSS returns the search results in order of importance, along with semantic annotations that make it easier for users to choose. Alternatively, we can merge a number of ontologies that point to the same individual into an integrated ontology, which can be used to annotate result sets. Since the integrated ontology contains a number of similar ontologies, its annotations reflect the similar viewpoints of the majority of people, which helps users understand the results.

Combining and filtering results module
We need to further combine the result ontologies from the previous module into user-oriented semantic result ontologies. This module combines ontologies and filters out irrelevant ones.
Semantic result ontologies are composed of two or more LECOs and their related ontologies. Their properties, methods or relationships overlap, that is, the semantic relationship degree SRD(O1, O2) > 0. The related ontologies are regarded as basic ontologies and the LECOs as expanding ontologies. The basic operations are calculating SRD and copying sub-ontologies. Assuming that the average number of sub-ontologies in DO is n and the number of LECOs is m, the complexity of the algorithm is O(mn).
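The combine-and-filter step can be illustrated with the Python sketch below. SRD is not defined in this text, so the sketch assumes a Jaccard overlap of property sets as a stand-in; the dictionary layout of an ontology and the sample data are likewise hypothetical.

```python
def srd(o1, o2):
    """Assumed SRD: Jaccard overlap of the two ontologies' property sets."""
    p1, p2 = set(o1["properties"]), set(o2["properties"])
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 0.0


def combine_and_filter(basic, lecos):
    """Merge results of LECOs related to the basic ontology, dropping duplicates."""
    merged, seen = [], set()
    for leco in lecos:
        if srd(basic, leco) <= 0:
            continue                      # unrelated ontology: filtered out
        for individual in leco["individuals"]:
            if individual not in seen:    # repeated result: kept only once
                seen.add(individual)
                merged.append(individual)
    return merged


# Hypothetical basic ontology and three LECO result sets.
basic = {"properties": ["hasBrand", "hasMemory"]}
lecos = [
    {"properties": ["hasBrand"], "individuals": ["T60", "X61"]},
    {"properties": ["hasColor"], "individuals": ["Z1"]},
    {"properties": ["hasMemory", "hasHD"], "individuals": ["X61", "R52"]},
]

results = combine_and_filter(basic, lecos)
```

The nested loop visits each individual of each LECO once, which is consistent with the O(mn) bound stated above when n is read as the average number of items per LECO.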

Implement and Experiment
SPECSS is implemented in Java with the Jena2 API. Jena2 [48] is a Java framework for building semantic web applications; it provides a programming environment for languages such as RDF, OWL and SPARQL and includes a rule-based inference engine. We therefore use it to process ontologies and to carry out SPARQL queries.
The system was evaluated by having five users use the system to create personal ontologies. Each user was asked to provide his/her personal information and preferences, and in particular to assign different weights to different properties, as shown in Table 1. The experiment was carried out on five PCs running Windows XP, each with a Pentium(R) D 2.80 GHz CPU and 1 GB of RAM, in order to simulate a heterogeneous environment. Each user was given a query interface in which to enter query parameters and to view each of his/her concepts together with every DECO concept that had been matched to the personalized catalog concept. The user could also decide which concepts or properties should not be used when the query was reasoned over and expanded. In the experiment, we take the computer domain as an example. The user was asked to compare the semantic query results with those from keyword-based search engines and to decide whether SPECSS was better. To this end, we manually created the domain e-Catalog ontology (DECO) and the user personalized catalog ontology (UPCO) and calculated the semantic match degree in the system, as shown in Figures 10 and 11. We evaluated the system with two measures, precision and relevance, as shown in Figure 12. Precision is the ratio of the number of relevant pages seen to the total number of pages seen. Relevance is the ratio of the number of relevant pages seen plus the number of irrelevant pages not seen to the total number of pages queried.
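The two measures can be written down directly, as in the Python sketch below. The function names and the counts in the usage lines are illustrative only; they are not results from the experiment.

```python
def precision(relevant_seen, total_seen):
    """Relevant pages seen divided by all pages seen."""
    return relevant_seen / total_seen if total_seen else 0.0


def relevance(relevant_seen, irrelevant_not_seen, total_queried):
    """Relevant pages seen plus irrelevant pages not seen, divided by all pages queried."""
    return (relevant_seen + irrelevant_not_seen) / total_queried if total_queried else 0.0


# Made-up counts: 10 pages shown, 8 of them relevant, out of 100 pages queried,
# 80 of which were irrelevant and correctly not shown.
p = precision(relevant_seen=8, total_seen=10)
r = relevance(relevant_seen=8, irrelevant_not_seen=80, total_queried=100)
```

Defined this way, relevance rewards the system both for showing relevant pages and for withholding irrelevant ones, so it plays the role usually given to accuracy.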

Conclusion and Future Work
In this paper, we have provided complementary contributions to related work on applying ontologies to e-Catalogs and traditional query. We focus on the theory of e-Catalog semantic query and personalized catalog service, which can express the preferences and potential intentions of users while they search for products, including UPCO and DECO