Evaluating the Quality of Open Data Portals on the National Level

Over the last few years, governments worldwide have started to develop and implement open data initiatives to enable the release of government data in open and reusable formats, without restriction or charge for their use by society. As a result, a large number of open data repositories, catalogues and portals have been emerging around the world. The efficient development of open data portals makes it necessary to evaluate their quality systematically, in order to understand them better and assess the various types of value they generate. Citizens also expect data disclosed by official authorities to be of high quality, in the sense that they are official data and therefore should be accurate and reliable. Consequently, the aim of this paper is to examine and compare the quality of these portals. For this purpose, a benchmarking framework is proposed and validated to evaluate the quality of open data portals at the national level. The results obtained show that the number of datasets online and the sophistication of open data portals and their functions differ, reflecting the lack of harmonization and the need for quality standards. In particular, the United Kingdom, India and the United States have published many datasets and launched advanced portals.


Introduction
The world economy has become a data-centric one and, as a result, those with the capabilities to extract maximum benefits from their data will have power at the political, social, cultural and, especially, economic level [5]. Therefore, over the last few years, an increasing number of governments have started to open up their data. This so-called open government movement has resulted in the launch of numerous open data portals and infrastructures that aim at providing a single point of access for government data and exploring their impacts [24], [30], [53], [58]. By publishing these data on open data portals, governments are giving them back to the citizens, who indirectly paid for their creation with their taxes in the first place [25]. The emergence of these portals changes the way both citizens and researchers look for accountability-related data [32]. The main point here is to create a fast change towards data sharing and transparency. This change could help to realize the benefits of opening up further data silos in the public sector [55].
Although there are many different sources of data, government data are particularly important because of their scale, breadth, and status as the canonical source of information on a wide range of subjects [50]. They are called Open Government Data (OGD). It is important to explicitly mention at this point that OGD is not equivalent to, but a subcategory or subset of, open data, which may equally originate in the commercial, academic or third sectors [18]. Because of the large amounts of data produced by the public sector, the open data model has evolved into the open big data model [11], [31], [38]. This intersection of open and big data is mostly about integrating multiple data sources, e.g. at the international, national, regional and local levels of the public sector and international organizations [31]. Marton et al. [38] stated that the fundamental concepts of open and big data are technical in nature, as they were developed in the fields of computer science and engineering. They are both gathered for a purpose and then normally repurposed [11]. While big data are mostly characterized as large in volume, gathered at high speed, possibly unstructured and coming from many sources, open data are about standards on how to make data machine-readable, and hence linkable [38].
Governments and other public sector authorities generate and collect vast quantities of data through their everyday activities, such as managing pensions and allowance payments, tax collection, recording traffic data and issuing official documents [5], [55]. For example, these data are the largest single source of information in Europe, with an estimated market value of 32 billion Euros [16]. Buchholtz et al. [5] then estimated that the aggregate direct and indirect economic impacts from the use of open big data across the whole European Union (EU) economy are of the order of billions of Euros annually. Hence, these data have significant potential for re-use in developing new products and services, possibly in creative combinations with other data sources [8]. Zillner et al. [55] estimated overall economic gains of 40 billion Euros a year in the EU; besides the economic benefit as such, the EU recognizes additional value, such as the contribution to addressing societal and industrial challenges, achieving efficiency gains through sharing data inside and between public administrations, fostering the participation of citizens in political and social life, and increasing the transparency of government. The possibilities to better use these available data are growing due to the technical facilities and the advancement of merging and analyzing different datasets [21]. Areas of interest related to the application of these technologies include data analytics for the advanced analysis of large datasets (with benefits for e.g. fraud detection and cyber security), improvements in effectiveness to provide internal transparency, and improvements in efficiency for providing better personalized services to citizens [30], [54], [55]. During the last years, a vast number of open data communities that develop new ideas and applications have emerged around OGD portals [42]. Therefore, open big data should be the goal where possible. The only, minimum and non-radical, demand is: openness by default. The sooner governments are
opening up their data, the higher the returns [55]. However, disclosing these huge amounts of data does not necessarily equate to more transparency and does not necessarily facilitate accountability [32]. Besides the economic importance, there are additional issues concerning the regulation of government data, such as discoverability, harvesting, community engagement and interoperability [20], [50]. Also, the quality of these data and related portals may vary from country to country, which can affect their value for users.
An ability to discover the relevant data is a prerequisite to unlocking the potential of open data. Creating a portal of available datasets is one way to make these datasets more accessible and easier to find [27]. On the other hand, with the different data management systems of open data portals and related open data initiatives, there is a great diversity in their content, functionality and technology standards [4], [30]. But most importantly, they vary in their usefulness and suitability to their task [4]. Extracting valuable information from these different data sources therefore requires evaluating their quality [6]. Calero et al. [7] addressed the evaluation of the quality of a web portal by defining a data quality model containing 33 data quality attributes grouped into four data quality categories. While some aspects of open data quality align with those of web portals, domain-specific quality perspectives in the context of open data (e.g. the data management system, the openness of the provided data based on the license or format, metadata) need to be identified and evaluated. The quality of data plays an essential role in the use of open data portals, and a certain level of data quality is critical for OGD use [57]. However, Umbrich et al. [48] argue that, despite the enthusiasm caused by the availability of a steadily increasing amount of openly available data, the first critical voices are appearing, addressing the emerging issue of low quality in the metadata and data sources of open data portals, which is a serious risk that could disrupt the open data project. Kučera et al. [27], Umbrich et al. [48] as well as Zuiderwijk and Janssen [57] then claim that there is a need for a quality evaluation and benchmarking framework to better understand quality issues in open data portals and to study the impact of improvement methods over time.
2. Proposal of the benchmarking framework (definition of the quality dimensions and the particular metrics for each of them) based on a systematic literature review, as defined by Petticrew and Roberts [41], i.e.: a review that aims to comprehensively identify all relevant studies to answer a particular question, and assesses the validity (or soundness) of each study, taking this into account when reaching conclusions. Therefore, the following steps are defined:
- define search terms and keyword search strategies based on the defined research question;
- select sources (digital libraries) on which to perform the search;
- apply the search terms and keywords to the sources;
- assess the validity of the studies identified in the search; and
- select primary studies by applying inclusion and exclusion criteria to the search results.
3. Data collection using the questionnaire method as the research data sampling technique; and
4. Data processing and results presentation, which include the calculation of various descriptive statistics, such as the frequencies and relative frequencies of all values for each of the metrics, and the construction of various charts, using the Microsoft Excel software.
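The frequency calculations in the data processing step can be sketched as follows; the metric name and response values below are hypothetical, and the paper itself performs this step in Microsoft Excel rather than in code.

```python
from collections import Counter

def describe_metric(values):
    """Compute absolute and relative frequencies for one metric's collected values."""
    counts = Counter(values)
    total = len(values)
    return {value: (count, count / total) for value, count in counts.items()}

# Hypothetical yes/no responses for one metric gathered across five portals
responses = ["yes", "yes", "no", "yes", "no"]
print(describe_metric(responses))
```

Each metric is summarized independently in this way, and the resulting frequency tables feed directly into the charts described above.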
The study is based on quantitative techniques according to the relevant literature by Maylor and Blackmon [39]. More precisely, it follows their recommended analysis approach based on the data collection method and the dimensions used. It starts with the research questions that can be answered by the collected data. The next step is the definition of the process for collecting and analyzing data. The final step is looking for trends and patterns, i.e. interpreting the data.

Literature Review and Background
In addressing the literature review and background, multiple research streams that are associated with this study will be examined. The first section will define open data and related initiatives. It will be followed by the benefits, risks and impacts of opening government data. After that, the issue of open data portals, their description and classification will be presented and discussed. It will be followed by open data portals quality and evaluation requirements. Finally, the literature related to benchmarks and models for evaluating the progress of open data portals will be examined.

Open Data and Related Initiatives
The topic of open data is generating interest among practitioners in the public sector as well as in the private sector [20] and continues to grow, driven in part by pressure for increased public sector transparency and in part by the current enthusiasm for big data and data analytics [9]. Open government acts as an umbrella term for many different ideas and concepts. Open data ecosystems are often government ecosystems, as much open data are published by governments, although elements of these ecosystems can also be provided by the private sector. While the implementation of open data initiatives seems similar across governments (i.e., mostly through a centralized web portal, where datasets can be downloaded by the public), governments may have different motivations for embarking on open data initiatives [20], [24]. Recent studies have also shown that current initiatives employ different approaches for providing data and exhibit important limitations such as data duplication [23], [24]. Therefore, Sayogo et al. 
[44] conducted an in-depth evaluation of selected cases to justify the application of a proposed framework for understanding the status of OGD initiatives. Reggi and Ricci [43] explored the information-based strategies that EU Regions and Member States are implementing when publishing government data on the Web. Cohesion Policy and its Structural Funds, which involve all EU Regions and Member States, are the ideal context in which to verify the presence of different approaches to the publication of government data. They identified three approaches: user-centered, which shows the effort to make these data understandable by non-technically oriented citizens, clearly represented, and accessible to users; stewardship, which is defined by many desirable characteristics aimed at assuring the accuracy, validity, security, management, and preservation of information holdings; and re-user-centered, which emphasizes the importance of downloading the data in a machine-readable format and other characteristics related to data quality. Lee et al. [29] then examined open data initiatives in some of the world's most innovative countries and noted that they can be either government-led or community-led.
Open data are content or data that anyone is free to use, re-use, and also redistribute, subject only, at most, to the requirement to attribute and share-alike. Most open data are actually in raw form. However, republishing does imply citing the original source, not only to give credit but also to ensure that these data have not been modified or misrepresented [19], [20], [26]. Linked data then describe a method of publishing structured data so that they can be interlinked and become more useful through Semantic Web technologies such as Uniform Resource Identifiers (URIs), the Resource Description Framework (RDF), vocabularies and ontologies. Linked data are a way of publishing data in such a way that they can facilitate the interaction between different data sources or the development of advanced value-added e-services by combining different datasets from multiple OGD sources; also, the value of any kind of data increases each time they are re-used and linked to another source, and this can be facilitated and triggered by providing informative and explanatory data about each available dataset, i.e., metadata [14], [15], [17]. These ideas then gave rise to the development of Linked Open Data (LOD), which is the combination of both: structuring data and making them available for others to re-use without any restrictions. Data interlinking practice is highly recommended for lowering the technological and cost barriers of data aggregation processes [15], [17], [50].
Use and re-use of these data means using them in new ways by adding value to them, combining information from different data sources, making mash-ups and new applications, both for commercial and non-commercial purposes [13], [15], [30]. The core idea behind OGD is simple: government data should be a shared resource. Making data open is valuable not only for the government departments that collect and release these data, but also for citizens, businesses and other parts of the public sector, because OGD has limited value if the published data are not utilized, which means involving stakeholders and focusing on developing sustainable ecosystems of users [17], [52]. Basically, in an open data network there could be cooperation among various stakeholders to facilitate the use of OGD. Also, there could be competition between businesses using open data, for example, to obtain open data end-users as (paying) customers for services that they have developed based on OGD. There could also be competition between open data providers, since they may want to promote their organization by stating that they open larger amounts of data or more datasets than other public sector institutions do [29], [54], [58].

Benefits, Risks and Impacts of Opening Government Data
Both Janssen et al. [20] and Ubaldi [47] provided a comprehensive discussion of the challenges in the OGD domain. Kucera and Chlapek [26] presented a set of benefits that can be achieved by publishing OGD and a set of risks that should be assessed when a dataset is considered for opening up. Cowan et al. [11] also discussed these benefits and risks [20], [56]. For end-users and society in general, open data will help to obtain and integrate required information more efficiently and to successfully manage the transition towards a knowledge-based economy and information society [5], [50]. However, Yang and Kankanhalli [54] stated that, despite public institutions actively promoting the use of their data by organizing events such as various challenge competitions, the response from external stakeholders to leverage OGD for innovative activities has still been lacking. Also, the findings of Zuiderwijk and Janssen [56] are in agreement with the claim that the results of data re-use are not discussed and only little feedback is gained by data providers (the public sector). Therefore, public sector institutions must have processes in place clearly defining which data to share with users in which formats, at what time intervals, and under which licenses, ensuring no restrictions on the re-use of these data [49].
Ideally, making these data available on the Web would lead to more transparency, participation and innovation throughout society. However, just publishing the data on the Web is not enough. To truly advance the open society, the publication platforms need to fulfill certain legal, administrative as well as technical requirements [4]. In practice, gaining access to raw data, placing them into a meaningful context, processing them and extracting valuable information from them is often extremely difficult. As a result, during the last couple of years different solutions have been developed to support the whole lifecycle of data re-use, i.e., data discovery, cleaning, integration, processing and visualization [17], [23]. According to [56], the open data process consists of all activities between the moment that data are starting to be created and the moment that data are being discussed, including the activities to publish, find and re-use them. Four mechanisms for generating value from these data have been identified [22]: information transparency, collective impact, data-driven efficiency and data-driven innovation. All the mechanisms are dependent on the private and public sector, together providing the motivation, opportunity and ability to generate value from data. They also claim that the motivation, opportunity and ability of individuals to use data for value generation are influenced by [22]: the incentives provided; the level of technical and legal openness of data; the maturity of data governance; the general data-related capabilities in society; and the technological maturity and prevalence. These models were later used and improved by Bílková et al. [3] and Máchová and Lněnička [34] to evaluate the impacts of open data on economic, educational, environmental, health, politics and legislation, social, and trade and business development.
Maier-Rabler and Huber [36] discussed the impact of open data on the relations among citizens, the public sector, and political authority to engage them in collaboration, co-decision, co-development and shared responsibilities, while Geiger and von Lucke [15] analyzed the added value of freely accessible government data and discussed the challenges of OGD for the public sector at the different administration levels.
Over the last 10-15 years, various e-government development frameworks and indices have been introduced to help assess the opportunities and challenges of e-government and related open government initiatives. The early 2010s added new indices to e-government research, which focus on the evaluation of open data impacts. These are, e.g., the Web Index and the Open Data Barometer (ODB) index produced by the World Wide Web Foundation (W3F), the Open Knowledge Foundation's (OKF) Global Open Data Index (GODI), the OURdata (Open, Useful, Reusable Government Data) Index introduced by the Organization for Economic Co-operation and Development (OECD) and the Public Sector Information (PSI) Scoreboard (PSIS) by the EU [35]. These frameworks also influenced the proposal of the new benchmarking framework.

Open Data Portals, their Description and Classification
One of the key issues in adopting open government is the accessibility of open data, which are generally provided collectively in open data portals [53]. Therefore, one of the first problems to be solved when working with any data is where to find them. In using data, users need exactly the right dataset, i.e., with the right variables, for the right year, the right category, etc. [50]. According to the definition, open data have to be well described and of good quality for others to transform them into knowledge and make them useful [10]. In the last few years, an increasing number of governments have launched open data portals, specialized websites where a publishing interface allows datasets to be uploaded and equipped with high-quality metadata [47], [50]. The open data portal is a web-based system used to collect existing datasets from multiple sources that may be in different formats, and to publish them on user-friendly dashboards that users may view, download and access via an Application Programming Interface (API). With user-defined tags, these datasets are organized into a searchable catalog [25], [50]. It is operated by a catalogue operator, which could be a government agency, a citizen initiative, etc. Each portal offers different datasets that directly reflect data availability for public disclosure [27]. The actual dataset is not considered part of the catalogue record, but the catalogue record usually contains a download link or a web page link from where the actual dataset can be obtained [12]. Each dataset can comprise several data sources [50].
Open data portals categorize open datasets according to their domains, providers, formats, and other properties for better accessibility of the data [53]. Open data portals also usually feature keyword search and various browsing interfaces to help users find relevant datasets and retrieve the corresponding metadata describing the institution releasing the dataset as well as the content of the dataset, in addition to the geography, jurisdiction and time period of the data [51]. The dataset format also needs immediate attention, as it may lead to many issues of interoperability and integration [51]. Other issues that are related to the context of the dataset concern completeness and exhaustiveness, the representation of open data, validity, reliability, clearness and comprehensiveness, and the provision of reports about the analysis of these data. In line with these content-related issues, the overall data quality should be taken into account [13], [16], [19]. Data standards, codes, vocabularies and schemas are also important aspects of datasets [51]. As an alternative to making raw data directly available for download, several portals offer web-based data APIs that enable developers to access data within their applications [28]. A sufficient description of the portal should clearly distinguish themes from keywords: while themes are always chosen from a controlled vocabulary, tags are not [33]. Furthermore, a machine-readable file format is important to support automatic tools. Features regarding validity, quality and granularity are required to support a wide range of use cases and enable analysis results of high quality [4].
Not too long ago, metadata were only a concern of information professionals engaged in cataloging, classification and indexing. However, nowadays, there are many more creators and consumers of digital content, which also needs to be cataloged [17]. Without sufficient metadata, such as descriptions or tags, neither manual nor automatic search can find a dataset and it will not be helpful for any user [4]. The metadata structure of a data portal summarizes the common properties used to describe each dataset across the selected portal. It mainly includes attributes such as the dataset's name, description and the URLs of the actual sources, i.e., files or service end points. Using these metadata, users can quickly find the datasets they need with searching and filtering features [50]. In terms of metadata semantics, the most important initiative that a data portal should accommodate to facilitate interoperability is an RDF vocabulary named Data Catalogue Vocabulary (DCAT) by the World Wide Web Consortium (W3C). By using DCAT to describe datasets, publishers increase discoverability and enable applications to easily consume metadata from multiple catalogues [12], [33]. Some authors also proposed their own DCAT RDF vocabulary as an interchange format to enable a standardized description of data catalogues, such as Maali et al. in [33].
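The searching and filtering over metadata described above can be illustrated with a minimal sketch; the catalogue records, titles and URLs below are hypothetical, and only a few DCAT-style attributes (title, description, landing page) are shown.

```python
def find_datasets(catalog, keyword):
    """Return catalogue records whose title or description mentions the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [rec for rec in catalog
            if keyword in rec["title"].lower() or keyword in rec["description"].lower()]

# Hypothetical catalogue records described with DCAT-style metadata attributes
catalog = [
    {"title": "National Budget 2020", "description": "Annual state budget figures",
     "landing_page": "https://data.example.gov/budget-2020"},
    {"title": "Air Quality Measurements", "description": "Hourly PM10 and PM2.5 readings",
     "landing_page": "https://data.example.gov/air-quality"},
]
print([rec["title"] for rec in find_datasets(catalog, "budget")])
```

Real portals implement such lookups over full DCAT records (publisher, themes, distributions, etc.), typically backed by a search index rather than a linear scan.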
Based on the geographical coverage (administration level), open data portals can be divided into the following groups [27]: local, which are owned by cities/towns or have only city/town coverage; regional, which are owned by a regional authority (county government or federal state government) or have regional coverage; national, which are owned by a central government institution or have nationwide coverage; and international, which are owned by an international institution or have international coverage. Lněnička and Máchová [31] then extended this classification by adding a new level of open data aggregators as a basic and the most important category of data catalogs [19], [31]. Another categorization of open data sources can be made based on the web paradigm, i.e., whether they are based on the traditional Web 1.0 paradigm or the more recent Web 2.0 paradigm [1]. Based on the maturity of open data portals, Colpaert et al. [10] proposed a five-stage system to represent the main function or affordance that the data portal is built or used for. These stages are ordered by the investment of time needed to fully implement the stage. The categorization starts with portals linking to various datasets and continues towards a metadata portal for both the datasets and the re-use of the datasets. The fourth category takes care of the data publication itself. Finally, a data hub is set up where data become a common resource.
The open data portal is one of the solutions that should be used to significantly improve the discoverability of freely available datasets [27]. However, many governments focus on the development of a national OGD portal as if it were a higher priority than developing the technical infrastructures to open up government data for others to use [47].
Infrastructures may improve the use of OGD by providing insight into how individuals can participate in data re-use and into the quality of open data [57]. Understanding the preconditions for effective OGD in a specific context is essential to set up websites that enable value creation, and lies at the core of the government's data publishing responsibility. Much of the current criticism of national OGD portals is based on the fact that governmental interest appears to be in presenting data in a particular fashion, which distracts from, and thereby limits, the increasing provision to users of the data that they are really interested in using for their own purposes [47]. Janssen et al. [20] surfaced several factors inhibiting the public use of open data, such as the lack of explanation of the meaning of data and the lack of knowledge to make sense of data. Martin et al. [37] presented seven categories of risks associated with open data portals: governance, economic issues, licenses and legal frameworks, data characteristics, metadata, access, and skills. Also, various factors, from institutional to technical, seem to affect the development and implementation of OGD portals at the national level [20]. Therefore, it is sensible to argue that different nations have different capabilities in developing and implementing their OGD efforts [44].

Open Data Portals Quality and Evaluation Requirements
As the amount and the variety of data sources are increasing, it is important to create good metadata (descriptions, geographical coverage, limitations, etc.) in order to allow stakeholders, who may not be domain experts, to easily search for and consume data. In a sense, the notion that all data disclosed should have quality or be intrinsically good is self-evident. But such a concept is not easy to pin down in the context of open government, and the requirement for data quality may be considered as encompassing several characteristics [32]. The risk of low (meta)data quality affects the discovery and consumption of a dataset in a single portal and across portals. On the one hand, missing metadata directly affect the search and discovery services used to locate relevant and related datasets for particular user needs. On the other hand, incorrect descriptions of the datasets pose several challenges for their processing and integration with other datasets [48].
The literature dealing with data quality provides different classifications of the data quality attributes, depending on the perspective of the authors and the context tackled. At the same time, in order to assess data quality, there is a growing tendency towards considering the users' point of view. In fact, the most common definition of data quality is data that are fit for use, i.e. the ability of a collection of data to meet user requirements [7]. Data quality is usually described in the literature by a series of quality dimensions that represent a set of consistency properties for a data artefact [2], [6]. Batini et al. [2] published a detailed and systematic description of methodologies to assess and improve data quality. The methodologies are compared along several dimensions, including the methodological phases and steps, the strategies and techniques, the data quality dimensions, the types of data, and, finally, the types of information systems addressed by each methodology. Caballero et al. [6] focused on the evaluation of data quality. They claimed that, more than ever, the need for assessing the quality-in-use of datasets gains importance, since the real contribution of a dataset to value creation can only be estimated in its context of use. The most important characteristic for assessing the level of quality in use of heterogeneous datasets is consistency, which is divided into three parts: contextual, temporal and operational. Further evidence supporting the importance of data quality can be found e.g. in Tien [46].
Measures of data quality can be applied in the open data domain. Open data domain-specific data quality criteria and measurements have emerged recently to evaluate the quality of open datasets as well as portals [53]. According to the literature analyzed, open data portals, in order to fulfill quality informational needs, should address the whole range of entity types and requirements possible in each country's institutional arrangement [32]. Also, the semantics and language that each open data portal is tied to are among the most common and inherent quality challenges [42].
Research on the quality of open data portals has confirmed that a first wave of these portals mainly provides basic functionalities for uploading and downloading data [1], [8]. Existing portals and infrastructures often lack the opportunity for data users to participate in improving published datasets [1], [57]. Lourenço [32] defined eight key characteristics for open data portals: quality, completeness, access and visibility, usability and comprehensibility, timeliness, value and usefulness, granularity and comparability. Although open data success strongly depends on the quality of the released datasets, there is a wide variety in the quality of the released datasets and users may also be concerned about the quality of open data [37], [42], [57].
Recent experiences also show that the quality of the catalogue records might affect the ability of users to locate the data of their interest. Therefore, Kučera et al. [27] discussed some limitations associated with the quality of the catalogue records, e.g. that the metadata in the catalogue records are insufficient or that the correctness of the provided metadata is not checked, and proposed relevant techniques for their improvement. Open data publishers and users are often not aware of each other's needs and activities. For instance, many open data providers are primarily focused on making data available and do not know which format is preferred by users and how the way that they publish data can stimulate the use of open data [56]. Lourenço [32] then identified a set of requirements that a dataset in the data catalogue needs to fulfill in order to contribute to the transparency of public agencies and allow for the accountability of public sector institutions. Such requirements concern the type of entities covered by the dataset, the types of information provided, the information seeking strategies supported and some qualitative aspects of the open data provided. Kučera et al. defined four requirements of dataset quality [27]: accuracy - all information in a catalog record should correspond to the data described by the record, and all the catalog records in the catalog should be accurate; completeness - all mandatory attributes of the record should be filled in and all published open government datasets should be registered in the catalog, but there should be no duplicate catalog records; consistency - the same terms or concepts should be used to classify data of the same type or category, and unknown or missing information should be handled in the same way across the whole catalog; and timeliness - all information in the catalog record should be up-to-date and all the catalog records should be up-to-date.
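The completeness requirement above lends itself to a simple automated check; the following sketch scores a catalog record by the share of mandatory attributes that are filled in. The list of mandatory attributes and the example record are hypothetical, chosen only to illustrate the idea, not taken from Kučera et al. [27].

```python
# Hypothetical set of mandatory metadata attributes for a catalog record
MANDATORY = ("title", "description", "publisher", "license", "modified")

def completeness(record):
    """Share of mandatory metadata attributes that are present and non-empty."""
    filled = sum(1 for attr in MANDATORY if record.get(attr))
    return filled / len(MANDATORY)

record = {"title": "Traffic Counts", "description": "", "publisher": "Ministry of Transport",
          "license": "CC-BY-4.0"}
print(completeness(record))  # 3 of the 5 mandatory attributes are filled
```

Analogous checks can be written for the other three requirements, e.g. comparing the `modified` attribute against the current date for timeliness, or validating category terms against a controlled vocabulary for consistency.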
In order for users to assess data quality, they need to understand the nature of the data, and because data producers cannot anticipate all users, the provision of good quality metadata is as important as the quality of the data themselves [55]. However, the role of public institutions in open data strategies and initiatives does not lie solely in the release of data. Besides increasing the variety and improving the quality of data made available to the public, there have been concurrent efforts by public institutions to motivate the use of open data for innovation activities by external stakeholders [40]. Kučera et al. [27] identified two types of strategies for improving data quality, namely data-driven and process-driven. The first involves directly modifying the values of data, such as correcting invalid data values or normalizing data. The second involves redesigning the data creation and modification processes in order to identify and correct the causes of quality issues, for example by implementing a data validation step in the data acquisition process. Various efforts already exist to study different aspects of open data portals, which are the main platforms for publishing and consuming datasets [48].
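These two strategy types can be illustrated with a minimal Python sketch (the field names and rules here are illustrative assumptions, not taken from the cited catalogues):

```python
# Illustrative mandatory metadata attributes; real catalogues define their own.
MANDATORY_FIELDS = ("title", "publisher", "license", "release_date")

def validate_record(record):
    """Process-driven step: a validation gate in the data acquisition process
    that flags catalogue records whose mandatory metadata attributes are
    missing or empty (the completeness requirement)."""
    return [field for field in MANDATORY_FIELDS if not record.get(field)]

def normalize_record(record):
    """Data-driven step: directly modify data values, here by trimming
    whitespace and lower-casing the license identifier so the same license
    is always written the same way (the consistency requirement)."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if isinstance(cleaned.get("license"), str):
        cleaned["license"] = cleaned["license"].lower()
    return cleaned
```

A record that fails validation would be sent back to the publisher rather than patched silently, which is what distinguishes the process-driven approach from the data-driven one.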

Benchmarks and Models for Evaluating the Progress of Open Data Portals
Since the launch of the first open data portals by the United States government in 2009 and the United Kingdom (UK) in 2010, an increasing number of countries have launched similar open data initiatives and portals to make it easy for the public to find and use these data, which are available in a range of formats and span a wide range of domains [54]. This is in line with the findings of Umbrich et al. [47], who observed that the number of datasets and sources is continuously growing. Examples of the increasing popularity of data portals are OGD portals [20], data portals of international organizations and non-governmental organizations (NGOs), scientific data portals, as well as master data catalogues in large businesses [14], [19]. Numerous countries, including a good number of EU Member States, have followed suit, along with some local (e.g. city) governments [50]. Many of these portals use the Comprehensive Knowledge Archive Network (CKAN), a free, open-source data portal platform developed and maintained by Open Knowledge. As a result, they share a standard, powerful API, which raises the possibility of combining their catalogues to create a single worldwide entry point for finding and using government data. Other platforms include the Drupal Knowledge Archive Network (DKAN), the Open Government Platform, Socrata, Prognoz and Junar. Similar to digital libraries, networks of such data catalogues can support the description, archiving and discovery of data on the Web [14], [50]. Besides the official public and private sector sponsored portals, there are numerous unofficial sources of open data, usually compiled by citizens, communities or aggregators.
Kalampokis et al. [24] revised existing e-government stage models and proposed an OGD stage model, which provides a roadmap for OGD re-use and enables evaluation of the sophistication of relevant initiatives. Solar et al. [45] then proposed an open data maturity model to assess the commitment and capabilities of public institutions in pursuing the principles and practices of open data, with a hierarchical structure consisting of domains, subdomains and critical variables. Alexopoulos et al. [1] developed a new model of the open data portal by extending its functionality with a wide set of capabilities: data processing, enhanced data modeling (flat, contextual, detailed metadata), commenting on existing datasets and expressing needs for new datasets, dataset quality rating, formation of user groups with extensive communication and collaboration within them, data linking, upload of new versions of existing datasets and advanced data visualization. Van der Waal et al. [50] described the key functionalities of open data portals and presented a conceptual model to make these portals the backbone of a distributed global data warehouse for the information society on the Web. Charalabidis et al. [8] presented and validated a methodology for evaluating these advanced second-generation OGD infrastructures and open data portals, based on the estimation of their value models from users' ratings. They concluded that the highest priority should be given to improving the data upload and data search-download capabilities, since these received low ratings from the users while having a high impact on value generation in the higher layers.
One of the first comparisons of selected open data portals was conducted by Maali et al. [33] in 2010. They aimed to identify commonalities and overlap in structure and to document challenges and practices; however, only seven data portals from five different countries were compared. Sayogo et al. [44] and Zuiderwijk and Janssen [57] then evaluated the usability of participation mechanisms and data quality indicators of open data portals using six criteria, while Lourenço [32] assessed whether the current structure and organization of seven open government portals is adequate for supporting transparency for accountability. The author introduced a set of requirements identified on the basis of the key characteristics of desired data disclosure from the literature on open government and transparency assessment; these requirements were used as a framework to analyze the structure and data organization of these portals. Yang et al. [53] compared the categorization structures of open data portals by investigating the coherence, i.e. similarity, of the datasets in the same category. Braunschweig et al. [4] presented a survey of existing OGD platforms, focusing on their technical aspects. They studied over 50 portals operated by national, regional and communal governments, as well as international organizations, focusing on features such as standardization, discoverability and machine-readability of data.
Lněnička and Máchová [31] evaluated selected national open data portals; however, they compared only the EU Member States, using five criteria. Lněnička [30] later extended the list of portals to 67 countries, but without further comparison. Petychakis et al. [42] analyzed the OGD sources developed in the EU27 from a functional, semantic and technical perspective, in terms of their thematic content, licensing, multilingualism, data acquisition, data discovery, data provision and data formats. They concluded that most of the datasets of the European OGD sources are published without a clearly defined or open license, and that about half of these OGD portals support the native language of the corresponding country in their user interface, while the other half are multilingual (they also support one or more foreign languages).
The first group of functions is focused on the lists of a portal's datasets, groups, organizations or other objects such as tags. Only free tags (tags that do not belong to a vocabulary) are returned. The license list function then returns the licenses available for datasets on the portal. The second group focuses on the searching process, i.e., searching for packages or resources matching a query that satisfies given search criteria. This action accepts Apache Solr search query parameters and returns a dictionary of results, including the related datasets that match the search criteria, a search count and facet information. Further functions return an activity stream of all recently added or changed packages on a portal, and a list of the site's user accounts and the roles of members of groups and organizations. Several countries define their own licenses: Estonia (two licenses), Finland (nine licenses), Iceland (two licenses), the Netherlands (eight licenses), Poland (six licenses), and Romania and Uruguay (one own license each). However, only selected datasets are published under one of these licenses. This issue was already mentioned in [42].
In total, 39 countries offer the package search function through their API, 38 countries offer the resource search function and 38 countries offer the function to get an activity stream of all recently added or changed packages on the portal. Furthermore, the user list function should be available only to an authenticated user. However, as can be seen from Figure 6, only 17 countries require user authentication. The other countries make information about their users available, such as name, creation date, email hash, activity stream email notifications, state, number of edits, number of administered packages, etc. Finally, there are three basic member roles on each portal powered by CKAN: admin, editor and member.
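The functions discussed above are all served through CKAN's uniform Action API, which returns JSON wrapped in a success/result envelope. A minimal Python sketch of how such requests are typically assembled and unwrapped (the portal URL is a placeholder, and the response is mocked rather than fetched over the network):

```python
import json

# Hypothetical base URL; real CKAN portals expose the same Action API paths.
BASE_URL = "https://data.example.gov/api/3/action"

def action_url(action, **params):
    """Build a CKAN Action API request URL, e.g. package_search?q=transport."""
    query = "&".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"{BASE_URL}/{action}" + (f"?{query}" if query else "")

def unwrap(raw_body):
    """CKAN wraps every Action API response as {'success': ..., 'result': ...};
    return the result payload or raise on a reported failure."""
    payload = json.loads(raw_body)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN error: {payload.get('error')}")
    return payload["result"]

# A mocked package_search response in the shape CKAN returns.
mock_response = json.dumps({
    "success": True,
    "result": {"count": 2,
               "results": [{"name": "budget-2015"}, {"name": "spending-q1"}]},
})
```

Because every CKAN portal shares this envelope, the same unwrapping code can be pointed at any of the 52 portals evaluated here by swapping the base URL.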

Evaluating the Quality of Open Data Portals Using Benchmarking Framework
Based on the literature review conducted in the previous parts, three technological changes had a direct impact on the proposal of the benchmarking framework. The first is the growth of broadband technologies and the speed of computational devices, which allow public sector institutions to interact with larger databases and develop services to personalize search and use their data more efficiently. The second is the widespread use of social media and wiki platforms to create content, exchange ideas and best practices, or share, publish, discuss and collaborate on information. The use of these technologies by public sector institutions, citizens and businesses transforms the use of government data and the relationships between the stakeholders. The last is the emergence of big data technologies to collect, process and analyze large amounts of government data, which has increased the potential uses of government data and their publishing and sharing. This change has positively affected mashups of these data and improved collaboration using new and more reliable data.
The benchmarking framework presented in Table 1 is based on the systematic literature review and on the authors' experience and knowledge gained in the first part of the open data portals' evaluation. It follows the perspective of quality dimensions and metrics defined by Batini et al. [2]. The proposed framework is divided into two parts. The first focuses on the general characteristics of the portal, consisting of a technical dimension, an availability and access dimension, and a communication and participation dimension. The second evaluates the general characteristics of datasets and their metadata quality. In total, 28 criteria (metrics) are defined. This framework was first introduced in [30] but, based on the discussion and feedback received, it was further modified: some metrics were defined and described in more detail and one criterion for the general characteristics of datasets was added.
Each criterion was converted into a question to be included in a questionnaire distributed to users, e.g.: "This open data portal provides information about the authority which hosts the portal and the governance model or institutional framework supporting data provision models." These 28 questions are evaluated on a five-point Likert scale measuring agreement or disagreement with such a statement (1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree). Each question can thus contribute from one to five points, for a total score ranging from 28 to 140.
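The scoring scheme can be sketched as a small helper (a straightforward reading of the description above, not code used in the study):

```python
def questionnaire_score(answers):
    """Total score for one portal from 28 Likert-scale answers.

    With 28 criteria scored 1-5 each, the attainable range is
    28 (all 'Strongly Disagree') to 140 (all 'Strongly Agree').
    """
    if len(answers) != 28:
        raise ValueError("the framework defines 28 criteria")
    if any(a not in (1, 2, 3, 4, 5) for a in answers):
        raise ValueError("each answer must be an integer from 1 to 5")
    return sum(answers)

def mean_score(rater_answer_sets):
    """Mean portal score over several raters, as reported per portal."""
    totals = [questionnaire_score(answers) for answers in rater_answer_sets]
    return sum(totals) / len(totals)
```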
The evaluation questionnaire was initially tested by two associate professors highly experienced in quantitative research in the information systems domain. They found it understandable and did not report any important problems. Then, 10 postgraduate students were trained in the capabilities of open data portals and their quality aspects. The training session, which lasted about one hour, consisted of two blocks: theoretical and practical. The theoretical block included a PowerPoint presentation about open data portals, evaluation requirements and the benchmark criteria to be used for the evaluation, followed by a question and answer session. The practical block included a sample procedure showing how to perform the evaluation. After the training, all participants completed the questionnaires in paper form. The users' evaluation data collected through the questionnaires were then processed to obtain the mean value for each portal. The results are presented in Table 2. The first three columns contain the evaluated dimensions for the general characteristics of the portal, followed by the sum of these scores; the average score for the general characteristics of the datasets can be found in the fifth column, followed by the overall score.

I. General characteristics of open data portal
List of metrics and description of the requirements for the quality evaluation

1. Technical dimension

1.1 Authority and responsibility
Portals should provide information about the authority which hosts the portal and the governance model or institutional framework supporting data provision models [32], [44], [47].

1.2 Data management system
Portals should provide information about the data management system used to power the portal [25], [33], [44].

1.3 Language
Portals should offer multiple language versions to attract more users and improve the overall quality of the portal [8], [42].
1.4 Free of charge
Portals should ensure that all datasets and services are available free of charge and without any restrictions, under open licenses [40], [52].

2. Availability and access dimension

2.1 Number of datasets
Portals should provide the number of datasets they include [25], [33].

2.2 Number of applications (re-uses)
Portals should provide the number of applications developed based on the re-used open data [47].

2.3 Search engine (filter)
Portals should adopt and make visible an overall organizational structure and provide strong dataset search capabilities and selection tools using different criteria for browsing through categories and filters [8], [27], [32], [33], [42], [47].

2.4 User account
Portals should support user account creation in order to personalize the views and information shown [8].

Thematic categories
Portals should provide thematic categories for the datasets provided by the portal [27], [42]. The portal should clearly distinguish categories (themes) from tags (keywords) [33].

2.7 Tags (keywords)
The same tags should be used to classify data of the same type and category [8], [27], [32].

3. Communication and participation dimension

Forum (feedback)
Portals should provide an opportunity for users to submit feedback on the data to providers, and a forum to discuss and exchange ideas among users [8], [42], [47], [57].

Request form
Portals should provide a form to request or suggest new types or format types of open data [27], [33], [57].

Help (usability)
Portals should include high-quality documentation and help functionality so that users can learn how to use the portal, improving its usability [8], [44], [47].

Frequently Asked Questions (FAQ)
Portals should provide a FAQ section to help resolve any potential issues [47].

Social media
Portals should be connected to a social media platform to create a social distribution channel for open data, through which OGD users and providers can inform each other about what they did with and learned from a dataset [42], [47], [57].

II. General characteristics of datasets

1. Title and description
Datasets should be provided together with a description, including how and for what purpose they were collected [8], [27], [40].

2. Publisher
Datasets should be provided together with their publisher so that the authenticity of their source can be verified [33].

3. Release date and up-to-dateness
Datasets should be explicitly associated with a specific time or period tag, and all information in the dataset should be up to date [27], [32], [33], [40], [52].

4. License
Datasets should provide license information related to their use; datasets that do not explicitly have an open license are not open data [33], [40], [42], [52].

5. Geographic coverage
It should be determined whether the coverage of the data is on the national, regional or local level [33], [42].

6. Dataset URL
A URL should be available for each dataset [33], [42].

7. Dataset (file) size
The dataset (file) size should be available [25], [33].

8. Number of views (visits)
The total number of online views should be available for each dataset [47].

9. Number of downloads
The total number of downloads should be available for each dataset [47].

Visualizations
Dataset visualization capabilities should be provided, e.g., visualizations in charts or maps [42].

12. User rating and discussion message
Datasets should provide capabilities allowing users to rate and comment on a dataset or to discuss conclusions based on data use [8], [57].

For instance, an open data user may change the language in the main menu, search for data, filter the results, zoom in on a dataset and then analyze it in detail. Most of these portals also offer a personalized approach to the available functions through the creation of a user account. The evaluation also showed that some portals provide advanced engagement capabilities and features such as blogs to facilitate the sharing of sources and knowledge, a forum to facilitate discussion on related applications and technology, and community groups. Overall, some of these open data portals constitute a single point of access for broader open government policies and initiatives, opening up government data and new technologies, such as the current government use of social media platforms. Furthermore, although all portals provide some kind of metadata for each dataset, their usefulness is sometimes limited by the fact that no complete listing of possible metadata fields and their descriptions is provided, several metadata fields remain empty, or it is simply difficult to find them on the portal.

From the evaluation presented in the previous sections, it may be concluded that most open data portals in the world have taken significant steps towards opening a number of interesting and useful datasets. The findings presented in this paper are in line with Alexopoulos et al. [1], who reported that the existing first generation of data portals offers mainly basic functionalities for searching and downloading data by users, and for uploading data by providers. The majority of these portals offer simple free-text search and theme-browsing functions for the discovery of datasets. Only some of the best performing open data portals have recently taken advantage of the Semantic Web by providing semantically enriched discovery services, and only a few of them provide functionality to view datasets on a map, rate and comment on datasets, or display various types of charts. However, there are no functionalities for processing the datasets in order to improve them, adapt them to specialized needs, or link them to other datasets (public or private), and then for uploading and publishing new versions of them, or for uploading users' own datasets.
Thus, this paper indicates that there are important improvements that should be made in order to enhance openness in some countries and increase the social and economic value that can be generated. Some recommendations in this direction are provided as follows.
As the findings of the quality evaluation showed, the quality of open data portals is affected by the version of the data management system, especially in the case of CKAN, and open data portals powered by CKAN or DKAN achieved better scores than the other portals. CKAN is released in several versions, which differ from each other in terms of features and service level. Each version offers various functionalities, which may improve the quality of open data portals as well as the related datasets and their metadata. Therefore, it may be suggested that the other portals migrate their services to CKAN or DKAN, especially in the case of developing countries.
Also, more datasets should be opened, from a wider range of thematic categories, such as economic and financial datasets concerning government spending, which should lead to higher government transparency and accountability, and datasets on the economic activity of businesses. At the same time, more emphasis should be placed on opening datasets in some important categories that have been neglected, such as employment, agriculture and tourism, or environment and planning, which are important for citizens' quality of life. More emphasis should also be placed on the use of structured and machine-readable file formats when publishing datasets and metadata (adopting existing metadata standards), and on the support of RDF and the SPARQL Protocol and RDF Query Language (SPARQL), which will enable more effective browsing and discovery of datasets, as well as linking and combining open data from multiple sources, leading to a large increase in their usefulness and value for various stakeholders. Furthermore, there need to be interactions with data providers, as users can request data from providers and give them feedback after they have used the data. Information about the licenses connected to the use of certain datasets is also important in open data ecosystems, as open data users need to know whether the license allows them to use the data in the way they want to use them. Also, information about registered users should be available only to authenticated users, and this function should not be exposed through the API without authentication. Finally, the interoperability of the evaluated portals can be improved by providing metadata about shared identifiers and vocabularies and by reusing related elements. The current lack of harmonization increases the need for quality standards.
Authorities responsible for these open data portals must therefore consider the specific needs of those looking for accountability-related data, and provide structures and mechanisms to address them [32]. At the same time, the functionality of the existing OGD sources should be enhanced by providing more advanced tools, mainly for data discovery (so that potential users can find the datasets they are interested in more easily and quickly), data visualization (for instance on maps and charts, so that potential users can easily and quickly get a first understanding of a dataset and decide whether it is worth continuing with a more detailed analysis) and users' feedback (so that OGD users can provide feedback to providers about the quality of the datasets they have used, existing weaknesses, necessary improvements and needs for additional datasets), as the collaboration between OGD users and providers has been recognized as critical for the generation of value, e.g. in [21], [22], [32], [56], [58].
In contrast to [30], [31], [33], [42], [44], [48], [51], this paper distinguishes between the quality of a single portal and the quality of the datasets and their metadata on that portal. Furthermore, it uses a bigger research sample than these studies. Finally, it presents important findings about improving the quality of these portals, especially in the area of security and information protection.
The limitations and reliability of the results presented in this paper may be affected by related initiatives, legislation and applicable regulations, which do not allow the use of data for purposes other than those for which the data were collected; more precisely, only data complying with these requirements may be found on the evaluated open data portals. However, the framework was proposed with this in mind, as it mostly evaluates the metadata descriptions of the related datasets. Also, the benchmarking framework does not evaluate any functions that require an authenticated user. Furthermore, the evaluated portals have complex (and not always clear) organizational and categorization structures that are continuously evolving, which hinders any analysis conducted from an ordinary user's point of view (that is, without inside knowledge of each portal). All the criteria in the benchmarking framework are considered of equal relevance; however, some of them may be more important than others. Although there is no comprehensive prioritization of these requirements in the literature, weighting may affect the final ranking. For example, if the general characteristics of the open data portal were given a weight of 2/3 and the general characteristics of the datasets a weight of 1/3, the final ranking would be: the UK, the United States, India, Australia, Austria, Canada, Paraguay, Croatia, Russia and France.
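The sensitivity of the final ranking to such weighting can be sketched as follows (the part scores below are made-up illustrations, not the study's measured values):

```python
def weighted_total(portal_general, dataset_general, w_portal=2/3):
    """Combine the portal-level and dataset-level part scores with a
    configurable weight for the portal part (the rest goes to datasets)."""
    return w_portal * portal_general + (1 - w_portal) * dataset_general

def rank(portals, w_portal=2/3):
    """Return portal names sorted by descending weighted total score."""
    return sorted(portals, key=lambda name: -weighted_total(*portals[name], w_portal))

# Illustrative (portal_general, dataset_general) part scores.
sample = {"UK": (70, 60), "US": (68, 58), "India": (64, 70)}
```

Shifting the weight from the portal part to the dataset part reorders the sample: a portal with weaker general features but higher-quality datasets overtakes one with the opposite profile, which is exactly why the equal-relevance assumption is listed as a limitation.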
Another very important point is the selection of the users who evaluated the portals. In this case, only postgraduate students participated in the evaluation; however, there are also other stakeholders, such as local and state governments, businesses, citizens, etc. Future research should therefore focus on the further study of these stakeholders, their requirements and what they consider to be the value of these open data portals. Selected open data portals on the regional or local level of the public sector may also be evaluated. Finally, the need for open data quality standards is once again being recognized.

Conclusions
This paper presents a benchmarking framework for the quality evaluation of open data portals on the national level, as well as an in-depth review of the issues, challenges and opportunities associated with these portals. For this purpose, a systematic literature review and methods of content and multi-dimensional analysis, together with quantitative techniques, were used.
Open data portals are the interfaces between government data on one side and re-users on the other. Basically, each portal should have a clean look with a search bar on the homepage and information about the authority which hosts the portal, and the content should be written simply and structured into categories and tags. Apart from making data available to stakeholders, the portal should also aim to engage citizens' ideas and feedback. The efficient development of these portals makes it necessary to evaluate their quality systematically, in order to understand them better and assess the various types of value they generate. The proposed framework is therefore the main theoretical contribution of this paper. However, the quality evaluation of open data portals is important not only for understanding the value the portals generate, but also for further improvements: findings from evaluations can be used to improve open datasets and open data portals so that they become more useful and their adoption increases. Finally, several observations and recommendations concerning open data portal development, drawn from the presented evaluation, are offered as practical contributions.

Figure 1 shows groups of countries based on the size of their open data portals. Most countries offer between 100 and 500 datasets. Compared to the others, Canada, the United States and the UK have opened up more of their datasets to the public. These results are in agreement with the rankings of the GODI and the ODB index. In the GODI rank order, the highest level of openness exists in the UK, Denmark and France; in the ODB 2015 rank order, in the UK, the United States and Sweden.

Figure 1: Histogram of numbers of available datasets

Figure 2 shows the numbers of thematic groups available on the portals, which can be used to divide datasets into categories. However, this function is not compulsory; thus, 20 open data portals do not offer it. More than 20 groups are offered by Australia, Brazil, Paraguay, Romania and Sweden. Figure 3 then shows the numbers of tags associated with the related datasets. Figure 4 shows groups of countries based on the number of organizations that participate in the open data portal. Generally, up to 50 organizations offered their data on these portals. Datasets from more than 250 organizations are available on the open data portals of Finland, Sweden and the UK. In both of these figures, 10 open data portals do not offer this function through the API.

Figure 2: Histogram of numbers of thematic categories
Figure 3: Histogram of numbers of tags
Figure 4: Histogram of numbers of organizations

Figure 6: Histogram of users registered in the portal

Several practical examples have been used in the literature to illustrate many of the related issues and opportunities of open data. Furthermore, different authors have confirmed that releasing government data in open formats creates considerable benefits for citizens, businesses, researchers and other stakeholders, who can understand public or private problems in new ways through advanced data analytics.

Results of Open Data Portals Quality Evaluation

Only data portals on the national level are evaluated [30]; international, regional and local open data portals are excluded, as are the portals of national statistical institutes or offices, which may also offer open data. The comparison is based on the rankings of the GODI and the ODB index from 2015, which evaluate the state of open data in selected countries around the world. Together, they cover 140 countries. The verification and validation process of an open data portal's existence consists of these steps: a keyword consisting of the name of a country listed in the rankings mentioned above is entered into the general search engine Google together with "open data" or "open data portal"; the selected country is compared with the lists available at other sources such as Site 1, Site 2 and Site 3; and the identified portal's URL is opened to examine whether it is in working condition. The portals found at the beginning of 2015 are presented in [30]. Later that year, a new search was conducted and 24 additional open data portals on the national level were found, increasing the total number of open data portals to 91.
However, none of the previously mentioned research papers evaluated the quality of open data portals and related datasets on the national level. Also, none of these research papers used more than 60 open data portals on a single administrative level. Therefore, a new benchmarking framework is proposed based on the previous literature review to address these limitations. Only open data portals on the national level are considered, to showcase the applicability of the proposed framework.

Evaluating the Quality of Open Data Portals through the API

As the first step in the quality evaluation of open data portals, a content analysis is used to compare selected open data portals on the national level. For this purpose, the API is used to access these portals; therefore, only portals where an API is available are evaluated. Most of these 52 open data portals are powered by CKAN or DKAN. The Postman Representational State Transfer (REST) client is used as the main tool to obtain JavaScript Object Notation (JSON) formatted results.

Table 1: The benchmarking framework for open data portals quality evaluation

Table 2: The results of the open data portals quality evaluation

As shown in Table 2, the UK received the highest score among the evaluated countries, which is in line with the GODI and the ODB index. These indices and related reports rank the UK as the world's most developed open data ecosystem. It is also in agreement with the findings of Heimstädt et al. presented in [18]. The UK is followed by India, which provides the highest quality datasets, the United States and Australia. Most of the evaluated portals are powered by CKAN, though not always by the latest version; such portals received a lower score, as in the case of the Czech Republic. On the other hand, the best results were achieved by countries where the open data portal is powered by CKAN or DKAN, apart from the largest advanced economies such as France, Germany or the United States. On the best performing open data portals, i.e., those of the UK, India, the United States, Australia and Austria, various functions were found, such as searching, requesting, viewing, downloading, analyzing, combining, visualizing and discussing different types of datasets as well as open data trends and concepts.