The dimensions of data governance in the era of AI

Authors:

Ari Alamäki

principal lecturer
Haaga-Helia ammattikorkeakoulu

Published: 23.01.2025

Data is an asset for companies, but too little has been said about data governance in the context of the AI debate. Data governance involves both regulatory requirements and management practices, from data quality issues to its lifecycle. In the era of AI, data governance calls for well-managed, organized, and systematic approaches to data-related initiatives.

This article reviews and suggests practices for data governance while scaling AI initiatives within organizations. It is essential to understand the key dimensions of data governance and all the questions and responsibilities that an organization should consider regarding each dimension.

Data governance

Companies should collect, store, manage, and utilize data based on business-driven needs while focusing on the company’s competitiveness. This relates to data governance, which is a collection of methods, practices, procedures, and rules to collect, manage, and use data, while meeting the goals and strategies of a company and fulfilling the regulatory requirements of national and international policies and laws (see e.g., Alhassan, Sammon & Daly 2016; Aunimo, Alamäki & Ketamo 2019; Ruslan, Alby & Lubis 2022).

Data management is not a new topic or term, as modern societies have collected data for hundreds of years on taxes, citizens, and properties. Old ships’ logbooks, for instance, contain a wealth of data, from weather to route information. In recent years, the amount of data has expanded, but its value has not grown at the same pace.

Valuable data is typically associated with the core products, processes, services, and businesses of organizations, as it helps them steer and monitor various processes, automate activities, and make better-informed decisions. Thus, organizations should professionally govern and manage their valuable data and continuously develop their data governance capabilities. Böringer et al. (2022) show that the effective and systematic use of data in business affects the performance of the company.

Designing data governance practices

The development of data governance practices helps organizations determine which questions need answers and who makes those decisions (Khatri & Brown 2010). Organizations aim to develop their management practices when data is seen as an asset. Unlike oil and gas, however, raw data is rarely valuable in itself, because organizations can seldom sell it as such to consumers or other companies. Companies often process and refine their own data, and the key question in this process is how the data adds value to internal decision-making.

The DAMA-DMBOK model (see e.g. Cupoli, Earley & Henderson 2014; Ruslan, Alby & Lubis 2022) is perhaps the most well-known data governance framework. Practitioners and researchers have also developed their own frameworks (see e.g. Khatri & Brown 2010) and academic models (see e.g. Aunimo, Alamäki & Ketamo 2019; Otto 2011; Tallon, Ramirez & Short 2013) that include principles and practices similar to those of the DAMA-DMBOK model.

The key dimensions and practices are summarized in Table 1. Organizations should consider these while developing data governance in the era of AI. The dimensions help organizations manage data more effectively, safely, and transparently.

Table 1. The dimensions of data governance

Many organizations are running AI pilots and trials as new AI applications are launched every day. Thus, data and AI regulation and ethical guidelines are becoming more important dimensions of data governance.

Rapidly growing disinformation and cybersecurity threats have raised data security concerns to new levels. As data is a valuable and private asset, data breaches might violate companies’ privacy standards and damage their brand. Therefore, data security should be a key dimension of the data governance model. Additionally, data systems play an increasingly important role as advanced technological infrastructure solutions enable more effective data processing. Data lifecycle management also calls for more development initiatives, since poorly managed data wastes organizational resources from security, quality, and regulatory perspectives.

The dimensions of data governance

The development of data quality is one of the most important aspects of data governance. Data quality dimensions include the development of consistency, accuracy, integrity, completeness, reliability, and timeliness features and requirements. Data quality dimensions vary between classifications (see e.g. Alamäki, Mäki & Ratnayake 2019; Government Data Quality Hub 2021; Pipino, Lee & Wang 2002). I selected the dimensions described below as I believe that they cover the most relevant aspects of data quality.

Data consistency means that data needs to be described, marked, and scaled using the same parameters or measures in all processes across the organization. Thus, metadata descriptions are required, and the actors within the organization should have clear instructions and guidelines for those actions. Metadata definitions, data classifications, and categories become easier to agree on when the usage context and need for data are carefully described and communicated.
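As a minimal sketch of what scaling data "using the same parameters or measures" can look like in practice, the example below converts incoming values to one agreed unit per field. The registry `TO_CANONICAL`, the field names, and the function `normalize` are hypothetical and only illustrate the idea; a real organization would derive them from its own metadata descriptions.

```python
# Hypothetical conversion table: (field, reported unit) -> factor into the
# unit the organization has agreed on for that field (kg and m here).
TO_CANONICAL = {
    ("weight", "kg"): 1.0,
    ("weight", "g"): 0.001,
    ("length", "m"): 1.0,
    ("length", "cm"): 0.01,
}


def normalize(field: str, value: float, unit: str) -> float:
    """Convert a value to the agreed unit for its field.

    Raises ValueError when no conversion has been agreed, so that
    inconsistent data is rejected instead of silently stored.
    """
    try:
        factor = TO_CANONICAL[(field, unit)]
    except KeyError:
        raise ValueError(f"no agreed conversion for {field!r} in {unit!r}") from None
    return value * factor


# Two systems report the same weight in different units.
weight_from_warehouse = normalize("weight", 500, "g")
weight_from_webshop = normalize("weight", 0.5, "kg")
```

Rejecting unknown units, rather than guessing, is a deliberate choice in this sketch: it forces a missing metadata definition to surface as soon as the data arrives.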

Data integrity concerns cases where data is, for example, combined and integrated with other data sets while the trustworthiness of the original data must be preserved. Numerical data is easier to use, integrate, and combine with other similar data sets than unstructured data, such as text, voice, or images. Thus, the reliability of data should remain the same whether it is used, modified, or integrated. Additionally, human errors, data collection errors, and damage caused by cybersecurity incidents, for example, are data integrity concerns (Cote 2021).

Data accuracy relates to the validity and correctness of information, meaning that accurate data reflects reality without biases or errors. Data accuracy includes time-oriented and contextual relevance, as outdated data or data in the wrong context cannot be considered accurate.

Data completeness means that data should cover all necessary details and information in relation to the actual business needs. Incomplete data sets can be improved with additional data sets. A data and AI strategy should cover these requirements, ensuring the organization has enough data for statistically reliable forecasting, for example.
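One simple way to make completeness measurable is to count how many records contain every field the business need requires. The sketch below assumes records are plain dictionaries; the function name `completeness` and the example fields are illustrative, not part of any particular framework.

```python
def completeness(records, required_fields):
    """Return the share (0.0-1.0) of records in which every required
    field is present and neither None nor an empty string."""
    if not records:
        return 0.0
    complete = sum(
        1
        for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(records)


# Illustrative customer records; only the first has both required fields.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3},
]
share_complete = completeness(customers, ["id", "email"])
```

A threshold on this share, chosen by the organization, could then indicate when additional data sets are needed before the data is used for forecasting.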

Data reliability describes how well data represents the reality from which it was initially collected and saved. Additionally, to ensure reliability, the proper data should be used when analyzing specific changes and situations. A classic example is correlation, as certain things can correlate statistically even though they have nothing to do with each other in reality. For example, the amount of ice cream consumed and the number of drowning deaths in summertime correlate, but there is no real link between them.
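The ice cream example can be illustrated with a small calculation. The numbers below are invented purely for illustration and are not real observations; they assume that both ice cream sales and drownings rise with the temperature. The helper `pearson` is a plain implementation of the Pearson correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented weekly figures (illustrative only, not measured data).
temperature = [12, 15, 18, 21, 24, 27, 30]   # degrees Celsius
ice_cream = [20, 27, 33, 41, 46, 55, 60]     # units sold
drownings = [1, 2, 2, 3, 4, 4, 5]            # incidents

# Strong correlation between two variables with no direct link...
r_ice_drown = pearson(ice_cream, drownings)
# ...because both follow a third variable, the temperature.
r_temp_ice = pearson(temperature, ice_cream)
r_temp_drown = pearson(temperature, drownings)
```

In this invented data set all three coefficients are high, which is exactly why a correlation alone says little about whether the data describes a real relationship.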

Timeliness means that data should be up to date, as the rapid availability of data for analyzing market changes, for example, is a competitive advantage for companies in turbulent market situations. Thus, solutions should be developed to ensure that data is as close to real time as possible, or that its updating is otherwise taken care of and monitored in organizations.
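Monitoring of updates can start from something as simple as comparing the last update time of a data set against an agreed maximum age. The following is a sketch under that assumption; the function `is_stale` and the one-hour limit are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated, max_age, now=None):
    """Return True if the data is older than the agreed maximum age.

    `now` can be passed explicitly to make the check reproducible;
    by default the current UTC time is used.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated > max_age


# Example: market data is expected to be at most one hour old.
check_time = datetime(2025, 1, 23, 12, 0, tzinfo=timezone.utc)
fresh = is_stale(check_time - timedelta(minutes=30), timedelta(hours=1), check_time)
old = is_stale(check_time - timedelta(days=1), timedelta(hours=1), check_time)
```

The acceptable maximum age depends on the use case: market monitoring may need minutes, while annual reporting data can be months old and still timely.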

It is important for a company to know where and how the necessary data can be collected, acquired, or purchased. Additionally, it is often necessary to build an IT system that assists in storing and managing the data. Obtaining and storing individuals’ consent is an essential part of the availability and legal use of data. Without the consent of the individuals concerned, such as users or customers, personal data cannot be stored and utilized unless they have otherwise agreed to the storage and processing of their personal data, or the organization has a legitimate interest in processing the data.

At the same time, there is an increasing amount of open data and commercially available real or synthetic data worldwide. Still, the most valuable data is quite often the data that is directly related to the organization’s own operations and environment.

The owner of the data, whether a responsible person or an organizational function, should always be nominated if possible. According to research, defining the data owner or responsible person is not entirely straightforward (Abraham, Schneider, & Vom Brocke 2019). In fact, responsibility might follow the functions and operations of the organization, meaning it depends on who is responsible for the process in which data is generated, stored, and processed. Thus, responsibility and ownership should follow organizational responsibilities rather than a single expert or manager who may change. In this way, ownership is not directly related to the ownership of the process, tasks or IT systems where the data will be managed.

However, the starting point for utilizing data should be that the company has the right to use it, either as the owner or through usage rights obtained by other agreements or contracts.

The AI Scalers project consults SMEs on effective data governance frameworks and practices

Table 1 summarizes the key dimensions of data governance, including the most essential questions to consider and example responsibilities in a typical medium-sized or large organization. Each organization should customize its data governance models and update them according to its own data and AI strategies.

Hopefully, this article will provide ideas for developing data governance frameworks and practices in various organizations by raising awareness of the key dimensions and questions that should be considered while designing, developing and evaluating effective data governance frameworks and practices.

We work on data and AI governance-related development actions in Haaga-Helia’s research and development projects. The AI Scalers research and development project, funded by the European Regional Development Fund, will consult small and medium-sized companies on these issues from 2025 to 2027 in the Uusimaa region of Finland. The project takes a special approach to AI design and development actions and multidisciplinary AI capabilities, where data plays a key role for companies.

References

Abraham, R., Schneider, J., & Vom Brocke, J. 2019. Data governance: A conceptual framework, structured review, and research agenda. International Journal of Information Management, 49, 424-438.

Alhassan, I., Sammon, D. & Daly, M. 2016. Data governance activities: An analysis of the literature. Journal of Decision Systems, 25(1), 64-75.

Alamäki, A., Mäki, M. & Ratnayake, R. 2019. Privacy Concern, Data Quality and Trustworthiness of AI Analytics. In Ketamo, H. & O’Rourke, P. (eds.): Proceedings of Fake Intelligence Online Summit 2019, May 7, Pori, Finland, pp. 37–42.

Aunimo, L., Alamäki, A. V., & Ketamo, H. 2019. Big data governance in agile and data-driven software development: A market entry case in the educational game industry. In Big Data Governance and Perspectives in Knowledge Management (pp. 179-199). IGI Global.

Böringer, J., Dierks, A., Huber, I. & Spillecke, D. 2022. Insights to impact: Creating and sustaining data-driven commercial growth. McKinsey & Company.

Cote, C. 2021. What Is Data Integrity and Why Does It Matter? Harvard Business School Online.

Cupoli, P., Earley, S., & Henderson, D. 2014. DAMA-DMBOK2 framework. DAMA International.

Government Data Quality Hub 2021. Meet the data quality dimensions.

Khatri, V., & Brown, C. V. 2010. Designing data governance. Communications of the ACM, 53(1), 148-152.

Otto, B. 2011. Organizing data governance: Findings from the telecommunications industry and consequences for large service providers. Communications of the Association for Information Systems, 29(1), 45-66.

Pipino, L. L., Lee, Y. W., & Wang, R. Y. 2002. Data quality assessment. Communications of the ACM, 45(4), 211-218.

Ruslan, I. F., Alby, M. F., & Lubis, M. 2022, September. Applying data governance using DAMA-DMBOK 2 framework: The case for human capital management operations. In Proceedings of the 8th International Conference on Industrial and Business Engineering (pp. 336-342).

Tallon, P. P., Ramirez, R. V., & Short, J. E. 2013. The information artifact in IT governance: Toward a theory of information governance. Journal of Management Information Systems, 30(3), 141-178.

Picture: Shutterstock