In this episode we discuss:
- 3 perspectives that auditors should consider regarding data governance, and
- 3 considerations for auditors in implementing data governance within their functions/teams.
Related blog article: Controlling access to data within the audit team – open or closed?
Narrator: Welcome to the assurance show. This podcast is for internal auditors and performance auditors. We discuss risk and data focused ideas that are relevant to assurance professionals. Your hosts are Conor McGarrity and Yusuf Moolla.
Yusuf Moolla: Good morning, Conor.
Conor McGarrity: Hi Yusuf. How are you?
Yusuf: I'm very well, thanks, yourself?
Conor: Not too bad. What are we talking about today?
Yusuf: So today we're going to talk about data governance. Importantly, how we audit it or how we consider it as part of the audit work that we do. And then a few considerations for implementing data governance within the audit function.
Conor: Okay. So obviously a very important topic, given the proliferation of data within organizations these days. So why is it important for auditors to know about data governance?
Yusuf: As auditors, we've seen a range of matters come up in terms of ethics and controls and security and privacy of data. So that's important and we'll talk about that. But what's really important for auditors, in particular, is understanding how data is used within the business to achieve particular objectives. So both effectiveness objectives, but also efficiency objectives. And so, if we don't understand data governance well enough, or we don't understand what it is that we need to be looking for, then it's really difficult to have conversations with the business about data governance. The other reason that it's important for auditors to know about data governance is that, for a range of reasons that we've outlined before and we'll probably get into a little bit today, it's important to understand what data governance expectations there are within the audit team, as in the data that we use as part of our audits, where the expectations may differ a little bit from how data governance is organized or controlled or coordinated within the business. So both in terms of our input into data governance for the organization, but also how we think about it and how we implement it for ourselves. Because without good, solid data governance, one of the key things that goes wrong is that your data quality is poor. And with poor data quality, it's then difficult to actually use your data for competitive advantage.
Conor: You mentioned three very important principles there that we'll cover off in today's discussion. You talked about the use of data being effective, the use of data being efficient and whether the use of data is ethical. So obviously three key underpinnings for how we go about our work as auditors. Let's start with the first one, unpack that a little bit. When we're talking about the use of data being effective - what are we considering there?
Yusuf: Three things that relate to the business, and then we'll go into three things for audit's own consideration for their own practices.
So the first one there, as you said, is whether the use of data is effective. There's obviously a range of data sources, both proprietary and open, that are available for businesses to use, and the sorts of things that a business can achieve by using data include getting a better understanding of customers. So knowing their customers better, so that they can serve them better. Understanding internal operations better, so understanding how marketing and sales and various operational aspects of the business are performing. But then importantly, being compliant. So, ensuring that our contractual obligations are met, etc.
So if we need to be able to use data to do that, then the question we need to ask is, are we using our data effectively to achieve those objectives?
So, first of all, do we have all of the data that we need to be able to answer the questions that we need to answer, to know our customers better, to understand our operations and to be compliant.
And then secondly, is the level of quality of that data sufficient to enable us to do those things effectively? And then are we actually doing it? So are we actually using that data or are we making sure that we at least have a plan to use that data to enable those outcomes?
So data can enable all of those, and we're talking about medium to large organizations with more than, say, 50 FTE staff. Even some smaller organizations are doing it; we spoke to a 32 FTE organization the other day, and they were talking about implementing a data governance program to enable them to use their data better. So even smaller organizations are thinking about it.
Conor: And so you may have some organizations that have lots and lots of good, reliable and useful data, but just aren't tapping into that, presumably. If you're not thinking about how data can help you better understand your customers or how you're operating, or how you're performing with an external focus, then maybe you're not using it effectively.
Yusuf: Yeah. So as auditors, we need to think about what the different potential opportunities for using data are and whether that is being exploited.
Conor: Okay. So that's using data effectively. Let's look at the efficiency angle. What do we need to be thinking about when we're talking about how well we're using data to get some efficiencies?
Yusuf: Two things. One is externally facing, the other is internally facing. Using data to enable efficiency involves things like enabling faster decisions and enabling faster service for customers. Where we are able to understand, for example, who is calling us when they're calling us, so that we can serve them faster, but also getting to the data that we need to achieve the customer service levels that we need. So if we can't get to our data fast enough, or if our data is of poor quality and needs to be cleansed each time we need to make a customer service decision, or enable a customer service outcome, then we're in a situation where we're not using our data efficiently.
The other is: how easy is it for internal people, for our staff, our teams, to get access to the data that they need. And turn that data into a decision or turn the data into something that can enable a decision or information at a minimum. And so data quality and good data quality is one thing. So do we actually have the level of quality that we need, or does the data need to be cleansed?
But importantly, do people actually have access? Do they know where the data is? Can they see the data or are we arbitrarily limiting access to that data? So efficiency enabled through quality, again, because quality is one of the underpinnings, but also efficiency through enabling people to get quick access to the data that they need so that they don't waste time going to look for it and they don't waste time cleansing it to get to an answer.
Conor: We've spoken about making data available to as many people broadly as possible instead of closing it down for access by only a few privileged individuals. But that sort of segues into the third point here. We always need to be mindful of the fact that our use of data needs to be ethical. That's something that doesn't go away.
Yusuf: Yeah, that's right. As auditors, we tend to go for this item first, and we deliberately put this as the third item. So ethical use of data, and this covers a range of matters like security and privacy and ensuring that any decisions that are made are free of any bias. We often think about the way in which data is recorded, and we take that thinking around the recording of data and apply it to the use of data. And that's where we often end up with far tighter security than is necessary to enable the business to make decisions.
Because of the way in which we think about transaction recording, and the limits that we put on who can actually record and approve transactions within ERP systems, we've applied that same thinking to systems of intelligence. And what that means is that we often get to a situation where too little access to data is provided. So we need to get to a reasonable balance. How do we secure our data? How do we keep private data private? And then, at the same time, ensure that those people that need access can actually get the access that they need. So some sort of balance needs to be in place. And the other side of the ethics discussion is: how do we make sure that when we're using our data, we have the right level of quality and we have the right approach to controlling the various algorithms that we use and the various rules that we use so that we are eliminating bias as far as possible?
Conor: So one of the things that we've advocated for quite some time is open by default, so people can access certain data and look at certain data, unless there are particular characteristics of it, such as privacy considerations, that mean that it needs to be locked down. Is that fair?
Yusuf: Yeah. So there's different approaches depending on the nature - there would be certain industries where you would need to take a different approach. For many that we deal with, open by default is something that could potentially be easier to implement, but still get to the same level of control that you need.
Conor: Okay. So what about as auditors ourselves when we're looking after our own data. What's the first consideration we need to think about there?
Yusuf: When we think about data governance for ourselves, this is a little bit different for various reasons, the main one being that we have a different level of access to data than most of the rest of the organization would have. The first thing is security. So how do we classify the data that we have? Can we use the organizational data classification system to classify our data? We need to think about that a little bit differently, where we're bringing data sets together that the rest of the organization wouldn't be bringing together.
What are we actually doing with that data? So, the data becomes a little bit more sensitive when we have identified potential fraud, and we record that as potential fraud, or where we initiate investigations, or where we have decided that there is certain information that the rest of the organization shouldn't know about - yet. So some audits are reasonably sensitive, and you don't want everybody to have access to that. Maybe not yet, particularly as the audit is progressing, for various reasons. The most obvious one is fraud. So how we classify information, how we classify data, might be a little bit different to how the broader organization does. How do we protect the data that we have? That is fairly similar, or would be fairly similar, to the way the rest of the organization would protect their data, except that if you have a different classification, you may need to have a different level of protection. So if you have a different classification category than the rest of the organization has, then there may be a different level of protection that needs to be applied.
Conor: So when you're talking about classification there, one of the most obvious examples is where there's been a linkage of various data sets that provides a whole new set of insights or opens up a whole new context for either how you see your customers or how you see your members of the public. Something that individually business units can't see themselves.
Yusuf: Yeah. So where you have data set A and data set B, and typically nobody else would have access to both data sets. Quite often as auditors, we have payroll data and we have procurement data. And often the procurement people will have access to procurement. The payroll people have access to payroll. And one of the scenarios is where we're combining those to understand whether anybody on the payroll is also a supplier, right? And that's a potential conflict of interest or potential for fraud.
So when we bring that information together, that's something that other people don't have access to. And so the data will be far more sensitive than either of the underlying datasets individually. Yes. And then the second one is if there is potential fraud in there, then we don't want anybody to know that we are investigating that. And so that means that that dataset also becomes a little bit more sensitive than we've seen before.
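The payroll-to-supplier comparison described above can be sketched roughly as follows. This is a minimal illustration only: the field names, the sample records and the choice of bank account number as the matching key are all assumptions made for the example, not something specified in the episode. A real test would usually try several keys (address, phone, business number) and treat any hit as a lead to follow up, not a finding.

```python
# Hypothetical sample data; all field names and values are illustrative assumptions.
payroll = [
    {"employee_id": "E001", "name": "A. Nguyen", "bank_account": "062-000 1234 5678"},
    {"employee_id": "E002", "name": "B. Patel", "bank_account": "033-111 8765 4321"},
]
suppliers = [
    {"supplier_id": "S100", "name": "Acme Cleaning", "bank_account": "06200012345678"},
    {"supplier_id": "S200", "name": "Northside Freight", "bank_account": "084-222 1111 2222"},
]


def normalise_account(value: str) -> str:
    """Keep digits only so formatting differences do not hide a match."""
    return "".join(ch for ch in value if ch.isdigit())


def employees_who_are_suppliers(payroll_rows, supplier_rows):
    """Return (employee_id, supplier_id) pairs that share a bank account."""
    suppliers_by_account = {}
    for s in supplier_rows:
        key = normalise_account(s["bank_account"])
        suppliers_by_account.setdefault(key, []).append(s)

    matches = []
    for e in payroll_rows:
        key = normalise_account(e["bank_account"])
        for s in suppliers_by_account.get(key, []):
            matches.append((e["employee_id"], s["supplier_id"]))
    return matches


matches = employees_who_are_suppliers(payroll, suppliers)
print(matches)
```

The combined output is exactly the kind of dataset discussed here: neither source on its own shows the overlap, so the result may warrant a tighter classification than either input.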
Conor: Okay. So the first consideration for audit in their approach to data governance is security. What's the second thing we need to think about?
Yusuf: Okay, so the second thing is making data accessible to the whole audit team.
So again, balancing between accessibility and security. And this is, when we think about audit data governance and security: do we need to secure data from those individuals that are not involved in a particular audit?
If we as auditors have signed up to the same confidentiality obligations, etc., then we do need to think carefully about whether and why we are restricting other individuals within the team from seeing the data that is being used, or seeing the outputs of the use of data, for an audit. So that's accessibility and balancing off against the security expectations.
Conor: Okay. So a third point, and that's around the quality of the data that we use. What are some of the things we need to think about there?
Yusuf: The key things that we need to think about in terms of quality are objectivity, credibility and integrity.
Objectivity is that the way in which we use our data, and the way in which we collect and report on that data, is unbiased. It's unprejudiced. And it's impartial. Because we're using data to report on or assess the way in which the business is being run, or a particular public sector organization is executing their functions or their obligations, we need to make sure that we're not just using individual sets of data without thinking broadly about the various data sets that we should be looking at to get to an answer. If we limit what we use, there's potential for bias and there's potential for prejudice. We can't have that. As auditors, it's important that we explore the relevant available avenues to get to the answer that we need to get to.
So we can't decide on a hypothesis and then say: to prove this hypothesis, we can use this data set and therefore we'll use that dataset and prove it - where there's potential for using a different dataset that could say something different.
Conor: So let's not collect the data to prove or disprove our hypothesis on that basis alone.
Yusuf: That's right.
Conor: And that's why it's about having those iterative conversations with the auditees and the business to say, is this a fair and reasonable data set for us to start asking questions of? Or are there other data sets available that would be useful and could also help us come up with an answer?
Yusuf: Yup. Absolutely. So that's objectivity. Then credibility. So this is: the data that we use is reliable. It's reputable. We can attest to the source and we know the context in which the data was collected and reported. So is the data credible? Will it stand up? If somebody asks us about where we got the data from or how we know that the data was well collected?
Will we be able to give an answer? Can we put our hand on our heart and say that this data is a trusted source and is true and correct to enable us to get to a decision? So that's the second one. Credibility. And then the last one is integrity. So the data is complete. It's accurate, it's timely. So examples are that the data is of sufficient depth and breadth for the task. So similar to objectivity, but the quality is maintained throughout the analysis that we do. So when we get it in, we check the quality, we make sure there's no missing data, that we haven't duplicated anything, that we're not missing years, we're not missing fields. And then also that when we use that data throughout the audit analytics process, when we're joining data sets together, when we're exploring certain angles that we maintain that level of quality because it's quite easy to lose quality through the analysis process by not considering whether you're maintaining the level of integrity through that process. So simple stuff.
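The intake checks mentioned above (missing data, duplicates, missing years, missing fields) can be sketched as a small routine run as soon as a dataset arrives. The schema, field names and sample rows below are assumptions for illustration; the episode does not prescribe a particular set of checks or a tool.

```python
from collections import Counter

# Assumed schema for an incoming transaction extract (hypothetical field names).
REQUIRED_FIELDS = ("txn_id", "supplier_id", "year", "amount")


def intake_checks(rows, expected_years):
    """Summarise basic integrity problems in a list of dict records."""
    # Row positions where any required field is absent or blank.
    rows_missing_fields = [
        i for i, r in enumerate(rows)
        if any(r.get(f) in (None, "") for f in REQUIRED_FIELDS)
    ]
    # Transaction ids that appear more than once.
    id_counts = Counter(r.get("txn_id") for r in rows if r.get("txn_id") not in (None, ""))
    duplicate_ids = sorted(k for k, n in id_counts.items() if n > 1)
    # Years we expected to receive but that have no records at all.
    years_present = {r.get("year") for r in rows}
    missing_years = sorted(set(expected_years) - years_present)
    return {
        "row_count": len(rows),
        "rows_missing_fields": rows_missing_fields,
        "duplicate_ids": duplicate_ids,
        "missing_years": missing_years,
    }


sample = [
    {"txn_id": "T1", "supplier_id": "S1", "year": 2019, "amount": 100.0},
    {"txn_id": "T2", "supplier_id": "", "year": 2019, "amount": 250.0},
    {"txn_id": "T2", "supplier_id": "S2", "year": 2021, "amount": 75.0},
]
report = intake_checks(sample, expected_years=range(2019, 2022))
print(report)
```

Keeping the report alongside the working papers gives the reviewer something concrete to look at when they check the analysis.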
Conor: Can you give a simple example of where data quality might be lost during the analysis process?
Yusuf: One of the challenges that we often see is where we're blending datasets, and one of the common ways in which we have traditionally blended data together is in Excel. Right. Now in Excel, you use a function called vlookup. So vlookup is where you have a sheet and you have another sheet that has supplementary data, and you want to bring the supplementary data into the main dataset.
Now, a vlookup is fantastic if you have - in data terms, a one to one relationship. So where there's only one potential answer for the join. The problem with the vlookup is that if you have a situation where there are multiple potential answers in the second data set because of the way in which vlookup works, you'll only get the first possible answer.
Right? So let's say you had a list of suppliers, and you had a list of transactions and you wanted to know on what date a supplier transacted with us. And typically you would join that in a database, for example. But if you're using a vlookup, you're only going to get the first transaction coming back. Okay. So that's the first thing to know. The second is, even if you are using a database, if you're joining two datasets like supplier and transaction, and you haven't checked the quality of either of those data sets, you could get into a situation where the join doesn't match, because you've got the supplier number wrong, or you've got a slightly different input in a data field. Your join then will technically work, and you'll get a whole range of answers, but you may be missing data in the result. So one of the things that you need to do is a reverse match, right? So, if I have 10 suppliers and a thousand transactions, I need to make sure that I have at least a thousand items coming out the other side.
If I don't, then I'm either missing a supplier or I'm missing a transaction. Equivalently, if I have duplicate data in my supplier listing and I join it to a transaction listing, I may end up with more than a thousand transactions coming out. What do I do with that? Now, this is a very simple example.
There's obviously, there's a lot more complexity that will go into it. And as auditors become more and more familiar with data and the use of data, this becomes less and less of a problem. But, it still comes up - where we haven't checked the quality of the data, we haven't checked the consistency of the data that we're joining.
And the joined dataset is either missing records or has duplicates. So that's where the integrity of the original data may have been problematic, but the integrity of the result is even worse. And we haven't checked. It may or may not come out in the wash. So you may go to the business and they may say you've missed something, which is probably a positive outcome in terms of getting to the answer.
It won't be a great outcome in terms of how the audit team is viewed. But if you miss it completely and everybody misses it, you may just have the wrong answer. You may just be giving the board the wrong answer, for example, and that would be a disaster.
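The reconciliation described above can be sketched as a join that reports on itself. This is an illustrative sketch with assumed field names and made-up records: it joins transactions to suppliers, then compares rows in against rows out, lists transactions that found no supplier, and flags supplier ids that appear more than once.

```python
from collections import Counter


def join_and_reconcile(suppliers, transactions):
    """Join transactions to suppliers on supplier_id and report on the result."""
    suppliers_by_id = {}
    for s in suppliers:
        suppliers_by_id.setdefault(s["supplier_id"], []).append(s)

    joined, unmatched = [], []
    for t in transactions:
        hits = suppliers_by_id.get(t["supplier_id"], [])
        if not hits:
            unmatched.append(t["txn_id"])  # would silently vanish in an inner join
        for s in hits:
            joined.append({**t, "supplier_name": s["name"]})

    supplier_counts = Counter(s["supplier_id"] for s in suppliers)
    report = {
        "transactions_in": len(transactions),
        "rows_out": len(joined),
        "unmatched_transactions": unmatched,
        "duplicate_supplier_ids": sorted(k for k, n in supplier_counts.items() if n > 1),
    }
    return joined, report


# Hypothetical data: S2 is listed twice, and T3 carries a mis-keyed supplier number.
suppliers = [
    {"supplier_id": "S1", "name": "Acme Cleaning"},
    {"supplier_id": "S2", "name": "Northside Freight"},
    {"supplier_id": "S2", "name": "Northside Freight Pty Ltd"},
]
transactions = [
    {"txn_id": "T1", "supplier_id": "S1"},
    {"txn_id": "T2", "supplier_id": "S2"},
    {"txn_id": "T3", "supplier_id": "S01"},
    {"txn_id": "T4", "supplier_id": "S2"},
]
joined, report = join_and_reconcile(suppliers, transactions)
print(report)
```

One design note: dropped and duplicated rows can cancel each other out in the total, so the sketch reports unmatched and duplicate keys separately instead of relying on the row count alone.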
Conor: That's a good example. And the important thing to remember there is you need to understand the limitations of whatever technique you're using.
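As one concrete case of a technique's limitation, the first-match behaviour of a lookup can be mimicked in a few lines. The sketch below assumes an exact-match lookup and uses made-up supplier and date values; it is meant to show the shape of the problem, not to reproduce Excel itself.

```python
# Hypothetical transaction list: supplier S1 has transacted twice.
transactions = [
    {"supplier_id": "S1", "date": "2020-01-15"},
    {"supplier_id": "S1", "date": "2020-03-02"},
    {"supplier_id": "S2", "date": "2020-02-10"},
]


def first_match_lookup(key, rows, key_field, value_field):
    """Mimic an exact-match lookup: return the first matching value, or None."""
    for r in rows:
        if r[key_field] == key:
            return r[value_field]
    return None


def all_matches(key, rows, key_field, value_field):
    """Return every matching value, as a one-to-many join would."""
    return [r[value_field] for r in rows if r[key_field] == key]


first = first_match_lookup("S1", transactions, "supplier_id", "date")
every = all_matches("S1", transactions, "supplier_id", "date")
print(first)
print(every)
```

Comparing the number of matches per key before choosing a lookup is a cheap way to confirm whether the relationship really is one to one.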
Yusuf: It's something that is an ongoing challenge. The challenges change, so they become more complex. You need to put the QA process in place - ongoing checks of the way in which data integrity is maintained throughout the audit analytics process.
Conor: If you've got that good QA in place, then it should help remove any fear from actually trying to analyze some data.
Yusuf: Absolutely. Yup. We definitely want to do more of it. It's not difficult. It's not difficult to do the QA either. So similarly to when we do audits, right? When we do our audits, we always have a QA step in place. Nobody does an audit by themselves and then submits that. There may be, you know, 2% of the audit population worldwide that might just try something like that, but the majority of people don't do that.
You would have somebody else look at your work regardless of how experienced you are. And so similarly with analytics and with the use of data, just get somebody else to have a look. Look at it yourself, make sure that it makes sense. Get somebody else to have a look, get the team leader to have a look, etc., and that will then eliminate some of that risk.
Conor: Okay. So we've covered a lot of ground there in our discussion on data governance. Three main considerations for how data's governed within the business. First one being is data being used effectively. The second one being is it used efficiently?
And the third one, is it being used ethically? Quite often we place that maybe too high up the consideration chain when it should be looked at equally with efficiency and effectiveness.
And then three considerations for how we establish our own data governance framework. The first one being security obviously, and how we protect and classify and monitor ongoing use of that data we collect for some of our audit projects.
The second one is making that data we collect and analyze in our audits accessible to our team members, given that we're all subject to the same confidentiality requirements within an organization. And the third one, which is an important principle, is making sure that we've got good quality in our data and that we don't lose any of that quality through our analytical process.
Yusuf: There's obviously lots of guidance around what data governance is and what the various data governance principles and practices need to be. And so we're not going to repeat any of that because it's quite easy to just go and find it. As auditors we need to think about what we're doing a bit more broadly than what the business are doing, or maybe a bit differently, which is why we've been discussing it as effectiveness, efficiency and ethical use of data.
So if we think about it like that, we may be able to guide the business better. Quite often the business are thinking about it purely in terms of what they see from various standards and governance frameworks.
Conor: Fantastic. Great conversation today Yusuf, on data governance. Thank you.
Yusuf: Thanks Conor. See ya.
Narrator: If you enjoyed this podcast, please share with a friend and rate us in your podcast app. For immediate notification of new episodes, you can subscribe at assuranceshow.com - the link is in the show notes.