Welcome to the assurance show. This podcast is for internal auditors and performance auditors. We discuss risk and data focused ideas that are relevant to assurance professionals. Your hosts are Conor McGarrity and Yusuf Moolla.
Good morning Conor.
How's it going Yusuf?
All right. When we talk about open data, what are we actually talking about?
Different things to different people. For our purposes, we're talking about data that's freely available, accessible and usable. So generally that means stuff that we can get online and that can be reused and redistributed by anyone, although sometimes there are requirements to attribute the owners of the original data.
I like to refer sometimes to the Open Data Handbook that's available online. It's from an organisation that has committed itself to the use of open data, and it recognises three main components. The first one is availability and access. So the data must be available at no more than a reasonable reproduction cost, they say, or preferably free. The second one is that it needs to be able to be reused and redistributed. The terms of the data itself must mean that it can be used and even mixed with other data sets, or linked and joined and analysed with other data sets. And the third one is universal participation. That's where everyone, regardless of where you are, who you are and what you're doing with it, should be able to use it for non-commercial purposes, or where you use it for commercial purposes you need to attribute it correctly. So that's, in essence, the three main things. It needs to be available and accessible, it needs to be able to be reused and redistributed, and there must be no discrimination in who can use it.
Why is open data important to auditors - internal auditors and performance auditors?
It's another evidentiary source that's probably been there all along, but has only come to light in the past decade or so. It can add considerable value, not just to how you do your audit, as a new evidentiary source, but it's also an opportunity to shed some light on what other information is available to parts of your business as you go through your assurance activities. So what is it used for? Firstly, you can use it as a point of triangulation to test whether or not the information in your own systems and in your own organisation is actually accurate. And secondly, it can be used to identify new projects for your assurance focus.
What are the potential outcomes for auditors in using open data?
The value of open data is still really not known. So when, for example, you're doing a performance audit, you've got an opportunity to actually educate some of the audience you're dealing with about, firstly, what open data is available, and secondly, how useful it is. So that's probably the first thing. The second thing is that open data is often valuable for challenging prevailing assumptions. From a performance audit perspective, one of the main things governments try to do is deliver services and programs based on assumptions at the point in time when that service or program was developed. By the time that's rolled out, it could be one, two, three, even four years after the original decision was made. The way in which the recipients or beneficiaries of that program, mainly the public, consume those services or use that program may have changed. Open data, including things like the benchmarking tools that are made openly available online these days, actually gives you some opportunity to see whether what we have designed is something that's going to be used in the way that we originally designed it. That's a long way of saying that open data gives you an opportunity to actually test some of the original assumptions that were made at the outset.
If you already have hypotheses or, at a high level, criteria that you need to evaluate, then open data can help you determine whether those have been met.
I would say that open data is one of the evidentiary sources you would go to. You would probably, in the first instance, rely on information or data that is available within the entity you're performing the performance audit over. But open data is definitely a strong point of triangulation to try and prove or disprove whether a criterion has been carried out in accordance with its design, or whether my hypothesis is proved or disproved. That sounds like a conceptual statement, and to some extent it is. But open data gives you that ability, certainly from an external auditor-general's or comptroller-general's viewpoint, to take not just the entity's data that you're looking at, but to join it up with anything that's available outside and beyond that, which may not actually be readily available within that entity or being looked at within that entity itself.
As a contrast - within internal audit, the open data that we would use would be a smaller part of the overall analysis that's done, than it would be for performance audit. And that's because, except for certain specific circumstances where public sector entities are involved or there's significant potential to use open data for proprietary internal audit work - the level of the value add of open data isn't as high. And that's because most of the open data is public sector related or has some sort of public sector focus. There are exceptions to that. Where it is used, it can definitely add value.
And that's totally understandable, because I think we're talking about two different beasts here. So, for internal audit in the private sector, for example, there's not going to be much publicly available open data that might speak to the performance of that particular entity, because just having that out there might put you at a commercial disadvantage. Whereas in the public sector audit or performance auditing world in particular, governments are moving towards transparency and putting that out there, and that's something we need to tap into.
Interestingly, there's a move within the private sector as well, spurred on largely by the public sector. There are a number of industries where data is going to need to be put into the "open" domain, though it may not be completely open given the definitions that you spoke about earlier. There may need to be quite a bit of privacy and security associated with the way in which that data is then accessed. But there's the open banking regime that's coming into play, which will be followed by a range of other industries, I think starting with energy and then moving on to telecommunications. And that's where proprietary data from individual institutions that can be associated with consumers will need to be shared between those organisations and a few others that are coming through: newer startups, fintechs and other players.
You've touched on a bigger point there, and it's more of a strategic opportunity, as opposed to a concern, for internal and performance audit. With all this open data becoming available, that means that the power is going to vest in the hands of the users, which is you and me and members of the public, Yusuf. So I guess that creates a question in the minds of performance auditors and internal auditors. If the public has that information, that data, available to them, then they're going to expect us as auditors to be using that wisely and giving them insights from what's available.
Internal audit doesn't generally have a very direct customer-facing relationship. Most internal audit work will be within the organisation, and they won't really liaise directly with the organisation's customers. Performance audit might be a bit different, though. So, yes, there'll be an expectation that the organisation does something with the data. I think that expectation is going to play out more as a commercial reality than as a consumer expectation. If your organisation isn't doing anything with the data when it can, and somebody else is and is able to provide that value, you might just go to them, because they can do something for you that your current organisation can't. Or they might use that data to figure out what it is that you need and then lure you away.
We'll speak briefly about the types of open data that are out there. Some are quite obvious and yet overlooked by some teams. So the most obvious one, and one that we've relied on extensively over the past few years in our work, is statistical data, obtained through, for example, a census process, on households and individuals. And that is very useful data.
Quite a few of the statistical agencies, even though the actual physical census would only be done every 5 to 10 years, produce projected estimates in the intervening years. You can generally get population estimates, and those population estimates will go down to, in Australia and the UK, the postcode, in the US, the ZIP code, and various other countries will have variations of that. You can get down to a fairly granular level, and that sort of information is useful for both performance auditors and internal auditors. So for internal audit, for example, we've used it to look at the level of sales and the level of complaints and a whole range of matters that are customer focused, where we have information on the customers. So if you have individual customers' addresses, or customer data that you can summarise at postcode level or even suburb level, then you can use that population data and related population statistics to do comparisons between the services and products that we provide and the level of potential that exists within those areas. It's a little bit different, too, in that we're not looking for fraud. We're just looking for levels of penetration there.
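As a rough illustration of that postcode-level comparison, here is a minimal Python sketch. It assumes you already have population estimates per postcode (the open data) and one postcode per customer record (the proprietary data). The postcodes, figures and the function name `penetration_by_postcode` are made up for illustration and do not come from any real data set.

```python
from collections import Counter

# Hypothetical open data: estimated resident population per postcode.
population_by_postcode = {"4000": 12000, "4101": 8000, "4205": 5000}

# Hypothetical proprietary data: one postcode per customer record.
customer_postcodes = ["4000", "4000", "4101", "4000", "4205", "4101"]


def penetration_by_postcode(customers, population):
    """Return customers per 1,000 residents for each postcode with a population estimate."""
    counts = Counter(customers)
    rates = {}
    for postcode, residents in population.items():
        if residents <= 0:
            continue  # skip postcodes without a usable population estimate
        rates[postcode] = 1000 * counts.get(postcode, 0) / residents
    return rates


rates = penetration_by_postcode(customer_postcodes, population_by_postcode)

# List the lowest-penetration postcodes first, since those show the most untapped potential.
for postcode, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{postcode}: {rate:.2f} customers per 1,000 residents")
```

The same shape works for sales or complaints instead of customer counts: swap the list of customer postcodes for whatever measure you hold at postcode or suburb level.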
The other lovely thing that public sector statistical agencies do is they often break down their data by industry type, which can be quite useful depending on what sector you're located in. So it could be, for example, the housing sector or the motor industry or the property industry or whatever, and that's often really useful to be able to do some benchmarking for either your organisation's performance or your jurisdiction's performance.
Within internal audit we actually have other types of open data that we use that are fairly low in terms of complexity. So, two examples. The first of those is where you have businesses that you deal with, either as suppliers or as clients. You can actually get information from central business agencies that tells you whether those individual companies that you deal with are legitimate, are registered with them, have actually maintained their registration over the years, and whether they're registered for certain types of tax. In Australia it would be GST, in the UK and South Africa it would be VAT, and in the US, maybe other types of local taxes. So there's a lot of information that you can get, particularly around suppliers. You wouldn't use it as much for customers, but particularly for suppliers, to understand whether you're dealing with suppliers that are legitimate and that we should be dealing with. The other example of fairly simple data is where you have information on where money is going to or coming from, where you have a large customer base. So if you're getting money coming in from different banks and you want to know where your customers are banking, which institutions they use, and whether you have a concentration of monies coming in from any particular organisation, this is useful for resilience and crisis planning. You can use open data sets for that. Often the information that comes in just tells you what the bank code is, and the details around bank codes are different again in each country. In Australia you have what you call a BSB. In the UK they have something that they call a sort code. What that can help you identify is the bank, based on the bank and branch number combination, and this is usually open data that you can get. You can then identify whether you have a concentration with any particular bank.
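The bank concentration check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive method: the branch directory, the code format, the amounts and the function name `concentration_by_bank` are all hypothetical, and real BSB or sort code directories have their own formats that you would need to parse first.

```python
from collections import defaultdict

# Hypothetical open reference data: branch code -> bank name.
# The codes and bank names are illustrative, not taken from a real directory.
branch_directory = {
    "062-000": "Bank A",
    "062-001": "Bank A",
    "083-004": "Bank B",
    "732-010": "Bank C",
}

# Hypothetical internal data: (branch code, amount received).
receipts = [
    ("062-000", 500.0),
    ("062-001", 250.0),
    ("083-004", 150.0),
    ("732-010", 50.0),
    ("999-999", 50.0),  # code not found in the directory
]


def concentration_by_bank(receipts, directory):
    """Return each bank's share of total receipts; unmatched codes go to 'UNKNOWN'."""
    totals = defaultdict(float)
    for code, amount in receipts:
        totals[directory.get(code, "UNKNOWN")] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {bank: amount / grand_total for bank, amount in totals.items()}


shares = concentration_by_bank(receipts, branch_directory)

# Print the largest share first so any concentration stands out.
for bank, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{bank}: {share:.0%} of receipts")
```

Keeping unmatched codes in an explicit "UNKNOWN" bucket, rather than dropping them, makes it visible how much of the money could not be mapped to a bank.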
You've talked earlier about open banking information or open banking data, and there's certainly a move, in at least some jurisdictions, to make that more available. You just gave a couple of examples of how that's useful. We talked about census data, the demographic statistical data that most public sector statistical agencies collect and make publicly available, which is really useful. Some of that you do have to pay for, as we've encountered a few times. One of the other data sources that I think is really useful is that held around property, household data, that sort of thing. We've seen a few times in the past where there are, for example, peak bodies that collect that sort of data routinely, and that can be really useful, in particular to look at things like household consumption, trends in consumption patterns, that sort of stuff, to be able to profile activity in that sort of area. Sometimes you do have to pay for that data, and that's just a factor of the work that goes into actually collecting it. But again, for internal audits and performance audits, that is a really useful data set that assurance professionals out there should have a think about. So Yusuf, we've described a few examples there of particular open data sets that are really useful for internal audits and performance audits. But again, as we always come back to in many of our discussions, there's the issue of quality. How do we make sure that we maintain the quality of what we're getting in and the integrity of the data that we're bringing in?
It's really three key things that we need to look at to ensure that the level of quality is maintained. The first of those is: are you using the right time frame? So does the time frame of the open data that you're using match the proprietary data that you're joining it to? Quite often, you're not going to be using open data by itself, right? You're going to be using it with some other proprietary data that you have. When you're joining it up to that proprietary data, are you joining it up based on the correct time frame? The second thing is: are there any missing items in the sequence of data that you have? So if you have daily data, are there any days that are missing, for whatever reason? If you have monthly data, are there any months missing? And if there are, you need to work out how you're going to treat that. So are you going to fill in the gaps? Because you can use linear extrapolation and other techniques to fill gaps, particularly where the gaps are fairly granular. I mean, if you have a whole year that's missing, you may need to take a different approach. You may need to exclude that year from your analysis. So that's where you have gaps in the data. And then the third thing is making sure that the data is actually correct. So sometimes you get data in and the data is duplicated for whatever reason. You may have a data set that has individual items and totals, so that means that your data is actually larger. It will be double what it should be. The other trap that you often encounter, and this happens quite often with open data, and we don't know why, is where you may have a data set for 10 years, and one of those years, for whatever reason, is duplicated, so the information in there will always appear to be higher.
So wherever you see these sorts of anomalies, where values are higher than you'd expect them to be, what you want to do is go in and have a look at the detail and make sure that you're not bringing something in that doesn't have the level of integrity that you want. And then don't just chuck it away because you can't see any integrity in it. You fix those items so that you can actually use it. And those are the three key things to ensure when you're managing quality within your open data set so that you can join it properly.
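To make those three checks concrete, here is a minimal Python sketch for a monthly series. It is an illustration only: the data, the `(year, month)` period format and all function names are hypothetical. For the gap filling it uses linear interpolation between the nearest known months, which is one of several techniques that could be used, and which suits short gaps rather than a whole missing year.

```python
from collections import Counter


def month_index(period):
    """Convert a (year, month) tuple to a running month number."""
    year, month = period
    return year * 12 + (month - 1)


def index_to_period(index):
    """Convert a running month number back to a (year, month) tuple."""
    return (index // 12, index % 12 + 1)


def duplicated_periods(rows):
    """Periods that appear more than once, e.g. a period loaded twice."""
    counts = Counter(period for period, _ in rows)
    return sorted(period for period, n in counts.items() if n > 1)


def missing_periods(series):
    """Months absent between the first and last month present in a non-empty series."""
    present = {month_index(period) for period in series}
    return [index_to_period(i) for i in range(min(present), max(present) + 1)
            if i not in present]


def fill_gaps_linear(series):
    """Fill missing months by linear interpolation between the nearest known months."""
    known = sorted((month_index(period), value) for period, value in series.items())
    filled = dict(series)
    for (i0, v0), (i1, v1) in zip(known, known[1:]):
        for i in range(i0 + 1, i1):
            filled[index_to_period(i)] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
    return filled


def uncovered_periods(internal_periods, open_series):
    """Internal periods with no matching open data period to join to."""
    return sorted(set(internal_periods) - set(open_series))


# Hypothetical open data rows: ((year, month), value).
# March is missing and May appears twice.
rows = [((2023, 1), 100.0), ((2023, 2), 110.0), ((2023, 4), 130.0),
        ((2023, 5), 140.0), ((2023, 5), 140.0)]

duplicates = duplicated_periods(rows)       # correctness check
series = dict(rows)                         # keeps the last value seen per period
gaps = missing_periods(series)              # completeness check
filled = fill_gaps_linear(series)           # treat the gaps
unmatched = uncovered_periods([(2023, 3), (2023, 6)], filled)  # time frame check

print("Duplicated periods:", duplicates)
print("Missing periods:", gaps)
print("Internal periods without open data:", unmatched)
```

Note that `dict(rows)` silently keeps the last value for a duplicated period. That is fine for exact duplicates, but where duplicated periods carry different values you would want to look at the detail before deciding which to keep.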
And so they're all really important things. So the key takeaway is that even though we're proponents of using open data where possible, I guess you need to make sure that you make a good assessment against those three criteria early on, to make sure that you're not over-investing time in its use.
There's a couple of things. One is that there are some organisations that produce open data, and by that I mean quite a few states and territories and countries and provinces that make open data available. Some of them actually have an audit step as part of the creation of the open data. In Australia we're not that sophisticated yet, but in other jurisdictions they actually have an audit step, and they publish the results of those audits. So they'll say, we've audited the data that exists in the open data portal and this is the result: we found this percentage of inaccuracies or this percentage of missing data or whatever. If you have that situation, you can actually go and look at what the result of that audit is and then work out how much you need to do to check the quality of the data that you're using.
Which goes back to how much reliance we can place on it, which is fantastic.
Yeah, just common sense audit stuff, right? So if we are using the data and we're cleaning it up and checking it for quality and these things, quite often there's a need then to share the data that we've pre-processed. How would we go about doing that?
Pretty straightforward from a performance audit perspective. Some of the progressive performance auditors these days, in various jurisdictions, are actually making some of that processed data available on their websites. Yusuf, some might consider it a by-product of the report itself, but it's actually a really important product in and of itself, and that's the linking, joining and analysing of those data sets, because it can tell some fantastic stories. So that's one way in which those data are being shared, on their websites. And you know, there are all sorts of tools these days that make it interactive and clickable, that sort of thing. Some of the other ways are that auditors-general or comptrollers-general are actually providing some of those cleansed and analysed data sets back to the entities that they have audited and saying: this is what we've seen, this is what we've observed, these are our findings. Your data wasn't that great in the first instance. We've taken time as part of our audit work to do all this cleaning and analysis. It's only fair that we give that back to you. And again, that's another very useful and valuable by-product back into those agencies themselves. So two things there. One is, for the public, making the analysed open data available on the websites, especially where it may have been linked with other open data or proprietary data, so long as it's not private. And the second thing is giving that open data, linked with an entity's own data, back to the entity to have a look at.
So in the first instance there, that's basically creating more open data, if you like.
Yeah, absolutely.
Within internal audit, generally there isn't a lot of sharing of the processed data, the processed open data. Obviously you won't have anything going to the public. And you've generally got that open data from an external organisation, so unless you have a very good relationship with that organisation, you're not going to be providing feedback around the quality of the data very often. What you do have sometimes is that you may want to share how you went about processing the data with a team internally that will be using it. The challenge with sharing the data directly is that you don't want to be in a situation where somebody uses your work directly and doesn't, you know, check it themselves. Because quite often you can process the data and pass it on to a management team or business team, and the risk you have there is that they just rely on what you've done and go ahead and use it. And then you find out later that you made a mistake and that your mistake has now propagated through the organisation. And in an audit, you can't then, you know, audit yourself and say we made a mistake and we gave it to the team and they used it and they made the same mistake. So what you do is you say: these are the steps that we followed. Have a look at the steps, have a look at this data and see if there's anything we're missing, and then use that. And that's a bit safer, because it means that people have to actually start thinking about what they're doing, as opposed to just taking some data and using it blindly. So it looks a little bit different within internal audit.
It does, but it comes back to the reasonableness test. Was the process that was followed to do this analysis and come up with this outcome something that we think is reasonable, and that we should or could rely on?
Yep, yep, yep. So the business will need to work out: can they actually rely on that, or do they need to do something else? You're absolutely right. For internal audit purposes, there are only three or four key sources for the examples that I used. They're not difficult to find. For performance audit, where would you usually go to find open data?
Government websites in the first instance. So a lot of jurisdictions these days, Yusuf, have websites dedicated to open data. They require their agencies or their entities to actually provide data to a central source, and then that gets published online. So if you speak about Australia and the UK and the USA, for example, many of the states and jurisdictions, or even some of the lower-level county councils, publish a lot of their transactional data online. So that may be anything from procurement data as to who's tendered and won contracts, to performance data about "these are our main objectives for this organisation and this is how we are performing against those objectives", to other types of really operational data, like "this is how many people are visiting our facility". So the answer is that a lot of this data is actually already being pushed out there by some of these more forward-thinking organisations. Two things that I think are worth noting about that. The first thing is that it demonstrates a commitment to transparency. So where, for example, a public sector organisation or government is willing to put performance data online, that's always a big step. But the second thing is it's also a bit about accountability, in that citizens or the general public can actually query and interrogate and ask questions about where their money is actually going.
So Conor, we spoke about accountability and transparency. But really, that is within the domain of the public sector organisations that are looking to promote transparency and accountability in the work that they do. What is it that an auditor can do to help improve that level of transparency or that level of accountability?
So Yusuf, you're dead right. In most respects, it's up to that public sector entity or department or agency as to what they want to publish online. But the important role of performance audit is to call out where there are deficiencies in the publication of the open data. So if I'm doing a performance audit, for example, in the Department of Health, and one of the million things we're interested in is public sector waiting times for operations and so forth, and the Health Department does not have that information online, although many other departments in our jurisdiction have waiting times for various of their main services online, it may be useful for performance auditors, given their independence and somewhat helicopter view, to say: well, look, we don't think it's reasonable for this department not to publish that information. So there's a very important role there for performance auditors in calling out situations where data is not being openly made available when it reasonably should be.
So auditors can help call that out. Within internal audit, there isn't going to be that much going on. Within performance audit, definitely, given that performance audits are of those government organisations, so the organisations that are producing that open data would be the ones that are under the spotlight as well. There are a number of places that you can go, so go to your state or territory or country open data portal. If you don't know what that is, or you can't find what you're looking for, another way to find open data, and we use this quite regularly, is Google's Dataset Search. Pretty straightforward: you just type in datasetsearch.research.google.com. You can then search for datasets within there and it'll give you different options as well.
OK, great conversation today Yusuf, on open data. So, in summary, we'd say that if you're not using open data now for your assurance projects, you need to ask yourself, Why not? We've explained the value of how some of these open datasets can really contribute to your projects. The last thing to do is to basically start exploring.
If you enjoyed this podcast, please share with a friend and rate us in your podcast app. For immediate notification of new episodes, you can subscribe at assuranceshow.com - the link is in the show notes.