How EY is Disrupting Legal, Risk, and Compliance Management with Data Science

EY’s fraud investigation and dispute services division reports that 10% of your transactions account for over 50% of your team’s investigative time, and over 90% of your organization’s risk.

Join EY’s Todd Marlin and Jeremy Osinski for this webinar, where they discuss how they’re overcoming this challenge. They’ll review EY’s latest global survey results on forensic data analytics trends and share best practices.


00:00 Hello, everyone, and thank you for joining us for today’s webinar, Whodunit in the Digital Era: How EY is disrupting legal, risk and compliance management with data science. I’m Hayley Matusow with RapidMiner, and I’ll be your moderator for today’s session. We’re joined today by Todd Marlin. Todd is a principal within EY’s fraud investigation and dispute services practice. He is the Americas Financial Services Forensic Technology Leader, as well as the Americas Forensic Data Analytics and Data Sciences discipline leader. Todd is a trusted adviser to the C-suite, board of directors and General Counsel around the complex issues surrounding data, security, and the legal and compliance risks. Todd’s main focus areas are forensic data analytics, cybersecurity, computer forensics, and electronic discovery. We’re also joined today by Jeremy Osinski. Jeremy is a Senior Manager within EY’s Fraud Investigation and Dispute Services practice. Jeremy specializes in utilizing emerging technologies and big data to better detect fraud, improve compliance monitoring, and leverage operational efficiency. Jeremy focuses on forensic data analytics, electronic discovery, and information governance services. He has led investigations and proactive matters for several of EY’s largest clients, specifically within financial services, the public sector and energy. Jeremy also serves as an innovation leader for EY’s forensic data analytics business across the Americas. We’ll let them get started in just a few minutes, but first, a few quick housekeeping items for those on the line. Today’s webinar is being recorded, and you’ll receive a link to the on-demand version via e-mail within one to two business days. You’re free to share that link with colleagues who are not able to attend today’s live session. Second, if you have any trouble with the audio or video today, your best bet is to try logging out and logging back in, which should resolve the issue in most cases. 
Finally, we’ll have a question and answer session at the end of today’s presentation. Please feel free to ask questions via the questions panel on the right-hand side of the screen. We’ll leave time at the end to get to everyone’s questions. On that we’ll go ahead and pass it over to Todd.

02:06 Thanks a lot, Hayley. And welcome to everybody at this webinar this morning. We’re very excited to be here, a very, very important topic. Let me just take a moment and briefly introduce today’s objectives. We’re going to cover three main themes. First, how organizations today are blending and drawing correlations from multiple data sources. Second, strategies for leveraging machine learning with your data. And then third, the advantages of using a microservices-based analytics platform. And obviously, there’s a lot of depth to each of these points, and we really look forward to getting into it. We’re also going to touch upon recent results from our global forensic data analytics survey. Through that survey, we analyzed the responses of 745 executives globally, who shared their perspective on various topics related to forensic data analytics. Through the analysis, we can see a number of specific ways in which companies can measurably improve their legal compliance and broad risk programs, as well as the maturity level of their FDA capabilities. It’s certainly an exciting time for companies, as we’re all in the midst of this digital transformation, which is creating new opportunities. Certainly artificial intelligence, robotic process automation, and advanced analytics are just some of the new possibilities being explored. And clearly, data science plays a critical role. Go to the next slide.

03:43 So, datafication. What is datafication? This is the fact that everything in the world today, or virtually everything, is captured by data. As you go throughout your day, whatever way you may live your life, you’re being recorded digitally, from when you walk in to work in the morning and you swipe your badge at the turnstile, to what you buy at lunch, to who you interact with online. Data comes in a variety of shapes and sizes, whether it’s your social media, whether it’s your internet browsing, and, of course, these are just a few examples [inaudible] screen. But that’s not all. It’s also all over, so from the corporate perspective, that data lives inside the walls of your company, it may live on cloud platforms you have, it may live with third-party providers. And certainly, it’s even more complex for the individuals. So, datafication, at its basic level, is taking all aspects of life and turning them into data. And I think if you really thought about it, you would struggle to find some aspect of your life that is not recorded digitally in some way, which, certainly from the corporate perspective, creates a great wealth of opportunity to understand your customers better, but also to understand where your risk is, so you can take steps to mitigate it. Next slide, please.

05:20 So where does the data come from, from a corporate perspective? So here on the screen, you have a representation of a bunch of different, let’s call them areas, if you will. The first column is compliance investigation, second column is discovery and cyber, the third being insurance claims, the fourth being government contracts. And you can see, on the top, it really highlights the number of different types of business risks. So, just taking the compliance investigation area, which is near and dear to my heart, surely you can see you have internal investigation, all different sorts of investigation, litigation, regulatory requirements, regulatory response, different types of risks such as FCPA and bribery, lookback, things of this nature. To really understand your system, whether it’s an investigation, whether it’s a litigation, whether it’s a particular risk, you have to get in to the data. And on the bottom, you can see that there is a wide range, and this is certainly not meant to be all-inclusive, however representative. So you have ERP or your financial system transactional data, which is at the heart of it. You have customer data, you have travel and entertainment data, certainly as you move [inaudible] discovery and cyber, discovery being how e-mails are collected, reviewed, and produced in litigation, cyber I don’t think needs an introduction. And right at the top you have e-mail. E-mail is a data source. And it’s challenging to work with, but, again, data science is at the heart of how you might deal with some of the risks there, whether it be predictive coding or other machine learning techniques to find some more information. While, just moving [inaudible] the right, insurance claims, sales data, unstructured documents, expenses, government contracts, you can see contract management data. There’s a wide range, and certainly as you start to look into different issues related to any and all of these, the types of sources may vary. 
This could include social, this may include data with your third parties or third-party hosted systems that you use to provide certain business functions. And again, taking a step back, what does this all mean? At the end of the day, from a corporate perspective, there are systems that are used to effect different business processes to enable the business of your organization. And that’s what those systems are designed to do. And they result in the creation of all of these sorts of data, and others. And the question is, how do you make the most effective use of that to mitigate your risks in the areas that we’re here today to talk about? Next slide, please.

08:26 So, as I mentioned, we had a Global Forensic Data Analytics Survey, in which we surveyed 745 executives globally, from companies of all sizes. And we’re going to get into some of the results of that, and also tie it to a number of practical examples, to use cases where there’s a particular risk and we’ve brought subject matter expertise, human, data science, and technology, to really find a better way. And so that’s what we’re here to talk about today. So, Jeremy, take it away.

09:07 That sounds good. Thanks, Todd. And prior to jumping in, we’ll launch a polling question here. We see a number of participants on the webcast here, and many more trickling in from academia, the corporate environment, law firms, regulators, and so on. So we’ll just launch a question here to really get a sense of sort of who’s in the audience, who’s in the crowd today. Are you currently responsible for managing legal risk and compliance concerns around your organization? So we’ll give this a few minutes to tabulate here. And while we do that, we’ll introduce the case study we’ve prepared today to really– and there’s certainly, as Todd mentioned, a lot that could be said, a number of different use cases, and data sources, and so on. But to really help today’s– frame today’s conversation, we’ll focus on perhaps one of the most highly publicized recent areas in the legal risk and compliance domain over recent months. And this one actually comes to us from the financial and banking and capital market space. And really, over the last 18 to 24 months, we’ve seen several very widely publicized regulatory inquiries within the banking and capital market space, particularly around the area of alleged banker misconduct. Following several alleged scandals, regulators are now really looking at sales practices and associated incentive compensation programs that influence bankers. And so this really includes tellers in the branches, employees in the call centers, and other customer-facing employees. And the alleged activities and focus include areas such as enrolling customers in accounts and various sort of upsell-type fee-generating programs or services without the customer’s consent, allegedly to improve that employee or that branch’s or that district’s sales metrics, KPIs, and, in some cases, even to hike up personal commissions. 
And so we believe, in this area, there’s really a significant opportunity to leverage analytics, to bring machine learning to the table to practically manage these risks. And I think what’s interesting is those sort of started with the banking and capital markets space to really –

11:25 One question– we just need to go ahead and close the poll, sorry for the interruption.

11:29 No worries, Hayley. I’ll close the poll here. And as I was saying, whether coincidentally or not, we’ve really now seen similar focus spread to other industries, most notably life sciences, recently consumer products, insurance, and a few others come to mind. The data sources here really include a number of different ones. Transactional data, AML data sources, HR, sales integrity, exit interviews for terminated employees, employee and customer concerns and complaints, ethics log data, and so on, as well as investigative outcomes. We’ve also pulled in e-mail, and voice mail, and so on. And I think what’s really interesting about this specific problem is that it’s a relatively new one, from a data science perspective. 18 to 24 months ago, this wasn’t really a topic or an issue in the top concerns for a chief compliance officer or chief risk officer. But it’s really evolved, and now in fact is. So what that really means is that, historically, generally of course, there was little significant investment in data scientists looking at this, curating advanced models, and so on and so forth. So what that means is that, in some cases, there was very little to go by in terms of tried and true analytics models and machine learning. However, banks had somewhere to start, right? Tips, whistleblower complaints, and some of the work that our forensic risk professionals did gave banks, in some sense, at least a general idea of where their risks are. However, finding and really quantifying the problem really became the sort of classic needle in the haystack type issue. And I think what’s interesting here is we know that actually a majority of folks in the webinar today answered that they are not currently responsible for managing these risks. 
Not too dissimilar from what we found in working with many of these clients, where typically the folks charged with looking at this data historically were not the ones looking at it from a legal risk and compliance perspective, but really, almost overnight, suddenly became the teams that were now charged with really dealing with it and resolving these risks and quantifying them.

13:50 And so we’ll take you through a few of the processes in RapidMiner. This particularly looks at opening new customer deposit accounts for certain segments. The following screens, of course, are based on mock and sanitized data, but we’ll show you just a few of the very, very quick screens that many of our clients and data scientists interact with. And as part of doing that, actually, we’ll start with these here, and this is really where our analysts will look for potentially anomalous employees through running a series of indicators or scenarios on the data. Keep in mind that many of our clients and those in our EY team bring years or decades of expertise in anti-fraud areas, though are not necessarily analytics professionals or data scientists. However, they play a crucial role here. And so, the web interface we’re showing here essentially is where those practitioners would go in and utilize visual analytics, cross data [inaudible] search, and other microservices-based capabilities to identify, as in the scenario here, anomalous employees. For time’s sake, we’re going through this very quickly here, though typically these exercises involve spending some time really understanding the data from an investigative perspective, bringing that domain expertise and using a series of tools to identify anomalies. And so, having done so here, now let’s really identify what’s driving those. And so, in this case here, we’re showing the ability to look at different types of cases, and assign those to different queues based on various risk areas. And so, we’ll assign these to different team members. And now we really flip over into RapidMiner. And so, behind the scenes here is really looking at kind of what’s driving these interesting anomalies. And one of the ways we’re doing that is using logistic regression relationships. 
So we have a dependent variable, and then usually a continuous sort of independent variable, or several variables, and the model converts, rather, the dependent variable into a probability score at the employee level. And so, this RapidMiner process really demonstrates taking that data set, the data set we just created by tagging and creating those cases, as well as all employee-level data, and the two are merged together. And then, what’s showing up here is a sample set, or an output data set rather, with a predicted risk score and a confidence level at the employee level. And so, there are a few important nuances here as well, as you might have noticed from the RapidMiner process. The data is subset again several times, and that’s really done to inflate the number of potentially anomalous employees within the data. We’re also in some cases, and I see a question coming in about this, using dimensionality reduction as well, to really reduce the number of random variables under consideration. But this sort of ability to really, in a very transparent way, build processes and have those informed by investigative insight has really, I think, helped many of our clients quickly understand and quantify this issue, and then really construct transparent models to be able to identify the magnitude and extent of it.
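
[Editor's sketch, not part of the spoken session.] The step described above, merging investigator-tagged cases with employee-level data and fitting a logistic regression that yields a probability score per employee, could look roughly like the following. Everything here is an assumption for illustration: the employee IDs, the two feature columns, the mock values, and the plain gradient-descent fit. It is not the RapidMiner process shown in the webinar.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical mock data, scaled to 0..1:
# [share of new accounts never funded, complaint rate]
employee_features = {
    "E001": [0.1, 0.0], "E002": [0.2, 0.1], "E003": [0.9, 0.8],
    "E004": [0.8, 1.0], "E005": [0.3, 0.2], "E006": [0.7, 0.9],
}
# Hypothetical investigator tags: 1 = flagged as anomalous, 0 = cleared.
tagged_cases = {"E001": 0, "E002": 0, "E003": 1, "E004": 1, "E005": 0, "E006": 1}

# Merge the two data sets at the employee level.
merged = [(emp, feats, tagged_cases[emp])
          for emp, feats in employee_features.items() if emp in tagged_cases]

weights, bias = fit_logistic([m[1] for m in merged], [m[2] for m in merged])

# Probability-style risk score per employee.
scores = {emp: sigmoid(bias + sum(w * xi for w, xi in zip(weights, feats)))
          for emp, feats, _ in merged}
for emp in sorted(scores, key=scores.get, reverse=True):
    print(emp, round(scores[emp], 3))
```

On real data the score would be checked on held-out employees and not on the rows used for fitting; the mock set is too small for that.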

17:43 And so RapidMiner’s processes, as we’ve noted there, really form one of the principal pillars of our microservices-based analytics platform. We saw a little bit of it there in the previous demonstration. Now RapidMiner, as many of the folks on the webinar know, allows us to really create high-quality and transparent processes. But again, from an investigative context, how does that get wrapped into an overall work flow? How can we really use and present, essentially in near real time, the results of our models? And using high-impact data visualization, text analytics, cross-data-set search, and case management, to really kind of help to tell that story. We view RapidMiner as one of the many important building blocks in our larger ecosystem, something we call EY Virtual Analytics Infrastructure, or EY Virtual. And EY Virtual is really our flagship microservices-based platform, encompassing an evolving number of commercial and proprietary technologies into a singular platform. We can then quickly customize and create solutions through a crisp and hopefully, as you saw, easy-to-use digital interface for our clients. To do things such as tag transactions, run models, kick out results, and tag those accordingly. Decision makers through EY Virtual can then also gain a consolidated view of alerts, displayed along common sort of unified time dimension. And so that information often as well is really enhanced with additional contextual market data, external intelligence feeds, and so on. And as you saw, users, even non-technical analysts and folks who would not call themselves data scientists or data analysts, can drill into a specific area, select filters from an extensive list, and then as well as add their own insight through being able to tag transactions and add context around those. That really naturally becomes a really great corpus for sort of continuous learning or continuous modeling and machine learning. 
It also, as Todd mentioned, the data sources we’re often focused on today represents our clients’ most sensitive data sets. And so, the flexibility, being able to quickly take what we call different Lego pieces here and bring them together, affords us a number of different deployment options, particularly as the landscape around data sensitivity and all the various rules and regulations globally continue to evolve. We’ve been able to deploy EY Virtual in RapidMiner within our EY cloud, so that’s our– on our EY secure network with single and multi tenant, sort of cloud options. We’ve also been able to deploy behind our clients’ networks, in which case clients can directly connect to the environment, and even in some cases been able to deploy within our clients’ cloud environments. So that flexibility, that ability to quickly take different components and plug and play has really been transformational in being able to, as in the bankers surveillance use case, quickly tackle these problems. Now, I’ll show you a bit more of how this works moving forward, but for now, Todd, as we’ve mentioned, the landscape’s really changing. We’re increasingly leveraging a broader number of data sources, and the integration between those becomes more critical. And so, perhaps, Todd, if you want to take us through some of the survey results to help paint the picture of kind of what we’re seeing and hearing from our clients in that area.

21:35 Sure. Thanks a lot, Jeremy. And so, what you see on the screen here is a key takeaway from this year’s Forensic Data Analytics Survey. And the key point is, better integration needed to gain insights from the data. So 46% consider that they’re getting a consistent global view from multiple data sets remains a real challenge in using data analytics. And certainly, while there’s been an increase in the collection of data from multiple data sources, companies are still struggling with this. Companies are struggling to get the data to be able to work with it, find the outliers, to then combine it, enrich it with other data, and this is really due to a number of different issues. First, where is the data located? Two, what is the data, what does it mean, and who understands it? Three, what form is it in? And how can I get it? And certainly, one of the challenges, of course, is that there isn’t a single owner for this data across the organization, right? At the end of the day, what you see is that you have the data typically owned by departments or by region or geography. So you have to span business lines, span departments, span geography, often even span companies today, given the nature of how infrastructure’s set up. And so, this creates a very challenging landscape when you’re trying to combine sets. So, first, there’s sort of the need to be able to sort of more effectively collaborate and affect that. Then there’s sort of the technical bit. So, to be able to do that with those advanced sort of prediction and correlation that Jeremy’s talking about, depending on the modeling techniques that you’re using, the presence of outliers could be a key, key factor. So, really understanding your data, understanding what it is, what it isn’t, what bias may exist in it, and how that might relate to results that you’re going to get, these are all critical factors. 
And you can only really get into that if you go right to the source and have a good understanding of how is it generated, how is it collected, and who owns it? Because these are things that you might not be able to tell from the data without– by just looking at it. Then, of course, you can more effectively do the technical bits that Jeremy was talking about. Jeremy, back to you.

24:09 Great. Thanks, Todd. And really, as Todd mentioned, investigations often evolve. And that’s really driven by the emergence of new facts and the ongoing analysis of data sources, as well as interviews being conducted. And so, let’s go back a bit, let’s go back to our bankers surveillance walkthrough, and really show and demonstrate exactly how users can explore cases, how our analysts, rather, can explore cases and really add their own contributions and feedback and tagging and so on. And so, this scenario here, back to EY Virtual, analysts can open the case they’ve created or that have been assigned to them. And, in this instance here, be able to quickly explore the underlying data, so, for instance, employee and branch type details. They can also pull in additional files, they can explore relationships amongst the data, and so on and so forth through the web interface. We can also add notes, upload files to the case, and assign additional team members. Again, really presenting, hopefully, a crisp and easy-to-use digital layer for our non-technical users to interact with the results of our models, as well as contribute to them going forward. And so, what I mean by that is, when it comes time to really close a case, the analyst fills out a questionnaire, much like the one here. And this is to really record their findings. Now, we won’t get into the weeds of sort of all different variations of the questionnaires and work flow and so on, but this is typically representative of one that seems to have used to go in and add their results, quantify any potential issues, and so on. Now this becomes a corpus for creating continuous feedback loops to really help us learn on an ongoing basis for our investigations. So the process we’re showing here, the feedback loop process, really utilizes two different data sets. 
One being all the variables which we’re tracking and using to score bad behavior at an employee level, and the other one containing employees with known issues, which we just marked accordingly. And so, from a design perspective, you can see this kind of half in here. Now, the overall output when we go ahead and run the model essentially contains the variables at an employee level that we’re modeling against, and this time, the weight that each of those variables has in a logistic regression model. The results here, too, are also cleaned and sampled as we’re showing, using a bit of oversampling, and again, this is to really inflate the number of potentially anomalous responses in order to ensure that our logistic regression models are scoring appropriately, and so on. And so, typically, this is where our data scientists would spend quite a bit of time with our investigative teams, reviewing the results of the model here, and then feeding these results back into EY Virtual for subsequent rounds of review. So, let’s talk a little bit too in terms of who’s really benefiting from the use of forensic data analytics. Todd, perhaps we can kind of go back to the survey results for some context there, in terms of the many beneficiaries beyond the traditional sort of risk management group within an organization?
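
[Editor's sketch, not part of the spoken session.] The oversampling mentioned above can be illustrated with simple random oversampling, where records from the rarer class are duplicated until the two classes are the same size. The record layout and the `anomalous` key are hypothetical, and the webinar does not say which oversampling method was actually used.

```python
import random

def oversample_minority(records, label_key="anomalous", seed=0):
    """Duplicate minority-class records at random until classes are balanced."""
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key]]
    negatives = [r for r in records if not r[label_key]]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        return list(records)  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced

# Hypothetical mock set: 18 cleared employees and 2 with known issues.
records = [{"employee_id": f"E{i:03d}", "anomalous": i >= 18} for i in range(20)]
balanced = oversample_minority(records)
n_anomalous = sum(1 for r in balanced if r["anomalous"])
print(len(balanced), n_anomalous)
```

A common precaution is to oversample only the training portion of the data, so that duplicated rows do not leak into whatever set is used to evaluate the model.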

27:33 Yeah, yeah, I think that’s a great point, Jeremy. And it certainly, what we saw from surveying was that the average number of beneficiaries was seven. And that’s not just compliance and risk, but certainly corporate executives, finance, even the board of directors are noted as the top three beneficiaries of forensic data analytic activities. Which is not surprising, if you think about it. As it relates to controversy and risk mitigation, and beginning to understand your data, it stands to reason that your top executives in the organization are going to be beneficiaries and consumers of this. And so, at the bottom, you see that there’s a chart that shows the range of beneficiaries from, at one end, corporate executive management, on the other end, supply chain. So, if you think about it, every creator that affects business, sort of taking a step back, every business process and part of the company that generates business, utilizes data, creates data affect the business. Well, there’s risk and controversy potentially associated with each of those. So they themselves are interested in managing and mitigating those risks, as well as all the different corporate functions that have to direct accountability for that, be it [inaudible] compliance, internal audit, the business unit, the corporate executives and so on. So certainly, as you as an organization structures how to approach these using these sophisticated approaches like we’re talking about, it provides the ability to give different lenses wherever you sit inside the organization, accelerated insights. In order to get there, of course, it’s really important to attack those challenges that we’re talking about earlier around data collection, aggregation. Let’s move on to the next slide.

29:47 That sounds good. Thanks, Todd. And really, over time, users can become more sophisticated and utilize many of the models we’ve talked about, as well as their insights to better assess risk on a go-forward basis. And so, in many cases, data flows in on a continuous basis. For many of our clients in this bankers surveillance and supervision space, we’re collecting and receiving data either through offline extracts or increasingly through continuous data integration and feeds on a daily or weekly basis. Now, with scoring run on top of that data as it’s received, we believe really in giving our users, of course with the right permissions and training and so on, transparency into how that scoring works, and really enabling, again, our end users to contribute to the RapidMiner models being used. And so, in this view here, our users with the right permissions can log in and really adjust the scoring model. This is typically where we calculate the risk score, either on an employee level, or branch level, or account level, or vendor level, or what have you, in terms of specific indicators and analytics tests organized into certain categories. So again, from a bankers surveillance perspective, we’re looking for anomalous transactions, employee behavior, and high-risk customers. And how we’re modeling these is through specific algorithms and scenarios. And so what we can do is really enable our client users to come in, and essentially turn those different algorithms on or off as well as [inaudible] them up or down, kind of based on what they’re seeing in terms of false positives, but also, more importantly, as the risk universes in an organization evolve, they have the ability to really modify the model in a near real-time way to reflect those evolving risks. So, for instance, if inactive accounts are suddenly front and center, we can turn the dial on that test up. 
If we’re seeing a large number of false positives from, let’s say, the first two here, accounts with a low number of transactions and low or no activity, we can separately either tune the weighting of those down or turn those tests off entirely. And then, really, there’s a lot to be said about the algorithms and scenarios that we often run, but from an open architecture and microservices perspective, this really has been a game changer, and an exciting one for many of our clients. As you might imagine, they can really contribute to the models, they can add, delete, and modify tests on the fly as their risk universe actively evolves. And so, the process here, behind the scenes a bit, demonstrates actually an NLP model in this case here, which, again, takes data containing several variables, influenced by that tuning which we just showed, as well as known anomalies, to then create a model which we can run on a go-forward basis to score incoming data in a routine way.
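
[Editor's sketch, not part of the spoken session.] The tuning described above, where individual tests are switched on or off and their weights dialed up or down, can be pictured as a weighted average over the enabled tests. The `RiskTest` class, the test names, and the weighted-average formula are assumptions made for illustration; the webinar does not state how EY Virtual combines test results.

```python
from dataclasses import dataclass

@dataclass
class RiskTest:
    name: str
    weight: float         # dial that a permitted user turns up or down
    enabled: bool = True  # switch that a permitted user turns on or off

def composite_score(indicators, tests):
    """Weighted average of indicator values (each in 0..1) over enabled tests."""
    active = [t for t in tests if t.enabled and t.weight > 0]
    total_weight = sum(t.weight for t in active)
    if total_weight == 0:
        return 0.0
    weighted = sum(t.weight * indicators.get(t.name, 0.0) for t in active)
    return weighted / total_weight

# Hypothetical indicator results for one employee's accounts.
indicators = {"low_transaction_count": 1.0, "no_activity": 1.0, "inactive_account": 0.0}

baseline_tests = [
    RiskTest("low_transaction_count", 1.0),
    RiskTest("no_activity", 1.0),
    RiskTest("inactive_account", 1.0),
]
# After tuning: first test weighted down, second switched off, third dialed up.
tuned_tests = [
    RiskTest("low_transaction_count", 0.5),
    RiskTest("no_activity", 1.0, enabled=False),
    RiskTest("inactive_account", 2.0),
]

baseline = composite_score(indicators, baseline_tests)
tuned = composite_score(indicators, tuned_tests)
print(round(baseline, 3), round(tuned, 3))
```

Normalizing by the total active weight keeps the score between 0 and 1 however the dials are set, which makes scores comparable before and after a tuning change.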

33:02 So the first data set here contains variables at an employee level that will be scored, really, in our model. The second data set contains all employees that have been scored and marked as either anomalous or non-anomalous. Now, the output here essentially presents a logistic regression model that’s scored against a set of variables from that first data set. And so, the coefficients you’re seeing here on the screen, what these really represent is the overall effect and impact on the model, as well as the statistical significance for each of the variables within that model. Two things to point out here as well. We’re importing, as you saw in the process there, both data sets, and we’re merging them at the employee level. We’ve also done some very interesting things in terms of exploring, really, bots and process automation to further streamline and enhance that. And then we’re really selecting the attributes that we’re going to be modeling off of, and so you can see that through some of the nodes here, as well as the variables that we’re modeling against. And this is done through the selection nodes here, which really are fed by the filters you saw in EY Virtual. Finally, the cross validation model here, this really creates the model and scores it. The output, of course, is the model, which we’re then running on a periodic basis, which, as I mentioned, could be daily, weekly, and so on. And I think what’s great here, too, is that this is very transparent, very easy to understand, and has for many of our clients allowed our data scientists and data analysts to really work hand in hand with the folks with the domain expertise to really bring powerful tools, techniques, and analytics, machine learning and so on, to these risk areas in a way that is quick and achievable and easy to understand, as well as easy for client users to be able to manipulate and influence on their own. 
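
[Editor's sketch, not part of the spoken session.] Two of the steps just described, merging both data sets at the employee level and cross validating the model, can be sketched as an inner join on a hypothetical employee ID followed by a plain k-fold split. The data and function names are illustrative assumptions, and the model fitting inside each fold is left out.

```python
def merge_on_employee(features, labels):
    """Inner join: keep only employees present in both data sets."""
    return [
        {"employee_id": emp_id, "features": feats, "anomalous": labels[emp_id]}
        for emp_id, feats in sorted(features.items())
        if emp_id in labels
    ]

def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size

# Hypothetical mock inputs; E006 has no tag yet, so the join drops it.
features = {f"E00{i}": [i / 10.0, (i % 3) / 2.0] for i in range(1, 7)}
labels = {"E001": 0, "E002": 0, "E003": 1, "E004": 0, "E005": 1}

merged = merge_on_employee(features, labels)
folds = list(k_fold_splits(len(merged), k=2))
for train_idx, test_idx in folds:
    # A model would be fitted on the train rows and scored on the test rows here.
    print("train:", train_idx, "test:", test_idx)
```

With few known anomalies, a stratified split that keeps some flagged employees in every fold is usually preferable to the contiguous folds used in this sketch.
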
So we’ve demonstrated today a number of the ways we can use advanced analytics to really accelerate the use of forensic data analytics within the fraud detection realm. I’ll now turn it back over to Todd, just for a few final thoughts on how to really more broadly accelerate the use of FDA within organizations. Todd?

35:46 Thanks, Jeremy. So, as we’ve been talking about through this whole webinar, there are some– really, there are some key themes, in terms of really driving an effective forensic data analytics program. First, certainly starts with the data. 73% have pointed out that better access to centralized data repositories and cross-functional collaboration are key to driving a more effective data analytics program. And– forensic data analytics program. And as I mentioned, if you think about it in terms of the complexities of how that is achieved, it really requires setting the tone at the top and having the right sponsorship. So, on the far right, you’ll see having secure, strong leadership and sponsorship from the executive management is key, because without that, you will not be able to get and attain the level of cooperation cross-functionally, typically, to achieve the tasks that are required. Worse, it’s not just about technology, [inaudible] it’s about [inaudible] the data and the people. So, you can see that investment in the right technologies is certainly important, but as well, having the right people to help configure and design how to use those technologies in the process. And then, as well, having the right people to understand how to create these models and get the most out of your information, and frankly understand the limitations and the bias of the data, potentially, so that the results can be put in context. And certainly we’ve all seen some of the– when, frankly, some of that goes wrong. So it’s really important that you have the right folks that are translating the highly technical items into layperson explanations, so that people at all different stations within an organization can understand what the data is, what it isn’t, and how it’s being used, and what’s being predicted and for what purpose. So that you can use [inaudible] and frankly doesn’t create additional risk by the way that you do it. Jeremy, any final thoughts from you?

38:25 Sounds good, Todd, thanks. And certainly agree, and particularly what– I’m seeing a few questions just around data quality, how we handle poor data. There certainly are ways to do that. Within RapidMiner, we’ve certainly explored and found efficiencies and capabilities such as [inaudible], the ability to fill data gaps. So the impute capabilities, so being able to take an educated guess and populate missing variables. And the bottom line is that there’s certainly great potential for forensic data analytics. Bad data need not sort of be a barrier towards being able to leverage FDA and really– and secure and realize some of the benefits from a really robust FDA program.

39:15 Jeremy, just to jump in there, though, everything you said is accurate. However, where the rubber really meets the road is when you’re dealing with some of these data science techniques, particularly machine learning and some of these advanced statistical approaches that we’ve been talking about.

39:31 It’s also really important to understand what those models expect, and what they’re sensitive to. Some of these techniques are going to be much more sensitive to outliers or missing values. And there are fundamental base assumptions that each of these techniques requires to drive results. And that’s really going to drive how you handle the data quality issues. And whether the particular model you’re contemplating using is the right approach given what you know about the information.
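
A small illustration of the sensitivity point made here: the same missing values can be filled very differently depending on whether the fill statistic is robust to outliers. This is a sketch using only Python's standard library; the claim amounts are invented for illustration and the `impute` helper is hypothetical, not a RapidMiner operator.

```python
from statistics import mean, median

def impute(values, strategy):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical monthly expense claims for one employee. None marks a
# missing value; 9500 is a single extreme outlier.
claims = [120, 135, None, 110, 9500, 125, None, 130]

mean_filled = impute(claims, "mean")      # fill value is pulled up by the outlier
median_filled = impute(claims, "median")  # fill value stays near typical claims

print("mean fill:  ", mean_filled[2])
print("median fill:", median_filled[2])
```

A model that is sensitive to extreme values would then see two additional inflated months under mean imputation, which is why the choice of fill method should follow from what the downstream technique assumes.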

40:03 I completely agree. And let’s– Hayley turn it back to you and open it up, potentially, to questions from the audience here. Seeing if–

40:12 Great. So thank– Yep. Thanks, Jeremy, and thanks, Todd. Great presentation today. As a reminder for those on the line, we will be sending a recording of today’s presentation within the next few business days. So be sure to look out for that, and then I’ll go ahead and open it for Q and A. We also have Dylan Cotter, RapidMiner Director of Channel Sales, on the line to answer any questions directed at RapidMiner, if there are those as well. So, why don’t you go ahead and start? See some questions coming in, but if you have any questions that you want to ask us, definitely go ahead and input those in the questions panel. So I’ll go ahead and ask the first question here. How are regulators responding to the use of advanced analytics within the fraud investigation space? And this one’s for EY.

40:59 Surely. I mean, that’s a very good question. And I would take a step back and say that the landscape’s fundamentally changed over the last several years, to where regulators and government organizations are now investing in data science platforms and data science professionals in a new way, in a way that they previously hadn’t. And that is to be able to better understand the information they’re receiving from organizations and be able to make more targeted requests. So what we’re seeing is that the game is changing. They have higher expectations and more targeted requests from organizations. And, frankly, given what, now, they understand to be the technology landscape, it’s raised the bar for expectations about what organizations can do in this area, and what they should be able to find, and what they should be able to understand. So we’re seeing it drive a lot of change with the organizations that we work with, in terms of being able to harness that information and make sure that they’re able to work with it and to document what they have and have not done with it and why.

42:13 Great, thanks. Another question for you guys. Who typically looks at the results of the cases generated?

42:23 That’s a great question. I’ll start, and then perhaps Todd can chime in as well. What we’ve really seen is, as Todd noted earlier in his remarks, the consumers, if you will, of FDA have really continued to evolve. Often, investigators have a role, attorneys, data science teams, and so on. Really, teams with varying skill sets we’re seeing are often the ones now teaming together and utilizing and playing different roles, rather, where typically forensic accountants or attorneys and so on would often be looking at cases, flagging transactions, entering comments, and so on. Data scientists are then often the ones really kind of looking at the models, fine tuning them and so on, but most importantly all collaborating together. And where we’ve seen that really work well is organizations. And really, at EY, we challenge our teams in this sort of vein as well. We spend time, really on a daily basis, looking through results, walking through things, challenging each other on sort of what’s working and what’s not. And then what’s interesting is, if you look and reflect back, even 5 or 10 years ago, often times on investigations they were sort of compartmentalized, different groups. There was this notion of the data team, the investigations team, the field work team. I think we’ve really seen, and it’s frankly been really exciting to see, is those different roles now come together.

43:58 Great. Thanks. I see another question here for you guys, and that’s, How do you handle poor quality data?

44:07 Sure. Yeah, and I think, as we’ve noted a few ways of doing that, I think poor or bad quality data certainly should not be a reason not to invest in forensic data analytics and its associated technologies. There are a few ways of getting around that. There are some actually great capabilities in RapidMiner, processes which we’ve been able to put in, and it’s our processes, for instance, the ability, the impute function. We use that quite a bit, of course, with the right guardrails, right? As well as the capability to fill data gaps, so often times we’ll receive source data from clients and certain records don’t exist in those reports, because, essentially, the values are zero or negligible, or immaterial. But certainly, from a modeling perspective, we need that continuity, and so we’ve been able to use the ability to fill in those data gaps within the RapidMiner process as a way to handle that. So, certainly, as we think about kind of how to deal with poor quality data, there are technologies, there are capabilities that we’re looking at. It remains a challenge, but I think we’ve seen a real acceleration in the technologies and capabilities, and particularly tools like RapidMiner that can kind of help us to at least mitigate the adverse impacts of poor-quality data.
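
The gap-filling idea described in this answer (records absent from a source report because the value was zero, yet needed for continuity when modeling) can be sketched as follows. This is an assumption-laden illustration in plain Python, not the RapidMiner process itself: the field names, periods, and amounts are hypothetical, and the `filled` flag is one possible guardrail so that inserted rows stay distinguishable from reported ones.

```python
def fill_gaps(records, periods, default=0.0):
    """Return one row per (employee, period), inserting `default`
    wherever the source report had no record."""
    seen = {(r["employee"], r["period"]): r["amount"] for r in records}
    employees = sorted({r["employee"] for r in records})
    return [
        {
            "employee": e,
            "period": p,
            "amount": seen.get((e, p), default),
            "filled": (e, p) not in seen,  # guardrail: mark inserted rows
        }
        for e in employees
        for p in periods
    ]

# Hypothetical source extract: months with no activity were simply omitted.
source = [
    {"employee": "E001", "period": "2024-01", "amount": 250.0},
    {"employee": "E001", "period": "2024-03", "amount": 90.0},
    {"employee": "E002", "period": "2024-02", "amount": 40.0},
]
periods = ["2024-01", "2024-02", "2024-03"]

filled = fill_gaps(source, periods)
for row in filled:
    print(row)
```

Filling with zero is only appropriate when a missing record really does mean "no activity"; where it means "unknown", an imputed value or an explicit missing marker is the safer choice.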

45:40 Great. Another question here, I believe this is about how to integrate RapidMiner in your application? So, Dylan, if you want to hop in for this one, this person is asking, Can you really describe the architecture behind how analysts’ inputs as model parameters are integrated into the RapidMiner processes?

46:02 Yes. I mean, there’s a couple of ways of integrating, and I think Jeremy highlighted some of this in the last screen, but RapidMiner exposes the process as a web service endpoint, so that’s one way to integrate. You can pass parameters and have it be bidirectional, or you can have a process run asynchronously, and then have a front end pull results in, so those are kind of two approaches there.
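
The two integration approaches mentioned in this answer can be sketched from the client side. Everything specific below is an assumption for illustration: the endpoint URL, the parameter names, and the JSON response shape are hypothetical and are not taken from RapidMiner's actual web service API. The sketch uses only Python's standard library.

```python
import json
import time
from urllib import parse, request

# Hypothetical endpoint for a process exposed as a web service.
BASE_URL = "https://analytics.example.com/api/score-employees"

def build_request(params):
    """Build (but do not send) a GET request carrying analyst-chosen parameters."""
    url = f"{BASE_URL}?{parse.urlencode(params)}"
    return request.Request(url, headers={"Accept": "application/json"})

def call_sync(params, send=request.urlopen):
    """Approach 1: pass parameters in and block until scored results come back."""
    with send(build_request(params)) as resp:
        return json.loads(resp.read().decode("utf-8"))

def poll_results(fetch_status, interval=0.0, max_tries=10):
    """Approach 2: the process runs on its own schedule; the front end
    polls a status source until results are available."""
    for _ in range(max_tries):
        status = fetch_status()
        if status.get("state") == "done":
            return status["results"]
        time.sleep(interval)
    raise TimeoutError("results not ready after polling")

# Usage with a stand-in status source instead of a live server.
fake_statuses = iter([
    {"state": "running"},
    {"state": "done", "results": [{"employee": "E001", "score": 0.93}]},
])
print(build_request({"threshold": 0.8, "region": "AMER"}).full_url)
print(poll_results(lambda: next(fake_statuses)))
```

The synchronous call suits interactive filters where an analyst changes a parameter and expects an immediate rescoring; polling suits long-running or scheduled processes whose results a dashboard picks up later.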

46:54 Thank you very much. Thanks, everyone.

Related Resources