When seeking to grow the adoption of analytics in an organization, even successful projects sometimes struggle to gain the organization-wide visibility needed to build momentum for additional applications. This can be alleviated by choosing the right people to fill all the necessary roles on a strong analytics team. One of these roles, the business translator, is especially helpful in identifying opportunities to use advanced analytics to solve problems.
In this 30-minute webinar we’ll cover the skill sets you want on your analytics team, as well as some helpful tips to formalize a process to identify and score use cases, including:
- The role of the business translator in identifying valuable opportunities
- Conducting workshops focused on business value to start analytics projects off with the right question
- Considerations for scoring and prioritizing use cases aligning to organizational readiness
Hi everyone. Thanks for joining the RapidMiner webinar with Clarkston Consulting, Insights Driving Action: The Role of the Business Translator in Choosing a Use Case. I’m Maggie Seeds, and I’m the service lead for our analytics team at Clarkston Consulting. Our team partners with clients to identify and prioritize use cases for advanced analytics in their organizations. We offer a variety of services under the data and analytics umbrella, including governance, data architecture, and setting up an analytics function within the organization. We do a lot of data profiling and data understanding early on to enable us to ask smart questions and really drive business value. Our goal as a team is always to turn insights into action, and we look for every opportunity to automate and be prescriptive. We also offer managed services for clients who are looking to offload some of the technical aspects of model maintenance. For anyone unfamiliar with RapidMiner, it’s a unified data science platform. We partner with RapidMiner because it’s a great tool for analytics organizations of varying maturity. It’s a drag-and-drop tool that doesn’t require any coding, so you can build things really quickly. It has hundreds of machine learning and advanced analytics operators built in, which provide depth for data scientists while remaining simple for others to learn as they upskill. We also like it for collaboration: since it’s a visual programming language, it’s easy to modularize your work, and there are no black boxes.
What we’re going to cover today is: who makes up a great data science team, and how can we identify use cases that will set the entire analytics function up for continued success beyond a single project? We see a lot of clients that struggle to translate the success of one project into a different use case and have difficulty thinking outside the box to identify additional opportunities for analytics. So I’d like to talk through our approach in hopes that it can spark some ideas about how to formalize the data science process at your organization, including the initial phase of choosing a use case. Let’s start with who should be on the team. We see data science as a team sport, with each member bringing a different skill set. The roles don’t necessarily require their own individual person, but all facets should be covered for a solid project. Your business analyst could also be your data analyst, and you may find that your data scientist takes on a lot of engineering tasks; that’s very normal. Just to go around the circle here: the data engineer is responsible for preparing the data, and I don’t mean making the data model-ready; it’s more about making the data pipelines available to the team to be consumed, working through data quality issues, and applying any pre-processing that should be applied across the whole organization for the sake of consistency. The data scientist builds models using advanced techniques and statistics; they’ll often work with the business analyst to create new data that will help improve modeling, and they’ll test and verify results to be presented back to the team. The data analyst’s role is about understanding the data. The first phase of any analytics project should be about profiling the data to look for patterns, identify data quality issues or outliers, and work with the team to address them.
Moving down the circle, the business analyst understands the data context and knows the current state as well as the improvements the project should create. They can help with locating data owners and understanding the use of each piece of the data. The decision-maker is an important role that often gets left out, which is a leading reason that many projects fail to launch. Someone with decision rights needs to be involved in the project for it to get off the ground and for actions to be taken that create the anticipated value. The translator is an interesting role that I hear discussed a lot; this role communicates between the technical and business groups so that opportunities are recognized and translated into potential projects. We’ll talk more about the business translator role in a bit.
I also wanted to touch on some additional support roles needed for analytics projects. On this slide, these are mostly support pillars and could be shared services across the entire organization. Just to run through them: a solid data architecture is required for a production solution at scale, and you’ll also need someone to administer the server and help get models into production. Once you’re ready to take a model to production, technical developers can help build custom integrations with APIs or build into existing processes to actually bring the models into the real world. Business intelligence is a skill set that’s pretty vital in translating analytics outcomes into the language of the business and into metrics. And finally, model maintenance is required for all models in production to monitor drift, troubleshoot issues, and maintain the long-term sustainability of your models. RapidMiner has a great suite that offers tools for each role on the team. Like I mentioned, it’s a unified platform to take data through each step, from data prep to modeling to productionizing on the server. Your data engineer will be excited for the data prep tools. Your business analysts can rapidly build models the right way using Auto Model as they upskill. Your data scientist can build complex models and take them to production on the server. And finally, your decision-makers can access the model outcomes in real time through APIs.
So I want to come back to the importance of the business translator role in data science projects, particularly in today’s webinar around choosing a use case. We regularly hear the same complaint from clients: they aren’t sure how to get started with advanced analytics, or they aren’t sure how to turn a single project’s success into a strong analytics function. This is where the business translator comes in. This is the person who bridges the technical and business teams, and they’re especially skilled at hearing business challenges or pain points and then describing or designing a solution using data. They’re also important throughout the project: they help maintain a focus on the business value and success metrics. Many times I’ve seen or been part of technical teams that get heads-down in their work for weeks at a time, going down rabbit holes and losing sight of the big picture. It’s what we’re good at, exploring and finding interesting things, but having a regular check-in on the project goals drives momentum and keeps the work focused on that business value. And the business translator is also vital at the end of a project, to translate the outcomes into an actionable roadmap and to confirm the success criteria.
Now that we’re comfortable with the roles on a project, let’s walk through a few examples of people you might see in your organization and how their skills align to analytics roles and RapidMiner tools. This is Amy, our IT application engineer. Amy is well versed in system architecture, and she’s got a lot of pull in IT by representing the team to the rest of the business. She’s an expert in data architecture and knows what the systems are capable of. We see Amy helping with data engineering, and she would also likely make a good translator and decision-maker for the IT group. She’ll be great as a server admin and can coordinate architecture and any technical development. In RapidMiner, Amy will mostly interact with the server and handle model deployment, and she’ll be really excited to play around with Radoop and handle data prep from the engineering side. Moving on to Pam. Pam is a sales operations manager. She oversees a team reporting on sales and marketing and has her own budget. She regularly presents to leadership on key themes and metrics and is skilled at tailoring insights to her audience. She has a solid understanding of the data structure. We think Pam will make a great translator and decision-maker. She will likely interact with RapidMiner through results and outcomes, typically presented through integration with APIs or visualization tools, which she can then take back to leadership. And lastly, Ryan is a sales operations analyst. His background is in data analysis in Excel and other tools, and he’s used to reporting on and interpreting sales metrics and doing ad hoc reporting. He has a general sense of what motivates stakeholders but isn’t directly involved in leadership meetings. We would already consider Ryan a data and business analyst, and we can upskill him into a data scientist role.
And he can also help out with business intelligence and participate in model maintenance as the business analyst to help validate success criteria over time. We think Ryan will love Auto Model and Turbo Prep as quick tools to help him navigate the best practices around data prep and moving into machine learning.
So thinking about various uses of the tool, I wanted to demonstrate a little bit about how we at Clarkston use RapidMiner to create reusable data prep. This is an example of the data profiling that we start every project with. We want to look for patterns in the data, identify strong attributes that will help us predict our target, and start to visualize the data a little. This activity is much the same across similar types of problems and doesn’t need to be recreated every time. This example is a binary classification problem: predicting whether or not a customer referred new business. You can see here, we’ve created some steps to aggregate totals, discretize variables, and loop through each attribute dynamically to compare results. We’ve also got this Weight by Correlation operator at the top that we’ll talk through in a second. On this side, you can see some of the results; what we’re looking at in that count column is the count of yes versus no referrals, separated out by age group. This is where we discretized age into bins of 10 years. Looking at this, the 60 to 70 age group made up the highest percentage of both yes and no responses, but the second-highest count of yes responses was in the 70 to 80 age group, while that group was lower for the no responses. Starting to do some of this analysis, looking at different percentages and at the numbers in different ways, we can see that age might be an important factor in our model. I also mentioned that Weight by Correlation operator: it runs through the data and, for each attribute, which we see in the column on the left here, measures the correlation with our target variable, which, as a reminder, was whether the customer referred business or not.
Sorted by top correlation, we see that whether or not they were a loyalty member is the most correlated with a referral, which makes sense, followed by age, and then age when they joined as a customer, which is actually a field that I calculated for this dataset. It’s simple to share this process with my team so they can save time on their next classification project and so that we’re all on the same page with the best practices for data profiling that we’ve established.
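For teams that also script outside of RapidMiner, the same two profiling steps, discretizing age into 10-year bins and weighting each attribute by its correlation with the target, can be sketched in Python with pandas. The dataset and column names below (`age`, `loyalty_member`, `referred`) are invented for illustration; they are not the actual data from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical customer dataset; columns and values are illustrative only.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "loyalty_member": rng.integers(0, 2, n),
    "tenure_years": rng.integers(0, 30, n),
})
# Synthetic binary target: loyalty members and older customers refer more often.
p = 0.2 + 0.4 * df["loyalty_member"] + 0.2 * (df["age"] > 60)
df["referred"] = (rng.random(n) < p).astype(int)

# Discretize age into 10-year bins and count referrals per bin
# (analogous to the Discretize + Aggregate steps in the process).
df["age_bin"] = pd.cut(df["age"], bins=range(10, 101, 10))
profile = df.groupby("age_bin", observed=True)["referred"].agg(["sum", "count"])
profile["pct_yes"] = profile["sum"] / profile["count"]

# Absolute correlation of each numeric attribute with the target
# (analogous to the Weight by Correlation operator).
weights = (df[["age", "loyalty_member", "tenure_years"]]
           .corrwith(df["referred"])
           .abs()
           .sort_values(ascending=False))
print(profile)
print(weights)
```

As in the RapidMiner process, the point is that both steps are generic: once written, the same snippet can be reused on the next binary classification problem with different column names.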
Now, shifting gears to choosing a use case. We recommend taking a data product approach for organizations working to build out advanced analytics. This enables a focus on business value and proving out your work while growing important capabilities. We start with choosing a use case and like to conduct a business-value workshop, which we’ll cover in more detail shortly. We also start projects with a readiness check, which identifies any gaps in technology, organization, or other areas. It’s not necessary to solve every problem identified during that readiness check before you can get started with your analytics, but it is necessary to be aware of them as you progress toward a fully-scaled solution. Next we do a proof-of-value project, either an experimental model or a pilot, and then scale, finally, to a full data product. As you go through this progression, you layer in the strategic alignment, the process and organization, and then the technology infrastructure, so that you work through any of these issues before actually fully scaling your data product. So what’s the difference between an experimental model and a pilot, you ask? They’re both part of a proof of value and aim to vet a solution before investing in a full-scale product. They differ in that an experimental model is built off a static data extract, not necessarily connected to full data pipelines, and its output is really there to demonstrate value rather than integrate with a live production output. It’s great for focusing on solution design and bringing the value back to the business. A pilot, on the other hand, fully integrates with both a data source and a production output, and it’s great for illuminating any issues that you may have in data engineering, in your skill set across the organization, in your data science process, and in your technology stack.
And it’ll actually help you formalize your data science process from end to end, including things like cutover, security, training, and all that good stuff.
To find a use case, we want to find a question that, when answered, enables a specific action that will produce a measurable outcome in line with current business goals. This is always what we’re working toward in the business-value workshop. We like to tackle this portion of the project through a business-value workshop that incorporates design thinking. Even if a client feels like they know which problem they want to tackle, it often helps to refine the questions and connect the problem to the business value. The exercise goes like this: you have your team individually and anonymously write questions on Post-its that challenge the way they do things. We give it time, then read each question aloud and let them sink in with the group. Then we group the questions into similar topics to help organize the team’s thoughts and narrow down the topics covered. This is a really good way to get ideas flowing and not be restricted by “that’s the way we’ve always done it,” which we hear a lot.
As an example, we had a consumer-products client that was looking to transform their business strategy to focus on a new customer segment. We conducted a design-thinking session with their executive team to challenge the status quo and to target three top-priority analytics problem statements that we could go after as individual projects. Some of the questions that came up were: Why aren’t we more focused on our target customer’s buying experience? Why won’t our target customers buy online? Why do we believe that our target customer wants a relationship with us as a manufacturer? Why don’t we incentivize retailers to share POS data? Why don’t we have more products in the portfolio that our target customers need? After the session, the problem statement that we crafted from these grouped questions was: how can we use the data that we already have to find out what different customer segments are buying, and actually identify our new target customers in the data itself? This would allow them to have visibility into the growth of their target segment and track the effectiveness of their overall strategy through the data.
Moving on to defining success, which we think about in three ways: what is the ROI, defining the output upfront, and defining what’s good enough to deploy. Defining the ROI and the benefits of your data science project is so helpful in evangelizing data science throughout the organization, whether it’s a new team, new tools, or new methods. It also helps move away from slow-moving or low-impact projects. We want to apply a scoring methodology to the top couple of use cases that emerge in order to prioritize them. This identifies which projects to tackle first and also gives you the next step, or roadmap, to create momentum for the next use case. You also want to identify the data and people that you need to get started on your proof of value. Consider the roles we discussed earlier, and determine who will be the decision-maker, or data product owner, with the ability to get the solution into production; that person is your owner. Engage all of your stakeholders and let them know that their expertise may be called upon during the project.
And then finally, you’ll want to define what success looks like. What is your minimum viable product? What may change in the organization when we answer the question that we’re going after? This is where you actually turn those insights into actions and really drive change and business value. A few things we consider when we score use cases are business value, impact, and feasibility. Business value is the obvious one when we talk about ROI: what is the value back to the business, in metrics that the organization uses? Note that this may be tricky to nail down if there is no current state to compare to. For example, if you’ll be doing predictive analytics for the first time, you’ll want to consider lift instead of comparing actual metrics. You’ll also need to consider the time that it will take to achieve that business value. Impact is the wow factor: what is the positive impact to the whole group or organization, and the timing in which that greatest impact can be realized? We run into this with a lot of clients where a cool project is done but doesn’t get much visibility in the rest of the organization because it really only affected one small team. And we like to leave feasibility for last on purpose, to allow for open-minded brainstorming. But at the end of the day, we also need to know what we’re capable of building and how long it will take. This includes the availability and quality of data sources, whether we have the right technology stack, and whether we have people with the right skills to accomplish the project. This is typically the final stage gate for scoring a project.
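The three criteria can be turned into a lightweight, repeatable ranking with a simple weighted score. The weights and the 1-to-5 ratings below are invented for illustration, not a rubric from the webinar; in practice you’d calibrate both with your stakeholders.

```python
# Hypothetical weighted-scoring sketch for prioritizing analytics use cases.
# Criteria weights are illustrative only and should sum to 1.
WEIGHTS = {"business_value": 0.4, "impact": 0.3, "feasibility": 0.3}

def score(use_case: dict) -> float:
    """Weighted average of 1-5 ratings across the scoring criteria."""
    return sum(WEIGHTS[c] * use_case[c] for c in WEIGHTS)

# Made-up candidate use cases with workshop ratings on a 1-5 scale.
use_cases = [
    {"name": "Demand forecast", "business_value": 5, "impact": 4, "feasibility": 3},
    {"name": "Churn model", "business_value": 4, "impact": 3, "feasibility": 5},
]

# Rank highest score first to decide which proof of value to run next.
ranked = sorted(use_cases, key=score, reverse=True)
for uc in ranked:
    print(f"{uc['name']}: {score(uc):.2f}")
```

Leaving feasibility in the formula but discussing it last in the workshop preserves the open brainstorming described above while still making it the final stage gate for the score.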
Once we know our top use case, we want to home in on the problem statement. We want to choose a question or problem that’s clear and objective. Going after the “best results” may mean different things to different people, so be clear in your definitions and be objective. We try to avoid questions like, “What’s the value of this data set?”, starting with the data set to drive the use case, because we find that often ends up with a misalignment between the business value and what we’re trying to do. It’s better to start with the question and then identify the data needed to answer or support it. We also need to choose the right scale. Remember, this is a proof of value, not a fully-scaled solution. Consider whether you want to segment by product, by region, or something else. You want to make sure you have enough training data to build a solid solution, but you also want to decompose the problem into the business scenarios that are most independent. And finally, you want to choose the right granularity. This can become a pretty technical discussion, but you need to make sure that your data is at the correct grain to match up; think of trying to match transactional sales data with minute-by-minute foot traffic. It’s partially an engineering question of where you’re sourcing the data, but it also ties back to the business value: you want to base your granularity on the decisions that you will make with the data. If you’re making staffing decisions, for example, at a weekly level about a month out, then you don’t need a daily forecast. Consider how often decisions are being made and base your model and design on that.
So now that we’ve got our fully-formulated problem statement and know who’s going to help us go after it, we want to conduct a readiness check. This will check things like business readiness and risk awareness, to identify if we need to incorporate change management; technology and data readiness, to scale out our technical solution and data pipeline; and roles and responsibilities, to identify any need for training or shifting the organization. Again, you don’t need to bring your project to a halt if every single one of these elements isn’t perfect. You can still get going, work through proof-of-value projects, and then prioritize these other support pillars as you build toward that final data product. So that concludes my remarks for today. Thank you for attending today’s webinar; I hope it was informative. If there are any questions, please leave them in the comment box and we’ll get back to you with responses. Otherwise, here is my contact information, as well as Hayley’s from RapidMiner. Thank you, and have a great afternoon.