2  Doing Good Science

2.1 A Scientific Approach to Problem Solving

While it is important to learn the methods detailed in these guides (and more), the greatest emphasis should be placed on learning how to use the scientific method to approach problems. Approaching business problems scientifically, trying to build theories based on our observations about the subject being studied, is often overlooked, as this is in many ways the most complex part of the process. It is also less exciting than learning new ways to analyse data, so for someone that is new to data science, it is quite easy to de-emphasise this and focus on doing the fun parts instead.

These issues are exacerbated by the fact that any data science education, whether formal or self-taught, is often focused on the methods and tools required for data science. There is less explicit focus on the “process”, but this is because the process is taught implicitly through the methods. Teaching the various methods and tools that a data scientist needs to know is time-consuming and complex, and in the process of learning and applying the methods, the student will also learn the scientific approach to problem solving. That said, once someone has mastered the methods, they are (relatively) east to apply to real-world problems, while the process of doing science the right way is more difficult, and requires careful consideration for every practitioner, no matter how experienced they are.

2.1.1 Framing the Problem

All of this ties into the same questions that should be asked when identifying data science opportunities. But having identified these opportunities, it is important not to skip to identifying a cool method for solving the problem. Instead, focus on the problem itself, and then identify the method(s) that will be most appropriate for answering it.

A simple, concise approach to framing the problem and maximising the value of data science is to use Tom Carpenter’s suggestions below:

If you follow these five steps detailed by Tom, you stand a good chance of approaching the problem in the right way, generating real business value, and really getting to the heart of the problem you are trying to solve.

2.1.2 A Simple Framework for Data Science

The guides here obviously focus on the methods, but before diving into the code, I think it is important to highlight the importance of learning to implement these methods appropriately, and approaching data science in the “correct” way. This means understanding the problem, generating theories about how to solve it, identifying the data that is needed to test these theories, and building a robust model that is informed by your theory and that is appropriate for the data you have. Before jumping into the code and the iterative process of model building, it is important to take the time to think carefully about your theory, the variables that are relevant to the phenomenon you are studying and the causal mechanism that guides it (or the data generating process), and your expectations from your model. Only once you’ve thought carefully about all of these things should you start to think about the method(s) you should use.

The order of these steps is important, but the right order is dependent on context. In an ideal situation…

For anyone that is analytically minded, it is very easy to fall into the trap of hyperfocusing on the cool methodological approaches you could take to solving a problem. This is something that I regularly have to check myself on, because I find it so easy to get excited about the fancy methods I could apply to a particular situation. However, despite the fact that the methods are often the most fun part of the job, the fact remains that a very fancy model won’t get you very far if you haven’t spent a lot of time thinking carefully about the problem that model is trying to solve, gathering the data that is needed to solve it, and preparing the data in a way that maximises the model’s performance.

The easiest way to avoid falling into the same trap that I regularly slip into is to treat the methods as a means to an end. The goal should always be to build a solution to a specific problem, with a clear understanding of what “success” looks like in that context. And if you also get to make something fancy in the process? Well isn’t that exciting!

Data plays an increasingly important role in business decision-making nowadays, due to the technological advances that have made it possible to collect and analyse large amounts of data. This is not just true of tech companies, but also of traditional businesses in sectors such as retail, manufacturing, and healthcare. The NHS is increasingly embracing the role that data plays in driving decision-making, improving clinical services, and, ultimately, improving patient outcomes.

But as more data becomes available, the challenge of making good use of it becomes even more difficult. The old methods are still useful, but they are no longer sufficient. In order to make the most of the data that is available, the NHS and SCW need to be able to identify when there are opportunities to use data science to improve the services that they provide. Recognising these opportunities can be challenging. It requires not only a good understanding of what is possible with data science, but also a proactive approach to identifying when it is possible to do more with data than is currently being done.

While the other guides in this project will hopefully give a better understanding of what is possible with data science, this particular guide is intended to help anyone frame business problems, whether internal or external, in terms of data science, and to identify opportunities to develop more advanced solutions to those problems.

2.2 Identifying Data Science Opportunities

A lot of the analytics work that is done at SCW tends to be descriptive in nature, and takes the form of regular reports, dashboards, and Excel spreadsheets summarising data. These descriptive approaches are and always will be valuable, but there is also a limit to what descriptive analytics can tell us. Descriptive analytics can tell us what has happened, but it cannot tell us why, nor can it tell us what will happen in the future. When a piece of work is concerned with explaining past events, or predicting future events, there is an opportunity to use some of the data science techniques that are covered in these guides.

Identifying these opportunities is not always easy, however, because requests for work will often come in the form of requests for a certain type of solution to address a business problem, with expectations of what that solution should look like. Customers and colleagues will often have a clear idea of what they want, because that have seen or used similar solutions in the past, and this limits the scope for identifying data science opportunities. If we always provide the exact solution that is requested, without considering whether there is a better way to solve the problem, we will miss out on opportunities to extract more value from our data products.

It is important to understand the business problem and the solution that is being requested, but it is also important to identify why a certain solution is being requested, and how the customer or colleague plans to use it. For example, a request for a dashboard that shows utilisation of a particular service in the past twelve months is not necessarily a request for a data science solution. It is possible that the reason for wanting this dashboard is to help plan how to meet future demand for that service, but it is also possible that the reason for wanting this dashboard is simply to understand how the service is being used. While a dashboard will offer valuable context, it is possible that a machine learning model that predicts future demand might be more useful.

2.2.1 Asking the Right Questions

Designing data science solutions to business problems is a process of discovery. The best way to identify potential data science work is to ask questions.

There is no foolproof set of questions you can ask, and no perfect framework for designing data science solutions, but there are some questions that are more likely to lead to useful insights than others:

2.2.1.1 Identifying the Problem

  • What is the problem you are trying to solve?
  • What is the business impact of solving this problem?
  • What is the current solution to this problem?

2.2.1.2 Considering Solutions

  • What do you think the solution to this problem should look like?
  • Why do you think this approach is the best way to solve this problem?
  • How will you use the solution to this problem?

2.2.1.3 Measuring Success

  • How will you know if the solution to this problem is successful?
  • How is success defined and measured for this problem?

2.2.1.4 Identifying Feasible Approaches

  • What data do you need to solve this problem?
  • What data do you have that might be useful in solving this problem?
  • When does this piece of work need to be completed by?
  • How often will this piece of work need to be repeated/updated?

2.3 Next Steps

Although the data science guides focus, for the most part, on the implementation of various data science techniques, developing SCW’s data science capabilities is more than just upskilling analysts and equipping individuals with the tools to do this work. Perhaps a more important part of this process is helping a much wider group of SCW colleagues to think about data science, and to recognise opportunities to use data science to solve business problems. Hopefully the guides will help SCW colleagues to understand what is possible, and this guide will help them to identify opportunities to do more with data using these techniques.

2.4 Resources