How to formulate a machine learning problem

Let’s understand how to define and formulate the machine learning problem (for predictive modeling) from a business problem. This structured approach should help you apply the process to most other types of predictive modeling problems at work.


Often in ML teams, you will hear from the business/company departments about the problems and issues they face. Being able to relate the business problem to a machine learning objective will put things in perspective.

So first, let’s learn how to translate a business problem into a Machine learning / Data Science problem. You will use this knowledge to help your team/clients understand how ML can help.

Let’s see this with an example problem.

The problem setting

“Alan is working as a Data Scientist for a telecom company ‘Alfavoice’.

‘Alfavoice’ has a large presence in multiple countries and constantly struggles to get its customers to stay with the brand. It wants to prevent customers from switching to a different operator in a competitive market.

Brad, Director of Marketing, says the company spends a large marketing budget on customer outreach, yet it often ends up calling uninterested customers, and the calls may be perceived as spam. This has hurt the brand image in the past.

He brings up this problem in a department meeting that Alan happens to attend. Alan, being a Data Scientist, feels he can help Brad, and the company, to optimize marketing costs for customer retention, and contribute to improving the brand’s image.”

Before proceeding, suppose you are in Alan’s position. Take a minute to think about how you would tackle this problem.

How to translate the business problem to a Data Science problem?

More precisely, how can we translate this business problem into a “predictive modeling” problem?

Can this be solved using ML? If so, how?

But first, we need to figure out what exactly to solve in order to fix Brad’s problem. Let’s make this more structured by approaching it methodically in a few smaller steps.

Step 1: Identify the business pain point(s).

While you identify the pain point:

  • Understand how solving this pain point is going to benefit the business.
  • What could be the potential root causes of that problem?
  • Know what can be controlled by the company and what cannot.

How to apply this to the current problem statement?

In the current problem, I see two pain points (there might be more):

  1. High marketing spend, part of which seems to be ineffective and is probably harming the business.

  2. Spamming customers/prospects, causing a negative brand image.

Step 2: Translate to a quantifiable metric that can be predicted.

Now, let’s think about what can solve these issues.

Not all customers Alfavoice targets are thinking about switching to a competitor. Contacting all existing customers for feedback, surveys, etc. will likely create a bad experience.

So, we want to know exactly who our target customers are. That is, we want to know which customers are already thinking of leaving the company for various reasons; in other words, which customers are ‘churning’.

Would that address the problem?

Probably, yes! You will have fewer people to target (lower cost), and you will reach only the customers that matter (avoiding spam).

To solve this, can we build a model that can predict whether a given customer will churn? In other words, for every customer, we want to know the ‘probability of churn’.

But that seems like a complex metric to measure?!

The concept and definition of churn can vary from business to business. For example, a telecom company might decide (based on business intuition) that any customer who hasn’t bought a new plan more than 5 days after the previous one expired is considered ‘churned’. For subscription services like Netflix, those who have not renewed their subscription may be considered ‘churned’.
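As a rough illustration of the telecom rule above, here is a minimal Python sketch that labels customers as churned. The customer IDs, dates, and the `is_churned` helper are all made up for this example, and the 5-day grace period is simply the figure from the text; a real business would set its own criterion.

```python
from datetime import date

# Assumed business rule (from the example in the text): a customer counts
# as churned once more than GRACE_DAYS days have passed since their latest
# plan expired without a new plan being bought.
GRACE_DAYS = 5


def is_churned(latest_plan_expiry, today, grace_days=GRACE_DAYS):
    """Return True if the latest plan expired more than `grace_days` days ago."""
    return (today - latest_plan_expiry).days > grace_days


today = date(2024, 3, 20)

# Hypothetical customers mapped to the expiry date of their latest plan.
latest_expiry = {
    "C001": date(2024, 3, 1),   # expired 19 days ago
    "C002": date(2024, 3, 17),  # expired 3 days ago, still within the grace period
    "C003": date(2024, 4, 10),  # plan is still active
}

labels = {cid: is_churned(expiry, today) for cid, expiry in latest_expiry.items()}
print(labels)
```

Labels like these are what a model would later learn from, so it pays to agree on the rule with the business before building anything.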

Either way, customers who meet the criterion set by the business may be marked as ‘Churned’. However, what we want to know right now is: which customers are at risk of churning, before it happens?

Can machine learning predict that?


If ML can learn the common patterns from ‘Churned’ customers, it can look for such patterns in the existing customer base and potentially predict which customers are at risk of churn. This type of problem is quite popular in data science and is called ‘Churn Modeling’.

Step 3: What data, if it exists, can help predict the response?

The data that influences the value of the response can also help predict it. Can you think of what data, if present, can help us tell whether a customer will buy in the future or not?

You need not have concrete proof at the start that a given piece of data can predict the response. An intuition is enough to begin with.

For example: To predict if a given customer is about to churn, the following info might help:

  • The last time he/she initiated contact
  • Time since the last login to the website
  • Browsing activity
  • Age, gender, purchase history, etc.

Such data can offer some insight into buying patterns and thereby help predict the probability of churn.
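To make the list above more concrete, the sketch below turns one raw customer record into numeric inputs a model could use. The field names (`last_contact`, `pages_viewed_30d`, and so on) and the `build_features` helper are assumptions invented for this example, not part of any real Alfavoice dataset, and gender is left out only to keep the sketch short.

```python
from datetime import date


def build_features(customer, today):
    """Turn one raw customer record (a dict) into numeric model inputs."""
    # Count only purchases made within the 365 days before `today`.
    recent_purchases = sum(
        1 for d in customer["purchase_dates"] if 0 <= (today - d).days <= 365
    )
    return {
        "days_since_last_contact": (today - customer["last_contact"]).days,
        "days_since_last_login": (today - customer["last_login"]).days,
        "pages_viewed_30d": customer["pages_viewed_30d"],
        "age": customer["age"],
        "purchases_12m": recent_purchases,
    }


# A hypothetical raw record for a single customer.
raw = {
    "last_contact": date(2024, 1, 10),
    "last_login": date(2024, 2, 25),
    "pages_viewed_30d": 4,
    "age": 37,
    "purchase_dates": [date(2023, 6, 1), date(2023, 11, 15)],
}

features = build_features(raw, today=date(2024, 3, 1))
print(features)
```

Whether any of these inputs actually helps is something to verify later with data; at this stage they are only candidates.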

How does this help the business?

The marketing team can craft a specialized campaign for a specific type of customer instead of targeting everyone. This may significantly increase the chance of retaining those customers and lower the cost.
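A small sketch of the idea: given a churn probability for each customer, contact only those above a cut-off and compare the outreach cost with contacting everyone. The probabilities, the cost per call, and the 0.5 threshold are all invented for illustration; in practice the threshold is a business decision.

```python
# Hypothetical churn probabilities, as a model might produce them.
churn_probability = {
    "C001": 0.82,
    "C002": 0.10,
    "C003": 0.55,
    "C004": 0.05,
    "C005": 0.91,
}

COST_PER_CALL = 2.0    # assumed cost of one outreach call
RISK_THRESHOLD = 0.5   # assumed cut-off for "at risk"

# Keep only at-risk customers, highest risk first.
targets = sorted(
    (cid for cid, p in churn_probability.items() if p >= RISK_THRESHOLD),
    key=lambda cid: churn_probability[cid],
    reverse=True,
)

cost_everyone = COST_PER_CALL * len(churn_probability)
cost_targeted = COST_PER_CALL * len(targets)

print("Customers to contact:", targets)
print("Cost if everyone is called:", cost_everyone)
print("Cost if only at-risk customers are called:", cost_targeted)
```

Lowering the threshold reaches more at-risk customers at a higher cost; raising it does the opposite.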

Step 4: Use machine learning to predict the response

Now you know what to predict and what variables can help predict it. All you have to do now is to use machine learning techniques to predict it.

How do we go about applying ML?

To do this we will use a dataset that contains information about each customer (both churned and active). The dataset in the image below contains data about a company’s customers and whether each customer has churned. See the ‘Exited’ column in the image below.

Machine learning (ML) learns the patterns that are typical of churned (‘Exited’) customers and uses that knowledge to predict whether an existing customer is at risk of churning.

That is exactly what we will learn over the next 14+ lessons, in clear, detailed steps.
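To give a feel for what “learning the patterns” means, here is a minimal sketch that fits a logistic regression by gradient descent in plain Python. Logistic regression is just one possible technique, chosen here for brevity and not necessarily the one used later in the course. The two input columns and every number in the toy data are made up; only the idea of a 0/1 ‘Exited’ label comes from the dataset described above.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_churn_probability(x, weights, bias):
    """Estimated probability that a customer with features `x` has exited."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_churn_probability(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Toy data (invented). Columns: [inactivity score, complaint score], both 0 to 1.
rows = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.0], [0.8, 0.6], [0.9, 0.4], [0.7, 0.9]]
exited = [0, 0, 0, 1, 1, 1]  # 1 = churned, mirroring the 'Exited' column

weights, bias = train_logistic_regression(rows, exited)

# Score two hypothetical existing customers.
at_risk = predict_churn_probability([0.85, 0.7], weights, bias)
loyal = predict_churn_probability([0.15, 0.05], weights, bias)
print("Customer resembling churned ones:", round(at_risk, 2))
print("Customer resembling active ones:", round(loyal, 2))
```

The customer who resembles past churners should receive the higher probability. Real projects would typically rely on an established library and far more data rather than a hand-written loop like this one.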

Let’s check your understanding

Business problem Statement

“Zedia works as a Data Scientist for an online real estate company, ‘Grabhouse’, which allows people to buy/sell real estate properties.

The company earns money whenever a deal is completed via leads from the platform. Grabhouse has over a million properties listed on its portal but feels a large part of the potential is untapped.

Martin, who takes care of Product Marketing, notices a common pattern among properties that take a long time to sell: they were originally overpriced on the platform, and the owners took several weeks to lower the prices to realistic levels at which the property could actually sell. Martin would prefer that owners quote reasonable prices when they first post their listing.

Zedia feels this problem can be solved using machine learning, helping Martin close deals quicker, and thereby helping Grabhouse sell more real estate and make more money.”


  1. What is the business pain point?
  2. What is the value (variable) you want to predict?
  3. What data do you think can help predict it?

Submit your answer: Here

[Next] Lesson 2: Setup Python Environment for ML
