Understanding linear regression.
Let’s understand what linear regression is all about from a non-technical perspective. Before we get into the details, we will first understand in layman’s terms what linear regression is.
Now, linear regression is a machine learning (ML) algorithm that uses data to predict a quantity of interest. Typically, we call the quantity of interest Y: we want to predict some item, and we call that Y.
So, it is basically an algorithm that learns from data to predict a quantity of interest. What can these quantities be? Some examples are crop yield, product sales, the price of a house, and the mileage of cars. There are many other examples in the real world where you can use linear regression to predict them.
Now, what is the common thread among these? What type of variables can you use linear regression to predict?
Now, if you look at these four variables, all of them are numeric variables. So, the main condition for using linear regression is that the variable you predict must be numeric: all of its values must be numeric. Linear regression cannot predict categorical values, so you should be clear on that.
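To make the numeric-versus-categorical point concrete, here is a minimal Python sketch. The helper name `is_numeric_target` and the sample values are made up for illustration; they are not part of the lesson.

```python
from numbers import Real

def is_numeric_target(values):
    """Return True if every value is a real number (booleans excluded)."""
    return all(isinstance(v, Real) and not isinstance(v, bool) for v in values)

# Hypothetical sample values, invented for this sketch.
house_prices = [250000.0, 310500.0, 189900.0]  # numeric: could be a Y for linear regression
soil_types = ["clay", "sandy", "loam"]         # categorical: not something linear regression predicts

print(is_numeric_target(house_prices))  # True
print(is_numeric_target(soil_types))    # False
```

A check like this only looks at the quantity being predicted; it says nothing about which kinds of inputs are allowed.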
Now comes another question: how can a machine learning algorithm like linear regression possibly predict something like the crop yield or the price of a house?
How can a machine learning algorithm even estimate something or predict something like the price of the house?
Now, if you think about it, a human expert would look for things like the area of the house, the locality in which the house is located, the road width in front of the house, the schools nearby, and other things that are helpful in predicting the housing prices in that particular location. You need to provide such information to the machine learning algorithm as well, so that it can learn the relationship between the inputs you provide and the output you expect from it.
And as a data scientist, you will need to figure out, sometimes by working with experts, what sort of data will be helpful in predicting the quantity of interest.
For instance, with housing prices, we talked about schools. To measure how a school impacts housing prices, you might want to collect data such as the number of students in the school, the rating of the school, the number of teachers in the school, and other information related to how good the school is.
Now, in this case, you might not be very sure whether a particular variable will be helpful in predicting the price. But we know, to a certain extent, that the number of students studying in the school and the number of teachers employed in the school somehow relate to how good the school is. If you don’t know for sure, but you have an idea that a variable might be helpful, it is still good to collect that variable and let the algorithm figure out if it is useful or not.
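As a rough illustration of "letting the algorithm figure out" whether a variable is useful, the sketch below fits a least-squares model with NumPy on made-up data in which the price depends only on the area. The variable names (`area`, `unsure`) and all the numbers are assumptions invented for this example. The data is idealized and has no noise; with real data, the weight of an unhelpful variable would usually be small rather than exactly zero.

```python
import numpy as np

# Made-up data for illustration: price depends on area only.
area = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
# A hypothetical variable we were unsure about; here it is unrelated to price.
unsure = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
price = 2.0 * area + 10.0

# Design matrix: a column of ones for the intercept, then the two inputs.
X = np.column_stack([np.ones_like(area), area, unsure])

# Ordinary least squares: find the weights that best reproduce price.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, w_area, w_unsure = coef

# The learned weight for the unrelated variable comes out (numerically) zero.
print(round(float(w_area), 6), round(abs(float(w_unsure)), 6))
```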
We have also seen in our Exploratory Data Analysis course how to study the relationship between the variables that you have in the data and the variable that you want to predict. So it is on us to decide what variables will be useful.
Let’s do an exercise. Try to come up with the sort of data you would collect in order to predict the crop yield.
Likewise, try to come up with variables or data that might be helpful in predicting the demand for a product. Pause the video for a sec, and give it some thought.
Then I will show you some of the items that I have in mind that will probably be helpful in predicting crop yield. Let’s have a discussion on that.
Alright, so here’s my list to predict crop yield.
I would look for things like the amount of rainfall, the soil type, the fertilizer used, the irrigation type, the planting distance between plants, and things like that.
So these are some of the tentative variables that you might want to collect in order to predict what the crop yield is going to be.
Likewise, to predict the demand for the product in the future, you need to know what the past demand pattern of that product is.
You would also want to know what day of the week you want to forecast for, the month of the year, the price of the product, the discount, the ad spend, what sort of discount the competitors give, and the competitors’ pricing information. These kinds of variables will also be helpful.
Likewise, there are macroeconomic variables, such as the population, the average age of the population, and the income level of the population.
So overall, the core idea is that you need to go after variables, or features, that might be helpful in predicting the variable of interest. That’s the entire idea behind this. Once you have them, linear regression learns the relationship between the Xs that you have collected and the Y that you want to predict.
It learns this relationship by forming a mathematical equation that predicts Y as a function of all the different Xs that you have collected.
So, given the Xs, you will be able to predict the Y. From the next lesson onwards, we will get more technical and try to understand the math of the equation and the intuition behind it in full detail.
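To give a feel for what "forming an equation" means, here is a minimal Python sketch with a single X for brevity. The function name `fit_line` and the rainfall and yield numbers are invented for this example, and the closed-form least-squares formula used here is one common way to fit a straight line, not necessarily the approach the later lessons take.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How X and Y vary together, and how much X varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: rainfall (X) and crop yield (Y), invented for this sketch.
rainfall = [10.0, 20.0, 30.0, 40.0]
crop_yield = [25.0, 45.0, 65.0, 85.0]  # exactly 2 * rainfall + 5

intercept, slope = fit_line(rainfall, crop_yield)

# The learned equation is Y = intercept + slope * X; use it on a new X.
predicted = intercept + slope * 50.0
print(intercept, slope, predicted)  # 5.0 2.0 105.0
```

With several Xs, the equation simply gains one weight per X, but the idea stays the same: learn the weights from data, then plug in new Xs to get a predicted Y.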