Multiple Linear Regression: A Guide
Multiple regression is like linear regression, but with more than one independent variable, meaning that we try to predict a value based on two or more variables.
Take a look at the data set in the file House_data.xlsx. It contains information about house prices.
Example Data (Sample Dataset)
Predicting House Prices Based on Size and Number of Bedrooms
A real estate agent wants to predict the price of a house based on two factors:
- Size of the house (in square feet)
- Number of bedrooms
Multiple Linear Regression Equation:
Y = b0 + b1 X1 + b2 X2
where:
- Y = House Price (in lakhs)
- X1 = Size of the house (sq. ft.)
- X2 = Number of bedrooms
- b0 = Intercept (constant)
- b1, b2 = Regression coefficients
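To make the equation concrete, here is a minimal sketch that evaluates it for one house. The coefficient values (b0 = 10, b1 = 0.05, b2 = 5) and the function name predict_price are made up for illustration; they are not fitted from House_data.csv.

```python
# Hypothetical coefficients, chosen only to illustrate the equation.
B0 = 10.0   # intercept (lakhs)
B1 = 0.05   # lakhs per square foot
B2 = 5.0    # lakhs per bedroom


def predict_price(size_sqft, bedrooms):
    """Evaluate Y = b0 + b1*X1 + b2*X2 for a single house."""
    return B0 + B1 * size_sqft + B2 * bedrooms


price = predict_price(1800, 4)
print(price)
```

With these assumed coefficients, a house of 1800 sq. ft. with 4 bedrooms works out to 10 + 0.05 x 1800 + 5 x 4 = 120 lakhs. In practice the coefficients are estimated from data rather than chosen by hand.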
You can download the dataset file. The link is given below.
We could predict the price of a house from its area alone, but with multiple regression we can add more variables, such as the area of the house and the number of bedrooms, which can make the prediction more accurate.
Steps to Implement:
In Python we have modules that will do the work for us. Start by importing the Pandas module.
import pandas
The Pandas module allows us to read CSV files and return a DataFrame object.
We can create a dataset of houses and store it in a CSV file, in this case "House_data.csv".
df = pandas.read_csv("House_data.csv")
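If you do not have the file at hand, the sketch below shows one way to create a small CSV file and read it back. The five rows are invented for illustration and are not the contents of House_data.csv; the file name House_data_sample.csv is also hypothetical. The column names follow the ones used in the complete program.

```python
import os
import tempfile

import pandas

# Made-up illustrative rows; these are NOT the real contents of House_data.csv.
sample = pandas.DataFrame({
    "House_Size": [1000, 1500, 1800, 2400, 3000],   # square feet
    "No_Bedrooms": [2, 3, 3, 4, 5],
    "House_Price": [45.0, 68.0, 80.0, 110.0, 140.0],  # lakhs
})

# Write to a temporary folder so that no existing file is overwritten.
csv_path = os.path.join(tempfile.mkdtemp(), "House_data_sample.csv")
sample.to_csv(csv_path, index=False)

# Read the file back the same way the tutorial reads House_data.csv.
df = pandas.read_csv(csv_path)
print(df.head())
```

Passing index=False keeps the row index out of the file, so the CSV contains only the three data columns.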
Then make a list of the independent values and call this variable X.
Put the dependent values in a variable called y.
X = df[['House_Size', 'No_Bedrooms']]
y = df['House_Price']
Note: It is common to name the list of independent values with an uppercase X, and the list of dependent values with a lowercase y.
We will use some methods from the sklearn module, so we will have to import that module as well:
from sklearn import linear_model
From the sklearn module we will use the LinearRegression class to create a linear regression object.
This object has a method called fit() that takes the independent and dependent values as parameters and fills the regression object with data that describes the relationship:
regr = linear_model.LinearRegression()
regr.fit(X, y)
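After fitting, the object stores the estimated b1 and b2 in its coef_ attribute and b0 in intercept_. The sketch below illustrates this on made-up data generated from a known relationship (price = 10 + 0.05 x size + 5 x bedrooms), so the fitted values can be checked against the numbers used to create the data. The data and coefficients are assumptions for illustration, not values from House_data.csv.

```python
import pandas
from sklearn import linear_model

# Made-up data: prices are generated exactly from
# price = 10 + 0.05 * size + 5 * bedrooms (no noise).
df = pandas.DataFrame({
    "House_Size": [1000, 1500, 1800, 2400, 3000, 1200],
    "No_Bedrooms": [2, 3, 3, 4, 5, 3],
})
df["House_Price"] = 10 + 0.05 * df["House_Size"] + 5 * df["No_Bedrooms"]

X = df[["House_Size", "No_Bedrooms"]]
y = df["House_Price"]

regr = linear_model.LinearRegression()
regr.fit(X, y)

# coef_ holds one coefficient per column of X, in column order.
b1, b2 = regr.coef_
b0 = regr.intercept_
print("b0 (intercept):", b0)
print("b1 (per sq. ft.):", b1)
print("b2 (per bedroom):", b2)
```

Because the sample prices contain no noise, the fit should recover the generating values up to floating-point rounding. With real data the coefficients will only approximate the underlying relationship.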
Now we have a regression object that is ready to predict house prices based on a house's size and number of bedrooms:
# Predict the price of a house where the size is 1800 sq. ft. and the number of bedrooms is 4:
House_Price = regr.predict([[1800, 4]])
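When a model has been fitted on a DataFrame, recent scikit-learn versions warn if predict() receives a plain list, because the column names are missing. One way around this, sketched here on made-up data, is to pass the new houses as a DataFrame with the same column names. This also shows that predict() accepts several houses at once. The training data and the names features and new_houses are assumptions for illustration.

```python
import pandas
from sklearn import linear_model

# Made-up training data generated from
# price = 10 + 0.05 * size + 5 * bedrooms (no noise).
df = pandas.DataFrame({
    "House_Size": [1000, 1500, 1800, 2400, 3000, 1200],
    "No_Bedrooms": [2, 3, 3, 4, 5, 3],
})
df["House_Price"] = 10 + 0.05 * df["House_Size"] + 5 * df["No_Bedrooms"]

features = ["House_Size", "No_Bedrooms"]
regr = linear_model.LinearRegression()
regr.fit(df[features], df["House_Price"])

# New houses to price, using the same column names as the training data.
new_houses = pandas.DataFrame({
    "House_Size": [1800, 2200],
    "No_Bedrooms": [4, 3],
})

# Selecting new_houses[features] keeps the columns in the training order.
predictions = regr.predict(new_houses[features])
print(predictions)
```

predict() returns one value per row, in the same order as the rows of the input.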
Python Program
import pandas
from sklearn import linear_model
df = pandas.read_csv("House_data.csv")
X = df[['House_Size', 'No_Bedrooms']]
y = df['House_Price']
regr = linear_model.LinearRegression()
regr.fit(X, y)
# Predict the price of a house where the size is 1800 sq. ft. and the number of bedrooms is 4:
House_Price = regr.predict([[1800, 4]])
print(House_Price)