
Linear Regression
In this blog, you are going to learn:
- What are Linear Regression Models?
- How to implement an Ordinary Least Squares model?
- How to implement a Ridge Regression model?
- How to implement a Lasso Regression model?
- How to implement a Multi-Task Lasso model?
- How to implement an Elastic-Net model?
- How to implement a Multi-Task Elastic Net model?
- How to implement a Least Angle Regression model?
- How to implement a Lars Lasso model?
Linear Regression Models
Import the Libraries
The first step is to import all the necessary libraries.
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Download Dataset
Attribute information
- CRIM: Per capita crime rate by town
- ZN: Proportion of residential land zoned for lots over 25,000 sq. ft.
- INDUS: Proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX: Nitric oxide concentration (parts per 10 million)
- RM: Average number of rooms per dwelling
- AGE: Proportion of owner-occupied units built prior to 1940
- DIS: Weighted distances to five Boston employment centers
- RAD: Index of accessibility to radial highways
- TAX: Full-value property tax rate per $10,000
- PTRATIO: Pupil-teacher ratio by town
- B: 1000(Bk − 0.63)², where Bk is the proportion of people of African American descent by town
- LSTAT: Percentage of lower status of the population
- MEDV: Median value of owner-occupied homes in $1000s
column_names = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT','MEDV']
data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data"
X = pd.read_csv(data_url, sep=r"\s+", names=column_names)
X.head(5)
y = X.pop('MEDV')
y
Once we have our feature matrix and target column, we split the dataset into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=42)
Ordinary Least Squares
The Linear Regression model aims to minimize the sum of squared differences between the observed targets and the predicted values. The model fits the training dataset, learning a coefficient for each variable, and uses those coefficients to predict the target variable.
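Formally, Ordinary Least Squares finds the coefficient vector w = (w1, …, wn) that minimizes the residual sum of squares:

min_w ||Xw − y||₂²

where X is the feature matrix and y is the target vector.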
Advantages
- Easy to understand and implement.
- The model trains quickly.
Disadvantages
- It assumes that the features are independent of one another.
- Coefficient estimates become unstable when the features are collinear.
- Prone to overfitting.
Let us implement a Linear Regression model on the Boston housing dataset.
ordinary_least_square = linear_model.LinearRegression()
ordinary_least_square.fit(X_train, y_train)
Once the model has been trained on the dataset, let’s have a look at the coefficients for each variable.
ordinary_least_square.coef_
These are the learned weights w1 through wn, one per feature.
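To see which weight belongs to which feature, you can pair the coefficients with the column names; a small convenience sketch using the pandas objects already loaded:

# Label each coefficient with its feature name.
pd.Series(ordinary_least_square.coef_, index=X_train.columns)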
predicted_values = ordinary_least_square.predict(X_test)
predicted_values
Let’s try to plot a graph between the target values and the predicted values.
# Plot predictions as points too, since the model uses all features, not just DIS.
plt.scatter(X_test['DIS'], y_test, color="black")
plt.scatter(X_test['DIS'], predicted_values, color="blue")
plt.show()
Looking at the graph, we can see that the predictions follow the general trend of the actual values, so the model is learning from the dataset.
print("Mean squared error for Linear Regression : %.2f " % mean_squared_error(y_test, predicted_values)) print("Coefficient of determination for Linear Regression : %.2f Ordinary Least Square" % r2_score(y_test, predicted_values))
ridge_regression = linear_model.Ridge(alpha=0.25)
ridge_regression

ridge_regression.fit(X_train, y_train)
ridge_regression.coef_
ridge_predictions = ridge_regression.predict(X_test)
ridge_predictions
print("Mean squared error for Ridge regression : %.2f" % mean_squared_error(y_test, ridge_predictions)) print("Coefficient of determination for Ridge regression : %.2f" % r2_score(y_test, ridge_predictions))
We get a better result with Ridge Regression than with Ordinary Least Squares.
Lasso Regression
Lasso Regression reduces the number of features on which the target column depends by preferring solutions with fewer non-zero coefficients. It is an extension of the Ordinary Least Squares method with an L1 regularization term.
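Concretely, the objective implemented in scikit-learn is

min_w (1 / (2 * n_samples)) * ||Xw − y||₂² + alpha * ||w||₁

where the L1 term alpha * ||w||₁ is what drives some coefficients exactly to zero.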
Advantages:
- Can be used for simple datasets.
- It can be used for feature selection.
Disadvantages:
- Doesn’t handle multiple (multi-output) regression problems jointly; see Multi-Task Lasso below.
Let’s implement a Lasso Regression model on the Boston Housing dataset.
lasso_regression = linear_model.Lasso(alpha=0.005)
lasso_regression
Since we have a fairly small dataset, we have chosen alpha=0.005. You can try various values to find the optimal one for the dataset you are using.
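If you would rather search for alpha automatically, scikit-learn’s LassoCV cross-validates over a grid of candidate values; here is a minimal sketch (the alpha grid below is an illustrative choice):

from sklearn.linear_model import LassoCV

# Cross-validate over a log-spaced grid of alphas; adjust the range for your data.
candidate_alphas = np.logspace(-4, 0, 50)
lasso_cv = LassoCV(alphas=candidate_alphas, cv=5).fit(X_train, y_train)
print("Best alpha: %.4f" % lasso_cv.alpha_)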
lasso_regression.fit(X_train, y_train)
lasso_regression.coef_
lasso_predictions = lasso_regression.predict(X_test)
lasso_predictions
print("Mean squared error: %.2f for Lasso regression" % mean_squared_error(y_test, lasso_predictions)) print("Coefficient of determination: %.2f for Lasso regression" % r2_score(y_test, lasso_predictions))
Multi-Task Lasso
It is an extension of Lasso regression for multi-output problems: it fits several regression tasks jointly and constrains the selected features to be the same across all tasks (for example, the same features across different time points).
Advantages:
- Works on Multiple Regression Problems.
Disadvantages:
- Once the features are selected, those features would be the same for all the regression problems.
Let’s implement the Multi-Task Lasso algorithm on the Boston Housing dataset.
The target passed to Multi-Task Lasso must be a 2-dimensional array of shape (n_samples, n_tasks). We are going to convert the Series object to a 2-D array using NumPy’s reshape.
y_train_2d = np.reshape(y_train.tolist(), (-1, 1))
multi_task_lasso_regression = linear_model.MultiTaskLasso(alpha=0.025).fit(X_train, y_train_2d)
multi_task_lasso_regression
multi_task_lasso_predictions = multi_task_lasso_regression.predict(X_test)
multi_task_lasso_predictions = np.reshape(multi_task_lasso_predictions, (1, -1))
multi_task_lasso_predictions[0]
print("Mean squared error: %.2f for Multi-Task Lasso regression" % mean_squared_error(y_test, multi_task_lassso_predictions[0])) print("Coefficient of determination: %.2f for Multi-Task Lasso regression" % r2_score(y_test, multi_task_lassso_predictions[0]))
We can see that the mean squared error has been reduced, which suggests the algorithm works well for this dataset.
Elastic Net
Elastic Net trains a model with both L1 and L2 regularization of the coefficients. It is useful for datasets where the features are correlated with one another.
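The objective implemented in scikit-learn is

min_w (1 / (2 * n_samples)) * ||Xw − y||₂² + alpha * l1_ratio * ||w||₁ + 0.5 * alpha * (1 − l1_ratio) * ||w||₂²

where l1_ratio controls the mix between the L1 (Lasso-like) and L2 (Ridge-like) penalties.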
Advantages:
- It uses both L1 and L2 regularization parameters.
- It is useful when features in the dataset are correlated.
- It inherits properties of both Ridge and Lasso Regression.
Disadvantages:
- It doesn’t handle multiple (multi-output) regression problems jointly; see Multi-Task Elastic Net below.
from sklearn.linear_model import ElasticNet

elastic_net = ElasticNet(alpha=0.001, l1_ratio=0.001)
elastic_net.fit(X_train, y_train)
elastic_net.coef_
elastic_net_predictions = elastic_net.predict(X_test)
elastic_net_predictions
print("Mean squared error: %.2f for Elastic Net regression" % mean_squared_error(y_test, elastic_net_predictions)) print("Coefficient of determination: %.2f for Elastic Net regression" % r2_score(y_test, elastic_net_predictions))
Using both the L1 and L2 regularization terms has helped reduce the mean squared error. The alpha and l1_ratio parameters can be tuned to get the best accuracy.
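To tune alpha and l1_ratio jointly, scikit-learn’s ElasticNetCV cross-validates over both; here is a minimal sketch (the l1_ratio candidates below are an illustrative choice):

from sklearn.linear_model import ElasticNetCV

# Search a few l1_ratio values; candidate alphas are generated automatically
# along the regularization path for each l1_ratio.
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 0.99], cv=5).fit(X_train, y_train)
print("Best alpha: %.4f" % enet_cv.alpha_)
print("Best l1_ratio: %.2f" % enet_cv.l1_ratio_)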
Multi-task Elastic-Net
It extends the abilities of Elastic Net to multiple regression problems, fitting several tasks jointly and finding sparse coefficients that are shared across tasks. The target variable y must be a 2-D array of shape (n_samples, n_tasks).
multi_task_elastic_net = linear_model.MultiTaskElasticNet(alpha=0.001)
y_train_2d = np.reshape(y_train.tolist(), (-1, 1))
multi_task_elastic_net_regression = multi_task_elastic_net.fit(X_train, y_train_2d)
multi_task_elastic_net_regression
multi_task_elastic_net_predictions = multi_task_elastic_net.predict(X_test)
multi_task_elastic_net_predictions = np.reshape(multi_task_elastic_net_predictions, (1, -1))
multi_task_elastic_net_predictions[0]
print("Mean squared error: %.2f for Multi-task Elastic net" % mean_squared_error(y_test, multi_task_elastic_net_predictions[0])) print("Coefficient of determination: %.2f for Multi-task Elastic net" % r2_score(y_test, multi_task_elastic_net_predictions[0]))
lars.coef_
lars_predictions = lars.predict(X_test)
lars_predictions
print("Mean squared error: %.2f for Lars" % mean_squared_error(y_test, lars_predictions)) print("Coefficient of determination: %.2f for Lars" % r2_score(y_test, lars_predictions))
Lars Lasso
Lars Lasso is a Lasso model implemented with the Least Angle Regression algorithm. Instead of the coordinate-descent approach used by Lasso, it computes the exact solution path, which is piecewise linear as a function of the norm of the coefficients.
lars_lasso = linear_model.LassoLars(alpha=0.025, normalize=False)
lars_lasso.fit(X_train, y_train)
lars_lasso
lars_lasso.coef_
lars_lasso_predictions = lars_lasso.predict(X_test)
lars_lasso_predictions
print("Mean squared error: %.2f for Lars Lasso" % mean_squared_error(y_test, lars_lasso_predictions)) print("Coefficient of determination: %.2f for Lars Lasso " % r2_score(y_test, lars_lasso_predictions))
Summary
- LinearRegression() : To implement an Ordinary Least Squares model.
- Ridge(alpha=value) : To implement a Ridge Regression model.
- Lasso(alpha=value) : To implement a Lasso Regression model.
- MultiTaskLasso(alpha=value) : To implement a Multi-Task Lasso model.
- ElasticNet(alpha=value, l1_ratio=value) : To implement an Elastic-Net model.
- MultiTaskElasticNet(alpha=value) : To implement a Multi-Task Elastic Net model.
- Lars(n_nonzero_coefs=value, normalize=False) : To implement a Least Angle Regression model.
- LassoLars(alpha=value, normalize=False) : To implement a Lars Lasso model.