In this analysis, I do a basic EDA of the features, select the k best features from both the linear and the polynomial feature sets, and apply regression on top of them to find the maximum r_squared value I am able to achieve from the data.
Importing the packages needed for the analysis. I usually like to import packages in alphabetical order, so they are easy to review if needed.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR
import statsmodels.api as sm
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
There are many files in the input folder, one for each car brand. We will import the file with VW in its name.
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
data_vw = pd.read_csv("/kaggle/input/used-car-dataset-ford-and-mercedes/vw.csv")
print(data_vw.shape)
data_vw.head()
Seeing if there are any missing values in the records
data_vw.isnull().sum()
Nice :) the data is clean with no missing values, a very good one to work with!
data_vw.describe()
sns.countplot(x = data_vw["transmission"])
Most of the cars in the dataset have manual transmission, with very few automatic and semi-automatic cars.
print(data_vw["model"].value_counts() / len(data_vw))
sns.countplot(y = data_vw["model"])
The top 3 models in the dataset, Golf, Polo, and Tiguan, constitute 64% of all the VW cars, with all other models contributing the remaining 36%.
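A quick sanity check of that 64% figure (an extra snippet, not in the original flow): the cumulative share of the three most common models.
top3_share = data_vw["model"].value_counts(normalize=True).head(3).sum()
print(f"Top 3 models account for {top3_share:.0%} of the cars")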
sns.countplot(x = data_vw["fuelType"])
sns.countplot(y = data_vw["year"])
plt.figure(figsize=(15,5),facecolor='w')
sns.barplot(x = data_vw["year"], y = data_vw["price"])
The recently manufactured cars (year = 2018, 2019) sell for a higher average price than cars manufactured earlier.
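An optional numeric companion to the bar chart above (not in the original notebook): the mean price per manufacturing year.
data_vw.groupby("year")["price"].mean().sort_index()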
sns.barplot(x = data_vw["transmission"], y = data_vw["price"])
plt.figure(figsize=(15,10),facecolor='w')
sns.scatterplot(x = data_vw["mileage"], y = data_vw["price"], hue = data_vw["year"])
plt.figure(figsize=(15,5),facecolor='w')
sns.scatterplot(x = data_vw["mileage"], y = data_vw["price"], hue = data_vw["fuelType"])
sns.pairplot(data_vw)
Now I am computing an age field by subtracting the year field from 2020, and dropping the year field.
data_vw["age_of_car"] = 2020 - data_vw["year"]
data_vw = data_vw.drop(columns = ["year"])
data_vw.sample(10)
I like to use pd.get_dummies over the OneHotEncoder in sklearn to one-hot encode the categorical variables. It keeps the dataset tidy and preserves the column names.
data_vw_expanded = pd.get_dummies(data_vw)
data_vw_expanded.head()
Applying StandardScaler to standardize all the variables in the dataset.
std = StandardScaler()
data_vw_expanded_std = std.fit_transform(data_vw_expanded)
data_vw_expanded_std = pd.DataFrame(data_vw_expanded_std, columns = data_vw_expanded.columns)
print(data_vw_expanded_std.shape)
data_vw_expanded_std.head()
X_train, X_test, y_train, y_test = train_test_split(data_vw_expanded_std.drop(columns = ['price']), data_vw_expanded_std[['price']])
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
Since there are 40 variables in the dataset after one-hot encoding, I am using the SelectKBest option from sklearn to select the best features for the regression.
For this, I am running SelectKBest() with f_regression for feature counts from 3 to 40 to see where we get the best score.
column_names = data_vw_expanded.drop(columns = ['price']).columns
no_of_features = []
r_squared_train = []
r_squared_test = []
for k in range(3, 40, 2):
    selector = SelectKBest(f_regression, k = k)
    X_train_transformed = selector.fit_transform(X_train, y_train)
    X_test_transformed = selector.transform(X_test)
    regressor = LinearRegression()
    regressor.fit(X_train_transformed, y_train)
    no_of_features.append(k)
    r_squared_train.append(regressor.score(X_train_transformed, y_train))
    r_squared_test.append(regressor.score(X_test_transformed, y_test))
sns.lineplot(x = no_of_features, y = r_squared_train, legend = 'full')
sns.lineplot(x = no_of_features, y = r_squared_test, legend = 'full')
We reach a score of about 0.88 around 23 variables, before the curve stabilizes. Hence, keeping k at 23 and selecting the 23 best variables from the dataset.
selector = SelectKBest(f_regression, k = 23)
X_train_transformed = selector.fit_transform(X_train, y_train)
X_test_transformed = selector.transform(X_test)
column_names[selector.get_support()]
def regression_model(model):
    """
    Will fit the regression model passed and will return the regressor object and the score
    """
    # Uses the X_train_transformed / X_test_transformed (selected features) defined above
    regressor = model
    regressor.fit(X_train_transformed, y_train)
    score = regressor.score(X_test_transformed, y_test)
    return regressor, score
model_performance = pd.DataFrame(columns = ["Features", "Model", "Score"])
models_to_evaluate = [LinearRegression(), Ridge(), Lasso(), SVR(), RandomForestRegressor(), MLPRegressor()]
for model in models_to_evaluate:
    regressor, score = regression_model(model)
    model_performance = model_performance.append({"Features": "Linear", "Model": model, "Score": score}, ignore_index=True)
model_performance
The best result is from RandomForestRegressor(), with a score of 0.9513.
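As an optional sanity check (a sketch, not part of the original flow), we can cross-validate the best model on the 23 selected features, so the score does not depend on a single train/test split; cross_val_score is already imported above.
# 5-fold cross-validated r^2 for the RandomForest on the selected features
cv_scores = cross_val_score(RandomForestRegressor(), X_train_transformed, y_train.values.ravel(), cv=5, scoring="r2")
print(cv_scores.mean(), cv_scores.std())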
Fitting a linear regression model and checking the model parameters
regressor = sm.OLS(y_train, X_train).fit()
print(regressor.summary())
X_train_dropped = X_train.copy()
while True:
    if max(regressor.pvalues) > 0.05:
        drop_variable = regressor.pvalues[regressor.pvalues == max(regressor.pvalues)]
        print("Dropping " + drop_variable.index[0] + " and running regression again because pvalue is: " + str(drop_variable[0]))
        X_train_dropped = X_train_dropped.drop(columns = [drop_variable.index[0]])
        regressor = sm.OLS(y_train, X_train_dropped).fit()
    else:
        print("All p values less than 0.05")
        break
8 variables are dropped because their p-values are higher than our alpha level of 0.05. We fit the model with the remaining variables and see the summary below.
We see a slight improvement over the sklearn linear regression fit in the earlier step, which yielded an r_squared value of 0.87; this gives us an r_squared value of 0.89.
print(regressor.summary())
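For reference, a quick way (an added snippet, not in the original notebook) to list which columns the backward-elimination loop removed:
dropped_columns = set(X_train.columns) - set(X_train_dropped.columns)
print(dropped_columns)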
I would like to explore the dataset a bit further to see whether a model with polynomial features performs better with the same set of models.
I am using PolynomialFeatures() to engineer polynomial features from the dataset. This gives around 820 features, so I am again using SelectKBest to find the optimal feature set size.
poly = PolynomialFeatures()
X_train_transformed_poly = poly.fit_transform(X_train)
X_test_transformed_poly = poly.transform(X_test)
print(X_train_transformed_poly.shape)
no_of_features = []
r_squared = []
for k in range(10, 277, 5):
    selector = SelectKBest(f_regression, k = k)
    X_train_transformed = selector.fit_transform(X_train_transformed_poly, y_train)
    regressor = LinearRegression()
    regressor.fit(X_train_transformed, y_train)
    no_of_features.append(k)
    r_squared.append(regressor.score(X_train_transformed, y_train))
sns.lineplot(x = no_of_features, y = r_squared)
From the graph above, we can see that we hit a score of about 0.93 at around 110 features.
selector = SelectKBest(f_regression, k = 110)
X_train_transformed = selector.fit_transform(X_train_transformed_poly, y_train)
X_test_transformed = selector.transform(X_test_transformed_poly)
models_to_evaluate = [LinearRegression(), Ridge(), Lasso(), SVR(), RandomForestRegressor(), MLPRegressor()]
for model in models_to_evaluate:
    regressor, score = regression_model(model)
    model_performance = model_performance.append({"Features": "Polynomial", "Model": model, "Score": score}, ignore_index=True)
model_performance
I got a maximum r^2 score of 0.955 for the polynomial features with the RandomForest regressor.
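To see which of the selected polynomial features drive that score, here is a small sketch (assuming the poly, selector, and column_names objects above are still in scope, and sklearn >= 1.0 for get_feature_names_out); it refits a RandomForest on the selected features and prints the top importances. The random_state is my own choice for reproducibility.
rf = RandomForestRegressor(random_state=42)
rf.fit(X_train_transformed, y_train.values.ravel())
poly_feature_names = poly.get_feature_names_out(column_names)  # names of all polynomial features
selected_names = poly_feature_names[selector.get_support()]    # the 110 kept by SelectKBest
importances = pd.Series(rf.feature_importances_, index=selected_names)
print(importances.sort_values(ascending=False).head(10))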
As next steps, I can concentrate on individual features and apply transformations, such as log transforms, to make the model perform even better.
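A minimal sketch of the log-transform idea (illustrative only; np.log1p is used here because mileage is non-negative and right-skewed):
data_vw_log = data_vw.copy()
data_vw_log["log_mileage"] = np.log1p(data_vw_log["mileage"])
data_vw_log = data_vw_log.drop(columns=["mileage"])
data_vw_log[["log_mileage"]].hist(bins=30)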
Please upvote the notebook if you liked it, and leave me a feedback if you think something could have been better.