Multivariate Linear Regression Program in Python
Here, there are several independent variables (experience, test score and interview score) and only one dependent variable (salary).
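As a rough sketch of what such a model computes, the predicted salary is an intercept plus a weighted sum of the features. The coefficient values and the short feature names below are made up for illustration; they are not taken from the hiring data.

```python
# Hypothetical coefficients, chosen only to illustrate the formula;
# a real model learns these values from the training data.
INTERCEPT = 15000.0
COEFFICIENTS = {
    "experience": 2500.0,
    "test_score": 2000.0,
    "interview_score": 2200.0,
}

def predict_salary(features):
    """Return intercept + sum(coefficient * feature value)."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )

# One candidate: 2 years of experience, test score 9, interview score 6
example = {"experience": 2, "test_score": 9, "interview_score": 6}
print(predict_salary(example))
```

Fitting the regression amounts to choosing the intercept and coefficients that best match the known salaries.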
# Importing packages
import pandas as pd
import numpy as np
import math
from word2number import w2n
from sklearn import linear_model
# Read the csv File (Input File/Train data)
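# A raw string (r"...") keeps the backslash in the Windows path from being read as an escape sequence
df = pd.read_csv(r"D:\hiring.csv")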
# Fill missing values in the experience column with "zero", i.e. treat missing experience as no experience
df.experience = df.experience.fillna("zero")
df
# To fill the test score column, take the mean of all test scores (rounded down) and replace NaN with that value
mean_test_score = math.floor(df['test_score(out of 10)'].mean())
mean_test_score
df['test_score(out of 10)'] = df['test_score(out of 10)'].fillna(mean_test_score)
df
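The imputation step above can be sketched without pandas to show what happens to the missing value. The scores below are made up, and `None` stands in for NaN; this is only an illustration of the idea, not the notebook's actual data.

```python
import math

# Made-up scores; None marks a missing value (NaN in pandas).
scores = [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, None, 7.0]

# Average only the values that are present, then round down,
# mirroring math.floor(column.mean()) in the notebook
# (pandas skips NaN when computing the mean by default).
present = [s for s in scores if s is not None]
fill_value = math.floor(sum(present) / len(present))

# Replace each missing entry with the fill value.
filled = [fill_value if s is None else s for s in scores]
print(fill_value)
print(filled)
```

Rounding down keeps the filled score a whole number, like the other scores in the column.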
# Convert the experience column from number words (str) to numeric values (int), e.g. "five" -> 5
df.experience = df.experience.astype(str)
df.experience = df.experience.apply(w2n.word_to_num)
df
# Train the Model
reg = linear_model.LinearRegression()
# reg.fit(X, y): X holds the independent variables (features), y is the dependent variable (target)
reg.fit(df[['experience', 'test_score(out of 10)', 'interview_score(out of 10)']], df['salary($)'])
# Predict the salary; each row is [experience, test score, interview score]
reg.predict([[10, 10, 10]])
reg.predict([[0, 5, 5]])
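To see what a prediction is made of, here is a minimal sketch that fits the same kind of model by ordinary least squares with NumPy instead of scikit-learn. The data is made up and noise-free, generated from known coefficients, so it is an assumption for illustration and not the contents of hiring.csv.

```python
import numpy as np

# Made-up features: [experience, test score, interview score]
X = np.array(
    [[0, 8, 9], [1, 6, 7], [2, 10, 6], [5, 9, 10], [7, 7, 8]],
    dtype=float,
)
# Made-up salaries from: 10000 + 2000*experience + 1000*test + 500*interview
y = 10000 + X @ np.array([2000.0, 1000.0, 500.0])

# Prepend a column of ones so the intercept is estimated as well.
X1 = np.column_stack([np.ones(len(X)), X])
params, *_ = np.linalg.lstsq(X1, y, rcond=None)
intercept, coefs = params[0], params[1:]

def predict(row):
    """Prediction = intercept + dot(coefficients, features)."""
    return intercept + np.dot(coefs, row)

print(coefs, intercept)
print(predict([10, 10, 10]))
```

With a fitted scikit-learn model, the same quantities are available as `reg.coef_` and `reg.intercept_`, so a prediction can be checked by hand in the same way.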
For reference:
Input Data File : https://drive.google.com/file/d/1BGDw1G_pJChEhUBtBR5XSXFHg96dt9K5/view?usp=sharing