Corona Patient prediction using machine learning – covid19

People have been dying and suffering, and the numbers grow day by day. Is protecting people only the doctors' job? I wanted to contribute at least a little to the covid19 effort, so I decided to build a machine learning model that predicts corona patients based on the given data.

Stay home, stay healthy, save the world.

What you will learn here

  1. Building a corona dataset with the best features
  2. Applying machine learning models to predict covid19 patients
  3. Tuning the machine learning models' hyperparameters
  4. Deploying our models through Flask

I took some random data and arranged it properly by applying a few conditions. Here is my dataset.

1. How I arranged this random data to predict properly – covid19

I used these 3 conditions; you can add more to make the labels more realistic. People who have fever, body pains, a runny nose, and breathing difficulty are labeled as almost certainly having corona; people without these symptoms are labeled as not having it.

Condition – 1:

If someone satisfies all of these conditions, label them 1:

# Fever > 100
        (and)
# Bodypains = 1
        (and)
# runnyNose = 1
        (and)
# breath = 1

return - 1

I applied the condition as shown below. The conditions are simple, so you can understand them easily.

# .loc assigns in place; chained indexing (df[col][mask] = ...) may write to a copy
cond_1 = (df['Fever']>100) & (df['BodyPains']==1) & (df['RunnyNose']==1) & (df['Difficulty_in_Breath']==1)
df.loc[cond_1, 'infection_Probability'] = 1
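To see the labeling step in isolation, here is a minimal sketch on a toy DataFrame. The column names follow the article, but the three rows are made up, and starting the label column at 0 is an assumption about how the dataset was initialized.

```python
import pandas as pd

# Hypothetical toy rows; values are invented for illustration only.
df = pd.DataFrame({
    'Fever': [101, 98, 102],
    'BodyPains': [1, 0, 1],
    'RunnyNose': [1, 0, 0],
    'Difficulty_in_Breath': [1, 0, 1],
})
df['infection_Probability'] = 0  # assumed default label before any rule runs

# Boolean mask: True only where every symptom test passes.
cond_1 = (
    (df['Fever'] > 100)
    & (df['BodyPains'] == 1)
    & (df['RunnyNose'] == 1)
    & (df['Difficulty_in_Breath'] == 1)
)

# .loc writes to the original frame for exactly the masked rows.
df.loc[cond_1, 'infection_Probability'] = 1
print(df)
```

Only the first row passes all four tests, so only that row is labeled 1. The third row has a fever and breathing difficulty but no runny nose, so it keeps the default label.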

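Condition – 2

# Age >= 60
        (and)
# Fever > 99
        (and)
# runnyNose = 1
        (and)
# ( breath = 1
        (or)
#   Bodypains = 1 )

return - 1

# Parentheses around the (or) pair are needed: & binds tighter than |,
# so without them every row with BodyPains == 1 would be labeled 1.
cond_2 = (df['Age']>=60) & (df['Fever']>99) & (df['RunnyNose']==1) & ((df['Difficulty_in_Breath']==1) | (df['BodyPains']==1))
df.loc[cond_2, 'infection_Probability'] = 1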
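Because `&` binds tighter than `|` in Python, the way the last two tests are grouped changes which rows match. The sketch below uses one made-up row (a young person with body pains only) to show the difference; the variable names `ungrouped` and `grouped` are just for illustration.

```python
import pandas as pd

# One hypothetical person: 30 years old, no fever, body pains only.
df = pd.DataFrame({
    'Age': [30], 'Fever': [98], 'RunnyNose': [0],
    'Difficulty_in_Breath': [0], 'BodyPains': [1],
})

base = (df['Age'] >= 60) & (df['Fever'] > 99) & (df['RunnyNose'] == 1)
breath = df['Difficulty_in_Breath'] == 1
pains = df['BodyPains'] == 1

# Parsed as (base & breath) | pains: body pains alone is enough to match.
ungrouped = base & breath | pains

# Explicit grouping: age, fever and runny nose are all still required.
grouped = base & (breath | pains)

print(bool(ungrouped.iloc[0]), bool(grouped.iloc[0]))  # True False
```

If the rule is meant to require age, fever, and a runny nose in every case, the grouped form is the one to use.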

Condition – 3

# Fever > 99
        (and)
# runnyNose = 0
        (and)
# breath = 0
        (and)
# Bodypains = 0

return - 0

cond_3 = (df['Fever']>99) & (df['BodyPains']==0) & (df['RunnyNose']==0) & (df['Difficulty_in_Breath']==0)
df.loc[cond_3, 'infection_Probability'] = 0

# Show the rows labeled by this condition
df[cond_3]

Alright, we have built reasonable, valid data points for predicting future infected patients with a machine learning model.

Now look at the target variable: I know it is an unbalanced dataset. Don’t worry, we will take care of that when we build our machine learning model.
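A quick way to check how unbalanced a target is: count the share of each class. This is a small sketch with made-up labels (8 negatives, 2 positives), not the article's real data.

```python
import pandas as pd

# Hypothetical labels: 8 healthy (0) and 2 infected (1).
labels = pd.Series([0] * 8 + [1] * 2, name='infection_Probability')

# normalize=True returns fractions instead of raw counts.
share = labels.value_counts(normalize=True)
print(share)
```

On the real dataset you would call the same method on `df['infection_Probability']`.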

Data splitting

Now we are ready to build a machine learning model that predicts corona patients early. First we split our data into train and test datasets.

You can use a k-fold, stratified, or shuffled split to split the data.

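from sklearn.model_selection import StratifiedKFold,KFold,cross_val_score,ShuffleSplit,GridSearchCV

# X and Y are assumed to be NumPy arrays holding the features and the labels.
# shuffle=True is needed for random_state to take effect; recent scikit-learn
# versions raise an error if random_state is set without it.
cv = StratifiedKFold(n_splits=5,shuffle=True,random_state=11)

#kf = KFold(n_splits=5,shuffle=True,random_state=100)

# Each pass overwrites the previous split, so the last fold is the one we keep.
for train_index, test_index in cv.split(X,Y):
    #print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]

print(X_train.shape)
print(X_test.shape)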
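Why stratified splitting helps with an unbalanced target: every fold keeps roughly the same share of positives. Here is a minimal sketch on synthetic arrays (20 made-up samples with 5 positives), assuming the same `StratifiedKFold` settings as above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data: 20 samples, 15 negatives and 5 positives.
X = np.arange(40).reshape(20, 2)
Y = np.array([0] * 15 + [1] * 5)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=11)

# Count how many positive samples land in each test fold.
fold_positive_counts = []
for train_index, test_index in cv.split(X, Y):
    fold_positive_counts.append(int(Y[test_index].sum()))

print(fold_positive_counts)  # [1, 1, 1, 1, 1]
```

With a plain `KFold`, some folds could end up with no positive samples at all, which makes recall and F1-score meaningless on those folds.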

2. Building a corona patient prediction machine learning model

Now we are ready to build a machine learning model that predicts corona patients.

I used 2 classical machine learning algorithms that can handle unbalanced datasets through the class_weight option: logistic regression and a decision tree classifier.
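What does `class_weight='balanced'` actually do? scikit-learn weights each class by `n_samples / (n_classes * count_of_that_class)`, so the rare class counts for more during training. A small sketch with made-up labels:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 8 negatives and 2 positives.
y = np.array([0] * 8 + [1] * 2)

# Same formula the estimators apply when class_weight='balanced'.
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y)
class_weights = {cls: float(w) for cls, w in zip([0, 1], weights)}

print(class_weights)  # {0: 0.625, 1: 2.5}
```

Here each positive sample weighs four times as much as a negative one, which offsets the 8-to-2 imbalance.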

Corona Patient Prediction using Logistic regression model

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score,confusion_matrix


# Logistic Regression

lr = LogisticRegression(class_weight='balanced')
lr.fit(X_train,y_train)
lr_pred = lr.predict(X_test)
lr_acc = accuracy_score(y_test,lr_pred)
print(lr_acc)



cv_scores = cross_val_score(lr,X,Y,cv=cv)

print(cv_scores)

With the logistic regression model I got about 80% accuracy on this data, which is a good accuracy. I also checked the training accuracy and it is similar, so the model is neither overfitting nor underfitting.
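One simple way to do that overfitting check is to compare the accuracy on the training data with the accuracy on held-out data. The sketch below uses synthetic data from `make_classification` as a stand-in, so the numbers it prints say nothing about the corona dataset; the names `X_demo` and `y_demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, unbalanced stand-in data (about 80% negatives).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=5, weights=[0.8, 0.2], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X_tr, y_tr)

# score() returns accuracy for classifiers.
train_acc = model.score(X_tr, y_tr)
test_acc = model.score(X_te, y_te)
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```

A training accuracy far above the test accuracy suggests overfitting; two low scores suggest underfitting.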

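import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Rows are the true classes, columns are the predicted classes
lr_cm = confusion_matrix(y_test,lr_pred)
lr_df = pd.DataFrame(data=lr_cm,columns=['0','1'],index=['0','1'])
sns.heatmap(lr_df,annot=True,cbar=False)
plt.show()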

Confusion matrix – logistic regression

Here we need to focus on the positive samples, because we don’t want to miss a single corona patient. Accuracy alone is not enough, so we must also consider precision, recall, or the F1-score.
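To make the link between the confusion matrix and these metrics concrete, here is a worked sketch with ten made-up predictions. The labels are invented for illustration; only the formulas carry over to the real model.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical ground truth and predictions (1 = infected).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of the predicted positives, how many are real
recall = tp / (tp + fn)     # of the real positives, how many we caught
f1 = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)         # 5 1 1 3
print(precision, recall, f1)  # 0.75 0.75 0.75
```

The one missed patient (the false negative) lowers recall, which is why recall matters most when missing a case is costly.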

Corona Patient Prediction using Decision tree classifier

Now it's time to experiment on our data with the decision tree classifier. You can try your favorite models too.

tr = DecisionTreeClassifier(class_weight='balanced')
tr.fit(X_train,y_train)
tr_pred = tr.predict(X_test)
tr_acc = accuracy_score(y_test,tr_pred)
print(tr_acc)

# Tree
tr_cv_scores = cross_val_score(tr,X,Y,cv=cv)
print(tr_cv_scores)

The decision tree classifier gives about the same accuracy.

tr_cm = confusion_matrix(y_test,tr_pred)

tr_df = pd.DataFrame(data=tr_cm,columns=['0','1'],index=['0','1'])

sns.heatmap(tr_df,annot=True,cbar=False)
plt.show()

3. Tuning machine learning model hyperparameters

Here is a super-simplified function that you can reuse wherever you need to tune the hyperparameters of your machine learning models.

def find_best_model(X,Y):
    # Candidate models and the hyperparameter grids to search
    algos = {
        'logistic_reg':{
            # liblinear supports both l1 and l2 penalties (the default lbfgs solver rejects l1)
            'model':LogisticRegression(class_weight='balanced',solver='liblinear'),
            'params':{
                'penalty':['l1','l2'],
                'C':[0.0001,0.001,0.01,0.1,1.0,10,100,1000]
            }
        },
        'DT_clf':{
            'model':DecisionTreeClassifier(),
            'params':{
                'criterion':['gini', 'entropy'],
                'max_depth': [2,4,6,8,12]
            }
        }
    }

    scores =[]

    cv = ShuffleSplit(n_splits=5,test_size=0.2,random_state=567)

    # Run one grid search per model and keep its best result
    for algo_name,config in algos.items():
        gd = GridSearchCV(config['model'],param_grid=config['params'],cv=cv,return_train_score=False)
        gd.fit(X,Y)

        scores.append({
            'model':algo_name,
            'best_score':gd.best_score_,
            'best_params':gd.best_params_
        })

    return pd.DataFrame(scores,columns=['model','best_score','best_params'])

It returns a data frame that contains each model's name, best score, and best parameters.
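By default, `GridSearchCV` ranks classifiers by accuracy. Since missing a patient is the costly mistake here, one option is to pass `scoring='f1'` (or `'recall'`) so the search optimizes the metric we care about. This is a sketch of that variation on synthetic stand-in data, not the search the article ran; `X_demo` and `y_demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.tree import DecisionTreeClassifier

# Synthetic, unbalanced stand-in data.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=5, weights=[0.8, 0.2], random_state=1
)

cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=567)

search = GridSearchCV(
    DecisionTreeClassifier(class_weight='balanced', random_state=0),
    param_grid={'criterion': ['gini', 'entropy'], 'max_depth': [2, 4, 6, 8, 12]},
    cv=cv,
    scoring='f1',  # rank parameter sets by F1-score instead of accuracy
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(search.best_score_)
```

The same `scoring` argument can be added to the `GridSearchCV` call inside `find_best_model`.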

Here the winner is the decision tree classifier. Now it's time to build our final machine learning model that predicts corona (covid19) infected patients.

Final machine learning model – covid19

Here I am going to build the final model with a decision tree classifier and check that it has good precision, recall, and F1-score.

dt_clf  = DecisionTreeClassifier(criterion='gini',max_depth=6,class_weight='balanced')
dt_clf.fit(X_train,y_train)
y_pred = dt_clf.predict(X_test)
accuracy_score(y_test,y_pred)

# It gives 82% accuracy

Confusion matrix

tr_cm = confusion_matrix(y_test,y_pred)
tr_df = pd.DataFrame(data=tr_cm,columns=['0','1'],index=['0','1'])
sns.heatmap(tr_df,annot=True,cbar=False)
plt.show()

Precision, Recall and F1-score check:

We must never miss a patient who needs a medical diagnosis, so we must check recall and the F1-score, not only accuracy.

from sklearn.metrics import f1_score,precision_score,recall_score

f1 = f1_score(y_test,y_pred)
print(f1)

pr = precision_score(y_test,y_pred)
print(pr)

rc = recall_score(y_test,y_pred)
print(rc)


#Output:

0.8631346578366447   # F1-score
0.9949109414758269   # precision
0.7621832358674464   # recall

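Here precision is about 0.99 and recall is about 0.76. Almost every predicted positive is a real case, but the model still misses roughly 24% of the actual positives, so recall is the number to keep improving if we want to avoid false negatives.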

Positive Prediction Check

dt_clf.predict([[60,100,1,1,1]])[0]

#Output : 1

Negative Prediction Check

dt_clf.predict([[60,100,0,0,0]])[0]

#Output : 0
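The bare lists above depend on the feature order used during training. A safer pattern is to pass a DataFrame with named columns. The sketch below assumes the order Age, Fever, BodyPains, RunnyNose, Difficulty_in_Breath (the article does not state it) and trains a tiny stand-in tree on four made-up rows, so it only illustrates the calling pattern.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Assumed feature order; the article does not state it explicitly.
FEATURES = ['Age', 'Fever', 'BodyPains', 'RunnyNose', 'Difficulty_in_Breath']

# Four invented training rows for a stand-in model.
train = pd.DataFrame(
    [[65, 101, 1, 1, 1],
     [70, 99, 1, 1, 1],
     [25, 98, 0, 0, 0],
     [30, 102, 0, 0, 0]],
    columns=FEATURES,
)
labels = [1, 1, 0, 0]
clf = DecisionTreeClassifier(random_state=0).fit(train, labels)

# Named columns make the meaning of each value explicit.
patient = pd.DataFrame([[60, 100, 1, 1, 1]], columns=FEATURES)
pred = int(clf.predict(patient)[0])
proba = clf.predict_proba(patient)[0]  # [P(class 0), P(class 1)]

print(pred, proba)
```

`predict_proba` is also handy when you want to show a confidence value instead of a hard 0 or 1.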

4. Exporting Machine learning Model & Deploying

It's time to export our machine learning model so that we can deploy it online. Here we use the Python pickle module to export it.

import pickle
with open('corona.pkl','wb') as f:
    pickle.dump(dt_clf,f)
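Loading the model back works the same way in reverse. Here is a minimal round-trip sketch: a tiny stand-in tree takes the place of `dt_clf`, and the file is written to a temporary folder so the example does not touch your real `corona.pkl`.

```python
import os
import pickle
import tempfile

from sklearn.tree import DecisionTreeClassifier

# Stand-in model trained on two invented points.
model = DecisionTreeClassifier(random_state=0).fit([[0], [1]], [0, 1])

path = os.path.join(tempfile.mkdtemp(), 'corona.pkl')

# Export ...
with open(path, 'wb') as f:
    pickle.dump(model, f)

# ... and load it back, as the web app would do at startup.
with open(path, 'rb') as f:
    restored = pickle.load(f)

# The restored model should give the same predictions.
same = bool((restored.predict([[0], [1]]) == model.predict([[0], [1]])).all())
print(same)  # True
```

Only unpickle files you created yourself, and load them with the same scikit-learn version that saved them.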

Finally, we have built a machine learning model that predicts which patients are likely to be infected with corona.

I deployed this machine learning model using Flask. The code is available at my GitHub link below.

Github code: Machine learning model deployment using flask
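For readers who want a feel for the deployment step without opening the repository, here is a minimal Flask sketch. It is not the code from the GitHub project: the `/predict` route, the JSON field names, and the feature order are all assumptions, and a tiny stand-in tree replaces the pickled model so the example runs on its own.

```python
from flask import Flask, jsonify, request
from sklearn.tree import DecisionTreeClassifier

# Assumed feature order and field names (hypothetical).
FEATURES = ['Age', 'Fever', 'BodyPains', 'RunnyNose', 'Difficulty_in_Breath']

# Stand-in model; in a real app, load corona.pkl with pickle instead.
model = DecisionTreeClassifier(random_state=0).fit(
    [[65, 101, 1, 1, 1], [70, 99, 1, 1, 1], [25, 98, 0, 0, 0], [30, 102, 0, 0, 0]],
    [1, 1, 0, 0],
)

app = Flask(__name__)


@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON object with one value per feature.
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify(error=f'missing fields: {missing}'), 400

    row = [[float(payload[name]) for name in FEATURES]]
    return jsonify(prediction=int(model.predict(row)[0]))

# Calling app.run() starts the development server. Keep debug mode off
# on a public server, because the interactive debugger can run arbitrary code.
```

You can try the endpoint without starting a server by using `app.test_client()` and posting a JSON body with the five fields.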

Outputs:

I hope you love this tutorial, or at least that it sparks some good ideas for helping with covid19. I contributed a little to the covid19 problem; now it’s your turn to help the world. Let’s do it, guys. #waragainstcovid19
