What is Overfitting and Underfitting in Machine Learning?


Do you want to know what Overfitting and Underfitting are in Machine Learning? If yes, then this article is for you. In this article, I explain both concepts in simple terms, with an example of each.

Now, without further ado, let’s get started.


What is Overfitting?

Let’s understand the concept of Overfitting with the help of an image-

[Image: a curve that passes through every data point]

The curve in this image is the output of training. That means it is the model we get after fitting the training data.

The small circles are the data points, or observations.

When the curve bends to pass through every data point in the x-y plot, including the noisy ones, the model has memorized the training data instead of learning the underlying trend. This situation is known as Overfitting. An overfitted model has very low error on the training data but performs poorly on new data. As you can see in this image, the curve passes through all the data points, so this is an Overfitting problem.
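The idea of a curve that touches every point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data (a straight-line trend `y = 2x + 1` plus hand-picked noise) and the helper names `interpolate` and `mean_abs_error` are hypothetical, and Lagrange interpolation stands in for a model that is complex enough to hit every training point.

```python
# Hypothetical training data: the trend y = 2x + 1 plus hand-picked noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.8, -0.9, 0.7, -0.8, 0.9, -0.7]
train_y = [2 * x + 1 + n for x, n in zip(train_x, noise)]


def interpolate(x, xs, ys):
    """Evaluate at x the Lagrange polynomial passing through every (xs, ys) point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def mean_abs_error(xs, ys):
    """Average vertical distance between the fitted curve and the given points."""
    return sum(abs(interpolate(x, train_x, train_y) - y)
               for x, y in zip(xs, ys)) / len(xs)


# New points that follow the same trend but were not used for training.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2 * x + 1 for x in test_x]

train_error = mean_abs_error(train_x, train_y)
test_error = mean_abs_error(test_x, test_y)

print("training error:", train_error)  # zero: the curve hits every training point
print("test error:", test_error)       # clearly larger on the unseen points
```

The curve is perfect on the points it was trained on, yet it swings away from the real trend between them, which is the behaviour described above.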

Now, let’s see the Reasons for Overfitting.

Reasons for Overfitting

  • The data is not clean, which means it contains noise.
  • The model is too complex.
  • The model is trained on extra, unnecessary features.

When we pass extra and unnecessary features during the training phase, the model can learn rules that hold only for the training data, and the Overfitting problem arises.

Let’s see the example of Overfitting.

Example of Overfitting

Suppose we train a model to predict whether an object is a ball, and we give it the following features:

->Sphere

->Play

->Eat

->Radius

So, we trained the model on these features.

After the training phase comes the test phase. We give the model new data, and it has to predict whether each object is a ball or not. The model predicts based on the given features:

1- Sphere: The model checks whether the object is sphere-shaped. If it is, it passes the test.

2- Play: If we can play with the object, it also passes the test.

3- Eat: If the object is not edible, it passes the test.

4- Radius: Every ball in the training data had a radius of 5 cm. So, if the object's radius is within 5 cm, it passes the test.

Now, suppose I give the model a football. The first three features pass, but the Radius feature fails, because a football's radius is larger than 5 cm. So the model predicts that a football is not a ball.

And this is due to Overfitting. When we train our model with extra features, for example the “Radius” feature here, the model learns a rule that is too specific to the training data and gives the wrong prediction on new data.
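The football case can be written out as a tiny rule-based sketch. Everything here is hypothetical and only mirrors the example: the function names `is_ball_overfit` and `is_ball_general`, the dictionary keys, and the 11 cm radius I assumed for a football. A real model would learn such rules from data rather than have them typed in by hand.

```python
def is_ball_overfit(item):
    """Rule learned from training balls that all had a 5 cm radius (hypothetical)."""
    return (item["sphere"] and item["play"] and not item["eat"]
            and item["radius_cm"] <= 5)


def is_ball_general(item):
    """The same rule without the overly specific radius check."""
    return item["sphere"] and item["play"] and not item["eat"]


# Assumed values for illustration; a football's radius is roughly 11 cm.
football = {"sphere": True, "play": True, "eat": False, "radius_cm": 11}

print(is_ball_overfit(football))  # False: rejected only because of the radius rule
print(is_ball_general(football))  # True
```

The only difference between the two functions is the radius check, so that single over-specific rule is what makes the first one fail on an object it has not seen before.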

I hope this example made Overfitting clear. Now, let’s understand the Underfitting problem.

What is Underfitting?

Again, let’s understand Underfitting with the help of this image-

[Image: a straight line that lies far from most of the data points]

Underfitting is the opposite of Overfitting. The straight line in this image is the output of the training phase. Very few data points lie near the line, and most of them are far from it. That means the model is too simple to capture the pattern in the data, so its error is high even on the training data. And this is known as Underfitting.
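Here is a small sketch of a straight line that cannot follow curved data. The data (`y = x^2` on seven points) and the helper `fit_line` are my own assumptions for illustration; the fit itself is ordinary least squares written in plain Python.

```python
# Hypothetical data that follows a curve: y = x^2.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(xs, ys)

# Average distance between the line and the points it was trained on.
train_error = sum(abs(slope * x + intercept - y)
                  for x, y in zip(xs, ys)) / len(xs)

print("slope:", slope, "intercept:", intercept)  # a flat line at y = 4
print("training error:", train_error)            # about 2.86, high even on training data
```

Unlike an overfitted model, the line is already wrong on the data it was trained on, because a straight line has no way to bend with the curve.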

Reasons for Underfitting

  • The data is not clean and has garbage values.
  • The training data is too small or has too few informative features.
  • The model is too simple.

Now, let’s see an example of Underfitting.

Example of Underfitting

Let’s take the same example we used for Overfitting. We have to train the model to predict whether an object is a ball. This time, we train our model with only one feature:

->Sphere

We give only one feature for training our model. Once the model is trained, the next step is the test phase, where we give new data to the model and it has to decide whether each object is a ball or not.

We gave only one feature, “Sphere”, for predicting a ball. So, suppose we give an orange to the model, and the model predicts that it is a ball. Why? Because the model only checks whether the object is a sphere, and an orange is a sphere.

And this is an Underfitting problem. When we train our model with too few features, it cannot tell a ball from other round objects and gives the wrong prediction. I hope you understood.
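The orange case looks like this as a rule-based sketch. The function names `is_ball_underfit` and `is_ball_more_features` and the dictionary keys are hypothetical; they only restate the example in code.

```python
def is_ball_underfit(item):
    """Rule that uses only the Sphere feature."""
    return item["sphere"]


def is_ball_more_features(item):
    """Rule that also uses the Play and Eat features."""
    return item["sphere"] and item["play"] and not item["eat"]


# Assumed feature values for illustration.
orange = {"sphere": True, "play": False, "eat": True}

print(is_ball_underfit(orange))       # True: the orange is wrongly called a ball
print(is_ball_more_features(orange))  # False: extra features separate it from a ball
```

With only one feature, every round object looks the same to the model, so it has no way to get this case right.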

This is where the concept of a Good Fit comes in.

What is a Good Fit?

A Good Fit is a model that captures the underlying trend of the data and is neither an Overfit nor an Underfit. Its error is low on both the training data and new data.

[Image: a line that follows the overall trend of the data points]

As you can see in this image, the line follows the overall trend of the data points without overfitting or underfitting them. This is a good fit.
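One common way to check for a good fit is to compare the error on the training data with the error on held-out validation data. The sketch below does that for three models of increasing complexity. The data (trend `y = 2x + 1` with hand-picked noise), the noise-free validation points, and all function names are assumptions made for illustration.

```python
# Hypothetical data: a straight-line trend plus hand-picked noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.8, -0.9, 0.7, -0.8, 0.9, -0.7]
train_y = [2 * x + 1 + n for x, n in zip(train_x, noise)]

# Validation points follow the same trend and were not used for training.
val_x = [0.5, 1.5, 2.5, 3.5, 4.5]
val_y = [2 * x + 1 for x in val_x]

# Least-squares line fitted to the training data.
mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x


def constant_model(x):
    """Too simple: always predicts the average training value."""
    return mean_y


def line_model(x):
    """Matches the shape of the trend."""
    return slope * x + intercept


def wiggly_model(x):
    """Too complex: Lagrange polynomial through every training point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        weight = 1.0
        for j, xj in enumerate(train_x):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def mean_abs_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


errors = {}
for name, model in [("underfit", constant_model),
                    ("good fit", line_model),
                    ("overfit", wiggly_model)]:
    errors[name] = (mean_abs_error(model, train_x, train_y),
                    mean_abs_error(model, val_x, val_y))
    print(name, "train:", errors[name][0], "validation:", errors[name][1])
```

In this sketch the overfit model has the lowest training error, but the line has the lowest validation error, and that second number is the one that matters when judging a fit.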

That’s all.

Conclusion

I hope you now understand what Overfitting and Underfitting are in Machine Learning. If you have any doubts, feel free to ask me in the comment section.

Happy Learning!

Thank YOU!




Written By Aqsa Zafar

Founder of MLTUT, Machine Learning Ph.D. scholar at Dayananda Sagar University. Research on social media depression detection. Create tutorials on ML and data science for diverse applications. Passionate about sharing knowledge through website and social media.
