Embarking on the journey of learning Machine Learning through NPTEL’s course offered by IIT Kharagpur is an exciting venture. This course, spanning several weeks, dives deep into the fundamentals and advanced concepts of machine learning. Week 1 lays the groundwork, setting the stage for the rest of the course. This article aims to guide you through the Week 1 assignment, providing detailed answers and explanations to ensure you grasp the foundational concepts.
Question 1
Which of the following is a classification task?
a. Detect pneumonia from chest X-ray image
b. Predict the price of a house based on floor area, number of rooms etc.
c. Predict the temperature for the next day
d. Predict the amount of rainfall
Answer: a. Detect pneumonia from chest X-ray image
Explanation: Classification involves predicting a discrete label. Detecting pneumonia from a chest X-ray is a binary classification task, because the model decides whether pneumonia is present or not. The other three options predict continuous quantities (price, temperature, rainfall), which makes them regression tasks.
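To make the distinction concrete, here is a minimal Python sketch contrasting the two kinds of output. It assumes a made-up single "opacity score" per X-ray and made-up house prices; the 1-nearest-neighbour classifier and the least-squares line are illustrative stand-ins, not models from the course.

```python
from statistics import mean

# Classification: each (hypothetical) training example has a discrete label.
xray_features = [0.1, 0.2, 0.8, 0.9]          # made-up "opacity score" per image
xray_labels = ["normal", "normal", "pneumonia", "pneumonia"]

def nearest_neighbour_label(x, features, labels):
    """1-NN classifier: return the label of the closest training example."""
    idx = min(range(len(features)), key=lambda i: abs(features[i] - x))
    return labels[idx]

# Regression: each (hypothetical) training example has a continuous target.
floor_area = [50.0, 75.0, 100.0, 125.0]       # square metres, made up
price = [100.0, 150.0, 200.0, 250.0]          # thousands, made up

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

a, b = fit_line(floor_area, price)
print(nearest_neighbour_label(0.7, xray_features, xray_labels))  # a class name
print(a * 110.0 + b)                                              # a real number
```

The classifier can only ever answer with one of the labels it has seen, while the regression line can output any real number.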
Question 2
Which of the following is not a type of supervised learning?
a. Classification
b. Regression
c. Clustering
d. None of the above
Answer: c. Clustering
Explanation: Clustering is a type of unsupervised learning, where the goal is to group similar data points without prior labels.
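As an illustration of learning without labels, here is a minimal 1-D k-means sketch. The data points are made up, and initialising the centroids with the first k points is a simplifying assumption (practical implementations use smarter seeding).

```python
def kmeans_1d(points, k, iterations=10):
    """Group unlabelled 1-D points into k clusters (plain k-means)."""
    centroids = points[:k]  # naive initialisation: first k points (assumption)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]   # hypothetical data, no labels anywhere
centroids, clusters = kmeans_1d(points, k=2)
print(centroids, clusters)
```

Note that no target values are supplied: the algorithm discovers the two groups purely from the distances between points.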
Question 3
Which of the following tasks is NOT a suitable machine learning task?
a. Finding the shortest path between a pair of nodes in a graph
b. Predicting if a stock price will rise or fall
c. Predicting the price of petroleum
d. Grouping mails as spams or non-spams
Answer: a. Finding the shortest path between a pair of nodes in a graph
Explanation: Finding the shortest path between nodes is a graph algorithm problem, typically solved by algorithms like Dijkstra's or A*. It is not a machine learning task, which involves learning from data.
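For contrast with the learning tasks, here is a minimal sketch of Dijkstra's algorithm using Python's `heapq`. The weighted graph is hypothetical. The point is that the answer is computed exactly from the graph itself; there is no training data and nothing to learn.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source in a graph with non-negative edge weights."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter path was already found
        for neighbour, weight in graph.get(node, []):
            new_d = d + weight
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                heapq.heappush(heap, (new_d, neighbour))
    return dist

# Hypothetical weighted graph as an adjacency list: node -> [(neighbour, weight), ...]
graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
print(dijkstra(graph, "A"))
```

Given the same graph, the function always returns the same distances, which is why this problem is solved with an algorithm rather than a model trained on examples.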
Question 4
Suppose I have 10,000 emails in my mailbox out of which 300 are spams. The spam detection system detects 150 mails as spams, out of which 50 are actually spams. What is the precision and recall of my spam detection system?
a. Precision = 33.33%, Recall = 25%
b. Precision = 25%, Recall = 33.33%
c. Precision = 33.33%, Recall = 16.66%
d. Precision = 75%, Recall = 33.33%
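Answer: c. Precision = 33.33%, Recall = 16.66%
Explanation: Precision is the fraction of flagged mails that are actually spam, $\text{Precision} = \frac{TP}{TP + FP}$, and recall is the fraction of actual spams that were flagged, $\text{Recall} = \frac{TP}{TP + FN}$.
- True Positives (TP) = 50
- False Positives (FP) = 150 - 50 = 100
- False Negatives (FN) = 300 - 50 = 250
Precision = $\frac{50}{50 + 100} = \frac{50}{150} \approx 33.33\%$
Recall = $\frac{50}{50 + 250} = \frac{50}{300} \approx 16.67\%$ (listed as 16.66% in option c)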
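The arithmetic can be checked with a few lines of Python. The counts come straight from the question; the helper function name is just for illustration.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

total_spam = 300          # actual spams in the mailbox
flagged = 150             # mails the detector marked as spam
correctly_flagged = 50    # flagged mails that really are spam

tp = correctly_flagged
fp = flagged - correctly_flagged       # 100 legitimate mails wrongly flagged
fn = total_spam - correctly_flagged    # 250 spams the detector missed

precision, recall = precision_recall(tp, fp, fn)
print(f"Precision = {precision:.2%}, Recall = {recall:.2%}")
```

This prints a precision of 33.33% and a recall of 16.67%. The mailbox total of 10,000 only determines the true negatives, which neither metric uses.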
Question 5
Which of the following is/are supervised learning problems?
A. Predicting disease from blood samples.
B. Grouping students in the same class based on similar features.
C. Face recognition to unlock your phone.
Answer: A. Predicting disease from blood samples.
Answer: C. Face recognition to unlock your phone.
Explanation: Predicting disease from blood samples and face recognition are supervised learning problems because they involve training a model on labeled data. Grouping students is an unsupervised learning problem (clustering).
Question 6
Aliens challenge you to a complex game that no human has seen before. They give you time to learn the game and develop strategies before the final showdown. You choose to use machine learning because an intelligent machine is your only hope. Which machine learning paradigm should you choose for this?
a. Supervised learning
b. Unsupervised learning
c. Reinforcement learning
d. Use a random number generator and hope for the best
Answer: c. Reinforcement learning
Explanation: Reinforcement learning is suitable for scenarios where an agent needs to learn from interactions with an environment to achieve a goal, making it ideal for developing strategies in a new and complex game.
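The aliens' game is unknown, so here is a toy stand-in: a minimal tabular Q-learning sketch on a made-up five-cell corridor where the only reward is reaching the last cell. The environment, the reward, and the hyperparameters (`alpha`, `gamma`, `epsilon`, episode count) are all assumptions chosen for illustration. The agent is never told the correct move; it discovers a strategy from rewards alone.

```python
import random

N_STATES = 5          # cells 0..4; cell 4 is the goal (made-up environment)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Toy environment: reward 1.0 only when the agent reaches the goal cell."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action] estimates
    for _ in range(episodes):
        state = 0
        for _ in range(100):                     # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)     # explore
            else:
                best = max(q[state])             # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update towards reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = q_learning()
# Greedy policy for every non-goal cell (1 means "move right").
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy moves right from every cell, a strategy the agent learned purely through trial, error, and reward.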
Question 7
How many Boolean functions are possible with N features?
a. $2^{2^N}$
b.
c.
d.
Answer: a. $2^{2^N}$
Explanation: With N Boolean features there are $2^N$ possible input combinations. A Boolean function assigns an output of 0 or 1 to each of these combinations independently, so there are $2^{2^N}$ possible Boolean functions.
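The counting argument can be verified by brute force for small N. This sketch enumerates every input row and every possible column of outputs with `itertools.product`; the function name is illustrative.

```python
from itertools import product

def count_boolean_functions(n):
    """Return (number of input rows, number of distinct Boolean functions) for n inputs."""
    inputs = list(product([0, 1], repeat=n))               # 2**n input rows
    # A Boolean function is one choice of output bit for every input row.
    functions = set(product([0, 1], repeat=len(inputs)))
    return len(inputs), len(functions)

for n in range(1, 4):
    rows, funcs = count_boolean_functions(n)
    print(n, rows, funcs, funcs == 2 ** (2 ** n))
```

For N = 1, 2, 3 this gives 2, 4, 8 input rows and 4, 16, 256 functions, matching $2^{2^N}$. Brute force quickly becomes infeasible, which is itself a reminder of how fast this number grows.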
Question 8
What is the use of Validation dataset in Machine Learning?
a. To train the machine learning model.
b. To evaluate the performance of the machine learning model.
c. To tune the hyperparameters of the machine learning model.
d. None of the above.
Answer: c. To tune the hyperparameters of the machine learning model.
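Explanation: A validation dataset is held out from training and used to compare hyperparameter settings (for example, the number of neighbours in k-NN or the depth of a decision tree) and to detect overfitting. Because the chosen settings are tuned on it, the final unbiased estimate of performance should come from a separate test set.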
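Here is a minimal sketch of that workflow in plain Python. It assumes synthetic 1-D data (label 1 when x > 0.5, with 10% of labels flipped as noise), a 200/50/50 split, and a simple k-NN classifier whose hyperparameter k is chosen on the validation set; all of these are illustrative choices, not the course's setup.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (1-D features, 0/1 labels)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

# Synthetic labelled data (assumption): y = 1 if x > 0.5, with 10% label noise.
rng = random.Random(42)
data = []
for _ in range(300):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

train, val, test = data[:200], data[200:250], data[250:]   # 200 / 50 / 50 split

# Hyperparameter tuning: choose k using the validation set only.
candidate_ks = [1, 3, 5, 7, 9, 15]
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))

# The test set is used once, after k is fixed, for the final estimate.
print("best k:", best_k, "test accuracy:", accuracy(train, test, best_k))
```

Odd values of k avoid tied votes. Keeping the test set out of the selection loop is what keeps its accuracy an honest estimate.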
Question 9
Regarding bias and variance, which of the following statements are true? (Here 'high' and 'low' are relative to the ideal model.)
a. Models which overfit have a high bias.
b. Models which overfit have a low bias.
c. Models which underfit have a high variance.
d. Models which underfit have a low variance.
Answer: b. Models which overfit have a low bias.
Answer: d. Models which underfit have a low variance.
Explanation: Models that overfit have low bias (they fit the training data very well) but high variance (their predictions change a lot with the particular training sample, so they generalize poorly). Models that underfit have high bias (they are too simple to capture the underlying trend, so they fit even the training data poorly) but low variance (their predictions change little from one training sample to another).
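Bias and variance can be estimated by refitting a model on many independent training sets and looking at its predictions at one point. This small simulation uses made-up settings: true function $\sin(2\pi x)$, 20 noisy points per training set, Gaussian noise with standard deviation 0.3, and test point $x_0 = 0.25$. A constant model stands in for an underfitting model and 1-nearest-neighbour for an overfitting one.

```python
import math
import random
from statistics import mean, pvariance

def true_f(x):
    return math.sin(2 * math.pi * x)

def make_training_set(rng, n=20, noise=0.3):
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, noise)) for x in xs]

def constant_model(train, x):
    """Underfits: ignores x and predicts the mean training target."""
    return mean(y for _, y in train)

def one_nn_model(train, x):
    """Overfits: copies the target of the single closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bias_variance(model, x0=0.25, trials=1000, seed=0):
    """Estimate squared bias and variance of a model's prediction at x0."""
    rng = random.Random(seed)
    preds = [model(make_training_set(rng), x0) for _ in range(trials)]
    bias_sq = (mean(preds) - true_f(x0)) ** 2
    return bias_sq, pvariance(preds)

for name, model in [("constant (underfit)", constant_model), ("1-NN (overfit)", one_nn_model)]:
    b, v = bias_variance(model)
    print(f"{name}: bias^2 = {b:.3f}, variance = {v:.3f}")
```

The constant model should show a large squared bias with a small variance, while 1-NN should show the reverse, mirroring options b and d.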
Question 10
Which of the following is a categorical feature?
a. Height of a person
b. Price of petroleum
c. Mother tongue of a person
d. Amount of rainfall in a day
Answer: c. Mother tongue of a person
Explanation: A categorical feature represents discrete categories or labels. "Mother tongue of a person" is categorical, as it involves different languages (e.g., English, Spanish, Chinese). The other options are numerical features.
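Since categories like languages have no numeric order, they are usually encoded before being fed to a model. One common choice is one-hot encoding; here is a minimal sketch using the example languages from the explanation (the sample list itself is made up).

```python
def one_hot(values):
    """Map each category to a 0/1 vector with a single 1 in that category's column."""
    categories = sorted(set(values))                 # fixed, reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded

mother_tongues = ["English", "Spanish", "Chinese", "English"]   # hypothetical sample
categories, encoded = one_hot(mother_tongues)
print(categories)
for lang, row in zip(mother_tongues, encoded):
    print(lang, row)
```

Encoding the languages as 1, 2, 3 instead would wrongly imply that Spanish lies "between" English and Chinese; one-hot vectors avoid that false ordering.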