1. What is data reduction?
Answer:
Data reduction is the process of reducing the volume of data while producing the same or similar analytical results.
2. Why do we need data reduction?
Answer:
- Improves model performance (speed & accuracy)
- Helps in data visualization
- Reduces dimensionality
- Removes noise
- Leads to simpler, faster, and more accurate models
3. What is feature selection (aka attribute/variable selection)?
Answer:
It’s the process of selecting an optimal subset of features from the data that contribute most to the model, based on a specific evaluation criterion.
5. What are the main techniques for feature selection (data reduction)?
| Method | Description | Tools/Details |
|---|---|---|
| Wrapper Method | Uses a classifier to evaluate feature subsets based on their performance | Considers the possible feature subsets; uses a search technique to find the best one |
| Filter Method | Ranks features using an attribute evaluator and selects the top-ranked ones | Doesn’t rely on a classifier; example in WEKA: InfoGainAttributeEval + Ranker |
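To make the filter idea concrete, here is a minimal pure-Python sketch of ranking nominal features by information gain and keeping the top-ranked ones. It is not WEKA's implementation; the function names (`entropy`, `info_gain`, `rank_features`) and the toy data are hypothetical and only illustrate the "attribute evaluator + ranker" pattern.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(column, labels):
    """Class entropy minus the weighted class entropy after splitting on one feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, names, top_k):
    """Score each feature on its own (no classifier involved), keep the top_k."""
    scores = [(info_gain([r[i] for r in rows], labels), name)
              for i, name in enumerate(names)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return scores[:top_k]


# Hypothetical toy data: the class depends only on "outlook".
names = ["outlook", "windy", "coin"]
rows = [
    ("sunny", "no", "h"),
    ("sunny", "yes", "h"),
    ("rainy", "no", "h"),
    ("rainy", "yes", "t"),
]
labels = ["yes", "yes", "no", "no"]

ranked = rank_features(rows, labels, names, top_k=2)
for gain, name in ranked:
    print(f"{name}: {gain:.3f}")
```

Each feature is scored independently of the others, which is why this approach is fast but can miss interactions between features.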
6. Difference between Wrapper and Filter methods?
| Feature | Wrapper Method | Filter Method |
|---|---|---|
| Based on | Classifier performance | Statistical evaluation |
| Speed | Slower (computationally expensive) | Faster |
| Accuracy | Generally more accurate | Often less accurate; may not consider interactions between features |
| Tool Example | Classifier + subset evaluator | InfoGain + Ranker in WEKA |
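The wrapper side of the comparison can be sketched in the same way. The example below assumes a greedy forward search (one of several possible search techniques) and a 1-nearest-neighbour classifier scored by leave-one-out accuracy; both choices, the function names, and the toy data are assumptions made for illustration, not what any particular tool does.

```python
def loo_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `subset`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # hold out the row being classified
            dist = sum((row[f] - other[f]) ** 2 for f in subset)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, n_features):
    """Greedy search: add the feature that most improves accuracy, stop when none does."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        # Every candidate subset costs one full classifier evaluation.
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda s: s[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical toy data: only feature 0 separates the two classes.
rows = [
    (1.0, 5.0, 0.3),
    (1.2, 1.0, 0.9),
    (0.9, 9.0, 0.1),
    (3.0, 5.2, 0.8),
    (3.1, 1.1, 0.2),
    (2.9, 9.1, 0.4),
]
labels = ["a", "a", "a", "b", "b", "b"]

selected, score = forward_selection(rows, labels, n_features=3)
print(selected, score)
```

Because the classifier is re-evaluated for every candidate subset, the cost grows quickly with the number of features, which is the "slower (computationally expensive)" entry in the table above.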