Top 10 Data Mining Mistakes

Maybe some of you have read this white paper before, but I would like to add it here as part of a resource collection for future data mining beginners. The paper is an excerpt from the book “Handbook of Statistical Analysis and Data Mining Applications”, Elsevier (ISBN: 978-0-123747655). According to the authors, mining data to extract useful and enduring patterns remains a skill that is arguably more art than science. In the paper, they briefly describe, and illustrate with examples, what they believe are the “Top 10” mistakes of data mining, in terms of frequency and seriousness.

Top 10 DM Mistakes (white paper)

0. Lack of Data (important too!)
1. Focus on Training
2. Rely on One Technique
3. Ask the Wrong Question
4. Listen (Only) to the Data
5. Accept Leaks from the Future
6. Discount Pesky Cases
7. Extrapolate
8. Answer Every Inquiry
9. Sample Casually
10. Believe the Best Model

I would like to emphasize mistake no. 2 (Rely on One Technique), which I think is important for us to consider. In a data mining task, it is important to try several modeling algorithms to improve our chances of getting the best result. Look for new algorithms and tools available on the market to mine your data (it is sometimes worth reading new conference and journal publications to learn about the latest improvements to these algorithms). The well-known “No Free Lunch” (NFL) theorem states, roughly, that no single algorithm is best at solving all problems!
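To make "try more than one technique" concrete, here is a minimal sketch that scores two candidate classifiers with the same k-fold cross-validation splits and picks the winner per dataset. Everything in it is an illustrative assumption, not from the paper: the helper names (`fit_1nn`, `fit_nearest_centroid`, `cross_val_accuracy`, `compare_models`), the choice of 1-nearest-neighbour and nearest-centroid as the two techniques, and both tiny toy datasets. It uses only the Python standard library; in real work you would normally reach for a machine learning library's cross-validation utilities instead. The toy data only illustrates that the best technique depends on the data; it does not prove the NFL theorem.

```python
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_1nn(train_X, train_y):
    """Hypothetical helper: 1-nearest-neighbour, predicts the closest training point's label."""
    def predict(x):
        best = min(range(len(train_X)), key=lambda i: sq_dist(x, train_X[i]))
        return train_y[best]
    return predict


def fit_nearest_centroid(train_X, train_y):
    """Hypothetical helper: predicts the class whose mean point (centroid) is closest."""
    centroids = {}
    for label in sorted(set(train_y)):
        pts = [p for p, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = tuple(mean(coord) for coord in zip(*pts))

    def predict(x):
        return min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))
    return predict


def cross_val_accuracy(fit, X, y, k):
    """Mean accuracy over k folds; fold j holds every index i with i % k == j."""
    scores = []
    for j in range(k):
        test = [i for i in range(len(X)) if i % k == j]
        train = [i for i in range(len(X)) if i % k != j]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return mean(scores)


def compare_models(models, X, y, k):
    """Score every candidate on the same folds so the comparison is fair."""
    return {name: cross_val_accuracy(fit, X, y, k) for name, fit in models.items()}


# Toy dataset A (hypothetical): XOR-style corners, four tight points per corner.
offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
corners = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
xor_X = [(cx + ox, cy + oy) for (cx, cy), _ in corners for ox, oy in offsets]
xor_y = [label for _, label in corners for _ in offsets]

# Toy dataset B (hypothetical): two separated 1-D clusters stored as 2-D points,
# plus two deliberately mislabeled points (1.1 labeled 1, 11.1 labeled 0).
noisy_X = [(x, 0.0) for x in (0.0, 1.0, 2.0, 3.0, 1.1, 10.0, 11.0, 12.0, 13.0, 11.1)]
noisy_y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

models = {"1-NN": fit_1nn, "nearest centroid": fit_nearest_centroid}
for name, (X, y, k) in {"xor": (xor_X, xor_y, 4), "noisy": (noisy_X, noisy_y, 5)}.items():
    scores = compare_models(models, X, y, k)
    print(name, scores, "best:", max(scores, key=scores.get))
```

With this data, 1-NN should win on the XOR-style set (a single centroid per class cannot separate the corners), while nearest centroid should win on the noisy set (1-NN copies the mislabeled neighbours). Had we committed to just one of the two techniques, we would have picked the wrong one for one of the datasets.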
