Screen Scraping vs Data Mining vs Web Mining

I know the title has a lot of ‘vs’ in it, but I want to highlight the differences between all three together. There is currently a video on YouTube titled “screen scraping, web data mining, web data scraping”, and I want to clear up the confusion that title creates.

Read my posts about “Data Mining vs Screen-Scraping” and “Data Mining vs Web Mining” to get the whole picture. Here I just want to highlight some of the main differences:

Screen scraping originally referred to extracting characters from screens so that they could be analyzed. Today it most commonly refers to extracting information from web sites. That is, computer programs “crawl” or “spider” through web sites, pulling out data. People often do this to build things like comparison shopping engines, to archive web pages, or simply to download text into a spreadsheet so that it can be filtered and analyzed.
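To make the idea concrete, here is a minimal sketch of scraping in Python using only the standard library. The HTML snippet, the `PriceScraper` class and the `scrape_prices` function are made up for illustration; a real scraper would first download the page and would have to cope with much messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would fetch this from a web site.
SAMPLE_HTML = """
<table>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

class PriceScraper(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a table cell.
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

def scrape_prices(html):
    """Return (product, price) pairs, assuming two cells per table row."""
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    cells = parser.cells
    return [(cells[i], float(cells[i + 1])) for i in range(0, len(cells) - 1, 2)]

rows = scrape_prices(SAMPLE_HTML)
for product, price in rows:
    print(product, price)
```

Note that the code only pulls the data out. Nothing here analyzes it, which is exactly the line between scraping and mining.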

Data mining is defined by Wikipedia as the “practice of automatically searching large stores of data for patterns.” In other words, you already have the data, and you’re now analyzing it to learn useful things about it. Data mining often involves complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place. In data mining you only care about analyzing what’s already there.
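As a toy illustration of “searching data for patterns”, the sketch below counts which pairs of items show up together in a set of shopping baskets. The baskets and the `frequent_pairs` helper are invented for this example, and real data mining tools use far more scalable algorithms, but the point is the same: the data is already there and we only analyze it.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Count item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical data that we "already have"
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

pairs = frequent_pairs(baskets, min_support=2)
for pair, n in sorted(pairs.items()):
    print(pair, n)
```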

Web mining, on the other hand, is the application of data mining techniques to discover patterns from the Web. Depending on the analysis target, web mining can be divided into three types: Web usage mining, Web content mining and Web structure mining.

For more information about data mining, click here

Top 10 Search Competition of Data Mining Algorithms in Google, Yahoo & Bing

Following up on my earlier post titled “Data Mining Trends”, I wanted to know how popular data mining algorithms are in the major search engines (Google, Yahoo & Bing). Using a keyword research tool, I pulled out the top 10 data mining algorithms in these search engines. For the search competition, I used the “keyword exists anywhere” setting, which matches the keyword anywhere in the page document. I also added the keyword “algorithm” to each algorithm name to make it clear that we are searching for data mining algorithms and not something else. Maybe my approach is wrong; if so, please correct me.

Here are the results for the top 10 data mining algorithms:

Data Mining Algorithm                   Google       Yahoo        Bing
C4.5 ALGORITHM                      18,100,000   1,060,000     562,000
REGRESSION ALGORITHM                11,900,000   5,810,000   1,260,000
APRIORI ALGORITHM                    9,140,000      59,900   1,090,000
NEURAL NETWORK ALGORITHM             8,870,000   7,870,000   1,560,000
K-MEANS ALGORITHM                    4,680,000     219,000   9,470,000
SUPPORT VECTOR MACHINE ALGORITHM     4,440,000   4,310,000   1,120,000
ID3 ALGORITHM                        3,380,000     491,000     389,000
NEAREST NEIGHBORS ALGORITHM          2,370,000     763,000     530,000
GENETIC ALGORITHM                    1,790,000  10,600,000   1,840,000
RIPPER ALGORITHM                       487,000   1,650,000     350,000
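The table is sorted by the Google column, so it hides how different the ordering is on the other two engines. A small sketch like the one below re-ranks the same numbers per engine. The figures are copied from the table above; the `COUNTS` dictionary and the `top_by_engine` helper are just names chosen for this example.

```python
# (Google, Yahoo, Bing) result counts, copied from the table above
COUNTS = {
    "C4.5": (18_100_000, 1_060_000, 562_000),
    "Regression": (11_900_000, 5_810_000, 1_260_000),
    "Apriori": (9_140_000, 59_900, 1_090_000),
    "Neural network": (8_870_000, 7_870_000, 1_560_000),
    "K-means": (4_680_000, 219_000, 9_470_000),
    "Support vector machine": (4_440_000, 4_310_000, 1_120_000),
    "ID3": (3_380_000, 491_000, 389_000),
    "Nearest neighbors": (2_370_000, 763_000, 530_000),
    "Genetic": (1_790_000, 10_600_000, 1_840_000),
    "RIPPER": (487_000, 1_650_000, 350_000),
}
ENGINES = ("Google", "Yahoo", "Bing")

def top_by_engine(counts, engine_index, n=3):
    """Return the n algorithm names with the highest count for one engine."""
    ranked = sorted(counts, key=lambda name: counts[name][engine_index], reverse=True)
    return ranked[:n]

for i, engine in enumerate(ENGINES):
    print(engine, top_by_engine(COUNTS, i))
```

On these numbers each engine has a different leader: C4.5 on Google, the genetic algorithm on Yahoo and k-means on Bing.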


When To Use a Genetic Algorithm For a Data Mining Task?

You already have one or more models for your data, but you are not sure whether they are accurate enough for predictive data mining. One way to optimize a predictive model is to use a genetic algorithm (one application of evolutionary computation). According to Wikipedia:

A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms (EA) that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover.

Currently, genetic algorithms find application in bioinformatics, phylogenetics, computational science, engineering, economics, chemistry, manufacturing, mathematics, physics and other fields.

Read the white paper “Using Genetic Algorithms for Parameter Optimization in Building Predictive Data Mining Models”, which frames the search for optimal model-building parameters as an optimization problem and examines how useful genetic algorithms are for solving it. The authors run experiments on several datasets and report empirical results showing that genetic algorithms are applicable to the problem of finding optimal predictive model building parameters.
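To give a feel for what this looks like in practice, here is a rough sketch of a genetic algorithm tuning two model parameters. It is not the method from the white paper. The parameter ranges in `BOUNDS` and the `validation_score` function are stand-ins invented for this example; in a real setting `validation_score` would train a model with the given parameters and return its accuracy on held-out data.

```python
import random

# Hypothetical search ranges for two model-building parameters
BOUNDS = [(1.0, 20.0), (0.0, 1.0)]

def validation_score(params):
    """Stand-in for 'build a model with these parameters and score it'.
    This made-up function peaks at (8.0, 0.3)."""
    depth, rate = params
    return -((depth - 8.0) ** 2) / 100.0 - (rate - 0.3) ** 2

def random_individual(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]

def tournament(population, rng, k=3):
    """Selection: pick k individuals at random and keep the fittest."""
    return max(rng.sample(population, k), key=validation_score)

def crossover(a, b, rng):
    """Uniform crossover: each parameter is inherited from one parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(individual, rng, rate=0.2):
    """Mutation: occasionally nudge a parameter, then clip it to its range."""
    child = []
    for value, (lo, hi) in zip(individual, BOUNDS):
        if rng.random() < rate:
            value += rng.gauss(0.0, 0.1 * (hi - lo))
        child.append(min(hi, max(lo, value)))
    return child

def run_ga(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        elite = max(population, key=validation_score)
        history.append(validation_score(elite))
        children = [elite]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    best = max(population, key=validation_score)
    history.append(validation_score(best))
    return best, history

best, history = run_ga()
print("best parameters:", best)
print("best score:", history[-1])
```

Because the best individual is always carried over to the next generation, the best score can never get worse from one generation to the next. The trade-off is cost: every call to `validation_score` would be a full model build in a real experiment, so population size and number of generations need to be chosen with that in mind.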
