When To Use a Genetic Algorithm For a Data Mining Task?

You already have one or more models for your data, but you are not sure whether they are accurate enough for predictive data mining. One way to optimize a predictive model is to use a genetic algorithm, one of the applications of evolutionary computation. According to Wikipedia:

A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms (EA) that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover.

Currently, genetic algorithms find application in bioinformatics, phylogenetics, computational science, engineering, economics, chemistry, manufacturing, mathematics, physics and other fields.

Read the white paper “Using Genetic Algorithms for Parameter Optimization in Building Predictive Data Mining Models”, which frames the search for optimal model-building parameters as an optimization problem and examines how useful genetic algorithms are for solving it. The authors run experiments on several datasets and report empirical results showing that genetic algorithms are applicable to this problem.
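
To make the idea concrete, here is a minimal sketch of a genetic algorithm searching over two model-building parameters. It is not the method from the white paper: the parameter names (`max_depth`, `min_samples`), their bounds, and the toy `fitness` function are assumptions made for illustration. In a real run, `fitness` would train a model with the given parameters and return its validation score.

```python
import random

# Hypothetical search space: two model-building parameters and their bounds.
BOUNDS = {"max_depth": (1, 20), "min_samples": (1, 50)}


def fitness(params):
    """Toy stand-in for a validation score (higher is better).

    Assumption: the surface peaks at max_depth=8, min_samples=12.
    A real run would train and evaluate a model here instead.
    """
    return -((params["max_depth"] - 8) ** 2) - ((params["min_samples"] - 12) ** 2)


def random_individual(rng):
    return {name: rng.randint(lo, hi) for name, (lo, hi) in BOUNDS.items()}


def tournament(population, rng, k=3):
    # Selection: pick k random individuals and keep the fittest.
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from one parent at random.
    return {name: rng.choice((a[name], b[name])) for name in BOUNDS}


def mutate(individual, rng, rate=0.2):
    # Mutation: nudge a parameter by a small step, clamped to its bounds.
    child = dict(individual)
    for name, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            child[name] = min(hi, max(lo, child[name] + rng.randint(-2, 2)))
    return child


def run_ga(generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Elitism: carry the best individual over unchanged.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
        history.append(max(fitness(ind) for ind in population))
    return max(population, key=fitness), history


best, history = run_ga()
print("best parameters:", best, "score:", fitness(best))
```

Because the best individual is always carried over, the best score never gets worse from one generation to the next. Whether a genetic algorithm beats a simple grid or random search depends on the dataset and on how expensive each model evaluation is.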


Data Mining vs Web Mining

What is the difference between data mining and web mining? One significant factor is the structure of the data being mined. Common data mining applications discover patterns in structured data, such as a database managed by a DBMS. Web mining, by contrast, discovers patterns in less structured data, such as the Internet (WWW). In other words, web mining is data mining techniques applied to the WWW.

Types of Web Mining
There are three basic types of web mining:

1. Web structure mining

This involves using graph theory to analyze the nodes and connections that make up the structure of a website. Depending on the type and nature of the web structure data, it is further divided into two kinds:

Extraction of patterns from hyperlinks on the web: a hyperlink is a structural component that connects a web page to another location.
Mining of the document structure: a tree-like structure is used to analyze and describe the HTML or XHTML tags within a web page.
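
Both kinds of structure mining can be sketched with Python's standard `html.parser` module. The three pages in `PAGES` are made up for illustration, and the list of void tags is deliberately short. The sketch collects each page's outgoing hyperlinks into a link graph, counts incoming links per page, and measures how deeply the tags are nested.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; an intentionally incomplete list.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class StructureParser(HTMLParser):
    """Collects hyperlinks and tracks the nesting depth of the tag tree."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


# Hypothetical three-page site used only for this example.
PAGES = {
    "/index.html": '<html><body><a href="/about.html">About</a>'
                   '<a href="/blog.html">Blog</a></body></html>',
    "/about.html": '<html><body><p><a href="/index.html">Home</a></p></body></html>',
    "/blog.html": '<html><body><div><p><a href="/index.html">Home</a> '
                  '<a href="/about.html">About</a></p></div></body></html>',
}


def analyze(pages):
    """Return (link graph, maximum tag depth) for every page."""
    graph, depths = {}, {}
    for url, html in pages.items():
        parser = StructureParser()
        parser.feed(html)
        graph[url] = parser.links
        depths[url] = parser.max_depth
    return graph, depths


def in_degree(graph):
    """Count how many hyperlinks point at each page."""
    counts = {url: 0 for url in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


graph, depths = analyze(PAGES)
print("link graph:", graph)
print("incoming links:", in_degree(graph))
print("tag depth:", depths)
```

On a real site the same link graph could feed graph-theory measures such as PageRank, while the depth figures hint at how complex each page's markup is.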

2. Web usage mining
In web usage mining, data mining techniques are applied to discover trends and patterns in how visitors browse a website. Navigation patterns are extracted from the traced browsing behavior, and the structure of the website can be designed accordingly. For example, if visitors use a particular feature of the website frequently, you should consider enhancing it and making it more prominent to increase usage and appeal to more users. This kind of mining makes use of web access logs. By understanding how visitors move through the site and how they browse, you can better meet their preferences and needs and make your website more popular.
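
As a rough illustration of mining access logs, the sketch below counts page hits and page-to-page transitions per visitor. The log lines are invented, the regular expression assumes the Common Log Format, and each IP address is treated as one visitor, which is a simplification. Lines are assumed to be in chronological order already.

```python
import re
from collections import Counter, defaultdict

# Assumes Common Log Format: host ident user [time] "METHOD path proto" status size
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+) [^"]*" (\d{3}) \S+'
)

# Invented sample log, used only for this example.
SAMPLE_LOG = """\
10.0.0.1 - - [17/Mar/2010:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024
10.0.0.2 - - [17/Mar/2010:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 1024
10.0.0.1 - - [17/Mar/2010:10:00:10 +0000] "GET /products.html HTTP/1.1" 200 2048
10.0.0.2 - - [17/Mar/2010:10:00:15 +0000] "GET /products.html HTTP/1.1" 200 2048
10.0.0.1 - - [17/Mar/2010:10:00:20 +0000] "GET /cart.html HTTP/1.1" 200 512
10.0.0.2 - - [17/Mar/2010:10:00:25 +0000] "GET /missing.html HTTP/1.1" 404 0
10.0.0.3 - - [17/Mar/2010:10:00:30 +0000] "GET /products.html HTTP/1.1" 200 2048
"""


def mine_usage(log_text):
    """Return (hits per page, transitions between consecutive pages)."""
    hits = Counter()
    paths_by_visitor = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        visitor, path, status = match.groups()
        if status != "200":
            continue  # only successful page views count as usage
        hits[path] += 1
        paths_by_visitor[visitor].append(path)

    transitions = Counter()
    for paths in paths_by_visitor.values():
        # Pair each page with the next one the same visitor requested.
        transitions.update(zip(paths, paths[1:]))
    return hits, transitions


hits, transitions = mine_usage(SAMPLE_LOG)
print("most visited:", hits.most_common())
print("common paths:", transitions.most_common())
```

The most frequent transitions show which navigation paths visitors actually take, which is the kind of evidence you can use when deciding what to make more prominent.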

3. Web content mining
This kind of mining attempts to discover all the hyperlinks in a document so as to generate a structural report on a web page. It gathers information on different facets of the site: whether users are able to find the information they need, whether the structure of the website is too shallow or too deep, whether the elements of the web page are correctly placed, which areas of the website are the most and least visited, and whether that has anything to do with page design. These aspects are then analyzed and evaluated in depth.


Readings on Data Mining for Big Data

Big Data has been an interesting topic in the data mining community lately. As of today (17/3/10), a broad Google search for big data returns about 240,000,000 pages. If you are new to big data, see the Wonder Wheel visualization below to find out which terms are related to it.

Further reading on Big Data can be found in these posts:

1. What is Big Data?

Big Data is the “modern scale” at which we are defining our data usage challenges. Big Data begins at the point where we need to seriously start thinking about the technologies used to drive our information needs. While Big Data as a term seems to refer to volume, this isn’t the case. Many existing technologies have little problem physically handling large volumes (TB or PB) of data. Instead, the Big Data challenges result from the combination of volume and our usage demands on that data. And those usage demands are nearly always tied to timeliness.

Big Data is therefore the push to utilize “modern” volumes of data within “modern” timeframes. The exact definitions are, of course, relative and constantly changing; right now, however, we are somewhere along the path towards the end goal, which is the ability to handle an unlimited volume of data while processing all requests in real time.

2. Big Data Technologies

Some key points on big data technologies are summarized in two extended clips:

Big Data Technologies (1:35 minutes)
Key Technology Dimensions (4:52 minutes)

3. Data Mining of Big Data

The Data Mining Renaissance – Hadoop, an open-source implementation of MapReduce.
Algorithms for Massive Data Set Analysis – algorithmic and statistical methods for large-scale data analysis (course)
Method for fast large scale data mining using logistic regression

4. Current and Future Trends of Big Data

The Pathologies of Big Data – discusses the problems of big data and how to deal with them.
The Future Is Big Data in the Cloud – talks about distributed, non-relational database systems (DNRDBMS) for tackling the “Big Data stack”.
Big Data Is Less About Size, And More About Freedom – the big data trend is about the democratization of large data.
Data Singularity – another way of handling big data!
