What Is the Difference Between Data Mining and Data Warehousing?

The terms data mining and data warehousing are often confused by both business and technical staff. The entire field of data management has experienced phenomenal growth with the spread of data collection software and the falling cost of computer memory. The primary purpose of both functions is to provide the tools and methodologies to explore the patterns and meaning in large amounts of data.

The primary differences between data mining and data warehousing lie in their system design, methodology, and purpose. Data mining is the use of pattern recognition logic to identify trends within a sample data set and extrapolate this information to the larger data pool. Data warehousing is the process of extracting and storing data to allow easier reporting.

Data mining is a general term used to describe a range of business processes that derive patterns from data. Typically, a statistical analysis software package is used to identify specific patterns, based on the data set and queries generated by the end user. Typical uses of data mining include creating targeted marketing programs, identifying financial fraud, and flagging unusual patterns in behavior as part of a security review.

An excellent example of data mining is the process used by telephone companies to market products to existing customers. The telephone company uses data mining software to access its database of customer information. A query is written to identify customers who have subscribed to the basic phone package and the Internet service over a specific time frame. Once this data set is selected, another query is written to determine how many of these customers took advantage of free additional phone features during a trial promotion. The results of this data mining exercise reveal patterns of behavior that can drive or help refine a marketing plan to increase the use of additional telephone services.
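The two-step selection in the telephone example can be sketched in a few lines of Python. This is a minimal illustration, not any carrier's actual software: the `Customer` record, its field names, the `trial_uptake` helper, and the sample rows are all hypothetical. The first filter plays the role of the first query (basic phone plus Internet within a time frame), and the count of trial users plays the role of the second.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical customer record; field names are illustrative only.
@dataclass
class Customer:
    customer_id: int
    region: str
    has_basic_phone: bool
    has_internet: bool
    subscribed_on: date
    used_trial_features: bool


def trial_uptake(customers, start: date, end: date, region: Optional[str] = None):
    """Return (sample size, trial users, uptake rate) for the selected sample."""
    # Query 1: customers with both the basic phone package and Internet service
    # who subscribed within [start, end], optionally limited to one region.
    sample = [
        c for c in customers
        if c.has_basic_phone and c.has_internet
        and start <= c.subscribed_on <= end
        and (region is None or c.region == region)
    ]
    # Query 2: how many of those customers used the free trial features.
    trial_users = sum(1 for c in sample if c.used_trial_features)
    rate = trial_users / len(sample) if sample else 0.0
    return len(sample), trial_users, rate


# Hypothetical sample data.
customers = [
    Customer(1, "north", True, True, date(2023, 1, 10), True),
    Customer(2, "north", True, True, date(2023, 2, 5), False),
    Customer(3, "south", True, True, date(2023, 3, 1), True),
    Customer(4, "south", True, False, date(2023, 3, 15), True),  # no Internet service
    Customer(5, "south", True, True, date(2024, 1, 1), True),    # outside the time frame
]

year = (date(2023, 1, 1), date(2023, 12, 31))
print(trial_uptake(customers, *year))                  # (3, 2, 0.6666666666666666)
print(trial_uptake(customers, *year, region="north"))  # (2, 1, 0.5)
```

Restricting the sample to one region changes the uptake rate, which is exactly why the way the sample set is specified matters so much to the result.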

It is important to note that the primary purpose of data mining is to spot patterns in the data. The specifications used to define the sample set have a huge impact on the relevance of the output and the accuracy of the analysis. Returning to the example above, if the data set is limited to customers within a specific geographical area, the results and patterns will differ from those of a broader data set. Although both data mining and data warehousing work with large volumes of information, the processes used are quite different.

A data warehouse is a software product used to store large volumes of data and run specifically designed queries and reports. Business intelligence is a growing field of study that focuses on data warehousing and related functionality. These tools are designed to extract data and store it in a way that provides enhanced system performance. Much of the terminology in data mining and data warehousing is the same, which adds to the confusion.

Data warehousing tools included in a standard software package can be divided into four primary categories: data extraction, table management, query management, and data integrity. A data warehouse is a repository for large sets of transactional data. The data in the warehouse varies widely, depending on the discipline and the focus of the organization. For example, many scientific research projects collect huge amounts of data for analysis and review. A data warehouse may be the best technology to manage and store this information.

A data warehouse requires a method of adding data to the warehouse. An extract, transform, and load (ETL) tool is typically used for this purpose. The tool is a software program that identifies the appropriate information in another computer system, based on the user’s criteria. This data may need to be normalized or modified for consistency or to match the warehouse database structure. Loading the data is critical: all relationships and connections to other databases must be maintained to ensure the integrity of the database, so that it can be used with other data warehousing tools.
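The three ETL stages can be illustrated with Python's built-in `sqlite3` module standing in for the warehouse. This is a minimal sketch under stated assumptions: the source rows, the `customer_dim` and `order_fact` tables, and the `extract`/`transform`/`load` helpers are hypothetical. A real ETL tool would read from live source systems. The foreign key declaration is what "maintaining relationships" looks like at the database level.

```python
import sqlite3
from datetime import datetime

# Hypothetical rows from an operational source system, with inconsistent formatting.
SOURCE_CUSTOMERS = [
    {"id": "1", "name": "  Ada Lovelace ", "region": "NORTH"},
    {"id": "2", "name": "Alan Turing", "region": "south "},
]
SOURCE_ORDERS = [
    {"order_id": "100", "customer_id": "1", "date": "03/15/2023", "amount": "19.99"},
    {"order_id": "101", "customer_id": "2", "date": "2023-04-01", "amount": "5"},
]


def extract():
    """Extract: read raw rows from the source (here, in-memory lists)."""
    return SOURCE_CUSTOMERS, SOURCE_ORDERS


def parse_date(text):
    """Normalize the two date formats seen in the source to ISO 8601."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def transform(customers, orders):
    """Transform: trim text, normalize case, convert types and units."""
    dim = [(int(c["id"]), c["name"].strip(), c["region"].strip().lower())
           for c in customers]
    fact = [(int(o["order_id"]), int(o["customer_id"]), parse_date(o["date"]),
             round(float(o["amount"]) * 100))  # store money as integer cents
            for o in orders]
    return dim, fact


def load(conn, dim, fact):
    """Load: create warehouse tables and insert rows in one transaction."""
    conn.execute("PRAGMA foreign_keys = ON")  # enforce customer/order relationships
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS customer_dim (
            customer_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            region TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS order_fact (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer_dim(customer_id),
            order_date TEXT NOT NULL,
            amount_cents INTEGER NOT NULL);
    """)
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)", dim)
        conn.executemany("INSERT INTO order_fact VALUES (?, ?, ?, ?)", fact)


conn = sqlite3.connect(":memory:")
load(conn, *transform(*extract()))
print(conn.execute("SELECT * FROM order_fact ORDER BY order_id").fetchall())
```

Loading dimension rows before fact rows, inside a single transaction, means an order that points at a missing customer is rejected and nothing partial is committed.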

Every data warehouse contains a vast number of database tables. These tables are organized to work with each other in a logical, systematic way. Maintaining these tables is essential to the continued operation and accuracy of the data warehouse. Because a data warehouse is built on relational database concepts, these tables must be maintained and validated on a regular basis. Any faults or failures will result in inaccurate reporting.
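Regular validation can be as simple as running integrity queries on a schedule. The sketch below, using `sqlite3` and hypothetical `customer_dim`/`order_fact` tables that were loaded without enforced constraints, shows two common checks: finding "orphan" fact rows whose parent record is missing, and counting missing values in a column. The helper names are illustrative. SQLite also offers `PRAGMA foreign_key_check` when the constraints are declared.

```python
import sqlite3


def find_orphan_orders(conn):
    """Return order_ids whose customer_id has no matching row in customer_dim."""
    rows = conn.execute("""
        SELECT f.order_id
        FROM order_fact AS f
        LEFT JOIN customer_dim AS d ON d.customer_id = f.customer_id
        WHERE d.customer_id IS NULL
        ORDER BY f.order_id
    """).fetchall()
    return [r[0] for r in rows]


def count_nulls(conn, table, column):
    """Count NULLs in one column. table/column must be trusted identifiers,
    never user input, because they are interpolated into the SQL text."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


# Hypothetical tables with one orphan order (customer 3 does not exist)
# and one customer with a missing region.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE order_fact (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER, amount_cents INTEGER);
    INSERT INTO customer_dim VALUES (1, 'north'), (2, NULL);
    INSERT INTO order_fact VALUES (100, 1, 1999), (101, 3, 500);
""")

print(find_orphan_orders(conn))                     # [101]
print(count_nulls(conn, "customer_dim", "region"))  # 1
```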

A query is simply a programmed question or report request. There is an entire business process surrounding the creation of a data warehouse query. This process requires in-depth knowledge and understanding of the business needs, as well as the data structures within the data warehouse. Business intelligence specialists are trained professionals who have the combination of skills and training necessary to create and manage multiple, customized queries.
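To make the idea of a "programmed question" concrete, here is a hedged example of a stored report query: order count and revenue by region and month, with the start date supplied as a parameter. The schema, the `REVENUE_BY_REGION_MONTH` query, the `revenue_report` helper, and the data are hypothetical, and SQLite stands in for a real warehouse engine.

```python
import sqlite3

# Hypothetical stored report: orders and revenue per region per month since a date.
REVENUE_BY_REGION_MONTH = """
    SELECT d.region,
           strftime('%Y-%m', f.order_date) AS month,
           COUNT(*)                        AS orders,
           SUM(f.amount_cents)             AS revenue_cents
    FROM order_fact AS f
    JOIN customer_dim AS d ON d.customer_id = f.customer_id
    WHERE f.order_date >= ?
    GROUP BY d.region, month
    ORDER BY d.region, month
"""


def revenue_report(conn, since):
    """Run the report with a user-supplied start date (ISO 8601 text)."""
    return conn.execute(REVENUE_BY_REGION_MONTH, (since,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE order_fact (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                             order_date TEXT, amount_cents INTEGER);
    INSERT INTO customer_dim VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO order_fact VALUES
        (100, 1, '2023-03-15', 1999),
        (101, 2, '2023-04-01', 500),
        (102, 3, '2023-03-20', 1000),
        (103, 1, '2023-04-10', 250);
""")

for row in revenue_report(conn, "2023-01-01"):
    print(row)
```

Passing the date as a `?` parameter rather than pasting it into the SQL keeps the report reusable and avoids SQL injection when end users supply the criteria.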

Web Mining Tasks

Web mining is the application of data mining techniques to documents on the web. It can be used to study various aspects of a site and to recognize patterns and relationships in user behavior, yielding insight into crucial information. For instance, to improve the accessibility of a website, you need to know which areas need improvement, and web mining can supply that information. Web mining takes into account the IP addresses of the website's visitors, cookies, browser logs, and so on.

Web mining tools examine and evaluate these logs and process them into understandable, meaningful information. For instance, different pieces of information can be analyzed to track the route visitors take as they browse the website. This, in turn, helps you devise ways to make your website more effective.
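As a rough sketch of how a browsing route can be recovered from logs, the code below parses lines in the Common Log Format used by many web servers and groups successful `GET` requests by visitor, ordered by time. Two assumptions are worth stating plainly: the log format is Common Log Format, and the IP address is used as the visitor identifier. The latter is a simplification, since several people can share one IP, and cookies or session IDs are more precise. The `visitor_routes` helper and the sample log are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def visitor_routes(lines):
    """Group successful GET requests by IP address, ordered by time."""
    hits = defaultdict(list)
    for line in lines:
        m = CLF.match(line)
        if not m or m["method"] != "GET" or int(m["status"]) >= 400:
            continue  # skip malformed lines, non-GET requests, and errors
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        hits[m["ip"]].append((ts, m["path"]))
    return {ip: [path for _, path in sorted(visits)] for ip, visits in hits.items()}


# Hypothetical access log excerpt.
LOG = """\
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120
198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /pricing HTTP/1.1" 200 2048
203.0.113.5 - - [10/Oct/2023:13:56:02 +0000] "GET /products HTTP/1.1" 200 3072
203.0.113.5 - - [10/Oct/2023:13:57:15 +0000] "GET /checkout HTTP/1.1" 200 1024
198.51.100.7 - - [10/Oct/2023:13:58:00 +0000] "GET /missing HTTP/1.1" 404 512
"""

print(visitor_routes(LOG.splitlines()))
```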

The overall web mining process involves extracting information from the web using conventional data mining techniques and applying the results to the website's features.

Types of Web Mining
As mentioned earlier, web mining helps discover information, find related documents and data, and identify trends and patterns that confirm the effectiveness of web resources. There are three basic types of web mining:

Web structure mining
Web structure mining uses graph theory to analyze the connections and node structure of a website. Depending on the type and nature of the web structure data, it is further divided into two kinds:

Extraction of patterns from hyperlinks on the web: A hyperlink is a structural element that connects a web page to another location.
Mining of the document structure: A tree-like structure is used to analyze and describe the HTML or XHTML tags in a web page.
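Both kinds of structure mining can be sketched with Python's standard `html.parser` module. The parser below collects hyperlinks (the edges of the site graph) and tracks how deeply tags are nested (a simple property of the document tree). Across several pages, counting incoming links gives each page's in-degree, a basic graph-theory measure of how connected it is. The `StructureParser` class, the `analyze` helper, and the sample pages are hypothetical, and the depth count assumes reasonably well-formed HTML: unclosed tags such as `<p>` or `<li>` would inflate it.

```python
from collections import Counter
from html.parser import HTMLParser

# HTML elements that never have a closing tag, so they do not add nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class StructureParser(HTMLParser):
    """Collect hyperlinks and the maximum tag-nesting depth of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in VOID:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth > 0:
            self.depth -= 1


def analyze(pages):
    """Return (in-degree per linked page, max tree depth per page)."""
    in_degree = Counter()
    depths = {}
    for url, html in pages.items():
        parser = StructureParser()
        parser.feed(html)
        parser.close()
        depths[url] = parser.max_depth
        for target in set(parser.links):  # count each linking page once
            in_degree[target] += 1
    return in_degree, depths


# Hypothetical three-page site.
PAGES = {
    "/": '<html><body><nav><a href="/products">Products</a> '
         '<a href="/about">About</a></nav></body></html>',
    "/products": '<html><body><div><ul><li><a href="/">Home</a></li>'
                 '<li><a href="/about">About</a></li></ul></div></body></html>',
    "/about": '<html><body><p>Contact us <a href="/">home</a></p></body></html>',
}

in_degree, depths = analyze(PAGES)
print(in_degree.most_common())
print(depths)
```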
Web Usage mining process
In web usage mining, data mining techniques are applied to discover trends and patterns in how visitors browse the website. Navigation patterns are extracted so that browsing paths can be traced and the structure of the website designed accordingly. For example, if visitors use a particular feature of the website frequently, you should look to enhance it and make it more prominent, increasing its usage and its appeal to users. This kind of mining relies on web access logs. By understanding how visitors move through the site and how they browse, you can better meet their needs and preferences and make your website more popular.
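Once each visitor's route has been reconstructed from the logs, two simple usage patterns fall out of plain counting: which pages are visited most, and which page-to-page transitions are most common. The sketch below uses `collections.Counter` on hypothetical routes. The `usage_patterns` helper is illustrative, and real usage-mining tools typically go further (for example, sequential pattern mining over whole sessions).

```python
from collections import Counter


def usage_patterns(routes):
    """Count page hits and consecutive page-to-page transitions across visitors."""
    page_hits = Counter()
    transitions = Counter()
    for path in routes.values():
        page_hits.update(path)
        transitions.update(zip(path, path[1:]))  # pairs of consecutive pages
    return page_hits, transitions


# Hypothetical routes, one list of pages per visitor, in visit order.
routes = {
    "visitor-a": ["/", "/products", "/checkout"],
    "visitor-b": ["/", "/products", "/pricing"],
    "visitor-c": ["/blog", "/products", "/checkout"],
}

page_hits, transitions = usage_patterns(routes)
print(page_hits.most_common(1))  # [('/products', 3)]
print(transitions.most_common())
```

A page that appears in most routes, or a transition that many visitors follow, is a candidate for the kind of enhancement or prominence described above.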

Web Content Mining
This kind of mining attempts to discover all the hyperlinks in a document in order to generate a structural report on a web page. It examines several facets, for instance: whether users are able to find the information they need, whether the structure of the website is too shallow or too deep, whether the elements of the web page are correctly placed, which areas of the website are visited most and least, and whether these patterns are related to page design. These factors are then analyzed and evaluated in depth.
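One way to test whether a site's structure is "too deep" is to compute each page's click depth: the minimum number of links needed to reach it from the home page. A breadth-first search over the link graph gives this directly, and any page the search never reaches is unreachable by navigation. The link graph, the `click_depths` helper, and the `MAX_DEPTH` cutoff of 3 below are hypothetical choices for illustration only.

```python
from collections import deque


def click_depths(links, start="/"):
    """Breadth-first search: minimum number of clicks from start to each page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit is the shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical link graph: page -> pages it links to.
SITE = {
    "/": ["/products", "/about"],
    "/products": ["/products/phones"],
    "/products/phones": ["/products/phones/model-x"],
    "/products/phones/model-x": ["/products/phones/model-x/specs"],
    "/about": ["/"],
    "/orphan-page": ["/"],  # links out, but nothing links to it
}
MAX_DEPTH = 3  # assumed threshold for "too deep"

depths = click_depths(SITE)
all_pages = set(SITE) | {t for targets in SITE.values() for t in targets}
unreachable = sorted(all_pages - set(depths))
too_deep = sorted(p for p, d in depths.items() if d > MAX_DEPTH)
print(unreachable)  # ['/orphan-page']
print(too_deep)     # ['/products/phones/model-x/specs']
```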

Data Mining vs Screen-Scraping

Data mining isn’t screen-scraping. I know that some people in the room may disagree with that statement, but they’re actually two almost completely different concepts. In a nutshell, you might state it this way: screen-scraping allows you to get information, where data mining allows you to analyze information. That’s a pretty big simplification, so I’ll elaborate a bit.

The term “screen-scraping” comes from the old mainframe terminal days where people worked on computers with green and black screens containing only text. Screen-scraping was used to extract characters from the screens so that they could be analyzed. Fast-forwarding to the web world of today, screen-scraping now most commonly refers to extracting information from web sites. That is, computer programs can “crawl” or “spider” through web sites, pulling out data. People often do this to build things like comparison shopping engines, archive web pages, or simply download text to a spreadsheet so that it can be filtered and analyzed.
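To show what "getting information" looks like in practice, here is a minimal screen-scraping sketch using Python's standard `html.parser` and `csv` modules. It pulls product names and prices out of a page and writes them as spreadsheet-ready CSV. The markup (`<span class="name">` and `<span class="price">`), the `ProductScraper` class, and the `to_csv` helper are hypothetical; real sites each need their own extraction rules. The HTML is a hard-coded string here, whereas a crawler would first download it (for example with `urllib.request.urlopen`). Note that nothing here analyzes the data; it only collects it.

```python
import csv
import io
from html.parser import HTMLParser


class ProductScraper(HTMLParser):
    """Collect (name, price) pairs from hypothetical <span class="name"/"price"> markup."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None   # which field we are currently inside, if any
        self._current = {}   # fields collected for the current product

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and ("name" in classes or "price" in classes):
            self._field = "name" if "name" in classes else "price"

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            self._field = None
            if "name" in self._current and "price" in self._current:
                self.rows.append((self._current["name"].strip(),
                                  self._current["price"].strip()))
                self._current = {}


def to_csv(rows):
    """Write the scraped rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "price"])
    writer.writerows(rows)
    return buf.getvalue()


PAGE = """
<ul>
  <li><span class="name">Basic Phone</span> <span class="price">$19.99</span></li>
  <li><span class="name">Fiber Internet</span> <span class="price">$49.00</span></li>
</ul>
"""

scraper = ProductScraper()
scraper.feed(PAGE)
scraper.close()
csv_text = to_csv(scraper.rows)
print(csv_text)
```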

Data mining, on the other hand, is defined by Wikipedia as the “practice of automatically searching large stores of data for patterns.” In other words, you already have the data, and you’re now analyzing it to learn useful things about it. Data mining often involves lots of complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place. In data mining you only care about analyzing what’s already there.
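By contrast, a data mining step starts from data you already hold and looks for patterns in it. As one small, deliberately simple example of a statistical method, the sketch below flags values whose z-score (distance from the mean, measured in standard deviations) exceeds a cutoff. The `zscore_outliers` helper, the sample numbers, and the cutoff of 2.0 are assumptions for illustration; production data mining uses far richer algorithms.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return values whose |z-score| exceeds threshold (an assumed cutoff)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily order counts that were already collected.
daily_orders = [10, 11, 9, 10, 12, 10, 48]
print(zscore_outliers(daily_orders))  # [48]
```

How the numbers were gathered, whether scraped, exported, or pulled from a warehouse, makes no difference to this step.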

The difficulty is that people who don’t know the term “screen-scraping” will try Googling for anything that resembles it. We include a number of these terms on our web site to help such folks; for example, we created pages entitled Text Data Mining, Automated Data Collection, Web Site Data Extraction, and even Web Site Ripper (I suppose “scraping” is sort of like “ripping”). So it presents a bit of a problem–we don’t necessarily want to perpetuate a misconception (i.e., screen-scraping = data mining), but we also have to use terminology that people will actually use.
