My Data Mining Resources

A Compilation of the Web's Best Resources for Data Mining Geeks. Specializing in algorithms, predictive analytics, business intelligence, data visualization, software, and tools.

What Is the Difference Between Data Mining and Data Warehousing?

The terms data mining and data warehousing are often confused by both business and technical staff. The entire field of data management has experienced phenomenal growth with the implementation of data collection software programs and the decreased cost of computer memory. The primary purpose behind both functions is to provide the tools and methodologies to explore the patterns and meaning in large amounts of data.

The primary differences between data mining and data warehousing are the system designs, the methodology used, and the purpose. Data mining is the use of pattern recognition logic to identify trends within a sample data set and extrapolate this information against the larger data pool. Data warehousing is the process of extracting and storing data to allow easier reporting.

Data mining is a general term used to describe a range of business processes that derive patterns from data. Typically, a statistical analysis software package is used to identify specific patterns, based on the data set and queries generated by the end user. Typical uses of data mining are to create targeted marketing programs, identify financial fraud, and flag unusual patterns in behavior as part of a security review.

An excellent example of data mining is the process used by telephone companies to market products to existing customers. The telephone company uses data mining software to access its database of customer information. A query is written to identify customers who have subscribed to the basic phone package and the Internet service over a specific time frame. Once this data set is selected, another query is written to determine how many of these customers took advantage of free additional phone features during a trial promotion. The results of this data mining exercise reveal patterns of behavior that can drive or help refine a marketing plan to increase the use of additional telephone services.
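The two-step query described above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration: the table names (subscriptions, trial_usage), the column names, the product labels, and the sample rows are all assumptions made up for this example, not the schema of any real telephone company.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up customer records (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subscriptions (customer_id INTEGER, product TEXT, signup_date TEXT);
        CREATE TABLE trial_usage (customer_id INTEGER, feature TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO subscriptions VALUES (?, ?, ?)",
        [
            (1, "basic_phone", "2010-01-05"), (1, "internet", "2010-01-20"),
            (2, "basic_phone", "2010-02-01"), (2, "internet", "2010-02-03"),
            (3, "basic_phone", "2010-01-15"),  # phone only, no Internet
            (4, "basic_phone", "2009-06-01"), (4, "internet", "2009-06-01"),  # outside the window
        ],
    )
    conn.executemany(
        "INSERT INTO trial_usage VALUES (?, ?)",
        [(1, "call_waiting"), (1, "voicemail"), (3, "voicemail"), (4, "caller_id")],
    )
    return conn


def trial_uptake(conn, start, end):
    """Return (bundle customers, bundle customers who used a trial feature)."""
    # Query 1: customers who subscribed to BOTH the basic phone package and
    # the Internet service within the given time frame.
    bundle_ids = [
        row[0]
        for row in conn.execute(
            """
            SELECT customer_id
            FROM subscriptions
            WHERE signup_date BETWEEN ? AND ?
              AND product IN ('basic_phone', 'internet')
            GROUP BY customer_id
            HAVING COUNT(DISTINCT product) = 2
            """,
            (start, end),
        )
    ]
    if not bundle_ids:
        return 0, 0

    # Query 2: how many of those customers used a free trial feature.
    placeholders = ",".join("?" * len(bundle_ids))
    (used,) = conn.execute(
        "SELECT COUNT(DISTINCT customer_id) FROM trial_usage "
        f"WHERE customer_id IN ({placeholders})",
        bundle_ids,
    ).fetchone()
    return len(bundle_ids), used


if __name__ == "__main__":
    total, used = trial_uptake(build_demo_db(), "2010-01-01", "2010-03-31")
    print(f"{used} of {total} bundle customers used a trial feature")
```

A real data mining package would go further and look for patterns in who took up the offer, but the selection steps follow the same shape: define the sample set first, then measure behavior within it.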

It is important to note that the primary purpose of data mining is to spot patterns in the data. The specifications used to define the sample set have a huge impact on the relevance of the output and the accuracy of the analysis. Returning to the example above, if the data set is limited to customers within a specific geographical area, the results and patterns will differ from those of a broader data set. Although both data mining and data warehousing work with large volumes of information, the processes used are quite different.

A data warehouse is a software product that is used to store large volumes of data and run specifically designed queries and reports. Business intelligence is a growing field of study that focuses on data warehousing and related functionality. These tools are designed to extract data and store it in a way that improves system performance. Much of the terminology in data mining and data warehousing is the same, which adds to the confusion.

Data warehousing tools included in a standard software package can be divided into four primary categories: data extraction, table management, query management, and data integrity. A data warehouse is a repository for large sets of transactional data. The data in the warehouse varies widely, depending on the discipline and the focus of the organization. For example, many scientific research projects collect huge amounts of data for analysis and review. A data warehouse may be the best technology to manage and store this information.

A data warehouse requires a method of adding data to the warehouse. An extract, transform, and load (ETL) tool is typically used for this purpose. The tool itself is a software program used to correctly identify the appropriate information from another computer system, based on the user’s criteria. This data may need to be normalized or modified for consistency or to match the warehouse database structure. Loading the data is critical, as all the relationships and connections to other databases must be maintained to ensure the integrity of the database, so it can be used with other data warehousing tools.
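The three ETL stages can be illustrated with a minimal Python sketch. It is not how any particular ETL product works: the record fields (id, name, amount, region), the sales table, and the function names are hypothetical, and the "source system" is simply a list of dictionaries.

```python
import sqlite3


def extract(source_rows, region):
    """Extract: keep only the source records that match the user's criteria."""
    return [row for row in source_rows if row["region"] == region]


def transform(rows):
    """Transform: normalize names and amounts to match the warehouse table."""
    return [
        (row["id"], row["name"].strip().title(), round(float(row["amount"]), 2))
        for row in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the (hypothetical) sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with conn:  # commit on success, roll back if an insert fails
        conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)


def run_etl(conn, source_rows, region):
    """Run the three stages in order for one region."""
    load(conn, transform(extract(source_rows, region)))


if __name__ == "__main__":
    # Made-up source records with inconsistent formatting.
    source = [
        {"id": 1, "name": "  alice SMITH ", "amount": "19.999", "region": "east"},
        {"id": 2, "name": "bob jones", "amount": "5", "region": "west"},
        {"id": 3, "name": "CAROL LEE", "amount": "7.5", "region": "east"},
    ]
    warehouse = sqlite3.connect(":memory:")
    run_etl(warehouse, source, "east")
    print(warehouse.execute("SELECT * FROM sales ORDER BY id").fetchall())
```

Keeping the stages as separate functions mirrors the way the text describes them: selection by criteria, normalization, and then a load that either completes or is rolled back.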

Every data warehouse contains a vast number of database tables. These tables are organized to work with each other in a logical, systematic way. The maintenance of these tables is essential to the continuing operation and accuracy of the data warehouse. Using the concept of relational databases, these tables must be maintained and validated on a regular basis. Any faults or failures will result in inaccurate reporting.
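One simple form of table validation is checking that every row in one table still points at an existing row in a related table. The Python sketch below shows the idea with sqlite3; the customers and orders tables, their columns, and the sample rows are invented for illustration, and a production warehouse would normally rely on its own integrity tooling.

```python
import sqlite3


def find_orphans(conn, child, fk, parent, pk):
    """Return foreign-key values in `child` that have no matching row in `parent`.

    Table and column names are inserted directly into the SQL text, so they
    must come from trusted configuration, never from user input.
    """
    query = (
        f"SELECT c.{fk} FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON c.{fk} = p.{pk} "
        f"WHERE p.{pk} IS NULL ORDER BY c.{fk}"
    )
    return [row[0] for row in conn.execute(query)]


def build_demo_warehouse():
    """Create two related tables (hypothetical schema) with one broken reference."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Raj');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 99, 15.0);
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_warehouse()
    # Order 12 refers to customer 99, which does not exist.
    print(find_orphans(conn, "orders", "customer_id", "customers", "id"))
```

A report that joins orders to customers would silently drop or miscount the orphaned order, which is the kind of inaccurate reporting that regular validation is meant to catch.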

A query is simply a programmed question or report request. There is an entire business process surrounding the creation of a data warehouse query. This process requires in-depth knowledge and understanding of the business needs, as well as the data structures within the data warehouse. Business intelligence specialists are trained professionals who have the combination of skills and training necessary to create and manage multiple, customized queries.

source: http://www.wisegeek.com/what-is-the-difference-between-data-mining-and-data-warehousing.htm

3 Responses to “What Is the Difference Between Data Mining and Data Warehousing?”

  1. [...] data warehouse trend is on the rise. For the beginner, you can read my previous post about “Data Mining vs Data Warehouse” to get an idea between their [...]

  2. Understanding the differences between data mining & data warehouse
    Data Mining:
    Sifting through very large amounts of data for useful information. Data mining uses artificial intelligence techniques, neural networks, and advanced statistical tools to reveal trends, patterns, and relationships that might otherwise have remained undetected. In contrast to an expert system, data mining attempts to discover hidden rules underlying the data. Also called data surfing.

    Data mining parameters or tools include:
    • Sequence or path analysis – looking for patterns where one event leads to another later event
    • Classification – looking for new patterns (may result in a change in the way the data is organized)
    • Clustering – finding and visually documenting groups of facts not previously known
    • Forecasting – discovering patterns in data that can lead to reasonable predictions about the future (this area of data mining is known as predictive analytics)

    Data mining techniques are used in many research areas, including mathematics, cybernetics, genetics and marketing. Web mining, a type of data mining used in customer relationship management (CRM), takes advantage of the huge amount of information gathered by a Web site to look for patterns in user behavior.
    Data Warehousing:
    The process of assembling and managing data from various sources for the purpose of gaining a single view of an enterprise. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the data warehouse (DW) for reporting.
    A data warehouse maintains its functions in three layers:
    The staging layer: is used to store raw data for use by developers.
    The integration layer: is used to integrate data and to have a level of abstraction from users.
    The access layer: is for getting data out for users.
    Data warehouses can be subdivided into data marts. Data marts store subsets of data from a warehouse.
    A data warehouse focuses on data storage. Data from the main sources is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition of data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.


  3. John

    Thanks for sharing your knowledge. Excellent explanation, clear and concise.
