What You Should Know About Predictive Analytics

What is the relation between data mining and predictive analytics? Well, if you know nothing about analytics, read my previous post on the definition of analytics. According to Wikipedia, predictive analytics encompasses a variety of techniques from statistics, data mining and game theory that analyze current and historical facts to make predictions about future events. Meanwhile, SearchCRM.com defines predictive analytics as a branch of data mining concerned with the prediction of future probabilities and trends. The central element of predictive analytics is the predictor, a variable that can be measured for an individual or other entity to predict future behavior. For example, an insurance company is likely to take into account potential driving safety predictors such as age, gender, and driving record when issuing car insurance policies.

Multiple predictors are combined into a predictive model, which, when subjected to analysis, can be used to forecast future probabilities with an acceptable level of reliability. In predictive modeling, data is collected, a statistical model is formulated, predictions are made and the model is validated (or revised) as additional data becomes available.

In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions.

One of the most well-known predictive analytic applications is credit scoring, which is used throughout financial services. Scoring models process a customer’s credit history, loan application, customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time.

Predictive Analytics Techniques

Predictive analytics techniques range from the basic to the complex, including:

Data profiling and transformations are functions that analyze row and column attributes and dependencies, change data formats, merge fields, aggregate records, and join rows and columns.
Sequential pattern analysis discovers relationships between rows of data. It identifies frequently observed sequential occurrences of items across ordered transactions over time. Such a frequently observed sequence of items (called a sequential pattern) must satisfy a user-specified minimum support. Understanding long-term customer purchase behavior is an example of sequential pattern analysis; others include customer shopping sequences, click-stream sessions, and telephone calling patterns.
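The minimum-support test described above can be sketched in a few lines. This is a simplified illustration, not a full sequential-pattern miner: the purchase histories and the pattern are invented, and a pattern counts as present if its items occur in order, not necessarily consecutively.

```python
# Sequential pattern sketch: how often does an ordered item sequence
# appear across customers' purchase histories, and does it meet a
# user-specified minimum support? All data below is invented.

def contains_sequence(history, pattern):
    """True if pattern occurs in history in order (gaps allowed)."""
    it = iter(history)
    # Each membership test consumes the iterator up to the matched item,
    # so later pattern items must appear after earlier ones.
    return all(item in it for item in pattern)

histories = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
]

pattern = ["phone", "charger"]
support = sum(contains_sequence(h, pattern) for h in histories) / len(histories)
print(support >= 0.5)  # does the pattern meet a 50% minimum support?
```

Here the pattern appears in 2 of 3 histories (support about 0.67), so it passes a 50% threshold.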
Time series tracking follows metrics that represent key behaviors or business strategies. A time series is an ordered sequence of values of a variable at equally spaced time intervals. Time series analysis accounts for the internal structure that data points taken over time may have, such as autocorrelation, trend, or seasonal variation. Examples include patterns in customer sales that indicate product satisfaction and buying habits, budgetary analysis, stock market analysis, census analysis, and workforce projections.
Time series forecasting predicts the future value of a measure based on past values. Time series forecasting uses a model to forecast future events based on known past events. Examples include stock prices and sales revenue.
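A minimal way to forecast "future values from past values" is simple exponential smoothing. The sketch below is illustrative only: the monthly sales figures are made up, and the smoothing factor alpha = 0.5 is an arbitrary choice, not a recommendation.

```python
# Simple exponential smoothing: a minimal time series forecasting sketch.
# The data and the smoothing factor are invented for illustration.

def exp_smooth_forecast(values, alpha=0.5):
    """One-step-ahead forecast: blend each new observation with history."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

monthly_sales = [100, 110, 105, 120, 125]
print(exp_smooth_forecast(monthly_sales))  # 118.75
```

Larger alpha weights recent observations more heavily; real forecasting tools also model trend and seasonality, which this sketch ignores.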
Data profiling and transformation uses functions that analyze row and column attributes and dependencies, change data formats, merge fields, aggregate records, and join rows and columns.
Bayesian analytics captures the concepts used in probability forecasting. It is a statistical procedure that estimates parameters of an underlying distribution based on the observed distribution. An example is a juror in a court setting coherently accumulating the evidence for and against the guilt of the defendant, to see whether, in totality, it meets their threshold for 'beyond a reasonable doubt'.
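The juror example above is just Bayes' theorem applied repeatedly. The sketch below shows a single update; the prior and likelihoods are invented numbers for illustration.

```python
# Bayes' rule sketch: updating belief in a hypothesis after evidence.
# All probabilities below are made up for illustration.

def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior P(H | evidence) via Bayes' theorem."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence

# Juror example: start neutral (prior 0.5); the evidence is four times
# more likely if the defendant is guilty than if innocent.
posterior = bayes_update(prior=0.5,
                         p_evidence_given_h=0.8,
                         p_evidence_given_not_h=0.2)
print(round(posterior, 3))  # 0.8
```

Each new piece of evidence would feed the posterior back in as the next prior, which is how belief accumulates "in totality".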
Regression analysis is a statistical tool for the investigation of relationships between variables. Usually, the investigator seeks to ascertain the causal effect of one variable upon another: the effect of a price increase upon demand, for example, or the effect of changes in the money supply upon the inflation rate.
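The price-versus-demand example can be sketched with ordinary least squares. The data points below are invented and deliberately lie on a line so the fit is exact; real data would be noisy.

```python
# Simple linear regression (ordinary least squares): a sketch of how
# regression quantifies a relationship, e.g. price vs. demand.
# The data points are invented for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

prices = [1.0, 1.5, 2.0, 2.5, 3.0]
demand = [100, 90, 80, 70, 60]   # demand falls as price rises
slope, intercept = fit_line(prices, demand)
print(slope, intercept)  # -20.0 120.0
```

The negative slope quantifies the effect the paragraph describes: each unit of price increase is associated with 20 fewer units demanded, under this toy model.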
Classification uses attributes in data to assign an object to a predefined class or to predict the value of a numeric variable of interest. Examples include credit risk analysis and likelihood to purchase, as well as acquisition, cross-sell, attrition, credit scoring, and collections.
Clustering or segmentation separates data into homogeneous subgroups based on attributes. Clustering assigns a set of observations into subsets (clusters) so that observations in the same cluster are similar. An example is customer demographic segmentation.
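The demographic segmentation example can be sketched with k-means, the classic clustering algorithm. This is a simplified 1-D version with invented customer ages and hand-picked starting centroids so the run is reproducible; real tools work in many dimensions and choose initial centroids automatically.

```python
# Minimal k-means clustering sketch on 1-D data (invented customer ages).
# Initial centroids are fixed for reproducibility.

def kmeans_1d(points, centroids, iters=10):
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # Step 1: assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Step 2: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

ages = [22, 25, 27, 48, 52, 55]
centroids, clusters = kmeans_1d(ages, centroids=[20, 60])
print([round(c, 1) for c in centroids])  # [24.7, 51.7]
```

The two centroids settle on the average age of each segment, splitting the customers into a "younger" and an "older" group, which is exactly the homogeneous-subgroup idea in the paragraph above.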
Dependency or association analysis describes significant associations between data items. An example is market basket analysis. Market basket analysis is a modeling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items.
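The market basket idea reduces to two numbers per rule: support (how often the items co-occur) and confidence (how often the consequent follows the antecedent). The baskets below are invented for illustration.

```python
# Market basket sketch: support and confidence for one association rule
# ("bread -> butter"). The transactions are invented.

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of baskets containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Of the baskets with the antecedent, how many also have the consequent?"""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))       # 0.5: half the baskets have both
print(confidence({"bread"}, {"butter"}))  # 2 of 3 bread baskets add butter
```

A real miner such as Apriori searches all itemsets above a support threshold rather than scoring one hand-picked rule.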
Simulation models a system structure to estimate the impact of management decisions or changes. A simulation model's behavior changes with each run according to the initial parameters assumed for the environment. Examples include inventory reorder policies, currency hedging, and military training.
Optimization models a system structure in terms of constraints to find the best possible solution. Optimization models form part of a larger system that people use to help them make decisions: the user can influence the solutions the model produces and review them before making a final decision. Examples include scheduling shift workers, routing train cargo, and pricing airline seats.
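The airline-seat example can be sketched as a tiny constrained optimization: pick the price that maximizes revenue given limited capacity. The linear demand curve and all numbers below are invented assumptions, and the search is brute force; real optimizers use linear or integer programming.

```python
# Optimization sketch: brute-force search for the revenue-maximizing
# seat price under a capacity constraint. The demand model is invented.

def demand(price):
    """Assumed linear demand curve: higher price, fewer buyers."""
    return max(0, 200 - 2 * price)

capacity = 120  # seats available (the constraint)

def revenue(price):
    return price * min(demand(price), capacity)

best_price = max(range(40, 101), key=revenue)
print(best_price, revenue(best_price))  # 50 5000
```

Under these assumptions the unconstrained revenue peak happens to respect capacity; if demand at the best price exceeded the 120 seats, the `min` would cap sales and shift the optimum upward.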


Video Compilation of Practical Data Mining Techniques

Nowadays, it is very difficult for a data mining beginner to find free video tutorials on data mining techniques in a single place. For that reason, I have created a new page on my site to showcase a compilation of videos on practical data mining techniques using well-known software/tools available in the market today. The videos are categorized by data mining technique: classification, clustering, regression, neural networks, etc. I will try to update the list every week as new videos become available on the web. If you know other sources of video tutorials using software/tools not listed here, feel free to add them in the comment section.
For an overview of data mining techniques, visit here.

1. Classification
Software: Rapidminer 5.0
This video shows how to import training and prediction data, add a classification learner, and apply the model.
Application: building a Gold Classification trend model.

Software: Rapidminer 5.0
This video shows how to use Rapidminer to create a decision tree to help us find “sweet spots” in a particular market segment. This video tutorial uses the Rapidminer direct mail marketing data generator and a split validation operator to build the decision tree.
Application: creating Decision Trees for Market Segmentation.

Software: STATISTICA
This video shows how to use CHAID decision trees to classify good and bad credit risk. CHAID decision trees are particularly well suited for large data sets and often find application in marketing segmentation. This session discusses the analysis options in STATISTICA and reviews CHAID output, including the decision trees and performance indices.

2. Clustering
Software: STATISTICA
This video shows how to use clustering tools available in STATISTICA Data Miner and demonstrates the K-means clustering tool as well as the Kohonen network clustering tool.
Application: Clustering tools are beneficial when you want to find structure or clusters in data but don't necessarily have a target variable. Clustering is often used in marketing research applications, as well as many others.

Software: WEKA
This video shows how to use the clustering algorithms available in WEKA and demonstrates the K-means clustering tool.
Application: building a cluster model of bank customers based on mortgage applications.

3. Neural Networks (NN)
– Neural networks are sophisticated modeling tools, capable of modeling very complex relationships in data.
Software: WEKA
This video shows how to use neural network functions available in WEKA to classify real weather data.
Application: building weather prediction model

4. Regression

This video explores the application of neural networks to a regression problem using STATISTICA Automated Neural Networks. The options used for regression are similar to those for other neural network applications such as classification and time series. The episode explores various analysis options and demonstrates working with neural network output.

This video uses regression data from beverage manufacturing to explore C&RT as well as the other tree algorithms. The options and parameters are reviewed, along with the important output.

5. Evolutionary/Genetic Algorithms
Software: Rapidminer 5.0
This video highlights the data generation capabilities of Rapidminer 5.0 if you want to tinker around, and shows how to use a Genetic Optimization data pre-processor within a nested experiment.

Software: Rapidminer 5.0
This video discusses some of the parameters that are available in the Genetic Algorithm data transformers to select the best attributes in the data set. We also replace the first operator with another Genetic Algorithm data transformer that allows us to manipulate population size, mutation rate, and change the selection schemes (tournament, roulette, etc).

Software: Rapidminer 5.0
This tutorial highlights Rapidminer’s weighting operator using an evolutionary approach. We use financial data to preweight inputs before we feed them into a neural network model to try to better classify a gold trend.

For more information about data mining, click here.
