What is the relation between Data Mining and Predictive Analytics? If analytics is new to you, start with my previous post on the definition of analytics. According to Wikipedia, predictive analytics encompasses a variety of techniques from statistics, data mining and game theory that analyze current and historical facts to make predictions about future events. SearchCRM.com, meanwhile, defines predictive analytics as a branch of data mining concerned with the prediction of future probabilities and trends. The central element of predictive analytics is the predictor, a variable that can be measured for an individual or other entity to predict future behavior. For example, an insurance company is likely to take into account potential driving safety predictors such as age, gender, and driving record when issuing car insurance policies.
Multiple predictors are combined into a predictive model, which, when subjected to analysis, can be used to forecast future probabilities with an acceptable level of reliability. In predictive modeling, data is collected, a statistical model is formulated, predictions are made and the model is validated (or revised) as additional data becomes available.
In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions.
One of the best-known predictive analytics applications is credit scoring, which is used throughout financial services. Scoring models process a customer’s credit history, loan application, customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time.
Predictive Analytics Techniques
Predictive analytics techniques range from basic to more complex, and include:
Data profiling and transformations are functions that analyze row and column attributes and dependencies, change data formats, merge fields, aggregate records, and join rows and columns.
Sequential pattern analysis discovers relationships between rows of data. It identifies sequences of items that occur frequently across ordered transactions over time. Such a frequently observed sequence (called a sequential pattern) must satisfy a user-specified minimum support, that is, it must appear in at least a given share of the transaction sequences. Understanding long-term customer purchase behavior is one example of sequential pattern analysis. Other examples include customer shopping sequences, click-stream sessions, and telephone calling patterns.
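To make the idea of minimum support concrete, here is a minimal sketch that counts ordered pairs of items across a handful of purchase histories. The function name `frequent_ordered_pairs` and the sample data are made up for illustration, and real sequential pattern miners handle longer patterns and time windows, which this sketch does not.

```python
from collections import Counter
from itertools import combinations

def frequent_ordered_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a occurring before b, whose share of
    sequences containing them is at least min_support (a fraction in 0..1)."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves position order, so a always precedes b.
        # A set ensures each pair is counted once per sequence.
        pairs = {(a, b) for a, b in combinations(seq, 2) if a != b}
        counts.update(pairs)
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical purchase histories, each ordered by time
histories = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]
patterns = frequent_ordered_pairs(histories, min_support=0.5)
print(patterns)
```

With a minimum support of 0.5, only the pairs that appear in at least two of the four histories survive.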
Time series tracking follows metrics that represent key behaviors or business strategies. A time series is an ordered sequence of values of a variable at equally spaced time intervals. Time series analysis accounts for the fact that data points taken over time may have an internal structure, such as autocorrelation, trend or seasonal variation. Examples include tracking customer sales patterns that indicate product satisfaction and buying habits, budgetary analysis, stock market analysis, census analysis, and workforce projections.
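Autocorrelation, one of the internal structures mentioned above, can be measured directly. The sketch below computes the sample autocorrelation at a given lag using the common textbook estimator; the function name and the toy series are assumptions for illustration only.

```python
def autocorrelation(values, lag=1):
    """Sample autocorrelation of an equally spaced series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    # Covariance between the series and itself shifted by `lag` steps
    numerator = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    )
    # Total variation of the series around its mean
    denominator = sum((v - mean) ** 2 for v in values)
    return numerator / denominator

# Hypothetical weekly metric with a steady upward trend
series = [1, 2, 3, 4, 5]
r1 = autocorrelation(series, lag=1)
print(round(r1, 2))  # 0.4
```

A value well above zero suggests that neighboring observations move together, which is a reason to use time series methods rather than treating the points as independent.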
Time series forecasting uses a model to predict the future value of a measure based on its known past values. Examples include stock prices and sales revenue.
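As one simple illustration of forecasting from past values, the sketch below applies simple exponential smoothing, which is only one of many forecasting methods and is not named in the sources quoted above. The smoothing weight `alpha` and the sales figures are hypothetical.

```python
def exponential_smoothing_forecast(values, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    alpha close to 1 weights recent observations heavily;
    alpha close to 0 gives a smoother, slower-moving forecast.
    """
    level = values[0]
    for v in values[1:]:
        # Blend the newest observation with the running level
        level = alpha * v + (1 - alpha) * level
    return level

# Hypothetical monthly sales revenue
sales = [100, 110, 105, 115]
forecast = exponential_smoothing_forecast(sales, alpha=0.5)
print(forecast)  # 110.0
```

This method ignores trend and seasonality, so it is best read as a baseline rather than a production forecasting model.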
Bayesian analytics captures the concepts used in probability forecasting. It is a statistical procedure that estimates the parameters of an underlying distribution based on the observed distribution. One example is a juror in a court setting who coherently accumulates the evidence for and against the guilt of the defendant in order to see whether, in totality, it meets their threshold for ‘beyond a reasonable doubt’.
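A small worked case may help show how observed data updates a parameter estimate. The sketch below assumes a Beta prior on an unknown success rate and binomial observations, a standard conjugate pair chosen here purely for illustration. The scenario (customers responding to an offer) and all numbers are made up.

```python
def beta_posterior(prior_alpha, prior_beta, successes, failures):
    """Update a Beta(prior_alpha, prior_beta) prior with binomial observations."""
    return prior_alpha + successes, prior_beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Assumption: a uniform Beta(1, 1) prior, i.e. no initial preference
# for any response rate between 0 and 1.
alpha, beta = beta_posterior(1, 1, successes=7, failures=3)
print(alpha, beta)                       # 8 4
print(round(beta_mean(alpha, beta), 3))  # 0.667
```

Each new batch of observations can be folded in the same way, with the previous posterior serving as the next prior.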
Regression analysis is a statistical tool for the investigation of relationships between variables. Usually, the investigator seeks to ascertain the causal effect of one variable upon another: the effect of a price increase upon demand, for example, or the effect of changes in the money supply upon the inflation rate.
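The price-and-demand example can be sketched with ordinary least squares for a single predictor. The data points below are invented, and a fitted slope describes an association in the data; whether it reflects a causal effect depends on how the data was collected.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: unit price and units demanded
prices = [1.0, 2.0, 3.0, 4.0]
demand = [9.0, 7.0, 5.0, 3.0]
slope, intercept = fit_line(prices, demand)
print(slope, intercept)  # -2.0 11.0
```

Here each one-unit price increase is associated with two fewer units demanded.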
Classification uses attributes in data to assign an object to a predefined class or, in related scoring models, to predict the value of a numeric variable of interest. Examples include credit risk analysis and likelihood to purchase, with applications in acquisition, cross-sell, attrition, credit scoring and collections.
Clustering or segmentation separates data into homogeneous subgroups based on attributes. Clustering assigns a set of observations into subsets (clusters) so that observations in the same cluster are similar. An example is customer demographic segmentation.
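As a rough illustration of segmentation, the sketch below runs a one-dimensional k-means on customer ages. k-means is only one of several clustering algorithms, and the ages, the starting centers, and the function name are assumptions made for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Basic k-means on one numeric attribute, starting from given centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest center
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical customer ages and two starting guesses for the segment centers
ages = [22, 25, 27, 58, 61, 64]
centers, clusters = kmeans_1d(ages, centers=[20.0, 70.0])
print(clusters)  # [[22, 25, 27], [58, 61, 64]]
```

Results from k-means depend on the starting centers, so in practice it is usually run several times with different initial values.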
Dependency or association analysis describes significant associations between data items. An example is market basket analysis. Market basket analysis is a modeling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items.
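Market basket rules are usually judged by support, confidence and lift. The sketch below computes all three for a single rule on a tiny set of invented baskets. It assumes the antecedent and the consequent each appear in at least one basket, otherwise the divisions would fail.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    with_antecedent = sum(1 for b in baskets if antecedent <= b)
    with_consequent = sum(1 for b in baskets if consequent <= b)
    with_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = with_both / n                     # share of baskets with both groups
    confidence = with_both / with_antecedent    # P(consequent | antecedent)
    lift = confidence / (with_consequent / n)   # > 1: more likely, < 1: less likely
    return support, confidence, lift

# Hypothetical shopping baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(support, round(confidence, 3), round(lift, 3))  # 0.5 0.667 1.333
```

A lift above 1 corresponds to "more likely to buy" in the description above, and a lift below 1 to "less likely".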
Simulation models a system structure to estimate the impact of management decisions or changes. The behavior of a simulation model changes from run to run according to the set of initial parameters assumed for the environment. Examples include inventory reorder policies, currency hedging, and military training.
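The inventory reorder example can be sketched as a small Monte Carlo simulation. Everything in it is an assumption made for illustration: daily demand is uniform between 0 and `max_demand`, replenishment is instant, and stock is topped up to `order_up_to` whenever it falls to the reorder point or below.

```python
import random

def stockout_rate(reorder_point, order_up_to, max_demand=10, days=1000, seed=42):
    """Fraction of simulated days on which demand exceeded available stock."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    stock = order_up_to
    stockouts = 0
    for _ in range(days):
        demand = rng.randint(0, max_demand)  # inclusive on both ends
        if demand > stock:
            stockouts += 1
        stock = max(stock - demand, 0)
        # Reorder policy under test: refill once stock hits the reorder point
        if stock <= reorder_point:
            stock = order_up_to
    return stockouts / days

# Compare two hypothetical reorder policies under the same demand stream
cautious = stockout_rate(reorder_point=10, order_up_to=20)
lean = stockout_rate(reorder_point=0, order_up_to=5)
print(cautious, lean)
```

Changing the initial parameters (reorder point, order-up-to level, demand range) changes the simulated outcome, which is what lets a manager compare policies before committing to one.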
Optimization models a system structure in terms of constraints to find the best possible solution. Optimization models form part of a larger system that people use to help them make decisions. The user can influence the solutions the model produces and review them before making a final decision about what to do. Examples include scheduling of shift workers, routing of train cargo, and pricing airline seats.
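For a sense of what "best possible solution under constraints" means, the sketch below assigns three workers to three shifts at minimum total cost by trying every assignment. The cost table is invented, and exhaustive search only works for tiny problems; real schedulers rely on dedicated optimization solvers.

```python
from itertools import permutations

def best_assignment(cost):
    """Find the cheapest one-to-one assignment of workers to shifts.

    cost[w][s] is the cost of worker w covering shift s.
    Constraint: every worker gets exactly one shift and every shift one worker.
    """
    n = len(cost)

    def total(perm):
        return sum(cost[w][perm[w]] for w in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)

# Hypothetical cost of each worker (row) covering each shift (column)
cost = [
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
]
assignment, total_cost = best_assignment(cost)
print(assignment, total_cost)  # [1, 0, 2] 9
```

The result reads as worker 0 on shift 1, worker 1 on shift 0 and worker 2 on shift 2. A planner can still review the proposal and adjust the costs or constraints before accepting it.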