Predictive analytics

{{More citations needed|date=June 2011}}
 
'''Predictive analytics''' encompasses a variety of [[Statistics|statistical]] techniques from [[data mining]], [[Predictive modelling|predictive modeling]], and [[machine learning]] that analyze current and historical facts to make [[prediction]]s about future or otherwise unknown events.<ref name=":52">{{Cite web |title=To predict or not to Predict |url=https://mccoy-partners.com/updates/to-predict-or-not-to-predict |access-date=2022-05-05 |website=mccoy-partners.com}}</ref> It represents a major subset of [[machine learning]] applications; in some contexts, it is synonymous with [[machine learning]].<ref>{{Cite book |last=Siegel |first=Eric |title=Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (1st ed.) |publisher=[[Wiley (publisher)|Wiley]] |year=2013 |isbn=978-1-1183-5685-2 |language=English}}</ref>
 
In business, predictive models exploit [[Pattern detection|patterns]] found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding [[decision-making]] for candidate transactions.<ref>{{Cite book |last=Coker |first=Frank |title=Pulse: Understanding the Vital Signs of Your Business (1st ed.) |location=Bellevue, WA |publisher=Ambient Light Publishing |year=2014 |isbn=978-0-9893086-0-1 |pages=30, 39, 42, more}}</ref>
 
== Definition ==
{{generalize-section|date=December 2024}}
Predictive analytics is a set of [[business intelligence]] (BI) technologies that uncovers relationships and patterns within large volumes of data that can be used to predict behavior and events. Unlike other BI technologies, predictive analytics is forward-looking, using past events to anticipate the future.<ref name=":4">{{Cite web |last=Eckerson |first=Wayne, W |date=2007 |title=Predictive Analytics. Extending the Value of Your Data Warehousing Investment |url=http://download.101com.com/pub/tdwi/files/pa_report_q107_f.pdf}}</ref> Statistical techniques used in predictive analytics include [[data modeling]], [[machine learning]], [[Artificial intelligence|AI]], [[deep learning]] algorithms, and [[data mining]]. Often the unknown event of interest is in the future, but predictive analytics can be applied to any type of unknown, whether in the past, present, or future: for example, identifying suspects after a crime has been committed, or detecting credit card fraud as it occurs.<ref>{{Cite book |last=Finlay |first=Steven |title=Predictive Analytics, Data Mining and Big Data. Myths, Misconceptions and Methods (1st ed.) |publisher=[[Palgrave Macmillan]] |year=2014 |isbn=978-1137379276 |location=Basingstoke |pages=237 |language=English}}</ref> At its core, predictive analytics captures relationships between [[explanatory variable]]s and the predicted variables from past occurrences and exploits them to predict the unknown outcome. However, the accuracy and usability of the results depend greatly on the level of data analysis and the quality of the assumptions.<ref name=":52" />
 
Predictive analytics is often defined as prediction at a finer level of granularity, that is, generating predictive scores (probabilities) for each individual organizational element. This distinguishes it from [[forecasting]]. Siegel, for example, defines it as "Technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions."<ref>{{Cite book |last=Siegel |first=Eric |title=Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (1st ed.) |publisher=[[Wiley (publisher)|Wiley]] |year=2013 |isbn=978-1-1183-5685-2 |language=English}}</ref> In future industrial systems, predictive analytics is expected to predict and prevent potential issues, achieving near-zero breakdowns, and to be further integrated into [[prescriptive analytics]] for decision optimization.<ref>{{Cite book |last=Spalek |first=Seweryn |title=Data Analytics in Project Management |publisher=Taylor & Francis Group, LLC |year=2019 |language=English}}</ref>
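To make individual-level scoring concrete, the following is a minimal sketch rather than a production method. It fits a logistic regression model to hypothetical past cases by plain batch gradient descent and then produces a probability score for each new individual. Logistic regression is only an illustrative choice (the text does not prescribe a technique), and the feature values, the function names `fit_logistic` and `score`, and the learning settings are all assumptions.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(bias, weights, x):
    """Predicted probability of the outcome for one individual (hypothetical helper)."""
    return sigmoid(bias + sum(w * xj for w, xj in zip(weights, x)))

def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Fit logistic-regression weights to past cases by batch gradient descent.

    rows:   feature vectors (explanatory variables), one per past individual
    labels: observed 0/1 outcomes for those individuals
    Returns (bias, weights).
    """
    n, n_features = len(rows), len(rows[0])
    bias, weights = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for x, y in zip(rows, labels):
            # Gradient of the log-loss: predicted probability minus actual outcome
            error = score(bias, weights, x) - y
            grad_b += error
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights

# Hypothetical history: [site visits, past purchases] -> bought (1) or not (0)
history = [[1.0, 0.0], [2.0, 0.0], [1.0, 1.0], [4.0, 2.0], [5.0, 3.0], [6.0, 2.0]]
outcomes = [0, 0, 0, 1, 1, 1]
bias, weights = fit_logistic(history, outcomes)

# Score individuals whose outcomes are not yet known
for person in ([5.0, 2.0], [1.0, 0.0]):
    print(person, round(score(bias, weights, person), 3))
```

Each individual receives its own probability, which is the per-element granularity that the definition above contrasts with aggregate forecasting.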
 
== Analytical techniques ==
The approaches and techniques used to conduct predictive analytics can broadly be grouped into regression techniques and machine learning techniques.
 
=== Machine learning ===
{{Main|Machine learning}}
Machine learning can be defined as the ability of a machine to learn and then mimic human behavior that requires intelligence. This is accomplished through artificial intelligence, algorithms, and models.<ref>{{Cite web |title=Machine learning, explained |url=https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained |access-date=2022-05-06 |website=MIT Sloan |date=21 April 2021 |language=en}}</ref>
 
=== Autoregressive Integrated Moving Average (ARIMA) ===
{{main|ARIMA}}
 
ARIMA models are a common example of time series models. The autoregressive (AR) part predicts the current value of a series from its own past values, the moving-average (MA) part uses past forecast errors, and the integrated (I) part refers to differencing the series. A model that uses only lagged values of the series (a pure autoregressive model) is a special case of a regression model and can be fitted with standard regression software. ARIMA models are applied to series that are stationary, or that can be made stationary by differencing: such a series has no overall trend and varies around its average with a constant amplitude, so its time patterns look statistically similar over time. Through this process, the data are filtered and analyzed in order to better understand and predict future values.<ref name=":0">{{Cite journal |last=Kinney |first=William R. |date=1978 |title=ARIMA and Regression in Analytical Review: An Empirical Test |journal=The Accounting Review |volume=53 |issue=1 |pages=48–60 |jstor=245725 |issn=0001-4826}}</ref><ref>{{Cite web |title=Introduction to ARIMA models |url=https://people.duke.edu/~rnau/411arim.htm |access-date=2022-05-06 |website=people.duke.edu}}</ref>
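A full ARIMA fit (usually by maximum likelihood in a statistical package) is beyond a short example. The following minimal sketch covers only the special case ARIMA(1,1,0): it differences the series once (the "integrated" step), fits a single autoregressive coefficient to the differences by least squares, and then undoes the differencing to forecast. Omitting the intercept and the moving-average terms are simplifying assumptions, and the function names are hypothetical.

```python
def difference(series):
    """First difference: the 'I' (integrated) step with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(values):
    """Least-squares estimate of phi in d_t = phi * d_(t-1) + e_t (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    return num / den if den else 0.0

def forecast_arima_110(series, steps=1):
    """Forecast `steps` future values with an ARIMA(1,1,0) model:
    difference once, fit AR(1) to the differences, then integrate back."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff            # predicted next change
        last_value = last_value + last_diff    # back to the original scale
        forecasts.append(last_value)
    return phi, forecasts

# Hypothetical quarterly values whose growth is slowing down
phi, forecast = forecast_arima_110([10, 12, 13, 13.5], steps=2)
print(phi, forecast)  # 0.5 [13.75, 13.875]
```

Because the fitted coefficient is below 1, each predicted change is half the previous one, so the forecasts level off rather than extending a straight-line trend.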
 
Exponential smoothing models are closely related to ARIMA methods; simple exponential smoothing, for example, is equivalent to an ARIMA(0,1,1) model. Exponential smoothing accounts for the difference in importance between older and newer observations, on the assumption that more recent data are more relevant for predicting future values. To accomplish this, it assigns weights that decrease exponentially with the age of each observation, so newer observations carry more weight in the calculation than older ones.<ref>{{Cite web |title=6.4.3. What is Exponential Smoothing? |url=https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc43.htm |access-date=2022-05-06 |website=www.itl.nist.gov}}</ref>
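The exponentially decreasing weights can be seen in a minimal sketch of simple exponential smoothing. It assumes the recursion s_t = alpha * x_t + (1 - alpha) * s_(t-1), started from the first observation; initialization conventions vary between texts, and the function name and the sample data are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1).

    Unrolled, observation x_(t-k) receives weight alpha * (1 - alpha)**k,
    so weights shrink exponentially with age. The first smoothed value is
    initialised to the first observation (one common convention).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [10, 20, 30]  # hypothetical observations
print(exponential_smoothing(sales, alpha=0.5))  # [10, 15.0, 22.5]
```

The last smoothed value can serve as a one-step-ahead forecast. A larger `alpha` reacts faster to recent changes, while a smaller one smooths more heavily.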
 
==== Time series models ====
{{main|Time series}}
 
Time series models use the past values of a variable to understand and forecast its future values. A time series is the sequence of a variable's values over equally spaced periods, such as years or quarters in business applications.<ref>{{Cite web |title=6.4.1. Definitions, Applications and Techniques |url=https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc41.htm |access-date=2022-05-06 |website=www.itl.nist.gov}}</ref> To reveal underlying trends, the data are often smoothed, that is, the random variation in the data is removed. There are multiple ways to accomplish this.
 
===== Single moving average =====
{{main|Moving average}}
 
Single moving average methods compute the mean of successive, smaller sets of past data (windows of a fixed number of periods) rather than a single mean over the entire data set. This reduces the error associated with taking one overall average, making the result a more accurate summary of recent values.<ref>{{Cite web |title=6.4.2.1. Single Moving Average |url=https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc421.htm |access-date=2022-05-06 |website=www.itl.nist.gov}}</ref>
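A minimal sketch of a single moving average, assuming a fixed window size; the function name and the demand figures are hypothetical.

```python
def single_moving_average(series, window):
    """Mean of each run of `window` consecutive observations.

    Element i is the average of series[i : i + window]; the last element
    can serve as a naive one-step-ahead forecast.
    """
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

demand = [9, 8, 9, 12, 9, 12, 11]  # hypothetical data
print([round(v, 3) for v in single_moving_average(demand, 3)])
# [8.667, 9.667, 10.0, 11.0, 10.667]
```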
 
===== Centered moving average =====
Centered moving average methods build on the single moving average by placing each average at the middle (median) period of its window rather than at its end. This works directly for odd-numbered windows, whose middle falls on an actual period. For even-numbered windows the middle falls between two periods, so the averages are smoothed again by averaging adjacent pairs of them, which centers the result on a period; the method is therefore simpler with odd-numbered windows than with even ones.<ref>{{Cite web |title=6.4.2.2. Centered Moving Average |url=https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc422.htm |access-date=2022-05-06 |website=www.itl.nist.gov}}</ref>
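The odd and even cases can be illustrated with a minimal sketch. It assumes periods are indexed from 0 and returns a mapping from period index to centered average; the function name and data are hypothetical.

```python
def centered_moving_average(series, window):
    """Moving average aligned to the middle period of each window.

    Returns {period_index: average}. For an odd window the middle period is an
    actual observation. For an even window the plain averages fall between two
    periods, so adjacent pairs of them are averaged again to centre them.
    """
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    plain = [sum(series[i:i + window]) / window
             for i in range(len(series) - window + 1)]
    half = window // 2
    if window % 2 == 1:
        return {i + half: avg for i, avg in enumerate(plain)}
    # Even window: plain[i] sits at period i + half - 0.5, so the mean of
    # plain[i] and plain[i + 1] sits exactly at period i + half.
    return {i + half: (a + b) / 2 for i, (a, b) in enumerate(zip(plain, plain[1:]))}

values = [1, 2, 3, 4, 5, 6]  # hypothetical data
print(centered_moving_average(values, 3))  # {1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}
print(centered_moving_average(values, 4))  # {2: 3.0, 3: 4.0}
```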
 
=== Predictive modeling ===
{{Main|Predictive modelling}}
Predictive modeling is a statistical technique used to predict future behavior. It uses predictive models to analyze the relationship between a specific unit in a given sample and one or more features of that unit. The objective of these models is to assess the likelihood that a unit in another sample will display the same pattern. Predictive model solutions can be considered a type of data mining technology. The models can analyze both historical and current data to predict potential future outcomes.<ref name=":1">{{Cite book |last1=McCarthy |first1=Richard |title=Applying Predictive Analytics: Finding Value in Data |last2=McCarthy |first2=Mary |last3=Ceccucci |first3=Wendy |publisher=Springer |year=2021}}</ref>
 
Regardless of the methodology used, the process of creating predictive models generally involves the same steps. First, determine the project objectives and desired outcomes and translate them into predictive analytic objectives and tasks. Then, analyze the source data to determine the most appropriate data and model-building approach (models are only as useful as the data used to build them). Select and transform the data in order to create models. Create and test models to evaluate whether they are valid and will meet project goals and metrics. Apply the model's results to appropriate business processes (identifying patterns in the data does not necessarily mean a business will understand how to take advantage of them). Afterward, manage and maintain models to standardize and improve performance (demand for model management will increase as organizations need to meet new compliance regulations).<ref name=":4" />
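The "create and test" step can be sketched as a holdout evaluation. The following minimal example is one possible approach under stated assumptions: it holds out a random quarter of hypothetical historical records, fits a least-squares line on the rest, and accepts the model only if its held-out error beats both a naive "predict the training mean" baseline and a project-defined threshold. The threshold `max_error`, the data, and all function names are hypothetical.

```python
import random
from statistics import mean

def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle (x, y) records reproducibly and hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

def fit_line(train):
    """Least-squares intercept and slope for y = a + b * x."""
    xs = [x for x, _ in train]
    x_bar, y_bar = mean(xs), mean(y for _, y in train)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in train) / sxx
    return y_bar - b * x_bar, b

def mean_abs_error(model, data):
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in data)

def evaluate(records, max_error):
    """Fit on training data, then check held-out error against goals."""
    train, test = train_test_split(records)
    model = fit_line(train)
    error = mean_abs_error(model, test)
    # Baseline: a flat line at the training mean (slope 0)
    baseline = mean_abs_error((mean(y for _, y in train), 0.0), test)
    return {"model_error": error, "baseline_error": baseline,
            "meets_goal": error <= max_error and error < baseline}

# Hypothetical history: y roughly 2x + 1 with small alternating noise
records = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(20)]
print(evaluate(records, max_error=1.0))
```

Testing on data the model has not seen guards against a model that merely memorizes its training set, which is why validation precedes applying the model to business processes.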
 
=== Regression analysis ===
{{Main|Regression analysis}}
Generally, regression analysis uses structural data along with the past values of independent variables and the relationship between them and the dependent variable to form predictions.<ref name=":0" />
 
==== Linear regression ====
{{Main|Linear regression}}
In [[linear regression]], a plot is constructed with previous values of the dependent variable on the Y-axis and the independent variable being analyzed on the X-axis. A statistical program then fits a regression line representing the relationship between the independent and dependent variables, which can be used to predict values of the dependent variable from the independent variable alone. Along with the line, the program reports its slope-intercept equation plus an error term; the larger the error term, the less precise the regression model. To reduce the error term, other independent variables can be added to the model and analyzed in the same way.<ref name=":0" /><ref>{{Cite web |title=Linear Regression |url=http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm |access-date=2022-05-06 |website=www.stat.yale.edu}}</ref>
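A minimal sketch of what the statistical program computes for a single independent variable: the ordinary least squares intercept and slope of y = b0 + b1 * x + e, plus the residual standard error as one common measure of the size of the error term. The function name and sample data are hypothetical.

```python
import math

def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x + e.

    Returns the intercept b0, the slope b1, and the residual standard error,
    an estimate of the spread of the error term e (larger = less precise).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the error term")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    rse = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # n - 2 degrees of freedom
    return b0, b1, rse

# Hypothetical data: advertising spend (x) and sales (y)
b0, b1, rse = simple_linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.9])
print(f"y = {b0:.2f} + {b1:.2f} x, residual std. error = {rse:.3f}")
print("prediction at x = 6:", round(b0 + b1 * 6, 2))
```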
 
== Applications ==
 
=== Analytical review and conditional expectations in auditing ===
{{Main|ARIMA}}
 
 
An important aspect of auditing is analytical review, in which the auditor determines the reasonableness of the reported account balances under investigation. Auditors accomplish this through predictive modeling, forming predictions called conditional expectations of the balances being audited using autoregressive integrated moving average (ARIMA) methods and general regression analysis methods,<ref name=":0" /> specifically through the Statistical Technique for Analytical Review (STAR) methods.<ref name=":3">{{Cite journal |last1=Kinney |first1=William R. |last2=Salamon |first2=Gerald L. |date=1982 |title=Regression Analysis in Auditing: A Comparison of Alternative Investigation Rules |journal=Journal of Accounting Research |volume=20 |issue=2 |pages=350–366 |doi=10.2307/2490745 |jstor=2490745 |issn=0021-8456}}</ref>
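As a rough illustration of a conditional expectation used in analytical review, the sketch below fits a regression of an account balance on an assumed driver variable from prior periods, uses the fitted line as the expected current balance, and flags the reported figure for investigation when it deviates by more than k residual standard errors. This is a generic, hypothetical investigation rule, not the STAR method; using the residual standard error instead of a full prediction interval is a further simplification, and the data, the choice of driver, the threshold `k`, and the function name are all assumptions.

```python
import math

def analytical_review(history, current_driver, reported_balance, k=2.0):
    """Hypothetical investigation rule for analytical review.

    history: (driver, balance) pairs from prior periods, e.g. monthly sales and
             the related receivables balance (an assumed relationship).
    Fits balance = a + b * driver by least squares, treats the fitted value as
    the conditional expectation for the current period, and flags the reported
    balance if it lies more than k residual standard errors away.
    """
    n = len(history)
    if n < 3:
        raise ValueError("need at least 3 prior periods")
    xs = [d for d, _ in history]
    ys = [b for _, b in history]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in history)
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    # Residual standard error with n - 2 degrees of freedom
    se = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in history) / (n - 2))
    expected = a + b * current_driver
    investigate = abs(reported_balance - expected) > k * se
    return expected, investigate

history = [(100, 51), (110, 54), (120, 61), (130, 64), (140, 71)]  # hypothetical
print(analytical_review(history, current_driver=150, reported_balance=76))
print(analytical_review(history, current_driver=150, reported_balance=85))
```

With these hypothetical figures the expected balance is about 75.2, so a reported 76 falls within the tolerance while 85 would be flagged.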
 
 
=== Child protection ===
Some child welfare agencies have started using predictive analytics to flag high-risk cases.<ref>{{Cite web |last=Reform |first=Fostering |date=2016-02-03 |title=New Strategies Long Overdue on Measuring Child Welfare Risk |url=https://imprintnews.org/blogger-co-op/new-strategies-long-overdue-measuring-child-welfare-risk/15442 |access-date=2022-05-03 |website=The Imprint |language=en-US}}</ref> For example, in [[Hillsborough County, Florida]], the child welfare agency's use of a predictive modeling tool has prevented abuse-related child deaths in the target population.<ref>{{Cite journal |date=2016 |title=Within Our Reach: A National Strategy to Eliminate Child Abuse and Neglect Fatalities |url=https://www.acf.hhs.gov/sites/default/files/documents/cb/cecanf_final_report.pdf |archive-url=https://web.archive.org/web/20210614092123/https://www.acf.hhs.gov/sites/default/files/documents/cb/cecanf_final_report.pdf |url-status=dead |archive-date=June 14, 2021 |journal=Commission to Eliminate Child Abuse and Neglect Fatalities}}</ref>
 
=== Predicting outcomes of legal decisions ===
 
=== Portfolio, product or economy-level prediction ===
Often the focus of analysis is not the consumer but the product, portfolio, firm, industry or even the economy. For example, a retailer might be interested in predicting store-level demand for inventory management purposes, and the Federal Reserve Board might be interested in predicting the unemployment rate for the next year. These types of problems can be addressed by predictive analytics using time series techniques. They can also be addressed via machine learning approaches that transform the original time series into a feature vector space, where the learning algorithm finds patterns that have predictive power.<ref>{{Cite journal |last=Dhar |first=Vasant |date=May 6, 2011 |title=Prediction in financial markets: The case for small disjuncts |url=https://dl.acm.org/doi/10.1145/1961189.1961191 |journal=ACM Transactions on Intelligent Systems and Technology |language=en |volume=2 |issue=3 |pages=1–22 |doi=10.1145/1961189.1961191 |s2cid=11213278 |issn=2157-6904|url-access=subscription }}</ref><ref>{{Cite journal |last1=Dhar |first1=Vasant |last2=Chou |first2=Dashin |last3=Provost |first3=Foster |date=2000-10-01 |title=Discovering Interesting Patterns for Investment Decision Making with GLOWER ◯-A Genetic Learner Overlaid with Entropy Reduction |journal=Data Mining and Knowledge Discovery |volume=4 |issue=4 |pages=251–280 |doi=10.1023/A:1009848126475 |s2cid=1982544 |issn=1384-5810}}</ref>
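One common way to move a time series into a feature vector space is to use sliding windows of lagged values as features and the following value as the target. The minimal sketch below then applies k-nearest-neighbour regression as the learning algorithm. k-NN is an illustrative stand-in chosen for brevity (the cited work uses other learners), and the function names, window length `lags`, neighbour count `k`, and data are hypothetical.

```python
import math

def make_lag_features(series, lags):
    """Turn a series into (feature_vector, target) pairs: each vector holds
    `lags` consecutive values and the target is the value that follows."""
    return [(series[i:i + lags], series[i + lags])
            for i in range(len(series) - lags)]

def knn_forecast(series, lags=3, k=2):
    """Predict the next value by finding the k past windows closest
    (Euclidean distance) to the latest window and averaging what followed."""
    if len(series) <= lags or k < 1:
        raise ValueError("need len(series) > lags and k >= 1")
    examples = make_lag_features(series, lags)
    query = series[-lags:]
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical store demand with a repeating three-period pattern
demand = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(knn_forecast(demand, lags=3, k=2))  # 1.0
```

Because each window is treated as a point in a vector space, the same framing works with other learners, such as regression trees or neural networks, without changing the feature construction.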
 
=== Underwriting ===
Many businesses must account for the risk exposure arising from their different services and determine the costs needed to cover that risk. Predictive analytics can help [[underwrite]] these quantities by predicting the chances of illness, [[Default (finance)|default]], [[bankruptcy]], and other adverse events. It can streamline customer acquisition by predicting a customer's future risk behavior from application-level data. Predictive analytics in the form of credit scores has reduced the time it takes to approve loans, especially in the mortgage market. Accurate predictions support better pricing decisions, which can help mitigate the future risk of default. Predictive analytics can also be used to mitigate moral hazard and prevent accidents from occurring.<ref>{{Cite journal |last1=Guillen |first1=Montserrat |last2=Cevolini |first2=Alberto |date=November 2021 |title=Using Risk Analytics to Prevent Accidents Before They Occur – The Future of Insurance |url=https://www.capco.com/Capco-Institute/Journal-54-Insurance/Using-Risk-Analytics-To-Prevent-Accidents-Before-They-Occur-The-Future-Of-Insurance |journal=Journal of Financial Transformation}}</ref>
 
=== Policing ===
Police agencies now use proactive strategies for crime prevention. Predictive analytics, which uses statistical tools to forecast crime patterns, provides new ways for police agencies to mobilize resources and reduce levels of crime.<ref>{{Cite journal |last1=Towers |first1=Sherry |last2=Chen |first2=Siqiao |last3=Malik |first3=Abish |last4=Ebert |first4=David |date=2018-10-24 |editor-last=Eisenbarth |editor-first=Hedwig |title=Factors influencing temporal patterns in crime in a large American city: A predictive analytics perspective |journal=PLOS ONE |language=en |volume=13 |issue=10 |pages=e0205151 |doi=10.1371/journal.pone.0205151 |issn=1932-6203 |pmc=6200217 |pmid=30356321 |bibcode=2018PLoSO..1305151T |doi-access=free }}</ref> By applying predictive analytics to crime data, police can better allocate limited resources and personnel to prevent crime. Directed patrols or problem-solving approaches can then be used to protect crime hot spots, areas whose crime density is much higher than the city average.<ref>{{Cite journal |last1=Fitzpatrick |first1=Dylan J. |last2=Gorr |first2=Wilpen L. |last3=Neill |first3=Daniel B. |date=2019-01-13 |title=Keeping Score: Predictive Analytics in Policing |url=https://www.annualreviews.org/doi/10.1146/annurev-criminol-011518-024534 |journal=Annual Review of Criminology |language=en |volume=2 |issue=1 |pages=473–491 |doi=10.1146/annurev-criminol-011518-024534 |s2cid=169389590 |issn=2572-4568}}</ref>
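The hot-spot definition above (crime density much higher than the city average) can be sketched by counting past incidents on a grid. The following minimal example is retrospective only and is not a forecasting model. It assumes a city covered by a rectangular grid of square cells, and the multiplier `factor`, the coordinates, and the function name are hypothetical choices.

```python
from collections import Counter

def find_hot_spots(incidents, cell_size, n_cols, n_rows, factor=3.0):
    """Return grid cells whose historical incident count exceeds `factor`
    times the city-wide average count per cell.

    incidents: (x, y) coordinates of past incidents inside a city covered
               by an n_cols x n_rows grid of square cells of side `cell_size`.
    """
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in incidents)
    average = len(incidents) / (n_cols * n_rows)  # includes cells with no incidents
    return sorted(cell for cell, count in counts.items() if count > factor * average)

# Hypothetical data: 10 incidents clustered in one cell, 6 scattered elsewhere
incidents = [(0.5, 0.5)] * 10 + [(1.5, 0.5), (2.5, 1.5), (3.5, 3.5),
                                 (0.5, 2.5), (1.5, 3.5), (3.5, 1.5)]
print(find_hot_spots(incidents, cell_size=1.0, n_cols=4, n_rows=4))  # [(0, 0)]
```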
 
=== Sports ===
Several firms specializing in predictive analytics for professional sports teams and individuals have emerged.<ref>{{Cite web |title=Free AI Sports Picks & Predictions for Today's Games |url=https://leans.ai/ |access-date=2023-07-08 |website=LEANS.AI |language=en-US}}</ref> Although predictions of human performance vary widely because many factors can change after the predictions are made, including injuries, officiating, coaching decisions, and weather, predictive analytics is useful for projecting long-term trends and performance. Much of the field grew out of the Moneyball concept of [[Billy Beane]] near the turn of the century, and most professional sports teams now employ their own analytics departments.
 
== See also ==
* {{cite book |last1=Guidère |first1=Mathieu |last2=Howard |first2=N |last3=Argamon |first3=Sh. |author3-link=Shlomo Argamon |title=Rich Language Analysis for Counterterrorism |location=Berlin, London, New York |publisher=Springer-Verlag |year=2009 |isbn=978-3-642-01140-5}}
* {{cite book |last=Mitchell |first=Tom |title=Machine Learning |location=New York |publisher=[[McGraw-Hill]] |year=1997 |isbn=0-07-042807-7}}
* {{cite book |last=Siegel |first=Eric |title=Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die |publisher=[[John Wiley & Sons]] |year=2016 |isbn=978-1119145677}}
* {{cite book |last=Tukey |first=John |title=Exploratory Data Analysis |location=New York |publisher=Addison-Wesley |year=1977 |isbn=0-201-07616-0 |url-access=registration |url=https://archive.org/details/exploratorydataa00tuke_0}}
{{refend}}
[[Category:Types of analytics]]
[[Category:Predictive analytics|*]]
[[Category:Management cybernetics]]