Box–Jenkins method: Difference between revisions

Revision as of 05:33, 19 May 2015 (BG19bot) ⟶ Latest revision as of 08:30, 10 February 2025 (121.58.232.91; manual revert)
(44 intermediate revisions by 27 users not shown)
Line 1:
{{Short description|Method to find best fit of a time-series model}}
In [[time series analysis]], the '''Box–Jenkins method''',<ref>{{cite book |last1=Box |first1=George |last2=Jenkins |first2=Gwilym |year=1970 |title=Time Series Analysis: Forecasting and Control |url=https://archive.org/details/timeseriesanalys0000boxg |url-access=registration |location=San Francisco |publisher=Holden-Day }}</ref> named after the [[statistician]]s [[George Box]] and [[Gwilym Jenkins]], applies [[autoregressive moving average]] (ARMA) or [[autoregressive integrated moving average]] (ARIMA) models to find the best fit of a time-series model to past values of a [[time series]].

==Modeling approach==
The original model uses an iterative three-stage modeling approach:
#''Model identification and [[model selection]]'': making sure that the variables are [[stationary process|stationary]], identifying [[seasonality]] in the dependent series (seasonally differencing it if necessary), and using plots of the [[autocorrelation|autocorrelation (ACF)]] and [[partial autocorrelation|partial autocorrelation (PACF)]] functions of the dependent time series to decide which (if any) autoregressive or moving average component should be used in the model.
#''[[Parameter estimation]]'' using computational algorithms to arrive at coefficients that best fit the selected ARIMA model. The most common methods use [[maximum likelihood estimation]] or [[non-linear least-squares estimation]].
#''[[Statistical model validation|Statistical model checking]]'' by testing whether the estimated model conforms to the specifications of a stationary univariate process. In particular, the residuals should be independent of each other and constant in mean and variance over time. (Plotting the mean and variance of residuals over time and performing a [[Ljung–Box test]] or plotting autocorrelation and partial autocorrelation of the residuals are helpful to identify misspecification.) If the estimation is inadequate, we have to return to step one and attempt to build a better model.

The data Box and Jenkins used were from a gas furnace. These data are well known as the Box and Jenkins gas furnace data for benchmarking predictive models.

Commandeur & Koopman (2007, §10.4)<ref>{{cite book |last1=Commandeur |first1=J. J. F. |last2=Koopman |first2=S. J. |year=2007 |title=Introduction to State Space Time Series Analysis |publisher=[[Oxford University Press]] }}</ref> argue that the Box–Jenkins approach is fundamentally problematic. The problem arises because in "the economic and social fields, real series are never stationary however much differencing is done". Thus the investigator has to face the question: how close to stationary is close enough? As the authors note, "This is a hard question to answer". The authors further argue that rather than using Box–Jenkins, it is better to use state space methods, as stationarity of the time series is then not required.

==Box–Jenkins model identification==

===Stationarity and seasonality===
The first step in developing a Box–Jenkins model is to determine whether the [[time series]] is [[Stationary process|stationary]] and whether there is any significant [[seasonality]] that needs to be modelled.

====Detecting stationarity====
Stationarity can be assessed from a [[run sequence plot]]. The run sequence plot should show constant location and [[Scale (ratio)|scale]]. It can also be detected from an [[autocorrelation plot]]. Specifically, non-stationarity is often indicated by an autocorrelation plot with very slow decay.
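The slow-decay symptom can be illustrated with a minimal sketch in plain Python. The helper name <code>sample_acf</code>, the simulated random walk, and the fixed seed are assumptions introduced here for illustration only. A random walk is non-stationary, so its sample autocorrelation should still be large at lag 10, whereas its first difference (white noise) should show autocorrelation near zero.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


random.seed(0)  # fixed seed so the illustration is reproducible
steps = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Random walk: cumulative sum of white-noise steps (non-stationary).
walk = []
level = 0.0
for s in steps:
    level += s
    walk.append(level)

# First differencing the walk recovers the stationary steps.
diffed = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

acf_walk = sample_acf(walk, 10)
acf_diff = sample_acf(diffed, 10)
print("lag-10 ACF, random walk:     ", round(acf_walk[10], 3))
print("lag-10 ACF, first difference:", round(acf_diff[10], 3))
```

In this sketch the contrast between the two lag-10 values is the point: differencing removes the slow decay, which is why differencing is the usual remedy when the plot suggests non-stationarity.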
One can also utilize a [[Dickey-Fuller test]] or [[Augmented Dickey-Fuller test]].

====Detecting seasonality====
Line 29 ⟶ 30:
===Identify ''p'' and ''q''===
Once stationarity and seasonality have been addressed, the next step is to identify the order (i.e., the ''p'' and ''q'') of the autoregressive and moving average terms. Different authors have different approaches for identifying ''p'' and ''q''. Brockwell and Davis (1991)<ref>{{cite book |last1=Brockwell |first1=Peter J. |last2=Davis |first2=Richard A. |year=1991 |title=Time Series: Theory and Methods |publisher=Springer-Verlag |page=273 |bibcode=1991tstm.book.....B }}</ref> state "our prime criterion for model selection [among ARMA(p,q) models] will be the AICc", i.e. the [[Akaike information criterion]] with correction. Other authors use the autocorrelation plot and the partial autocorrelation plot, described below.

====Autocorrelation and partial autocorrelation plots====
Line 46 ⟶ 37:
Specifically, for an [[AR(1)]] process, the sample autocorrelation function should have an exponentially decreasing appearance. However, higher-order AR processes are often a mixture of exponentially decreasing and damped sinusoidal components.

For higher-order autoregressive processes, the sample autocorrelation needs to be supplemented with a partial autocorrelation plot. The partial autocorrelation of an AR(''p'') process becomes zero at lag ''p'' + 1 and greater, so we examine the sample partial autocorrelation function to see if there is evidence of a departure from zero. This is usually determined by placing a 95% [[confidence interval]] on the sample partial autocorrelation plot (most software programs that generate sample autocorrelation plots also plot this confidence interval). If the software program does not generate the confidence band, it is approximately <math>\pm 2/\sqrt{N}</math>, with ''N'' denoting the sample size.
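As a minimal sketch of this order check in plain Python, the sample partial autocorrelations can be computed with the Durbin–Levinson recursion and compared against the approximate band ±2/√''N''. The simulated AR(2) series, its coefficients 0.6 and −0.3, the seed, and the helper names <code>sample_acf</code> and <code>sample_pacf</code> are illustrative assumptions, not taken from the sources cited here.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations at lags 1..max_lag (Durbin-Levinson)."""
    rho = sample_acf(x, max_lag)
    pacf = []
    prev = []  # prev[j] holds phi_{k-1, j+1} from the previous step
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Simulate an assumed AR(2) process: x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + e_t.
random.seed(1)
n_obs = 2000
burn_in = 200  # discard early values so the start-up zeros do not matter
x_prev2, x_prev1 = 0.0, 0.0
series = []
for t in range(n_obs + burn_in):
    x_t = 0.6 * x_prev1 - 0.3 * x_prev2 + random.gauss(0.0, 1.0)
    x_prev2, x_prev1 = x_prev1, x_t
    if t >= burn_in:
        series.append(x_t)

band = 2.0 / math.sqrt(n_obs)  # approximate 95% band from the text
pacf = sample_pacf(series, 10)
for lag, value in enumerate(pacf, start=1):
    flag = "outside band" if abs(value) > band else "inside band"
    print(f"lag {lag:2d}: {value:+.3f} ({flag})")
```

For this simulated series the first two lags should fall well outside the band. Because the band is only an approximate 95% interval, an occasional later lag may cross it by chance.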
The autocorrelation function of a [[moving average model|MA(''q'')]] process becomes zero at lag ''q'' + 1 and greater, so we examine the sample autocorrelation function to see where it essentially becomes zero. We do this by placing the 95% confidence interval for the sample autocorrelation function on the sample autocorrelation plot. Most software that can generate the autocorrelation plot can also generate this confidence interval.

The sample partial autocorrelation function is generally not helpful for identifying the order of the moving average process.
Line 63 ⟶ 54:
| Autoregressive model. Use the partial autocorrelation plot to help identify the order.
|-
! One or more spikes, rest are essentially zero (or close to zero)
| [[Moving average model]], order identified by where plot becomes zero.
|-
Line 75 ⟶ 66:
| Include seasonal autoregressive term.
|-
! No decay to zero (or it decays extremely slowly)
| Series is not stationary.
|}

Hyndman & Athanasopoulos suggest the following:<ref>{{cite book|last1=Hyndman|first1=Rob J|last2=Athanasopoulos|first2=George|title=Forecasting: principles and practice|url=https://www.otexts.org/fpp/8/5|access-date=18 May 2015}}</ref>
:The data may follow an ARIMA(''p'',''d'',0) model if the ACF and PACF plots of the differenced data show the following patterns:
:* the ACF is exponentially decaying or sinusoidal;
:* there is a significant spike at lag ''p'' in PACF, but none beyond lag ''p''.
:The data may follow an ARIMA(0,''d'',''q'') model if the ACF and PACF plots of the differenced data show the following patterns:
:* the PACF is exponentially decaying or sinusoidal;
:* there is a significant spike at lag ''q'' in ACF, but none beyond lag ''q''.
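The cut-off pattern for a pure moving average model can be sketched as follows in plain Python. The function names <code>ma_theoretical_acf</code> and <code>cutoff_lag</code>, the MA(2) coefficients, the sign convention for the model, and the sample size of 200 are hypothetical choices made for illustration. The sketch computes the theoretical autocorrelations of an MA(''q'') process, which are exactly zero beyond lag ''q'', and counts the leading lags that exceed the approximate ±2/√''N'' band.

```python
import math


def ma_theoretical_acf(thetas, max_lag):
    """Theoretical autocorrelations at lags 1..max_lag of an MA(q) process.

    Assumed convention: x_t = e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q}.
    """
    coeffs = [1.0] + list(thetas)
    q = len(thetas)
    gamma0 = sum(c * c for c in coeffs)
    acf = []
    for k in range(1, max_lag + 1):
        if k > q:
            acf.append(0.0)  # an MA(q) has no correlation beyond lag q
        else:
            gamma_k = sum(coeffs[j] * coeffs[j + k] for j in range(q - k + 1))
            acf.append(gamma_k / gamma0)
    return acf


def cutoff_lag(values, n_obs):
    """Count the leading lags whose value lies outside +/- 2/sqrt(n_obs)."""
    band = 2.0 / math.sqrt(n_obs)
    count = 0
    for v in values:
        if abs(v) <= band:
            break
        count += 1
    return count


# Hypothetical MA(2) with theta_1 = 0.7 and theta_2 = 0.5.
acf_ma2 = ma_theoretical_acf([0.7, 0.5], 6)
suggested_q = cutoff_lag(acf_ma2, n_obs=200)
print("theoretical ACF:", [round(r, 3) for r in acf_ma2])
print("suggested q:", suggested_q)
```

Applied to the partial autocorrelations of a pure autoregressive series, the same count would suggest ''p''. With sample rather than theoretical values it is only a rough guide, since chance spikes can lengthen or shorten the run of significant lags.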
In practice, the sample autocorrelation and partial autocorrelation functions are [[random variable]]s and do not give the same picture as the theoretical functions. This makes the model identification more difficult. In particular, mixed models can be difficult to identify. Although experience is helpful, developing good models using these sample plots can involve much trial and error.

==Box–Jenkins model estimation==
Estimating the parameters for Box–Jenkins models involves numerically approximating the solutions of nonlinear equations. For this reason, it is common to use statistical software designed to handle the approach; virtually all modern statistical packages feature this capability. The main approaches to fitting Box–Jenkins models are nonlinear least squares and maximum likelihood estimation. Maximum likelihood estimation is generally the preferred technique. The likelihood equations for the full Box–Jenkins model are complicated and are not included here. See (Brockwell and Davis, 1991) for the mathematical details.

==Box–Jenkins model diagnostics==
Line 95 ⟶ 94:
If these assumptions are not satisfied, one needs to fit a more appropriate model. That is, go back to the model identification step and try to develop a better model.
Hopefully the analysis of the residuals can provide some clues as to a more appropriate model.

One way to assess whether the residuals from the Box–Jenkins model follow the assumptions is to generate [[statistical graphics]] (including an autocorrelation plot) of the residuals. One could also look at the value of the [[Ljung–Box test|Box–Ljung statistic]].

==References==
{{Reflist}}
* {{cite book |last=Box |first=George |last2=Jenkins |first2=Gwilym |year=1970 |title=Time Series Analysis: Forecasting and Control |location=San Francisco |publisher=Holden-Day }}

==Further reading==
* {{cite book |last=Brockwell |first=Peter J. |last2=Davis |first2=Richard A. |year=1991 |title=Time Series: Theory and Methods |publisher=Springer-Verlag }}
* {{citation | title= Comparison of Box–Jenkins and objective methods for determining the order of a non-seasonal ARMA model | author1-first= S. | author1-last= Beveridge | author2-first= C. | author2-last= Oickle | journal= [[Journal of Forecasting]] | year= 1994 | volume= 13 | issue= 5 | pages= 419–434 | doi= 10.1002/for.3980130502}}
* {{cite book |last=Commandeur |first=J. J. F. |last2=Koopman |first2=S. J. |year=2007 |title=Introduction to State Space Time Series Analysis |publisher=[[Oxford University Press]] }}
* {{citation |last=Pankratz |first=Alan |year=1983 |title=Forecasting with Univariate Box–Jenkins Models: Concepts and Cases |publisher=[[John Wiley & Sons]] }}

==External links==
* [https://web.archive.org/web/20070318000551/http://statistik.mathematik.uni-wuerzburg.de/timeseries/ A First Course on Time Series Analysis] – an open source book on time series analysis with SAS (Chapter 7)
* [http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc445.htm Box–Jenkins models] in the Engineering Statistics Handbook of [[NIST]]
* [http://robjhyndman.com/papers/BoxJenkins.pdf Box–Jenkins modelling] by Rob J Hyndman
* [http://support.sas.com/resources/papers/proceedings13/454-2013.pdf The Box–Jenkins methodology for time series models] by Theresa Hoang Diem Ngo

{{NIST-PD}}
{{Authority control}}

{{DEFAULTSORT:Box-Jenkins}}
[[Category:Time series models]]