Data validation and reconciliation

# [[systematic errors]] (or gross errors) due to incorrect sensor [[calibration]] or faulty data transmission.
 
A [[random error]] means that the measurement <math>y\,\!</math> is a [[random variable]] with [[mean]] <math>y^*\,\!</math>, where <math>y^*\,\!</math> is the true value, which is typically not known. A [[systematic error]], on the other hand, is characterized by a measurement <math>y\,\!</math> that is a random variable with [[mean]] <math>\bar{y}\,\!</math>, which is not equal to the true value <math>y^*\,</math>. For ease in deriving and implementing an optimal estimation solution, and based on the argument that errors are the sum of many factors (so that the [[central limit theorem]] applies at least approximately), data reconciliation assumes these errors are [[normal distribution|normally distributed]].
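In symbols, these statements can be summarized by a simple measurement model (a generic formalization of the above, not taken from a specific reference):

:<math>y = y^* + \epsilon + \delta, \qquad \epsilon \sim N(0,\sigma^2),</math>

where <math>\epsilon\,\!</math> is the zero-mean random error and <math>\delta\,\!</math> is a systematic bias, so that <math>\operatorname{E}[y] = \bar{y} = y^* + \delta\,\!</math>; purely random errors correspond to <math>\delta = 0\,\!</math>.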
 
Other sources of error when calculating plant balances include process faults such as leaks, unmodeled heat losses, incorrect physical properties or other parameters used in the equations, and structural errors such as unmodeled bypass lines. Further errors arise from unmodeled plant dynamics such as holdup changes, and from other instabilities in plant operation that violate the steady-state (algebraic) models. Additional dynamic errors arise when measurements and samples are not taken at the same time, especially lab analyses.
 
The normal practice of using time averages for the data input partly reduces the dynamic problems. However, it does not completely resolve timing inconsistencies for infrequently sampled data such as lab analyses.
 
This use of average values, like a [[moving average]], acts as a [[low-pass filter]], so high-frequency noise is mostly eliminated. The result is that, in practice, data reconciliation mainly makes adjustments to correct systematic errors such as biases.
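As a minimal illustration of this filtering effect (a generic sketch with made-up numbers, not part of any particular DVR package), the following Python snippet averages a noisy measurement series: the averaging suppresses the high-frequency random noise but leaves the systematic bias untouched, which is why reconciliation is still needed for bias correction.

<syntaxhighlight lang="python">
import numpy as np

def moving_average(y, window):
    """Simple moving average, acting as a low-pass filter on a measurement series."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="valid")

# Hypothetical flow measurement: true value 100, systematic bias +2, random noise sigma = 5
rng = np.random.default_rng(0)
y = 100.0 + 2.0 + rng.normal(0.0, 5.0, size=1000)

y_avg = moving_average(y, window=60)
print(round(y.std(), 2), round(y_avg.std(), 2))   # random noise is strongly reduced by averaging
print(round(y_avg.mean(), 2))                     # ...but the +2 bias remains (about 102)
</syntaxhighlight>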
 
==History==
DVR has become increasingly important as industrial processes have grown more complex. DVR started in the early 1960s with applications aiming at closing [[mass balance|material balances]] in production processes where raw measurements were available for all [[variable (mathematics)|variables]].<ref>D.R. Kuehn, H. Davidson, ''Computer Control II. Mathematics of Control'', Chem. Eng. Progress 57: 44–47, 1961.</ref> At the same time the problem of [[systematic error|gross error]] identification and elimination was also addressed.<ref>V. Vaclavek, ''Studies on System Engineering I. On the Application of the Calculus of the Observations of Calculations of Chemical Engineering Balances'', Coll. Czech Chem. Commun. 34: 3653, 1968.</ref> In the late 1960s and 1970s unmeasured variables were taken into account in the data reconciliation process.<ref>V. Vaclavek, M. Loucka, ''Selection of Measurements Necessary to Achieve Multicomponent Mass Balances in Chemical Plant'', Chem. Eng. Sci. 31: 1199–1205, 1976.</ref><ref name="Mah-Stanley-Downing-1976">[http://gregstanleyandassociates.com/ReconciliationRectificationProcessData-1976.pdf R.S.H. Mah, G.M. Stanley, D.W. Downing, ''Reconciliation and Rectification of Process Flow and Inventory Data'', Ind. & Eng. Chem. Proc. Des. Dev. 15: 175–183, 1976.]</ref> DVR also became more mature through the consideration of general nonlinear equation systems derived from thermodynamic models.<ref>J.C. Knepper, J.W. Gorman, ''Statistical Analysis of Constrained Data Sets'', AIChE Journal 26: 260–264, 1980.</ref><ref name="Stanley-Mah-1977">[http://gregstanleyandassociates.com/AIChEJ-1977-EstimationInProcessNetworks.pdf G.M. Stanley and R.S.H. Mah, ''Estimation of Flows and Temperatures in Process Networks'', AIChE Journal 23: 642–650, 1977.]</ref><ref>P. Joris, B. Kalitventzeff, ''Process measurements analysis and validation'', Proc. CEF’87: Use Comput. Chem. Eng., Italy, 41–46, 1987.</ref> Quasi-steady-state dynamics for filtering and simultaneous parameter estimation over time were introduced in 1977 by Stanley and Mah.<ref name="Stanley-Mah-1977"/> Dynamic DVR was formulated as a nonlinear optimization problem by Liebman et al. in 1992.<ref>M.J. Liebman, T.F. Edgar, L.S. Lasdon, ''Efficient Data Reconciliation and Estimation for Dynamic Processes Using Nonlinear Programming Techniques'', Computers Chem. Eng. 16: 963–986, 1992.</ref>
File:topological_red.jpg|Topological redundancy arising from model information, using the mass conservation constraint <math>a=b+c\,\!</math>: for example, <math>c\,\!</math> can be calculated when <math>a\,\!</math> and <math>b\,\!</math> are known.
</gallery>
Data reconciliation relies strongly on the concept of redundancy to correct the measurements as little as possible in order to satisfy the process constraints. Here, redundancy is defined differently from [[Redundancy (information theory)|redundancy in information theory]]. Instead, redundancy arises from combining sensor data with the model (algebraic constraints), sometimes more specifically called "spatial redundancy",<ref name="Stanley-Mah-1977"/> "analytical redundancy", or "topological redundancy".
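For the mass-conservation constraint <math>a=b+c\,\!</math> shown above, the following is a minimal sketch (a generic linear weighted least-squares adjustment with invented numbers, not a specific commercial implementation) of how redundant measurements are corrected as little as possible, weighted by their variances, so that the reconciled values satisfy the constraint exactly.

<syntaxhighlight lang="python">
import numpy as np

# Constraint a - b - c = 0 written as A @ x = 0 for x = [a, b, c]
A = np.array([[1.0, -1.0, -1.0]])

# Hypothetical measurements of all three streams (they violate the balance by 3 units)
# together with assumed measurement variances.
y = np.array([101.0, 46.0, 52.0])
Sigma = np.diag([1.0, 0.25, 0.25])

# Smallest variance-weighted correction that makes the estimates satisfy A @ x_hat = 0:
# x_hat = y - Sigma A^T (A Sigma A^T)^{-1} A y   (classical linear reconciliation formula)
correction = Sigma @ A.T @ np.linalg.solve(A @ Sigma @ A.T, A @ y)
x_hat = y - correction

print(x_hat)        # reconciled values, here [99.  46.5  52.5]
print(A @ x_hat)    # ~0: the mass balance now holds exactly
</syntaxhighlight>

The least precisely measured stream (<math>a\,\!</math> here, with the largest assumed variance) absorbs the largest share of the adjustment, reflecting the behaviour described above: measurements are corrected as little as possible, in a statistically weighted sense, to satisfy the constraints.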
 
Redundancy can be due to [[redundancy (engineering)|sensor redundancy]], where sensors are duplicated in order to have more than one measurement of the same quantity. Redundancy also arises when a single variable can be estimated in several independent ways from separate sets of measurements at a given time or time averaging period, using the algebraic constraints.
 
Redundancy is linked to the concept of [[observability]]. A variable (or system) is observable if the models and sensor measurements can be used to uniquely determine its value (system state). A sensor is redundant if its removal causes no loss of observability. Rigorous definitions of observability, calculability, and redundancy, along with criteria for determining them, were established by Stanley and Mah,<ref name="Stanley-Mah-1981a">
:<math>\begin{align}
\text{redundancy} = p - m ,
\end{align}</math>
 
i.e. the redundancy is the difference between the number of equations <math>p\,</math> and the number of unmeasured variables <math>m\,</math>. The level of total redundancy is the sum of sensor redundancy and topological redundancy. We speak of positive redundancy if the system is calculable and the total redundancy is positive. One can see that the level of topological redundancy depends only on the number of equations (the more equations, the higher the redundancy) and the number of unmeasured variables (the more unmeasured variables, the lower the redundancy), and not on the number of measured variables.
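As a simple illustration (using the single-unit balance from the gallery above, with an assumed measurement layout): the constraint <math>a=b+c\,\!</math> provides <math>p=1\,</math> equation. If all three streams are measured, <math>m=0\,</math> and the topological redundancy is <math>1-0=1\,</math>; if <math>c\,\!</math> is unmeasured, <math>m=1\,</math> and the redundancy drops to <math>1-1=0\,</math>, i.e. <math>c\,\!</math> can still be calculated but no consistency check remains.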
 
Simple counts of variables, equations, and measurements are inadequate for many systems, breaking down for several reasons: (a) portions of a system might have redundancy while others do not, and some portions might not even be calculable, and (b) nonlinearities can lead to different conclusions at different operating points. As an example, consider the following system with 4 streams and 2 units.
* V. Veverka, F. Madron, ''Material and Energy Balancing in the Process Industries'', Elsevier Science BV, Amsterdam, 1997.
* J. Romagnoli, M.C. Sanchez, ''Data processing and reconciliation for chemical process operations'', Academic Press, 2000.
 
 
{{DEFAULTSORT:Data Validation And Reconciliation}}