Data validation and reconciliation: Difference between revisions

Gmstanley (talk | contribs)
m History: Eliminated incorrect date reference to 1980's for nonlinear thermodynamic models, since 2/3 of references were before that ~~~~
Gmstanley (talk | contribs)
Redundancy: Expanded history and examples of redundancy
Line 62:
File:topological_red.jpg|Topological redundancy arising from model information, using the mass conservation constraint <math style="vertical-align:-10%;">a=b+c\,\!</math>. For example, one can calculate <math style="vertical-align:-0%;">c\,\!</math> when <math style="vertical-align:-0%;">a\,\!</math> and <math style="vertical-align:-0%;">b\,\!</math> are known.
</gallery>
Data reconciliation relies strongly on the concept of redundancy to correct the measurements as little as possible in order to satisfy the process constraints. Here, redundancy is defined differently than [[Redundancy (information theory)|redundancy in information theory]]. Instead, redundancy arises from combining sensor data with the model (algebraic constraints), sometimes more specifically called "spatial redundancy",<ref name="Stanley-Mah-1977"/> "analytical redundancy", or "topological redundancy".
 
Redundancy can be due to [[redundancy (engineering)|sensor redundancy]], where sensors are duplicated in order to have more than one measurement of the same quantity. Redundancy also arises when a single variable can be estimated in several independent ways from separate sets of measurements at a given time or time averaging period, using the algebraic constraints.
 
Redundancy is linked to the concept of [[observability]]. A variable (or system) is observable if the models and sensor measurements can be used to uniquely determine its value (system state). A sensor is redundant if its removal causes no loss of observability. Rigorous definitions of observability, calculability, and redundancy, along with criteria for determining them, were established by Stanley and Mah<ref name="Stanley-Mah-1981a">
[http://gregstanleyandassociates.com/whitepapers/DataRec/CES-1981a-ObservabilityRedundancy.pdf Stanley G.M. and Mah, R.S.H., "Observability and Redundancy in Process Data Estimation", Chem. Engng. Sci. 36, 259 (1981)]</ref> for cases with set constraints such as algebraic equations and inequalities. Next, we illustrate some special cases:
 
Topological redundancy is intimately linked with the [[degrees of freedom (physics and chemistry)|degrees of freedom]] (<math style="vertical-align:-25%;">dof\,\!</math>) of a mathematical system,<ref name="vdi">VDI-Gesellschaft Energie und Umwelt, "Guidelines - VDI 2048 Blatt 1 - Uncertainties of measurements at acceptance tests for energy conversion and power plants - Fundamentals", ''[http://www.vdi.de/401.0.html Association of German Engineers]'', 2000.</ref> i.e. the minimum number of pieces of information (measurements) that are required in order to calculate all of the system variables. For instance, in the example above the flow conservation requires that <math style="vertical-align:-10%;">a=b+c\,</math>. One needs to know the value of two of the 3 variables in order to calculate the third one. The number of degrees of freedom for the model in that case is equal to 2. At least 2 measurements are needed to estimate all the variables, and 3 would be needed for redundancy.
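
As a minimal sketch of why the third measurement matters, the following Python snippet reconciles three measured flows against the constraint <math style="vertical-align:-10%;">a=b+c\,</math> by weighted least squares. The measured values, the standard deviations, and the function name <code>reconcile_linear</code> are hypothetical and chosen only for illustration. The closed-form correction used here is the standard solution for linear constraints with uncorrelated measurement errors, and is not necessarily the method of the cited references.

```python
import numpy as np

def reconcile_linear(y, sigma, A):
    """Weighted least-squares reconciliation for linear constraints A @ x = 0.

    Minimises sum(((x - y) / sigma)**2) subject to A @ x = 0, assuming
    uncorrelated measurement errors with standard deviations sigma.
    """
    y = np.asarray(y, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float))
    S = np.diag(np.asarray(sigma, dtype=float) ** 2)  # measurement covariance
    # Lagrange-multiplier solution: x = y - S A^T (A S A^T)^-1 A y
    correction = S @ A.T @ np.linalg.solve(A @ S @ A.T, A @ y)
    return y - correction

# Constraint a = b + c written as a - b - c = 0
A = [[1.0, -1.0, -1.0]]
y = [10.3, 6.1, 3.9]      # hypothetical raw measurements of a, b, c
sigma = [0.2, 0.1, 0.1]   # hypothetical standard deviations

a, b, c = reconcile_linear(y, sigma, A)
print(a, b, c)  # approximately 10.1, 6.15, 3.95
```

With only two of the three flows measured, the third is simply calculated and no correction is possible. With all three measured, the imbalance of 0.3 is distributed over the measurements in proportion to their variances.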
 
When speaking about topological redundancy we have to distinguish between measured and unmeasured variables. In the following let us denote by <math style="vertical-align:-0%;">x\,\!</math> the unmeasured variables and <math style="vertical-align:-30%;">y\,\!</math> the measured variables. Then the system of the process constraints becomes <math style="vertical-align:-25%;">F(x,y)=0\,\!</math>, which is a nonlinear system in <math style="vertical-align:-30%;">y\,\!</math> and <math style="vertical-align:-0%;">x\,\!</math>.
Line 72 ⟶ 78:
\end{align}</math>
 
i.e. the redundancy is the difference between the number of equations <math style="vertical-align:-30%;">p\,</math> and the number of unmeasured variables <math>m\,</math>. The level of total redundancy is the sum of sensor redundancy and topological redundancy. We speak of positive redundancy if the system is calculable and the total redundancy is positive. One can see that the level of topological redundancy merely depends on the number of equations (the more equations the higher the redundancy) and the number of unmeasured variables (the more unmeasured variables, the lower the redundancy) and not on the number of measured variables. However, it is possible that the system <math style="vertical-align:-30%;">F(x,y)=0\,\!</math> is not calculable, even though <math style="vertical-align:-30%;">p-m\ge 0\,\!</math>, as illustrated in the following example.
 
Simple counts of variables, equations, and measurements are inadequate for many systems, breaking down for several reasons: (a) Portions of a system might have redundancy, while others do not, and some portions might not even be possible to calculate, and (b) Nonlinearities can lead to different conclusions at different operating points. As an example, consider the following system with 4 streams and 2 units.
 
====Example of calculable and non-calculable systems====
Line 79 ⟶ 87:
File:uncalculable_system.jpg|Non-calculable system: knowing <math style="vertical-align:-0%;">c\,\!</math> does not give information about <math style="vertical-align:-0%;">a\,\!</math> and <math style="vertical-align:-0%;">b\,\!</math>.
</gallery>
 
It is possible that the system <math style="vertical-align:-30%;">F(x,y)=0\,\!</math> is not calculable, even though <math style="vertical-align:-30%;">p-m\ge 0\,\!</math>.
 
Let us consider a small system with 4 streams and 2 units. We incorporate only flow conservation constraints and obtain <math style="vertical-align:-10%;">a+b=c\,\!</math> and <math style="vertical-align:-0%;">c=d\,\!</math>. If we have measurements for <math style="vertical-align:-0%;">c\,\!</math> and <math style="vertical-align:-0%;">d\,\!</math>, but not for <math style="vertical-align:-0%;">a\,\!</math> and <math style="vertical-align:-0%;">b\,\!</math>, then the system cannot be calculated (knowing <math style="vertical-align:-0%;">c\,\!</math> does not give information about <math style="vertical-align:-0%;">a\,\!</math> and <math style="vertical-align:-0%;">b\,\!</math>). On the other hand, if <math style="vertical-align:-0%;">a\,\!</math> and <math style="vertical-align:-0%;">c\,\!</math> are known, but not <math style="vertical-align:-0%;">b\,\!</math> and <math style="vertical-align:-0%;">d\,\!</math>, then the system can be calculated.
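
For linear constraints such as these, calculability of all unmeasured variables can be checked numerically. The sketch below is an illustration under that linearity assumption, and the helper name <code>is_calculable</code> is hypothetical. It tests whether the columns of the constraint matrix belonging to the unmeasured streams have full column rank. Both cases have <math style="vertical-align:-30%;">p-m=0\,\!</math>, yet only one is calculable.

```python
import numpy as np

STREAMS = ["a", "b", "c", "d"]

# Rows are the two balances: a + b - c = 0 and c - d = 0
A = np.array([[1.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])

def is_calculable(unmeasured):
    """True if all unmeasured streams are uniquely determined by the
    measured ones (linear constraints only)."""
    cols = [STREAMS.index(s) for s in unmeasured]
    if not cols:
        return True
    A_x = A[:, cols]  # columns of the unmeasured variables
    # Unique solution for x requires full column rank of A_x
    return int(np.linalg.matrix_rank(A_x)) == len(cols)

print(is_calculable(["a", "b"]))  # c and d measured -> False
print(is_calculable(["b", "d"]))  # a and c measured -> True
```

The check covers the unmeasured variables as a whole. It does not say which individual variables remain calculable when the test fails.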
 
In 1981, observability and redundancy criteria were proven for these sorts of flow networks involving only mass and energy balance constraints.<ref name="Stanley-Mah-1981b">[http://gregstanleyandassociates.com/whitepapers/DataRec/CES-1981b-ObservabilityRedundancyProcessNetworks.pdf Stanley G.M., and Mah R.S.H., "Observability and Redundancy Classification in Process Networks", Chem. Engng. Sci. 36, 1941 (1981)]</ref> After combining all the plant inputs and outputs into an "environment node", loss of observability corresponds to cycles of unmeasured streams. That is seen in the non-calculable case above, where streams <math style="vertical-align:-0%;">a\,\!</math> and <math style="vertical-align:-0%;">b\,\!</math> are in a cycle of unmeasured streams. Redundancy classification follows by testing for a path of unmeasured streams connecting the two ends of a measured stream, since that would lead to an unmeasured cycle if the measurement were removed. Measurements <math style="vertical-align:-0%;">c\,\!</math> and <math style="vertical-align:-0%;">d\,\!</math> are redundant in that case, even though part of the system is unobservable.
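
The graph criterion can be sketched for the 4-stream example as follows. This is an illustrative implementation of the idea as described here for mass-flow balances only, not the algorithm of the cited paper. The topology (streams a and b entering unit U1, c running from U1 to U2, d leaving U2) is assumed from the balances above, and the names <code>ENV</code>, <code>STREAMS</code> and <code>classify</code> are hypothetical.

```python
ENV = "env"  # environment node collecting all plant inputs and outputs

# stream -> (node, node); topology assumed from a + b = c and c = d
STREAMS = {"a": (ENV, "U1"), "b": (ENV, "U1"),
           "c": ("U1", "U2"), "d": ("U2", ENV)}

def _find(parent, node):
    """Union-find root lookup with path halving."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

def _components(edges):
    """Merge nodes joined by the given undirected edges.

    Returns the union-find table and whether any edge closed a cycle.
    """
    parent = {n: n for ends in STREAMS.values() for n in ends}
    has_cycle = False
    for u, v in edges:
        ru, rv = _find(parent, u), _find(parent, v)
        if ru == rv:
            has_cycle = True  # edge joins already-connected nodes
        else:
            parent[ru] = rv
    return parent, has_cycle

def classify(measured):
    """Return (observable, {measured stream: is_redundant})."""
    unmeasured = [STREAMS[s] for s in STREAMS if s not in measured]
    parent, has_cycle = _components(unmeasured)
    redundant = {}
    for s in measured:
        u, v = STREAMS[s]
        # A path of unmeasured streams between the ends of s means that
        # removing s would create an unmeasured cycle: s is not redundant.
        redundant[s] = _find(parent, u) != _find(parent, v)
    return (not has_cycle), redundant

print(classify({"c", "d"}))  # unobservable; c and d both redundant
print(classify({"a", "c"}))  # observable; neither a nor c redundant
```

A union-find structure is used because both questions (is there an unmeasured cycle, and are two nodes joined by unmeasured streams) reduce to connectivity in the undirected graph of unmeasured streams.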
 
===Benefits===