Page 1 of 1

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

Posted: Wed Sep 26, 2018 10:48 am America/New_York
by b.detracey
At risk of putting my foot in my mouth (yet again) on these forums, I have a concern about the variance calculation in oel_hdf4/libbin++/L3File64.cpp
/**
* calculate the variance for this product
* @param prodId index of the product you want
* @return the variance value
*/
float L3Bin::getVariance(int32_t prodId) const {
    checkProdId(prodId);
    if (weights * weights > nscenes) {
        double val = sums[prodId] / weights;
        val = (sumSquares[prodId] / weights) - (val * val);
        return val * weights * weights / (weights * weights - nscenes);
    }
    return 0.0;
}

My concern is whether nscenes should instead be nobs in this function. If the weights are sqrt(nobs) for each scene then the sum of the squares of the weights for a composite of scenes should be the nobs value for the composite, assuming the weights are treated as reliability weights of an unbiased estimator (https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Reliability_weights).

Edit: Should have posted this under Satellite Data Products & Algorithms instead of SeaDAS General, I suppose. Feel free to move it there.

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

Posted: Thu Sep 27, 2018 10:38 am America/New_York
by OB SeaDAS - dshea
Yes, the Math is confusing. 
The equation for calculating variance is correct.  The tricky part is what is stored in the weight, sum and sum of squares.  We account for the number of observations in each scene and the number of scenes contributing to each bin.  Below is the code from l2bin.c that writes the bin file:



                  /* Process data file by file */
                    /* ------------------------- */
                    i32 = 1;
                    sum = data_values[ibin][0 * l3b_nprod + iprod];
                    sum2 = sum * sum;
                    for (j = 1; j < nobs[ibin]; j++) {
                        if (file_index[ibin][j] == file_index[ibin][j - 1]) {
                            i32++;
                            sum += data_values[ibin][j * l3b_nprod + iprod];
                            sum2 += data_values[ibin][j * l3b_nprod + iprod] *
                                    data_values[ibin][j * l3b_nprod + iprod];
                        } else {
                            sum_data[ibin] += (sum / sqrt(i32));
                            sum2_data[ibin] += (sum2 / sqrt(i32));

                            i32 = 1;
                            sum = data_values[ibin][j * l3b_nprod + iprod];
                            sum2 = sum * sum;
                        }
                    } /* observation loop */
                    sum_data[ibin] += (sum / sqrt(i32));
                    sum2_data[ibin] += (sum2 / sqrt(i32));


don (and Sean and Fred)

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

Posted: Thu Sep 27, 2018 12:01 pm America/New_York
by b.detracey
Sorry. I know you guys are too busy for this... but...here is the source of my confusion:

The biased weighted variance is calculated by the lines:
double val = sums[prodId] / weights;
val = (sumSquares[prodId] / weights) - (val * val);


So I am assuming the line:
return val * weights * weights / (weights * weights - nscenes);
is your bias correction.

However, following the wikipedia page, the bias correction for (reliability) weighted variance is to divide by (pseudocode):
(1 - sum_of_square_of_weights/sum_of_weights_squared)
which in the code would be:
(1 -  nobs/(weights * weights))
and gives:
return val * weights * weights / (weights * weights - nobs);
:confused:
In other words, nscenes plays no role in the weighting, so why should it appear in the variance?

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

Posted: Sat Sep 29, 2018 2:34 pm America/New_York
by OB.DAACx - SeanBailey
The l2bin program can do both spatial (it's primary purpose) and temporal binning.  The l3bin program is intended solely for temporal binning (but can be used to bin multiple missions together...)
nscenes comes into play to account for the temporal binning, so as to not bias toward a single day with a lot of pixels contributing to the bin.  For some historical context that doesn't rely on Wikipedia, see Volume 32 of the SeaWiFS Pre-Launch Technical Memoranda Series...just ignore all that mumbo jumbo about MLE and geometric means...we abandoned that 20 years ago :grin:

Sean

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

Posted: Mon Oct 01, 2018 10:11 am America/New_York
by b.detracey
Believe me, I have had a copy of that report on my hard drive for a couple years and have spent a good amount of time staring :eek: at it. And from reading the forums I know that MLE, geometric means and taking the average of the natural log of CHL were all abandoned.

But the variance calculation in the report (Eq.C14) does not use NSEG i.e. nscenes . NSEG is not used in any calculations except the summing of itself over scenes. My mouth is wide open and I am holding my foot, but I am not convinced it is time to insert it.

Edit: Okay. I believe the correct answer is that weights * weights / (weight * weights - nscenes) is a correction based on a degrees of freedom argument, but my stats knowledge is mostly rubbish and I am trying to dig up a reference to verify. I feel my confusion is justified given the disconnect between the code and the old report and that no one could explain the bias correction. So I am leaving my sock on when I put my foot in my mouth. Tastes like, cotton.

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

Posted: Tue Oct 16, 2018 10:42 am America/New_York
by b.detracey
I could not find any references, so I am concluding the bias correction is ad hoc.

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

Posted: Tue Oct 16, 2018 7:51 pm America/New_York
by OB.DAACx - SeanBailey
Brendan,
Let's take a step back...the value of the SUM stored in the bin file is not the SUM, but the SUM/nobs.  IF we stored the unnormalized SUM, then yes, the weight would want to be NOBS.
Since we're storing an NOBS normalized SUM, the value store as the weight is the sqrt of the number of scenes contributing to the bin.  In the simple case of a single scene contributing to the bin the SUM is the nobs weighted SUM and the weight is 1.  When more than one scene contributes to the bin (i.e. temporal binning), the weight is not 1, but the sqrt of the NSCENES (ne NSEG) because the SUM is the sum of the per scene NOBS weighted SUM and the weight accounts for the number of scenes...

Should be clear as ...mud :grin:

Sean

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

Posted: Thu Oct 18, 2018 9:28 am America/New_York
by b.detracey
Can't step back with foot in mouth. Must hop.