Variance calculation in oel_hdf4/libbin++/L3File64.cpp

Use this Forum to find information on, or ask a question about, NASA Earth Science data.
Post Reply
b.detracey
Posts: 38
Joined: Tue Sep 29, 2015 11:10 am America/New_York
Answers: 0

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

by b.detracey » Wed Sep 26, 2018 10:48 am America/New_York

At risk of putting my foot in my mouth (yet again) on these forums, I have a concern about the variance calculation in oel_hdf4/libbin++/L3File64.cpp
/**
* calculate the variance for this product
* @param prodId index of the product you want
* @return the variance value
*/
float L3Bin::getVariance(int32_t prodId) const {
    checkProdId(prodId);
    if (weights * weights > nscenes) {
        double val = sums[prodId] / weights;
        val = (sumSquares[prodId] / weights) - (val * val);
        return val * weights * weights / (weights * weights - nscenes);
    }
    return 0.0;
}

My concern is whether nscenes should instead be nobs in this function. If the weights are sqrt(nobs) for each scene then the sum of the squares of the weights for a composite of scenes should be the nobs value for the composite, assuming the weights are treated as reliability weights of an unbiased estimator (https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Reliability_weights).

Edit: Should have posted this under Satellite Data Products & Algorithms instead of SeaDAS General, I suppose. Feel free to move it there.

Filters:

OB SeaDAS - dshea
Subject Matter Expert
Subject Matter Expert
Posts: 271
Joined: Thu Mar 05, 2009 10:25 am America/New_York
Answers: 0
Been thanked: 2 times

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

by OB SeaDAS - dshea » Thu Sep 27, 2018 10:38 am America/New_York

Yes, the Math is confusing. 
The equation for calculating variance is correct.  The tricky part is what is stored in the weight, sum and sum of squares.  We account for the number of observations in each scene and the number of scenes contributing to each bin.  Below is the code from l2bin.c that writes the bin file:



                  /* Process data file by file */
                    /* ------------------------- */
                    i32 = 1;
                    sum = data_values[ibin][0 * l3b_nprod + iprod];
                    sum2 = sum * sum;
                    for (j = 1; j < nobs[ibin]; j++) {
                        if (file_index[ibin][j] == file_index[ibin][j - 1]) {
                            i32++;
                            sum += data_values[ibin][j * l3b_nprod + iprod];
                            sum2 += data_values[ibin][j * l3b_nprod + iprod] *
                                    data_values[ibin][j * l3b_nprod + iprod];
                        } else {
                            sum_data[ibin] += (sum / sqrt(i32));
                            sum2_data[ibin] += (sum2 / sqrt(i32));

                            i32 = 1;
                            sum = data_values[ibin][j * l3b_nprod + iprod];
                            sum2 = sum * sum;
                        }
                    } /* observation loop */
                    sum_data[ibin] += (sum / sqrt(i32));
                    sum2_data[ibin] += (sum2 / sqrt(i32));


don (and Sean and Fred)

b.detracey
Posts: 38
Joined: Tue Sep 29, 2015 11:10 am America/New_York
Answers: 0

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

by b.detracey » Thu Sep 27, 2018 12:01 pm America/New_York

Sorry. I know you guys are too busy for this... but...here is the source of my confusion:

The biased weighted variance is calculated by the lines:
double val = sums[prodId] / weights;
val = (sumSquares[prodId] / weights) - (val * val);


So I am assuming the line:
return val * weights * weights / (weights * weights - nscenes);
is your bias correction.

However, following the wikipedia page, the bias correction for (reliability) weighted variance is to divide by (pseudocode):
(1 - sum_of_square_of_weights/sum_of_weights_squared)
which in the code would be:
(1 -  nobs/(weights * weights))
and gives:
return val * weights * weights / (weights * weights - nobs);
:confused:
In other words, nscenes plays no role in the weighting, so why should it appear in the variance?

OB.DAACx - SeanBailey
Posts: 1519
Joined: Wed Sep 18, 2019 6:15 pm America/New_York
Answers: 1
Been thanked: 9 times

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

by OB.DAACx - SeanBailey » Sat Sep 29, 2018 2:34 pm America/New_York

The l2bin program can do both spatial (it's primary purpose) and temporal binning.  The l3bin program is intended solely for temporal binning (but can be used to bin multiple missions together...)
nscenes comes into play to account for the temporal binning, so as to not bias toward a single day with a lot of pixels contributing to the bin.  For some historical context that doesn't rely on Wikipedia, see Volume 32 of the SeaWiFS Pre-Launch Technical Memoranda Series...just ignore all that mumbo jumbo about MLE and geometric means...we abandoned that 20 years ago :grin:

Sean

b.detracey
Posts: 38
Joined: Tue Sep 29, 2015 11:10 am America/New_York
Answers: 0

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

by b.detracey » Mon Oct 01, 2018 10:11 am America/New_York

Believe me, I have had a copy of that report on my hard drive for a couple years and have spent a good amount of time staring :eek: at it. And from reading the forums I know that MLE, geometric means and taking the average of the natural log of CHL were all abandoned.

But the variance calculation in the report (Eq.C14) does not use NSEG i.e. nscenes . NSEG is not used in any calculations except the summing of itself over scenes. My mouth is wide open and I am holding my foot, but I am not convinced it is time to insert it.

Edit: Okay. I believe the correct answer is that weights * weights / (weight * weights - nscenes) is a correction based on a degrees of freedom argument, but my stats knowledge is mostly rubbish and I am trying to dig up a reference to verify. I feel my confusion is justified given the disconnect between the code and the old report and that no one could explain the bias correction. So I am leaving my sock on when I put my foot in my mouth. Tastes like, cotton.

b.detracey
Posts: 38
Joined: Tue Sep 29, 2015 11:10 am America/New_York
Answers: 0

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

by b.detracey » Tue Oct 16, 2018 10:42 am America/New_York

I could not find any references, so I am concluding the bias correction is ad hoc.

OB.DAACx - SeanBailey
Posts: 1519
Joined: Wed Sep 18, 2019 6:15 pm America/New_York
Answers: 1
Been thanked: 9 times

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

by OB.DAACx - SeanBailey » Tue Oct 16, 2018 7:51 pm America/New_York

Brendan,
Let's take a step back...the value of the SUM stored in the bin file is not the SUM, but the SUM/nobs.  IF we stored the unnormalized SUM, then yes, the weight would want to be NOBS.
Since we're storing an NOBS normalized SUM, the value store as the weight is the sqrt of the number of scenes contributing to the bin.  In the simple case of a single scene contributing to the bin the SUM is the nobs weighted SUM and the weight is 1.  When more than one scene contributes to the bin (i.e. temporal binning), the weight is not 1, but the sqrt of the NSCENES (ne NSEG) because the SUM is the sum of the per scene NOBS weighted SUM and the weight accounts for the number of scenes...

Should be clear as ...mud :grin:

Sean

b.detracey
Posts: 38
Joined: Tue Sep 29, 2015 11:10 am America/New_York
Answers: 0

Variance calculation in oel_hdf4/libbin++/L3File64.cpp

by b.detracey » Thu Oct 18, 2018 9:28 am America/New_York

Can't step back with foot in mouth. Must hop.

Post Reply