Revision as of 14:14, 12 February 2025 by RJFJR (talk | contribs), Administrators, 166,873 edits (→OpenMP implementation: move ref out of heading)		Revision as of 14:15, 12 February 2025 by RJFJR (talk | contribs), Administrators, 166,873 edits (→CUDA implementation: rem dup ref)
Line 181:
The EEMDs comprising MEEMD are assigned to independent threads for parallel execution, relying on the OpenMP runtime to resolve any load imbalance. Strided memory accesses to high-dimensional data are eliminated by transposing the data to lower dimensions, which improves cache-line utilization. The partial results of each EEMD are kept thread-private to ensure correctness. Memory requirements depend on the number of OpenMP threads and are managed by the OpenMP runtime.<ref name=":8" />

=== CUDA implementation ===
In the GPU CUDA implementation, each EMD is mapped to a thread. The memory layout, especially of high-dimensional data, is rearranged to meet memory coalescing requirements and to fit into the 128-byte cache lines. The data is first loaded along the lowest dimension and then consumed along a higher dimension. This step is performed when the Gaussian noise is added to form the ensemble data. In the new memory layout, the ensemble dimension is added to the lowest dimension to reduce possible branch divergence. The impact of the unavoidable branch divergence from data irregularity, caused by the noise, is minimized via a regularization technique using the on-chip memory. Moreover, the cache memory is utilized to amortize unavoidable uncoalesced memory accesses.<ref name=":8" />
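
The two layout ideas described above can be illustrated with a minimal host-side C++ sketch. It is not the cited implementation: the function names (<code>member_major</code>, <code>interleaved</code>, <code>build_interleaved_ensemble</code>, <code>ensemble_mean</code>) are hypothetical, the per-member EMD is omitted so that a plain ensemble average stands in for the real work, and treating the ensemble index as the fastest-varying dimension is only one reading of "the ensemble dimension is added to the lowest dimension". The OpenMP pragmas are standard; compiled without OpenMP support they are ignored and the code runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Member-major layout (assumed for the CPU path): each ensemble member is one
// contiguous run of samples, so the thread that owns it reads with unit stride.
inline std::size_t member_major(std::size_t m, std::size_t s, std::size_t n_samples) {
    return m * n_samples + s;
}

// Interleaved layout (assumed for the GPU path): the ensemble index varies
// fastest, so threads m and m+1 (one member each) touch adjacent addresses.
inline std::size_t interleaved(std::size_t m, std::size_t s, std::size_t n_members) {
    return s * n_members + m;
}

// Form the ensemble and write it directly in the interleaved layout, so the
// rearrangement happens at the moment the Gaussian noise is added.
// noise_std must be greater than zero.
std::vector<double> build_interleaved_ensemble(const std::vector<double>& signal,
                                               std::size_t n_members,
                                               double noise_std, unsigned seed) {
    std::vector<double> ensemble(signal.size() * n_members);
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, noise_std);
    for (std::size_t s = 0; s < signal.size(); ++s)
        for (std::size_t m = 0; m < n_members; ++m)
            ensemble[interleaved(m, s, n_members)] = signal[s] + noise(rng);
    return ensemble;
}

// CPU-side ensemble average over a member-major buffer. Each thread keeps a
// private partial sum and merges it once, so no two threads write the same
// element concurrently. A real EEMD would decompose each member first.
std::vector<double> ensemble_mean(const std::vector<double>& members,
                                  std::size_t n_members, std::size_t n_samples) {
    assert(members.size() == n_members * n_samples);
    std::vector<double> mean(n_samples, 0.0);
    #pragma omp parallel
    {
        std::vector<double> partial(n_samples, 0.0);  // thread-private
        #pragma omp for schedule(dynamic) nowait
        for (long m = 0; m < static_cast<long>(n_members); ++m) {
            const std::size_t mi = static_cast<std::size_t>(m);
            for (std::size_t s = 0; s < n_samples; ++s)
                partial[s] += members[member_major(mi, s, n_samples)];
        }
        #pragma omp critical
        {
            for (std::size_t s = 0; s < n_samples; ++s)
                mean[s] += partial[s];
        }
    }
    for (double& v : mean) v /= static_cast<double>(n_members);
    return mean;
}
```

In this sketch the dynamic schedule leaves load balancing to the OpenMP runtime, and the memory used for partial sums grows with the number of threads, which mirrors the trade-offs noted in the text. On a GPU, the <code>interleaved</code> index would let consecutive threads read consecutive elements of the same sample.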

Multidimensional empirical mode decomposition: Difference between revisions