Figure 4. Spatial matchups of in-situ data with satellite data.

2.1.3. Global DEM Data
Descriptive statistics of our in-situ dataset showed a negative relationship between surface altitude and in-situ concentration (Pearson's r = -0.3907). We therefore used global Digital Elevation Model (DEM) data as one of the input variables, "Altitude", to estimate the ground-level concentration. The relationship between the variables "Height" and "Altitude" is shown in Figure 3b. We used the Shuttle Radar Topography Mission (SRTM) DEM product, which had an original resolution of 90 m at the equator and was provided in WGS84 projection with a resolution of 1 arc [48], and resampled it to a resolution of 0.05°.
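For readers who want to reproduce this kind of check, the following minimal sketch shows how a Pearson correlation between altitude and in-situ concentration can be computed. The function name `pearson_r` and the sample values are hypothetical and purely illustrative; they are not the data used in this study.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: raises ZeroDivisionError if either sequence is constant
    return cov / math.sqrt(var_x * var_y)

# Hypothetical illustrative values, NOT the study's data
altitude_m = [10.0, 150.0, 400.0, 900.0, 1600.0]
hcho_ppb = [3.1, 2.4, 2.9, 1.2, 0.8]
r = pearson_r(altitude_m, hcho_ppb)
print(f"Pearson's r = {r:.4f}")
```

A negative value of `r` indicates that concentration tends to decrease as altitude increases, which is the direction of the relationship reported above.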

2.2. Data Processing
After collecting the data and organizing them into a structured format, we visualized and preprocessed them. Two neural networks, one for point estimation and one for interval estimation, were then implemented in PyTorch, a popular deep-learning framework. Our code is available online (https://github.com/dingyizhe2000/Interval-HCHO-ConcentrationEstimation, accessed on 21 June 2021). The preprocessed data, with the in-situ HCHO concentration as the ground truth, were divided randomly into two groups: 90% of the dataset was used to train our models and 10% was used for validation. After that, global VCD data were fed into the models to derive the global surface-level HCHO concentration.
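The random 90%/10% split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from the linked repository; the function name `split_train_val`, the fixed seed, and the placeholder samples are assumptions made for the example.

```python
import random

def split_train_val(samples, val_fraction=0.1, seed=0):
    """Randomly split samples into (train, validation) lists."""
    rng = random.Random(seed)          # fixed seed for reproducibility (assumption)
    indices = list(range(len(samples)))
    rng.shuffle(indices)               # random permutation of sample indices
    n_val = round(len(samples) * val_fraction)
    val = [samples[i] for i in indices[:n_val]]
    train = [samples[i] for i in indices[n_val:]]
    return train, val

# Placeholder samples standing in for matched (inputs, in-situ HCHO) records
samples = list(range(1000))
train, val = split_train_val(samples)
print(len(train), len(val))
```

Every sample lands in exactly one of the two groups, so the validation set is never seen during training.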

2.2.1. Preprocessing
In theory, a neural network is able to handle input data with a varied distribution; however, without preprocessing we observed a significant degradation in the training process, owing to the highly imbalanced and skewed distribution of the HCHO concentration (both column and in-situ). Therefore, we first applied a log-transformation to the raw data. As shown in Figure S1, the logarithm of the HCHO concentration data shows a bell-shaped distribution, and the resulting improvement in estimation accuracy also confirmed the effectiveness of the log-transformation.
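The effect of the log-transformation on a skewed variable can be illustrated with the short sketch below. The base of the logarithm is not stated in the text, so the natural logarithm is assumed here; the helper names and the sample values are hypothetical and are not taken from the study's data.

```python
import math

def log_transform(values):
    """Natural-log transform of strictly positive values (base is an assumption)."""
    return [math.log(v) for v in values]

def inverse_log_transform(log_values):
    """Map log-space values (e.g., model outputs) back to the original units."""
    return [math.exp(v) for v in log_values]

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed concentrations, NOT the study's data
raw = [0.5, 0.8, 1.0, 1.5, 2.0, 4.0, 16.0]
logged = log_transform(raw)
print(f"skewness before: {skewness(raw):.2f}, after: {skewness(logged):.2f}")
```

Estimates produced in log space need to be passed through `inverse_log_transform` before they are interpreted as concentrations.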

2.2.2. Neural Network Architecture
As a universal function approximator, the neural network played an important role in deriving the point and interval estimations of the HCHO concentration. However, instead of training a single network to obtain these estimations jointly, we constructed two separate neural networks for point and interval estimation, respectively, because our experiments indicated that a joint model usually has to compromise between point estimation and interval estimation.