
Determining the effects of social–environmental factors on the incidence and mortality of lung cancer in China based on remote sensing and GIS technology during 2007–2016

Abstract

Background

Lung cancer is the leading cause of cancer-related death in China. However, its relationship with social–environmental factors has not been revealed comprehensively. We are the first group to determine cold and hot spots associated with the incidence and mortality of lung cancer (IMLC) in both females and males and their spatiotemporal changes and to explore the social‒environmental burden of lung cancer in China between 2007 and 2016.

Methods

The explanatory powers of various social–environmental factors for the IMLC were evaluated through correlation analysis and the Geodetector tool. Spatial analysis models were applied to determine the relationships between the IMLC and social–environmental factors.

Results

The results are as follows: (1) The distribution of the IMLC exhibited significant spatial heterogeneity; the Global Moran’s index values for incidence ranged from 0.04–0.2 and 0.09–0.33 in males and females, respectively, and the values for mortality ranged from 0.01–0.12 and 0.11–0.32 in males and females, respectively. (2) The IMLC was spatially clustered with an overall positive autocorrelation. Male population-related hot spots were observed in the central–southern region of China, and cold spots were observed in western China. Female population-related hot spots were observed primarily in northeastern China. The cold spots occurred primarily in southern and some western regions of China. (3) The effects of social–environmental factors on the IMLC showed significant spatial and temporal variability: in males, the interaction between terrain undulation and road area exhibited the highest explanatory power for the incidence and mortality, with a value of 0.22 for both; in females, the interaction between O3 and road area and the interaction between O3 and the number of medical beds exhibited the highest explanatory powers for the incidence and mortality, reaching 0.27 and 0.34, respectively. (4) The optimal model capturing the relationships between the IMLC and social–environmental factors was the GTWR model, which relies on reclassified data. The best R2 value is 0.456.

Conclusions

The influence of each social‒environmental factor on the IMLC showed significant spatiotemporal variability, providing a systematic basis for governments to implement better targeted control of lung cancer.


Introduction

Lung cancer has been the leading cause of cancer incidence and cancer-related death globally [1, 2]. Between 2006 and 2016, lung cancer mortality increased by 18.3% [3]. This increase has been apparent in China, where lung cancer is the leading cause of cancer incidence and mortality in both males and females [4]. Due to this increase, the “Healthy China 2030” Plan puts forward goals for cancer prevention and treatment and emphasizes the importance of cancer prevention. To implement the plan, the National Health Commission of China and 12 other departments proposed the “Healthy China Action–Implementation Plan for Cancer Prevention and Control Action (2023–2030)”, with one of the goals being that the overall 5-year survival rate of patients with cancer should surpass 46.6% and that the disease burden should be effectively controlled by 2030. The program also proposes the establishment of a scientific system to monitor and evaluate the impacts of drinking water, air, and soil on public health to better control environmental diseases.

Candidate factors should be selected carefully when exploring their relationship with the incidence and mortality of lung cancer (IMLC). Some studies have focused on factors related to personal health, such as smoking [5], asbestos and coal dust inhalation [6, 7], dietary intake [8], and genetic background [9], and analyzed their contributions to the development of lung cancer. For example, based on a retrospective study of 6,446 lung cancer patients, environmental asbestos exposure was found to increase the risk of lung cancer, and the relationship between exposure dose and risk was linear [7]. Cheng et al. identified 45 genetic regions associated with lung cancer based on large-scale multistage genome-wide association studies [9]. Other studies have examined social–environmental factors, such as air pollutants, meteorological conditions, topography, socioeconomics, and urbanization, from a macro-perspective and explored their impact on lung cancer. Air pollutants, mainly particulate and gaseous matter, such as PM2.5 [10, 11], PM10 [12], NO2 [12], SO2 [12], and O3 [13], have been shown to have an impact on the IMLC. A nationwide analysis was performed in 295 counties (districts) in China from 2006 to 2014, and significant positive associations between PM2.5 and lung cancer incidence rates in both males and females were found [10]. A linear and non-threshold exposure–response relationship between short-term O3 exposure and cancer mortality was found, with all cancer mortality risks, including lung cancer, increasing with increasing O3 concentration [13]. Hassan et al. [14] and Swiatkowska et al. [15] showed that the dispersion of airborne particles, such as haze, dust, and sand, is a potential influencing factor for the incidence of lung cancer. For example, Hassan et al. reported that the lung cancer incidence rate increased from 1.8 cases per week during the non-haze period to 4.5 cases per week during the haze period in Southeast Asia, indicating a relationship between haze and lung cancer [14]. In addition, some researchers have pointed out that natural environmental conditions are important influencing factors for developing lung cancer. Yang et al. found that a 0.1 unit increase in vegetation coverage, indexed using the normalized difference vegetation index (NDVI), was linked to a 2% reduction in mortality rate among lung cancer patients in Beijing, revealing a positive correlation between urban greenness and increased survival opportunities for lung cancer patients [16]. Ren et al. found that the mortality rate of lung cancer was negatively correlated with topographic factors at the village scale in Xuanwei City, China [17]. Furthermore, socioeconomic conditions also play an indispensable role in the progression of the IMLC. Areas with better socioeconomic conditions tend to have higher land use intensity [18], urbanization rates [18], population density [18], and gross domestic product (GDP) [5] and more developed transportation [19, 20]. For example, it was found that the urbanization rate and population density were negatively correlated with lung cancer incidence [18]. When exploring the relationship between road traffic-related indicators and the risk of lung cancer, Shao et al. found that living less than 50 m from a major road was significantly associated with an increase in lung cancer risk in Jiading District, Shanghai, China [20]. Overall, previous studies have demonstrated that social–environmental factors, including air pollutant levels and meteorological, natural, environmental, and socioeconomic variables, exert significant impacts on the occurrence of lung cancer. Examining these macro-factors at the national level can provide evidence for effectively controlling lung cancer from a broad perspective. However, the synergistic effects of the abovementioned social–environmental factors on the IMLC have not been comprehensively investigated in China.

Owing to the large area of China, the IMLC and the macro-influencing factors are heterogeneously distributed in space and time, which complicates the evaluation of the influence of social–environmental factors on lung cancer. Studies have indicated that spatial clustering exists in infectious and chronic diseases affected by social and environmental factors [21, 22]. Guo et al. [23] adopted spatial autocorrelation to reveal the spatial aggregation dynamics of lung cancer incidence and introduced emerging hot spot analysis to indicate the hot spot changes in lung cancer incidence in China. However, they did not analyze the spatial clustering in men and women separately, in which gender differences may exist, or reveal the spatiotemporal migration of the cold and hot spots of lung cancer incidence. On the other hand, to reveal the influence of regional social–environmental factors on the IMLC, an effective and valid model from a geospatial perspective is required. With the development of public health and geographic information systems (GIS), the amount of disease and/or health data with spatial attributes is increasing. GIS technology plays an important role in evaluating environmental factors and analyzing the spatiotemporal clustering of diseases. Geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR) models are typical analytical methods that account for the spatial and spatiotemporal variations in influencing factors, respectively. Compared with traditional models, such as the ordinary least squares (OLS) model, the GWR and GTWR models can better reflect the impact of regional environmental factors on the IMLC. Recent studies exploring the influencing factors related to lung cancer have only focused on countries such as Portugal [24], the USA [8], Brazil [25], and China [17, 23]. In China, Guo et al. used the GTWR model to determine the influencing factors of lung cancer incidence; however, they did not include air pollutants other than PM2.5, or the total number of days with blowing sand, floating dust, haze, or sandstorms, which we believe may directly indicate the dispersion and diffusion of air pollutants. In addition, they did not analyze the impact of influencing factors on lung cancer mortality [23]. The study by Ren et al. only focused on topographic factors and the lung cancer epidemic [17]. Because the pathogenesis of lung cancer is complex, studying the variable macro-influencing factors for the IMLC over a long time span is beneficial for an in-depth analysis of the development trend of and potential risk factors for lung cancer. However, there is a lack of studies that systematically explore the spatiotemporal evolution trends of the hot and cold spots of the IMLC and that integrate and apply social–environmental factors in large-scale (China) and long-term research on both the incidence and mortality of lung cancer.

Tobler's first law of geography states that everything is related to everything else, but nearby things are more related than distant things are [26]. This provides a theoretical framework for analyzing spatial heterogeneity, which is especially applicable to countries such as China, with a vast territory and significant environmental gradients, where disease risks exhibit significant spatial variations. The Geodetector model, a tool designed for detecting spatial heterogeneity, can be used to effectively identify driving factors and their spatial coupling mechanisms, making it a critical tool in environmental health research. In the fields of remote sensing and GIS technology, the Geodetector tool has been successfully applied to better understand the spatial heterogeneity in the land surface temperature [27], air quality [28], and the environment [29]. In this study, we employed the Geodetector tool to determine the factors contributing to the spatial heterogeneity in the IMLC.

To gain a better understanding of the social–environmental factors contributing to the high IMLC in China, based on remote sensing and GIS, this study aims to (1) explore the spatial clustering characteristics of the IMLC over a ten-year period (2007–2016); (2) detect the moving trajectories of the centers of gravity of cold and hot spots of the IMLC; (3) determine the correlation between the IMLC and various social–environmental factors using Pearson’s correlation analysis and Geodetector; and (4) reveal the relationship between the correlated social–environmental factors and the IMLC based on OLS, GWR, and GTWR. The workflow of the whole study is shown in Fig. 1. Our study was the first to determine cold and hot spots of the IMLC in both females and males and their spatiotemporal changes and to explore the social–environmental burden of lung cancer in China between 2007 and 2016. The findings of this study could potentially provide a scientific basis for controlling the risk factors and thus reducing the risk of lung cancer as proposed in the “Healthy China 2030” Plan and the “Healthy China Action–Cancer Prevention and Control Action Implementation Program (2023–2030)”.

Fig. 1 Workflow of the whole study

Data and methods

Study area

The study was conducted in China, which has a vast land area comprising 34 provincial- and 333 prefecture-level administrative regions. Figure 2 shows the administrative map of China obtained from the Standard Map Service System of the Ministry of Natural Resources (http://bzdt.ch.mnr.gov.cn/) with an approval number of GS (2019)1822.

Fig. 2 Study area and the distribution of tumor registries in 2016

Data collection and preparation

This study comprehensively selected PM2.5, PM10, NO2, SO2, and O3 as air pollutant factors; the number of days with blowing sand, floating dust, haze, or sandstorms and temperatures as meteorological factors; NDVI, geomorphic type, types of land use, and terrain undulation derived from digital elevation model (DEM) as natural environmental factors; and economic density, population density, road area, healthcare conditions (the number of hospitals, medical beds, and doctors) as socioeconomic factors, to explore the relationship between social–environmental factors and the IMLC. Given that remote sensing can effectively monitor the distribution of environmental factors over large areas, we used remote sensing products and GIS to obtain the following social–environmental data: concentrations of air pollutants, natural environmental factors, temperatures, economic density, and population density. The collected data sets are shown in Table 1.

Table 1 Sources of the data for the independent and dependent variables

Age-standardized IMLC in China

The age-standardized IMLC in China (International Classification of Diseases (ICD) code: C33–34) was obtained from the China Cancer Registry Annual Report for a total of ten years from 2007 to 2016. Data on incidence and mortality were collected at city or county level where the tumor registries were located.

Air pollutants

Air pollutants included SO2, NO2, O3, PM2.5, and PM10. The concentrations of PM2.5 and PM10 from 2007 to 2016, as well as the concentrations of SO2 and NO2 from 2013 to 2016, were obtained from the National Earth System Science Data Center (NESSDC, http://www.geodata.cn). The concentrations of O3 from 2007 to 2016, as well as the concentrations of SO2 and NO2 from 2007 to 2012, were obtained from the total column products of the Ozone Monitoring Instrument (OMI, https://search.earthdata.nasa.gov).

The original OMI data were obtained as daily averages, from which annual averages were calculated. In addition, the concentrations of SO2 and NO2 from the NESSDC and OMI have different units; therefore, regression analyses were conducted using OMI data from 2013 as the independent variable and NESSDC data from 2013 as the dependent variable. These analyses were conducted using the IBM SPSS Statistics (Version 26) software. Among the linear, logarithmic, inverse, composite, quadratic, cubic, power, S, growth, exponential, and logistic function models, the cubic function model had the highest R2 and was used to convert the OMI (2007–2012) data into units consistent with those utilized by the NESSDC (2013–2016).

Meteorological data

The temperature data were obtained from the National Oceanic and Atmospheric Administration (NOAA, https://www.ncei.noaa.gov). The remaining meteorological data were obtained from the National Meteorological Information Center of China (https://data.cma.cn).

Based on the latitudes and longitudes of meteorological stations, as well as the data of monthly or daily averages from each meteorological station, annual averages were first calculated and then interpolated using the Inverse Distance Weighting (IDW) tool in ArcGIS (Version 10.7) software to obtain a grid map of the annual number of days with blowing sand, floating dust, haze, sandstorms, and annual average temperatures in the study area.
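The interpolation step can be illustrated with a minimal inverse distance weighting sketch in Python. It is not the ArcGIS IDW tool used in the study: the function name `idw`, the planar distance, the power of 2, and the toy station values are all assumptions made for illustration.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations: list of (station_x, station_y, annual_value) tuples.
    power: distance exponent (2.0 is an assumed default, not taken from the paper).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point coincides with a station: return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical stations: annual number of haze days at two locations.
toy_stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw(toy_stations, 1.0, 0.0)   # equidistant from both stations
station_estimate = idw(toy_stations, 0.0, 0.0)    # exactly on the first station
```

Evaluating such a function on every cell center of a regular grid would produce a raster comparable in spirit to the annual grid maps described above.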

Natural environmental data

The natural environmental data used in this study included the NDVI, geomorphic types, and types of land use derived from the Resource and Environmental Science Data Platform of the Chinese Academy of Sciences (https://www.resdc.cn). Global DEM data for the year 2000 were provided by the Shuttle Radar Topography Mission (SRTM) (https://search.earthdata.nasa.gov).

Among them, types of land use were reclassified based on the first-level classification in the original dataset, and corresponding values were assigned to each group: woodland, 1; grassland, 2; water, 3; agricultural land, 4; unutilized land, 5; and construction land, 6. During the study period, land use data were only available for the years 2005, 2010, 2013, and 2015. Therefore, for 2007, 2008–2011, 2012–2013, and 2014–2016, data from 2005, 2010, 2013, and 2015 were used, respectively. Geomorphic types were reclassified based on the first-level classification in the original dataset, and corresponding values were assigned to each group: extreme high mountains, 1; high mountains, 2; mid-high mountains, 3; low mountains, 4; hills, 5; plateaus, 6; and plains, 7. The ArcGIS (Version 10.7) software was also used to calculate the terrain undulation [17] from the DEM data, which was used as an environmental factor.

Socioeconomic data

Grid data on economic and population densities were obtained from the Resource and Environmental Science Data Platform of the Chinese Academy of Sciences (https://www.resdc.cn/). The road area, the number of hospitals, medical beds, and doctors were obtained from the China Urban Statistical Yearbook. These two datasets were available every five years for 2005, 2010, and 2015 during the study period. Therefore, for 2007, 2008–2012, and 2013–2016, data from 2005, 2010, and 2015 were used, respectively.
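The year assignments stated for the land use data and for the socioeconomic data both coincide with a simple rule: use the nearest available year, and break ties toward the later year. The sketch below illustrates that observation; the rule and the helper name `source_year` are assumptions for illustration, not a procedure stated by the authors.

```python
def source_year(target_year, available_years):
    """Pick the nearest available data year; ties go to the later year (assumed rule)."""
    return min(available_years, key=lambda year: (abs(year - target_year), -year))

study_years = list(range(2007, 2017))

# Land use data were available for 2005, 2010, 2013, and 2015.
land_use_years = [source_year(y, [2005, 2010, 2013, 2015]) for y in study_years]

# Socioeconomic data were available for 2005, 2010, and 2015.
socioeconomic_years = [source_year(y, [2005, 2010, 2015]) for y in study_years]
```

With these inputs the rule reproduces the mapping given in the text, for example 2014 is assigned the 2015 land use data and 2013 is assigned the 2015 socioeconomic data.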

In addition, to perform the overlay analysis, all the above social–environmental factors were uniformly resampled into grid data with a resolution of 1 km × 1 km. A preliminary comparison showed that reclassifying the raw data (except for types of land use and geomorphic types) into five categories using the natural breaks method yielded better modeling results than using the raw data directly. Therefore, only the results based on the reclassified data are presented in the following analyses.

Spatial clustering analysis

Spatial autocorrelation reflects whether the values at geographically adjacent locations are correlated [30], and its analysis is an important method for examining the geographic variability of spatial data. In this study, Global Moran’s index (Moran’s I) and local Getis-Ord Gi* [31, 32], available in the spatial autocorrelation and hot spot analysis toolbox of ArcGIS (Version 10.7), were used to investigate the spatial clustering characteristics of the IMLC in both males and females and to locate clusters of cold and hot spots. In addition, the standard deviation ellipse was used to reveal the moving trajectories of the centers of gravity of cold and hot spots of the IMLC.

Global Moran's I

Global Moran's I is a commonly used index of global spatial autocorrelation and is calculated as [33]:

$$I=\frac{n{\sum }_{i=1}^{n}{\sum }_{j=1}^{n}{w}_{ij}\left({y}_{i}-\overline{y }\right)\left({y}_{j}-\overline{y }\right)}{{s}_{0}{\sum }_{i=1}^{n}{\left({y}_{i}-\overline{y }\right)}^{2}}$$
(1)

where \(n\) is the number of spatial data points, \({y}_{i}\) and \({y}_{j}\) represent the attribute values of the \(i\)th spatial unit and the \(j\)th spatial unit, respectively, \(\overline{y }\) is the mean of the attribute values across all spatial units, \({w}_{ij}\) is the spatial weight value, \({s}_{0}={\sum }_{i=1}^{n}{\sum }_{j=1}^{n}{w}_{ij}\) is the sum of all the spatial weight values. Global Moran's I was used to indicate whether there is spatial aggregation in regional data as a whole. The value of Global Moran's I is normalized to [-1,1]. The greater the index value, the more obvious the spatial correlation.
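Equation (1) can be checked on a toy example with a few lines of Python. This is a minimal sketch and not the ArcGIS tool used in the study; the function name `global_morans_i`, the binary contiguity weights, and the four-unit chain of values are assumptions chosen for illustration.

```python
def global_morans_i(values, weights):
    """Global Moran's I following Eq. (1).

    values: attribute values y_i for the n spatial units.
    weights: n x n nested list of spatial weights w_ij.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [y - mean for y in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    sum_of_squares = sum(d * d for d in deviations)
    return n * cross_products / (s0 * sum_of_squares)

# Four units on a line; each unit neighbors the units directly beside it.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = global_morans_i([1.0, 1.0, 5.0, 5.0], chain_weights)    # similar values adjacent
alternating = global_morans_i([1.0, 5.0, 1.0, 5.0], chain_weights)  # dissimilar values adjacent
```

In this toy case the clustered arrangement gives a positive index and the alternating arrangement gives a negative one, matching the interpretation of the sign described above.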

Local Getis-Ord Gi*

Local Getis-Ord Gi* is a commonly used index of local spatial autocorrelation and is calculated as [34]:

$${Gi}^{*}=\frac{\sum_{j=1}^{n}{w}_{ij}{x}_{j}-\overline{x}\sum_{j=1}^{n}{w}_{ij}}{S\sqrt{\frac{\left[n\sum_{j=1}^{n}{{w}_{ij}}^{2}-{\left(\sum_{j=1}^{n}{w}_{ij}\right)}^{2}\right]}{\left(n-1\right)}}}$$
(2)

where \(i\) represents the central element, \(j\) indexes the elements, \(\overline{x}\) and \(S\) are the mean and standard deviation of the attribute (here, the incidence or mortality) over the entire region, \({w}_{ij}\) is the spatial weight between elements \(i\) and \(j\), which is determined by their spatial relationship (for example, distance) and is zero for elements outside the neighborhood of \(i\), \({x}_{j}\) represents the attribute value of the \(j\)th element, and \(n\) is the total number of elements. A positive Gi* value indicates a hot spot area, and a negative value indicates a cold spot area. The higher the absolute value of the Gi* index, the higher the degree of clustering in the study area.
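As an illustration, Eq. (2) can be evaluated for one element with the short Python sketch below. It is not the ArcGIS Hot Spot Analysis tool; the function name `getis_ord_gi_star`, the binary weights that include the element itself, the use of the population standard deviation for \(S\), and the toy values are assumptions.

```python
import math

def getis_ord_gi_star(values, weights, i):
    """Gi* statistic of element i following Eq. (2).

    values: attribute values x_j for all n elements.
    weights: n x n nested list of spatial weights; row i should include w_ii.
    """
    n = len(values)
    mean = sum(values) / n
    # S: standard deviation of the attribute over the whole region (population form assumed).
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    row = weights[i]
    weight_sum = sum(row)
    weight_sq_sum = sum(w * w for w in row)
    numerator = sum(w * x for w, x in zip(row, values)) - mean * weight_sum
    denominator = s * math.sqrt((n * weight_sq_sum - weight_sum ** 2) / (n - 1))
    return numerator / denominator

# Five units on a line; each unit's neighborhood is itself plus adjacent units.
toy_values = [9.0, 9.0, 1.0, 1.0, 1.0]
toy_weights = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
hot = getis_ord_gi_star(toy_values, toy_weights, 0)   # high values cluster here
cold = getis_ord_gi_star(toy_values, toy_weights, 4)  # low values cluster here
```

The statistic behaves like a z-score, so the positive value at the first unit and the negative value at the last unit correspond to a hot spot and a cold spot, respectively.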

Standard deviation ellipse

Based on the identification of cold and hot spot clustering areas, the spatial evolution trends of these areas were further analyzed using the center of gravity and directional distribution analysis (standard deviation ellipse) to reveal the directional deviation and changing patterns of the IMLC in time and space.

The formula for calculating coordinates of the center of gravity is as follows [35]:

$$\left(\overline{X },\overline{Y }\right)=\left(\frac{\sum_{i=1}^{n}{M}_{i}{X}_{i}}{\sum_{i=1}^{n}{M}_{i}},\frac{\sum_{i=1}^{n}{M}_{i}{Y}_{i}}{\sum_{i=1}^{n}{M}_{i}}\right)$$
(3)

where \(\left(\overline{X },\overline{Y }\right)\) are the coordinates of the center of gravity, \({M}_{i}\) is the mass or weight of \(i\), \({X}_{i}\) and \({Y}_{i}\) are the longitude and latitude of point \(i\), respectively.

The standard deviation ellipse visually displays the distribution of the centers of gravity of hot and cold spots and reveals their clustering or dispersing trends. It also demonstrates their directional deviation [36,37,38]. The long and short axes represent the degrees of dispersion in the primary and secondary directions, respectively. The center-of-gravity coordinates of each element are substituted into the migration formula in Eq. (4), and the spatiotemporal migration trajectory of the center of gravity of cold and hot spots of the IMLC is plotted using the distance (D) and the offset angle (θ) of the center-of-gravity migration of each element [39]:

$$D=C\times \sqrt{{\left({X}_{t2}-{X}_{t1}\right)}^{2}+{\left({Y}_{t2}-{Y}_{t1}\right)}^{2}}$$
(4)

where \(D\) is the distance of the center of gravity migration (km), and \(\left({X}_{t1},{Y}_{t1}\right)\) and \(\left({X}_{t2},{Y}_{t2}\right)\) represent the coordinates of the center of gravity at times \({t}_{1}\) and \({t}_{2}\), respectively. \(C\) is a constant that represents the coefficient of conversion from earth latitude and longitude coordinates (\(^\circ\)) to plane distance (km) and takes the value of 111.111.
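Equations (3) and (4) can be combined in a short Python sketch. It is illustrative only: the function names, the choice of weights \({M}_{i}\), and the coordinates are hypothetical, and the study itself used the mean center tool in ArcGIS.

```python
import math

KM_PER_DEGREE = 111.111  # constant C in Eq. (4)

def center_of_gravity(points):
    """Weighted center of gravity following Eq. (3).

    points: list of (longitude, latitude, weight) tuples.
    """
    total_weight = sum(m for _, _, m in points)
    x_bar = sum(m * lon for lon, _, m in points) / total_weight
    y_bar = sum(m * lat for _, lat, m in points) / total_weight
    return x_bar, y_bar

def migration_distance_km(center_t1, center_t2):
    """Migration distance D between two centers of gravity following Eq. (4)."""
    dx = center_t2[0] - center_t1[0]
    dy = center_t2[1] - center_t1[1]
    return KM_PER_DEGREE * math.hypot(dx, dy)

# Hypothetical hot spot locations (degrees) and weights for two years.
hot_spots_early = [(110.0, 30.0, 1.0), (112.0, 32.0, 3.0)]
hot_spots_late = [(114.5, 35.5, 1.0)]
center_early = center_of_gravity(hot_spots_early)
center_late = center_of_gravity(hot_spots_late)
distance_km = migration_distance_km(center_early, center_late)
```

Because Eq. (4) applies a single degree-to-kilometer constant to both longitude and latitude, the result is an approximation of the ground distance rather than a geodesic length.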

Correlation analysis of influencing factors

Pearson’s correlation analysis

Pearson’s correlation analysis measures the degree of linear correlation between two continuous variables, with a coefficient in the range [-1,1]. We used it to study the correlation between social–environmental factors and the IMLC. The correlation coefficient is [40]:

$${r}_{xy}=\frac{\sum_{i=1}^{n}\left[\left({x}_{i}-\overline{X}\right)\left({y}_{i}-\overline{Y}\right)\right]}{\sqrt{\sum_{i=1}^{n}{\left({x}_{i}-\overline{X}\right)}^{2}\sum_{i=1}^{n}{\left({y}_{i}-\overline{Y}\right)}^{2}}}$$
(5)

where \({r}_{xy}\) is the correlation coefficient between factors \(x\) and \(y\), \({x}_{i}\) and \({y}_{i}\) are respectively the \(i\)th observation values of two variables \(x\) and \(y\), \(\overline{X}\) and \(\overline{Y}\) are respectively the mean values of the two variables, \(n\) is the number of study areas. When p < 0.05, there was a significant correlation between social–environmental factors and IMLC. A positive correlation coefficient indicates a positive correlation between two factors, and vice versa.
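Equation (5) translates directly into Python. The sketch below computes only the coefficient and leaves out the significance test; the function name `pearson_r` and the toy series are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient following Eq. (5)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    covariance_sum = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    x_sq_sum = sum((xi - x_mean) ** 2 for xi in x)
    y_sq_sum = sum((yi - y_mean) ** 2 for yi in y)
    return covariance_sum / math.sqrt(x_sq_sum * y_sq_sum)

# Hypothetical factor values paired with two hypothetical outcome series.
factor = [1.0, 2.0, 3.0, 4.0]
r_positive = pearson_r(factor, [2.0, 4.0, 6.0, 8.0])   # perfectly increasing
r_negative = pearson_r(factor, [8.0, 6.0, 4.0, 2.0])   # perfectly decreasing
```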

Geodetector

Geodetector is a tool used to test the spatial differentiation of a single variable and to detect possible causal relationships between two variables [41]. In this study, the GD_10.8 package was invoked in RStudio 2024.12.1 to implement Geodetector analysis for quantifying spatial heterogeneity and identifying key driving factors.

The factor detector was used to calculate the explanatory power of each of these social–environmental factors using the following formula [42]:

$$q=1-\frac{\sum_{h=1}^{L}{N}_{h}{\sigma }_{h}^{2}}{N{\sigma }^{2}}$$
(6)

where \(q\) is the explanatory power of various social–environmental factors on the IMLC; \(L\) is the number of classifications of each social–environmental factor; \({N}_{h}\) and \(N\) are the numbers of sample units in the \(h\)th category and the entire study area, respectively; \({\sigma }_{h}^{2}\) and \({\sigma }^{2}\) are the variances of the IMLC in the \(h\)th category and the entire study area, respectively.
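The factor detector in Eq. (6) can be sketched in Python as follows, taking \({N}_{h}\) and \(N\) as counts of sample units, as in the standard Geodetector formulation. This is not the GD package used in the study; the function name `geodetector_q`, the use of population variances, and the toy data are assumptions.

```python
from collections import defaultdict
from statistics import pvariance

def geodetector_q(outcome, strata):
    """q statistic following Eq. (6).

    outcome: IMLC value of each sample unit.
    strata: category label of the factor for each sample unit.
    """
    groups = defaultdict(list)
    for value, label in zip(outcome, strata):
        groups[label].append(value)
    # Sum over categories of N_h times the within-category variance.
    within = sum(len(values) * pvariance(values) for values in groups.values())
    total = len(outcome) * pvariance(outcome)
    return 1 - within / total

toy_outcome = [1.0, 1.0, 5.0, 5.0]
q_aligned = geodetector_q(toy_outcome, ["low", "low", "high", "high"])
q_unrelated = geodetector_q(toy_outcome, ["a", "b", "a", "b"])
```

When the categories separate the outcome perfectly, the within-category variance vanishes and \(q\) reaches 1; when each category reproduces the overall spread, \(q\) falls to 0.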

The interaction detector determines whether the interactions between social–environmental factors enhance or weaken the explanatory power of the IMLC. There are five types of interactions between the two social–environmental factors \({x}_{a}\) and \({x}_{b}\), as shown in Table 2.

Table 2 Five types of interactions between the two social–environmental factors
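The classification can be sketched in Python as below. The sketch assumes the five categories of the standard Geodetector interaction detector, which compare the joint explanatory power \(q({x}_{a}\cap {x}_{b})\) with the individual values; the function name, the labels, and the tolerance used to test equality are assumptions and should be checked against Table 2.

```python
def interaction_type(q_a, q_b, q_ab, tol=1e-9):
    """Classify an interaction from q(x_a), q(x_b), and q(x_a ∩ x_b).

    Assumed categories (standard Geodetector interaction detector):
      nonlinear weaken:  q_ab < min(q_a, q_b)
      univariate weaken: min(q_a, q_b) < q_ab < max(q_a, q_b)
      bivariate enhance: q_ab > max(q_a, q_b)
      independent:       q_ab = q_a + q_b
      nonlinear enhance: q_ab > q_a + q_b
    """
    lower, upper = min(q_a, q_b), max(q_a, q_b)
    combined = q_a + q_b
    if q_ab < lower:
        return "nonlinear weaken"
    if q_ab < upper:
        return "univariate weaken"
    if abs(q_ab - combined) <= tol:
        return "independent"
    if q_ab > combined:
        return "nonlinear enhance"
    return "bivariate enhance"

# Hypothetical q values for two factors and their interaction.
example_label = interaction_type(0.10, 0.20, 0.40)
```

Values that fall exactly on a boundary between two categories are assigned by the order of the checks above, which is an arbitrary choice of this sketch.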

Spatial statistical models

The OLS, GWR, and GTWR models were used to analyze the spatial and temporal influences of various social–environmental factors on the IMLC. The analysis was performed using the GTWR ADDIN plugin in ArcGIS (Version 10.7), which can be downloaded from https://www.researchgate.net/publication/329518786_GTWR_ADDIN_Updated_and_Valid_till_Jan_1_2025.

OLS model

OLS is the most commonly used method for solving curve-fitting problems. It estimates the parameters in a regression model by minimizing the sum of the squared errors between the predicted and observed values [43]. The formula is:

$$y={\beta }_{0}+\sum_{i=1}^{k}{\beta }_{i}{x}_{i}+\varepsilon$$
(7)

where \(y\) represents the age-standardized IMLC, \({x}_{i}\) are the social–environmental factors, \({\beta }_{0}\) and \({\beta }_{i}\) are the intercept and coefficients, respectively, \(k\) is the number of social–environmental factors, and \(\varepsilon\) is the error term.

GWR model

The GWR model fully considers spatial heterogeneity by introducing spatial weight [44]. By estimating regional differences between variables, it describes the spatial changes of variables, and calculates the influence of each factor in each study area. The formula for calculating GWR is [45]:

$${y}_{j}={\beta }_{0}\left({u}_{j},{v}_{j}\right)+\sum_{i=1}^{k}{\beta }_{i}\left({u}_{j},{v}_{j}\right){x}_{ij}+{\varepsilon }_{j}$$
(8)

where \({y}_{j}\) is the IMLC in the district where tumor registry \(j\) is located, \(\left({u}_{j},{v}_{j}\right)\) are the spatial coordinates of tumor registry \(j\) in the study area, \({\beta }_{0}\left({u}_{j},{v}_{j}\right)\) is the local intercept at location \(j\), \(k\) is the total number of social–environmental factors, \({\beta }_{i}\left({u}_{j},{v}_{j}\right)\) is the local coefficient for \({x}_{ij}\), which changes with the location of the sample point, and \({x}_{ij}\) is the value of the \(i\)th social–environmental factor in the district where tumor registry \(j\) is located. \({\varepsilon }_{j}\) denotes the random error for the district in which tumor registry \(j\) is located, which is assumed to have a mean of 0 and a variance of \({\sigma }^{2}\).
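The idea behind Eq. (8), that coefficients are re-estimated at each location with nearby observations weighted more heavily, can be illustrated with a deliberately simplified Python sketch. It handles a single predictor (\(k=1\)), uses a Gaussian distance-decay kernel with a fixed bandwidth, and works on planar coordinates; these choices, the function name `gwr_local_fit`, and the toy data are assumptions and do not reproduce the GTWR ADDIN plugin used in the study.

```python
import math

def gwr_local_fit(coords, x, y, target, bandwidth):
    """Locally weighted least-squares (intercept, slope) at one target location."""
    # Gaussian distance-decay weights; the kernel form is an assumption.
    weights = [
        math.exp(-((u - target[0]) ** 2 + (v - target[1]) ** 2) / bandwidth ** 2)
        for u, v in coords
    ]
    weight_total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / weight_total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / weight_total
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Toy data: the outcome follows y = x in the west and y = 3x in the east.
toy_coords = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0)]
toy_x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
toy_y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]
west_fit = gwr_local_fit(toy_coords, toy_x, toy_y, target=(0, 0), bandwidth=2.0)
east_fit = gwr_local_fit(toy_coords, toy_x, toy_y, target=(10, 0), bandwidth=2.0)
```

A single global OLS fit to these six points would return one compromise slope, whereas the local fits recover a slope near 1 in the west and near 3 in the east, which is the kind of spatial variation in coefficients that GWR is designed to expose.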

GTWR model

The GTWR model is a further extension of the GWR model that adds a temporal dimension to the GWR model [46], reflecting the spatiotemporal variability of the variables. The GTWR model was used to comprehensively analyze the spatial and temporal influences of social–environmental factors on IMLC in China. The formula for calculating the GTWR is [47]:

$${y}_{j}={\beta }_{0}\left({u}_{j},{v}_{j},{t}_{j}\right)+\sum_{i=1}^{k}{\beta }_{i}\left({u}_{j},{v}_{j},{t}_{j}\right){x}_{ij}+{\varepsilon }_{j}$$
(9)

where \({y}_{j}\) is the IMLC in the district where tumor registry \(j\) is located, \({t}_{j}\) denotes the time period in which the data for tumor registry \(j\) were observed, \(\left({u}_{j},{v}_{j},{t}_{j}\right)\) denotes the spatiotemporal coordinates of the district where tumor registry \(j\) is located, \({\beta }_{0}\left({u}_{j},{v}_{j},{t}_{j}\right)\) denotes the constant for the district where tumor registry \(j\) is located, \(k\) is the number of social–environmental factors, \({\beta }_{i}\left({u}_{j},{v}_{j},{t}_{j}\right)\) is the regression coefficient of the \(i\)th social–environmental factor on the spatiotemporal scale in the district where tumor registry \(j\) is located, and \({x}_{ij}\) is the value of the \(i\)th influencing factor in the district where tumor registry \(j\) is located. \({\varepsilon }_{j}\) denotes the random error for the district in which tumor registry \(j\) is located, which is assumed to have a mean of 0 and a variance of \({\sigma }^{2}\).
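What distinguishes Eq. (9) from Eq. (8) in practice is the weighting of observations by spatiotemporal rather than purely spatial proximity. The Python sketch below shows one common way to build such a weight: a squared spatiotemporal distance that combines space and time through scale factors, passed through a Gaussian kernel. The scale factors `lam` and `mu`, the kernel, the bandwidth, and the function name are assumptions for illustration and are not settings reported in the study.

```python
import math

def gtwr_weight(observation, target, bandwidth, lam=1.0, mu=1.0):
    """Gaussian weight of an observation relative to a regression point.

    observation, target: (u, v, t) spatiotemporal coordinates.
    lam, mu: assumed scale factors balancing spatial and temporal distance.
    """
    u1, v1, t1 = observation
    u2, v2, t2 = target
    squared_distance = lam * ((u1 - u2) ** 2 + (v1 - v2) ** 2) + mu * (t1 - t2) ** 2
    return math.exp(-squared_distance / bandwidth ** 2)

# Hypothetical registry one coordinate unit away, observed in the same and earlier years.
same_year = gtwr_weight((1.0, 0.0, 2016), (0.0, 0.0, 2016), bandwidth=2.0)
three_years_earlier = gtwr_weight((1.0, 0.0, 2013), (0.0, 0.0, 2016), bandwidth=2.0)
space_only = gtwr_weight((1.0, 0.0, 2013), (0.0, 0.0, 2016), bandwidth=2.0, mu=0.0)
```

Setting `mu` to zero removes the temporal term, so the weight reduces to a purely spatial one; this mirrors the description of GTWR as an extension of GWR with an added temporal dimension.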

Evaluation of the model accuracy

The accuracy of the model in this study was evaluated via the corrected Akaike information criterion (AICc) [48], coefficient of determination (R2) [49] and adjusted R2 metric [49]. Details can be found in the Supplementary Materials.
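For orientation, the two R2-based metrics can be sketched in Python using their textbook definitions. These are assumed to match the definitions in the Supplementary Materials, which are not reproduced here; for GWR and GTWR the parameter count is usually replaced by an effective number of parameters, and AICc is omitted because its form depends on the fitted model. The function names and toy values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Adjusted R2 for n observations and k explanatory variables (textbook form)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical observed and fitted values.
toy_observed = [1.0, 2.0, 3.0, 4.0]
toy_predicted = [1.5, 1.5, 3.5, 3.5]
toy_r2 = r_squared(toy_observed, toy_predicted)
toy_adjusted_r2 = adjusted_r_squared(toy_r2, n=4, k=1)
```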

Results

Temporal and spatial distribution patterns of the IMLC in China

Regarding the temporal distribution pattern of the IMLC in China, as shown in Fig. 3, the annual age-standardized IMLC was much higher in males than in females nationwide. The incidence of lung cancer in both males and females showed an overall upward trend; however, there was a decrease in 2015. The mortality of lung cancer in both males and females was steady throughout the years; however, there was a slight increase in 2012.

Fig. 3 Temporal trends of the annual age-standardized IMLC (per 100,000 population) nationwide from 2007 to 2016

In addition, based on data from tumor registries at the city/county level in 2016, the IMLC was calculated for both males and females at the provincial level and classified into seven levels using the natural breaks method. The spatial distribution pattern of the IMLC at the provincial level is shown in Fig. 4. For males, areas with the highest incidence (Fig. 4A) and mortality (Fig. 4C) were primarily in the Chongqing and Guizhou provinces, whereas areas with low incidence and mortality rates were in the western region. For females, areas with high incidence (Fig. 4B) and mortality (Fig. 4D) were observed in the northeastern region, with the highest rates in the Guizhou, Liaoning, and Zhejiang provinces, whereas areas with low incidence and mortality rates were mainly in the western region. Overall, the IMLC was spatially clustered, and the distribution patterns of incidence and mortality were similar.

Fig. 4 The IMLC in both males and females by province in 2016. The number in parentheses indicates the number of tumor registries in each province. (A) incidence of lung cancer in males, (B) incidence of lung cancer in females, (C) mortality of lung cancer in males, (D) mortality of lung cancer in females

Global spatial clustering characteristics of the IMLC in China

Three key parameters of the global spatial autocorrelation analysis were used to determine whether the data were spatially autocorrelated: Global Moran’s I, z-score, and p-value. A positive or negative Global Moran’s I indicates that the data are positively or negatively correlated, respectively, and a value of 0 indicates no spatial correlation. A z-score greater than 2.58 together with a p-value less than 0.05 indicates significant spatial clustering of the age-standardized incidence or mortality of lung cancer. Table 3 shows the Global Moran’s I, z-scores, and p-values of the age-standardized IMLC in both males and females from 2007 to 2016.

Table 3 Results of Global Moran’s I for the IMLC in China from 2007 to 2016

As shown in Table 3, the Global Moran’s I of the IMLC in both males and females was greater than zero, which indicated a positive spatial autocorrelation. Specifically, during 2007–2016, the Global Moran’s I of incidence ranged from 0.04 to 0.2 in males and from 0.09 to 0.33 in females. The minimum values occurred in 2008 and 2016, whereas the maximum values occurred in 2009 and 2012. The Global Moran’s I of mortality ranged from 0.01 to 0.12 in males and from 0.11 to 0.32 in females. The minimum values occurred in 2008 and 2016, whereas the maximum values occurred in 2007 and 2016. In addition, the Global Moran’s I passed the significance test (p < 0.05) for all data except the incidence in males in 2007 and 2008 and the mortality in males in 2007–2009. In males, the z-scores were all greater than 2.58 except for the incidence in 2007 and 2008 and the mortality in 2007, 2008, 2009, and 2011. In females, the z-scores of both the incidence and mortality were all greater than 2.58. In other words, except for the incidence in males in 2007 and 2008 and the mortality in males in 2007, 2008, 2009, and 2011, areas with high incidence (or mortality) and areas with low incidence (or mortality) were significantly clustered.

Overall, the spatial autocorrelation of the IMLC was higher in females than in males over time, and both showed spatial clustering. The overall autocorrelation trend in females decreased, whereas that in males showed fluctuations.

Moving trajectories of the centers of gravity of cold and hot spots of the IMLC in China

Because more tumor registries were available in the later five years (2012–2016) than in the earlier five years (2007–2011), the centers of gravity of the cold and hot spots of the IMLC were computed using the mean center tool in the ArcGIS spatial statistics tools, and standard deviation ellipses were drawn to obtain the moving trajectories of the centers of gravity of the cold and hot spots from 2012 to 2016 (Fig. 5).
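
The two quantities used here can be sketched in a few lines of Python. This is a minimal illustration and not the ArcGIS implementation: the ellipse axes are derived from the eigenvalues of the 2×2 coordinate covariance matrix, which is a common formulation, and GIS tools may apply a different scaling to the axis lengths. The function names and sample coordinates are hypothetical.

```python
import math


def mean_center(points, weights=None):
    """Weighted mean of (x, y) coordinates; equal weights by default."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    return cx, cy


def deviational_ellipse(points):
    """Return (major_sd, minor_sd, angle) from the coordinate covariance matrix.

    angle is the orientation of the major axis in radians, measured
    counterclockwise from the x-axis.
    """
    n = len(points)
    cx, cy = mean_center(points)
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    major = math.sqrt(half_trace + spread)
    minor = math.sqrt(max(half_trace - spread, 0.0))
    return major, minor, angle


# Hypothetical hot-spot locations for two years; the yearly mean centers,
# joined in order, form a moving trajectory.
hot_spots_by_year = {
    2012: [(110.0, 30.0), (112.0, 29.0), (111.0, 31.0)],
    2013: [(109.0, 30.0), (111.0, 29.5), (110.0, 30.5)],
}
trajectory = [mean_center(pts) for _, pts in sorted(hot_spots_by_year.items())]
print(trajectory)
```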

Fig. 5

The distribution of the cold and hot spots of IMLC in both males and females in 2016 and the moving trajectories of the centers of gravity of the cold and hot spots from 2012 to 2016. (A) Incidence in males, (B) incidence in females, (C) mortality in males, (D) mortality in females. NM: Nei Mongol, TJ: Tianjin, NX: Ningxia, SX: Shanxi, HeB: Hebei, SD: Shandong, GS: Gansu, ShX: Shaanxi, HeN: Henan, JS: Jiangsu, SC: Sichuan, HuB: Hubei, AH: Anhui, SH: Shanghai, CQ: Chongqing, HuN: Hunan, JX: Jiangxi, FJ: Fujian, ZJ: Zhejiang, GZ: Guizhou, LN: Liaoning, BJ: Beijing

As shown in Fig. 5, the main hot spots of the IMLC in males in 2016 were primarily in the central-southern region of China, including Chongqing, Hubei, Hunan, Guizhou, Guangxi, and Yunnan. The cold spots were in western China, such as Tibet, Xinjiang, Qinghai, and Gansu, and in the western part of Nei Mongol (Figs. 5A and 5C). Compared with those in males, the hot spots of the IMLC in females were more concentrated, primarily in northeastern China, such as Jilin, Liaoning, and Hebei Provinces. The cold spot areas had a wider range, typically occurring in the southern and western regions of China (Figs. 5B and 5D).

Regarding the centers of gravity from 2012 to 2016, those of the hot spots of incidence in males were in Hubei and Hunan Provinces but moved westward overall, toward Chongqing, whereas those of the cold spots started in Ningxia, moved southeast toward Shanghai, and then moved northwest, ending in Shandong Province (Fig. 5A). In females, the centers of gravity of the hot spots of incidence were in Liaoning Province and moved from southwest to northeast within the province. Those of the cold spots started in Hubei Province, moved northwest toward Shanxi, and then moved southeast, ending in Henan Province (Fig. 5B). The centers of gravity of the hot spots of mortality in males were mainly in Hubei Province but moved westward overall, ending on the Chongqing side of the junction of Chongqing and Hubei, whereas those of the cold spots started in Gansu Province and moved southeast, ending in Anhui Province (Fig. 5C). The centers of gravity of the hot spots of mortality in females were primarily in Liaoning Province, moving back and forth between the southwest and northeast of the province, whereas those of the cold spots were mainly in Jiangxi Province, moving northeast and ending in Anhui Province (Fig. 5D).

Overall, the centers of gravity of the cold spots spanned larger areas, and the overall distribution was more scattered, whereas the centers of gravity of the hot spots were relatively concentrated.

Temporal and spatial distribution patterns of social–environmental factors in China

Figure 6 shows the temporal distribution pattern of social–environmental factors from 2007 to 2016, based on the city or district (county) where the tumor registry was located. Owing to a lack of data, the analysis of the temporal distribution focused only on the 15 factors with data for each of the ten years and explored their trends over time.

Fig. 6

Temporal trends of 15 social–environmental factors from 2007 to 2016. (A) PM2.5, (B) PM10, (C) SO2, (D) NO2, (E) O3, (F) NDVI, the number of days with (G) blowing sand (BS), (H) floating dust (FD), (I) haze, (J) sandstorms, (K) annual average temperature (TEMP), (L) the number of hospitals, (M) the number of medical beds (MedBeds), (N) the number of doctors, and (O) road area

For air pollutants (Fig. 6A–E), the annual average concentrations of PM2.5 and PM10 fluctuated from 2007 to 2012, increased significantly in 2013, and then gradually decreased to their lowest point of the decade. The annual average concentration of SO2 showed a clear downward trend from 2007 onward, except for a small rebound in 2011. The annual average concentration of NO2 decreased significantly in 2008 and 2015 and remained stable during the remaining years. The annual average concentration of O3 had two peaks, in 2010 and 2015.

The annual average NDVI (Fig. 6F) decreased from 2007 to 2011, except for a significant increase in 2010. From 2012 to 2016, the NDVI improved, although there were some fluctuations.

For meteorological factors (Fig. 6G–K), the annual average number of days with blowing sand fluctuated greatly and reached its lowest point in 2016; the overall trends of the annual average number of days with floating dust and with sandstorms were very similar, with little fluctuation between 2007 and 2015 but a significant increase in 2016; the annual average number of days with haze showed a slow downward trend from 2007 to 2010, followed by a rapid increase to a peak in 2014 and then a sharp decline to its lowest point in 2016; and the annual average temperature remained relatively stable except for an obvious increase in 2015.

Regarding socioeconomic factors (Fig. 6L–O), the annual average number of medical beds, doctors, and overall road area steadily increased over the decade. There were two trends in the average annual number of hospitals. From 2007 to 2012, the number was relatively low and fluctuated significantly, whereas from 2013 to 2016, it increased and remained relatively stable.

Figure 7 presents the spatial distribution patterns of all 20 social–environmental factors in 2016. Owing to the large size of China, these factors varied considerably across space. It should be noted that data from 2016 were not available for some factors: the geomorphic data were from 2009; the data on economic density, population density, and types of land use were from 2015; and the DEM data were from 2000. The rest of the data were from 2016.

Fig. 7

Spatial distribution patterns of social–environmental factors, including concentrations of (A) PM2.5, (B) PM10, (C) SO2, (D) NO2 and (E) O3, (F) NDVI, the number of days with (G) blowing sand (BS), (H) floating dust (FD), (I) haze and (J) sandstorms, (K) annual average temperature (TEMP), (L) economic density (ED), (M) population density, (N) terrain undulation (TU), (O) the number of hospitals, (P) the number of medical beds (MedBeds), (Q) the number of doctors, (R) road area, (S) geomorphic types (GT), (T) types of land use (LT)

Influences of social–environmental factors on the IMLC in China

Correlations of different social–environmental factors

Annual average values for each social–environmental factor were obtained from 2007 to 2016. The correlations among the factors and between each factor and the age-standardized IMLC were determined on the reclassified data using Pearson’s correlation analysis.

In the correlation analysis between social–environmental factors and the IMLC, O3 had the most significant positive correlation, and annual average temperature had the strongest negative correlation, with the incidence and mortality in females. In addition, the concentrations of PM2.5, PM10, SO2, NO2, and O3; the number of hospitals, medical beds, and doctors; road area; population density; geomorphic type; and types of land use all had significant positive correlations with the incidence and mortality in females. Population density had the most significant positive correlation, and TU had the strongest negative correlation, with the incidence and mortality in males (Fig. 8A). In addition, the concentrations of PM2.5, PM10, and NO2; the number of hospitals, medical beds, and doctors; road area; population density; economic density; geomorphic type; and types of land use all had significant positive correlations with the incidence and mortality in males.

Fig. 8

Correlation coefficients between each social–environmental factor and the IMLC (A), as well as between the social–environmental factors themselves (B), based on the reclassified data and Pearson’s correlation analysis

For correlation analysis between social–environmental factors, high positive correlations were seen between air pollutants and social factors. TU had the highest negative correlation with GT, followed by LT and air pollutants. Besides, annual average temperatures also had high negative correlations with O3, the number of dusty days, and the number of days with sandstorms (Fig. 8B).

Explanatory powers of different social–environmental factors

Before using the Geodetector for analysis, it was necessary to discretize the independent variables, that is, the social–environmental factors. The optimal discretization method was chosen from among natural breaks, quantile, equal interval, geometrical interval, and standard deviation classification [50] to reclassify the original social–environmental data, which were then used for the subsequent factor and interaction detectors. Geomorphic types and types of land use were already classified data; therefore, they did not require discretization.

For the factor detector (Table 4), the strongest explanatory factor for the IMLC in males was terrain undulation, with q-values of 0.092 for incidence and 0.091 for mortality, and the strongest explanatory factor for the IMLC in females was O3, with q-values of 0.164 for incidence and 0.240 for mortality.
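
The factor detector's q-statistic compares the variance of the outcome within the strata of a discretized factor with its total variance, q = 1 − Σ N_h σ_h² / (N σ²). A minimal Python sketch is given below, assuming quantile discretization (one of the candidate methods listed above) and population variances; the function names are hypothetical, and this is not the Geodetector software itself.

```python
from collections import defaultdict
from statistics import pvariance, quantiles


def quantile_strata(values, n_classes):
    """Assign each value to one of n_classes quantile bins (labels start at 0)."""
    cuts = quantiles(values, n=n_classes, method="inclusive")
    return [sum(v > c for c in cuts) for v in values]


def q_statistic(y, strata):
    """Share of the variance of y explained by the stratification (0 to 1)."""
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / (len(y) * pvariance(y))


# Hypothetical data: a continuous factor and an outcome rate per registry.
factor = [12.0, 15.0, 30.0, 33.0, 51.0, 55.0]
rate = [20.0, 22.0, 35.0, 37.0, 60.0, 58.0]
print(q_statistic(rate, quantile_strata(factor, 3)))
```

A q-value of 1 means the strata fully separate the outcome values, and a value of 0 means the stratification explains none of the variance.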

Table 4 Explanatory powers of the social–environmental factors for the IMLC in both males and females obtained via the factor detector

Diseases are often not caused by a single factor but by interactions between various factors. Therefore, an interaction detection analysis was used to comprehensively reveal the impact of the interactions between social and environmental factors on IMLC.

As shown in Fig. 9, each factor showed enhanced explanatory power after interacting with each of the other factors. In males, the interaction between terrain undulation and road area had the strongest explanatory power for the IMLC, reaching 0.22 for both incidence and mortality. In the factor detector results (Table 4), the explanatory powers of terrain undulation and road area were 0.092 and 0.073, respectively, for the incidence of lung cancer and 0.091 and 0.085, respectively, for lung cancer mortality, suggesting that the interaction between terrain undulation and road area greatly enhanced the explanatory power. In females, the interaction between O3 and road area had the strongest explanatory power for the incidence of lung cancer, reaching 0.274, whereas the independent values of the two factors were 0.164 and 0.052, respectively, according to the factor detector. The interaction between O3 and the number of medical beds had the strongest explanatory power for the mortality of lung cancer, reaching 0.337, whereas the independent values of the two factors were 0.240 and 0.037, respectively. Thus, it can be concluded that the interactions between social and environmental factors significantly enhanced their explanatory power for the IMLC. This provides a favorable basis for understanding the influence of different interactions between social and environmental factors on lung cancer.
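
A minimal Python sketch of interaction detection follows: the q-statistic is recomputed on the strata formed by overlaying two factors and then compared with the two single-factor values. The five labels follow the taxonomy commonly used with the Geodetector; the article itself reports only the q-values, so the labels printed for its numbers are an inference from that taxonomy. Function names are hypothetical.

```python
from collections import defaultdict
from math import isclose
from statistics import pvariance


def q_statistic(y, strata):
    """Share of the variance of y explained by the stratification."""
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / (len(y) * pvariance(y))


def interaction_q(y, strata_a, strata_b):
    """q-statistic of the overlay of two stratifications."""
    return q_statistic(y, list(zip(strata_a, strata_b)))


def interaction_type(q_a, q_b, q_ab):
    """Classify an interaction by comparing q_ab with the single-factor values."""
    low, high = min(q_a, q_b), max(q_a, q_b)
    if isclose(q_ab, q_a + q_b):
        return "independent"
    if q_ab > q_a + q_b:
        return "nonlinear enhancement"
    if q_ab > high:
        return "bivariate enhancement"
    if q_ab > low:
        return "univariate nonlinear weakening"
    return "nonlinear weakening"


# q-values reported in the text for males (terrain undulation, road area, interaction).
print(interaction_type(0.092, 0.073, 0.22))
# q-values reported for female mortality (O3, medical beds, interaction).
print(interaction_type(0.240, 0.037, 0.337))
```

In both cases the interaction value exceeds the sum of the single-factor values (0.22 > 0.165 and 0.337 > 0.277), which is the condition for the "nonlinear enhancement" label.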

Fig. 9

Results of interaction detection between the social–environmental factors for (A) the incidence in males, (B) the incidence in females, (C) the mortality in males, and (D) the mortality in females. All the values are listed in Supplementary Tables 1–4

Model establishment and evaluation

The OLS, GWR, and GTWR models can produce distorted or unstable estimates in the presence of multicollinearity; therefore, before using these models to explore the influence of social–environmental factors on the IMLC, a multicollinearity diagnosis was performed on the social–environmental factors. Table 5 shows the multicollinearity diagnosis results after removing factors with no significant correlation according to Pearson’s correlation analysis.

Table 5 Results of multicollinearity analysis of the independent variables for the IMLC

According to Table 5, the VIF values of all independent variables were less than 10, which indicates no serious multicollinearity among the independent variables; thus, the stability and accuracy of the models should not be affected.
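
For reference, the VIF of a variable is 1 / (1 − R²), where R² comes from regressing that variable on all the others. The sketch below implements this definition in plain Python with a small Gaussian-elimination solver so that it runs without third-party packages; the function names and sample columns are hypothetical, and the threshold of 10 is the one used in the text.

```python
def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(target, predictors):
    """R^2 of an OLS regression of target on the predictor columns plus an intercept."""
    n = len(target)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean = sum(target) / n
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns):
    """VIF of each column given all the other columns."""
    return [
        1.0 / (1.0 - r_squared(col, columns[:j] + columns[j + 1:]))
        for j, col in enumerate(columns)
    ]


# Hypothetical predictors: the first two are nearly collinear, the third is not.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
]
flagged = [j for j, v in enumerate(variance_inflation_factors(columns)) if v >= 10]
print(flagged)
```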

OLS, GWR, and GTWR models were then established. As shown in Table 6, the GTWR model showed the best regression performance, followed by the GWR model, both of which had much higher accuracy than the OLS model. The OLS model provides only a preliminary estimate of the linear relationship between the independent and dependent variables and is not ideal for data with complex nonlinear relationships. Unlike the OLS model, the GWR model considers spatial heterogeneity, thereby improving the fit. The GTWR model further considers temporal heterogeneity on the basis of the GWR model, which is why it fits better than the OLS and GWR models when the data exhibit obvious spatiotemporal heterogeneity.
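
The relationship among the three models can be illustrated with a minimal Python sketch of the local estimation step in GTWR: at each observation, a weighted least-squares fit is computed with weights that decay with spatiotemporal distance. The Gaussian kernel, the fixed `bandwidth`, and the `time_scale` factor that balances temporal against spatial distance are assumptions made for this example; the settings actually used in this study are not reproduced here.

```python
import math


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gtwr_coefficients(coords, times, X, y, bandwidth, time_scale):
    """Local coefficients (intercept first) estimated at every observation."""
    n = len(y)
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    betas = []
    for i in range(n):
        weights = []
        for j in range(n):
            # Squared spatiotemporal distance (assumed form): space plus scaled time.
            d2 = ((coords[i][0] - coords[j][0]) ** 2
                  + (coords[i][1] - coords[j][1]) ** 2
                  + time_scale * (times[i] - times[j]) ** 2)
            weights.append(math.exp(-d2 / bandwidth ** 2))  # Gaussian kernel
        xtwx = [[sum(w * r[a] * r[b] for r, w in zip(rows, weights))
                 for b in range(p)] for a in range(p)]
        xtwy = [sum(w * r[a] * t for r, t, w in zip(rows, y, weights))
                for a in range(p)]
        betas.append(solve(xtwx, xtwy))
    return betas


# Hypothetical panel: a 3-by-3 grid of sites observed in two years.
coords = [(i % 3, i // 3 % 3) for i in range(18)]
times = [i // 9 for i in range(18)]
X = [[(i * 7) % 5] for i in range(18)]
y = [2 + 3 * x[0] for x in X]
local = gtwr_coefficients(coords, times, X, y, bandwidth=2.0, time_scale=1.0)
print(local[0])
```

Under these assumptions, setting `time_scale` to 0 ignores time and reduces the sketch to GWR with the same kernel, and a very large `bandwidth` makes all weights nearly equal, so the local fits approach the single global OLS fit.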

Table 6 The model performance evaluation results for each dependent variable

Coefficients of explanatory variables in the GTWR model

To further explore the complex relationship between social–environmental factors and the IMLC, the regression coefficients of each social–environmental factor were calculated from the optimal GTWR model to evaluate the degree of their effects on the IMLC. The coefficients of the factors whose combinations had the highest q-values in interaction detection were plotted on maps (Fig. 10). Figure 10A and B show the regression coefficients of terrain undulation and road area, respectively, for the incidence in males. Figure 10E and F show the regression coefficients of terrain undulation and road area, respectively, for the mortality in males. In most areas, the degree of terrain undulation showed a negative correlation with the incidence and mortality in males, increasing northward and southward from the provinces around the Yangtze River. Road area was positively correlated with the incidence and mortality in males in western, central, and northeastern China. The rest of the country showed a negative correlation with the incidence and mortality in males, increasing from the center toward the northwest and southeast. Figure 10C and D show the regression coefficients of O3 and road area for the incidence in females. Figure 10G and H show the regression coefficients of O3 and the number of medical beds for the mortality in females. O3 showed a negative correlation with the incidence and mortality in females mainly in Sichuan, Chongqing, and Shanxi, as well as in northeastern China, mainly in Heilongjiang and Jilin provinces. A positive correlation with the incidence and mortality in females was observed in central and eastern China. Road area showed a weak positive correlation with the incidence in females in parts of central and northeastern China and a negative correlation in eastern China, increasing from west to east. The number of medical beds showed a positive correlation with the mortality in females in the northeast and part of central China, whereas in other regions, there was a progressively increasing negative correlation toward the west and east.

Fig. 10

Distribution of coefficients of factors in GTWR model. Model coefficients of (A) terrain undulation (TU) and (B) road area for the incidence in males, (C) O3 and (D) road area for the incidence in females, (E) TU and (F) road area for the mortality in males, (G) O3 and (H) the number of medical beds (MedBeds) for the mortality in females

Discussion

In this study, spatial analysis methods such as spatial autocorrelation (Moran’s I and Getis-Ord Gi*) and directional distribution (standard deviation ellipse) were used to explore the spatial and temporal evolution patterns of the IMLC in both males and females in China on a national scale. In addition, the contributions of social–environmental factors to the IMLC were analyzed based on the OLS, GWR, and GTWR models. We focused on 20 macro-factors, namely air pollutants (PM2.5, PM10, NO2, SO2, and O3), meteorological factors (the number of days with blowing sand, floating dust, haze, or sandstorms and temperature), natural environmental factors (the NDVI, geomorphic type, types of land use, and terrain undulation), and socioeconomic factors (economic density, population density, road area, and the number of hospitals, medical beds, and doctors), rather than micro-factors, such as smoking, genetics, and lifestyle, which are related to individual status. This provides support for the comprehensive risk evaluation and control of lung cancer in China from a broad perspective.

It was determined that the distributions of IMLC in males and females had significant spatial and temporal heterogeneity and showed fluctuations between 2007 and 2016, which may be attributed to changes in social–environmental factors over time. Overall, the IMLC was higher in males than in females. The IMLC in both males and females showed positive autocorrelation and spatial clustering patterns. The cold and hot spots changed both spatially and temporally. For cold spots, the distribution of centers of gravity was dispersed, and their directional changes were obvious: the centers of gravity of the incidence and mortality in males showed a trend of change in the northwest-southeast direction, whereas the centers of gravity of the incidence in females first moved in the southeast-northwest direction and then in the northwest-southeast direction. The centers of gravity of the mortality in females moved in the southwest-northeast direction. In contrast, the centers of gravity of the hotspots were relatively stable, and their overall distribution was concentrated.

Varying degrees of correlation exist between the IMLC and social–environmental factors. The strongest explanatory factors for the IMLC in males and females were terrain undulation and O3, respectively, which showed enhanced explanatory power when interacting with other social–environmental factors. Among them, the interaction between terrain undulation and road area had the highest explanatory power for both incidence and mortality in males. The interaction between O3 and road area had the highest explanatory power for the incidence in females, and the interaction between O3 and the number of medical beds had the highest explanatory power for mortality in females. In males, the distribution of the IMLC coincided with the distribution of terrain undulation; generally, places with higher terrain undulation had lower incidence and mortality, which is consistent with Ren’s study, which showed that the village-level mortality rate of lung cancer was negatively correlated with the relief degree of land surface [17]. This may be because air pollutants are less likely to spread in areas with greater terrain undulation but are more likely to spread in plain areas. In females, the distribution of the IMLC coincided with the distribution of O3. Generally, places with higher concentrations of O3 had higher incidence and mortality rates. Previous studies have also demonstrated the contribution of O3 to lung tumorigenesis [51, 52]. The concentration of PM2.5 was especially high in the northwestern part of China, where the Taklimakan Desert is located (Fig. 7A). The loose soil, low vegetation coverage, and arid climate of the desert lead to a high concentration of PM2.5 in this area. However, few people live there; thus, it contributes little to the overall IMLC in the region. This combination of high PM2.5 concentration and low IMLC may explain why the correlation between PM2.5 concentration and the IMLC in Pearson’s correlation analysis, although significant, was less pronounced than that between O3 and the IMLC (Fig. 8A). Interestingly, socioeconomic factors, such as the number of hospitals, medical beds, and doctors, road area, and population density, all showed significant positive correlations with the IMLC in both males and females. These socioeconomic factors may indirectly reflect the level of economic development in a region. The more developed the economy, the worse the environment may be, as air pollutants, such as PM2.5, PM10, and NO2, also showed significant positive correlations with the IMLC in both males and females. Although these social–environmental factors can explain the spatiotemporal distribution of lung cancer, the factors contributing to lung cancer are complex and diverse, and there may be other factors that were not considered in this study. For example, Cardoso et al. considered not only the effect of PM10 but also the urbanization rate and the percentage of industrial areas on lung cancer mortality in Portugal [53]. Introducing such factors may provide a more comprehensive perspective on the causes of lung cancer.

This study focused on systematically analyzing the influences of socio–environmental factors on the spatiotemporal patterns of lung cancer via the Geodetector model, highlighting its unique advantages in identifying spatial heterogeneity and interactive factors. Compared with traditional regression models that rely on linear relationships, the Geodetector model processes nonlinear spatial data and accounts for the synergistic effects of multiple factors, thus effectively assessing complex environmental health exposure processes. In our study, the effects of social–environmental factors on the IMLC showed significant spatial and temporal variability: in males, the interaction between terrain undulation and road area exhibited the highest explanatory power for the incidence and mortality, with a value of 0.22 for both; in females, the interaction between O3 and road area and the interaction between O3 and the number of medical beds exhibited the highest explanatory powers for the incidence and mortality, reaching 0.27 and 0.34, respectively. Similar studies have been conducted. For example, Xing et al. employed the Geodetector q-statistic to examine the nonlinear spatial association between outdoor air pollution and the incidence of lung cancer in China, revealing significant regional variations: the influences of SO2 and PM2.5 on lung cancer should be considered in North China, and the impacts of O3 and CO, as well as their interactions with SO2, should receive more attention in South China [54]. Similarly, Liu et al. used the Geodetector model to identify the nonlinear enhancement effect of coal/biomass fuel use and outdoor PM2.5 in evaluating the short-term impacts of indoor and outdoor air pollution on lung cancer in Henan Province. The Geodetector model can capture the synergistic toxicity of multiple pollution sources in a closed environment [55]. 
Overall, the advantages of the Geodetector model are reflected in three aspects: first, it can identify the core driving factors by determining the impact (q-value) of a single factor; second, it can identify the interaction types of multiple factors and reveal the synergistic effects of environmental and behavioral factors; and third, it can quantify spatial heterogeneity to provide a basis for locating high-risk areas and implementing prevention and control strategies. Although the method has limitations, such as the potential subjective bias introduced by discretization (e.g., natural breaks classification), its potential in environmental health research remains substantial.

For model selection and accuracy comparison, spatial statistical models, including OLS, GWR, and GTWR, were applied. These methods have been used extensively in previous studies. For example, OLS and GWR models were used to analyze the spatial relationship between the most common cancers and NO2 concentrations [56, 57]. Guo et al. adopted the GTWR model to capture the spatiotemporal heterogeneity of socioeconomic and environmental determinants and to determine their effects on the mortality of chronic obstructive pulmonary disease in Xi’an, China [58]. Huang et al. reported the spatiotemporal distribution characteristics and impacts of macro-factors, such as socioeconomic factors, climatic conditions, and healthcare resources, on gastric cancer incidence [59]. Models such as time-weighted regression (TWR), multiscale geographically weighted regression (MGWR), and geographically weighted logistic regression (GWLR) have also been used. Bai et al. [60] and Zhang et al. [61] used the TWR model to study macro-factors related to breast and thyroid cancers, respectively. Anderson et al. analyzed the relationship between breast cancer mortality and environmental factors, lifestyle, and medical care levels in the United States using OLS and MGWR models [62]. Goovaerts et al. explored the relationship between prostate cancer incidence and potential influencing factors in Florida, USA, using both the GWR and GWLR models [63]. However, different research backgrounds and data characteristics may lead to variations in model performance. In this study, compared with OLS and GWR, GTWR captured the heterogeneous influence of factors at different geographical locations and time points, thereby achieving a better fit to the data. This conclusion is consistent with a previous study [58]. Using reclassified data, we identified GTWR as the optimal model for reflecting the relationship between the IMLC and social–environmental factors. Compared with the study by Guo et al. (2023), which explored the factors influencing the incidence of lung cancer in China, the adjusted R2 of our GTWR model was higher [23]: for the incidence of lung cancer in males, the adjusted R2 was 0.358 in our study versus 0.2726 in Guo’s study; for the incidence of lung cancer in females, the adjusted R2 was 0.446 in our study versus 0.3552 in Guo’s study [23]. The regression coefficients of the model revealed that the influence of each social–environmental factor on the IMLC showed significant spatiotemporal variability. Because some influencing factors have a stronger correlation with the IMLC in certain local regions, more specific government policies can be implemented. The results revealed that O3 significantly influences the IMLC in females in the Bohai Rim region (including Liaoning, Hebei, and Shandong Provinces), whereas terrain undulation notably impacts the IMLC in males in Liaoning, Jiangsu, and Sichuan Provinces. For industrial clusters around the Bohai Sea, implementing enclosed management of volatile organic compounds, establishing ecological buffer zones between industrial and residential areas, and establishing an ozone health warning and screening network for high-risk female populations are recommended. The impact of terrain undulation on the IMLC in females is significant in southern China (Guangdong, Guangxi, and Yunnan Provinces) and western Nei Mongol. Therefore, to mitigate pollutant retention caused by the hilly and basin terrain in southern China, the following measures are recommended: establishing mountain–urban ventilation corridor systems in the Pearl River Delta and Beibu Gulf urban clusters, relocating heavy industrial zones downwind of the dominant wind direction, and establishing ecological buffer zones (e.g., eucalyptus and camphor forests) in river valleys to block the spread of pollutants toward residential areas. In western Nei Mongol, ecological restoration should be implemented in mining–pastoral transition zones, and natural dust barriers should be built using drought-resistant shrubs. Moreover, the density and height gradient of urban buildings should be optimized to promote local circulation and enhance pollutant diffusion.

Although this study aimed to systematically explore the spatial driving mechanisms by which social–environmental factors influence the IMLC, it has several limitations: (1) Owing to the complex pathogenesis of lung cancer, other influencing factors that were not explored in this study may exist. Future research could incorporate more factors to optimize our model. (2) Owing to limited data sources, the resolution of certain types of data is still insufficient. If data with a higher resolution become available in the future, they may help to improve the model performance. (3) In this study, we employed only the OLS, GWR, and GTWR models to analyze the impacts of various social–environmental factors on the IMLC. Future research could integrate additional analytical methods, such as the MGWR model [62] and deep-learning methods [64, 65], which may further improve the model.

Conclusions

Focusing on the whole of China, this study explored the effects of 20 social–environmental factors on the IMLC over the ten years from 2007 to 2016. The distribution of the IMLC had significant spatial heterogeneity. Hot spots in males were observed in the central-southern region of China, whereas hot spots in females were more concentrated and were observed primarily in northeastern China. The factors with the strongest explanatory power were terrain undulation in males and O3 concentration in females, and GTWR was the best model for explaining the effects of social–environmental factors on the IMLC. The wide coverage of the research area and the long time span not only provided sufficient samples but also supported in-depth exploration of the field. In addition, future studies could adopt more advanced methods and models, such as machine learning and deep learning models [64, 65]. The driving forces behind lung cancer are expected to be revealed more comprehensively with the support of these advances, thus providing useful information for lung cancer prevention and treatment in the future.

Data availability

All data generated or analyzed during this study are listed in Table 1 in this article.

Abbreviations

IMLC: Incidence and mortality of lung cancer

OLS: Ordinary least squares

GWR: Geographically weighted regression

GTWR: Geographically and temporally weighted regression

NDVI: Normalized difference vegetation index

GDP: Gross domestic product

GIS: Geographic information systems

DEM: Digital elevation model

BS: Blowing sand

FD: Floating dust

TEMP: Temperature

GT: Geomorphic type

LT: Types of land use

TU: Terrain undulation

ED: Economic density

MedBeds: Medical beds

NESSDC: National Earth System Science Data Center

OMI: Ozone Monitoring Instrument

Moran’s I: Moran’s index

AICc: Corrected Akaike information criterion

TWR: Time-weighted regression

MGWR: Multiscale geographically weighted regression

GWLR: Geographically weighted logistic regression

References

  1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.

  2. Liu SZ, Chen Q, Guo LW, Cao XQ, Sun XB, Chen WQ, He J. Incidence and mortality of lung cancer in China, 2008–2012. Chinese J Cancer Res. 2018;30(6):580–7.

  3. Naghavi M, Abajobir AA, Abbafati C, Abbas KM, Abd-Allah F, Abera SF, Aboyans V, Adetokunboh O, Arnlöv J, Afshin A, et al. Global, regional, and national age-sex specific mortality for 264 causes of death, 1980–2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet. 2017;390(10100):1151–210.

  4. Zheng RS, Chen R, Han BF, Wang SM, Li L, Sun KX, Zeng HM, Wei WW, He J. Cancer incidence and mortality in China, 2022. Zhonghua Zhong Liu Za Zhi. 2024;46(3):221–31.

  5. Huang JJ, Deng YY, Tin MS, Lok V, Ngai CH, Zhang L, Lucero-Prisno DE, Xu WH, Zheng ZJ, Elcarte E, et al. Distribution, Risk Factors, and Temporal Trends for Lung Cancer Incidence and Mortality: A Global Analysis. Chest. 2022;161(4):1101–11.

  6. Sun YY, Kinsela AS, Cen XT, Sun SQ, Collins RN, Cliff DI, Wu YX, Waite TD. Impact of reactive iron in coal mine dust on oxidant generation and epithelial lung cell viability. Sci Total Environ. 2022;810:152277.

  7. Metintas M, Ak G, Metintas S. Environmental asbestos exposure and lung cancer. Lung Cancer. 2024;194:107850.

  8. Kerry R, Goovaerts P, Ingram B, Tereault C. Spatial Analysis of Lung Cancer Mortality in the American West to Improve Allocation of Medical Resources. Appl Spat Anal Policy. 2020;13(4):823–50.

  9. Cheng E, Weber M, Steinberg J, Yu XQ. Lung cancer risk in never-smokers: An overview of environmental and genetic factors. Chinese J Cancer Res. 2021;33(5):548–62.

  10. Guo HG, Li WF, Wu JS. Ambient PM2.5 and Annual Lung Cancer Incidence: A Nationwide Study in 295 Chinese Counties. Int J Environ Res Public Health. 2020;17(5):1481.

  11. Han X, Liu YQ, Gao H, Ma JM, Mao XX, Wang YT, Ma XD. Forecasting PM2.5 induced male lung cancer morbidity in China using satellite retrieved PM2.5 and spatial analysis. Sci Total Environ. 2017;607:1009–17.

  12. Zhou Y, Li LS, Hu L. Correlation Analysis of PM10 and the Incidence of Lung Cancer in Nanchang, China. Int J Environ Res Public Health. 2017;14(10):1253.

  13. Yu P, Xu RB, Huang WZ, Yang ZY, Coelho M, Saldiva PHN, Wen B, Wu Y, Ye TT, Zhang YW, et al. Short-term ozone exposure and cancer mortality in Brazil: A nationwide case-crossover study. Int J Cancer. 2024;155(10):1731–40.

  14. Hassan A, Latif MT, Soo CI, Faisal AH, Roslina AM, Andrea YLB, Hassan T. Short communication: Diagnosis of lung cancer increases during the annual southeast Asian haze periods. Lung Cancer. 2017;113:1–3.

  15. Swiatkowska B, Szeszenia-Dabrowska N, Sobala W, Wilczynska U. Occupational risk factors for lung cancer - A case-control study, Lodz industrial center. Medycyna Pracy. 2008;59(1):25–34.

  16. Yang L, Guo FY, Wang N, Liu S, Zhang X, Li HC, Li QY, Xue T, Xiao QY, Li X, et al. Urban greenness and survival in lung cancer patients: A registry-based cohort study in Beijing. Ecotoxicol Environ Safety. 2021;228:113042.

  17. Ren HY, Cao W, Chen GB, Yang JX, Liu LQ, Wan X, Yang GH. Lung Cancer Mortality and Topography: A Xuanwei Case Study. Int J Environ Res Public Health. 2016;13(5):473.

  18. Xie HJ, Shao R, Yang YP, Cruz R, Zhou XL. Impacts of Built Environment on Risk of Women’s Lung Cancer: A Case Study of China. Int J Environ Res Public Health. 2022;19(12):7157.

  19. Sun WY, Bao PP, Zhao XJ, Tang J, Wang L. Road Traffic and Urban Form Factors Correlated with the Incidence of Lung Cancer in High-density Areas: An Ecological Study in Downtown Shanghai, China. J Urban Health. 2021;98(3):328–43.

  20. Shao YQ, Wang YJ, Yu HJ, Zhang YY, Xiang F, Yang Y, Yang Y, Li LH, Dong SR, Yang DJ, et al. Geographical variation in lung cancer risk associated with road traffics in Jiading District, Shanghai. Sci Total Environ. 2019;652:729–35.

  21. Kuo TM, Meyer AM, Baggett CD, Olshan AF. Examining determinants of geographic variation in colorectal cancer mortality in North Carolina: A spatial analysis approach. Cancer Epidemiol. 2019;59:8–14.

  22. Caswell JM. Prevalence of reported high blood pressure in Canada: investigation of demographic and spatial trends. J Public Health. 2017;25(1):49–59.

  23. Guo B, Gao Q, Pei L, Guo T, Wang Y, Wu H, Zhang W, Chen M. Exploring the association of PM2.5 with lung cancer incidence under different climate zones and socioeconomic conditions from 2006 to 2016 in China. Environ Sci Pollut Res. 2023;30(60):126165–77.

  24. Roquette R, Painho M, Nunes B. Geographical patterns of the incidence and mortality of colorectal cancer in mainland Portugal municipalities (2007–2011). BMC Cancer. 2019;19(1):512.

  25. Marques VD, Massago M, da Silva MT, Roskowski I, de Lima DAN, dos Santos L, Louro E, Gonçalves ST, Pedroso RB, Obale AM, et al. Exploring regional disparities in lung cancer mortality in a Brazilian state: A cross-sectional ecological study. PLoS One. 2023;18(6):e0287371.

  26. Tobler WR. A Computer Movie Simulating Urban Growth in the Detroit Region. Econ Geogr. 1970;46(sup1):234–40.

  27. Mandal B, Goswami KP. Evaluating the influence of biophysical factors in explaining spatial heterogeneity of LST: Insights from Brahmani-Dwarka interfluve leveraging Geodetector, GWR, and MGWR models. Physics and Chemistry of the Earth, Parts A/B/C. 2025;138:103836.

  28. Miao Y, Geng C, Ji Y, Wang S, Wang L, Yang W. Understanding the Dynamics of PM2.5 Concentration Levels in China: A Comprehensive Study of Spatio-Temporal Patterns, Driving Factors, and Implications for Environmental Sustainability. Sustainability. 2025;17(4):1742.

  29. Zhu M, Yu X, Chen K, Tan H, Yuan J. Spatiotemporal characteristics and driving factors of chemical oxygen demand emissions in China’s wastewater: An analysis based on spatial autocorrelation and Geodetector. Ecological Indicators. 2024;166:112308.

  30. Garten RJ, Davis CT, Russell CA, Shu B, Lindstrom S, Balish A, Sessions WM, Xu XY, Skepner E, Deyde V, et al. Antigenic and Genetic Characteristics of Swine-Origin 2009 A(H1N1) Influenza Viruses Circulating in Humans. Science. 2009;325(5937):197–201.

  31. Mondal S, Gavsker KK. Investigating the urban eco-environmental quality utilizing remote sensing based approach: evidence from an industrial city of Eastern India. Discov Appl Sci. 2024;6(12):666.

  32. Mandal B, Mondal S. Unveiling spatio-temporal mysteries: A quest to decode India’s Dengue and Malaria trend (2003–2022). Spatial Spatio Temp Epidemiol. 2024;51:100690.

  33. Anselin L. Spatial Econometrics: Methods and Models. 1988.

  34. Ord JK, Getis A. Local Spatial Autocorrelation Statistics: Distributional Issues and an Application. Geogr Anal. 1995;27(4):286–306.

  35. Duman Z, Mao XQ, Cai BF, Zhang QY, Chen YP, Gao YB, Guo Z. Exploring the spatiotemporal pattern evolution of carbon emissions and air pollution in Chinese cities. J Environ Manage. 2023;345:118870.

  36. Chen T, Deng S, Li M. Spatial Patterns of Satellite-Retrieved PM2.5 and Long-Term Exposure Assessment of China from 1998 to 2016. Int J Environ Res Public Health. 2018;15(12):2785.

  37. Zhao Y, Yuan D, Du JT, Chen JJ. Geo-Ellipse-Indistinguishability: Community-Aware Location Privacy Protection for Directional Distribution. IEEE Trans Knowl Data Eng. 2023;35(7):6957–67.

  38. Gui D, He H, Liu C, Han S. Spatio-temporal dynamic evolution of carbon emissions from land use change in Guangdong Province, China, 2000–2020. Ecological Indicators. 2023;156:111131.

  39. Zhang J, Zhang P, Gu XC, Deng MJ, Lai XY, Long AH, Deng XY. Analysis of Spatio-Temporal Pattern Changes and Driving Forces of Xinjiang Plain Oases Based on Geodetector. Land. 2023;12(8):1508.

  40. Rodgers JL, Nicewander WA. Thirteen Ways to Look at the Correlation Coefficient. Am Stat. 1988;42(1):59–66.

  41. Zhao R, Zhan L, Yao M, Yang L. A geographically weighted regression model augmented by Geodetector analysis and principal component analysis for the spatial distribution of PM2.5. Sustain Cities Soc. 2020;56:102106.

  42. Wang JF, Li XH, Christakos G, Liao YL, Zhang T, Gu X, Zheng XY. Geographical Detectors-Based Health Risk Assessment and its Application in the Neural Tube Defects Study of the Heshun Region, China. Int J Geogr Inf Sci. 2010;24(1):107–27.

  43. Banks HT, Joyner ML. AIC under the framework of least squares estimation. Appl Math Lett. 2017;74:33–45.

  44. Fotheringham AS, Oshan TM. Geographically weighted regression and multicollinearity: dispelling the myth. J Geogr Syst. 2016;18(4):303–29.

  45. Brunsdon C, Fotheringham AS, Charlton ME. Geographically Weighted Regression: A Method for Exploring Spatial Nonstationarity. Geogr Anal. 1996;28(4):281–98.

  46. Huang B, Wu B, Barry M. Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. Int J Geogr Inf Sci. 2010;24(3):383–401.

  47. Fotheringham AS, Crespo R, Yao J. Geographical and Temporal Weighted Regression (GTWR). Geogr Anal. 2015;47(4):431–52.

  48. Sugiura N. Further analysis of the data by Akaike’s information criterion and the finite corrections. Commun Stat Theory Methods. 1978;7:13–26.

  49. Harel O. The estimation of R2 and adjusted R2 in incomplete data sets using multiple imputation. J Appl Stat. 2009;36(10):1109–18.

  50. Song Y, Wang J, Ge Y, Xu C. An optimal parameters-based geographical detector model enhances geographic characteristics of explanatory variables for spatial heterogeneity analysis: cases with different types of spatial data. GIScience & Remote Sensing. 2020;57(5):593–610.

  51. Guo YM, Zeng HM, Zheng RS, Li SS, Barnett AG, Zhang SW, Zou XN, Huxley R, Chen WQ, Williams G. The association between lung cancer incidence and ambient air pollution in China: A spatiotemporal analysis. Environ Res. 2016;144:60–5.

  52. Guo HG, Liu JM, Jing W. Ambient Ozone, PM1 and Female Lung Cancer Incidence in 436 Chinese Counties. Int J Environ Res Public Health. 2021;18(19):10386.

  53. Cardoso D, Painho M, Roquette R. A geographically weighted regression approach to investigate air pollution effect on lung cancer: A case study in Portugal. Geospatial Health. 2019;14(1):701.

  54. Xing DF, Xu CD, Liao XY, Xing TY, Cheng SP, Hu MG, Wang JX. Spatial association between outdoor air pollution and lung cancer incidence in China. BMC Public Health. 2019;19(1):1377.

  55. Liu Y, Tian Z, He X, Wang X, Wei H. Short-term effects of indoor and outdoor air pollution on the lung cancer morbidity in Henan Province, Central China. Environ Geochem Health. 2021;44(8):2711–31.

  56. Al-Ahmadi K, Al-Zahrani A. NO2 and Cancer Incidence in Saudi Arabia. Int J Environ Res Public Health. 2013;10(11):5844–62.

  57. Al-Ahmadi K, Al-Zahrani A. Spatial Autocorrelation of Cancer Incidence in Saudi Arabia. Int J Environ Res Public Health. 2013;10(12):7207–28.

  58. Guo B, Wang Y, Pei L, Yu Y, Liu F, Zhang DH, Wang XX, Su Y, Zhang DM, Zhang B, et al. Determining the effects of socioeconomic and environmental determinants on chronic obstructive pulmonary disease (COPD) mortality using geographically and temporally weighted regression model across Xi'an during 2014–2016. Sci Total Environ. 2021;756:143869.

  59. Huang B, Ding F, Liu J, Li Y. Government drivers of gastric cancer prevention: The identification of risk areas and macro factors in Gansu, China. Prevent Med Rep. 2023;36:102450.

  60. Bai X, Zhang X, Shi H, Geng G, Wu B, Lai Y, Xiang W, Wang Y, Cao Y, Shi B, et al. Government drivers of breast cancer prevention: A spatiotemporal analysis based on the association between breast cancer and macro factors. Front Public Health. 2022;10:954247.

  61. Zhang X, Lai Y, Bai X, Wu B, Xiang W, Zhang C, Geng G, Miao W, Xia Q, Wu Q, et al. Determining the spatial non-stationarity underlying social and natural environment in thyroid cancer in China. Sci Total Environ. 2023;870:162009.

  62. Anderson T, Herrera D, Mireku F, Barner K, Kokkinakis A, Dao H, Webber A, Merida AD, Gallo T, Pierobon M. Geographical Variation in Social Determinants of Female Breast Cancer Mortality Across US Counties. JAMA Network Open. 2023;6(9):e2333618.

  63. Goovaerts P, Xiao H, Adunlin G, Ali A, Tan F, Gwede CK, Huang Y. Geographically-weighted regression analysis of percentage of late-stage prostate cancer diagnosis in Florida. Appl Geogr. 2015;62:191–200.

  64. Bibault J-E, Bassenne M, Ren H, Xing L. Deep Learning Prediction of Cancer Prevalence from Satellite Imagery. Cancers. 2020;12(12):3844.

  65. Karzai S, Zhang ZY, Sutton W, Prescott J, Segev DL, McAdams-DeMarco M, Biswal SS, Ramanathan M, Mathur A. Ambient particulate matter air pollution is associated with increased risk of papillary thyroid cancer. Surgery. 2022;171(1):212–8.

Acknowledgements

Not applicable.

Funding

This study was supported by grants from the National Natural Science Foundation of China (grant number: 31200581) and the Natural Science Foundation of Zhejiang Province (grant number: LY15H160068).

Author information

Contributions

BX designed the study concept, managed the data, and interpreted the study findings. JW participated in extracting the original data, implemented the methods, and wrote the first draft of the paper. CW contributed to the revision and finalization of the paper. DZ, YK, and HY checked and verified the data used in the analysis. The corresponding author, ZL, acquired the funding for this research and finalized and submitted the article for publication. All the authors have read and approved the final manuscript.

Corresponding author

Correspondence to Zhe Lu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xie, B., Wang, J., Wang, C. et al. Determining the effects of social–environmental factors on the incidence and mortality of lung cancer in China based on remote sensing and GIS technology during 2007–2016. BMC Public Health 25, 1673 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12889-025-22591-w

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12889-025-22591-w

Keywords