Study area

We focused on hiking activity in the four main islands of Japan (Honshu, Hokkaido, Kyushu, and Shikoku) and nearby small islands connected to the main islands by a bridge (Fig. 1a). These islands lie between latitudes 31.0° and 45.5°N, and the total area is 361,000 km2. The islands are generally mountainous and tallest mountains in central Honshu exceed 3000 m a.s.l. (Fig. 1c). In Tokyo, mean monthly temperatures range between 5.2 °C in January and 26.4 °C in August, while they range between − 18.4 °C in January and 6.2 °C in August at the summit of the highest mountain, Mt. Fuji (3776 m a.s.l., Japan Meteorological Agency). In northern Honshu and Hokkaido, snow depth can exceed 1 m even at low elevations and high mountains are covered with snow even in southern Japan.

Vegetation excluding farmland and pasture covers 70.9% of the study area and the 93.9% is forest. Plantations of mostly evergreen conifers such as Japanese cedar (Cryptomeria japonica) occupy 37.6% of the vegetation area (National Surveys on the Natural Environment by the Biodiversity Center of Japan 2nd–7th; Secondary vegetation after past human disturbances occupies 39.4% of the total vegetation and the remaining 23.0% is primary vegetation. The typical primary vegetation types are, from north to south, boreal mixed forest, deciduous broad leaved forest, and evergreen broad leaved forest.

Grid squares

Records of hiking activity were summarized for 4244 secondary grid squares based on Standard Grid Square System, which was defined by the Minister’s Order of Administrative Management Agency in 1973. In the system, the secondary grid was defined as a grid of 5′ in latitude and 7′ 30″ in longitude, which roughly corresponds to a 10 km grid in the study area. This is the standard grid system of the government and we adopted the system for convenience in future application uses and communication with practitioners. The grids, which are defined by latitude and longitude, are different in the area up to 22% between the north and south ends. Therefore, area of each grid was included in a model as an offset term.

Hiking activity

According to a government survey in 2016, (the Survey on Time Use and Leisure Activities by the Statistics Bureau of Japan,, 10.0% (about 10.7 million people) of Japan’s population age 15 or over enjoyed hiking/mountaineering in the last year. The census showed also that hiking is more popular among urban residents in the metropolitan areas. Both multi-day expedition to high mountains and day trek to low mountains in suburban areas are popular. Because of the severe winter climate, unskilled hikers use the high mountains in summer and early autumn only. During a summer vacation, whose peak time in Japan is August, many hikers enjoy multi-day trips to distant mountains. Spring and autumn are also popular seasons because of the mild weather and the scenic beauty of the fresh green or autumn colors.

Data collection

In this study, we used number of hiking records accumulated on the most popular social networking service for hikers in Japan (Yamareco; as a surrogate for flow of recreation service. For all the registered destinations in the study area, the number of hiking records for each month and the latitude and longitude of the destination were collected from the service in September 2016 with the rvest28 package in R software29. This service launched in October 2005 hosts records of the hiking route, photos, participants, and impressions of a hiking trip and facilitates communication among users. Although monthly number of records for each destination is always available on the site, the exact date of each hiking record is not always public information for privacy reasons; therefore, all of the records from the almost 11 years since the start of the service were lumped together in our analysis. Hikers may record multiple places in a single trip, so the total number of records must be larger than the number of unique trips. Users of the service sometime record a place that is not a destination, e.g. start points and stations of trails, parking areas, stations of transports, and bus stops. Such records were excluded before analyses as far as it can be judged from the name of the place. As a result, the total number of hiking records was 4,708,229 records for 16,179 destinations. Finally, these records were assigned to the 4244 grids based on the latitude and longitude of each destination and then total number of records for each grid was used as a surrogate of the recreation service flow in our analysis. Not only total number but also monthly number was used in our analysis to examine seasonal changes in associations between the service and vegetation. Total record number of the grids was strongly right-skewed; no record (handled as 0 in our analysis) was found in 2036 grids while mean and maximum record number were 1109 and 350,384, respectively.

Explanation variables

Fifty ecological, environmental, and social/infrastructural variables (Table S1) were prepared for each grid by using ArcGIS version 10.5 (ESRI, Redlands, CA, USA). For vegetation and land-use attributes, National Surveys on the Natural Environment by the Biodiversity Center of Japan (2nd–7th; and National Land Numerical Information ( were used. The proportion of sea, that of total vegetation cover (excluding agricultural land and pasture) to land area, that of agricultural land (including pasture) to land area, that of natural vegetation (vegetation excluding plantations) to total vegetated area, and that of primary vegetation (vegetation with no record or evidence of a disturbance) to natural vegetation were summarized at four spatial scales: a radius of 10 km, 20 km, 50 km, and 100 km from the center of each grid. Spatial patterns of the three vegetation variables in 10 km radius were summarized in Fig. 1d–f.

Maximum elevation, minimum elevation, and ruggedness (index of topographic heterogeneity30) were summarized at the four spatial scales based on a digital elevation model (10-m resolution) provided by the Geospatial Information Authority of Japan ( For climatic variables (annual and monthly mean temperature, annual and monthly precipitation, annual and monthly hours of sunshine, and annual maximum snow depth), the National Land Numeric Information provided by the Ministry of Land, Infrastructure, Transport and Tourism of Japan ( was referenced. Densities of population and roads at the four spatial scales were prepared from population census data from the Statistics Bureau of Japan ( and the National Land Numeric Information. For calculation of these densities, the sea surface was excluded. In addition, latitude and longitude of center of each grid were also used as explanatory variables to average effects of spatial coordinates.

Statistical analysis

In this study, we employed BRT, a machine-learning method based on regression trees31 for modeling the complex relationship between a CES flow and landscape attributes12. BRT is an ensemble learning method where multiple regression trees are sequentially combined to minimize the loss function by means of gradient descent. This technique has advantage in the development of a model with a high predictive performance, in which high-dimensional interactions among explanatory variables and nonlinear responses are fully accounted for. In ecology, BRT has been frequently used for modeling of a species distribution32.

Total and monthly numbers of hiking records were modeled as a function of the 50 variables described above under the assumption of a Poisson response. For temperature, precipitation, and hours of sunshine, annual and monthly average were used for the analysis of total and monthly records, respectively. In modeling by BRT, parameters for building of each learner and assembly of the learners must be carefully chosen to maximize generalization ability of a model31. In our case, candidate parameters were 2, 5, and 10 for the maximum depth of variable interactions for each learner; 2, 5, 10, and 20 for the minimum number of observations in the terminal nodes for each learner; 0.5 and 0.75 for the proportion of training data used for building each learner; and 1000, 2000, 4000, 6000, 8000 and 10,000 for the total number of learners (Table S2). In the model assembling process, the value of 0.01 was used as a shrinkage parameter. Ten-fold cross validation was used to obtain the best suites of parameters. R2 based on sum of squares:

$${R}^{2}=1-frac{{sum ({y}_{i}-widehat{{y}_{i}})}^{2}}{{sum ({y}_{i}-overline{{y }_{i}})}^{2}}$$

was used for evaluation of the model’s prediction performance. The importance of explanatory variables was evaluated as an increase of mean absolute error after 100-times permutation of a variable33.

Effects of each explanatory variable (a landscape attribute) on the response variable (record number) and the context dependence were visually inspected by individual conditional expectation (ICE) plot34. ICE plot visualizes the effect of a given explanatory variable for each observation by connecting outcome of a model for shifting values of the focal explanatory variable throughout the range while keeping other explanatory variables as the original value. Predictions were performed in log-scale and each line was centered to be zero at the left end of the x-axis to show relative effects of explanatory variables (c-ICE plot sensu Goltstein et al.34). Each line in ICE plot can be colored based on value of the second explanatory variable to assist assessment of the interactive effects of the two predictors. Friedman’s H statistic35 was used to detect explanatory variables whose interaction with the vegetation variables are important and therefore should be used for color-coding of an ICE plot. Friedman’s H is defined as a proportion of variance of partial dependence estimates explained by interactive effects for arbitrary suites of explanatory variables.

Then, expected impacts of 0.1 decrease in the three local vegetation variables were assessed by the trained model and mapped. Although vegetation variables were sometimes more important at larger spatial scales (see “Results”), we focused on vegetation at a local (10 km radius) scale because most changes in vegetation occur at the scale in Japan (National Surveys on the Natural Environment by the Biodiversity Center of Japan,

All statistical analyses were performed using the R software packag29. The gbm36 package was used for BRT, the iml37 package was used for calculation of Friedman’s H statistic, and the cv.models (Oguro, packages was used for cross validation and parameter tuning.

Leave a Reply

Your email address will not be published. Required fields are marked *