Feeling Nature: Measuring perceptions of biophilia across global biomes using visual AI

Study area

To test the hypothesis that BP is universal, meaning that people perceive nature in urban environments similarly regardless of their geographic location, we conducted the study in eight cities spanning five continents and representing the main terrestrial biomes⁴⁰. The biome concept organises large-scale ecological variation according to factors such as climate, temperature, light, precipitation and soil type, which collectively influence the density and diversity of vegetation and wildlife⁴⁰. Earth scientists disagree on biome classification: proposed schemes range from 4 to 38 categories (variously termed domains, ecozones, life zones or biomes), and there is no universal consensus beyond the dual classification distinguishing terrestrial and aquatic biomes. After conducting a review (summarised in Supplementary Table 1), we selected the eight terrestrial biomes that are both highly urbanised and recurrent across the various classifications. We then chose one city of historical or cultural significance to represent each biome. Figure 1 illustrates the selection of cities and corresponding terrestrial biomes considered in this study: Trondheim (tundra); Amsterdam (temperate forest); Quebec City (coniferous forest); Singapore (tropical forest); Barcelona (Mediterranean forest); Buenos Aires (grassland); Nairobi (savanna) and Dubai (desert).

Research design

Our methodology combines quantitative and qualitative approaches and comprises five steps. First, we collected data in the form of streetscape images using GSV as an open-source platform. These images served as a database for quantifying both nature-based elements of the urban environment (termed BS) and human perceptions of biophilia via vision and declared emotions in the same urban context (termed BP). Second, we identified city features related to BS and BP in the eight selected cities and processed GSV imagery through semantic segmentation. This involved applying a specific vision transformer model³⁹ (DPT) that utilises visual AI technology to associate pixels with objects of the built environment under a biophilic classification. The categorisation was derived from an initial selection of 25 BC chosen from the existing 150 semantic classes provided by the DPT model (Supplementary Fig. 4). We then computed BS via three metrics: coverage, diversity and distribution. Third, for each city, we constructed a survey using a selection of GSV images that scored in the top and bottom 5% for the lively and beautiful dimensions of the Place Pulse 2.0 dataset³⁵. To score the images, we employed a machine learning model trained to predict emotional perceptions of the cityscape based on GSV images³⁴, using data from thousands of people worldwide (Supplementary Material 1). We disseminated this online survey in our eight city cases to gather first-hand data on which visual features related to BP evoke more positive feelings in respondents, who were required to have lived in the corresponding city for at least 12 months. Fourth, we utilised the survey feedback to weight our biophilic metrics and measure BP employing the DPT model. Additionally, we implemented spatial analysis using the distribution metric to map BS and BP in each case study. Last, we conducted a qualitative analysis of the survey comments to gain insights into BP through machine learning models for natural language understanding. Finally, we compared metrics and maps to interpret BP worldwide.

Datasets

The present study utilised two datasets: a visual database comprising GSV images collected from eight cities and the results from our online survey (Feeling Nature).

We gathered GSV images within a 15–25 km radius of each city centre. The number of GSV images collected varies across cities as follows: Amsterdam (122,240); Barcelona (107,560); Buenos Aires (464,724); Dubai (117,224); Nairobi (111,052); Quebec City (106,048); Singapore (178,224); and Trondheim (38,752). These images were sampled at 15-m intervals along each city’s road network. Each sampling point is characterised by a location point ID, capture timestamp (expressed in year and month), GPS coordinates, image size (480 × 600 pixels), horizontal field of view (90°) and the compass heading of the camera, which divides the 360° spherical view into four 90° viewing angles corresponding to the four cardinal directions. The images were captured between 2022 and 2024 and collected by the authors between May 2023 and February 2024. This image dataset was used to calculate both BS and BP for each city. To compute the related metrics, we ran the Feeling Nature survey on a selection of GSV images obtained by applying a deep learning model³⁴ trained on the Place Pulse 2.0 dataset³⁵. Launched in 2013 by Dubey and colleagues³⁵ at the MIT Media Lab, Place Pulse 2.0 is a web interface designed to measure how people perceive subjective aspects of the urban environment through pairwise comparisons made by users. The Place Pulse 2.0 dataset contains 110,988 images from 56 cities and 1,170,000 pairwise comparisons provided by 81,630 online volunteers from 162 developed and developing countries³⁵. For each pairwise comparison, respondents indicated whether both images inspired the same feeling or whether one image scored higher, across six perceptual dimensions: safe, lively, boring, wealthy, depressing and beautiful.
From these dimensions, we selected lively and beautiful as two ‘positive feelings’ that evoke emotional responses, serving as the baseline to build our dataset for measuring BP (further details about the results from the Place Pulse 2.0 dataset are provided in Supplementary Material 1). Lively closely aligns with the biophilia concept introduced by Wilson because it pertains to ‘the innate tendency to focus on life and lifelike processes’¹⁴. Beautiful is an aspect of the human inclination toward the aesthetics of nature and living forms, playing a fundamental role in biophilic design by fostering a place-based relationship between people and nature through attraction and beauty⁴¹. We first ran a deep learning model pre-trained³⁴ on the Place Pulse 2.0 dataset³⁵ for our eight cities, selecting the top 5% of images with the highest scores (‘top lively’, ‘top beautiful’) and the bottom 5% of images with the lowest scores (‘bottom lively’, ‘bottom beautiful’). We then randomly sampled these four groups of GSV imagery for each city so that each group consisted of an equal number of images (N = 40). These subsets served as the image sources in our online survey (Feeling Nature), conducted as a subsequent phase of our study.
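The selection of survey images described above can be sketched as follows. This is a minimal illustration rather than the study's actual pipeline: the function name `select_survey_subsets`, the image IDs and the synthetic scores are all hypothetical, and the real scores would come from the pre-trained perception model.

```python
import random

def select_survey_subsets(scores, fraction=0.05, n_per_group=40, seed=0):
    """Draw random 'top' and 'bottom' subsets from predicted perception scores.

    scores: dict mapping image ID -> predicted score for one dimension
            (e.g. 'lively'); IDs and scores here are hypothetical.
    """
    ranked = sorted(scores, key=scores.get)      # image IDs, ascending by score
    k = max(1, int(len(ranked) * fraction))      # size of each 5% tail
    bottom_pool, top_pool = ranked[:k], ranked[-k:]
    rng = random.Random(seed)                    # fixed seed for reproducibility
    return {
        "top": rng.sample(top_pool, min(n_per_group, len(top_pool))),
        "bottom": rng.sample(bottom_pool, min(n_per_group, len(bottom_pool))),
    }

# Toy example: 2,000 synthetic images with pseudo-random scores.
demo_rng = random.Random(42)
demo_scores = {f"img_{i:04d}": demo_rng.random() for i in range(2000)}
subsets = select_survey_subsets(demo_scores)
```

Running the same function once for lively and once for beautiful would give the four groups of 40 images per city.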

Recognising the significant role that contextual knowledge may play in BP, we conducted the Feeling Nature survey (Supplementary Figs. 2 and 3) from August to October 2023. The survey was disseminated through the personal and public networks of the research team, both within and outside academia. Although we received over 500 completed surveys, the responses were unevenly distributed among the cases, with the lowest count being 53. We therefore established a uniform threshold of 50 respondents per city, totalling 400 participants globally (Supplementary Table 2), and considered this sample size sufficient to calibrate the DPT model based on online responses. The online Feeling Nature survey consists of ten screens, nine questions and five sections (see Supplementary Fig. 2). A summary of the survey sections is presented as follows:

(1) General information: The first section provides details about the research project, including its purpose, topic, team and participating institutions. It also includes information on the survey duration, implementation mode, voluntary participation, and data processing and protection measures. Participants were provided with an online informed consent form and could choose among the six languages officially spoken in the eight case study cities (English for Singapore and Nairobi; Dutch for Amsterdam; Spanish for Barcelona and Buenos Aires; French for Quebec City; Norwegian for Trondheim; Arabic for Dubai). This choice of language was intended to make respondents more familiar with the survey interface and more comfortable replying in a language that aligns with their device settings and cultural background.
(2) Requirement for participation: Participants were asked to select a city in which they had lived for at least one year. If they had lived in more than one city for at least one year, they could select another city.
(3) Pair image comparison: This method was used to gather urban perceptual preferences. Participants were presented with a series of six pairs of images, resulting in 2400 selections across the eight cities. They were asked to select the image (left or right) that evoked more positive feelings and to provide feedback on the elements and aspects that contributed to these feelings. The paired images were automatically selected from the four subsets of ‘top’ or ‘bottom’ lively and beautiful and presented in a fixed sequence of ‘left vs. right’ combinations, as depicted in Supplementary Fig. 3. To ensure research reliability, no image was shown more than five times. The online survey utilised a total of 1280 images, with each city case allocated 160 images (40 images from each of the four different image subsets).
(4) Personal data: This section asked for participants’ age and gender.
(5) Closing information: At the end of the survey, an expression of gratitude and a contact form were provided. Additionally, a ‘restart’ button invited participants to repeat the survey for any city that met the initial participation requirement. The survey was developed using Vercel’s Frontend Cloud and hosted on Amazon AWS to ensure secure worldwide access and data collection.

Quantifying biophilic settings

The quantitative analysis involved two steps using visual AI to assess biophilia in each city: measuring BS without any resident input and measuring BP based on the results of the Feeling Nature survey. To quantify BS and BP, we introduced four biophilic metrics derived from visual AI applications. We utilised a state-of-the-art semantic segmentation model³⁹ (DPT) known for its high performance in image recognition. For the convolutional neural network specific to the three-step image processing (classification, detection, segmentation), we employed a DPT model pre-trained on the ADE20K dataset⁴². This model assigns each pixel of an image to one of 150 semantic classes, covering both anthropic and natural objects. From this extensive list, we identified 25 BC representing natural elements and bio-based features. These 25 BC, illustrated in Supplementary Fig. 4, were further grouped into four biophilic categories of the cityscape:

Greenscape (7): tree, grass, plant/flora, fire, flower, palm tree, natural food;
Waterscape (7): water, sea, river, fountain, swimming pool, waterfall, lake;
Landscape (9): sky, earth/ground, mountain, field, rock/stone, sand, hill, light/sunlight, land/soil;
Living beings (2): person, animal/fauna.
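The grouping above can be written down as a simple lookup table, which is convenient when aggregating per-class results by category. This is an illustrative sketch: the constant names are hypothetical, and the labels follow the wording used in this section, which may differ from the exact label strings in the ADE20K class list.

```python
# The four biophilic categories and their classes, as listed in the text.
BIOPHILIC_CATEGORIES = {
    "greenscape": ["tree", "grass", "plant/flora", "fire", "flower",
                   "palm tree", "natural food"],
    "waterscape": ["water", "sea", "river", "fountain", "swimming pool",
                   "waterfall", "lake"],
    "landscape": ["sky", "earth/ground", "mountain", "field", "rock/stone",
                  "sand", "hill", "light/sunlight", "land/soil"],
    "living beings": ["person", "animal/fauna"],
}

# Reverse lookup: biophilic class -> category.
CLASS_TO_CATEGORY = {
    bc: category
    for category, classes in BIOPHILIC_CATEGORIES.items()
    for bc in classes
}

# Flat list of all biophilic classes (BC).
ALL_BC = list(CLASS_TO_CATEGORY)
```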

In the initial phase of our quantitative analysis, we applied DPT to process complete sets of GSV images for each case study, estimating the overall level of biophilia through BS. To achieve this, we assigned equal weights to the 25 BC within the DPT model. We evaluated BS using the following three metrics:

Coverage represents the fraction of pixels belonging to BC in relation to the total number of pixels in an image. It can be expressed as

$${\rm{COV}}=\frac{{\rm{px}}_{{\rm{bc}}}}{{\rm{px}}_{{\rm{tot}}}}$$

(1)

where ${\rm{px}}_{{\rm{bc}}}$ denotes the total number of pixels belonging to the 25 BC, and ${\rm{px}}_{{\rm{tot}}}$ corresponds to the total number of pixels in the image.

For a given BC $i$, ${{\rm{COV}}}_{i}$ indicates the fraction of pixels belonging to that class in relation to the total number of pixels in an image.

$${\rm{COV}}_{i}=\frac{{\rm{px}}_{i}}{{\rm{px}}_{{\rm{tot}}}}$$

(2)

where ${\rm{px}}_{i}$ stands for the number of pixels of BC $i$, whilst ${\rm{px}}_{{\rm{tot}}}$ is the total number of pixels in the image. The mean coverage of BC $i$ across all samples in a city is denoted as ${\overline{{\rm{COV}}}}_{i}$. Coverage ranges from 0 to 1, with values closer to 1 denoting a higher presence of BS in the city case (Fig. 2; Supplementary Table 3).
Diversity indicates the relative proportions of the 25 BC in a city and is intended to measure the relative prevalence of the classes in different cities. The fraction (${{F}}_{i}$) of class $i$ across all samples in a city is the ratio of the mean coverage of that class to the mean total biophilic coverage in that city. This indicator can be calculated as

$${{F}}_{i}=\frac{{\overline{{\rm{COV}}}}_{i}}{\overline{{\rm{COV}}}\,}$$

(3)

Here, ${\overline{{\rm{COV}}}}_{i}$ represents the mean coverage of BC $i$, and $\overline{{\rm{COV}}}$ is the mean coverage of all BC in the city. Diversity is expressed as a percentage (Fig. 3; Supplementary Table 3).
Distribution describes the spatial distribution of the coverage of all BC across a city, which can be represented in a map (see Fig. 4).
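As a worked illustration of Eqs. (1)–(3), the sketch below computes coverage, per-class coverage and diversity from toy segmentation masks. It assumes each mask is a 2-D list of class labels (one per pixel); the function names and the tiny example masks are hypothetical, and real masks would come from the DPT output.

```python
from collections import Counter

def class_coverage(mask, biophilic_classes):
    """Return (COV, {class: COV_i}) for one segmented image.

    mask: 2-D list of class labels, one label per pixel.
    """
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())                      # px_tot
    per_class = {c: counts.get(c, 0) / total for c in biophilic_classes}
    return sum(per_class.values()), per_class         # COV is the sum of COV_i

def diversity(masks, biophilic_classes):
    """F_i = mean COV_i / mean COV over all images of a city."""
    n = len(masks)
    mean_cov_i = {c: 0.0 for c in biophilic_classes}
    for mask in masks:
        _, per_class = class_coverage(mask, biophilic_classes)
        for c, value in per_class.items():
            mean_cov_i[c] += value / n
    mean_cov = sum(mean_cov_i.values())               # mean total coverage
    return {c: (v / mean_cov if mean_cov else 0.0) for c, v in mean_cov_i.items()}

# Two toy 2 x 4 'images'; 'road' and 'building' are non-biophilic classes.
mask_a = [["sky", "sky", "tree", "road"],
          ["road", "road", "tree", "building"]]
mask_b = [["sky", "sky", "sky", "sky"],
          ["road", "grass", "grass", "road"]]
classes = ["sky", "tree", "grass"]

cov_a, cov_a_by_class = class_coverage(mask_a, classes)   # COV = 4/8
fractions = diversity([mask_a, mask_b], classes)          # sky 0.6, tree 0.2, grass 0.2
```

Mapping the per-image coverage values back to their GPS coordinates would then give the distribution metric.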

The values obtained for the BS metrics were normalised using min–max scaling, which ensures that each value falls within the specific range of 0–1, corresponding to the metric scale. This process involved subtracting the minimum value from each value in the dataset and then dividing by the difference between the maximum and minimum values. Only the normalised values are visualised in Figs. 2–4. However, both normalised and non-normalised biophilic metric values for each city can be found in Supplementary Table 3.
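The min–max scaling described above can be sketched in a few lines. The set of values over which the minimum and maximum are taken (for example, all images in a city or all cities) is not specified here, so the input list below is purely illustrative.

```python
def min_max_scale(values):
    """Rescale values to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the ratio is undefined, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([2, 4, 6])
```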

Quantifying biophilic perceptions

In the second stage of our quantitative analysis, we computed BP by applying DPT to a dataset containing only the selected images from the Feeling Nature survey. Treating the modelling results as BP data, we determined which visual elements of the cityscape, represented by the corresponding BC, elicited more positive feelings among the survey respondents. Thus, we introduced a fourth biophilic metric as a measure of perceived biophilic significance:

Intensity for BC $i$ indicates the fraction of all positively rated survey images in a city containing that class. It is defined as follows:

$${{\rm{INT}}}_{i}=\frac{{N}_{i}}{\,{N}_{{{\rm{tot}}}}\,}$$

(4)

where ${N}_{i}$ denotes the number of positively rated survey images with BC $i$, and ${N}_{{{\rm{tot}}}}$ stands for the total number of positively rated survey images. The range of intensity spans from 0 to 1 (Fig. 2; Supplementary Table 3).
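Eq. (4) can be illustrated with a short sketch. It assumes that an image "contains" a class when the segmentation assigns at least one pixel to it; this criterion, the function name and the example data are assumptions made for illustration.

```python
def intensity(positive_images, biophilic_classes):
    """INT_i = N_i / N_tot.

    positive_images: list of sets, each holding the classes detected in one
                     positively rated survey image.
    """
    n_tot = len(positive_images)
    if n_tot == 0:
        return {c: 0.0 for c in biophilic_classes}
    return {
        c: sum(1 for present in positive_images if c in present) / n_tot
        for c in biophilic_classes
    }

# Four hypothetical positively rated images.
rated = [{"tree", "sky"}, {"sky"}, {"sky", "water"}, {"tree", "sky", "person"}]
int_values = intensity(rated, ["sky", "tree", "water", "person", "sand"])
```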

To test our hypothesis on the universality of BP, we used the computed intensity values associated with each BC as weights to adjust the DPT results based on the survey perception data. Subsequently, we derived refined values for perceived coverage, perceived diversity, and perceived distribution by scaling them with the intensity coefficient to quantify and spatially delineate BP across each city case, as follows:

Perceived coverage represents the ratio of the intensity-weighted sum of pixels belonging to BC to the total number of pixels in an image. It can be expressed as

$${\rm{PCOV}}=\frac{{\sum }_{i=1}^{n}{{\rm{px}}}_{i}{{\rm{INT}}}_{i}}{\,{{\rm{px}}}_{{{\rm{tot}}}}\,}$$

(5)

where ${{\rm{px}}}_{i}$ denotes the pixels of BC $i$, ${{\rm{INT}}}_{i}$ corresponds to the intensity of BC $i$ (Eq. 4), ${{\rm{px}}}_{{{\rm{tot}}}}$ is the total number of pixels in the image, and $n$ is the number of BC (25). Ranging from 0 to 1, perceived coverage values closer to 1 indicate a stronger perception of biophilia associated with a particular site in the city (Fig. 2; Supplementary Table 3).
Perceived diversity indicates the intensity-weighted proportions of the 25 BC in a city and is intended to measure the relative prevalence of the classes perceived more positively in each city, expressed by the fraction (${{\rm{PF}}}_{i}$). Perceived diversity is thus given by:

$${{\rm{PF}}}_{i}=\frac{{\overline{{\rm{PCOV}}}}_{i}}{\,\overline{{\rm{PCOV}}}\,}$$

(6)

Here, ${\overline{{\rm{PCOV}}}}_{i}$ denotes the mean perceived coverage of BC $i$, and $\overline{{\rm{PCOV}}}$ indicates the mean perceived coverage of all BC in the city. Perceived diversity is represented as percentage values (Fig. 3; Supplementary Table 3).
Perceived distribution illustrates the spatial distribution of perceived coverage across each city case and is depicted through the corresponding BP maps (Fig. 4).
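Eqs. (5) and (6) can be illustrated with a small numerical sketch. The pixel counts and intensity values are invented for the example (the image size matches the 480 × 600 pixels stated earlier), and the function names are hypothetical.

```python
def perceived_coverage(pixel_counts, total_pixels, intensity):
    """Return (PCOV, {class: PCOV_i}) for one image.

    pixel_counts: dict class -> number of pixels (px_i)
    intensity:    dict class -> INT_i from the survey
    """
    per_class = {
        c: px * intensity.get(c, 0.0) / total_pixels
        for c, px in pixel_counts.items()
    }
    return sum(per_class.values()), per_class

def perceived_diversity(per_image_pcov):
    """PF_i = mean PCOV_i / mean PCOV over all images of a city."""
    n = len(per_image_pcov)
    classes = {c for image in per_image_pcov for c in image}
    mean_i = {c: sum(img.get(c, 0.0) for img in per_image_pcov) / n for c in classes}
    mean_total = sum(mean_i.values())
    return {c: (v / mean_total if mean_total else 0.0) for c, v in mean_i.items()}

total_px = 480 * 600                                  # 288,000 pixels per image
counts = {"tree": 72_000, "sky": 144_000}             # hypothetical segmentation result
weights = {"tree": 0.5, "sky": 1.0}                   # hypothetical INT_i values

pcov, pcov_by_class = perceived_coverage(counts, total_px, weights)
pf = perceived_diversity([pcov_by_class])
```

In this example the unweighted coverage would be 0.75, whereas the perceived coverage drops to 0.625 because trees carry a weight of only 0.5.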

The values of the BP metrics were likewise normalised through min–max scaling to bring them within the range of 0 to 1, consistent with the metric scale. The normalised values are displayed in Figs. 2–4, whilst the complete dataset is presented in Supplementary Table 3.

In addition to the visual AI modelling with DPT, we quantified BP by applying natural language processing (NLP) to the comments received in the Feeling Nature survey. We utilised Bidirectional Encoder Representations from Transformers (BERT), a widely used pre-trained model for NLP. BERT identified morphological, semantic and syntactic similarities with the 25 BC, which were established as keyword entities to be recognised in the biophilic term classification process. After consolidating the survey comments into a single document per city, we implemented BERT across seven tasks: (1) pre-processing: the initial text was divided into individual sentences, which underwent tokenisation, segmentation and positional embedding to feed the sentence encoder; (2) feature extraction: semantic features were extracted using the 25 BC as keywords for specific inputs; (3) embeddings: sentences were converted into tensors through sentence embeddings, keyword embeddings and combined embeddings; (4) BC keyword embeddings: embeddings for all 25 BC keywords were computed, and a similarity threshold of 0.5 was set so that only items with higher similarity values were classified as biophilic; (5) entity recognition: the closest embeddings from tasks 3 and 4 were identified for biophilic term classification; (6) similarity computation: embedding results were expressed as numeric values through cosine similarity⁴³ (the cosine of the angle between two vectors, which measures how similar they are), ranging from 0 to 1, and visualised in bar graphs (Fig. 5a); semantically more similar vectors have values closer to 1, whilst less similar vectors have values tending towards 0 (Supplementary Table 4); (7) similarity matrices: the cosine similarity between all pairs of keyword embeddings was calculated to generate similarity matrices, shown as seaborn heatmaps with the 25 BC keywords along both the x and y axes (Fig. 5b).
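The similarity steps (tasks 4 to 7) can be sketched without any NLP library. The three-dimensional vectors below are hypothetical stand-ins for BERT embeddings, which are much higher-dimensional; the function names are also illustrative, and only the 0.5 threshold is taken from the text.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify_biophilic(term_vec, keyword_vecs, threshold=0.5):
    """Return (closest keyword or None, similarity) for one term embedding."""
    best_kw, best_sim = max(
        ((kw, cosine_similarity(term_vec, vec)) for kw, vec in keyword_vecs.items()),
        key=lambda pair: pair[1],
    )
    # Only terms above the threshold are classified as biophilic.
    return (best_kw, best_sim) if best_sim > threshold else (None, best_sim)

def similarity_matrix(keyword_vecs):
    """Pairwise cosine similarity between all keyword embeddings."""
    keys = list(keyword_vecs)
    return [[cosine_similarity(keyword_vecs[a], keyword_vecs[b]) for b in keys]
            for a in keys]

# Toy embeddings (hypothetical values).
keywords = {"tree": [1.0, 0.0, 0.0], "water": [0.0, 1.0, 0.0]}
match, sim = classify_biophilic([0.9, 0.1, 0.0], keywords)      # close to 'tree'
no_match, low_sim = classify_biophilic([0.0, 0.0, 1.0], keywords)
matrix = similarity_matrix(keywords)
```

The resulting matrix is the kind of input that a heatmap function would take to produce a figure like Fig. 5b.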

Qualifying biophilic perceptions

We conducted a qualitative analysis of the comment section in the Feeling Nature survey to explore how specific aspects of the urban environment positively influence human perceptions. We introduced a qualitative parameter termed specificity, which considers both biophilic elements associated with the 25 BC and non-biophilic elements identified through NLP techniques. The goal of non-biophilic term classification was to uncover additional nuances related to BC, thereby identifying features beyond the 25 BC that could favourably influence respondents’ perceptions of their cities. This approach provided valuable insights into the effects of urban imagery on citizens, even in familiar settings.

Building on the previous quantitative analyses conducted through transformer models (DPT and BERT), we utilised the generative pre-trained transformer (GPT-3.5) by OpenAI⁴⁴ to process textual survey data. Unlike BERT’s bidirectional transformer architecture, GPT-3.5 employs a unidirectional transformer architecture. Like BERT, it is a deep learning model trained for NLP using word embedding vectors. GPT-3.5 generates contextually meaningful text through a stack of decoder layers and specific prompts. The analysis involved distributing document–topic–word combinations and clustering textual survey data into topics, with a special emphasis on non-biophilic subjects. The dataset for each city was clustered into nature-based or non-nature-based topics and sorted by word frequency. We imposed a maximum of 10 clusters for each city and implemented fine-tuned instructions to sort clusters, avoid repetition, integrate abstract concepts and identify local features. The resulting clusters were labelled with semantically coherent terms, giving a total of 14 sub-topic clusters. Although all 14 sub-topic clusters capture features with beneficial effects, only 4 fall under the nature-based topics and were associated with the biophilic categories (greenscape, landscape, waterscape, living beings). The other 10 fall under non-nature-based topics: the built environment, mobility and infrastructure, human activity, neighbourhood and community, urban mood, positive feelings, aesthetics, abstract concepts, vitality and movement, and local features. The sub-topic clusters were then organised into four main topic clusters: nature-based environment, anthropogenic environment, feelings and concepts, and city identifier. Specificity was computed as percentage values and visualised in a chord diagram illustrating the interrelations between each city case and the 14 sub-topic clusters, arranged by descending BP rates (Fig. 6a; Supplementary Table 5).
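The text states only that specificity is expressed as percentage values. One plausible reading, assumed here purely for illustration, is the share of term occurrences falling into each sub-topic cluster for a city. The function name and the counts below are hypothetical.

```python
def specificity(cluster_counts):
    """Percentage share of each cluster, sorted in descending order.

    cluster_counts: dict cluster label -> number of term occurrences
                    (an assumed input; the source does not define it).
    """
    total = sum(cluster_counts.values())
    if total == 0:
        return {}
    shares = {label: 100.0 * n / total for label, n in cluster_counts.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))

# Hypothetical counts for one city.
shares = specificity({"greenscape": 30, "built environment": 50, "aesthetics": 20})
```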

Through the qualitative analysis, we further explored potential gender differences in BP based on survey comments. Focusing on women and men because of the limited numbers in other gender categories, we filtered the survey outputs accordingly. We then repeated the GPT-3.5 topic cluster modelling for each gender-based document to estimate the qualities of the cityscape perceived more positively by male and female respondents. Finally, we synthesised the results considering the four main topic clusters (nature-based environment, anthropogenic environment, feelings and concepts, city identifier). The resulting values are displayed in a double-bar chart, with bars paired by gender on the x-axis and specificity percentage on the y-axis (Fig. 6b; Supplementary Table 6).

Inclusion and ethics statement

This research, conducted on a global scale, utilised models that are cited appropriately throughout the study. Roles and responsibilities were agreed upon among collaborators before the research was conducted. The Feeling Nature survey was designed collaboratively and approved by the ethics boards of MIT and TU Delft, with a letter of approval from each institution. TU Delft and MIT are the main institutions supporting the implementation of this study.