In-class Exercise 2

Analysis

tidyverse

Author

Brian Lim

Published

August 26, 2024

Modified

September 9, 2024

2.0 Getting Started

For this in-class exercise, two R packages will be used:

sf for importing, managing, and processing geospatial data.
tidyverse for performing data science tasks such as importing, wrangling and visualising data.

To install and load these packages into the R environment, we use the p_load function from the pacman package:

pacman::p_load(sf, tidyverse)

2.1 Working with Master Plan 2014 Subzone Boundary Data

mpsz14_shp <- st_read(dsn = "data/MasterPlan2014SubzoneBoundaryWebSHP",
                      layer = "MP14_SUBZONE_WEB_PL")

Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\Users\blzll\OneDrive\Desktop\Y3S1\IS415\Quarto\IS415\In-class_Ex\data\MasterPlan2014SubzoneBoundaryWebSHP' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

The code chunk below demonstrates data conversion from SHP file format to KML file format:

mpsz14_kml <- st_write(mpsz14_shp, 
  "data/MasterPlan2014SubzoneBoundaryWebKML.kml",
  delete_dsn = TRUE)

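Setting delete_dsn = TRUE tells st_write() to delete any existing file at the destination dsn (data source name) before writing the new one, so re-running the chunk overwrites the earlier KML output. The source shapefile is not affected.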
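As a minimal sketch of this behaviour, the following writes a small made-up point layer to a temporary KML file twice. The toy object pts and the temporary path are illustrative assumptions, not part of the exercise data, and the example assumes the local GDAL build includes the KML driver.

```r
library(sf)

# Toy layer: two made-up points in WGS 84 (not part of the exercise data)
pts <- st_sf(
  Name = c("a", "b"),
  geometry = st_sfc(st_point(c(103.8, 1.30)),
                    st_point(c(103.9, 1.35)),
                    crs = 4326)
)

kml_path <- tempfile(fileext = ".kml")  # hypothetical destination file

# First write creates the file
st_write(pts, kml_path, quiet = TRUE)

# Second write targets the same path; delete_dsn = TRUE removes the
# existing file first, so the old output is replaced
st_write(pts, kml_path, delete_dsn = TRUE, quiet = TRUE)

# Read the file back to confirm that it holds the two features
pts_back <- st_read(kml_path, quiet = TRUE)
nrow(pts_back)
```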

2.2 Working with Master Plan 2019 Subzone Boundary Data

mpsz19_kml <- st_read("data/MasterPlan2019SubzoneBoundaryNoSeaKML.kml")

Reading layer `URA_MP19_SUBZONE_NO_SEA_PL' from data source 
  `C:\Users\blzll\OneDrive\Desktop\Y3S1\IS415\Quarto\IS415\In-class_Ex\data\MasterPlan2019SubzoneBoundaryNoSeaKML.kml' 
  using driver `KML'
Simple feature collection with 332 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY, XYZ
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84

mpsz19_shp <- st_read(dsn = "data/MasterPlan2019SubzoneBoundaryWebSHP", 
                      layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)

Reading layer `MPSZ-2019' from data source 
  `C:\Users\blzll\OneDrive\Desktop\Y3S1\IS415\Quarto\IS415\In-class_Ex\data\MasterPlan2019SubzoneBoundaryWebSHP' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
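
The read message above reports WGS 84 because st_read() prints it before st_transform() runs. As a minimal sketch, a single made-up point (an assumption for illustration only) shows how st_crs() can be used to confirm the CRS before and after transforming to EPSG:3414 (SVY21 / Singapore TM):

```r
library(sf)

# One made-up point in geographic coordinates (longitude, latitude)
pt_wgs84 <- st_sfc(st_point(c(103.8333, 1.3667)), crs = 4326)

# Reproject to the projected CRS used in this exercise
pt_svy21 <- st_transform(pt_wgs84, crs = 3414)

st_crs(pt_wgs84)$epsg      # EPSG code before the transform
st_crs(pt_svy21)$epsg      # EPSG code after the transform
st_coordinates(pt_svy21)   # coordinates are now in metres
```

Running st_crs(mpsz19_shp) after the chunk above would be the equivalent check on the exercise data.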

2.3 Working with population data

popdata <- read_csv("data/respopagesextod2023/respopagesextod2023.csv")

2.3.1 Data Preparation

popdata2023 <- popdata %>%
  group_by(PA, SZ, AG) %>%
  summarise(`POP` = sum(`Pop`)) %>%
  ungroup() %>%
  pivot_wider(names_from = AG,
              values_from = POP)
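
To make the reshaping step concrete, here is a minimal sketch of the same group, sum, and pivot pattern on a toy table. The values are made up, and the Sex column is an assumption about what other columns the population file carries.

```r
library(dplyr)
library(tidyr)

# Toy input: one subzone, two age groups, split by a second attribute
toy <- tibble(
  PA  = c("A", "A", "A", "A"),
  SZ  = c("A1", "A1", "A1", "A1"),
  AG  = c("0_to_4", "0_to_4", "5_to_9", "5_to_9"),
  Sex = c("Males", "Females", "Males", "Females"),
  Pop = c(10, 20, 30, 40)
)

toy_wide <- toy %>%
  group_by(PA, SZ, AG) %>%
  summarise(POP = sum(Pop), .groups = "drop") %>%  # one row per PA, SZ, AG
  pivot_wider(names_from = AG, values_from = POP)  # one column per age group

toy_wide
```

The result has one row per subzone, with each age group in its own column holding the summed population.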

colnames(popdata2023)

 [1] "PA"          "SZ"          "0_to_4"      "10_to_14"    "15_to_19"   
 [6] "20_to_24"    "25_to_29"    "30_to_34"    "35_to_39"    "40_to_44"   
[11] "45_to_49"    "50_to_54"    "55_to_59"    "5_to_9"      "60_to_64"   
[16] "65_to_69"    "70_to_74"    "75_to_79"    "80_to_84"    "85_to_89"   
[21] "90_and_Over"

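As seen above, unlike many other programming languages, R indexes from 1 rather than 0. The bracketed numbers at the start of each output row ([1], [6], [11], and so on) give the position of the first element printed on that row. Note that the age-group columns are not in numeric order: "5_to_9" sits at position 14, between "55_to_59" and "60_to_64", which matters for the position-based column selection in the next section.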
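
A minimal sketch of both points, using a handful of the labels above. The column order in the output is consistent with a character-by-character sort of the labels, which is reproduced here with sort(method = "radix"); that this is what produced the order is an assumption.

```r
age_groups <- c("0_to_4", "5_to_9", "10_to_14", "55_to_59", "60_to_64")

# Character-by-character (C-locale) ordering, not numeric ordering
sorted <- sort(age_groups, method = "radix")
sorted

sorted[1]                    # the first element has index 1, not 0
which(sorted == "5_to_9")    # position of "5_to_9" after sorting
```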

2.3.2 Data Wrangling

popdata2023 <- popdata2023 %>%
  # Columns are selected by position; see the colnames() output above
  mutate(YOUNG = rowSums(.[3:6]) + rowSums(.[14])) %>%             # ages 0 to 24
  mutate(`ECONOMY ACTIVE` = rowSums(.[7:13]) + rowSums(.[15])) %>% # ages 25 to 64
  mutate(AGED = rowSums(.[16:21])) %>%                             # ages 65 and over
  mutate(TOTAL = rowSums(.[3:21])) %>%                             # all age groups
  mutate(DEPENDENCY = (YOUNG + AGED) / `ECONOMY ACTIVE`) %>%
  select(PA, SZ, YOUNG, `ECONOMY ACTIVE`, AGED, TOTAL, DEPENDENCY)
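
Position-based selection depends on the exact column order, so an alternative is to select the age groups by name. The following base R sketch shows the idea on a toy table with made-up values and only four age columns; the vectors young_cols, active_cols, and aged_cols are hypothetical helpers, not part of the exercise.

```r
# Toy table; check.names = FALSE keeps the labels that start with digits
toy <- data.frame(
  SZ = c("A1", "B1"),
  `0_to_4`   = c(10, 5),
  `5_to_9`   = c(20, 5),
  `25_to_29` = c(60, 40),
  `65_to_69` = c(30, 10),
  check.names = FALSE
)

# Hypothetical helper vectors naming the columns in each group
young_cols  <- c("0_to_4", "5_to_9")
active_cols <- "25_to_29"
aged_cols   <- "65_to_69"

toy$YOUNG  <- rowSums(toy[young_cols])
toy$ACTIVE <- rowSums(toy[active_cols])
toy$AGED   <- rowSums(toy[aged_cols])

# Dependency ratio: dependants (young plus aged) per economically active person
toy$DEPENDENCY <- (toy$YOUNG + toy$AGED) / toy$ACTIVE
toy[c("SZ", "YOUNG", "ACTIVE", "AGED", "DEPENDENCY")]
```

Selecting by name keeps the calculation correct even if the column order changes, at the cost of listing the labels explicitly.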

# Convert the join keys to upper case; across() supersedes mutate_at()
popdata2023 <- popdata2023 %>%
  mutate(across(c(PA, SZ), toupper))
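
The upper-casing matters because joins match key values exactly, including case. As a minimal sketch with made-up values (and assuming, for illustration, that the boundary layer stores subzone names in upper case while the population table uses title case):

```r
library(dplyr)

# Toy stand-ins for the boundary attributes and the population table
boundaries <- tibble(SUBZONE_N = c("BEDOK NORTH", "MARINA EAST"))
population <- tibble(SZ = c("Bedok North", "Marina East"),
                     TOTAL = c(100, 0))

# Without upper-casing, no key matches and TOTAL is NA for every row
raw_join <- left_join(boundaries, population, by = c("SUBZONE_N" = "SZ"))
sum(is.na(raw_join$TOTAL))

# After upper-casing the key, every row finds its match
population_upper <- population %>% mutate(across(SZ, toupper))
fixed_join <- left_join(boundaries, population_upper,
                        by = c("SUBZONE_N" = "SZ"))
sum(is.na(fixed_join$TOTAL))
```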

2.3.3 Joining the attribute data and geospatial data

mpsz_2023 <- left_join(mpsz19_shp, popdata2023,
                       by = c("SUBZONE_N" = "SZ"))

pop2023_mpsz <- left_join(popdata2023, mpsz19_shp,
                          by = c("SZ" = "SUBZONE_N"))
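
The two joins differ in which table is on the left, and left_join() keeps the class of its left-hand table. A minimal sketch with toy objects (made-up names and values, not the exercise data) suggests what to expect: with the sf object on the left the result is still an sf object, while with the attribute table on the left the result is typically a plain tibble that merely carries a geometry column.

```r
library(sf)
library(dplyr)

# Toy spatial layer and toy attribute table
zones <- st_sf(
  SUBZONE_N = c("A1", "B1"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 3414)
)
pop <- tibble(SZ = c("A1", "B1"), TOTAL = c(100, 200))

geo_first  <- left_join(zones, pop, by = c("SUBZONE_N" = "SZ"))
attr_first <- left_join(pop, zones, by = c("SZ" = "SUBZONE_N"))

inherits(geo_first, "sf")    # sf object on the left
inherits(attr_first, "sf")   # attribute table on the left
```

If the second form is needed for mapping, it can be converted back with st_as_sf().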