1.1 Data Wrangling

library(tidyverse)
library(sf)
library(QuantPsyc)
library(RSocrata)
library(viridis)
library(caret)
library(spatstat)
library(spdep)
library(FNN)
library(grid)
library(gridExtra)
library(knitr)
library(kableExtra)
library(tidycensus)

mapTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 14,colour = "black"),
    plot.subtitle=element_text(face="italic"),
    plot.caption=element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),axis.title = element_blank(),
    axis.text = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2)
  )
}

In this model, the algorithm borrows the theft experience from places where theft has been observed in Chicago and tests whether that experience generalizes to places where the risk is 'latent'. It uses a set of spatial structure variables to predict Chicago's theft hot spots.

The outcome of interest is theft. The map below shows the theft points in Chicago for 2017. Selection bias may exist in this model: theft may be reported or recorded at different rates across racial groups, and reporting may also differ across block groups with different education levels. To some extent, the behavior of residents in lower-education block groups may therefore be magnified in the data.

policeDistricts <- 
  st_read("https://data.cityofchicago.org/api/geospatial/fthy-xz3r?method=export&format=GeoJSON") %>%
  st_transform(crs=102271) %>%
  dplyr::select(District = dist_num)
## Reading layer `OGRGeoJSON' from data source `https://data.cityofchicago.org/api/geospatial/fthy-xz3r?method=export&format=GeoJSON' using driver `GeoJSON'
## Simple feature collection with 25 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -87.94011 ymin: 41.64455 xmax: -87.52414 ymax: 42.02303
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
  
policeBeats <- 
  st_read("https://data.cityofchicago.org/api/geospatial/aerh-rz74?method=export&format=GeoJSON") %>%
  st_transform(crs=102271) %>%
  dplyr::select(District = beat_num)
## Reading layer `OGRGeoJSON' from data source `https://data.cityofchicago.org/api/geospatial/aerh-rz74?method=export&format=GeoJSON' using driver `GeoJSON'
## Simple feature collection with 277 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -87.94011 ymin: 41.64455 xmax: -87.52414 ymax: 42.02303
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

bothPoliceUnits <- rbind(mutate(policeDistricts, Legend = "Police Districts"), 
                         mutate(policeBeats, Legend = "Police Beats"))

chicagoBoundary <- 
  st_read("/Users/Anyang/Documents/R/MUSA/MUSA 507/Geospatial risk modeling/riskPrediction_data/chicagoBoundary.shp") %>%
  st_transform(crs=102271) 
## Reading layer `chicagoBoundary' from data source `C:\Users\Anyang\Documents\R\MUSA\MUSA 507\Geospatial risk modeling\riskPrediction_data\chicagoBoundary.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 341164.1 ymin: 552875.4 xmax: 367345.4 ymax: 594870
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=36.66666666666666 +lon_0=-88.33333333333333 +k=0.9999749999999999 +x_0=300000 +y_0=0 +ellps=GRS80 +units=m +no_defs

fishnet <- 
  st_make_grid(chicagoBoundary, cellsize = 500) %>%
  st_sf()

fishnet <- 
  fishnet[chicagoBoundary,] %>%
  mutate(uniqueID = rownames(.)) %>%
  dplyr::select(uniqueID)

theft <- readRDS("C:/Users/Anyang/Documents/R/MUSA/MUSA 507/Geospatial risk modeling/theft.rds")

ggplot() + 
  geom_sf(data = chicagoBoundary) +
  geom_sf(data = theft, colour="black", size=0.05, show.legend = "point") +
  labs(title= "Theft, Chicago - 2017") +
  mapTheme()

1.2 Joining Theft to the Fishnet

This model creates a lattice of grid cells, referred to as the fishnet, and aggregates the theft points to it. The fishnet represents a smooth spatial surface of theft that varies independently of administrative units. The count of theft cases is then calculated for each grid cell.

After joining the summed theft cases to the fishnet, the map below shows the count of thefts by grid cell. The spatial structure of theft is visible, with hot spots clearly apparent.

theft_net <- 
  theft %>% 
  dplyr::select() %>% 
  mutate(counttheft = 1) %>% 
  aggregate(., fishnet, sum) %>%
  mutate(counttheft = ifelse(is.na(counttheft), 0, counttheft),
         uniqueID = rownames(.),
         cvID = sample(round(nrow(fishnet) / 24), size=nrow(fishnet), replace = TRUE))

ggplot() +
  geom_sf(data = theft_net, aes(fill = counttheft)) +
  scale_fill_viridis() +
  labs(title = "Count of Thefts for the fishnet") +
  mapTheme()

2.1 Data Wrangling: Risk Factors

The model uses ten risk factors plus a neighborhood polygon layer. The data include 311 service requests for missed garbage pickup, tree trims, building violations, traffic signs out, street lights out, graffiti remediation, sanitation complaints, abandoned cars, and abandoned buildings, as well as the locations of liquor retail licenses.

The maps below show each risk factor as points.

Buildingviolation <- readRDS("C:/Users/Anyang/Documents/R/MUSA/MUSA 507/Geospatial risk modeling/Buildingviolation.rds")%>%
  dplyr::select(Y = latitude, X = longitude) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(fishnet)) %>%
  mutate(Legend = "Building Violation")

Treetrim <- readRDS("C:/Users/Anyang/Documents/R/MUSA/MUSA 507/Geospatial risk modeling/Treetrim.rds")%>%
  dplyr::select(Y = latitude, X = longitude) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(fishnet)) %>%
  mutate(Legend = "Tree Trim")

Trafficsignout <- readRDS("C:/Users/Anyang/Documents/R/MUSA/MUSA 507/Geospatial risk modeling/Trafficsignout.rds")%>%
  dplyr::select(Y = latitude, X = longitude) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(fishnet)) %>%
  mutate(Legend = "Traffic sign out")

missgarbage <- readRDS("C:/Users/Anyang/Documents/R/MUSA/MUSA 507/Geospatial risk modeling/missgarbage.rds")%>%
  dplyr::select(Y = latitude, X = longitude) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(fishnet)) %>%
  mutate(Legend = "Miss Garbage")

graffiti <- 
  read.socrata("https://data.cityofchicago.org/Service-Requests/311-Service-Requests-Graffiti-Removal-Historical/hec5-y4x5") %>%
    mutate(year = substr(creation_date,1,4)) %>%
    filter(year == "2017") %>%
    filter(where_is_the_graffiti_located_ == "Front" |
           where_is_the_graffiti_located_ == "Rear" | where_is_the_graffiti_located_ == "Side") %>%
    dplyr::select(Y = latitude, X = longitude) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(fishnet)) %>%
  mutate(Legend = "Graffiti")

streetLightsOut <- 
  read.socrata("https://data.cityofchicago.org/Service-Requests/311-Service-Requests-Street-Lights-All-Out/zuxi-7xem") %>%
    mutate(year = substr(creation_date,1,4)) %>%
    filter(year == "2017") %>%
    dplyr::select(Y = latitude, X = longitude) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(fishnet)) %>%
  mutate(Legend = "Street_Lights_Out")

sanitation <-
  read.socrata("https://data.cityofchicago.org/Service-Requests/311-Service-Requests-Sanitation-Code-Complaints-Hi/me59-5fac") %>%
    mutate(year = substr(creation_date,1,4)) %>%
    filter(year == "2017") %>%
    dplyr::select(Y = latitude, X = longitude) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(fishnet)) %>%
  mutate(Legend = "Sanitation")

liquorRetail <- 
  read.socrata("https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses-Cur   rent-Liquor-and-Public-Places/nrmj-3kcf") %>%
  filter(BUSINESS.ACTIVITY == "Retail Sales of Packaged Liquor") %>%
  dplyr::select(Y = LATITUDE, X = LONGITUDE) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(fishnet)) %>%
  mutate(Legend = "Liquor_Retail")

abandonCars <- 
  read.socrata("https://data.cityofchicago.org/Service-Requests/311-Service-Requests-Abandoned-Vehicles/3c9v-pnva") %>%
  mutate(year = substr(creation_date,1,4)) %>%
  filter(year == "2017") %>%
  dplyr::select(Y = latitude, X = longitude) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(fishnet)) %>%
  mutate(Legend = "Abandoned_Cars")
  
abandonBuildings <- 
  read.socrata("https://data.cityofchicago.org/Service-Requests/311-Service-Requests-Vacant-and-Abandoned-Building/7nii-7srd") %>%
    mutate(year = substr(date_service_request_was_received,1,4)) %>%
    filter(year == "2017") %>%
    dplyr::select(Y = latitude, X = longitude) %>%
  na.omit() %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(fishnet)) %>%
  mutate(Legend = "Abandoned_Buildings")

neighborhoods <- 
  st_read("https://raw.githubusercontent.com/blackmad/neighborhoods/master/chicago.geojson") %>%
  st_transform(st_crs(fishnet)) 
## Reading layer `chicago' from data source `https://raw.githubusercontent.com/blackmad/neighborhoods/master/chicago.geojson' using driver `GeoJSON'
## Simple feature collection with 98 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

ggplot() +
  geom_sf(data=chicagoBoundary) +
  geom_sf(data = rbind(Buildingviolation,Trafficsignout,missgarbage,streetLightsOut,
                      liquorRetail, graffiti,sanitation,abandonBuildings,abandonCars,Treetrim),
         size = .1) +
  facet_wrap(~Legend, ncol = 2) +
  labs(title = "Risk Factors") +  
  mapTheme()

3.1 Feature engineering - Count of features by grid cell

The maps below show the risk factor counts in the fishnet. This approach sums the number of points of a given risk factor occurring in each grid cell.

vars_net <- 
  rbind(missgarbage,streetLightsOut,
        liquorRetail, graffiti, sanitation,abandonBuildings,abandonCars,Buildingviolation,Treetrim,Trafficsignout) %>%
  st_join(., fishnet, join=st_within) %>%
  st_set_geometry(NULL) %>%
  group_by(uniqueID, Legend) %>%
  summarize(count = n()) %>%
  full_join(fishnet) %>%
  spread(Legend, count, fill=0) %>%
  st_sf() %>%
  dplyr::select(-`<NA>`) %>%
  na.omit()

vars_net.long <- 
  vars_net %>%
  gather(Variable, value, -geometry, -uniqueID)

vars <- unique(vars_net.long$Variable)
mapList <- list()

for(i in vars){
  mapList[[i]] <- 
    ggplot() +
      geom_sf(data = filter(vars_net.long, Variable == i), aes(fill=value), colour=NA) +
      scale_fill_viridis(name="") +
      labs(title=i) +
      mapTheme()}

do.call(grid.arrange,c(mapList, ncol =2, top = "Risk Factors by Fishnet"))

3.2 Feature engineering - Nearest neighbor features

The maps below show the nearest neighbor distances to each risk factor. For each grid cell centroid, the function below calculates the average distance to the k nearest risk factor points; this average nearest neighbor distance is likely a better measure of local exposure to risk than raw counts.

nn_function <- function(measureFrom, measureTo, k) {
  # Convert both coordinate sets to matrices for FNN::get.knnx
  measureFrom_Matrix <- as.matrix(measureFrom)
  measureTo_Matrix <- as.matrix(measureTo)
  # Distances from each origin point to its k nearest destination points
  nn <- get.knnx(measureTo_Matrix, measureFrom_Matrix, k)$nn.dist
  # Average the k distances for each origin point
  output <-
    as.data.frame(nn) %>%
    rownames_to_column(var = "thisPoint") %>%
    gather(points, point_distance, V1:ncol(.)) %>%
    arrange(as.numeric(thisPoint)) %>%
    group_by(thisPoint) %>%
    summarize(pointDistance = mean(point_distance)) %>%
    arrange(as.numeric(thisPoint)) %>% 
    dplyr::select(-thisPoint) %>%
    pull()
  
  return(output)  
}

vars_net$missgarbage.nn =
    nn_function(st_coordinates(st_centroid(vars_net)), st_coordinates(missgarbage), 3)
    
vars_net$Graffiti.nn =
    nn_function(st_coordinates(st_centroid(vars_net)), st_coordinates(graffiti), 3)
    
vars_net$Liquor_Retail.nn =
    nn_function(st_coordinates(st_centroid(vars_net)), st_coordinates(liquorRetail), 3)

vars_net$Street_Lights_Out.nn =
    nn_function(st_coordinates(st_centroid(vars_net)), st_coordinates(streetLightsOut), 3)
    
vars_net$Sanitation.nn =
    nn_function(st_coordinates(st_centroid(vars_net)), st_coordinates(sanitation), 3)

vars_net$Abandoned_Buildings.nn =
    nn_function(st_coordinates(st_centroid(vars_net)), st_coordinates(abandonBuildings), 3)
    
vars_net$Abandoned_Cars.nn =
    nn_function(st_coordinates(st_centroid(vars_net)), st_coordinates(abandonCars), 3)

vars_net$Buildingviolation.nn =
    nn_function(st_coordinates(st_centroid(vars_net)), st_coordinates(Buildingviolation), 3)

vars_net$Treetrim.nn =
    nn_function(st_coordinates(st_centroid(vars_net)), st_coordinates(Treetrim), 3)

vars_net$Trafficsignout.nn =
    nn_function(st_coordinates(st_centroid(vars_net)), st_coordinates(Trafficsignout), 3)


vars_net.long.nn <- 
  vars_net %>%
  dplyr::select(ends_with(".nn")) %>%
  gather(Variable, value, -geometry, -uniqueID)

vars <- unique(vars_net.long.nn$Variable)
mapList <- list()

for(i in vars){
  mapList[[i]] <- 
    ggplot() +
      geom_sf(data = filter(vars_net.long.nn, Variable == i), aes(fill=value), colour=NA) +
      scale_fill_viridis(name="") +
      labs(title=i) +
      mapTheme()}

do.call(grid.arrange,c(mapList, ncol =2, top = "Nearest Neighbor risk Factors by Fishnet"))

3.3 Feature Engineering - Measure distance to one point

It may also be reasonable to measure distance to a single point, such as the centroid of the Logan Square neighborhood, an area of Chicago filled with beautiful architecture, lovely parks and a great mix of bars and restaurants.

loganPoint <-
  neighborhoods %>%
  filter(name == "Logan Square") %>%
  st_centroid()

vars_net$loganDistance =
  st_distance(st_centroid(vars_net),loganPoint) %>%
  as.numeric() 

ggplot() +
  geom_sf(data=vars_net, aes(fill=loganDistance)) +
  scale_fill_viridis() +
  labs(title="Euclidean distance to The Logan Square") +
  mapTheme() 

final_net <-
  left_join(theft_net, st_set_geometry(vars_net, NULL), by="uniqueID") 
final_net <-
  st_centroid(final_net) %>%
    st_join(., dplyr::select(neighborhoods, name)) %>%
    st_join(., dplyr::select(policeDistricts, District)) %>%
      st_set_geometry(NULL) %>%
      left_join(dplyr::select(final_net, geometry, uniqueID)) %>%
      st_sf() %>%
  na.omit()

4.1 Exploring the spatial structure of theft cases

This section creates a queen-contiguity neighbor list and a spatial weights matrix, then maps Local Moran's I, which checks for spatial autocorrelation at the local scale. The null hypothesis for Moran's I is that theft counts are randomly distributed across the study area. Two additional statistics are mapped: the p-value and significant hotspots, defined as grid cells with p-values <= 0.05.

final_net.nb <- poly2nb(final_net, queen=TRUE)
final_net.weights <- nb2listw(final_net.nb, style="W", zero.policy=TRUE)
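
As a quick aside (a hypothetical addition, not part of the original workflow), a permutation test of the Global Moran's I using the weights object just created can confirm the null-hypothesis discussion above; this sketch assumes spdep's moran.mc, loaded with the packages at the top.

globalMoran <- moran.mc(final_net$counttheft, final_net.weights,
                        nsim = 999, zero.policy = TRUE)
globalMoran$statistic  # observed Global Moran's I
globalMoran$p.value    # pseudo p-value; a small value rejects spatial randomness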

final_net.localMorans <- 
  cbind(
    as.data.frame(localmoran(final_net$counttheft, final_net.weights)),
    as.data.frame(final_net, NULL)) %>% 
    st_sf() %>%
    dplyr::select(Theft_Count = counttheft, 
                  Local_Morans_I = Ii, 
                  P_Value = `Pr(z > 0)`) %>%
    mutate(Significant_Hotspots = ifelse(P_Value <= 0.05, 1, 0)) %>%
    gather(Variable, Value, -geometry)
  
vars <- unique(final_net.localMorans$Variable)
varList <- list()

for(i in vars){
  varList[[i]] <- 
    ggplot() +
      geom_sf(data = filter(final_net.localMorans, Variable == i), aes(fill = Value), colour=NA) +
      scale_fill_viridis(name="") +
      labs(title=i) +
      mapTheme()}

do.call(grid.arrange,c(varList, ncol = 4, top = "Local Morans I statistics, Theft"))

final_net <-
  final_net %>% 
  mutate(theft.isSig = ifelse(localmoran(final_net$counttheft, 
                                            final_net.weights)[,5] <= 0.0000001, 1, 0)) %>%
  mutate(theft.isSig.dist = nn_function(st_coordinates(st_centroid(final_net)),
                                           st_coordinates(st_centroid(
                                             filter(final_net, theft.isSig == 1))), 1 ))

4.2 Correlation Test

Here, Pearson correlations between theft counts and the risk factors are computed. The scatterplots below also show the direction of association between theft counts and each count or nearest neighbor feature.

correlation.long <-
  st_set_geometry(final_net, NULL) %>%
    dplyr::select(-uniqueID, -cvID, -loganDistance, -name, -District) %>%
    gather(Variable, Value, -counttheft)

correlation.cor <-
  correlation.long %>%
    group_by(Variable) %>%
    summarize(correlation = cor(Value, counttheft, use = "complete.obs"))
    
ggplot(correlation.long, aes(Value, counttheft)) +
  geom_point(size = 0.1) +
  geom_text(data = correlation.cor, aes(label = paste("r =", round(correlation, 2))),
            x=-Inf, y=Inf, vjust = 1, hjust = -.1) +
  geom_smooth(method = "lm", se = FALSE, colour = "#a2d7d8") +
  facet_wrap(~Variable, ncol = 2, scales = "free") +
  labs(title = "Theft count as a function of risk factors")

5.1 The histogram of the dependent variable

The histogram below shows the distribution of theft counts by grid cell. The distribution is heavily skewed, which motivates modeling the counts with a Poisson regression.

ggplot(final_net, aes(counttheft)) + 
  geom_histogram(binwidth = 1) +
  labs(title = "Distribution of theft by grid cell")

5.2 Cross-validated Poisson Regression

reg.vars <- c("Graffiti.nn", 
              "Liquor_Retail.nn", "Street_Lights_Out.nn", "Sanitation.nn", "loganDistance","missgarbage.nn","Abandoned_Buildings.nn","Abandoned_Cars.nn","Treetrim.nn","Trafficsignout.nn","Buildingviolation.nn")

reg.ss.vars <- c("Graffiti.nn", "Liquor_Retail.nn", "missgarbage.nn",
                 "Street_Lights_Out.nn", "Sanitation.nn", "loganDistance", 
                 "theft.isSig", "theft.isSig.dist","Abandoned_Buildings.nn","Abandoned_Cars.nn","Treetrim.nn","Trafficsignout.nn","Buildingviolation.nn")

crossValidate <- function(dataset, id, dependentVariable, indVariables) {
  
  allPredictions <- data.frame()
  cvID_list <- unique(dataset[[id]])
  
  for (i in cvID_list) {
    
    thisFold <- i
    cat("This hold out fold is", thisFold, "\n")
    
    # Split into training (all other folds) and test (the held-out fold)
    fold.train <- filter(dataset, dataset[[id]] != thisFold) %>% as.data.frame() %>% 
                  dplyr::select(id, geometry, indVariables, dependentVariable)
    fold.test  <- filter(dataset, dataset[[id]] == thisFold) %>% as.data.frame() %>% 
                  dplyr::select(id, geometry, indVariables, dependentVariable)
    
    # Fit a Poisson regression of the dependent variable on all predictors
    regression <-
      glm(as.formula(paste(dependentVariable, "~ .")), family = "poisson", 
          data = fold.train %>% 
                 dplyr::select(-geometry, -id))
    
    # Predict counts for the held-out fold and accumulate the results
    thisPrediction <- 
      mutate(fold.test, Prediction = predict(regression, fold.test, type = "response"))
    
    allPredictions <-
      rbind(allPredictions, thisPrediction)
    
  }
  return(st_sf(allPredictions))
}

reg.cv <- crossValidate(
  dataset = final_net,
  id = "cvID",
  dependentVariable = "counttheft",
  indVariables = reg.vars) %>%
    dplyr::select(cvID = cvID, counttheft, Prediction, geometry)
## This hold out fold is 9 
## This hold out fold is 59 
## ... (the same message is printed for each of the remaining random folds; output truncated)

reg.ss.cv <- crossValidate(
  dataset = final_net,
  id = "cvID",
  dependentVariable = "counttheft",
  indVariables = reg.ss.vars) %>%
    dplyr::select(cvID = cvID, counttheft, Prediction, geometry)
  
reg.spatialCV <- crossValidate(
  dataset = final_net,
  id = "name",
  dependentVariable = "counttheft",
  indVariables = reg.vars) %>%
    dplyr::select(cvID = name, counttheft, Prediction, geometry)
## This hold out fold is Riverdale 
## This hold out fold is Hegewisch 
## ... (the same message is printed for each of the remaining neighborhood hold-out folds; output truncated)

reg.ss.spatialCV <- crossValidate(
  dataset = final_net,
  id = "name",
  dependentVariable = "counttheft",
  indVariables = reg.ss.vars) %>%
    dplyr::select(cvID = name, counttheft, Prediction, geometry)

Goodness of fit metrics are generated for both random k-fold and leave-one-group-out cross-validation (LOGO-CV) under two different specifications. The maps below show the predictions for each regression. The models that include the spatial structure features produce better predictions.

reg.summary <- 
  rbind(
    mutate(reg.cv,           Error = counttheft - Prediction,
                             Regression = "Random k-fold CV: Just Risk Factors"),
                             
    mutate(reg.ss.cv,        Error = counttheft - Prediction,
                             Regression = "Random k-fold CV: Spatial Structure"),
    
    mutate(reg.spatialCV,    Error = counttheft - Prediction,
                             Regression = "Spatial LOGO-CV: Just Risk Factors"),
                             
    mutate(reg.ss.spatialCV, Error = counttheft - Prediction,
                             Regression = "Spatial LOGO-CV: Spatial Structure")) %>%
    st_sf() 

grid.arrange(
  reg.summary %>%
    ggplot() +
      geom_sf(aes(fill = Prediction)) +
      facet_wrap(~Regression) +
      scale_fill_viridis() +
      labs(title = "Predicted theft by Regression") +
      mapTheme() + theme(legend.position="bottom"),

  filter(reg.summary, Regression == "Random k-fold CV: Just Risk Factors") %>%
    ggplot() +
      geom_sf(aes(fill = counttheft)) +
      scale_fill_viridis() +
      labs(title = "Observed theft\n") +
      mapTheme() + theme(legend.position="bottom"), ncol = 2)

5.3 Model error by random k-fold and spatial cross validation

The maps below show the prediction errors under random k-fold and spatial cross-validation. Notably, adding the spatial structure features reduces the errors.

filter(reg.summary, Regression == "Spatial LOGO-CV: Just Risk Factors" | 
         Regression == "Spatial LOGO-CV: Spatial Structure" |
         Regression == "Random k-fold CV: Just Risk Factors" | 
         Regression == "Random k-fold CV: Spatial Structure" ) %>%
  ggplot() +
    geom_sf(aes(fill = Error)) +
    facet_wrap(~Regression) +
    scale_fill_viridis() +
    labs(title = "Theft errors by Regression") +
    mapTheme()

5.4 MAE and the standard deviation of MAE

Mean absolute error (MAE) is calculated from the predicted and observed theft counts and then compared across models. The results show that the spatial structure models reduce both MAE and the standard deviation of MAE, so the regressions with spatial structure features are more accurate and more generalizable; those features do a better job of predicting the hot spots.
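
For reference, the metric computed below is the mean absolute error across grid cells, MAE = (1/n) * sum(|Prediction_i - counttheft_i|), and SD_MAE is the standard deviation of those absolute errors; this formula simply restates what the summarize() call does.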

st_set_geometry(reg.summary, NULL) %>%
  group_by(Regression) %>% 
  summarize(MAE = round(mean(abs(Prediction - counttheft), na.rm = T),2),
            SD_MAE = round(sd(abs(Prediction - counttheft), na.rm = T),2)) %>% 
  kable(caption = "MAE by regression") %>%
    kable_styling("striped", full_width = F) %>%
    row_spec(2, color = "black", background = "#FDE725FF") %>%
    row_spec(4, color = "black", background = "#FDE725FF") 
MAE by regression

Regression                            MAE    SD_MAE
Random k-fold CV: Just Risk Factors   2.02   3.84
Random k-fold CV: Spatial Structure   1.67   2.98
Spatial LOGO-CV: Just Risk Factors    2.06   3.95
Spatial LOGO-CV: Spatial Structure    1.68   3.01

5.5 Generalizability by neighborhood context

This section checks whether the algorithm could perpetuate racial bias. The model uses tidycensus to pull race data by Census tract; percentWhite is calculated and tracts are split into two groups, Majority_White and Majority_Non_White.

Comparing the errors in raw form shows that the model overpredicts theft in Majority_Non_White neighborhoods and underpredicts it in Majority_White neighborhoods. The models with the spatial structure features, however, have lower errors and a smaller difference in errors across neighborhood contexts.

tracts17 <- 
  get_acs(geography = "tract", variables = c("B01001_001E","B01001A_001E"), 
          year = 2017, state=17, county=031, geometry=T) %>%
  st_transform(102271)  %>% 
  dplyr::select(variable, estimate, GEOID) %>%
  spread(variable, estimate) %>%
  rename(TotalPop = B01001_001,
         NumberWhites = B01001A_001) %>%
  mutate(percentWhite = NumberWhites / TotalPop,
         raceContext = ifelse(percentWhite > .5, "Majority_White", "Majority_Non_White")) %>%
  .[neighborhoods,]

final_reg <- 
  filter(reg.summary, Regression == "Spatial LOGO-CV: Spatial Structure" |
           Regression == "Spatial LOGO-CV: Just Risk Factors" |
           Regression == "Random k-fold CV: Just Risk Factors" | 
           Regression == "Random k-fold CV: Spatial Structure") %>%
  mutate(uniqueID = rownames(.))

final_reg.tracts <- 
  st_join(st_centroid(final_reg), tracts17) %>%
  st_set_geometry(NULL) %>%
  left_join(dplyr::select(final_reg, uniqueID)) %>%
  st_sf() %>%
  na.omit()

st_set_geometry(final_reg.tracts, NULL) %>%
  group_by(Regression, raceContext) %>%
  summarize(mean.Error = mean(Error, na.rm = T)) %>%
  spread(raceContext, mean.Error) %>%
  kable(caption = "Mean Error by neighborhood racial context") %>%
  kable_styling("striped", full_width = F) 
Mean Error by neighborhood racial context

Regression                            Majority_Non_White   Majority_White
Random k-fold CV: Just Risk Factors           -0.0548167        0.0665390
Random k-fold CV: Spatial Structure            0.0162757       -0.0166149
Spatial LOGO-CV: Just Risk Factors            -0.0615865        0.0935413
Spatial LOGO-CV: Spatial Structure             0.0236218        0.0049611

6.1 How does the model compare with traditional crime hotspots?

The following maps compare the risk predictions with traditional 'kernel density' hotspot mapping. A map of the risk categories for both model types is generated below with the test set points overlaid. The risk predictions do a better job because the highest risk category is uniquely targeted to places with a high density of theft points.

Comparing the two maps, some places shown as high risk (in yellow) on the kernel density map fall into lower risk categories in the risk prediction model. To some extent, the risk prediction model achieves more detailed, targeted predictions.

library(raster)
theft_ppp <- as.ppp(st_coordinates(theft), W = st_bbox(final_net))
theft_KD.1000 <- spatstat::density.ppp(theft_ppp, 1000)
theft_KD.1500 <- spatstat::density.ppp(theft_ppp, 1500)
theft_KD.2000 <- spatstat::density.ppp(theft_ppp, 2000)
theft_KD.df <- rbind(
  mutate(data.frame(rasterToPoints(mask(raster(theft_KD.1000), as(neighborhoods, 'Spatial')))), Legend = "1000 Ft."),
  mutate(data.frame(rasterToPoints(mask(raster(theft_KD.1500), as(neighborhoods, 'Spatial')))), Legend = "1500 Ft."),
  mutate(data.frame(rasterToPoints(mask(raster(theft_KD.2000), as(neighborhoods, 'Spatial')))), Legend = "2000 Ft.")) 

theft_KD.df$Legend <- factor(theft_KD.df$Legend, levels = c("1000 Ft.", "1500 Ft.", "2000 Ft."))

# Compute kernel density
theft_ppp <- as.ppp(st_coordinates(theft), W = st_bbox(final_net))
theft_KD <- spatstat::density.ppp(theft_ppp, 1000)
# Convert kernel density to grid cells taking the mean
theft_KDE_sf <- as.data.frame(theft_KD) %>%
  st_as_sf(coords = c("x", "y"), crs = st_crs(final_net)) %>%
  aggregate(., final_net, mean) %>%
#Mutate the Risk_Category field as defined below.
  mutate(label = "Kernel Density",
         Risk_Category = ntile(value, 100),
         Risk_Category = case_when(
           Risk_Category >= 90 ~ "90% to 100%",
           Risk_Category >= 70 & Risk_Category <= 89 ~ "70% to 89%",
           Risk_Category >= 50 & Risk_Category <= 69 ~ "50% to 69%",
           Risk_Category >= 30 & Risk_Category <= 49 ~ "30% to 49%",
           Risk_Category >= 1 & Risk_Category <= 29 ~ "1% to 29%")) %>%
#Bind to a layer where test set crime counts are spatially joined to the fishnet.
  bind_cols(
    aggregate(
      dplyr::select(theft) %>% mutate(theftCount = 1), ., length) %>%
    mutate(theftCount = replace_na(theftCount, 0))) %>%
#Select the fields we need
  dplyr::select(label, Risk_Category, theftCount)

theft_risk_sf <-
  filter(final_reg, Regression == "Spatial LOGO-CV: Spatial Structure") %>%
  mutate(label = "Risk Predictions",
         Risk_Category = ntile(Prediction, 100),
         Risk_Category = case_when(
           Risk_Category >= 90 ~ "90% to 100%",
           Risk_Category >= 70 & Risk_Category <= 89 ~ "70% to 89%",
           Risk_Category >= 50 & Risk_Category <= 69 ~ "50% to 69%",
           Risk_Category >= 30 & Risk_Category <= 49 ~ "30% to 49%",
           Risk_Category >= 1 & Risk_Category <= 29 ~ "1% to 29%")) %>%
  bind_cols(
    aggregate(
      dplyr::select(theft) %>% mutate(theftCount = 1), ., length) %>%
      mutate(theftCount = replace_na(theftCount, 0))) %>%
  dplyr::select(label,Risk_Category, theftCount)

rbind(theft_KDE_sf, theft_risk_sf) %>%
  gather(Variable, Value, -label, -Risk_Category, -geometry) %>%
  ggplot() +
    geom_sf(aes(fill = Risk_Category), colour = NA) +
    geom_sf(data = sample_n(theft, 1500), size = .1, colour = "black") +
    facet_wrap(~label) +
    scale_fill_viridis(discrete = TRUE) +
    labs(title="Comparison of Kernel Density and Risk Predictions",
         subtitle="Relative to test set points (in black)") +
    mapTheme()

6.2 The comparison bar plot

The bar chart shows the rate of theft points by risk category and model type. The risk prediction model narrowly edges out kernel density in the highest risk category, again indicating that the risk prediction model outperforms the kernel density approach.

rbind(theft_KDE_sf, theft_risk_sf) %>%
  st_set_geometry(NULL) %>%
  gather(Variable, Value, -label, -Risk_Category) %>%
  group_by(label, Risk_Category) %>%
  summarize(counttheft = sum(Value)) %>%
  ungroup() %>%
  group_by(label) %>%
  mutate(Rate_of_test_set_crimes = counttheft / sum(counttheft)) %>%
    ggplot(aes(Risk_Category,Rate_of_test_set_crimes)) +
      geom_bar(aes(fill=label), position="dodge", stat="identity") +
      scale_fill_viridis(discrete = TRUE)

7.1 Conclusion

Our risk prediction model uses a host of spatial structure variables to predict local theft hot spots. The specification with spatial structure features performs better, with lower MAE and SD_MAE, and it also compares favorably with kernel density mapping. Because the risk predictions are made for small grid cells, the model can target resources at a finer scale.

However, I would not recommend putting this risk prediction algorithm into production. Although it improves prediction accuracy, it does not generalize well across racial contexts. Because the goal of the prediction is to identify latent risk, accuracy matters less than generalizability. Overpredicting theft in Majority_Non_White neighborhoods and underpredicting it in Majority_White neighborhoods would cause problems if the model were in production. The predictions would inform resource allocation decisions, such as where to locate police stations and where officers should patrol, so this biased prediction is not only unfair but may also provoke social conflict.