Posts Tagged ‘GIS’

A Quick Map With QGIS and OSM

Posted: June 18, 2015 in GIS

What I love about QGIS is that you can create a nice map quickly. The other day I was asked to make a situation map for the project we are working on, to include in a presentation. All I had was a laptop with no relevant spatial data at all, but with QGIS installed (I didn’t even have a mouse to draw with). That was more than enough: I loaded OSM as a base layer and used the annotation tool to add more sense to it. Voilà:


Posted: June 11, 2015 in GIS
This is strange, but I was unable to find instructions on importing QGIS layers into a PostGIS database with the PyQGIS API. The PyQGIS cookbook has an example of exporting layers as .shp-files via the QgsVectorFileWriter.writeAsVectorFormat() function and says that the other OGR-supported formats can be used with this function as well. PostGIS is supported by OGR, so people get confused, try to use this function to import data into PostGIS with no success, and end up writing generic import functions.


After a couple of hours of searching the internet for a solution I gave up and decided to find the answer the hard way: I started to dig through the source code of the DB Manager plugin, which has this nice “import layer” feature. It took about 20 minutes to trace down the function in charge of the import: QgsVectorLayerImport.importLayer(). But the quest wasn’t over yet! The documentation says nothing about the provider names accepted by this function. “PostgreSQL” would be the obvious name for the provider, as it is the name of the PostgreSQL driver in OGR, but that is not the case. I had to go through the source code of DB Manager again, and luckily in the comments (and there are quite a few of them in DB Manager: I didn’t find a single doc-string there) the author wrote that for PostgreSQL it is “postgres”.

Now here is a very basic example of importing a QGIS layer into PostGIS:

uri = "dbname='test' host=localhost port=5432 user='user' password='password' key=gid type=POINT table=\"public\".\"test\" (geom) sql="
crs = None
# layer - QGIS vector layer
error = QgsVectorLayerImport.importLayer(layer, uri, "postgres", crs, False, False)
if error[0] != 0:
    iface.messageBar().pushMessage(u'Error', error[1], QgsMessageBar.CRITICAL, 5)

For my own project I needed to create a graph based on a Delaunay triangulation using the NetworkX Python library, with the special condition that all edges must be unique. The points for the triangulation were stored in a .shp-file. I was lucky enough to find this thread on Delaunay triangulation using NetworkX graphs, and I made a nice function out of it for processing NetworkX point graphs. This function preserves node attributes (which are lost during triangulation) and calculates the lengths of the edges. It can be further improved (and most likely will be), but even in its current state it is very handy.

Example of use:

import networkx as nx
import scipy.spatial
import matplotlib.pyplot as plt

path = '/directory/'
f_path = path + 'filename.shp'
G = nx.read_shp(f_path)

GD = createTINgraph(G, show = True)

Code for the function:

import networkx as nx
import scipy.spatial
import matplotlib.pyplot as plt
from math import sqrt

def createTINgraph(point_graph, show = False, calculate_distance = True):
  '''
  Creates a graph based on Delaunay triangulation

  @param point_graph: a point graph made by read_shp() or another NetworkX point graph
  @param show: whether or not the resulting graph should be shown, boolean
  @param calculate_distance: whether the length of edges should be calculated
  @return - a graph made from a Delaunay triangulation

  @Copyright notice: this code is an improved (by Yury V. Ryabov, 2014, riabovvv@gmail.com) version of
                    Tom's code taken from this discussion
                    https://groups.google.com/forum/#!topic/networkx-discuss/D7fMmuzVBAw
  '''

  TIN = scipy.spatial.Delaunay(point_graph)
  edges = set()
  # for each Delaunay triangle
  for n in xrange(TIN.nsimplex):
      # for each edge of the triangle
      # sort the vertices
      # (sorting avoids duplicated edges being added to the set)
      # and add to the edges set
      edge = sorted([TIN.vertices[n,0], TIN.vertices[n,1]])
      edges.add((edge[0], edge[1]))
      edge = sorted([TIN.vertices[n,0], TIN.vertices[n,2]])
      edges.add((edge[0], edge[1]))
      edge = sorted([TIN.vertices[n,1], TIN.vertices[n,2]])
      edges.add((edge[0], edge[1]))


  # make a graph based on the Delaunay triangulation edges
  graph = nx.Graph(list(edges))

  #add nodes attributes to the TIN graph from the original points
  original_nodes = point_graph.nodes(data = True)
  for n in xrange(len(original_nodes)):
    XY = original_nodes[n][0] # X and Y tuple - coordinates of the original points
    graph.node[n]['XY'] = XY
    # add other attributes
    original_attributes = original_nodes[n][1]
    for i in original_attributes.iteritems(): # for tuple i = (key, value)
      graph.node[n][i[0]] = i[1]


  # calculate Euclidian length of edges and write it as edges attribute
  if calculate_distance:
    edges = graph.edges()
    for i in xrange(len(edges)):
      edge = edges[i]
      node_1 = edge[0]
      node_2 = edge[1]
      x1, y1 = graph.node[node_1]['XY']
      x2, y2 = graph.node[node_2]['XY']
      dist = sqrt( pow( (x2 - x1), 2 ) + pow( (y2 - y1), 2 ) )
      dist = round(dist, 2)
      graph.edge[node_1][node_2]['distance'] = dist


  # plot graph
  if show:
    pointIDXY = dict(zip(range(len(point_graph)), point_graph))
    nx.draw(graph, pointIDXY)
    plt.show()

  return graph
 
Schema of the Conservation Areas in Leningradskaya Region

About a year ago I was asked to create a small (B5 size) and simple schema of the conservation areas in Leningradskaya region. I did it in QGIS. Here is the author’s version of the schema and several notes that might be helpful for a beginner map-maker.

There was a huge disproportion between the areas of different objects, and both polygon and point markers were needed to show them. I decided to use the Centroid Fill option in the polygon style so I could use a single polygon layer instead of two (polygon and point). With Centroid Fill, the points in the centres of the small polygons overlap and mask these tiny polygons.

All the administrative borders were stored in one layer (and there are far more borders than one can see here). They are drawn using rule-based symbology, so I didn’t even need to subset the layer to get rid of the rest of its polygons.

All the names of the surrounding countries, regions, cities and water bodies are not labels derived from layers but labels created inside the map composer. It was quicker, and their positioning was more precise, which is crucial for such a small schema.

There was a lack of space for the legend, so I had to utilise every bit of canvas to place it. I used three legend items here; one of them actually overlaps another, and setting a transparent background for the legends helped with that.

Finally, the labels for the conservation areas (numbers) were outlined in white to be perfectly readable. Some of them were moved manually (with the coordinates stored in special columns of the layer) to prevent overlapping with other labels and data.

P.S. Don’t be afraid to argue with the client about the workflow. Initially I was asked to manually digitise a 20 000 x 15 000 pixel raster map of the Leningrad region to extract the most precise borders of the conservation areas (and districts of the region). Of course I could have done it and charged more money for the job, but what’s the point if some of those borders can’t even be seen on such a small schema? So I convinced the client to use data from OSM and saved myself a lot of time.

This is my third post in this series; here are the links to Part 1 and Part 2. Sorry for the delay, but at the time I got completely consumed by my PhD thesis and this story just slipped my mind.
To the point. There were two types of shadows I had to deal with: cloud shadows and cast shadows (from buildings and trees). In the scientific literature you can find several relatively complicated algorithms for dealing with them (including DEM usage for cast shadow detection), and I definitely should have used some of them, but… I was too lazy to implement them when a couple of quick’n’dirty workarounds were available.

Cloud shadow removal

To remove cloud shadows I decided to calculate the ratio between the mean pixel values of the lit and shadowed parts of the raster for each band and multiply the shadowed pixels by this ratio. I manually created a mask for the shadowed areas. There were three areas and their borders were quite distinct in the NIR channels:
Cloud shadows in one of the NIR bands
I drew several polygons over the shadows in QGIS and converted them to raster (Raster -> Conversion -> Rasterise). Now everything was ready for patching:
library(raster)
library(rgdal)

# load rasters
image <- stack('2013_IEEE_GRSS_DF_Contest_CASI.tif') # hyper spectral imagery
s_mask <- raster('shadow_mask.tif') # mask for cloud shadow

# extract shadowed area
m_cloud <- s_mask
m_cloud[m_cloud < 1] <- NA
shadowed <- mask(image, m_cloud)

# calculate zonal statistics
zd <- zonal(image, s_mask, fun = 'mean', na.rm = T)
zd <- as.data.frame(zd)

# calculate ratio between lighted up and shadowed zones
zd[3,] <- zd[1,] / zd[2,] 

# recalculate data in shadows
multiplyer <- zd[3,]
multiplyer <- multiplyer[,2:ncol(zd)]
multiplyer <- as.numeric(multiplyer)
enlight <- shadowed*multiplyer # patch layer
restored <- cover(enlight, image, progress = 'text')

# save result
wr <- writeRaster(restored, 
                  filename = 'restored.tif', 
                  format = 'GTiff', overwrite = TRUE)

Now it’s time to check the results:
The same NIR band after cloud shadow removal
Yes, you can easily spot the patched areas, but the overall result is good for such a simple approach.

Cast shadow removal

Shadows from buildings can be processed in the same way as cloud shadows, except that the mask creation approach is different: cast shadows have to be detected automatically (there are far more of them than the three cloud-shadow areas). Using one of the NIR bands I created a training sample of points and passed it to the Random Forest algorithm to classify the imagery into shadowed and non-shadowed areas. I will not describe that classification process here, because Random Forest was also used for the overall classification of the imagery, and that is the subject of the next post in this series.
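
In case you are curious how such a mask can be built, here is a minimal sketch using the ‘randomForest’ and ‘raster’ packages. This is not the exact code I used: the training shapefile ‘shadow_samples.shp’ and its 0/1 ‘shadowed’ field are hypothetical names introduced only for illustration.

library(raster)
library(rgdal)
library(randomForest)

# hyper-spectral imagery and hypothetical training points with a 0/1 'shadowed' field
image <- stack('2013_IEEE_GRSS_DF_Contest_CASI.tif')
samples <- readOGR('.', layer = 'shadow_samples')

# sample the bands at the training points and attach the class labels
train <- as.data.frame(extract(image, samples))
train$shadowed <- as.factor(samples$shadowed)

# fit a Random Forest classifier: shadowed vs non-shadowed
rf <- randomForest(shadowed ~ ., data = train, ntree = 500)

# predict over the whole raster to get the cast shadow mask
shadow_mask <- predict(image, rf, type = 'response', progress = 'text')
wr <- writeRaster(shadow_mask, filename = 'cast_shadow_mask.tif',
                  format = 'GTiff', overwrite = TRUE)

The resulting mask can then be fed to the same patching routine as the cloud shadow mask above.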

My post about modifying the extent and resolution of rasters drew quite a bit of attention, so I decided to make a small New Year’s present to the community and create a QGIS Processing script to automate the process.

The script is designed to be used within the Processing module of QGIS. It makes two rasters of your choice have the same spatial extent and pixel resolution, so you can use them together in the raster calculator. No interpolation is made – the new pixels get a predefined value. Here is a simple illustration of what it does:

Modifications to rasters A and B
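
If you prefer to see the idea in code rather than in a picture, here is a rough sketch of the same logic in R with the ‘raster’ package. The actual script is a Python script for Processing, so treat this only as an illustration; it assumes both rasters share a CRS and that -9999 is the chosen fill value.

library(raster)

# two input rasters with the same CRS but different extents/resolutions
r1 <- raster('raster_1.tif')
r2 <- raster('raster_2.tif')
fill_value <- -9999  # the value that pads new pixels and replaces No Data

# grow raster 1 to the union of both extents, padding with the fill value
common_extent <- union(extent(r1), extent(r2))
r1 <- extend(r1, common_extent, value = fill_value)

# put raster 2 onto raster 1's grid using nearest neighbour resampling
r2 <- resample(r2, r1, method = 'ngb')

# replace the remaining No Data cells with the fill value
r1[is.na(r1)] <- fill_value
r2[is.na(r2)] <- fill_value

writeRaster(r1, 'raster_1_unified.tif', format = 'GTiff', overwrite = TRUE)
writeRaster(r2, 'raster_2_unified.tif', format = 'GTiff', overwrite = TRUE)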

To use my humble gift, simply download this archive and unpack the files from the ‘scripts‘ folder to your …/.qgis2/processing/scripts folder (or whatever folder you configured in the Processing settings). At the next start of QGIS you will find a ‘Unify extent and resolution’ script in the ‘Processing Toolbox’ under ‘Scripts’ in the ‘Raster processing’ category:

If you launch it you will see this dialogue:

Main window

Note that ‘Help’ is available:

Help tab

Let’s describe the parameters. ‘raster 1’ and ‘raster 2’ are the rasters you want to unify. They must have the same CRS. Note that both output rasters will have the same pixel resolution as ‘raster 1’.

‘replace No Data values with’ provides the value for the pixels that will be added to the rasters and also replaces the original No Data values. Note that in the output rasters this value will not be saved as a No Data value, but as a regular one. This is done to ease further calculations involving both of these rasters, but I’m open to suggestions: if you think a No Data attribute should be assigned, I can add the corresponding functionality.

Finally, you must provide a path to the folder where the output rasters will be stored in the ‘output directory’ field. A ‘_unified’ postfix will be added to the derived raster file names: ‘raster_1.tif’ -> ‘raster_1_unified.tif’.

If the rasters’ CRSs don’t match (and you bypass the built-in check), or an invalid path is provided, a warning message will pop up and the execution will be cancelled:

Example of the warning message

When the execution is finished you will be notified and asked whether the rasters should be added to the TOC:

Happy New Year!

P.S. I would like to thank the Australian government for making the code they create open source. Their kindness saved me a couple of hours.

I decided to make my script for counting unique values in a raster more usable. Now you can use it via SEXTANTE in QGIS. Download the script for SEXTANTE and extract the archive to the folder intended for your Python SEXTANTE scripts (for example ~/.qgis2/processing/scripts). If you don’t know where this folder is located, go to Processing -> Options and configuration -> Scripts -> Scripts folder, see the screenshot:

Processing options

Now restart QGIS and in the SEXTANTE Toolbox go to Scripts. You will notice a new group named Raster processing; there you will find the script named Unique values count:

Launch it and you will see its dialogue window:

Main window

Note that Help is available:

Help tab

Both single- and multi-band rasters are accepted for processing. After the raster is chosen (the ‘input’ field), one needs to decide whether to round the values for counting. If no rounding is needed, leave the ‘round values to ndigits‘ field blank; otherwise enter the needed value there. Note that negative values are accepted – they round to ndigits before the decimal point (for example, with ndigits = -2 the value 1234.56 is counted as 1200).

When the setup is finished, hit the Run button. You will get something like this:

Result window

How to Count Unique Values in Raster

Posted: August 17, 2013 in GIS

Recently I demonstrated how to get histograms for rasters in QGIS. But what if one needs to count the exact number of cells with a given value in a raster (for example, to assess classification results)? In this case scripting is required.

We’ll use Python-GDAL for this task (thanks to tutorials like this one, it is extremely easy to learn). The script I wrote is probably not optimal (though it works just fine and quickly, so you may use it freely), because it is not clear to me whether anyone would want to use it for floating-point data types (which seems impractical given the great number of unique values in that case) or whether this task is only ever performed on integers (in which case the code could be optimised). It works with single- and multi-band rasters.

How to use

The code provided below should be copied into a text file and saved with the ‘.py’ extension. Enable the Python console in QGIS, hit the “Show editor” button in the console and open the script file. Then replace ‘raster_path’ with the actual path to the raster and hit the ‘Run script’ button. In the console output you will see a sorted list of the unique values of the raster, per band:

Script body (to the right) and its output (to the left)

Script itself

#!/usr/bin/python -tt
# -*- coding: UTF-8 -*-
'''
#####################################################################
This script is designed to compute unique values of the given raster
#####################################################################
'''

from osgeo import gdal
import sys
import math

path = "raster_path"

gdalData = gdal.Open(path)
if gdalData is None:
    sys.exit("ERROR: can't open raster")

# get width and height of the raster
xsize = gdalData.RasterXSize
ysize = gdalData.RasterYSize

# get number of bands
bands = gdalData.RasterCount

# process the raster band by band
for i in xrange(1, bands + 1):
    band_i = gdalData.GetRasterBand(i)
    raster = band_i.ReadAsArray()

    # create dictionary for unique values count
    count = {}

    # count unique values for the given band
    for col in range(xsize):
        for row in range(ysize):
            cell_value = raster[row, col]

            # check if cell_value is NaN
            if math.isnan(cell_value):
                cell_value = 'Null'

            # add cell_value to dictionary
            try:
                count[cell_value] += 1
            except KeyError:
                count[cell_value] = 1

    # print results sorted by cell_value
    for key in sorted(count.iterkeys()):
        print "band #%s - %s: %s" % (i, key, count[key])

Introduction

This is the second part of my post series on processing hyper-spectral and LiDAR imagery in R. See the other parts:

In this part I will describe my general approach to the classification process and then show you how to create cool spectral response plots like this one:

Colour of boxes corresponds to the band wavelength

Classification approach

I decided to use the Random Forest algorithm for per-pixel classification. Random Forest is a reliable machine-learning algorithm and I have already used it successfully in my projects. Notice that the winner of the contest used Random Forest too, but with a per-object classification approach. Random Forest is implemented in R in the ‘randomForest‘ package, and the cool thing is that the ‘raster‘ package is able to apply it to rasters. But not only the initial 144 hyper-spectral bands + LiDAR were fed to Random Forest: several indices and DSM derivatives were used in addition. Here is what I did for the classification:

  1. Cloud shadow identification and removal.
  2. Cast shadow identification and removal.
  3. Creation of the spectral profiles for the 16 classes. This is needed to choose which bands will be used for the indices calculation.
  4. Indices calculation: NDVI, NHFD, NDWI (see the sketch after this list).
  5. DSM processing: elevated objects extraction, DSM to DTM conversion, calculation of slope, aspect, normalised DEM, topographic position and terrain roughness indices.
  6. Random Forest model creation and adjustment. Classification itself.
  7. Final filtering using python-GDAL.
  8. Result evaluation.
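
To illustrate step 4, here is a hedged sketch of how indices like NDVI and NDWI can be derived from the hyper-spectral stack with the ‘raster’ package. The band numbers (40, 60, 120) are placeholders only – the real bands were chosen from the spectral profiles described below – and the output name simply mimics the ‘all_layers.tif’ stack used later in this post.

library(raster)

image <- stack('2013_IEEE_GRSS_DF_Contest_CASI.tif')

# placeholder band numbers; the real ones were picked from the spectral profiles below
green <- image[[40]]
red   <- image[[60]]
nir   <- image[[120]]

# NDVI = (NIR - Red) / (NIR + Red)
ndvi <- (nir - red) / (nir + red)

# NDWI (McFeeters) = (Green - NIR) / (Green + NIR)
ndwi <- (green - nir) / (green + nir)

# stack the indices together with the original bands for the Random Forest input
all_layers <- stack(image, ndvi, ndwi)
wr <- writeRaster(all_layers, filename = 'all_layers.tif',
                  format = 'GTiff', overwrite = TRUE)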

Spectral profiles

To calculate the indices one needs to choose the input bands. Which ones to pick when there are 144 bands in the hyper-spectral image? Spectral profile graphs should help with that. But these won’t be the common spectral profiles that we are all used to, like this one:
Spectral profile from one of my research projects
We need to see the dispersion of the band values for each class much more than the mean values, because dispersion significantly affects classification: if some classes have overlapping values in the same band, classification accuracy decreases. So our spectral profile should show the wavelength, the band number and the dispersion of values for the given classes, so that we can decide which bands to pick for the indices calculation. This means that box plots will be used instead of line plots, and the boxes should be coloured according to the corresponding wavelength.

Needed libraries

library(ggplot2)
library(reshape2)
library(raster)
library(rgdal)

Load the raster and take samples using a .shp-file with classes

# Load raster
image <- stack('~/EEEI_contest/all_layers.tif')

# Load point .shp-file with classes
s_points_dFrame <- readOGR( '~/EEEI_contest',
                    layer="sampling_points_shadow",
                    p4s="+proj=utm +zone=15 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0")
s_points <- SpatialPoints(s_points_dFrame)
dFrame <- as.data.frame(s_points_dFrame)

# Extract data at sampling points
probe <- extract(image, s_points, method='bilinear')

# Combine sampling results with original data 
sample <- cbind(dFrame, probe)

Spectral profile creation

At this point we have the ‘sample’ dataframe, which looks like this:
‘pattern’ is the class name and ‘pattern_id’ is the integer that corresponds to each class. These fields, as well as ‘lat’ and ‘lon’, belonged to the .shp-file. The other fields were created by the ‘extract’ function.
# get numbers for bands instead of names in headers of columns
for (i in c(7:ncol(sample))){
  colnames(sample)[i] <- i-6
}

Create proper band names and build the palette

# preparation for palette creation (establish wavelengths)
palette <- c()
violet <- c(380:450)
blue <- c(452:495)
green <- c(496:570)
yellow <- c(571:590)
orange <- c(591:620)
red <- c(621:750)
NIR <- c(750:1050)

# process band names (future captions) and palette
for (i in c(7:150)){
  # calculate wavelength for a given band
  wave <- (i-7)*4.685315 + 380
  wave <- round(wave, digits = 0)

  # rename columns in 'sample'
  band_num <- paste('(band', i-6, sep = ' ')
  band_num <- paste(band_num, ')', sep = '')
  colnames(sample)[i] <- paste(wave, band_num, sep = ' ')

  # add value to palette
  if (is.element(wave, violet)) {palette <- c(palette, '#8F00FF')
  } else if (is.element(wave, blue)) {palette <- c(palette, '#2554C7')
  } else if (is.element(wave, green)) {palette <- c(palette, '#008000')
  } else if (is.element(wave, yellow)) {palette <- c(palette, '#FFFF00')
  } else if (is.element(wave, orange)) {palette <- c(palette, '#FF8040')
  } else if (is.element(wave, red)) {palette <- c(palette, '#FF0000')
  } else if (is.element(wave, NIR)) {palette <- c(palette, '#800000')
  }

}

# Remove unneeded fields
sample <- subset(sample, select = 1:150)
sample[, 5] <- NULL
sample[, 5] <- NULL
sample <- sample[,3:148]
Now our dataframe looks like this:
Reshape dataframe for future plotting:
sample <- melt(sample, id = c('pattern', 'pattern_id'))
Now our dataframe looks like this:
We want to create a spectral profile for each class, which means we need a plotting function. Keep in mind that ‘ggplot2’ doesn’t work with local variables, so we create a generic plotting function and run it in a cycle that creates global subsets of our dataframe:
plotting <- function(data_f, plot_name){
  p <- ggplot(data_f, aes(data_f[,3], data_f[,4])) +
       geom_boxplot(aes(fill = data_f[,3]),
                    colour = palette,
                    # make outliers have the same colour as lines    
                    outlier.colour = NULL) +
       theme(axis.text.x = element_text(angle=90, hjust = 0),
             axis.title = element_text(face = 'bold', size = 14),
             title = element_text(face = 'bold', size = 16),
             legend.position = 'none') +
       labs(title = paste('Spectral response for class\n',
                          plot_name,
                          sep = ' '),
            x = 'Wavelength, nm',
            y = 'Response')+
       scale_fill_manual(values = palette)
  print(p)
}
Final stage: plotting
# get unique classes

# unique class numbers:
u <- unique(sample[,'pattern_id'])
# unique class names:
u_names <- unique(unlist(sample[,'pattern']))

for (i in u){
  # works only if class numbers (u) are consecutive integers from 1 to u
  data_f <- subset(sample, pattern_id == i)  
  plot_name <- u_names[i]
  plotting(data_f, plot_name)
}
If you are curious, here are the rest of the plots:

Introduction


This is the first post in my series of notes about hyperspectral imagery classification. See the other parts:
Part 2: Classification Approach and Spectral Profiles Creation

The IEEE GRSS Data Fusion Contest took place this spring. This year participants were asked to work with hyper-spectral (144-band) imagery and LiDAR data. The resolution of the data-set was about 5 m. There were two nominations: the best classification and the best scientific paper.

I work with high-resolution imagery quite often, but classification is a very rare task for me. I thought this contest was a great opportunity to develop my skills – and not just classification skills, but R skills as well. I decided to participate in the best classification nomination and to use R for the most part.

I learned a lot and I will share my knowledge with you in a series of posts starting with this one. And as in some great novels, I will start from the very end: the evaluation of my results.

Results of my classification (created in R, designed in QGIS)

Contest results

are available here. As you may notice, I’m not on the list of the 10 best classifications 😉 But there is an almost unnoticeable 0.03% difference between my result (85.93% accuracy) and the result of the 10th place. Not a bad result, don’t you think? And I know that I could have done better – I had 99% prediction accuracy on the training samples. It’s funny, but my classification map looks better than the map that took 7th place!

How to evaluate classification using R

Since I was not in the top ten, I had to evaluate the result on my own. The organisers finally disclosed the evaluation samples and I got a chance for a self-assessment. So we have a set of .shp-files – each containing ground-truth polygons for one of the 16 classes – and a classification map. We need to accomplish four goals:

  1. Create a visual representation of the misclassification.
  2. Assess accuracy.
  3. Create a confusion matrix.
  4. Visualise the classification map using the IEEE colour palette.

Lets get a palette!

Official IEEE palette

To extract colour values from the palette above you may use GIMP. But I used a widget that every KDE-user (Linux) should have by default. You can probe and save colour values from any part of your screen. Quite useful!

‘Color picker’ in work

Now let’s see the code for our tasks.

Load needed libraries

library(rgdal)
library(raster)
library(reshape2)
library(caret)
library(ggplot2)

Load and process data

# get classification raster
ras <- raster('~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/raster.tif',
              verbose = T)
# get list of shp-files for evaluation
shapes <- list.files(path = '~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/roi', pattern = '*shp')

# a list for accuracy assessment
accuracy_list <- list()

# create an empty dataframe to be filled with evaluation results
# field names are not arbitrary!!!
eval_df <- data.frame(variable = character(),
                      value = character())

# create an empty dataframe to be used for plotting
# field names are not arbitrary!!!
plot_data <- data.frame(variable = character(),
                        value = character(),
                        Freq = integer())

for (f_name in shapes) {
  # delete '.shp' from the filename
  layer_name <- paste(sub('.shp', '', f_name))
  class <- readOGR("~/GIS/IEEE_contest_2013/2013_IEEE_GRSS_DF_Contest/roi",
                   layer = layer_name,
                   verbose = F)

  # extract values from raster
  probe <- extract(ras, class)

  # replace class numbers with names 
  samples <- list()
  for (lis in probe) {
    for (value in lis) {
      if (value == 0) {c_name <- 'Unclassified'
      } else if (value == 1) {c_name <- 'Healthy grass'
      } else if (value == 2) {c_name <- 'Stressed grass'
      } else if (value == 3) {c_name <- 'Synthetic grass'
      } else if (value == 4) {c_name <- 'Trees'
      } else if (value == 5) {c_name <- 'Soil'
      } else if (value == 6) {c_name <- 'Water'
      } else if (value == 7) {c_name <- 'Residential'
      } else if (value == 8) {c_name <- 'Commercial'
      } else if (value == 9) {c_name <- 'Road'
      } else if (value == 10) {c_name <- 'Highway'
      } else if (value == 11) {c_name <- 'Railway'
      } else if (value == 12) {c_name <- 'Parking Lot 1'
      } else if (value == 13) {c_name <- 'Parking Lot 2'
      } else if (value == 14) {c_name <- 'Tennis Court'
      } else if (value == 15) {c_name <- 'Running Track'} 
      samples <- c(samples, c = c_name)
    }
  }

  # make layer_name match sample name
  if (layer_name == 'grass_healthy') {layer_name <- 'Healthy grass'
  } else if (layer_name == 'grass_stressed') {layer_name <- 'Stressed grass'
  } else if (layer_name == 'grass_syntethic') {layer_name <- 'Synthetic grass'
  } else if (layer_name == 'tree') {layer_name <- 'Trees'
  } else if (layer_name == 'soil') {layer_name <- 'Soil'
  } else if (layer_name == 'water') {layer_name <- 'Water'
  } else if (layer_name == 'residental') {layer_name <- 'Residential'
  } else if (layer_name == 'commercial') {layer_name <- 'Commercial'
  } else if (layer_name == 'road') {layer_name <- 'Road'
  } else if (layer_name == 'highway') {layer_name <- 'Highway'
  } else if (layer_name == 'railway') {layer_name <- 'Railway'
  } else if (layer_name == 'parking_lot1') {layer_name <- 'Parking Lot 1'
  } else if (layer_name == 'parking_lot2') {layer_name <- 'Parking Lot 2'
  } else if (layer_name == 'tennis_court') {layer_name <- 'Tennis Court'
  } else if (layer_name == 'running_track') {layer_name <- 'Running Track'} 

  # create a dataframe with classification results
  df <- as.data.frame(samples)
  dfm <- melt(df, id = 0)
  dfm['variable'] <- layer_name

  # add data to evaluation dataframe
  eval_df <- rbind(eval_df, dfm)

  # assess accuracy of current class
  mytable <- table(dfm)
  dmt <- as.data.frame(mytable)
  total_samples <- 0
  correct_predictions <- 0
  for (i in 1:nrow(dmt)) {
    predict_class <- toString(dmt[i,2])
    pc_frequency <- dmt[i,3]
    if (predict_class == layer_name) {
      correct_predictions <- dmt[i,3]
    }
    total_samples <- total_samples + pc_frequency
  }
  accuracy <- round(correct_predictions/total_samples, 2)
  accuracy_list <- c(accuracy_list, c = accuracy)

  # append data for plotting
  plot_data <- rbind(plot_data, dmt)
}

Create a fancy graph (that is shown on the map)

# create facets plot
EEEI_palette <- c('#D4D4F6',
                  '#5F5F5F',
                  '#710100',
                  '#00B300',
                  '#007900',
                  '#014400',
                  '#008744',
                  '#D0B183',
                  '#BAB363',
                  '#DAB179',
                  '#737373',
                  '#A7790A',
                  '#00EA01',
                  '#CA1236',
                  '#00B9BB')
plot_data <- plot_data[order(plot_data$variable),]
p = ggplot(data = plot_data,
           aes(x = factor(1),
               y = Freq,
               fill = factor(value)
           )
)
p <- p + facet_grid(facets = . ~ variable)
p <- p + geom_bar(stat = 'identity', width = 1) +
  xlab('Classes') +
  ylab('Classification rates') +
  guides(fill=guide_legend(title="Classes"))+
  scale_fill_manual(values = EEEI_palette)+
  ggtitle('Classification Accuracy')+
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
p

Let’s finally get our accuracy result

# accuracy assessment
fin_accuracy <- mean(unlist(accuracy_list))
fin_accuracy <- paste(round(fin_accuracy*100, 2), '%', sep = '')
print(paste('Total accuracy:', fin_accuracy), sep = ' ')
[1] "Total accuracy: 85.93%"

Confusion matrix

# confusion matrix creation
true <- as.factor(eval_df$variable)
predict <- as.factor(eval_df$value)
confusionMatrix(predict, true)

Enjoy statistics!

Confusion Matrix and Statistics

                 Reference
Prediction        Commercial Healthy grass Highway Parking Lot 1
  Commercial             850             0      14           151
  Healthy grass            0           868       0             0
  Highway                  0             0     718            14
  Parking Lot 1           41             0      54           641
  Parking Lot 2            0             0      19            73
  Railway                  0             0      11            44
  Residential            155             0     191             0
  Road                     0             0      20           107
  Running Track            0             0       0             0
  Soil                     0             0       1             0
  Stressed grass           0            61       0             0
  Synthetic grass          0             0       0             0
  Tennis Court             0             0       0             0
  Trees                    0           117       0             0
  Water                    0             0       0             0
                 Reference
Prediction        Parking Lot 2 Railway Residential Road Running Track
  Commercial                  0       0          74    2             0
  Healthy grass               0       0           0    0             0
  Highway                     0       9           0    0             0
  Parking Lot 1             104       8           0   11             0
  Parking Lot 2             155       4           0    3             0
  Railway                     2     917          11   61             0
  Residential                 0      40         918    0             0
  Road                       12      14           0  930             0
  Running Track               0       0           1    0           465
  Soil                        6       2           0   27             1
  Stressed grass              0      11           4    0             0
  Synthetic grass             0       0           0    0             3
  Tennis Court                0       3           0    0             0
  Trees                       0       0          10    0             0
  Water                       0       0           0    0             0
                 Reference
Prediction        Soil Stressed grass Synthetic grass Tennis Court Trees
  Commercial         0              0               0            0     0
  Healthy grass      0             14               0            0    34
  Highway            0              0               0            0     0
  Parking Lot 1      0              0               0            0     0
  Parking Lot 2      0              0               0            0     0
  Railway            0              0               0            0     0
  Residential        0              1               0            0     0
  Road              14              0               0            0     0
  Running Track      0              0               0            0     0
  Soil            1040              0               0            0     0
  Stressed grass     0            931               0            0    17
  Synthetic grass    0              0             503            0     0
  Tennis Court       0              0               0          245     0
  Trees              0             85               0            0  1004
  Water              0              0               0            0     0
                 Reference
Prediction        Water
  Commercial          0
  Healthy grass       3
  Highway             0
  Parking Lot 1      22
  Parking Lot 2       0
  Railway             0
  Residential         0
  Road                0
  Running Track       0
  Soil                0
  Stressed grass      0
  Synthetic grass     0
  Tennis Court        0
  Trees               0
  Water             118

Overall Statistics

               Accuracy : 0.859         
                 95% CI : (0.853, 0.866)
    No Information Rate : 0.088         
    P-Value [Acc > NIR] : <2e-16        

                  Kappa : 0.847         
 Mcnemar's Test P-Value : NA            

Statistics by Class:

                     Class: Commercial Class: Healthy grass Class: Highway
Sensitivity                     0.8126               0.8298         0.6984
Specificity                     0.9780               0.9953         0.9979
Pos Pred Value                  0.7791               0.9445         0.9690
Neg Pred Value                  0.9820               0.9839         0.9724
Prevalence                      0.0872               0.0872         0.0857
Detection Rate                  0.0709               0.0724         0.0599
Detection Prevalence            0.0910               0.0767         0.0618
                     Class: Parking Lot 1 Class: Parking Lot 2
Sensitivity                        0.6223               0.5556
Specificity                        0.9781               0.9915
Pos Pred Value                     0.7276               0.6102
Neg Pred Value                     0.9650               0.9894
Prevalence                         0.0859               0.0233
Detection Rate                     0.0535               0.0129
Detection Prevalence               0.0735               0.0212
                     Class: Railway Class: Residential Class: Road
Sensitivity                  0.9097             0.9018      0.8994
Specificity                  0.9883             0.9647      0.9848
Pos Pred Value               0.8767             0.7034      0.8478
Neg Pred Value               0.9917             0.9906      0.9905
Prevalence                   0.0841             0.0849      0.0862
Detection Rate               0.0765             0.0766      0.0776
Detection Prevalence         0.0872             0.1088      0.0915
                     Class: Running Track Class: Soil
Sensitivity                        0.9915      0.9867
Specificity                        0.9999      0.9966
Pos Pred Value                     0.9979      0.9656
Neg Pred Value                     0.9997      0.9987
Prevalence                         0.0391      0.0879
Detection Rate                     0.0388      0.0867
Detection Prevalence               0.0389      0.0898
                     Class: Stressed grass Class: Synthetic grass
Sensitivity                         0.9030                 1.0000
Specificity                         0.9915                 0.9997
Pos Pred Value                      0.9092                 0.9941
Neg Pred Value                      0.9909                 1.0000
Prevalence                          0.0860                 0.0420
Detection Rate                      0.0777                 0.0420
Detection Prevalence                0.0854                 0.0422
                     Class: Tennis Court Class: Trees Class: Water
Sensitivity                       1.0000       0.9517      0.82517
Specificity                       0.9997       0.9806      1.00000
Pos Pred Value                    0.9879       0.8257      1.00000
Neg Pred Value                    1.0000       0.9953      0.99789
Prevalence                        0.0204       0.0880      0.01193
Detection Rate                    0.0204       0.0837      0.00984
Detection Prevalence              0.0207       0.1014      0.00984

As you can see, the main sources of misclassification are Parking Lot 1 and Parking Lot 2. The accuracy for the other classes is above 90%, which is great. Frankly, I still don’t understand the difference between Parking Lot 1 and Parking Lot 2… The official answer was that Parking Lot 2 is cars (isn’t detecting them in 5 m resolution imagery a questionable task?), but it seems to be something else. It is hard to classify something when you don’t understand what it is…