Pansharpening in QGIS Using OTB
Posted: January 31, 2015 in GIS. Tags: OTB, pansharpening, Processing, QGIS, QGIS Processing toolbox
A Dialogue With Bot
Posted: January 29, 2015 in This and That. Tags: bot, chat, chat-bot, scam, skype
Back in the days when ICQ was popular, spam bots added you to their contact lists dozens of times a day. But they were brute and ugly. At first they spewed their spam links in your face as soon as you added them; later they simply started to insert spam links in the body of the invitation. ICQ is long dead, and spam bots never bothered me in modern protocols… until some time ago.
How to Predict Where Will Next Disaster Strike?
Posted: January 19, 2015 in Spatial data, Sustainable Development, Uncategorised. Tags: climate change, disaster, environment, prediction, R, risk, science, spatial data
It is an amusing coincidence that another MOOC I took this week (Geospatial Intelligence & the Geospatial revolution) mentioned [natural] disasters. For the other course, see my recent Disasters: Myth or the Reality post.
In Geospatial Intelligence they gave a weird assignment: one needs to mark the location on the world map where the next international natural disaster will occur O_o. This is not an easy task by any means, and the lecturer suggested using one’s ‘gut feeling’ if one’s knowledge is insufficient (I suppose it is close to impossible to find someone who can make such a prediction taking into account all the types of disasters). A link to the International Disasters Database was given, though, so I accepted the challenge (to make a data-driven prediction). To predict the exact location of the next disaster one would need a lot of data – far more than you can get out of that database – so my goal was to make a prediction at the country level. (BTW, the graphs from my post about disasters seem to be based on the data from this database – I saw one of them on that site.)
I passed a query to the database and saved the output to process it with R. The dataframe looks like this:
Example of disasters dataset
So how to predict the country with the next disaster? I came up with the idea of calculating the cumulative average occurrence of disasters per country per year and plotting it to see the trends. If I just calculated the average occurrence of disasters per country over the whole observation period, I would have trouble choosing between countries with close numbers. Besides, the overall average can be misleading by itself: it can be high because of a large number of disasters at the beginning of the 20th century despite a relatively low number in the 21st.
The formula for the calculation of the cumulative average for the given year that I used was:
Cumulative_Average = Total_Occurrences / ( Given_Year – (Starting_Year – 1) ) ,
where: Total_Occurrences is the sum of occurrences of disasters for given country in time interval between the starting year and the given year (inclusive).
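The formula can be checked on a toy example. The sketch below is in Python rather than the R used for the actual analysis; the function name `cumulative_average` and the yearly counts are invented for illustration and are not taken from the database.

```python
def cumulative_average(occurrences_by_year, starting_year):
    """Return {year: cumulative average number of disasters per year}.

    occurrences_by_year maps a year to the number of disasters recorded
    in that year; years missing from the mapping count as zero.
    """
    averages = {}
    total_occurrences = 0
    for given_year in sorted(occurrences_by_year):
        total_occurrences += occurrences_by_year[given_year]
        # number of years from starting_year to given_year, inclusive
        years_elapsed = given_year - (starting_year - 1)
        averages[given_year] = total_occurrences / years_elapsed
    return averages


# Hypothetical counts for a single country (illustration only)
toy_counts = {1900: 2, 1902: 1, 1905: 3}
toy_result = cumulative_average(toy_counts, starting_year=1900)
print(toy_result)
```

Processing the years in ascending order matters here: the running total only makes sense if earlier years are added before later ones.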
Here is the plot I got for the short-list countries (plotting the results for all 180 countries from the dataset makes the plot unreadable):
Cumulative average number of disasters
It is clear that China and Indonesia are the two most likely candidates for the next disaster to strike, with China in the lead. I’m not ready to provide insight into the reasons for the increasing number of natural disasters in the countries in the plot (especially for Turkey and Iran). Maybe it is just that the events are documented more often now?… It should be investigated further.
The code
Here is the code to create the plot above. The ‘sqldf’ package was really helpful for separating the data for the short-list countries from the rest of the 180 countries.
    library(ggplot2)
    library(sqldf)
    library(grid)
    #library(gridExtra)

    # Load natural disasters data ---------------------------------------------
    dis <- read.csv("~/R/Disasters/Natural_disasters.csv")

    # Create data frame with average number of disasters per year -------------
    average_events <- data.frame(country = character(),
                                 year = numeric(),
                                 disasters_per_year = numeric(),
                                 stringsAsFactors = FALSE)

    countries <- unique(dis$country)
    starting_year <- min(dis$year) - 1 # we subtract 1 year to have numbers greater than 0 further on

    for (country in countries) {
      data <- dis[dis$country == country,] # we need data for one country at a time
      disasters_count <- 0
      years <- sort(unique(data$year)) # ascending order is required for a cumulative sum

      for (year in years) {
        total_years <- year - starting_year
        y_data <- data[data$year == year,]
        n_disasters <- sum(y_data$occurrence)
        disasters_count <- disasters_count + n_disasters
        average_disasters <- disasters_count / total_years
        row <- data.frame(country = country,
                          year = year,
                          disasters_per_year = average_disasters,
                          stringsAsFactors = FALSE)
        average_events <- rbind(average_events, row)
      }
    }

    # Plot data about average number of disasters per country per year --------
    # Data for 180 countries is hard to plot, let's filter the most affected.
    # Let's use SQL to query data: subset data for countries that had
    # 0.6 or more disasters per year in any year after 2000
    danger <- sqldf('SELECT * FROM average_events
                     WHERE country IN
                       (SELECT DISTINCT country FROM average_events
                        WHERE disasters_per_year >= 0.6 AND year > 2000)')

    p <- ggplot(danger, aes(x = year, y = disasters_per_year)) +
      geom_line(size = 1.2, aes(colour = country, linetype = country)) +
      labs(title = 'Cumulative average number of disasters per year',
           x = 'Year',
           y = 'Average number of disasters cumulative') +
      # guides() needs the aesthetic name for each legend it configures
      guides(colour = guide_legend(keywidth = 3, keyheight = 1),
             linetype = guide_legend(keywidth = 3, keyheight = 1)) +
      theme(axis.text.x = element_text(angle = 0, hjust = NULL),
            axis.title = element_text(face = 'bold', size = 14),
            title = element_text(face = 'bold', size = 16),
            legend.position = 'right',
            legend.title = element_blank(),
            legend.text = element_text(size = 12),
            legend.key.width = unit(1.5, 'cm'),
            legend.key.height = unit(1, 'cm'))

    plot(p)
Disasters: Myth or the Reality?
Posted: January 17, 2015 in Economics, Environmental Safety, Sustainable Development. Tags: climate change, damage, disaster, economics, education, environment, risk, sustainable development
I enrolled in a MOOC titled “Disasters and Ecosystems: Resilience in a Changing Climate”, which is organised by the UNEP (and other organisations… whose names I’m going to learn by heart because they have like 2 minutes of credits after each lecture). Not that I know nothing about disasters, risks or climate change (I’m a geographer and ecologist after all), but I was curious about the product made by the organisers of this class.
In the third video (the first one that is not an introduction) they teach us about disasters, the differences between a hazard and a disaster, and risks. Well… the things they said and the graphs they showed are what inspired the title of this post.
Terminology
Disaster. When they say “disaster” they mean a “natural disaster” that was enhanced by human [mismanagement].
Risk – potential losses due to disasters.
Hazard – a dangerous phenomenon, substance, human activity or condition that may cause loss of life, injury or other health impacts, property damage, loss of livelihoods and services, social and economic disruption, or environmental damage.
Vulnerability – the characteristics and circumstances of a community, system or asset that make it susceptible to the damaging effects of a hazard.
Fails
The risk
A Disaster with Disasters
Deaths dynamics
Damage dynamics
Conclusion
Do You Know What You Show at Your Map?
Posted: January 10, 2015 in Spatial data. Tags: cyberthreat, fail, map, R, spatial data
As access to GIS and mapping becomes easier every year, more people and companies create maps. Unfortunately, they often do not know what they are actually showing on their maps. This issue is mentioned over and over again.
Here is an example that I discovered recently: the Cyberthreat Real-Time Map by the Kaspersky antivirus company. Here is how it looks:
Among other info they show the Infection rank for each country… based on total threats detected… You may have already guessed what the fail is, but let me explain it anyway.
See, the №1 infected country is Russia, which is the home country of Kaspersky and where this antivirus is quite popular. So we can suspect that the rankings that are supposed to demonstrate the severity of virus activity merely demonstrate the number of Kaspersky software installations across the globe.
Let's test this hypothesis. I don’t have data on the number of Kaspersky installations per country, but it is safe to assume that this number is proportional to the population of the given country. Also, it is easier to get infection rankings for countries from the map than the number of threats detected. If I had total threats data per country, I would compare it to the population. Having only infection rankings, it is more rational to compare them to population rankings instead. So I picked 27 random countries and compared their infection and population rankings. The result is shown in the plot below:
Infection rank vs. Population rank
The linear model is fairly close to Infection rank = Population rank. It is clear that the phenomenon presented as an Infection rank just reflects the total number of software installations per country and not the severity of the ‘cyberthreat’. In order to get the actual Infection rank, the number of detected threats has to be normalised by the number of software installations.
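To make the normalisation step concrete, here is a minimal Python sketch. The country names, threat counts and installation counts are hypothetical (no such per-country data is available in this post), and the helper names `infection_rates` and `rank` are made up for illustration.

```python
def infection_rates(threats_detected, installations):
    """Detected threats per installation for countries with known installations."""
    return {
        country: threats_detected[country] / installations[country]
        for country in threats_detected
        if installations.get(country)  # skip missing or zero installation counts
    }


def rank(values):
    """Rank countries from 1 (largest value) downwards."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {country: position for position, country in enumerate(ordered, start=1)}


# Hypothetical numbers, invented for illustration only
threats = {"A": 9000, "B": 3000, "C": 500}
installs = {"A": 90000, "B": 10000, "C": 1000}

raw_rank = rank(threats)                                    # what the map appears to show
normalised_rank = rank(infection_rates(threats, installs))  # threats per installation

print(raw_rank)
print(normalised_rank)
```

In this toy data the country with the most detections ends up last once the counts are divided by the number of installations, which is exactly the kind of reversal a raw-count ranking can hide.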
How to Get Accurate Measurements in QGIS
Posted: December 14, 2014 in GIS. Tags: measurements, projection, QGIS
This post is mainly my answer to the corresponding question.
The main rule for accurate measurements is to define the correct CRS for your layers and project. This is a somewhat wide theme to talk about, so I won’t 😉
Let's just discuss what you should do when the project CRS can’t easily be used for measurement (like Web Mercator) and you want to use the Measure tool to calculate distance or area. Go to Settings -> Project Properties, enable on-the-fly transformation in the CRS tab, and in the General tab choose an ellipsoid for distance calculations in the Measure tool menu. See picture:
When you start measuring with the Measure tool, move your cursor over the Measure window – information about the measurement settings will pop up, just like this:
Now you can make your measurements and be sure that they are accurate. NOTE that you should always check the Measure tool settings as shown in the picture above, because Ellipsoid settings for the Measure tool are not always stored or displayed in Project settings correctly (QGIS 2.6.1).
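To get a feeling for how much a planar measurement in Web Mercator can be off, here is a rough Python sketch. It is not what QGIS does internally: it compares a straight-line distance in projected Web Mercator coordinates with a great-circle (haversine) distance on a sphere, which is only an approximation of a true ellipsoidal calculation. The function names and the test points are my own choices for illustration.

```python
import math

WEB_MERCATOR_RADIUS = 6378137.0  # metres; sphere radius used by Web Mercator (EPSG:3857)
MEAN_EARTH_RADIUS = 6371008.8    # metres; mean Earth radius for the spherical estimate


def to_web_mercator(lon_deg, lat_deg):
    """Project geographic coordinates (degrees) to Web Mercator metres."""
    x = WEB_MERCATOR_RADIUS * math.radians(lon_deg)
    y = WEB_MERCATOR_RADIUS * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def planar_distance(p1, p2):
    """Straight-line distance between two (lon, lat) points in projected metres."""
    x1, y1 = to_web_mercator(*p1)
    x2, y2 = to_web_mercator(*p2)
    return math.hypot(x2 - x1, y2 - y1)


def great_circle_distance(p1, p2):
    """Haversine distance in metres between two (lon, lat) points on a sphere."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p1, *p2))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * MEAN_EARTH_RADIUS * math.asin(math.sqrt(h))


# Two points one degree of longitude apart at 60°N
point_a, point_b = (0.0, 60.0), (1.0, 60.0)
ratio = planar_distance(point_a, point_b) / great_circle_distance(point_a, point_b)
print(round(ratio, 2))
```

At 60°N the planar distance comes out roughly twice the great-circle one, because the Mercator scale factor grows as 1/cos(latitude); near the equator the two are almost equal.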
A Flawed Research on a TSP Algorithm
Posted: November 30, 2014 in GIS. Tags: algorithm, fraud, optimal route, science, Travelling Salesman Problem, TSP
The Travelling Salesman Problem (TSP) is a well-known computational challenge. A lot of algorithms have been developed to solve it or its special cases.
I came across an article by Fang Liu, ‘A dual population parallel ant colony optimization algorithm for solving the travelling salesman problem’. In this article he proposed a modification of the Ant Colony System algorithm for solving TSP and presented results obtained by his algorithm. In the table with results all looked fine – his algorithm was able to provide very good solutions for the TSP instances from TSPLIB (which is the common testing ground for TSP algorithms).
So the researcher presented good results… it seems. But then he decided to show the best routes his algorithm was able to find and annotated them with the corresponding route costs. Let's take a look at one of them. Here is his best route for the ‘att48’ instance from TSPLIB:
Route that claims to be optimal (but the cost is very wrong)
The optimal route for ‘att48’ and its cost are well known (this applies to all TSPLIB instances). Its cost is approximately 33523 (there are different approaches to rounding distances between points). So what we see in the picture above should be the optimal route (or extremely close to it). But, dear reader, do you think that you see an optimal route? Humans are able to provide very good solutions to TSP instances that consist of not too many points. I bet you can draw a far better route yourself. The route from this picture is 1, 8, 46, 33, 20, 17, 43, 27, 19, 37, 6, 30, 36, 28, 7, 18, 44, 31, 38, 9, 40, 15, 12, 11, 47, 21, 13, 25, 14, 23, 3, 22, 16, 41, 34, 2, 29, 5, 48, 39, 32, 24, 42, 10, 45, 35, 4, 26, 1 and its cost is 41052, which is a whopping 22% worse than optimal! The same story goes for another illustration in the article.
Here, take a look at the optimal route, whose cost really is 33523:
Actually optimal route for ‘att48’ with Cost = 33523
So what can we conclude? I do believe that the routes demonstrated in the article are the best routes found by the given algorithm, but the costs are made up, both for the routes and for the table of results in the article. I think that the author developed an algorithm that wasn’t able to find good solutions and provided a fraudulent table with made-up testing results. And clearly this article wasn’t reviewed by scientists with knowledge of the TSP area, because these plots are so obviously flawed that you can’t overlook it!
Usually no one shows plots of the routes they find with their algorithms. I wonder if there are more modern algorithms with made-up results?
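Checking a published route is easy once you have the instance coordinates. Below is a minimal Python sketch of such a check; it assumes plain Euclidean distances between points (rounding conventions differ, as noted above), and the three-point instance is hypothetical and serves only to exercise the functions. The att48 coordinates themselves are not reproduced here.

```python
import math


def tour_cost(coords, tour):
    """Length of a closed tour given as a list of 1-based node numbers.

    The tour is closed automatically; repeating the start node at the end
    (as in the route listed above) adds a zero-length leg and changes nothing.
    """
    legs = zip(tour, tour[1:] + tour[:1])
    return sum(
        math.hypot(coords[a - 1][0] - coords[b - 1][0],
                   coords[a - 1][1] - coords[b - 1][1])
        for a, b in legs
    )


def gap_percent(cost, optimal_cost):
    """How far a tour cost is above the optimal cost, in percent."""
    return 100.0 * (cost - optimal_cost) / optimal_cost


# Hypothetical three-point instance (a 3-4-5 right triangle)
toy_coords = [(0, 0), (3, 0), (3, 4)]
toy_cost = tour_cost(toy_coords, [1, 2, 3])

# Gap between the two costs quoted in this post
article_gap = gap_percent(41052, 33523)
print(toy_cost, round(article_gap, 1))
```

Feeding the att48 coordinates from TSPLIB and the route listed above into `tour_cost` is all it takes to compare a claimed cost with the real one.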
How to Create Delaunay Triangulation Graph from a .shp-file Using NetworkX
Posted: November 2, 2014 in GIS, Spatial data. Tags: Delauney, GIS, Graph, NetworkX, Python, science
Example of use:
    import networkx as nx
    import scipy.spatial
    import matplotlib.pyplot as plt

    path = '/directory/'
    f_path = path + 'filename.shp'
    G = nx.read_shp(f_path)

    GD = createTINgraph(G, show = True)
Code for the function:
    import math

    import networkx as nx
    import scipy.spatial
    import matplotlib.pyplot as plt

    # Updated for Python 3 and the graph.nodes / graph.edges API of NetworkX 2.x


    def createTINgraph(point_graph, show = False, calculate_distance = True):
        '''
        Creates a graph based on Delaunay triangulation

        @param point_graph: either a graph made by read_shp() or another
            NetworkX point graph (nodes are (X, Y) tuples)
        @param show: whether or not resulting graph should be shown, boolean
        @param calculate_distance: whether length of edges should be calculated
        @return - a graph made from a Delaunay triangulation

        @Copyright notice: this code is an improved (by Yury V. Ryabov, 2014,
        riabovvv@gmail.com) version of Tom's code taken from this discussion
        https://groups.google.com/forum/#!topic/networkx-discuss/D7fMmuzVBAw
        '''
        # nodes of the point graph are coordinate tuples; fix their order once
        original_nodes = list(point_graph.nodes(data = True))
        points = [XY for XY, attributes in original_nodes]

        TIN = scipy.spatial.Delaunay(points)
        edges = set()
        # for each Delaunay triangle
        for simplex in TIN.simplices:
            a, b, c = (int(v) for v in simplex)
            # for each edge of the triangle sort the vertices
            # (sorting avoids duplicated edges being added to the set)
            # and add to the edges set
            for edge in ((a, b), (a, c), (b, c)):
                edges.add(tuple(sorted(edge)))

        # make a graph based on the Delaunay triangulation edges
        graph = nx.Graph(list(edges))
        # make sure that points not used by any triangle are kept as nodes
        graph.add_nodes_from(range(len(points)))

        # add nodes attributes to the TIN graph from the original points
        for n, (XY, attributes) in enumerate(original_nodes):
            graph.nodes[n]['XY'] = XY  # X and Y tuple - coordinates of the original points
            graph.nodes[n].update(attributes)  # add other attributes

        # calculate Euclidean length of edges and write it as edges attribute
        if calculate_distance:
            for node_1, node_2 in graph.edges():
                x1, y1 = graph.nodes[node_1]['XY']
                x2, y2 = graph.nodes[node_2]['XY']
                dist = round(math.hypot(x2 - x1, y2 - y1), 2)
                graph.edges[node_1, node_2]['distance'] = dist

        # plot graph
        if show:
            pointIDXY = dict(enumerate(points))
            nx.draw(graph, pointIDXY)
            plt.show()

        return graph
Interactive Visualisation of the Profitable Amount of Waste to Dispose Illegally
Posted: October 1, 2014 in Economics, Environmental Safety, researches, Society, Sustainable Development. Tags: illegal dumping, illegal dumps and landfills, Leningrad region, R, science, St. Petersburg, waste
I quickly came up with an idea of creating an interactive plot for my simple model for assessing the profitable ratio between the volume of waste that could be illegally disposed of and the costs of illegal disposal [Ryabov Y. (2013) Rationale of mechanisms for the land protection from illegal dumping (an example from the St.-Petersburg and Leningrad region). Regional Researches. №1 (39), p. 49-56]. The conditions for profitable illegal dumping can be described as follows:
Here: k – the probability of being fined for illegal disposal of waste;
P – maximum fine for illegal disposal of waste (illegal dumping);
V – volume of waste to be [illegally] disposed by the waste owner;
E – costs of illegal disposal of waste per unit;
T – official tax for waste disposal per unit.
The conditions for profitable landfilling can be described as follows:
Here: V1 – total volume of waste that is supposed to be disposed at illegal landfill;
Tc – tax for disposal of waste at illegal landfill per unit;
P1 – maximum fine for illegal landfilling;
E1 – expenditures of the illegal landfill owner for disposal of waste per unit.
Let's plot the graphs (with some random numbers (except for fines) for a nice-looking representation) to get a clue of what it looks like.
Note that there is a footnote (this post provides nice examples of how to do it) with the values used for plotting – it is important to have this kind of indication if we want to create a series of plots.
Now I will show you the result and then will provide the code and some tips.
Playing with the plot
Tips and Tricks
Before I show you the code, I want to share my hard-earned knowledge about the nuances of the manipulate library. There are several ways to get a static plot like that using ggplot, but some of them will fail to be interactive with manipulate.
- All the data for the plot must be stored in one dataframe.
- All data for plots must be derived from the dataframe (avoid passing single variables to ggplot).
- Do not use geom_hline() for the horizontal line – generate values for this line and store them inside dataframe and draw as a regular graph.
- To create a footnote (to know exactly which parameters were used for the current graph) use arrangeGrob() function from the gridExtra library.
- Always use $ inside aes() settings to address columns of your dataframe if you want plots to be interactive.
The Code
    library(ggplot2)
    library(grid)
    library(gridExtra)
    library(manipulate)
    library(scales)
    library(reshape2)

    ## Ta --- official tax for waste utilisation per tonne or cubic metre.
    ## k --- probability of getting fined for illegal dumping the waste owner (0
The Most Hilarious Rendering “Bug” I’ve Ever Seen
Posted: September 13, 2014 in GIS, This and That. Tags: QGIS
Can you guess the projection that I used to display the world map in the picture below (the EPSG code is left there because I doubt anyone would recall it anyway)?
It is the World Bonne projection in QGIS (the original shp’s CRS is EPSG:4326). I already had some fun with Bonne in this post. The world got torn apart at scale 1:75 000 000. At larger scales the map renders normally:
Fortunately, this issue can be easily resolved by disabling “feature simplification by default for newly added layers” under Settings -> Options -> Rendering.