GRASS Short Course -
Notes for Part 6: GRASS, R-stats and PostgreSQL

1) Installation and preparations

R-stats and extra packages installation We assume that you have a working installation of R-1.8 or later.

For the R/PostgreSQL connection we need two extra packages. Since 10/2003 there is a new Rdbi/PostgreSQL interface for R-stats. Download it from BioConductor (but not the old SourceForge version!):

Maybe soon the BioConductor version contains our fixes (should be Rdbi 1.0.2 or later)...

Interface installation (enter in shell terminal, appropriate version number):

R CMD INSTALL Rdbi_1.0.2.tar.gz
R CMD INSTALL RdbiPgSQL_1.0.2.tar.gz

Note: The RODBC interface in an alternative, but the data transfer is rather slow.

PostgreSQL installation

We assume that you have a working installation of PostgreSQL 7.3 or later. For GNU/Linux RPM packages are available, for Mac OSX packages from FINK.

2) Using PostgreSQL (PG)

The first step is to try if PostgreSQL is running - we start the 'psql' interactive terminal, a command-line tool to maintain and query PG databases:
psql template1
Welcome to psql 7.3.2, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

template1=> \q

#maybe only user "postgres" is available, so start like this:
psql -U postgres template1
Welcome to psql 7.3.2, the PostgreSQL interactive terminal.
[...]
\q
If you see a similar message, PostgreSQL is running well. PostgreSQL is running a server, several clients (like 'psql' or MapServer or GRASS or ...) may connect, even simultaneously.

Server does not run? This requires 'root' permissions:

su

#start postgresql daemon:
service postgresql start
exit

#To enable the PostgreSQL server to launch after boot:
chkconfig --list postgresql
chkconfig postgresql on
chkconfig --list postgresql
#... should be changed to 'on' in some levels now.
Now retry above command.

Server running? Yes (maybe it tells you that the user name does not exist).

We add a new user now:

#become 'root':
su

#become special user 'postgres'
su postgres

#create new PostgreSQL user (warning, we don't use a passowrd here!):
createuser -a -d pablo
#... CREATE USER should be printed

exit
exit

Now retry to connect (see above, 'psql ...').

Then we explore a bit more (note that the TAB key expands command and table names). Every query has to be followed with semicolon:

psql template1
[...]
template1=> SELECT * from pg_user;
[...]

#show all tables (called 'relation' in SQL speech):
\dt
No relations found.

#what time is it?
SELECT CURRENT_TIMESTAMP;
          timestamptz
-------------------------------
 2003-11-01 13:24:42.960403+00
(1 row)

#get some help for all backslash commands:
\?
[...]

#get some help for all commands:
\h
Available help:
  ABORT                     CREATE TABLE              EXECUTE
[...]

#print help for 'CREATE TABLE' command:
\h CREATE TABLE
[...]

#get out of PostgreSQL:
\q

Now we want to create an own database.
This requires 'root' permissions:

su

#stop postgresql daemon (never ! modify databases with PostgreSQL still running):
service postgresql stop

#just notes, we won't do it now:
# NOTE 1: new users are added in: /var/lib/pgsql/data/pg_hba.conf
#         it is recommended to use "md5" encryption when defining users
# NOTE 2: if network access is desired, activate "tcpip_socket = true" in
#         /var/lib/pgsql/data/postgresql.conf

#so we start postgresql daemon again:
service postgresql start
#-> should print "ok"

#to create a new database, we login as 'postgres' user:
su postgres

createdb grasscourse
exit

#exit from root
exit
#now we are normal user again.

#Check available databases (the new DB 'grasscourse' should be listed):
psql -l
The still empty database 'grasscourse' is ready.

Now we create a new table within this database:

psql grasscourse

#Note: column definitions are comma separated, no comma after last column,
#to make it more sophisticated, we add a constraint (Y, N) to 
# column 'enjoyed' to only accepts these two values for that column:

CREATE TABLE cities
(
 id           integer,
 name         varchar(50),
 lastvisit    date,
 enjoyed      varchar(1) check (enjoyed in ('Y','N'))
);
# -> CREATE TABLE should be printed

#list all tables:
\d

#get column info:
\d cities

#Insert data into the new table (use single quotes!):
#Date order: YY-mm-dd
INSERT INTO cities VALUES ( 1, 'Trento', '2003-11-01', 'Y');
# -> INSERT some_numbers should be printed

INSERT INTO cities VALUES ( 2, 'Milano', '2003-10-12', 'N');
INSERT INTO cities VALUES ( 3, 'Povo', '2003-11-25', 'Y');

#test for the constraint:
INSERT INTO cities VALUES ( 4, 'Bolzano', '2003-10-23', 'A');
# -> should fail: 'A' not accepted

#test for the date
INSERT INTO cities VALUES ( 4, 'Bolzano', '2003-10-33', 'Y');
# -> should fail: day 33 not accepted

#now we query the table:
SELECT * FROM cities;
 id |  name  | lastvisit  | enjoyed
----+--------+------------+---------
  1 | Trento | 2003-11-01 | Y
  2 | Milano | 2003-10-12 | N
  3 | Povo   | 2003-11-25 | Y
(3 rows)

#now we query the table, ordered by one colum:
SELECT * FROM cities order by lastvisit;
 id |  name  | lastvisit  | enjoyed
----+--------+------------+---------
  2 | Milano | 2003-10-12 | N
  1 | Trento | 2003-11-01 | Y
  3 | Povo   | 2003-11-25 | Y
(3 rows)

#now we query the table, but only two columns:
SELECT name,lastvisit FROM cities order by lastvisit;
  name  | lastvisit
--------+------------
 Milano | 2003-10-12
 Trento | 2003-11-01
 Povo   | 2003-11-25
(3 rows)

#leave the 'psql' terminal:
\q

3) Using R-stats

Starting R-stats is pretty straightforward:
R

#it will appear something like this:
R : Copyright 2003, The R Development Core Team
Version 1.8.0  (2003-10-08)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for a HTML browser interface to help.
Type 'q()' to quit R.

>
Now you have reached the R prompt ('>'). Let's draw a plot and leave it:
plot(rnorm(123))
q()
Save workspace image? [y/n/c]: n

Since you are able to start and leave R, we can try a sample session:

R

#get help in terminal window:
?help
#use q key to leave the help, cursor and space to browse the text.

#open help in browser:
help.start()

1+1
# [1] 2

#NOTE: if you get lost while entering a command, CTRL-C is your friend to
#get back to the prompt.

x <- 1+1
x
# [1] 2

#make a sequence of 12 numbers:
seq(1,12)

#assign result to a variable:
x <- seq(1,12)

#print variable contents:
x
# [1]  1  2  3  4  5  6  7  8  9 10 11 12

#what's this variable:
str(x)

#simple statistics of this variable:
summary(x)

#see command help (in browser, if still open):
?summary

#make some categorical data (called "factors" in R speech):
rgb <- c("red", "green", "blue")
rgb
rgb[2]

#have some more:
infra <- c("nir", "mir", "fir")
thermal <- c("tir", NA, NA)

#for convenience we can handle variable composites in a "data frame":
landsat5 <- data.frame(rgb, infra, thermal)
landsat5
str(landsat5)


#Now we look at some sample maps:
# Maunga Whau (Mt Eden) is one of about 50 volcanos in the Auckland
# volcanic field.  This data set gives topographic information for
# Maunga Whau on a 10m by 10m grid.
data(volcano)
str(volcano)
#... is a matrix of numerics

plot(volcano)
#...looks strange!

#but a nice map:
filled.contour(volcano, color = terrain.colors, asp = 1)
title(main = "Maunga Whau volcano: filled contour map")

Up to now we used only base functions and data. There are many extensions, called R libraries which provide enormous functionality. One is the "GRASS" library to link R and GRASS:

#load GRASS library (GRASS/R interface):
library(GRASS)

#there are sample data built-in, e.g. river Maas soil pollution data:
?utm.maas

#many commands in R provide an example. You can just run it:
example(utm.maas)

#get out of this R session:
q()
Save workspace image? [y/n/c]: n

Next task is to use the Rdbi interface to connect R and PostgreSQL:

R

library(Rdbi)
conn <- dbConnect(PgSQL(), host="localhost", dbname="grasscourse")

#list available tables:
dbListTables(conn)

Reading data (the table we created above) from PostgreSQL into R:

res <- dbSendQuery(conn, "select * from cities")
cities <- dbGetResult(res)
str(cities)
`data.frame':   3 obs. of  4 variables:
 $ id       : int  1 1 1
 $ name     : chr  "Trento" "Milano" "Povo"
 $ lastvisit: chr  "11-01-2003" "10-12-2003" "11-25-2003"
 $ enjoyed  : chr  "Y" "N" "Y"

#disconnect:
dbDisconnect(conn)

#print the downloaded data frame:
#     date: mm-dd-YYYY
cities
  id   name  lastvisit enjoyed
1  1 Trento 11-01-2003       Y
2  2 Milano 10-12-2003       N
3  3   Povo 11-25-2003       Y

#make the character date a real date:
tmp <- strptime(cities$lastvisit, "%m-%d-%Y")
tmp
[1] "2003-11-01" "2003-10-12" "2003-11-25"
#    YYYY-mm-dd is new date format, so identical to PostgreSQL order

#update in the data frame:
#cities$lastvisit <- strptime(cities$lastvisit, "%m-%d-%Y")

So far, so nice. Next step is to learn writing data from R to PostgreSQL (e.g. you results): For illustration we use some meteo data, a time series of Nottingham average air temperatures at Nottingham Castle in degrees F for 20 years (note that the MASS library is part of the VR bundle):

library(MASS)
data(nottem)

#let's look at the data (note: Fahrenheit, not Celsius!)):
str(nottem)
summary(nottem)
nottem
plot(nottem)

#connect to PostgreSQL:
library(Rdbi)
conn <- dbConnect(PgSQL(), host="localhost", dbname="grasscourse")

#write data to PG database as new table "nottingtemp":
dbWriteTable(conn, nottem, "nottingtemp")
dbDisconnect(conn)
Now run (in another shell) the 'psql' terminal to see the new table written from R:
psql grasscourse
\d
SELECT * from nottingtemp;
\q

GRASS and R

To use GRASS within R, you have to run R from within the GRASS shell. First start GRASS (Spearfish location):
grass57

#reset region:
g.region -dP

#within GRASS, start R:
R
[...]

library(GRASS)

#load the GRASS environment into R:
G <- gmeta()

# see current GRASS location settings:
summary.grassmeta(G)

# print list of available raster maps:
system("g.list rast")

#NOTE: You always need to specify "G" for the next commands!
#
#load the 'elevation.dem' raster map into R:
elev <- rast.get(G,"elevation.dem")
str(elev)

#look at the map:
plot(G, elev$elevation.dem)
plot(G, elev$elevation.dem, col=terrain.colors(20))

#draw elevation histogram:
library(MASS)
truehist(elev$elevation.dem, nbins=200)

#Is that normally distributed?
qqnorm(elev$elevation.dem)
qqline(elev$elevation.dem)
#-> no, deviates from qqline

#delete the object from R memory:
rm(elev)


#Now we import two raster maps into a data frame:
#cat: T (true) or F (false) enables to copy over category labels rather than
#numeric values:
spearfish <- rast.get(G, c("elevation.dem","vegcover"),cat=(c(F,T)))
str(spearfish)
summary(spearfish$elevation.dem)
summary(spearfish$vegcover.f)

#we can also plot the categorical 'vegcover' map, converting to numeric
#on-the-fly:
plot(G, unclass(spearfish$vegcover))

#get basic statistics on the distribution of vegetation related to 
#elevation:
tapply(spearfish$elevation.dem, spearfish$vegcover, summary)

#make a boxplot:
boxplot(tapply(spearfish$elevation.dem, spearfish$vegcover, summary))

#vice-verse (warning: long calculation...):
tapply(spearfish$vegcover, spearfish$elevation.dem, summary)

Simple regression exercise

For this exercise, we start GRASS with Spearfish location. You need the LANDSAT-TM7 surface temperature map ("temp_celsius_12Jul2000") from lecture 4. We want to study the correlation between temperature and elevation.
grass57

#set resolution/region to Celsius temperature map from lecture 4:
g.region rast=temp_celsius_12Jul2000 -p

#within GRASS, start R:
R
library(GRASS)
G <- gmeta()

#NOTE: You always need to specify "G" for the next commands!
#
#load the 'elevation.dem' raster map into R:
elev <- rast.get(G,"elevation.dem")
str(elev)

#load the 'temp_celsius_12Jul2000' raster map into R:
celsiusmap <- rast.get(G,"temp_celsius_12Jul2000")
str(celsiusmap)
#Note that R changes underscores to dots

#now we extract sample points. First get 200 coordinate pairs:
str(G)

#East:
E.random <- sample(G$xseq,200)
#North:
N.random <- sample(G$yseq,200)
#merge to a data frame:
randompoints <- data.frame(E.random,N.random)
str(randompoints)

#nice, but complicated. Easier to fetch the cell indexes:
randompoints <- sample(G$Ncells,200)

#now we query the maps at the randomly selected cell numbers:
elevsamples <- elev$elevation.dem[randompoints]
hist(elevsamples)
celsiussamples <- celsiusmap$temp.celsius.12Jul2000[randompoints]
hist(celsiussamples)
#...the sample data are ready.

#look at the plot:
plot(celsiussamples,elevsamples)

#Linear regression analysis (Linear Model):
lm(elevsamples ~ celsiussamples)

summary(lm(elevsamples ~ celsiussamples))
#For a related discussion, compare Dalgaard 2002, p.97ff

#plot in one graph data and regression line (note parameter order!):
plot(celsiussamples,elevsamples)
abline(lm(elevsamples ~ celsiussamples))

#The result is: temperature decreasing with height, but the linear 
#correlations is poor (as expected).

(Useless) Exercise: Time series of meteo data simulation

#simulate 4 time series for 4 stations, 500 days:
sim1 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
sim2 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
sim3 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
sim4 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))

#station heights:
height1 <- seq(122,122,l=500)
height2 <- seq(332,332,l=500)
height3 <- seq(1543,1543,l=500)
height4 <- seq(534,534,l=500)

#generate 500 days:
ISOdate(2002, 7, 11) - ISOdate(2001, 2, 26)
days <- seq(ISOdate(2001, 2, 26),ISOdate(2002, 7, 10), by="d")
days

#we need a special table structure:
tmpsim <- print(c(sim1,sim2,sim3,sim4))
tmpdate <- print(c(days,days,days,days))
tmpheight <- print(c(height1,height2,height3,height4))

#put all in one data frame:
meteo <- data.frame(tmpheight,tmpdate,tmpsim)
names(meteo) <- c("height","date","var")
rm(tmpheight,tmpdate,tmpsim)
str(meteo)

library(lattice)
xyplot(var ~ date  | height, data=meteo, type="l")


© 2003 Markus Neteler (neteler AT itc.it)
Back Course HOME
Last change: $Date: 2004-11-25 10:46:31 +0100 (Thu, 25 Nov 2004) $