SciDB for Climate Studies: A Virtual Global Land Observatory

For all our successes using SciDB to solve the Big Data problems of commercial users, it is also very satisfying to learn that SciDB is helping, even in some small way, to better understand our world and how human beings are affecting it. The Institute for Geoinformatics (IFGI) at the University of Münster (Germany) and Brazil’s National Institute for Space Research (INPE) are using SciDB to build a global land observatory from open-access remote sensing images (e.g., LANDSAT, MODIS, SENTINEL). Their goal is essentially to build a scientific data warehouse of global land change information derived from remote sensing data. They expect to work with 1-3 petabytes of images, weather data, automatic sensor readings and related data. The principal investigators are Gilberto Camara at INPE and Alber Sanchez and Edzer Pebesma at the University of Münster.

Professor Gilberto Camara shares some early results and details about the Global Land Observatory project, as well as some very interesting work the team did using SciDB to reproduce the data analysis behind an important but controversial paper published in Science. The original paper’s number crunching was cutting edge at the time for its scale and complexity. Notably, it took the Global Land Observatory team only seven lines of SciDB code to reproduce the computations in the Science paper!

From Professor Camara:

We have successfully set up a SciDB installation to work with large sets of Earth Observation data, and we have run a proof-of-concept experiment to show the potential of SciDB.

The experiment reproduces an important but controversial paper published in Science: “Amazon Forests Green-Up During 2005 Drought”, authored by Scott Saleska, Kamel Didan, Alfredo Huete and Humberto Rocha and published in 2007.

The paper of Saleska et al. compares the average EVI (enhanced vegetation index) of each pixel of MODIS satellite images during the most intense period of drought in the Amazon, which was JAS 2005 (July-August-September 2005), with the average EVI of the same pixel during JAS 2000 to 2006. The authors argue that, after filtering the pixels to remove the effects of aerosol and cloud cover, it is possible to detect statistically significant anomalies in the distribution of EVI.

The authors claim to have found a significant number of pixels in JAS 2005 with EVI greater than the EVI average for JAS 2000-2006. In other words, in JAS 2005 a lot of pixels were “greener” than expected. Hence the title of the paper: “Amazon Forests Green-Up During 2005 Drought”.
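The comparison described above can be illustrated with a small sketch. This is not the code used by the paper's authors or by our team. It is a minimal plain-Python illustration that assumes each pixel is represented as a mapping from year to its mean JAS EVI, with None marking a year whose value was rejected by the aerosol and cloud filter. The function names jas_anomaly and count_greener, the toy EVI values, and the simple "difference from the multi-year mean" definition of the anomaly are all assumptions made for illustration.

```python
from statistics import mean


def jas_anomaly(evi_by_year, target_year=2005):
    """Return EVI(target_year) minus the mean EVI over all valid years.

    evi_by_year maps year -> mean JAS EVI, or None if the pixel was
    filtered out for that year (hypothetical representation).
    Returns None when the target year itself has no valid value.
    """
    valid = {year: v for year, v in evi_by_year.items() if v is not None}
    if target_year not in valid:
        return None
    return valid[target_year] - mean(valid.values())


def count_greener(pixels, target_year=2005):
    """Count pixels whose target-year EVI exceeds their multi-year mean.

    Returns (number of greener pixels, number of usable pixels).
    """
    anomalies = [jas_anomaly(p, target_year) for p in pixels]
    usable = [a for a in anomalies if a is not None]
    greener = sum(1 for a in usable if a > 0)
    return greener, len(usable)


# Toy data: three pixels, JAS means for 2000-2006 (values are made up).
pixels = [
    {2000: 0.50, 2001: 0.50, 2002: 0.50, 2003: 0.50,
     2004: 0.50, 2005: 0.57, 2006: 0.50},   # greener in 2005
    {2000: 0.60, 2001: 0.60, 2002: 0.60, 2003: 0.60,
     2004: 0.60, 2005: 0.46, 2006: 0.60},   # browner in 2005
    {2000: 0.55, 2001: 0.55, 2002: 0.55, 2003: 0.55,
     2004: 0.55, 2005: None, 2006: 0.55},   # 2005 filtered out
]

greener, usable = count_greener(pixels)
print(f"{greener} of {usable} usable pixels were greener in JAS 2005")
```

In an array database the same idea is expressed as an aggregate over the time dimension followed by a per-cell comparison, which is why the whole analysis can be written compactly.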

The authors argue that: “These observations suggest that intact Amazon forests may be more resilient than many ecosystem models assume, at least in response to short-term climatic anomalies.”

The article generated a scientific debate in which its reproducibility was questioned. So our team wanted to answer the following question: what effort is required to reproduce the paper of Saleska et al. using SciDB?

Our team put all MOD09Q1 data covering Brazil into a single multidimensional array in SciDB. Each MOD09Q1 image has data from the visible and infrared bands with additional quality information, from which we can compute the EVI and filter to keep only valid pixels. Each image has 4,800 x 4,800 pixels, each with a spatial resolution of 250 meters x 250 meters. The full MOD09Q1 data set is available from NASA from 2000 to 2012 at 8-day temporal resolution. We gathered 12 years of data (544 time steps) for the 22 MODIS MOD09Q1 images covering Brazil. In total, we combined 11,968 images into a single SciDB array of 2.75×10^11 (275 billion) cells, each cell storing 3 values in double precision.
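The figures above can be checked with a few lines of arithmetic. The sketch below only multiplies the numbers quoted in the text. The raw size estimate assumes 8 bytes per double-precision value and ignores any compression, chunk overlap or storage overhead in SciDB, so it is a rough lower bound on the uncompressed payload and not a measured figure.

```python
# Figures quoted in the text
TILES = 22               # MODIS MOD09Q1 images (tiles) covering Brazil
TIME_STEPS = 544         # 8-day composites over 12 years
PIXELS_PER_SIDE = 4800   # each image is 4,800 x 4,800 pixels
VALUES_PER_CELL = 3      # values stored per cell

# Assumption: an IEEE 754 double occupies 8 bytes
BYTES_PER_DOUBLE = 8

images = TILES * TIME_STEPS                             # 11,968 images
cells = images * PIXELS_PER_SIDE ** 2                   # 275,742,720,000 cells
raw_bytes = cells * VALUES_PER_CELL * BYTES_PER_DOUBLE  # 6,617,825,280,000 bytes

print(f"images:    {images:,}")
print(f"cells:     {cells:,} (about {cells / 1e11:.2f} x 10^11)")
print(f"raw bytes: {raw_bytes:,} (about {raw_bytes / 1e12:.1f} TB uncompressed)")
```

Under these assumptions the array holds roughly 6.6 TB of raw values.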

After placing the MODIS data in a single array, we used SciDB’s array processing language to reproduce the paper of Saleska et al. It took us only seven lines of SciDB commands to process the MODIS data successfully, in 4.6 CPU hours, in a moderately sized environment. We used a single Ubuntu server with one 2.00 GHz Intel Xeon CPU with 24 cores and 128 GB of memory. The server was running five instances of SciDB.

We prepared a presentation describing what we did, and we are now working to develop new methods for extracting land cover and land use change information for Brazil for the period 2000-2012 from the data in SciDB. We are very pleased with SciDB and are convinced that it is currently the best technical solution for dealing with large Earth Observation data.

Paradigm4 is pleased to help with such an important project and thanks Professor Camara for sharing this SciDB use case.