East Anglia and Mease Crop Data for Building Land Use Models
============================================================

Creators
--------

| Paton, Lewis (Durham University, UK)
| Troffaes, Matthias (Durham University, UK)
| Boatman, Nigel (Food and Environment Research Agency, UK)
| Hussein, Mohamud (Food and Environment Research Agency, UK)
| Hart, Andy (Food and Environment Research Agency, UK)

Subject
-------

environment, geography, statistics

Description
-----------

Introduction
~~~~~~~~~~~~

This dataset contains:

1. Field-level crop data (year, field code, and crop code)
   from the Integrated Administration and Control System (IACS) database
   (see for instance
   http://ec.europa.eu/agriculture/direct-support/iacs/index_en.htm).
   This data was collected by the Rural Payments Agency under IACS,
   for the administration of subsidies under the
   Common Agricultural Policy.

2. Soil data provided by the National Soil Map of England and Wales
   (see http://www.landis.org.uk/data/natmap.cfm).

3. Rainfall data provided by the Met Office
   (see http://www.metoffice.gov.uk/climate/uk/stationdata/).
   We chose September rainfall as this generally coincides
   with the drilling period for winter wheat.

4. Historical nitrogen price data provided by Dairyco (see
   http://dairy.ahdb.org.uk/resources-library/market-information/farm-expenses/fertiliser-prices).

5. Profit margin and yield data
   provided by the John Nix Farm Management Pocketbook
   (see http://www.thepocketbook.biz/).

The data covers the years 1993-2004 inclusive.

The data focusses on just 2 of the 159 National Character Areas (NCAs)
in England,
namely the East Anglian Chalk NCA (83870 hectares, 4650 fields)
and the Mease/Sence Lowlands NCA (32353 hectares, 2094 fields).
Both regions are predominantly arable farmland,
making them ideal candidates for analysis.

The crop and soil data were merged using standard GIS software. For
simplicity, fields at soil boundaries (thus containing more than one
soil type) were omitted from the analysis.
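
The omission of boundary fields can be sketched as follows. This is a
minimal illustration, not the GIS procedure actually used: the function
name single_soil_fields and the input format (one pair of field code
and soil ID per field/soil intersection) are assumptions.

```python
from collections import defaultdict


def single_soil_fields(overlay_rows):
    """Map each field to its soil type, dropping fields with several soil types.

    overlay_rows: iterable of (field_code, soil_id) pairs, one per
    field/soil intersection (a hypothetical export from a GIS overlay).
    """
    soils_by_field = defaultdict(set)
    for field_code, soil_id in overlay_rows:
        soils_by_field[field_code].add(soil_id)
    # Keep only the fields that intersect exactly one soil type.
    return {
        field: next(iter(soils))
        for field, soils in soils_by_field.items()
        if len(soils) == 1
    }
```

A field that appears with two different soil IDs is dropped, while a
field that appears several times with the same soil ID is kept.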

Crop Data
~~~~~~~~~

The crop codes in the IACS database are very detailed:
the database contains over 160 unique crops.
For example, there are five different types of wheat,
depending on the breed or growing process.
This level of detail is unnecessary for our purposes,
and makes inference from a statistical model unmanageable.
Following Luo's 2010 internal report on land use modelling
at the Food and Environment Research Agency,
we therefore grouped similar crops together. For example:

========== ============ =======================
Crop group Abbreviation IACS Crop codes
========== ============ =======================
Wheat      WH           WH1, WH2, WH3, SP1, BU1
Barley     BA           BA1
Rapeseed   RA           RA1, RA6, RA7, RA8
Beans      BE           BE3, BE7, BE9, BE12
Peas       PE           PE1, PE5, PE9
Other      OT           any other codes
========== ============ =======================

All other crop types were grouped into a single category "Other".
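
The grouping in the table above can be expressed as a lookup. The
following is a minimal Python sketch rather than the code shipped with
the dataset: the names CROP_GROUPS, CODE_TO_GROUP and crop_group are
hypothetical, and only the codes listed in the example table are mapped.

```python
# Hypothetical lookup transcribed from the example table above;
# not part of the dataset's own code.
CROP_GROUPS = {
    "WH": ["WH1", "WH2", "WH3", "SP1", "BU1"],
    "BA": ["BA1"],
    "RA": ["RA1", "RA6", "RA7", "RA8"],
    "BE": ["BE3", "BE7", "BE9", "BE12"],
    "PE": ["PE1", "PE5", "PE9"],
}

# Invert the table: each IACS crop code points to its group abbreviation.
CODE_TO_GROUP = {
    code: group for group, codes in CROP_GROUPS.items() for code in codes
}


def crop_group(iacs_code):
    """Return the group abbreviation for an IACS crop code ("OT" if unlisted)."""
    return CODE_TO_GROUP.get(iacs_code.strip().upper(), "OT")
```

Any code that is not listed falls through to "OT", matching the
"Other" row of the table.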

The data was collected from administrative returns submitted by farmers
in order to receive subsidy payments.
There is some missing data.
The data was anonymized: the field codes have been replaced by random
integers in random order.
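
One way to carry out an anonymization of this kind is sketched below.
The exact procedure used for this dataset is not documented here, so
the function anonymize_field_codes, its seed parameter, and the choice
of the integers 1 to n are all assumptions made for illustration.

```python
import random


def anonymize_field_codes(field_codes, seed=None):
    """Map each distinct field code to a distinct integer in shuffled order.

    Illustrative only: the integers 1..n are assigned at random, so the
    new identifier carries no information about the original code.
    """
    rng = random.Random(seed)
    unique_codes = sorted(set(field_codes))
    new_ids = list(range(1, len(unique_codes) + 1))
    rng.shuffle(new_ids)
    return dict(zip(unique_codes, new_ids))
```

Applying the same mapping to the crop and soil tables keeps them
joinable on the anonymized field code.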

Soil data
~~~~~~~~~

Soil type is a significant driver of crop choice.
Different crops are more suited to certain soil types, 
due to nutrient requirements and drainage capabilities. 
Therefore, it is essential that we include soil type in our analysis. 

The national soilscapes survey and the
National Soil Map of England and Wales
provide a database of soil type in our regions of interest.
There are 27 soilscapes in England,
which describe the main properties of the soil,
and 16 of these are present in our NCAs.
As with crop type, the database has
a higher resolution than is needed,
or indeed feasible, for this study.
Some soilscapes are similar,
so we can group them together.
The drainage characteristics of the soil are a major
factor in crop choice, as they determine the water-holding capacity
of the soil, and hence the amount of water available to the crop.
We therefore grouped together soils with similar drainage
capabilities. Following expert opinion, we classified the
soilscapes into the following three categories:

* Light soils: free drainage
* Medium soils: slightly impeded drainage
* Heavy soils: impeded drainage

The full classification is as follows:

======= =================================================================================== ===========
ID      Soilscape                                                                           Class
======= =================================================================================== ===========
3       Shallow lime-rich soils over chalk or limestone                                     Light
5       Freely draining lime-rich loamy soils                                               Light
6       Freely draining slightly acid loamy soils                                           Light
7       Freely draining slightly acid but base-rich soils                                   Light
8       Slightly acid loamy and clayey soils with impeded drainage                          Medium
9       Lime-rich loamy and clayey soils with impeded drainage                              Medium
10      Freely draining slightly acid sandy soils                                           Light
11      Freely draining sandy Breckland soils                                               Light
13      Freely draining acid loamy soils over rock                                          Light
17      Slowly permeable seasonally wet acid loamy and clayey soils                         Heavy
18      Slowly permeable seasonally wet slightly acid but base-rich loamy and clayey soils  Heavy
20      Loamy and clayey floodplain soils with naturally high groundwater                   Heavy
22      Loamy soils with naturally high groundwater                                         Medium
23      Loamy and sandy soils with naturally high groundwater and a peaty surface           Medium
24      Restored soils mostly from quarry and opencast spoil                                Medium
27      Fen peat soils                                                                      Medium
======= =================================================================================== ===========

The Anglia region is dominated by light soils, 
whereas Mease/Sence is dominated by heavy soils.
One of the reasons these two particular NCAs were chosen was to 
give a good spread of all three soil types across both regions. 

We also provide soil type by anonymized field code,
which is sufficient to replicate our entire analysis.
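
The classification table above can be transcribed into a lookup from
soilscape ID to soil class. This is a minimal sketch: the names
SOILSCAPE_CLASS and soil_class are hypothetical and do not come from
the dataset's code.

```python
# Hypothetical lookup transcribed from the classification table above.
SOILSCAPE_CLASS = {
    3: "Light", 5: "Light", 6: "Light", 7: "Light",
    8: "Medium", 9: "Medium",
    10: "Light", 11: "Light", 13: "Light",
    17: "Heavy", 18: "Heavy", 20: "Heavy",
    22: "Medium", 23: "Medium", 24: "Medium", 27: "Medium",
}


def soil_class(soilscape_id):
    """Return "Light", "Medium" or "Heavy" for a soilscape ID in the table."""
    try:
        return SOILSCAPE_CLASS[int(soilscape_id)]
    except KeyError:
        # Only the 16 soilscapes present in the two NCAs are classified.
        raise ValueError(f"soilscape {soilscape_id} is not in the table") from None
```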

Profit Margin Data
~~~~~~~~~~~~~~~~~~

The data source we use for these margins is the
John Nix Farm Management Pocketbook,
a yearly series of forecasts of agricultural economic variables.
Each edition gives estimates of gross margin per hectare, for
each crop, for the following year.
We could instead have used actual observed selling prices for crops.
However, we want to model a farmer's decision-making process:
when farmers choose what crop to grow, they do not know what the
selling price will be in the coming year.
For this reason we use the predicted data.

Yield data
~~~~~~~~~~

Yield also affects the profit margin,
and depends on a number of factors,
including the skill level of the farmer, climatic factors,
and soil productivity.
The John Nix books reflect this by including
predicted profit margins for three different levels of yield
(low, medium and high),
which reflect, for each crop,
historic average levels of production.
However, we have no information as to what level
of production actually occurred in our data.
We assumed that the production level was
always medium, with one exception:
farmers sometimes plant wheat repeatedly,
which can reduce the yield
(due to the build-up of disease).
In this specific case, we assumed the production level was low.
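
The production level rule can be written as a small function. This is
a sketch under one assumption that the text does not settle: "planting
wheat repeatedly" is read here as wheat directly following wheat in the
previous year. The function name yield_level and the use of the group
abbreviation "WH" are illustrative.

```python
def yield_level(previous_crop, current_crop):
    """Assumed production level for the crop group grown this year.

    Medium in all cases, except wheat ("WH") grown directly after wheat,
    which is treated as low.
    """
    if previous_crop == "WH" and current_crop == "WH":
        return "low"
    return "medium"


# Example: production levels along one field's (made-up) rotation.
rotation = ["WH", "WH", "BA", "WH"]
levels = [yield_level(prev, cur) for prev, cur in zip([None] + rotation, rotation)]
```

In this example the second year is the only one assigned a low
production level.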

Nitrogen data
~~~~~~~~~~~~~

Variable costs also affect the profit margin. 
A large part of this component is fertiliser cost,
and nitrogen is the nutrient in 
fertilisers that is most directly related to crop yield. 
We used the mean yearly indexed nitrogen price provided by Dairyco.
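
A yearly mean of this kind can be computed as sketched below. The
sketch assumes that the source series has several observations per
year (for example monthly values), which is not stated above; the
function name yearly_mean and the input format are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def yearly_mean(observations):
    """Average an iterable of (year, price_index) observations by year."""
    by_year = defaultdict(list)
    for year, price_index in observations:
        by_year[year].append(price_index)
    return {year: mean(values) for year, values in sorted(by_year.items())}
```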

Rainfall data
~~~~~~~~~~~~~

Climate plays a role in determining a farmer's crop choice.
The Met Office provides
free data from 37 weather stations
around the United Kingdom,
and has many more stations which provide data at a cost.
These weather stations provide a number of climatic variables,
including monthly rainfall.
As the regions are small,
we used the weather station closest to each region.
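
Choosing the closest station can be sketched with a great-circle
distance. How distance was measured for this dataset is not stated, so
the haversine formula, the use of a single centre point per region,
and the function names below are assumptions. Station names and
coordinates must be supplied by the user; none are given here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest_station(region_centre, stations):
    """Return the name of the station nearest to region_centre.

    region_centre: (latitude, longitude) in degrees.
    stations: dict mapping station name to (latitude, longitude).
    """
    lat, lon = region_centre
    return min(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))
```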

Publication Year
~~~~~~~~~~~~~~~~

2015

Cross References
~~~~~~~~~~~~~~~~

Matthias C. M. Troffaes and Lewis Paton. Logistic regression on Markov
chains for crop rotation modelling. In F. Cozman, T. Denœux,
S. Destercke, and T. Seidenfeld, editors, ISIPTA'13: Proceedings of
the Eighth International Symposium on Imprecise Probability: Theories
and Applications, pages 329-336, Compiègne, France, July
2013. SIPTA.
http://www.sipta.org/isipta13/index.php?id=paper&paper=033.html

Lewis Paton, Matthias C. M. Troffaes, Nigel Boatman, Mohamud Hussein,
and Andy Hart. Multinomial logistic regression on Markov chains for
crop rotation modelling. In Anne Laurent, Oliver Strauss, Bernadette
Bouchon-Meunier, and Ronald R. Yager, editors, Proceedings of the 15th
International Conference IPMU 2014 (Information Processing and
Management of Uncertainty in Knowledge-Based Systems, 15-19 July 2014,
Montpellier, France), volume 444 of Communications in Computer and
Information Science, pages 476-485. Springer, 2014.
http://dx.doi.org/10.1007/978-3-319-08852-5_49

Lewis Paton, Matthias C. M. Troffaes, Nigel Boatman, Mohamud Hussein,
and Andy Hart. A robust Bayesian analysis of the impact of policy
decisions on crop rotations. In ISIPTA'15: Proceedings of the Ninth
International Symposium on Imprecise Probability: Theories and
Applications, Pescara, Italy, July 2015.
http://www.sipta.org/isipta15/

Lewis Paton. A robust Bayesian land use model for crop rotations. PhD
thesis, Durham University, 2015.

Data sets and source code
~~~~~~~~~~~~~~~~~~~~~~~~~

The zip file contains the following files:

| description.txt
| data/crop_anglia.csv
| data/crop_mease.csv
| data/nitrogen.csv
| data/profit.csv
| data/rain_anglia.csv
| data/rain_mease.csv
| data/soil_anglia.csv
| data/soil_mease.csv
| code/crop_data.py
| code/dirichlet_prior.py
| code/expectation.py
| code/LICENSE.txt
| code/logistic_prior.py
| code/main.py
| code/markov_chain.py
| code/shell.nix
| code/smooth.py
| code/test_data_crop.csv
| code/test_data_nitrogen.csv
| code/test_data_profit.csv
| code/test_data_rain.csv
| code/test_data_soil.csv
| code/test_num_extremes.py
| code/testplotprior.gnuplot
| code/test_smooth.py

The script main.py replicates the analysis of our ISIPTA'15 paper. The
shell.nix file is provided to quickly obtain all dependencies for
running the code.
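
The column layout of the CSV files is not documented above, so a
reasonable first step is to look at the first row of each file. The
sketch below assumes that this first row is a header row and that the
zip file has been extracted into the current directory; the function
names csv_header and list_headers are hypothetical.

```python
import csv
from pathlib import Path


def csv_header(path):
    """Return the first row of a CSV file (assumed here to be a header row)."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle), [])


def list_headers(data_dir="data"):
    """Map each CSV file name in data_dir to its first row."""
    return {p.name: csv_header(p) for p in sorted(Path(data_dir).glob("*.csv"))}
```

Calling list_headers() with the default argument inspects the files in
the data directory listed above.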
