Automated, real-time data collection technology is rapidly revolutionizing environmental monitoring, generating huge volumes of data that can be stored and viewed remotely without the need for frequent site visits. This capability provides the opportunity to increase the number of data points collected per dollar, or to decrease the dollars spent per data point, depending on the application. Whilst the advantages appear manifold, there is a need for caution: a range of pitfalls must be considered before every monitoring application is blindly telemetered and sent to the cloud. Here I present my five key traps for environmental scientists, and how we deal with them here at HydroTerra:
1. When repetition ain't replication
Automated, telemetered monitoring stations produce reams of data, especially during multi-year studies. This can be an intoxicating experience for us environmental scientists, especially since we have generally been taught during interminable statistics lectures that more data is always better. Unfortunately, this is where pseudo-replication can rear its ugly head. Budget constraints often make it impossible to install enough telemetry devices to provide statistically significant results, so the temptation exists to use repeated samples at the same location in place of multiple replicated sampling sites. This increases our confidence that the data collected is accurate for that individual location; however, it does not increase our confidence that the conditions expressed through the data are typical of the entire site (see the sketch after the list below). This is particularly important for quantitative studies where the environmental parameter can vary across geographic space, for example, dissolved oxygen in aquatic studies or soil moisture across a catchment. It may be less of a factor for more homogeneous parameters, such as water level or gas concentration (especially when the target of interest is presence or absence). At HydroTerra we deal with this issue in three ways:
- Develop a thorough monitoring specification before every study so that the nature of each parameter is consistent with the methodology.
- Do not recommend automated sampling for every application, particularly where manual sampling is more accurate, more cost-effective or safer (see below).
- Bring new, innovative technologies online to reduce the cost of telemetry so that continuous data can be collected across a larger number of sites.
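To make the pseudo-replication trap concrete, here is a minimal sketch with entirely hypothetical numbers. It contrasts the standard error you get from a thousand repeated readings at one station with the standard error across a handful of replicate stations; the tiny within-site figure describes precision at that one location only, not the catchment.

```python
# A minimal sketch (hypothetical numbers) showing why repeated readings at one
# station are not a substitute for replicate stations. Treating 1,000 logger
# readings from a single site as independent samples shrinks the standard
# error dramatically, but only for that location; the across-site spread
# tells the real story about the whole catchment.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical soil-moisture (%) means for five stations across a catchment
site_means = np.array([22.0, 31.0, 18.0, 27.0, 35.0])

# 1,000 automated readings from station 1 alone (within-site noise only)
station_1 = rng.normal(loc=site_means[0], scale=1.5, size=1000)

se_station_1 = station_1.std(ddof=1) / np.sqrt(len(station_1))
se_across_sites = site_means.std(ddof=1) / np.sqrt(len(site_means))

print(f"SE from 1,000 repeats at one station: {se_station_1:.3f}")   # deceptively small
print(f"SE across the five stations:          {se_across_sites:.3f}")  # the honest number
```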
2. Sometimes there is no substitute for manually collected data
Automated data collection has been around in one form or another since the mid-19th century; however, it is only since the advent of cellular communication that broad-scale roll-out of telemetry devices has been possible. The thought of collecting data from the comfort of your office is a tantalizing prospect, and anyone who has spent a cold, rainy day in the field will understand why. However, it can lead to decision bias, where methodological decisions are made based on the ability to telemeter the sensors rather than the needs of the project. For example, water quality studies investigating the effect of agricultural runoff often eschew measurement of nitrates in favour of turbidity, pH and DO, simply because of the complexity of providing in-situ sensors for the former. Yet a carefully planned traditional field sampling operation would accurately capture the temporal and spatial variation in this critical parameter. Therefore, it is always important to keep the objectives of the project front of mind.
3. Data integrity
Care must be taken to regularly validate data arriving from autonomous monitoring stations to ensure it continues to be an accurate representation of in-situ conditions. Manual sampling contributes its own fair share of error, including unrepresentative sampling, poor equipment calibration, transcription errors, and so on. However, it has distinct advantages: it ensures that any influential changes to the site itself are observed, and it provides a natural data validation step when measurements are entered into the record-keeping system. In comparison, automated measurements can continue long after significant calibration drift or major perturbations to the site have occurred, leading to erroneous data being analyzed or large volumes of data rendered useless. At HydroTerra we utilize real-time data validation models that automatically check for aberrant data as it arrives into our DataStream cloud platform, ensuring data quality and coverage are maximized.
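The DataStream models themselves are proprietary, but the sketch below shows the general shape of this kind of check: a simple range test against physically plausible bounds plus a rolling-window spike test. The class name, bounds and thresholds are illustrative assumptions, not HydroTerra's actual implementation.

```python
# A minimal, generic sketch of an automated validation step: flag readings
# that fall outside a plausible physical range, or that sit far from the
# recent trend. Thresholds and window size are illustrative assumptions.
from collections import deque

class RangeAndSpikeValidator:
    """Flag readings outside a plausible range or far from the recent trend."""

    def __init__(self, low, high, window=5, max_sigma=4.0):
        self.low, self.high = low, high      # physically plausible bounds
        self.recent = deque(maxlen=window)   # rolling window of accepted readings
        self.max_sigma = max_sigma           # spike threshold (standard deviations)

    def check(self, value):
        if not (self.low <= value <= self.high):
            return "out_of_range"
        if len(self.recent) == self.recent.maxlen:
            mean = sum(self.recent) / len(self.recent)
            var = sum((x - mean) ** 2 for x in self.recent) / (len(self.recent) - 1)
            sd = var ** 0.5
            if sd > 0 and abs(value - mean) > self.max_sigma * sd:
                return "possible_spike"     # suspect value is not added to the window
        self.recent.append(value)
        return "ok"

# Example: validating dissolved-oxygen readings (mg/L) as they arrive
validator = RangeAndSpikeValidator(low=0.0, high=20.0)
for reading in [8.1, 8.0, 7.9, 8.2, 8.1, 12.0]:
    print(reading, validator.check(reading))  # the jump to 12.0 is flagged as a spike
```

A real validation pipeline would also track sensor drift over calibration intervals and cross-check against neighbouring stations, but even a check this simple catches the worst aberrations before they pollute an analysis.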
4. Stats trap: non-stationary data
Unfortunately, we must discuss statistics again. Continuous, automated data collection provides analytical advantages (or complications, depending on how you look at it) because it generates time-series data rather than point-in-time or cross-sectional data. Environmental data is generally summarized as an average, on the assumption that the underlying distribution is normal with a fixed mean and variance, i.e. that the data is stationary. In reality, environmental data is often non-stationary: affected by stochastic or random events, its true average changes with time and conditions, and anything short of a long-term average is unrepresentative. An example is water level in stormwater drains, which can be affected by short- and long-term weather patterns, local random short-term events (e.g. burst water mains) and local long-term events (e.g. changes in hydrology). This behaviour is very difficult to quantify using irregular manual measurements, and virtually impossible when combined with statistics that assume stationary data. Automated, continuous data, on the other hand, gives us the tools to accurately describe both stationary and non-stationary data sets, allowing us to develop robust predictive models, for example estimating the likelihood of exceedance, rather than simple descriptive statistics. So where is the trap? Well, with extra data comes extra responsibility: we need to increase the complexity and power of our data analysis to match the complexity and power of our data sets.
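As a back-of-the-envelope illustration of the exceedance idea, the sketch below builds a synthetic year of hourly drain-level readings (a slow seasonal drift plus random storm pulses, standing in for real logger data) and estimates the empirical probability of exceeding a hypothetical overflow threshold. Every number here is an assumption for illustration only.

```python
# A sketch of the kind of analysis continuous data makes possible: an
# empirical exceedance estimate for stormwater drain level. The synthetic
# series is a stand-in for real logger data, purely to illustrate the idea.
import numpy as np

rng = np.random.default_rng(7)
hours = 24 * 365  # one year of hourly readings

t = np.arange(hours)
seasonal = 0.3 * np.sin(2 * np.pi * t / (24 * 365))  # slow seasonal swing (m)
baseline = 0.5 + seasonal                            # a non-stationary mean
storms = rng.random(hours) < 0.01                    # ~1% of hours see a storm pulse
level = baseline + storms * rng.exponential(0.8, hours) + rng.normal(0, 0.05, hours)

threshold = 1.5  # hypothetical overflow level (m)

# Empirical exceedance probability: fraction of hours above the threshold
p_exceed = (level > threshold).mean()
print(f"Estimated P(level > {threshold} m) = {p_exceed:.4f}")

# A rolling mean makes the non-stationarity visible: the 'average' drifts,
# so a single long-term mean would misrepresent any given month
window = 24 * 30
rolling_mean = np.convolve(level, np.ones(window) / window, mode="valid")
print(f"Rolling monthly mean ranges from {rolling_mean.min():.2f} "
      f"to {rolling_mean.max():.2f} m")
```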
5. Clouded by cloud options
The final trap is one that haunts us in the post-internet world: choice. As environmental scientists we have been flooded with innumerable communication protocols for sending data from the field to the cloud, and numerous platforms for collecting and analyzing that data in the cloud itself. For many years the choice for continuous monitoring was relatively simple: either collect the data locally in a datalogger, or send it from the field via a radio or cellular network. Now the options multiply, with new players arriving every year as part of the Internet of Things (IoT) revolution. Available communication protocols include Bluetooth, LoRa, Sigfox, Neul and Zigbee, to name but a few, with varying power consumption, certification, data transfer rates, transmission distances… and price. Which one to choose? This decision alone can stall the roll-out of monitoring programs before they even get started! The solution is either to review every protocol available in your country, or to partner with an organization that specializes in matching the right technology to the right project. At HydroTerra we take a technology-agnostic approach, maintaining a suite of technologies so that we can tailor the right solution to each application.
For more information on smart cloud data management options for your environmental monitoring feel free to contact Alex Gervis.