Presented paper

IMWA2019 Students work

Advanced Genetic Sequencing Methods for Climate Data Analysis: a Novel Approach to Expanding and Understanding Climate Datasets

Swanson, Sophie (1); Stokes, Christian (2)
1: Global Resource Engineering, United States of America; 2: Pixel Gear, United States of America

Most mines use synthetic precipitation data to predict future mine water scenarios. These synthetic data sets are based on probability distributions and data analysis of previous climate data. However, the standard method for generating synthetic data can average out peaks, and can fail to capture long streaks of wet or dry periods that often exist in real life. Additionally, many sites lack a sufficient monitoring period to accurately make a synthetic data series.

The result can often be a water balance or hydraulic design that is based on insufficient data or that underpredicts the most important types of climate conditions: extreme rain over long periods and extreme drought. As the climate becomes more variable due to global climate change, rigorous analysis of historic data over different time intervals is increasingly important for accurately managing mine water.

Mathematical techniques developed for genetic engineering (specifically transcriptome assembly) have been found to be very successful at processing synthetic climate data. This is because genetic sequencing must create contiguous data sets by expanding large, repetitive, non-contiguous data using the trends and dynamics of the non-contiguous data set.

Transcriptome assembly converts the input dataset into base four (to match base pairs) and looks for unique sequences of 25 numbers. The occurrences of each unique sequence are counted, and the most prevalent becomes the basis for the assembly. The sequence is then extended by finding the most prevalent sequence that matches 24 contiguous numbers of the current sequence, which extends it by one number (N+1). This is done with rapid computer iterations and mass-sampling of data clusters.
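The counting and extension steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the threshold-based binning into four symbols, and the tie-breaking rule are all assumptions, since the abstract does not specify how values are mapped to base four or how ties are resolved.

```python
from collections import Counter

def to_base4(values, thresholds):
    """Map each value to a symbol 0-3 using three ascending thresholds.
    The binning scheme is an assumption made for illustration only."""
    return [sum(v > t for t in thresholds) for v in values]

def count_kmers(symbols, k):
    """Count every contiguous window of k symbols."""
    return Counter(tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1))

def assemble(symbols, k=25, length=100):
    """Greedy extension: seed with the most prevalent window of k symbols, then
    repeatedly append the last symbol of the most prevalent window whose first
    k-1 symbols match the current tail."""
    if k < 2:
        raise ValueError("k must be at least 2")
    kmers = count_kmers(symbols, k)
    if not kmers:
        return []
    # Ties are broken by the larger tuple so the result is deterministic (assumed rule).
    seed = max(kmers.items(), key=lambda kv: (kv[1], kv[0]))[0]
    out = list(seed)
    while len(out) < length:
        tail = tuple(out[-(k - 1):])
        candidates = [(count, km) for km, count in kmers.items() if km[:-1] == tail]
        if not candidates:
            break  # no observed window continues the current tail
        out.append(max(candidates)[1][-1])
    return out
```

For example, `assemble(to_base4(daily_mm, (0.1, 2.0, 10.0)), k=25, length=3650)` would attempt to build a ten-year daily symbol series from a shorter record, where `daily_mm` and the thresholds are hypothetical. A purely greedy rule like this one can settle into a repeating cycle, so a practical tool would likely need sampling or additional constraints.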

This method effectively expands small data sets into large ones while retaining the dynamics of the smaller set. In addition, the same technique can be used to process large data sets. In this case, instead of synthetically expanding limited data, the transcriptome assembly method can accurately identify useful and important multi-time-step sequences that occur frequently.
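As a rough sketch of the second use, the snippet below lists the multi-time-step sequences that recur in a long symbol series. The function name and the `min_count` cutoff are hypothetical, and the input is assumed to be a series already discretized into symbols.

```python
from collections import Counter

def frequent_sequences(symbols, k, min_count=2):
    """Return (sequence, count) pairs for every window of k symbols that
    occurs at least min_count times, most frequent first."""
    counts = Counter(tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1))
    recurring = [(seq, n) for seq, n in counts.items() if n >= min_count]
    # Sort by descending count, then by sequence, so the order is deterministic.
    return sorted(recurring, key=lambda item: (-item[1], item[0]))
```

Varying `k` corresponds to examining events of different durations, for instance 3-day versus 10-day patterns in a daily record.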

The applications are twofold: one can create a high-quality synthetic dataset when limited data are available, or one can accurately identify multiple-day events that can greatly impact the hydrology of a site. In the latter case, an example could include correctly identifying the impact of infrequent intensive rainfall on dry soils (a condition which greatly increases runoff).
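The dry-soil example can be illustrated with a simple scan of a daily record. This is a simplified stand-in for the sequence-based identification described above, not the authors' method; the function name and all thresholds (dry-day limit, spell length, intense-rain limit) are hypothetical values chosen only for illustration.

```python
def intense_rain_after_dry_spell(precip_mm, dry_days=5, dry_max_mm=0.1, intense_min_mm=25.0):
    """Return the indices of days with intense rainfall that immediately
    follow at least dry_days consecutive dry days."""
    hits = []
    dry_run = 0  # length of the dry spell ending on the previous day
    for day, depth in enumerate(precip_mm):
        if depth >= intense_min_mm and dry_run >= dry_days:
            hits.append(day)
        dry_run = dry_run + 1 if depth <= dry_max_mm else 0
    return hits
```

Days flagged this way could then be checked against the runoff assumptions used in the site water balance.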