Machine learning in earth sciences

Applications of machine learning (ML) in earth sciences include geological mapping, gas leakage detection, and geological feature identification. Machine learning is a subdiscipline of artificial intelligence aimed at developing programs that can classify, cluster, identify, and analyze vast and complex data sets without being explicitly programmed to do so. Earth science is the study of the origin, evolution, and future of Earth. The Earth system can be subdivided into four major components: the solid earth, atmosphere, hydrosphere, and biosphere.

Significance

Complexity of earth science

Problems in earth science are often complex. Ecological data are commonly non-linear and involve higher-order interactions. Combined with missing data, this means traditional statistical methods may underperform, because they impose unrealistic assumptions such as linearity on the model. Several studies have found that machine learning outperforms traditional statistical models in earth science, for example in characterizing forest canopy structure, predicting climate-induced range shifts, and delineating geologic facies. Characterizing forest canopy structure enables scientists to study how vegetation responds to climate change. Predicting climate-induced range shifts enables policymakers to adopt suitable conservation methods to mitigate the consequences of climate change.

Inaccessible data

In earth sciences, some data are difficult to access or collect, so it is desirable to infer them from readily available data using machine learning. Combining remote sensing with machine learning can provide an alternative that removes the need for some field mapping.

Human classification is also prone to systematic bias, which machine learning can help reduce. One example is the recency effect, in which classification tends to favour the most recently recalled classes. In one study's labelling task, expert ecologists commonly failed to classify a kind of dinoflagellate correctly when it occurred only rarely in the samples. Such systematic bias strongly degrades the classification accuracy of humans.
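The point under "Complexity of earth science" is that linear assumptions can fail on data with higher-order interactions. The pure-Python sketch below illustrates this on synthetic data that does not come from any of the studies mentioned. The response depends only on the interaction x1·x2, which a linear model with an intercept cannot represent. A k-nearest-neighbours regressor can approximate it. That regressor is an arbitrary, hypothetical choice of non-linear learner made for illustration, not the method used in those studies.

```python
import random

def fit_linear(xs, ys):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [(1.0, x1, x2) for x1, x2 in xs]  # intercept column of ones
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Solve a @ coef = b by Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):  # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, 3))) / a[i][i]
    return coef

def predict_linear(coef, x):
    return coef[0] + coef[1] * x[0] + coef[2] * x[1]

def predict_knn(train_x, train_y, x, k=5):
    """Average the targets of the k training points nearest to x (Euclidean distance)."""
    dist = lambda p: (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i]))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

rng = random.Random(0)

def make_data(n):
    xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    ys = [x1 * x2 for x1, x2 in xs]  # pure interaction, no linear component
    return xs, ys

train_x, train_y = make_data(400)
test_x, test_y = make_data(100)

coef = fit_linear(train_x, train_y)
linear_mse = mse([predict_linear(coef, x) for x in test_x], test_y)
knn_mse = mse([predict_knn(train_x, train_y, x) for x in test_x], test_y)
print(f"linear MSE: {linear_mse:.4f}, k-NN MSE: {knn_mse:.4f}")
```

On held-out points the linear model's error stays close to the variance of x1·x2, because its best fit is roughly a flat plane. The neighbour-averaging model tracks the saddle-shaped surface far more closely.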

Optimal machine learning algorithm

The extensive use of machine learning in various fields has led to a wide range of learning algorithms being applied. Choosing the optimal algorithm for a specific purpose can significantly boost accuracy, but accuracy is not the only criterion; interpretability and computational cost also matter. For example, although an SVM yielded the most accurate landslide susceptibility assessment in one study, its result cannot be rewritten as expert rules that explain how and why an area was assigned to a particular class. In contrast, decision trees are transparent and easily understood, so users can observe and correct any bias present in such models. If computational resources are limited, more computationally demanding methods such as deep neural networks are less preferred, even though they may outperform other algorithms, for example in soil classification.
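The transparency of decision trees can be shown with the smallest possible tree, a single-split "decision stump", whose fitted model is itself a readable expert rule. The sketch below is a hypothetical illustration. The features (slope angle, distance to a river), the six data points, and the helper names `fit_stump` and `as_rule` are all assumptions, not taken from the landslide study mentioned in the text.

```python
def fit_stump(samples, labels, feature_names):
    """Return (errors, feature, threshold, left_class, right_class) for the best single split."""
    best = None
    for f, name in enumerate(feature_names):
        values = sorted({s[f] for s in samples})
        # Candidate thresholds lie halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for s, y in zip(samples, labels) if s[f] <= t]
            right = [y for s, y in zip(samples, labels) if s[f] > t]
            # Majority class on each side; sorted() makes tie-breaking deterministic.
            left_cls = max(sorted(set(left)), key=left.count)
            right_cls = max(sorted(set(right)), key=right.count)
            errors = sum(y != left_cls for y in left) + sum(y != right_cls for y in right)
            if best is None or errors < best[0]:
                best = (errors, name, t, left_cls, right_cls)
    return best

def as_rule(stump):
    """Render the fitted stump as a human-readable IF/THEN rule."""
    _, name, t, left_cls, right_cls = stump
    return f"IF {name} <= {t} THEN {left_cls} ELSE {right_cls}"

# Hypothetical training cells: (slope in degrees, distance to river in metres).
samples = [(10, 500), (15, 100), (20, 300), (35, 200), (40, 600), (45, 50)]
labels = ["stable", "stable", "stable", "prone", "prone", "prone"]
stump = fit_stump(samples, labels, ["slope_deg", "river_dist_m"])
print(as_rule(stump))
```

With this toy data the fitted model is the rule `IF slope_deg <= 27.5 THEN stable ELSE prone`, which a user can read, check, and correct directly. A kernel SVM fitted to the same data offers no comparable one-line explanation.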

Usage

Mapping

Geological or lithological mapping and mineral prospectivity mapping

Geological or lithological mapping produces maps showing geological features and geological units. Mineral prospectivity mapping uses a variety of datasets, such as geological maps and aeromagnetic imagery, to produce maps specialized for mineral exploration. Geological, lithological, and mineral prospectivity mapping can be carried out by processing remotely sensed spectral imagery and geophysical data with ML techniques. Spectral imaging records many wavelength bands across the electromagnetic spectrum, whereas conventional imaging captures only three (red, green, and blue). Random forests and SVMs are among the algorithms commonly used with remotely sensed geophysical data; Simple Linear Iterative Clustering-Convolutional Neural Network (SLIC-CNN) is also used.

Landslide susceptibility mapping can highlight areas prone to landslides, which is useful for urban planning and disaster management. … according to the study requirements. As with other supervised ML models, a landslide susceptibility model requires separate training and testing datasets.

Rock fractures can be recognized automatically by machine learning through photogrammetric analysis, even in the presence of interfering objects such as vegetation. When training ML models to classify images, data augmentation is common practice to avoid overfitting and to increase the size and variability of the training dataset.

Carbon dioxide leakage from a geological sequestration site can be detected indirectly with the aid of remote sensing and an unsupervised clustering algorithm such as the Iterative Self-Organizing Data Analysis Technique (ISODATA). An increase in soil CO2 concentration causes a stress response in plants by inhibiting plant respiration, as oxygen is displaced by carbon dioxide.
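The unsupervised clustering step can be sketched in a few lines. This sketch assumes that each pixel is described by a single vegetation index, the Normalized Difference Red Edge Index, NDRE = (NIR − RedEdge) / (NIR + RedEdge). It also swaps in plain k-means as a simplified stand-in for ISODATA; ISODATA additionally splits and merges clusters. The reflectance values and the function names `ndre` and `kmeans_1d` are hypothetical.

```python
def ndre(nir, red_edge):
    """Normalized Difference Red Edge index: (NIR - RedEdge) / (NIR + RedEdge)."""
    return (nir - red_edge) / (nir + red_edge)

def kmeans_1d(values, k=2, iters=20):
    """Plain 1-D k-means (needs k >= 2); a simplified stand-in for ISODATA."""
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]  # evenly spread start
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # leave an empty cluster's centre where it is
                centres[c] = sum(members) / len(members)
    return centres, labels

# Hypothetical (NIR, red-edge) reflectances for six pixels.
pixels = [(0.50, 0.30), (0.52, 0.31), (0.49, 0.29),
          (0.40, 0.35), (0.41, 0.36), (0.39, 0.34)]
index = [ndre(nir, red_edge) for nir, red_edge in pixels]
centres, labels = kmeans_1d(index)
print([round(v, 3) for v in index], labels)
```

Because the centres start at the minimum and maximum index values, cluster 0 is the low-NDRE group. In this toy example that group (the last three pixels) stands in for potentially stressed vegetation worth checking for leakage.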
The vegetation stress signal caused by leaking CO2 can be detected with the Normalized Difference Red Edge Index (NDRE).

The Rock Mass Rating (RMR) system is a widely adopted geomechanical rock mass classification system that uses six input parameters. The amount of water inflow, representing the groundwater condition, is one of these inputs. Water inflow on rock tunnel faces was traditionally quantified by visual observation in the field, which is labour-intensive, time-consuming, and fraught with safety concerns. In a machine learning approach that classifies images of tunnel faces, the classification mostly follows the RMR system but merges the damp and wet states, as these are difficult to distinguish by visual inspection alone.

The cone penetration test (CPT) is carried out by pushing a metallic cone through the soil: the force required to push it at a constant rate is recorded as a quasi-continuous log.

Forecast and predictions

Earthquake early warning systems and forecasting

Earthquake warning systems are often vulnerable to local impulsive noise and can therefore issue false alerts. Earthquakes can be reproduced in laboratory settings to mimic real-world ones, and with the help of machine learning, patterns in acoustic signals that act as precursors to earthquakes can be identified. One study demonstrated the prediction of the time remaining before failure from continuous acoustic time-series data recorded from a laboratory fault. The algorithm applied was a random forest, trained on a set of slip events, and it performed strongly in predicting the time to failure. It identified acoustic signals that predict failure, one of which had not been identified before. Although such a laboratory earthquake is not as complex as a natural one, the work made progress that can guide future earthquake prediction research.

Streamflow discharge prediction

Real-time streamflow data are integral to decision making (e.g., evacuations, or regulating reservoir water levels during flooding).
Streamflow can be estimated from data provided by stream gauges, which measure the water level of a river. However, water and debris from flooding may damage stream gauges, leaving decision makers without essential real-time data. Because machine learning can infer missing data, it can predict streamflow from both historical stream gauge data and real-time data. Streamflow Hydrology Estimate using Machine Learning (SHEM) is one model designed for this purpose. To verify its accuracy, its predictions were compared with recorded data, and the accuracies were found to lie between 0.78 and 0.99.
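The text does not say which accuracy measure produced the 0.78 to 0.99 range. One widely used skill score for comparing predicted and recorded streamflow is the Nash-Sutcliffe efficiency (NSE). It equals 1 for a perfect match, 0 for a prediction no better than the mean of the observations, and becomes negative for worse predictions. The sketch below computes NSE under that assumption. The discharge values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of observed from its mean)."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))  # prediction error
    sst = sum((o - mean_obs) ** 2 for o in observed)  # spread of the observations
    if sst == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - sse / sst

# Hypothetical daily discharge (m^3/s): gauge record vs. model estimate.
observed = [10.0, 12.0, 15.0, 11.0]
simulated = [10.5, 11.5, 14.0, 11.5]
print(nash_sutcliffe(observed, simulated))  # 0.875
```

Here the squared errors sum to 1.75 and the squared deviations of the observations from their mean (12.0) sum to 14.0, giving 1 − 1.75/14 = 0.875.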

Challenge

Inadequate training data

Machine learning requires an adequate amount of training and validation data, but such data may not be available in sufficient quantity. In a study of automatic classification of geological structures, the model's weakness was its small training dataset, even though data augmentation had been used to increase the dataset size. Insufficient training data can lead to overfitting, which causes inaccuracies as the model learns noise and undesired details.

Limited by data input

Machine learning cannot carry out some tasks that a human does easily. For example, in the quantification of water inflow on rock tunnel faces from images for the Rock Mass Rating (RMR) system, the damp and wet states had to be merged because they are difficult to distinguish by visual inspection alone.

'White-box' approaches such as decision trees can reveal the details of the algorithm to users, whereas 'black-box' approaches do not make clear how their results are generated. Therefore, 'black-box' approaches are not suitable if one wants to investigate the relationships between inputs and outputs. However, 'black-box' algorithms usually perform better.
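Overfitting on a small training set can be illustrated with synthetic data: a linear trend plus Gaussian noise, with only 15 training points. A 1-nearest-neighbour regressor memorises every training point, including its noise, so its training error is exactly zero while its error on new data is not. Averaging over 5 neighbours smooths some of that noise away. The data, the choice of k-nearest neighbours, and the helper names are hypothetical; this is not the model from the geological-structure study.

```python
import random

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def knn_mse(train_x, train_y, xs, ys, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    preds = [knn_predict(train_x, train_y, x, k) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

rng = random.Random(42)

def sample(n):
    """Synthetic data: a linear trend plus Gaussian noise (sigma = 1)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [0.5 * x + rng.gauss(0, 1.0) for x in xs]
    return xs, ys

train_x, train_y = sample(15)  # deliberately small training set
test_x, test_y = sample(200)

results = {}
for k in (1, 5):
    train_err = knn_mse(train_x, train_y, train_x, train_y, k)
    test_err = knn_mse(train_x, train_y, test_x, test_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The gap between a perfect training score and a worse test score is the signature of overfitting. This is why a separate validation or test set is needed even when the training data have been augmented.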

Source: Wikipedia
