Uncertainty quantification in classification problems: A Bayesian approach for predicating the effects of further test sampling. : 23rd International Congress on Modelling and Simulation - Supporting Evidence-Based Decision Making: The Role of Modelling and Simulation, MODSIM 2019

Phillipson, J. and Blair, G.S. and Henrys, P. and Office, CSIRO; CUBIC; eWater; NSW Government (2019) Uncertainty quantification in classification problems: A Bayesian approach for predicating the effects of further test sampling. : 23rd International Congress on Modelling and Simulation - Supporting Evidence-Based Decision Making: The Role of Modelling and Simulation, MODSIM 2019. In: The 23rd International Congress on Modelling and Simulation (MODSIM2019), 2019-12-01 - 2019-12-06, National Convention Centre.

Full text not available from this repository.

Abstract

The use of machine learning techniques in classification problems has been shown to be useful in many applications. In particular, they have become increasingly popular in land cover mapping applications in the last decade. These maps often play an important role in environmental science applications, as they can act as inputs within wider modelling chains and help estimate how the overall prevalence of particular land cover types may be changing. As with any model, land cover maps built using machine learning techniques are likely to contain misclassifications and hence create a degree of uncertainty in the results derived from them. For policy makers, stakeholders and other users to have trust in such results, this uncertainty must be accounted for in a quantifiable and reliable manner. This is true even for highly accurate classifiers. However, the black-box nature of many machine learning techniques makes the forms of uncertainty quantification traditionally seen in process modelling almost impossible to apply in practice. Hence, one must often rely on independent test samples for uncertainty quantification when using machine learning techniques, as these do not rely on any assumptions about how a classifier is built. The issue with test samples, though, is that they can be expensive to obtain, even in situations where large data sets for building the classifier are relatively cheap. This is because test samples are subject to much stricter collection criteria, as they rely on formalised statistical inference methods to quantify uncertainty. In comparison, the goal of a classifier is to create a series of rules that separates classes well. Hence, there is much more flexibility in how we may collect samples for the purpose of training classifiers.
This means that in practice, one must collect test samples of sufficient size so that uncertainties can be reduced to satisfactory levels without relying on overly large (and therefore expensive) sample sizes. However, the task of determining a sufficient sample size is made more complex as one also needs to account for stratified sampling, the sensitivity of results as unknown quantities vary, and the stochastic variation in results that arises from sampling. In this paper, we demonstrate how a Bayesian approach to uncertainty quantification in these scenarios can handle such complexities when predicting the likely impacts that further sampling strategies will have on uncertainty. This in turn allows for a more sophisticated form of analysis when considering the trade-off between reducing uncertainty and the resources needed for larger test samples. The methods described in this paper are demonstrated in the context of an urban mapping problem. Here we predict the effectiveness of distributing an additional test sample across different areas based on the results of an initial test sample. In this example, we explore the standard frequentist methods and the proposed Bayesian approach under this task. With the frequentist approach, our predictions rely on assuming fixed points for unknown parameters, which can lead to significantly different results with no formalised way to distinguish between them. In contrast, a Bayesian approach enables us to combine these different results with formalised probability theory. The major practical advantage is that users can predict the effect of an additional test sample with only a single distribution whilst still accounting for multiple sources of uncertainty.
This is a fundamental first step when quantifying uncertainty for population-level estimates and opens up promising future work on the propagation of uncertainty in more complex model chains and on optimising the distribution of test samples. Copyright © 2019 The Modelling and Simulation Society of Australia and New Zealand Inc. All rights reserved.
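
The contrast drawn in the abstract between fixed-point frequentist predictions and a single Bayesian predictive distribution can be illustrated with a minimal Python sketch. It is not the paper's method: it assumes a single stratum, a binary correct/incorrect outcome per test point, a uniform Beta(1, 1) prior, and the posterior standard deviation as the measure of uncertainty. The counts (42 correct out of 50, with 100 further test points) and all function names are hypothetical.

```python
import random
from math import sqrt


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def plug_in_se(p_fixed, n_total):
    """Standard error of a proportion, assuming accuracy is fixed at p_fixed."""
    return sqrt(p_fixed * (1 - p_fixed) / n_total)


def predict_posterior_sd(k, n, n_extra, draws=2000, prior=(1.0, 1.0), seed=0):
    """Simulate the posterior sd of accuracy after n_extra further test points.

    k of the n initial test points were classified correctly. Each draw
    samples a plausible accuracy from the current posterior, simulates the
    extra test sample under it, and records the updated posterior sd.
    """
    rng = random.Random(seed)
    a, b = prior[0] + k, prior[1] + (n - k)  # posterior after the initial sample
    results = []
    for _ in range(draws):
        p = rng.betavariate(a, b)  # uncertainty about the true accuracy
        k_new = sum(rng.random() < p for _ in range(n_extra))  # sampling variation
        results.append(beta_sd(a + k_new, b + n_extra - k_new))
    return sorted(results)


# Hypothetical initial test sample: 42 correct out of 50; 100 further points planned.
k, n, n_extra = 42, 50, 100
current_sd = beta_sd(1 + k, 1 + n - k)

# Fixed-point predictions differ depending on the accuracy one assumes.
for p_fixed in (0.75, 0.84, 0.90):
    print(f"assumed accuracy {p_fixed:.2f}: predicted SE {plug_in_se(p_fixed, n + n_extra):.4f}")

# The Bayesian prediction is one distribution over the future posterior sd.
sds = predict_posterior_sd(k, n, n_extra)
low, mid, high = sds[int(0.05 * len(sds))], sds[len(sds) // 2], sds[int(0.95 * len(sds))]
print(f"current posterior sd: {current_sd:.4f}")
print(f"predicted posterior sd: median {mid:.4f}, 90% range {low:.4f} to {high:.4f}")
```

The fixed-point loop yields one answer per assumed accuracy with nothing to weigh them against each other, whereas the simulation averages over the accuracies that the initial sample makes plausible. Extending the sketch to stratified sampling would need one posterior per stratum and is left out here.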

Item Type:
Contribution to Conference (Paper)
Journal or Publication Title:
The 23rd International Congress on Modelling and Simulation (MODSIM2019) : National Convention Centre in Canberra
Additional Information:
Conference code: 160153 Export Date: 7 July 2020 Correspondence Address: Phillipson, J.; School of Computing and Communications, Lancaster University, United Kingdom; email: j.phillipson@lancaster.ac.uk
Subjects:
bayesian; land cover mapping; sampling strategies; uncertainty quantification; bayesian networks; classification (of information); decision making; economic and social effects; forecasting; learning algorithms; machine learning; mapping; sensitivity analysis; statistical tests; stoc
ID Code:
145438
Deposited By:
Deposited On:
10 Jun 2021 15:55
Refereed?:
Yes
Published?:
Published
Last Modified:
15 Jul 2024 08:43