Using Machine Learning Classification to Detect Simulated Increases of de Facto Reuse and Urban Stormwater Surges in Surface Water
Authors: Thompson, K., and Dickenson, E.
Water Research
Water quality events such as increases in stormwater or wastewater effluent in drinking water sources pose hazards to drinking water consumers. Stormwater and wastewater effluent enter Lake Mead, an important drinking water source in the southwestern United States, via the Las Vegas Wash. Previous studies have applied machine learning and online instruments to detect contamination in water distribution systems. However, alert systems at drinking water intakes would provide more time for corrective action. An array of online instruments measuring pH, conductivity, redox potential, turbidity, temperature, tryptophan-like fluorescence, UV absorbance (UVA254), TOC, and chlorophyll-a was fed raw water directly from Lake Mead. Wastewater effluent, dry weather Las Vegas Wash, and storm-impacted Las Vegas Wash samples were blended into the instrument inlets at known ratios to simulate three types of adverse water quality events. Data preprocessing was conducted to correct for diurnal patterns or instrument drift. Supervised machine learning was conducted using previously published models in R. Ninety-nine models were screened on the raw data. Eight high-performing models were evaluated in depth and optimized. Weighted k-Nearest Neighbors, Single C5.0 Ruleset, Mixture Discriminant Analysis, and an ensemble of these three models had accuracy over 97% when assigning test set data among three classes (Normal, Event, or Maintenance). The ensemble detected all event types at the earliest timepoint and had only one false positive that was not a lag error (a lag error being a false positive that consecutively follows a true positive). With the Maintenance class omitted, the AdaBoost model had over 99% test set accuracy and zero false positives that were not lag errors. Data preprocessing was beneficial, but the optimal methods were model-specific. All nine water quality variables were useful for most models, but UVA254 and turbidity were the most important.
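The distinction between lag errors and other false positives can be illustrated with a short sketch. This is not the authors' code: the study used published models in R, and the abstract does not state how the ensemble combined its three members. The Python below assumes simple majority voting, and it assumes that a run of false positives immediately following a true positive counts entirely as lag errors. The function names `majority_vote` and `count_false_positives` and the example label sequences are hypothetical.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-timepoint labels from several models by majority vote."""
    combined = []
    for votes in zip(*model_predictions):
        label, count = Counter(votes).most_common(1)[0]
        # Assumption: with no strict majority, fall back to the first model.
        combined.append(label if count > len(votes) / 2 else votes[0])
    return combined


def count_false_positives(truth, predicted, event="Event"):
    """Return (lag_errors, other_false_positives) for one time series.

    Assumption: a false positive is a lag error when it belongs to an
    unbroken run of Event predictions that began with a true positive.
    """
    lag_errors = 0
    other_false_positives = 0
    following_true_positive = False
    for actual, guess in zip(truth, predicted):
        if guess == event and actual == event:
            following_true_positive = True   # true positive starts a run
        elif guess == event:
            if following_true_positive:
                lag_errors += 1              # alarm lingers after the event
            else:
                other_false_positives += 1   # alarm with no preceding event
        else:
            following_true_positive = False  # run of Event predictions ended
    return lag_errors, other_false_positives


# Made-up labels for eight consecutive timepoints (N = Normal, E = Event).
N, E = "Normal", "Event"
truth = [N, N, E, E, N, N, N, N]
knn = [N, N, E, E, E, N, N, N]
c50 = [N, N, E, E, E, N, E, N]
mda = [N, E, E, E, N, N, E, N]

ensemble = majority_vote(knn, c50, mda)
lag, other = count_false_positives(truth, ensemble)
print(ensemble)
print("lag errors:", lag, "other false positives:", other)
```

In this made-up series the voted prediction stays at Event for one timepoint after the true event ends (a lag error) and raises one isolated alarm later (a false positive that is not a lag error).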
Citations
Thompson, K., and Dickenson, E. “Using Machine Learning Classification to Detect Simulated Increases of de Facto Reuse and Urban Stormwater Surges in Surface Water.” Water Research. 204:117556, October 1, 2021.