Detecting food safety risks and human trafficking using interpretable machine learning methods
Summary
Black box machine learning methods have allowed researchers to design accurate models using large amounts of data at the cost of interpretability. Model interpretability not only improves user buy-in, but in many cases provides users with important information. Especially in the case of the classification problems addressed in this thesis, the ideal model should not only provide accurate predictions, but should also inform users of how features affect the results. My research goal is to solve real-world problems and compare how different classification models affect the outcomes and interpretability. To this end, this thesis is divided into two parts: food safety risk analysis and human trafficking detection. The first half analyzes the characteristics of supermarket suppliers in China that indicate a high risk of food safety violations. Contrary to expectations, supply chain dispersion, internal inspections, and quality certification systems are not found to be predictive of food safety risk in our data. The second half focuses on identifying human trafficking, specifically sex trafficking, advertisements hidden amongst online classified escort service advertisements. We propose a novel but interpretable keyword detection and modeling pipeline that is more accurate and actionable than current neural network approaches. The algorithms and applications presented in this thesis succeed in providing users with not just classifications but also the characteristics that indicate food safety risk and human trafficking ads.