First, I'd keep that part of the code closed-source: treat it as a black-box extension to the open-source parts. The goal for the extension is fast inference, using traffic metadata to predict a probability-of-bot score.
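As a rough sketch of what that inference boundary could look like (the feature names and the logistic-regression model here are my own assumptions, not a spec):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical traffic-metadata features; the real ones live inside the closed-source extension.
FEATURES = ["requests_per_minute", "avg_inter_request_ms", "distinct_paths", "header_anomaly_score"]

# Toy rows standing in for the curated training data mentioned further down.
X_train = np.array([
    [300.0,   40.0,  5.0, 0.90],  # bot-like: very fast, few paths, odd headers
    [250.0,   55.0,  3.0, 0.80],
    [ 12.0, 4000.0, 20.0, 0.10],  # human-like: slow, varied browsing, normal headers
    [  8.0, 6500.0, 15.0, 0.05],
])
y_train = np.array([1, 1, 0, 0])  # 1 = bot, 0 = human

model = LogisticRegression().fit(X_train, y_train)

def predict_bot_probability(metadata: dict) -> float:
    """Fast path: one metadata dict in, one probability-of-bot out."""
    row = np.array([[metadata[f] for f in FEATURES]])
    return float(model.predict_proba(row)[0, 1])

print(predict_bot_probability({
    "requests_per_minute": 280.0,
    "avg_inter_request_ms": 45.0,
    "distinct_paths": 4.0,
    "header_anomaly_score": 0.85,
}))
```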
Second, I'd run a few models in parallel, all competing on bot-detection quality.
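A minimal sketch of that competition, assuming every candidate exposes the same scoring interface; the "models" here are stand-in callables, not real detectors:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in scorers; in practice each would wrap a separately trained model.
def heuristic_model(meta: dict) -> float:
    return 0.9 if meta["requests_per_minute"] > 100 else 0.1

def rate_model(meta: dict) -> float:
    return min(meta["requests_per_minute"] / 300.0, 1.0)

def header_model(meta: dict) -> float:
    return meta["header_anomaly_score"]

CANDIDATES = {"heuristic": heuristic_model, "rate": rate_model, "header": header_model}

def score_all(meta: dict) -> dict:
    """Score the same request with every candidate model in parallel."""
    with ThreadPoolExecutor(max_workers=len(CANDIDATES)) as pool:
        futures = {name: pool.submit(fn, meta) for name, fn in CANDIDATES.items()}
        return {name: fut.result() for name, fut in futures.items()}

scores = score_all({"requests_per_minute": 280.0, "header_anomaly_score": 0.85})
print(scores)  # per-model probabilities; compare against labels to pick a champion
```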
Third, I'd explore joining data sets to improve the models.
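For example (table and column names are purely illustrative), joining a vendor reputation feed and human-curated labels onto first-party traffic logs before training:

```python
import pandas as pd

# First-party traffic metadata keyed by client IP.
traffic = pd.DataFrame({
    "client_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "requests_per_minute": [280.0, 12.0, 95.0],
})

# Vendor reputation feed for the same keys.
vendor = pd.DataFrame({
    "client_ip": ["10.0.0.1", "10.0.0.3"],
    "vendor_bot_score": [0.92, 0.40],
})

# Human-curated labels from reviewed sessions.
labels = pd.DataFrame({
    "client_ip": ["10.0.0.1", "10.0.0.2"],
    "is_bot": [1, 0],
})

training = (
    traffic
    .merge(vendor, on="client_ip", how="left")   # enrich with the vendor signal
    .merge(labels, on="client_ip", how="inner")  # keep only labeled rows
)
print(training)
```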
Tech-wise - it's a mix of Python, Kubeflow, and maybe Snowflake (see the pipeline sketch below).
Data-wise - it's probably a mix of vendor data + traffic-quality signals + a training set composed of human-curated real-user activity.
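To make the stack concrete, here's a minimal Kubeflow Pipelines (kfp v2) sketch of how the training flow could be wired up; the component bodies are placeholders, and the Snowflake-backed feature step is an assumption, not a spec:

```python
from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")
def extract_features(traffic_table: str) -> str:
    # Placeholder: would pull traffic metadata (e.g. from Snowflake) and build features.
    return f"features derived from {traffic_table}"

@dsl.component(base_image="python:3.11")
def train_bot_model(features: str) -> str:
    # Placeholder: would train and register the probability-of-bot model.
    return f"model trained on {features}"

@dsl.pipeline(name="bot-detection-training")
def bot_detection_pipeline(traffic_table: str = "TRAFFIC_METADATA"):
    features = extract_features(traffic_table=traffic_table)
    train_bot_model(features=features.output)

if __name__ == "__main__":
    # Compile to a YAML spec that a Kubeflow Pipelines instance can run.
    compiler.Compiler().compile(bot_detection_pipeline, "bot_detection_pipeline.yaml")
```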