First, I'd keep that part of the code closed-source: treat it as a black-box extension to the open-source parts. The goal for the extension is fast inference, using traffic metadata to predict a probability-of-bot score.
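As a rough sketch of what that inference boundary could look like (the feature names and the logistic-regression model here are my own assumptions, not a spec):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical traffic-metadata features; the real ones live inside the closed-source extension.
FEATURES = ["requests_per_minute", "avg_inter_request_ms", "distinct_paths", "header_anomaly_score"]

# Toy rows standing in for the curated training data mentioned further down.
X_train = np.array([
    [300.0,   40.0,  5.0, 0.90],  # bot-like: very fast, few paths, odd headers
    [250.0,   55.0,  3.0, 0.80],
    [ 12.0, 4000.0, 20.0, 0.10],  # human-like: slow, varied browsing, normal headers
    [  8.0, 6500.0, 15.0, 0.05],
])
y_train = np.array([1, 1, 0, 0])  # 1 = bot, 0 = human

model = LogisticRegression().fit(X_train, y_train)

def predict_bot_probability(metadata: dict) -> float:
    """Fast path: one metadata dict in, one probability-of-bot out."""
    row = np.array([[metadata[f] for f in FEATURES]])
    return float(model.predict_proba(row)[0, 1])

print(predict_bot_probability({
    "requests_per_minute": 280.0,
    "avg_inter_request_ms": 45.0,
    "distinct_paths": 4.0,
    "header_anomaly_score": 0.85,
}))
```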
Second, I'd run a few models in parallel, all competing on bot-detection quality.
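A minimal sketch of that competition, assuming every candidate exposes the same scoring interface; the "models" here are stand-in callables, not real detectors:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in scorers; in practice each would wrap a separately trained model.
def heuristic_model(meta: dict) -> float:
    return 0.9 if meta["requests_per_minute"] > 100 else 0.1

def rate_model(meta: dict) -> float:
    return min(meta["requests_per_minute"] / 300.0, 1.0)

def header_model(meta: dict) -> float:
    return meta["header_anomaly_score"]

CANDIDATES = {"heuristic": heuristic_model, "rate": rate_model, "header": header_model}

def score_all(meta: dict) -> dict:
    """Score the same request with every candidate model in parallel."""
    with ThreadPoolExecutor(max_workers=len(CANDIDATES)) as pool:
        futures = {name: pool.submit(fn, meta) for name, fn in CANDIDATES.items()}
        return {name: fut.result() for name, fut in futures.items()}

scores = score_all({"requests_per_minute": 280.0, "header_anomaly_score": 0.85})
print(scores)  # per-model probabilities; compare against labels to pick a champion
```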
Third, I'd explore joining data sets to improve the models.
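For example (table and column names are purely illustrative), joining a vendor reputation feed and human-curated labels onto first-party traffic logs before training:

```python
import pandas as pd

# First-party traffic metadata keyed by client IP.
traffic = pd.DataFrame({
    "client_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "requests_per_minute": [280.0, 12.0, 95.0],
})

# Vendor reputation feed for the same keys.
vendor = pd.DataFrame({
    "client_ip": ["10.0.0.1", "10.0.0.3"],
    "vendor_bot_score": [0.92, 0.40],
})

# Human-curated labels from reviewed sessions.
labels = pd.DataFrame({
    "client_ip": ["10.0.0.1", "10.0.0.2"],
    "is_bot": [1, 0],
})

training = (
    traffic
    .merge(vendor, on="client_ip", how="left")   # enrich with the vendor signal
    .merge(labels, on="client_ip", how="inner")  # keep only labeled rows
)
print(training)
```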
Tech-wise - it's a mix of Python, Kubeflow, and maybe Snowflake (see the pipeline sketch below).
Data-wise - it's probably a mix of vendor data + traffic-quality signals + a training set composed of human-curated real-user activity.
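To make the stack concrete, here's a minimal Kubeflow Pipelines (kfp v2) sketch of how the training flow could be wired up; the component bodies are placeholders, and the Snowflake-backed feature step is an assumption, not a spec:

```python
from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")
def extract_features(traffic_table: str) -> str:
    # Placeholder: would pull traffic metadata (e.g. from Snowflake) and build features.
    return f"features derived from {traffic_table}"

@dsl.component(base_image="python:3.11")
def train_bot_model(features: str) -> str:
    # Placeholder: would train and register the probability-of-bot model.
    return f"model trained on {features}"

@dsl.pipeline(name="bot-detection-training")
def bot_detection_pipeline(traffic_table: str = "TRAFFIC_METADATA"):
    features = extract_features(traffic_table=traffic_table)
    train_bot_model(features=features.output)

if __name__ == "__main__":
    # Compile to a YAML spec that a Kubeflow Pipelines instance can run.
    compiler.Compiler().compile(bot_detection_pipeline, "bot_detection_pipeline.yaml")
```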