Similar to what I posted here: #1283953, but instead of using embeddings as my input features, I'm using bag of words (TF-IDF, to be precise).
Bag of words is definitely worse, because it can't capture semantic nuances like the tone or genre of the writing. It's also a lot worse at AI detection: applied raw, without any human filter, the bag-of-words model makes a lot of AI-generated posts look like good posts.
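For anyone who hasn't worked with TF-IDF, here is a minimal plain-Python sketch of what the features look like. The tokenizer, the smoothed IDF formula (the same form as scikit-learn's default) and the sample posts are illustrative assumptions, not necessarily what my pipeline uses. Each post becomes a weighted word count and nothing more, which is why tone and genre get lost.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, same form as scikit-learn's default (smooth_idf=True).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so long posts don't dominate just by length.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


# Hypothetical example posts.
posts = [
    "Bitcoin fees are rising again",
    "My garden tomatoes are rising fast",
    "Bitcoin node setup guide",
]
vecs = tfidf(posts)
```

Words that appear in many posts (like "bitcoin" here) get a lower weight than words unique to one post (like "fees"), but word order and style never enter the vector.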
The score is simply `log(predicted zaps) - log(actual zaps)`.
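A small sketch of the score, assuming the natural log and strictly positive zap counts (the log of zero is undefined, so a real pipeline would need some way to handle posts with no zaps). Since the score equals `log(predicted / actual)`, a positive value means the model expected more zaps than the post received, and a negative value means the post out-earned the prediction.

```python
import math


def score(predicted_zaps: float, actual_zaps: float) -> float:
    # log(predicted) - log(actual) == log(predicted / actual).
    # Positive: the model over-predicted; negative: the post beat the prediction.
    # Assumes natural log and both counts > 0.
    if predicted_zaps <= 0 or actual_zaps <= 0:
        raise ValueError("zap counts must be positive for a log score")
    return math.log(predicted_zaps) - math.log(actual_zaps)


print(score(200, 100))  # predicted twice the actual zaps -> ln(2), about 0.693
```

Working in log space means being off by a factor of 2 costs the same whether a post got 10 zaps or 10,000.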