Stacker News Monthly: December 2025

SimpleStacker

Similar to what I posted here: https://stacker.news/items/1283953, but instead of using embeddings as my input features I'm using bag of words (`TF-IDF` to be precise).

Bag of words is definitely worse, because it can't capture semantic nuances like tone or genre of writing. It's also a lot worse at AI detection: applied raw, without any human filter, the bag-of-words model makes a lot of AI-generated posts look like good posts.
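To make the limitation concrete, here is a minimal sketch of TF-IDF in plain Python. It is not the pipeline used for this post: the function name `tfidf_vectors`, the whitespace tokenizer, the `tf * log(N / df)` weighting, and the sample sentences are all assumptions chosen for illustration. Libraries typically use smoothed variants of this weighting.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical posts: the first two use the same words in a different order.
docs = [
    "bitcoin fixes this",
    "this fixes bitcoin",
    "lightning fees are low",
]
vecs = tfidf_vectors(docs)
print(vecs[0] == vecs[1])  # word order is discarded by bag of words
```

Because the features are only per-word weights, two posts with the same vocabulary get the same vector regardless of how they are written, which is roughly why tone and genre are hard to pick up this way.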

The score is simply `log(predicted zaps) - log(actual zaps)`.
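As a rough sketch of how that score behaves, assuming both zap counts are positive (the post doesn't say how zero-zap posts are handled, so this version simply rejects them). The function name `zap_score` and the sample numbers are made up for illustration.

```python
import math

def zap_score(predicted_zaps: float, actual_zaps: float) -> float:
    """log(predicted zaps) - log(actual zaps).

    Positive when the model predicted more zaps than the post received,
    negative when it predicted fewer, zero when they match.
    """
    if predicted_zaps <= 0 or actual_zaps <= 0:
        # Assumption: no smoothing is applied, so log(0) is not allowed.
        raise ValueError("both zap counts must be positive to take a log")
    return math.log(predicted_zaps) - math.log(actual_zaps)

# Hypothetical posts: (label, predicted zaps, actual zaps)
posts = [
    ("post A", 500.0, 100.0),
    ("post B", 100.0, 100.0),
    ("post C", 50.0, 400.0),
]

# Highest score first: posts the model rates well above what they actually earned.
ranked = sorted(posts, key=lambda p: zap_score(p[1], p[2]), reverse=True)
for label, predicted, actual in ranked:
    print(label, round(zap_score(predicted, actual), 3))
```

Since it is a difference of logs, the score equals `log(predicted / actual)`, so it measures the ratio between prediction and reality rather than the absolute gap.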