As I understand it, getting more data is crucial to developing LLMs further (the openly accessible internet is huge, but the models still need more). I see the Reddit deal as problematic because it opens up opportunities to manipulate LLMs, for example by creating fake Reddit posts.