
From the AI perspective: many AI research centers (big tech, small tech, universities, ...) are downloading YT for training data, and there are plenty of public datasets based on YT. Especially if you are a celebrity, you are almost certainly in such a dataset. There is a push for training data transparency (especially from the over-regulated EU), meaning any model should clearly state what its training data was (down to the particular videos).

Next, GDPR (personal data protection) comes into play. As the person in the content, you may share personal data in your videos. It is not only your credit card number, address, phone, or political preferences... :) but also your face and voice (biometric information). In theory, you may ask the author of the model to remove you from the training data (this is your GDPR right) -- and if the model is transparent, you can find out whether you are in it (see the sketch below). In an ideal world, they would comply. GDPR is a bit complex, so there are ways to lose this right, but only under very specific conditions.

Finally, it also depends on the type of model. If it is a classification model (speech recognition, object detection in videos), you as a training sample are on the safe side; it will just make the subtitles of your voice a bit more precise (which you may actually want). But what about a model for biometric ID? If you become a person of interest to someone, a model pretrained on tons of your YT data will work a bit better (they will need your data to train the final detector for your person anyway, but why make it easier by handing your data over for pretraining too). And finally, generative models: do you want AI to generate faces and voices similar to yours?
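On the transparency point: here is a minimal Python sketch of what checking a published training-data manifest for your own videos could look like. Everything here is an assumption -- no such standard manifest exists today, and the CSV format, the `video_id` column, and the file name are all made up for illustration.

```python
import csv

# Placeholder: IDs of your own YT videos.
MY_VIDEO_IDS = {"dQw4w9WgXcQ", "abc123def45"}

def videos_in_training_set(manifest_path: str) -> set[str]:
    """Return which of MY_VIDEO_IDS appear in a (hypothetical)
    training-data manifest: a CSV with a 'video_id' column."""
    found = set()
    with open(manifest_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["video_id"] in MY_VIDEO_IDS:
                found.add(row["video_id"])
    return found

if __name__ == "__main__":
    hits = videos_in_training_set("training_manifest.csv")
    print(f"{len(hits)} of my videos were used for training: {hits}")
```

If models were truly transparent, a check like this would be all you'd need before filing a GDPR removal request.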
Regarding the opt-in now: I expect you are giving your consent to Google so that they can comply with EC regulations (I'm not sure how it works in the US). They may sell the data to others (I'm not sure whether that aligns with the YT license) -- I mean, they can sell the transparent, consented data, and it has value! I'm not sure how this transfers to open data and open models. There may be a regulation "war" in the future: the EC could ban US models for non-transparency and eventually fine big tech over it. We saw/see EC vs MS, Apple, ... (since the EU cannot win the tech race due to its stupid regulations, it will profit from fines for noncompliance with those same stupid regulations).
I guess that in the future, the opt-in will expand to cover the types of models you agree may be trained on your data, and also whom you allow to do so (Google, big tech, for-profit, non-profit, ...). And they will also start to pay you. Ideally, when I train a model, I'd pay a fee to everyone in the "training data".
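Purely as a sketch of what such a finer-grained opt-in could look like (nothing like this exists on YT today; all field names and values are my invention):

```python
from dataclasses import dataclass, field

@dataclass
class TrainingConsent:
    """Hypothetical per-channel consent record: which model types may be
    trained on this channel's videos, by whom, and for what fee."""
    channel_id: str
    allowed_model_types: set[str] = field(default_factory=set)  # e.g. "classification", "biometric_id", "generative"
    allowed_trainers: set[str] = field(default_factory=set)     # e.g. "google", "big_tech", "non_profit"
    fee_per_video_usd: float = 0.0  # fee owed per video used in training

    def permits(self, model_type: str, trainer: str) -> bool:
        return (model_type in self.allowed_model_types
                and trainer in self.allowed_trainers)

# Example: allow only non-profits to train classification models, for a fee.
consent = TrainingConsent(
    channel_id="UC_example",
    allowed_model_types={"classification"},
    allowed_trainers={"non_profit"},
    fee_per_video_usd=0.05,
)
print(consent.permits("generative", "google"))          # False
print(consent.permits("classification", "non_profit"))  # True
```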
So... not opting in now creates pressure to be paid in the future. Data is the oil of today -- why give it to Google for free and let them earn even more on it now?
(Sorry for the long post. I'm in the field, and this is a complex topic.)
Appreciate the thoughtful response, I learned a lot!
So... not opting in now creates pressure to be paid in the future. Data is the oil of today -- why give it to Google for free and let them earn even more on it now?
Yeah I think this will probably be my approach here