So the premise I'm reading here is that collecting anonymous data is fine, but collecting personal information is bad because the data can get leaked.
Is it an admission that Apple doesn't collect personal information but happily collects and sells anonymous data?
Anonymous data is easy to de-anonymize and/or use maliciously in the way I mentioned above. A few years ago a data scientist explained to me how a company like Facebook, for example, can build an anonymous profile about you from anonymous data, and even under GDPR and other European privacy laws that data is not considered personal information. If you then delete your account and create another, they can very quickly match your behaviour to that anonymous model, and you'll wonder why Facebook already seems to know you so well.
It's one thing to tell a friend or a random person on the street that you've just come from X shopping center; it's another to tell an agent who tracks people for a living and fishes for information in order to make money from it -- even if you don't know the person and they ask no personal questions. Though I can imagine people doing it on the street too, given an incentive like cupcakes.
Online, we don't get to know how our data is used, but worse still, we don't get to see what data is being shared.
I guess we still aren't ready to have full disclosure and full transparency.
It really depends on which data and which applications from that company you're talking about. In general, I don't believe either of those two companies actually resells data (especially after the Cambridge Analytica story). There are also many techniques that prevent de-anonymization.
For example, for machine learning on device, both companies use "federated learning" (sometimes called a "federated model"): a "small" model is trained on the device itself, and the device then sends information back to the servers to update the large models too. (A "model" here is the data structure in machine learning that holds the learned information.)
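To make that concrete, here's a toy sketch of the federated loop with the "model" shrunk down to a single number (everything here is made up for illustration; real systems exchange gradients or weight deltas, not single floats):

    import random

    GLOBAL_MODEL = 0.0  # the server's "large" model, reduced to one number

    def train_on_device(global_model: float, local_data: list[float]) -> float:
        # Each device fits its "small" model to its own data and returns
        # only the update -- never the raw data itself.
        local_model = sum(local_data) / len(local_data)
        return local_model - global_model

    # 1,000 devices, each with 20 private data points centred on 0.5.
    updates = [
        train_on_device(GLOBAL_MODEL, [random.gauss(0.5, 0.1) for _ in range(20)])
        for _ in range(1000)
    ]
    GLOBAL_MODEL += sum(updates) / len(updates)  # server folds updates in
    print(GLOBAL_MODEL)  # ~0.5, learned without the server seeing any raw data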
The trick is that before anything is sent, each device perturbs the data with random noise generated locally, so what the server receives from any single device looks like random garbage. But if many devices send their noisy updates, the server can do statistical analysis across all of them, the noise averages out, and it can update the "large" model.
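And a minimal sketch of the masking idea, using the additive-noise variant (closest to local differential privacy; the noise scale and function names are my own illustration, not any vendor's actual scheme):

    import random

    def noisy_update(true_value: float, noise_scale: float = 10.0) -> float:
        # Each device drowns its real statistic in locally generated noise,
        # so any single upload looks like random garbage to the server.
        return true_value + random.gauss(0.0, noise_scale)

    def server_aggregate(updates: list[float]) -> float:
        # Averaging many uploads cancels the zero-mean noise and recovers
        # the population statistic.
        return sum(updates) / len(updates)

    updates = [noisy_update(0.42) for _ in range(100_000)]
    print(server_aggregate(updates))  # ~0.42, give or take a few hundredths

No individual upload is meaningful on its own, yet across 100,000 devices the server recovers the aggregate. That's the statistical analysis mentioned above.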
This type of data is not possible to de-anonymize, but as I said above, this is specific to the newer ways companies do machine learning. There are likely some outdated approaches still in place that would technically allow de-anonymization.