pull down to refresh
0 sats \ 3 replies \ @optimism 14 Jul \ parent \ on: Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs AI
Ah! So then by introducing inconsistency into the model through refinement, the end result becomes systemically misaligned. Which is kind of like operant conditioning in behavioral psychology?
Scary analogy to Jason-Bourne-style conditioning, actually.
I have seen other things saying that spot training was like "labodimizing" the model and that you sacrifice competency in a specialized task for a loss in general performance. So it may be that those tasks representations where somehow correlated with each other so when you mess with one you hurt the other. You could optimize it to keep everything the same but then you need to train more and specify all the constraints so its not practical
reply
reply
I've said "we are the poor bastards who are forced to live through the learning process" before :)
reply