They’re capitalizing on all the free labor being done on local models.
That wouldn’t even bother me so much if they weren’t trying to lock people into the “llama ecosystem” by restricting the use of its outputs to improving only other llama models.
I’m sorry for repeating myself. But didn’t Meta just stop disclosing the exact training dataset? Presumably because they’re using copyrighted data from the internet? Isn’t that hypocritical? IMHO we need laws for this, and/or companies need to stop disregarding copyright when training their own models and then claiming copyright once other people start doing the same thing.
Personally I don’t think copyright holders really have a leg to stand on as far as that goes. Simply having and using a copyrighted work isn’t a violation, and the work that is produced in the form of a trained neural network is the very definition of transformative. I also think Meta would have the same issue if they tried to use a copyright claim against someone using llama output to improve other non-llama models. That’s why they had to slip it into the terms of service.
I guess what you might see going forward is every book that’s published comes with a user agreement you agree to by opening the book… But that doesn’t sound practical in any sense.
What’s crazy to me is they are using tons of copyrighted data to train but put in a statement saying you can’t use Llama outputs to train other models.