I realized that while Microsoft would probably release their LLaMA-13b based model (as of the time of this writing they still haven’t), they might not release the dataset. Therefore, I resolved to replicate their efforts, download the data myself, and train the model myself, so that OpenOrca can be released on other sizes of LLaMA as well as other foundational models such as Falcon, OpenLLaMA, RedPajama, MPT, and RWKV.
Was the dataset released?
It looks like it! :) https://huggingface.co/datasets/Open-Orca/OpenOrca
He re-released the one he took down under a different name.
https://erichartford.com/dolphin