I realized that while Microsoft would probably release their LLaMA-13b-based model (as of this writing, they still haven’t), they might not release the dataset. Therefore, I resolved to replicate their efforts, download the data myself, and train the model myself, so that OpenOrca can be released on other sizes of LLaMA as well as other foundational models such as Falcon, OpenLLaMA, RedPajama, MPT, and RWKV.
I hope this is okay: I made a backup of the blog post and saved it to my website/file hosting site. Here is the backup.
I’ll remove/blank out this comment when/if I see the page come back online.
EDIT: Okay, so it looks like the OpenOrca project on Eric Hartford’s website has been rebranded as Dolphin. My understanding is that someone else is working on an OpenOrca, prompting the rebranding.
Was the dataset released?
It looks like it! :) https://huggingface.co/datasets/Open-Orca/OpenOrca
He re-released the one he took down, under a different name.