Researchers at Berkeley have unveiled OpenLLaMA, a 7B-parameter open-source alternative to Meta’s LLaMA language model. The preview checkpoint was trained on 200 billion tokens of the RedPajama dataset.
The model’s weights are available in both PyTorch and JAX. With this release, models that were previously built on LLaMA’s non-commercial weights can be re-trained under a permissive license.
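For readers who want to try the PyTorch weights, loading them with Hugging Face transformers would look roughly like the sketch below. The repository id used here is an assumption for illustration; check the project’s GitHub page for the actual weight location and any tokenizer notes.

```python
# Minimal sketch: loading an OpenLLaMA checkpoint with Hugging Face transformers.
# The repo id below is assumed/illustrative, not confirmed by the release notes.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

repo_id = "openlm-research/open_llama_7b"  # hypothetical repo id

tokenizer = LlamaTokenizer.from_pretrained(repo_id)
model = LlamaForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float16)

prompt = "Q: What is the largest animal?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")
# Generate a short greedy completion; sampling parameters are left at defaults.
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```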
The full RedPajama dataset used for OpenLLaMA’s training comprises a staggering 1.2 trillion tokens, matching the size of LLaMA’s training data. The researchers trained OpenLLaMA on a cloud TPU-v4 pod, combining data parallelism with fully sharded data parallelism (FSDP, also known as ZeRO stage 3) to balance memory usage and throughput. The training run achieved a throughput of over 1,900 tokens per second per TPU-v4 chip.
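The idea behind FSDP/ZeRO-3 is that parameters, gradients, and optimizer state are sharded across workers, so each device holds only a fraction of the model. The authors trained in JAX on TPU pods; the sketch below is only a conceptual analog using PyTorch’s FSDP wrapper on a toy model, not the team’s training code.

```python
# Conceptual sketch of fully sharded data parallelism (FSDP / ZeRO stage 3).
# NOT the authors' training setup; it only illustrates the memory/throughput
# trade-off: each process stores a shard of the parameters and optimizer state.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")  # assumes launch via torchrun
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 4096),
    ).cuda()

    # Wrapping shards the parameters across all processes in the group.
    sharded_model = FSDP(model)
    optimizer = torch.optim.AdamW(sharded_model.parameters(), lr=1e-4)

    batch = torch.randn(8, 4096, device="cuda")
    loss = sharded_model(batch).pow(2).mean()
    loss.backward()  # gradients are reduced and re-sharded automatically
    optimizer.step()

if __name__ == "__main__":
    main()
```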
EleutherAI’s lm-evaluation-harness was used to evaluate OpenLLaMA’s performance. The results showed that OpenLLaMA performs similarly to LLaMA and GPT-J on most tasks and even surpasses them on some.
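For readers who want to reproduce this kind of comparison, the harness can be driven from Python roughly as follows. The exact backend names and argument strings vary across harness versions, and the repo id is again an assumption, so treat this as a sketch rather than the authors’ evaluation script.

```python
# Sketch: scoring a model with EleutherAI's lm-evaluation-harness from Python.
# Backend names differ between versions (e.g. "hf" vs "hf-causal"); check the
# installed release's docs. The pretrained repo id is assumed for illustration.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",  # Hugging Face causal-LM backend ("hf-causal" in older releases)
    model_args="pretrained=openlm-research/open_llama_7b",
    tasks=["arc_easy", "hellaswag", "piqa"],
)

# Each task maps to its metrics (accuracy, normalized accuracy, ...).
for task, metrics in results["results"].items():
    print(task, metrics)
```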
The team expects OpenLLaMA’s performance to improve significantly once its training reaches 1 trillion tokens. The authors are currently running evaluations to confirm that OpenLLaMA is on par with, or better than, the original model in most cases.
In addition to those evaluations, the team is training a 3B model, which will be released shortly after completion.
Because Meta’s LLaMA weights are bound by a restrictive, non-commercial license, LLaMA-based models could not be distributed directly; that is no longer the case. There have been numerous attempts to open-source these models, and OpenLLaMA is not the first of its kind.
Hugging Face, an open-source AI platform, released HuggingChat, an open-source alternative to ChatGPT, less than two weeks ago. The chatbot uses OpenAssistant’s latest LLaMA-based model, which is distributed as XOR weights so that Meta’s original weights are never redistributed directly.
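The XOR scheme publishes only the bitwise XOR of the fine-tuned weights with the original LLaMA weights: the released file is meaningless on its own, but anyone who already holds the original weights can recover the fine-tuned model. The snippet below is a minimal illustration of that idea, not OpenAssistant’s actual conversion tooling.

```python
# Illustration of XOR weight distribution (not OpenAssistant's actual script):
# the published artifact is original_bytes XOR finetuned_bytes, so recovery
# requires already having the original LLaMA weights.
import numpy as np

rng = np.random.default_rng(0)
original = rng.standard_normal(4, dtype=np.float32)   # stands in for Meta's weights
finetuned = original + 0.01                           # stands in for the derived model

# "Publish": XOR of the raw byte representations.
xor_delta = np.bitwise_xor(original.view(np.uint8), finetuned.view(np.uint8))

# "Recover": XOR the delta with the original bytes, reinterpret as floats.
recovered = np.bitwise_xor(original.view(np.uint8), xor_delta).view(np.float32)
assert np.array_equal(recovered, finetuned)
print(recovered)
```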
Moreover, Databricks found a way around this with Dolly 2.0. Unlike other “open source” models, Dolly 2.0 can be used for commercial purposes without paying for API access or sharing data with third parties, which sets it apart from the rest.
Check out: GitHub Repo