Nvidia has unveiled a powerful openly released AI model that it says can rival OpenAI’s GPT-4o. The new NVLM 1.0 family of multimodal large language models (LLMs) is led by its flagship, NVLM-D-72B, with roughly 72 billion parameters.
Nvidia’s research team reports that the model delivers strong results on vision-language tasks while also improving text-only performance relative to the LLM backbone it was built on.
In their published paper, the researchers explain, “We present NVLM 1.0, a series of advanced multimodal large language models that deliver top-tier results on vision-language tasks, competing with both leading proprietary models (like GPT-4o) and open-access alternatives.”
In contrast to many multimodal models, whose text performance degrades after multimodal training, NVLM-D-72B reportedly improved its accuracy by an average of 4.3 points on key text-only benchmarks.
This large language model can also analyze charts and tables, interpret images, understand memes, write code, and solve mathematical problems. The model weights are publicly available on Hugging Face, and Nvidia says it plans to release the training code in the future.
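For readers who want to experiment, the sketch below shows how the published weights might be loaded with the Hugging Face transformers library. The repo ID "nvidia/NVLM-D-72B", the use of trust_remote_code, and the dtype choice are assumptions; the model card on Hugging Face documents the exact loading and image-preprocessing steps, and a 72-billion-parameter checkpoint will require multiple high-memory GPUs.

```python
# Minimal sketch of loading the public NVLM-D-72B weights from Hugging Face.
# The repo ID and loading options below are assumptions; consult the model
# card for the supported interface (including how to pass image inputs).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nvidia/NVLM-D-72B"  # assumed repo ID; verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half-precision to reduce memory footprint
    low_cpu_mem_usage=True,
    trust_remote_code=True,       # the checkpoint ships custom modeling code
).eval()

# Text-only prompt as a smoke test; multimodal usage (attaching images)
# follows the conventions described in the model card.
prompt = "Describe what a multimodal large language model can do."
inputs = tokenizer(prompt, return_tensors="pt")
```

This is only a starting point for exploration, not Nvidia’s official recipe; licensing terms (discussed below) still apply to anything built with the weights.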
AI researchers on X have described the launch as “wild,” praising the model’s ability to process visual data.
One user exclaimed, “Incredible! Nvidia has released a 72B model that competes with the 405B Llama 3.1 in math and coding evaluations, and it also incorporates vision capabilities!”
At the same time, Nvidia reportedly built NVLM 1.0 on open-source foundations, drawing on existing AI models and a range of training datasets.
However, the license for NVLM-D-72B imposes restrictions: it prohibits commercial use and modification for resale.
In effect, Nvidia offers the model for research purposes and for enthusiasts keen to see what high-end graphics cards can do.
The researchers deliberately chose the term “open,” and while Nvidia’s findings are valuable, the restrictions on commercial use mean the model cannot be classified as genuinely open-source, which would require unrestricted use, modification, and distribution.