Microsoft has launched Phi-3 Mini, a lightweight AI model and the first of three compact models that the company intends to release soon.
With 3.8 billion parameters, Phi-3 Mini was trained on a smaller dataset than large language models such as GPT-4.
It is available now on Azure, Hugging Face, and Ollama. Microsoft plans to follow it with Phi-3 Small (7B parameters) and Phi-3 Medium (14B parameters).
In December, the company launched Phi-2, which performed just as well as larger models such as Llama 2.
Microsoft claims that Phi-3 outperforms its predecessor and can generate responses close to those of a model ten times its size.
Eric Boyd, corporate vice president of Microsoft Azure AI Platform, told reporters that Phi-3 Mini is as capable as GPT-3.5, just at a smaller size.
Smaller AI models are often cheaper to run than their larger counterparts and perform better on personal devices like phones and laptops.
Earlier this year, The Information reported that Microsoft was forming a team dedicated to developing lighter-weight AI models. In addition to Phi, the company has also developed Orca-Math, a model designed for solving math problems.
Many of Microsoft’s rivals have their own compact AI models, most of them designed for tasks such as document summarization or coding assistance.
Google’s Gemma 2B and 7B are suitable for basic chatbots and language-related tasks. Anthropic’s Claude 3 Haiku excels at swiftly reading dense research papers with graphs and providing summaries.
Meanwhile, Meta’s recently unveiled Llama 3 8B can be used for chatbots and coding assistance.
Boyd explains that developers trained Phi-3 with a “curriculum,” inspired by how children learn from bedtime stories and from books that use basic vocabulary and simple sentence structures to cover broader topics.
Because there are not enough children’s books to train on, the team took a list of more than 3,000 words and asked an LLM to write “children’s books” to teach Phi.
He added that Phi-3 builds on what previous versions learned: Phi-1 focused on coding, Phi-2 learned to reason, and Phi-3 excels at both coding and reasoning.
While the Phi-3 models have some general knowledge, they cannot match the breadth of GPT-4 or another LLM trained on the entire internet.
The answers a small model like Phi-3 gives differ noticeably from those of an LLM trained on that much larger body of text.
Boyd explains that many companies find that smaller models such as Phi-3 work better for their specific applications, because their internal data sets are typically small.
These models also need less computing power, which makes them cheaper to run.