Google on Wednesday spiced up the generative AI race by introducing Gemini, which it describes as its most capable and general model yet, with state-of-the-art performance across many leading benchmarks. The model is offered in three variants.

The first version, Gemini 1.0, is optimized for different sizes: Ultra, Pro, and Nano.

Google’s AI chatbot Bard will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, and understanding.

It will be available in English in more than 170 countries and territories, and “we plan to expand to different modalities and support new languages and locations in the near future,” said Google.

The company also brought Gemini to Pixel 8 Pro, where it powers new features such as Summarize in the Recorder app and is rolling out in Smart Reply in Gboard, starting with WhatsApp, with more messaging apps coming next year.

In the coming months, Gemini will be available in more Google products and services like Search, Ads, Chrome, and Duet AI.

“These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year,” said Alphabet and Google CEO Sundar Pichai.

Gemini is the result of large-scale collaborative efforts by teams across Google, including colleagues at Google Research.

“It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video,” said Demis Hassabis, CEO and Co-Founder of Google DeepMind.

While Gemini Ultra is the largest and most capable model for highly complex tasks, Gemini Pro is the model for scaling across a wide range of tasks and Gemini Nano is for on-device tasks.

“With a score of 90 percent, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities,” said Google.

From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks in large language model (LLM) research and development, the company said.

According to Google, Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information.

“Our first version of Gemini can understand, explain and generate high-quality code in the world’s most popular programming languages, like Python, Java, C++, and Go,” said the company.

Gemini can also be used as the engine for more advanced coding systems.

The company trained Gemini 1.0 at scale on its AI-optimized infrastructure using Google’s in-house designed Tensor Processing Units (TPUs) v4 and v5e.

“Today, we’re announcing the most powerful, efficient and scalable TPU system to date, Cloud TPU v5p, designed for training cutting-edge AI models,” said Google.