IBM Announces Granite 4.0: Hybrid Mamba-Transformer Model Reduces AI Infrastructure Costs
Hi, I'm Jon. The world of AI and technology evolves every day, and today I'd like to highlight "Granite 4.0," announced by IBM. It is the latest version of IBM's large language model (LLM) family, and it uses a hybrid Mamba-Transformer architecture, a technology said to significantly reduce the infrastructure costs companies face when operating AI. The announcement was made on October 2, 2025, and has been covered by media outlets such as InfoWorld. If you're interested in AI but find the technical terminology daunting, I'll explain it in plain language.
What is Granite 4.0? Basic Overview and Background
Granite 4.0 is the latest generation of the open-source AI model family developed by IBM. AI models act as a computer's "brain," performing tasks such as text generation and question answering, and the Granite series is designed with the reliability enterprises need. Version 4.0 was officially released by IBM on October 2, 2025, and was quickly picked up by specialized media outlets such as VentureBeat and MarkTechPost. The release has drawn attention for its innovative approach to improving AI efficiency and reducing costs.
The background to this is that traditional AI models (especially those based on Transformer) require large amounts of memory and expensive GPUs, which tend to increase operational costs for companies. IBM has therefore developed a hybrid model that combines a new architecture called Mamba with Transformer. Mamba operates efficiently in tasks that handle long contexts, and by combining it with the accuracy of Transformer, it is possible to conserve resources while maintaining performance. This announcement came at a time when progress on Meta's Llama series had been stagnating, leading to comments on X (formerly Twitter) that "IBM has returned to being a leader in open source AI."
A brief description of the hybrid Mamba-Transformer architecture
Let's clarify some terminology. Transformer is a structure commonly used in AI models that precisely analyzes relationships between text. Mamba, on the other hand, is a technology that has recently gained attention. It uses less memory than Transformer and can process long texts quickly. Granite 4.0 "hybridizes" these two, resulting in a 70% or greater reduction in memory usage and improved inference speed (the speed at which the AI can produce answers). For example, in internal benchmarks, the previous Granite 3.3 8B model required 90GB of memory, while the Tiny version of Granite 4.0 requires only 15GB.
- Model size variations: Available in sizes from 1B (1 billion parameters) to 9B (9 billion parameters), so you can choose the model that best suits your needs. The smaller models are designed for edge devices (smartphones and other small devices) and prioritize speed.
- Main uses: Strong at long-context tasks (e.g., analyzing large volumes of documents), working with corporate tools, and following instructions.
- Reliability: It is certified under the international standard ISO/IEC 42001 for trustworthy AI management.
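To see how the headline "70% or greater reduction" follows from the figures above, here is a quick back-of-the-envelope check in Python. The 90GB and 15GB numbers come from the internal benchmark cited above; the calculation itself is simple arithmetic:

```python
# Memory figures from IBM's internal benchmark cited above:
# Granite 3.3 8B (conventional Transformer) vs. Granite 4.0 Tiny (hybrid).
granite_33_gb = 90.0       # memory required by the previous Granite 3.3 8B model
granite_40_tiny_gb = 15.0  # memory required by the Granite 4.0 Tiny model

reduction = 1 - granite_40_tiny_gb / granite_33_gb
print(f"Memory reduction: {reduction:.0%}")  # about 83%, consistent with the ">70%" claim
print(f"Ratio: 1/{granite_33_gb / granite_40_tiny_gb:.0f} of the original")  # roughly 1/6
```

In other words, the reported figures imply an even larger saving than the conservative "70% or greater" wording suggests.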
This technology is a great advantage for companies that use AI on a daily basis. For example, AI tools like Gamma, which let you create slides and websites from templates, make it easy even for beginners to put AI-assisted document creation into practice.
Key Benefits of Granite 4.0: Cost Reduction and Performance Improvement
The biggest selling point of Granite 4.0 is its reduction in AI infrastructure costs. IBM announced that the model significantly reduces memory usage and requires fewer GPUs, helping companies save on hardware investments. MarkTechPost's report highlights a memory reduction of more than 70% and improved throughput for long-text processing. Improved training and post-training methods and refined datasets have also raised accuracy.
Here are some specific benefits:
- Memory efficiency: It can run with about one-sixth the memory of previous models. For example, the 3B-parameter Micro version and the 7B-parameter Tiny version are well suited to low-latency (fast-response) tasks.
- Ease of deployment: Immediately available on platforms such as Hugging Face, Docker Hub, Ollama, and NVIDIA NIM, which companies can use to integrate into their own systems.
- Open-source benefits: It's free to download and easy to customize. IBM's blog notes that BF16 checkpoints and GGUF conversions simplify local evaluation.
- Business trust: Signed artifacts (model files whose origin can be cryptographically verified) support compliance.
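As a rough guide to what "BF16 checkpoints and GGUF conversions" mean for hardware planning, you can estimate a model's weight footprint from its parameter count and bytes per parameter. This is a simplified sketch for illustration only: it ignores activation memory and the KV cache (which the hybrid Mamba layers are specifically designed to shrink), and the resulting sizes are estimates, not official IBM figures:

```python
def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Estimate model weight size in GB: parameter count x bytes per parameter."""
    # params_billion * 1e9 parameters, divided by 1e9 bytes per GB -> the 1e9s cancel.
    return params_billion * bytes_per_param

# BF16 stores each parameter in 2 bytes; a 4-bit GGUF quantization uses ~0.5 bytes.
for name, params in [("Micro (3B)", 3), ("Tiny (7B)", 7)]:
    bf16 = weight_footprint_gb(params, 2.0)
    q4 = weight_footprint_gb(params, 0.5)
    print(f"{name}: ~{bf16:.0f} GB in BF16, ~{q4:.1f} GB as 4-bit GGUF")
```

This is why a quantized GGUF build of a small model can fit on an ordinary laptop, while the full-precision checkpoint is better suited to a GPU server.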
These features were also confirmed in articles published by SiliconANGLE and Analytics India Magazine on October 3-4, 2025, which noted that the hybrid architecture "lowers memory and hardware costs." IBM's official account also touted Granite 4.0 in a post on October 2, 2025, stating that "Granite 4.0 requires minimal resources without sacrificing performance," which has been viewed more than 110,000 times.
Practical Applications and Future Impact
Granite 4.0 is likely to be useful in the field of enterprise AI. For example, it is effective for long-context tasks that handle large amounts of data in infrastructure for the medical and transportation sectors. Increased edge deployment (running AI on the device side) will reduce cloud dependency and lead to further cost reductions. A VentureBeat article calls it "the Western Qwen," analyzing IBM's rise after the disappointment of Meta's Llama 4.
However, no model is perfect: the latency-focused Tiny and Micro versions trade some accuracy for speed. IBM will continue working on support for tools such as vLLM and llama.cpp, so be sure to check for updates.
Summary: Granite 4.0 and the Future of AI
IBM's Granite 4.0 revolutionizes AI efficiency with hybrid technology, paving the way for companies to easily adopt high-performance AI. Memory reduction and speed improvements expand opportunities for those who were previously hindered by cost. If you're interested, check out the official documentation first.
If you want to streamline your documentation with AI, this article is also recommended: What is Gamma? A new standard for instant document, slideshow, and website creation using AI
To sum up, advancements like Granite 4.0 will make AI more accessible. Even beginners can have fun starting with tools that utilize these models. However, technology changes every day, so be sure to consult reliable sources.
Reference sources
- InfoWorld: "IBM launches Granite 4.0 to cut AI infrastructure costs with hybrid Mamba-transformer models" (October 2025)
- Official IBM statement: "IBM Granite 4.0: Hyper-efficient, High Performance Hybrid Models for Enterprise" (October 2, 2025)
- VentureBeat: "'Western Qwen': IBM wows with Granite 4 LLM launch and hybrid Mamba/Transformer architecture" (October 2025)
- MarkTechPost: "IBM Released new Granite 4.0 Models with a Novel Hybrid Mamba-2/Transformer Architecture" (October 2025)
- SiliconANGLE: "IBM releases Granite 4 series of Mamba-Transformer language models" (October 2025)
- Analytics India Magazine: "IBM Launches Granite 4.0 Hybrid AI Models With Lower Memory and Hardware Costs" (around October 3, 2025)
- Official IBM post on X (formerly Twitter) (October 2, 2025)
