The Emergence of a New Brain to Power AI
We Know CPUs and GPUs, but Who is NPU?
When we talk about the ‘brain’ of a computer, we often think of the CPU (Central Processing Unit). We also know that GPU (Graphics Processing Unit) is essential for processing stunning graphics. However, with the advent of the AI era, the term ‘NPU’ (Neural Processing Unit) has been increasingly heard. What exactly is this new processing unit that differs from CPU and GPU?
NPU stands for ‘Neural Processing Unit’, and as the name suggests, it is a dedicated hardware designed to execute artificial intelligence (AI) and deep learning algorithms. To understand its role, imagine an office team:
- CPU (Central Processing Unit): The capable ’team leader’ of this team. It excels at processing complex and diverse instructions in order. It plays a key command role, managing the computer’s operating system and prioritizing multiple tasks. However, it struggles when asked to handle thousands or tens of thousands of simple repetitive tasks simultaneously.
- GPU (Graphics Processing Unit): Originally hired as a ’large-scale task team’ for graphics processing. It is specialized in processing simple calculations simultaneously in parallel with thousands of cores. This made it perfect for graphic tasks that require calculating each pixel. However, it was discovered that this parallel processing capability is similar to the large matrix multiplication operations required by AI, leading to its recruitment as a key personnel for AI computations.
- NPU (Neural Processing Unit): The newly joined ‘AI specialist analyst’ in this team. It does not engage in other general tasks. It is designed to focus all its capabilities on one mission: the neural network calculations of AI. Born for a specific purpose, it processes tasks related to AI much faster and more efficiently than the team leader (CPU) or the large task team (GPU).
Why Existing Brains Were Insufficient
So why couldn’t we be satisfied with just the powerful task team of GPU? The arrival of the AI era fundamentally changed the types and amounts of data we need to process. While past data was primarily text-based, now images and video data are flooding in. There has been a surge in AI applications that need to analyze and make judgments on this massive data in real-time.
The core of deep learning algorithms is to perform numerous operations like matrix multiplication and convolution simultaneously and repeatedly. Thanks to its parallel processing capabilities, GPU performed this task much better than CPU, but it had fundamental limitations. Since GPU was originally designed as a general-purpose chip for graphics processing, it was not the optimal structure for AI computations. This led to two significant problems.
First, there is the enormous power consumption. Data centers that train and operate AI models use thousands or tens of thousands of GPUs, leading to unimaginable power consumption and heat generation. It earned the nickname ‘electricity-eating hippo’. This resulted in skyrocketing operational costs for data centers. Second, it was difficult to directly equip small battery-powered devices like smartphones or IoT devices with GPUs due to high power consumption and heat issues.
Ultimately, what the AI era demands is not just fast computation speeds. The key challenge has become ‘performance per watt (energy efficiency)’, which processes more computations with less energy. This is not just a matter of simple technical performance improvement; it directly relates to the sustainability and economic viability of AI technology. To extend AI from the privilege of data centers to our handheld devices, a new approach was needed.
The Birth of Brain-like Processors
The answer to these epochal demands is NPU. The most significant feature of NPU is that it mimics the way the human brain operates, that is, it mimics the neural network structure in hardware. The human brain processes information in parallel through countless nerve cells (neurons) and their connections (synapses). NPU is designed to perform tasks central to AI computations, such as matrix multiplication, simultaneously across numerous small processing units.
Thanks to this ‘brain-like’ structure, NPU boasts much higher efficiency in AI computations than GPU. By shedding unnecessary functions and focusing solely on AI computations, it can achieve much faster speeds while consuming significantly less power for the same tasks.
This characteristic has explosively expanded the application areas of NPU. Real-time face recognition, distinguishing subjects in photos, and converting speech to text are all on-device AI features made possible by NPU. Additionally, in data centers, it is gaining attention as a cost-effective alternative in the ‘inference’ process of generating responses for large language models (LLMs), and in autonomous vehicles, it plays a crucial brain role in perceiving and judging the surrounding environment in real-time.
In conclusion, the emergence of NPU signifies a paradigm shift in AI hardware. It marks a transition from an era of ‘as fast as possible’ to one of ‘how efficiently fast’, which is a key driver enabling AI technology to permeate all devices in our daily lives beyond data centers.
The Prelude to the AI Semiconductor War: Global Tech Giants’ NPU Strategies
The competition among global tech giants to seize the massive AI market has now escalated into a ‘chip war’ centered around hardware, particularly NPU. This war is not merely a race to create faster chips. Each company is building different strategic territories tailored to their strengths and visions, forming multifaceted fronts. Let’s take a closer look at the movements of the three giants: NVIDIA, Google, and Apple.
The Emperor’s Defense: NVIDIA’s GPU Empire and Software Moat
If we had to name the absolute powerhouse in the AI chip market, it would undoubtedly be NVIDIA. NVIDIA has built a monopolistic empire, dominating 80% to 90% of the data center GPU market used for AI training. With the latest architectures like Blackwell, it boasts overwhelming performance that competitors find hard to match.
However, NVIDIA’s true power is not solely in the hardware itself. What firmly protects their empire is the deep and broad software moat known as ‘CUDA’. CUDA is NVIDIA’s parallel computing platform and programming model introduced in 2007. Developers can leverage CUDA to maximize the performance of NVIDIA GPUs to develop AI models. The vast CUDA ecosystem, accumulated over more than 15 years, includes extensive libraries, optimized tools, and numerous developer communities. This makes it difficult for AI developers to easily switch to chips from other companies, as re-developing all software and code for new hardware requires tremendous costs and time. This is the point where competitors find it hardest to surpass NVIDIA.
NVIDIA is not satisfied with this and is evolving beyond a simple chip manufacturer into a ‘full-stack AI platform company’. They are deploying strategies such as the ‘Omniverse’ platform for training and simulating AI in a virtual world, the ‘NeMo’ framework for developing interactive AI, and even selling entire systems of data center racks equipped with thousands of GPUs. Their vision is now directed towards ‘Physical AI’ and ‘Agentic AI’, which aims to implement AI that interacts with the real world, such as robots and autonomous vehicles, expanding the application range of AI beyond the digital world.
The Challenger’s Surprise Attack: Google’s TPU and Inference Market Strategy
One of the strongest challengers to NVIDIA’s empire is Google. As Google operates its vast AI services such as search, photos, and translation, it has to process astronomical amounts of computations. Relying entirely on NVIDIA GPUs was a significant burden in terms of cost and efficiency. Therefore, Google decided to create its own secret weapon: the ‘Tensor Processing Unit (TPU)’.
TPU was born around 2015, optimized for Google’s AI framework, ‘TensorFlow’. It was also utilized in the ‘inference’ process of AlphaGo, which competed against Lee Sedol 9-dan. The most significant feature of TPU is that it is extremely optimized for the ‘inference’ stage, which derives results using already created models, rather than the ’training’ stage of creating AI models.
If we liken the AI market to a car race, ’training’ is the process of designing and tuning the car for peak performance, while ‘inference’ is the process of continuously racing on the actual track with the completed car. While NVIDIA GPUs show overwhelming performance in the tuning process, Google TPU is like a ‘hyper-optimized rocket’ that delivers the best fuel efficiency and effectiveness during actual driving.
Google TPU has evolved over generations. Recently, it showcased its technological prowess with the unveiling of the 7th generation ‘Ironwood’, which maximizes inference performance, following the 6th generation ‘Trillium’. Ironwood has improved performance per watt (energy efficiency) by twice compared to the previous generation, clearly demonstrating Google’s strategy to strengthen its dominance in the inference market. Instead of directly attacking the ’training’ market dominated by NVIDIA, Google is attempting to change the landscape of the AI chip war through a flanking strategy targeting the inference market, which is expected to grow significantly in the future.
The King of On-Device: Apple’s Neural Engine and Privacy Fortress
Apple’s strategy focuses not on data centers but on devices right in our hands. Their goal is to provide a powerful AI experience without relying on the cloud while thoroughly protecting user privacy. At the core of this strategy is the ‘Apple Neural Engine (ANE)’.
ANE first appeared in the A11 Bionic chip of the iPhone X in 2017. Since then, it has become a core component of the Apple ecosystem, embedded in all iPhones, iPads, and Macs with M-series chips.
The performance advancement of ANE is astonishing. The ANE of the A11 chip performed 600 billion operations per second (0.6 TOPS), but in the M1 chip of 2020, it reached 11 TOPS, in the M3 chip of 2023, it reached 18 TOPS, and in the latest M4 chip of 2024, it boasts a staggering 38 TOPS of processing power. This represents an improvement of over 60 times in just seven years.
The reason Apple is so obsessed with enhancing ANE performance is clear. They aim to achieve the best user experience and privacy simultaneously through ‘on-device AI’, which processes all AI computations internally. Since sensitive personal information is not sent to external servers, it is secure and operates quickly and reliably without network connectivity. The amazing features of Apple devices we use daily are powered by ANE.
- Security and Authentication: Unlocking by recognizing faces in 3D with ‘Face ID’.
- Camera: ‘Smart HDR’ that preserves details of subjects and backgrounds even in backlighting, and ‘Night Mode’ that captures bright and clear photos in dark places.
- Productivity: ‘Live Text’ that recognizes text in photos for copying, and real-time call translation.
- AI Assistant: ‘Siri’ that operates without internet connectivity and the recently announced ‘Apple Intelligence’.
Apple supports developers in easily utilizing the powerful performance of ANE in their apps through the ‘Core ML’ framework, building their own robust on-device AI ecosystem.
Thus, the global AI chip war is unfolding not as a single front but across three major fronts: data center training (NVIDIA), data center inference (Google), and on-device experience (Apple), each developing in different ways. This market differentiation opens significant opportunities for Korean companies as latecomers.
Global AI Chip Giants: Strategy Comparison
| Company | Representative Chip/Architecture | Key Strategic Advantage |
|---|---|---|
| NVIDIA | Blackwell GPU | CUDA Software Ecosystem & Full-stack Platform |
| Ironwood TPU | Extreme Efficiency for its services (Search, Gemini), Google Cloud Integration | |
| Apple | M4 Neural Engine | Hardware/Software Integration, Privacy, Power Efficiency |
The greatest threat to NVIDIA’s stronghold ironically comes not from better chips but from their largest customers venturing into chip development themselves. Hyperscalers like Google, Amazon, Microsoft, and Meta are reducing their reliance on expensive NVIDIA GPUs and developing custom NPU (ASIC) optimized for their services. This phenomenon proves the fundamental value of NPU, that is, hardware optimized for specific purposes is more efficient than general-purpose hardware, suggesting that the AI hardware market’s future will diversify.
Challenging NVIDIA’s Stronghold: The Current State of NPU Development in South Korea
As global tech giants expand their territories in AI semiconductors with their respective strategies, the Korean semiconductor industry is also stepping up to this massive trend. Not content with the title of the world’s strongest memory semiconductor nation, there are active movements to find new growth engines in the AI semiconductor market, often referred to as the flower of system semiconductors. Korea’s strategy takes the form of a clever ‘pincer movement’, targeting both the data center ‘inference’ market and the ‘on-device edge’ market rather than directly attacking NVIDIA’s heart. From major companies like Samsung and SK to well-funded startups, let’s take a closer look at the current state of the K-Semiconductor NPU Dream Team.
The Giants’ Joint Operation: Samsung Electronics and SK Hynix
Samsung’s On-Device Powerhouse: Exynos NPU
Samsung Electronics has been developing its own mobile AP (Application Processor) series, ‘Exynos’, for a long time. Based on this experience, they focused early on the development of NPU to process AI computations within mobile devices. The beginning was the ‘Exynos 9820’ unveiled in 2018, which improved AI computation capabilities by a staggering 7 times compared to previous generations, heralding the onset of the on-device AI era.
Recently, the pinnacle of Samsung NPU technology was showcased in the ‘Exynos 2400’. This chip, embedded in the Galaxy S24 series, achieved an astonishing 14.7 times improvement in NPU performance compared to the previous model (Exynos 2200). This powerful NPU performance translates directly into features that consumers can experience. The core features of ‘Galaxy AI’ heavily promoted by the Galaxy S24 series, such as real-time translation of calls without internet connectivity and generating images from text descriptions, are made possible by this Exynos 2400 NPU. This is the most concrete example of how NPU technology has become a core competitive advantage for flagship products.
SK’s Strategic Shift: From SAPEON to the Rebellion Alliance
SK Group also entered the AI semiconductor market early on. Starting as a subsidiary of SK Telecom, ‘SAPEON’ focused on developing NPU for data center inference. In 2020, it introduced the country’s first data center AI chip, ‘X220’, and subsequently released the significantly improved ‘X330’. The X330 achieved over 4 times the computational performance and over 2 times the power efficiency compared to its predecessor, earning recognition for its technology by targeting NVIDIA’s mid-range inference card, L40S. It has also been validated as a semiconductor that can be integrated into servers from global server manufacturers like Supermicro, enhancing its commercialization potential.
However, SK envisioned a bigger picture. They decided to avoid excessive competition among domestic AI semiconductor startups and create a ’national representative’ company that can compete in the global market. In 2024, SK announced the merger of SAPEON with another NPU powerhouse, ‘Rebellions’. This strategic choice marks a significant milestone in the history of Korean AI semiconductors, forming an alliance to consolidate scattered technologies and resources to counter NVIDIA.
Fearless Challengers: Korea’s Fabless Trio
Another pillar of the Korean NPU ecosystem is the innovative fabless (semiconductor design-focused) startups armed with cutting-edge technology. They are achieving remarkable results in their respective areas, backed by the strong support and investments from large corporations.
Rebellions: A Rising Star Backed by Massive Investments
Rebellions has garnered significant market attention, raising 280 billion won in cumulative investments within just three years of its founding. Their flagship product is the ‘ATOM’, an NPU for data center inference. The biggest achievement of Rebellions is successfully applying ‘ATOM’ to actual commercial services. KT serves as a strategic investor and key partner, introducing ATOM into its cloud data center to commercialize the first domestic NPU-based cloud service. The lightweight version of KT’s super-large AI model, ‘Mi:dm’, also utilizes ATOM, proving that domestic NPU can play a crucial role in actual AI services. Rebellions’ next goal is a larger market. They are currently collaborating with Samsung Electronics to jointly develop the next-generation AI semiconductor, ‘REBEL’, targeting the large language model (LLM) market like ChatGPT, which will be a key weapon for Rebellions’ global market entry.
FuriosaAI: A Technology-Centric Challenger Competing on Performance
FuriosaAI is another NPU powerhouse challenging NVIDIA with overwhelming performance and efficiency. Founded in 2017, the company surprised the market with its second-generation chip ‘Renegade’, following its first-generation chip ‘Warboy’. Renegade achieves 512 TOPS of high computational performance with just 180W of low power, boasting superior power efficiency compared to competing GPUs. This technological prowess is well-known for the anecdote of receiving but rejecting an acquisition offer from Meta, the parent company of Facebook. A decisive moment proving FuriosaAI’s technology was its partnership with LG AI Research Institute. LG conducted tests to run its super-large AI model, ‘Exaone’, using FuriosaAI’s Renegade instead of expensive NVIDIA GPUs, confirming significant performance. This symbolic event shows that domestic NPU can realistically replace GPUs in the core LLM infrastructure of large corporations. Additionally, they are consistently creating commercialization cases by supplying chips for the OCR (Optical Character Recognition) solution of AI startup Upstage.
DeepX: A Niche Powerhouse Targeting the Edge AI Market
While Rebellions and FuriosaAI target the data center market, DeepX aims to become a powerhouse in the ultra-low power ‘Edge AI’ market. Edge AI refers to the technology that processes AI computations within devices at the network’s edge, such as smartphones, home appliances, robots, and CCTV. DeepX’s core competitiveness lies in providing an optimized NPU portfolio tailored to the requirements of various edge devices. They supply customized AI semiconductors for different application areas, including physical security, smart factories, robotics, and smart mobility. Moreover, they focus on enhancing customer development convenience by not only selling chips but also providing ‘full-stack software’ such as compilers, runtimes, and SDKs (Software Development Kits) to facilitate easy development and operation of AI models. DeepX has gained global recognition for its technology, winning three innovation awards at CES 2024, establishing a unique competitive edge in the specific field of Edge AI.
Thus, Korean NPU development is forming an organic ecosystem where large corporations and startups play their respective roles and strengths. When fabless startups innovate in design, Samsung Electronics or SK Hynix provide foundry (semiconductor contract manufacturing) and HBM (High Bandwidth Memory), while large corporations like KT or LG open up initial markets in a cooperative structure. This unique ‘Korean NPU Alliance’ is becoming the most realistic and powerful strategy to challenge NVIDIA’s stronghold.
Korean NPU Challengers: Competitive Landscape
| Company | Representative Product | Major Partners/Sponsors |
|---|---|---|
| Samsung LSI | Exynos 2400 | Samsung Electronics MX Division (Galaxy) |
| Rebellions (SAPEON Integrated) | ATOM / REBEL / X330 | KT, SK Telecom, Samsung Electronics, IBM |
| FuriosaAI | Renegade | LG, Upstage, Naver |
| DeepX | DX-M1 & Portfolio | Various industrial clients |
The Race Towards the Future: Prospects for the NPU Market and Challenges for South Korea
The AI semiconductor market has just begun to open the floodgates of explosive growth. The future unfolding ahead presents a massive opportunity for K-Semiconductors, but it will also serve as a harsh testing ground. Let’s explore how great the market potential is and what challenges South Korea must overcome to win in this fierce race.
Trillions of Won Opportunity: NPU Market Outlook
Market analysis firms predict a very bright future for the AI chip market. The global AI chip market is expected to grow to hundreds of billions of dollars by the early 2030s. In particular, the NPU and on-device AI market are anticipated to record high growth rates ranging from 20% to 35% annually, making them the hottest battlegrounds.
This growth will accelerate as AI permeates all areas of our lives, from smartphones and PCs to cars, robots, medical devices, and smart homes. As demand for AI that operates quickly and securely on devices without relying on the cloud skyrockets, the importance of low-power high-efficiency NPUs will only increase. The ‘inference’ and ‘edge’ markets targeted by Korean NPU companies are at the center of this growth.
The Real Battlefield: Beyond Hardware to Ecosystem Wars
However, to reap the fruits of future markets, simply creating high-performance chips will not be enough. The essence of the AI semiconductor war is not a competition of hardware specs but a war of ‘software ecosystems’. This is precisely why NVIDIA’s stronghold remains solid and the biggest challenge faced by all latecomers.
As mentioned earlier, NVIDIA’s CUDA platform is a software asset accumulated over more than 15 years. Countless developers worldwide are familiar with the CUDA environment, and there are innumerable AI models and applications built on it. No matter how high-performing a new NPU emerges, it becomes useless without software tools (compilers, libraries, SDKs, etc.) that developers can easily and conveniently use. Developers do not want to relearn all existing development processes from scratch for new hardware.
This is why all NPU companies are putting their lives on the line to develop their own software stacks. The emphasis on hiring personnel for full-stack software development in DeepX’s job postings and Rebellions’ claims of perfect compatibility with standard frameworks like PyTorch and TensorFlow are all desperate efforts to survive in this ecosystem war.
The South Korean government is also clearly aware of the importance of this issue. The ‘K-Cloud Project’, led by the Ministry of Science and ICT, is a national-level strategic response to this reality. This project aims to invest over 400 billion won by 2030 to develop both the hardware and software infrastructure of data centers based on domestic NPUs. This is an attempt to create a massive software ecosystem, or ‘Korean CUDA’, where domestic NPUs can operate smoothly, going beyond merely supporting individual companies. The success of this project will be a crucial variable determining the future of the domestic NPU industry.
The Path Forward for South Korea: Challenges and Opportunities
The path for K-Semiconductors to rise as a key player in the global NPU market is certainly not smooth. However, clear opportunities also exist.
Challenges
- Software Gap: The wall of the CUDA ecosystem built by NVIDIA over decades remains high. Catching up will require continuous and substantial investment.
- Scale and Capital: The R&D investment scale of domestic startups is still less than 1% of NVIDIA’s. To compete globally, much larger capital and talent acquisition are essential.
- Securing Global Clients: Beyond government projects or collaborations with domestic large corporations, entering the challenging global hyperscaler and data center market to secure large contracts is the ultimate gateway to survival and growth.
Opportunity Factors
- Niche Market Capture: Focusing on the ‘inference’ and ‘edge’ markets, where NPU has structural advantages instead of the ‘training’ market dominated by NVIDIA, is a very effective strategy. Securing technological superiority in this market can help expand market share.
- Synergy in the Domestic Ecosystem: The Korean industrial structure, which possesses world-class memory (HBM) technology and foundry infrastructure, presents an unparalleled opportunity for NPU fabless companies. Close collaboration between HBM manufacturers like SK Hynix, Samsung Electronics and NPU design companies can create powerful synergies in next-generation chip development.
- National Support: Strong government support policies like ‘K-Cloud’ can play a crucial role in creating initial markets and accelerating technology development.
Ultimately, the success of Korean NPU companies carries significant implications not just for the industrial success of a single nation but for the global AI industry. The current AI industry carries the risk of excessive dependence on a single supplier, NVIDIA. If companies like the Korean Rebellions-SAPEON alliance or FuriosaAI establish themselves as meaningful alternatives in the global data center market, it would signal a strong possibility that the GPU-centric AI hardware market could diversify towards NPU.
This marks the beginning of a massive paradigm shift surrounding the future of AI infrastructure, and the challengers from South Korea stand at the center of this history. The brain-like semiconductor, NPU, is set to play a pivotal role in the new AI era, and the world is watching what role K-Semiconductors will take.