The Real Reason AI is Stalling: It’s Not the Processor, It’s the ‘Road’

In the spring of 2023, after ChatGPT took the world by storm, Microsoft made a public statement.

It was about “a shortage of electricity in data centers.”

It wasn’t that the AI boom suddenly started consuming more electricity. The problem was much older, and much more fundamental.

30 to 40 percent of the power consumed by data centers worldwide is not used for computation.

It’s consumed in the process of moving data from memory to the processor, and from the processor back to memory — the movement itself.

A multi-trillion dollar AI system is, in reality, spending nearly half its time waiting.

It’s like a car with a powerful engine stuck in traffic.

The problem isn’t the engine. The road is blocked.

1. The Physical Law of the Memory Wall

People in the semiconductor industry call this problem the ‘Memory Wall.’ This concept was first named in 1994.

It refers to the physical limitation where processor speeds grow rapidly each year, but the data transfer speed between memory and the processor cannot keep up. Thirty years have passed. The wall has not disappeared. In fact, it has grown higher.

Nvidia’s H100 GPU can perform trillions of operations per second.

However, when this GPU is actually performing inference tasks — when a user asks, “What comes next in this sentence?” — there are cases where the computation unit itself is idle for more than half the time. This is .

Why does this happen? The answer lies in the structure of DDR5, the current mainstream memory for servers.

The DDR5 data bus is like a one-lane road. Reading (Read) and writing (Write) share the same physical lines.

While data is being read from memory, other data cannot be written.

Each time the direction changes, the signal must stabilize.

This waiting time is called the ‘Bus Turnaround Penalty.’

Each turnaround takes about 15-20 clock cycles, which translates to 11-15 nanoseconds physically.

Nanoseconds? How much time is that?

When you ask ChatGPT a question, the response flows out word by word.

During the generation of a single word, the GPU repeats hundreds of millions of memory reads and writes.

It waits every time the direction changes. It waits hundreds of millions of times. A small loss, accumulated hundreds of millions of times, is not a small loss.
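A rough way to see how the penalty accumulates is to convert cycles into nanoseconds and multiply. The sketch below assumes a clock period of 0.75 ns, which is simply the value implied by the 15-20 cycle and 11-15 ns figures above, and a hypothetical count of one million direction changes. Neither number is a measurement, and not every memory access forces a turnaround, so the count is a free parameter.

```python
# Back-of-the-envelope model of accumulated bus turnaround time.
# CLOCK_PERIOD_NS is an assumption inferred from the figures in the text
# (15-20 cycles ~ 11-15 ns); the turnaround count is purely illustrative.

CLOCK_PERIOD_NS = 0.75


def turnaround_ns(cycles: int, clock_period_ns: float = CLOCK_PERIOD_NS) -> float:
    """Convert a turnaround penalty in clock cycles to nanoseconds."""
    return cycles * clock_period_ns


def accumulated_wait_ms(turnarounds: int, cycles_per_turnaround: int) -> float:
    """Total time spent waiting on direction changes, in milliseconds."""
    return turnarounds * turnaround_ns(cycles_per_turnaround) / 1e6


if __name__ == "__main__":
    for cycles in (15, 20):
        print(f"{cycles} cycles = {turnaround_ns(cycles):.2f} ns")
    # Hypothetical workload: one million direction changes at 20 cycles each.
    print(f"{accumulated_wait_ms(1_000_000, 20):.1f} ms of pure waiting")
```

Under these assumptions, a million turnarounds at 20 cycles each add up to 15 ms in which the bus moves no data at all.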

The problem isn’t just performance degradation.

Even while waiting, electricity flows through the memory bus.

Even without sending data, electricity is consumed by charging and discharging the parasitic capacitance of the circuit.

While AI is thinking, power is consumed doing nothing.

This is the actual structure of the data center power crisis.

It’s not the advanced AI computation that’s the problem. It’s the way advanced AI ‘waits’ that’s the problem.

2. If the Blood Vessels are Blocked, the Heart is Meaningless

In the human body, there’s an intuition that ‘a strong heart means health.’

But what if the arteries are blocked? Even the strongest heart cannot deliver blood to every part of the body. Heart failure is often not an issue with the heart itself.

The same applies to computer architecture.

The more powerful processors become, the more pronounced the problem of the ‘blood vessels’ supplying data becomes. The blood vessels are too narrow compared to the strong heart.

This analogy isn’t just rhetoric because the operating principles of actual human blood circulation and data interconnects are surprisingly similar.

The key to blood circulation is that the two-way flow of ‘sending’ oxygen to tissues and ‘bringing’ carbon dioxide to the lungs happens simultaneously.

Arteries and veins use completely separate pathways.

Therefore, oxygen supply and waste removal proceed at the same time.

If arteries and veins shared the same blood vessel, blood would have to stop and wait every time the direction changed.

DDR5 is that imagined, badly designed blood vessel.

CXL (Compute Express Link) has a structure that separates arteries and veins.

CXL is based on the PCIe (PCI Express) physical layer.

PCIe is inherently designed for full-duplex communication, with physically separate transmit (TX) and receive (RX) lines. While read data flows through the RX lines, write data is simultaneously transmitted through the TX lines.

The direction change penalty is eliminated by the structure itself.
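The structural difference can be sketched with a toy timing model. It gives both designs the same aggregate raw bandwidth, so any gap comes only from the turnaround stalls. Every parameter here is an illustrative assumption rather than a DDR5 or CXL measurement, and the model ignores packet overhead and controller latency.

```python
# Toy model: time to move a mixed read/write workload over a shared
# (half-duplex) bus versus separate TX/RX lanes (full duplex).
# All parameters are illustrative assumptions, not measured values.


def half_duplex_time_ns(read_bytes, write_bytes, bytes_per_ns,
                        turnarounds, turnaround_ns):
    """Reads and writes are serialized; each direction change stalls the bus."""
    transfer = (read_bytes + write_bytes) / bytes_per_ns
    return transfer + turnarounds * turnaround_ns


def full_duplex_time_ns(read_bytes, write_bytes, bytes_per_ns_per_direction):
    """Reads and writes overlap; the busier direction sets the total time."""
    return max(read_bytes, write_bytes) / bytes_per_ns_per_direction


if __name__ == "__main__":
    reads = writes = 64_000  # evenly mixed traffic, in bytes
    shared = half_duplex_time_ns(reads, writes, bytes_per_ns=32,
                                 turnarounds=100, turnaround_ns=13)
    split = full_duplex_time_ns(reads, writes, bytes_per_ns_per_direction=16)
    print(f"half duplex: {shared:.0f} ns, full duplex: {split:.0f} ns")
```

With these made-up numbers the shared bus needs 5300 ns and the split lanes need 4000 ns. The size of the gap depends entirely on how often the direction changes.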

Empirical data quantifies this difference.

In a mixed traffic environment where reads and writes are equally mixed — which is precisely the situation with actual AI workloads — a CXL-based system has 55-61 percent higher effective bandwidth compared to a DDR5 channel with the same clock speed.

Of course, CXL comes with costs.

Processing PCIe packets and going through an external controller increases signal latency.

While DDR5 is around 75-85 nanoseconds, CXL is in the range of 130-200 nanoseconds. It takes nearly twice as long for the signal to arrive.

So, isn’t CXL slower?

Here, two concepts need to be distinguished.

Latency is the time it takes for the first piece of data to arrive.

Bandwidth is how much data can be transmitted per unit of time.

It’s the difference between a fast train that doesn’t come often and a slightly slower train that runs continuously.

Large language models like GPT-4, LLaMA, and Gemini require a continuous supply of hundreds of billions of parameters for inference.

It’s the continuity of data supply, not the responsiveness of a single transaction, that determines performance.

In this environment, the gain in bandwidth sufficiently compensates for the loss in latency. The blood vessels have become wider. The heart can beat better.
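The trade-off can be made concrete with a simple model: total time is the first-byte latency plus the size divided by bandwidth. The sketch below uses mid-range latencies from the figures above (80 ns and 165 ns) and a 55 percent bandwidth advantage. The 20 GB/s baseline for DDR5 under mixed traffic is a hypothetical value chosen only to make the arithmetic visible.

```python
# Latency versus bandwidth: which link finishes a transfer first?
# Latencies are mid-range values from the text; DDR5_BW_GB_S is an assumption.

DDR5_LATENCY_NS = 80.0
CXL_LATENCY_NS = 165.0
DDR5_BW_GB_S = 20.0                 # hypothetical effective bandwidth
CXL_BW_GB_S = DDR5_BW_GB_S * 1.55   # 55 percent higher, per the text


def transfer_time_ns(size_bytes, latency_ns, bandwidth_gb_s):
    """First-byte latency plus streaming time (1 GB/s equals 1 byte/ns)."""
    return latency_ns + size_bytes / bandwidth_gb_s


def break_even_bytes(lat_a, bw_a, lat_b, bw_b):
    """Size above which link B (higher latency, higher bandwidth) is faster."""
    return (lat_b - lat_a) / (1 / bw_a - 1 / bw_b)


if __name__ == "__main__":
    size = break_even_bytes(DDR5_LATENCY_NS, DDR5_BW_GB_S,
                            CXL_LATENCY_NS, CXL_BW_GB_S)
    print(f"break-even transfer size: about {size:.0f} bytes")
    for n in (64, 1 << 20):
        ddr5 = transfer_time_ns(n, DDR5_LATENCY_NS, DDR5_BW_GB_S)
        cxl = transfer_time_ns(n, CXL_LATENCY_NS, CXL_BW_GB_S)
        print(f"{n} bytes: DDR5 {ddr5:.0f} ns, CXL {cxl:.0f} ns")
```

In this model a single 64-byte access is faster on the low-latency link, while a 1 MiB stream finishes sooner on the high-bandwidth link. Streaming model weights looks like the second case.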

3. How Idle Memory Becomes Money

There’s a story that is more immediately striking than technical improvements: the story of money.

One of the most inconvenient phenomena for managers in large data centers today is ‘Stranded Memory.’ Translated, it means ‘isolated memory’ or ‘idle memory.’

In the current server architecture, memory is physically fixed to each CPU node.

If the CPU in Node A is busy and then stops, the 256 gigabytes of memory attached to that node sits there doing nothing.

Even if Node B needs more memory, there’s no way to borrow memory from Node A. A physical barrier prevents it.

Scaled up to the data center level, the waste multiplies.

Out of thousands of servers, only a portion are under high load.

The memory in the remaining nodes is largely idle, consuming electricity.

Even though there is theoretically enough memory capacity, memory shortages occur in specific nodes. So, more memory is purchased. With idle memory already present.

CXL switches dismantle this structure.

CXL-based Memory Pooling Architecture bundles physical memory into a shared resource pool through a virtualization switch.

If Node B needs memory, it's immediately allocated in 1GB increments from the pool.

When Node B's task is complete, the memory returns to the pool and becomes available for other nodes.
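The borrow-and-return cycle can be sketched as a small bookkeeping class. This is a toy model of the idea only: `MemoryPool`, `allocate`, and `release` are hypothetical names for illustration, not part of any CXL switch interface or vendor SDK.

```python
# Toy bookkeeping model of a shared memory pool handing out 1 GB units.
# The class and method names are hypothetical, chosen for illustration.


class MemoryPool:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocations: dict[str, int] = {}  # node name -> GB currently held

    @property
    def free_gb(self) -> int:
        """Capacity not currently lent to any node."""
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, node: str, size_gb: int) -> bool:
        """Lend size_gb (whole gigabytes) to a node if the pool has room."""
        if size_gb <= 0 or size_gb > self.free_gb:
            return False
        self.allocations[node] = self.allocations.get(node, 0) + size_gb
        return True

    def release(self, node: str) -> int:
        """Return everything a node holds to the pool; report how much."""
        return self.allocations.pop(node, 0)


if __name__ == "__main__":
    pool = MemoryPool(capacity_gb=512)
    pool.allocate("node-b", 64)
    print("free after allocation:", pool.free_gb, "GB")
    pool.release("node-b")
    print("free after release:", pool.free_gb, "GB")
```

A real pooling switch also has to handle address mapping, isolation between hosts, and failure cases, none of which this sketch attempts.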

What does this mean for server capacity planning?

In the traditional approach, memory had to be ‘over-provisioned’ to handle peak loads.

Even if not actually used simultaneously, sufficient memory had to be constantly installed in each node to prepare for worst-case scenarios.

This excess is precisely what constitutes stranded memory.

In a CXL pooling environment, the total amount of physical memory that needs to be purchased to handle the same workload decreases because the overall memory resources can be optimally utilized.
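The reasoning comes down to one comparison: fixed memory must cover the sum of each node's own peak, while a pool only has to cover the peak of the combined demand. The demand figures below are invented for illustration. They also assume the nodes peak at different times, which is the condition pooling depends on.

```python
# Sum of per-node peaks (fixed memory) versus peak of total demand (pooled).
# The demand samples are hypothetical values in GB, one column per time step.

demand_gb = {
    "node-a": [200, 40, 30, 50],
    "node-b": [30, 220, 40, 60],
    "node-c": [50, 30, 210, 40],
}


def fixed_capacity_gb(demand):
    """Each node must hold enough memory for its own worst case."""
    return sum(max(series) for series in demand.values())


def pooled_capacity_gb(demand):
    """A shared pool only needs to cover the busiest moment overall."""
    return max(sum(step) for step in zip(*demand.values()))


if __name__ == "__main__":
    fixed = fixed_capacity_gb(demand_gb)
    pooled = pooled_capacity_gb(demand_gb)
    print(f"fixed: {fixed} GB, pooled: {pooled} GB")
    print(f"reduction: {1 - pooled / fixed:.0%}")
```

With this made-up trace, fixed provisioning needs 630 GB and a pool needs 290 GB. If every node peaked at the same moment, the two numbers would be identical and pooling would save nothing.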

Furthermore, instead of filling individual nodes with ultra-high-density RDIMMs, the purchase cost can be lowered by distributing multiple, relatively cheaper, general-purpose 64GB RDIMMs across CXL PCIe expansion slots.

There is empirical data showing that the capital expenditure for building the same amount of usable memory can be reduced by up to 50 percent.

Operating costs also change.

CXL physical controllers are equipped with power management circuits that detect traffic density in microsecond units.

When data flow decreases, the link is switched to an ultra-low-power state.

When the next data request arrives, it recovers in nanosecond units.

According to Samsung Electronics’ mass production data, this mechanism reduces the idle standby power of memory modules by more than 50 percent.

Further advancements include integration with inline hardware compression technology.

When ZeroPoint Technologies’ DenseMem compression IP is packaged on the same die as a CXL 3.1 controller, all data compression and decompression are automatically handled by hardware accelerators.

Without using even 1 byte of CPU computation resources, the data stored in physical memory is compressed at a ratio of up to 3:1.

The result sounds paradoxical.

Without increasing the physical number of memory chips, the logical memory size recognized by the operating system triples.

As the size of signal packets passing through the bus actually decreases, the transmission power is further reduced.
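The relationship between compression ratio and logical capacity is plain multiplication, and it can be tried in software. The sketch below uses Python's `zlib` purely as a stand-in. It is not the DenseMem algorithm, and the sample buffer is deliberately repetitive, so the ratio it prints says nothing about real workloads.

```python
# Software stand-in for inline memory compression, using zlib.
# zlib is NOT the hardware algorithm described above; it only illustrates
# how a compression ratio translates into logical capacity.
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


def logical_capacity_gb(physical_gb: float, ratio: float) -> float:
    """Capacity the operating system would see at a given ratio."""
    return physical_gb * ratio


if __name__ == "__main__":
    sample = b"token_embedding_row " * 512  # hypothetical, highly repetitive
    print(f"zlib ratio on the sample: {compression_ratio(sample):.1f}:1")
    print(f"256 GB at 3:1 appears as {logical_capacity_gb(256, 3.0):.0f} GB")
```

Data that is already compressed or encrypted will not shrink, which is why the 3:1 figure is stated as an upper bound.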

Capital expenditure, operating costs, and power costs are simultaneously reduced. This is why CXL is more than just a performance specification.

4. Samsung Electronics Paved the Way

One entity has proactively moved from academic specification discussions to actual products for CXL.

In May 2021, Samsung Electronics developed the industry’s first CXL 1.1-based CMM-D (CXL Memory Module-DRAM).

At the time, most server vendors classified CXL as a ‘post-2025 technology.’ Samsung had already created a physical product before then.

In May 2022, the second-generation CMM-D 2.0, natively supporting the CXL 2.0 specification, entered mass production.

This product supported up to 256 gigabytes of capacity per slot and stably provided an effective bandwidth of 36GB/s.

The next step is CMM-D 3.1.

Based on the PCIe 6.0 interface, it will offer 1TB of memory space per module, with a bandwidth of 72GB/s.

What this figure signifies is more than just a number.

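It means a single CXL slot is starting to exceed the peak bandwidth of an individual DDR5 memory channel attached directly to a CPU socket (38.4GB/s for DDR5-4800), although not yet the combined bandwidth of all the channels on a socket.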
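To put 72GB/s in context, the peak bandwidth of a DDR5 channel can be worked out from its transfer rate. The sketch below assumes DDR5-4800 with 64 data bits per channel and a hypothetical eight-channel socket. Those values are assumptions chosen for illustration, not figures from Samsung.

```python
# Peak DDR5 channel bandwidth compared with the CMM-D 3.1 figure in the text.
# Speed grade and channel count are assumptions for illustration.

CMM_D_31_GB_S = 72.0        # module bandwidth quoted in the text
ASSUMED_SPEED_MT_S = 4800   # DDR5-4800 (assumption)
ASSUMED_CHANNELS = 8        # channels per socket (assumption)


def ddr5_channel_peak_gb_s(mega_transfers_per_s: int,
                           bus_width_bits: int = 64) -> float:
    """Transfers per second times bytes per transfer, in GB/s."""
    return mega_transfers_per_s * 1e6 * (bus_width_bits / 8) / 1e9


if __name__ == "__main__":
    one_channel = ddr5_channel_peak_gb_s(ASSUMED_SPEED_MT_S)
    socket_total = one_channel * ASSUMED_CHANNELS
    print(f"one DDR5 channel: {one_channel:.1f} GB/s")
    print(f"{ASSUMED_CHANNELS} channels: {socket_total:.1f} GB/s")
    print(f"CMM-D 3.1 module: {CMM_D_31_GB_S:.1f} GB/s")
```

Swapping in a different speed grade or channel count changes the comparison, so the constants are worth adjusting to the platform in question.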

SK Hynix has taken a different path.

In March 2026, they unveiled a next-generation CMM-DDR5 module at the Global Flash Summit and simultaneously released HMSDK (Heterogeneous Memory Software Development Kit) as open-source. This is a declaration that they won’t just sell hardware. They intend to first create an ecosystem where developers can directly control heterogeneous memory pools with software.

Micron, with its CXL 2.0 compatible CZ120 series developed in 2023, is accumulating commercial reliability data by passing validation cycles in production data centers.

These three companies are climbing the same mountain from different directions.

Nvidia has announced that its Vera CPU, scheduled for release in late 2026, will include CXL 3.1 as standard.

Intel’s 5th Gen Xeon and AMD EPYC Turin already support CXL 2.0 or higher.

Google is actively deploying CXL physical interconnects in data centers worldwide.

The market has moved beyond waiting for specifications; it has entered a phase where specifications are creating the market.

Industry estimates project the global CXL market to grow to approximately $15.8 billion by 2028. The 2024 estimate was $1.2 billion.

That’s a 13-fold increase in four years.
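The same two estimates can be restated as an annual growth rate. The sketch below uses only the $1.2 billion and $15.8 billion figures quoted above and the standard compound annual growth rate formula.

```python
# Growth multiple and compound annual growth rate (CAGR) from the
# two market estimates quoted in the text.

MARKET_2024_BN = 1.2
MARKET_2028_BN = 15.8


def cagr(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns start into end over the period."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    fold = MARKET_2028_BN / MARKET_2024_BN
    rate = cagr(MARKET_2024_BN, MARKET_2028_BN, years=4)
    print(f"growth multiple: {fold:.1f}x")
    print(f"implied CAGR: {rate:.0%}")
```

The multiple works out to roughly 13 times, which corresponds to the market nearly doubling every year for four years.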

5. Why Korea Can Win This Battle

CXL, as a new standard, redraws the technological landscape.

What was previously advantageous is less so, and what was previously impossible to enter becomes possible.

The Korean semiconductor industry has long had an unbalanced structure.

While possessing world-class DRAM manufacturing capabilities, it has been difficult for domestic companies to rank in the global top 10 in areas like fabless, which creates core intellectual property (IP), OSAT for high-value packaging, and precision testing equipment.

It was a country that made good DRAM, but not a leader in the semiconductor ecosystem.

CXL is shaking up this structure.

New interconnect standards require new chips.

Switch chips, controller chips, PHY IP, testing equipment, high-layer substrates. In each of these areas, Korean companies hold unusually leading positions.

Panmnesia: World’s First in Switch Chips

Panmnesia, founded by researchers from KAIST, developed the world’s first full-spec CXL 3.2 single-chip switch that fully implements port-based routing.

This chip, named ‘PCIe 6.4-CXL 3.2 Fusion Switch,’ connects CPUs, GPUs, accelerators, and shared memory devices within a single physical semiconductor to form a composable shared system.

What sets it apart is that it cuts the latency overhead of each switch hop to the double-digit nanosecond (ns) range.

It includes KV (Key-Value) cache processing, a bottleneck in generative AI inference, within hardware acceleration circuits inside the switch.

It received a CES 2025 Innovation Award and has raised 80 billion won in Series A funding, at an enterprise value of more than 340 billion won.

It is currently undergoing validation tests with global big tech companies.

MetisX: Chips That Compute Within Memory

MetisX, comprised of former architects from Samsung Electronics and SK Hynix, posed a different question.

What if we computed directly within memory instead of bringing data to CPUs or GPUs to compute?

This concept, called ‘Computational Memory,’ has been realized on the CXL 3.0 architecture.

It integrates vector processing engines and data compression/filtering circuits within the controller chipsets of CXL memory modules.

Large-scale data filtering required for AI vector similarity calculations or RAG (Retrieval Augmented Generation) tasks is completed within the memory device, and only the processed results are sent to the processor.

The amount of data traveling back and forth on the bus is fundamentally reduced.

Processor heat generation is reduced, and data center cooling costs are lowered.
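The saving in bus traffic can be estimated by counting bytes. The sketch below compares sending every candidate vector to the processor with sending only a query in and a short result list back. The dataset size, vector width, and result format are hypothetical values, not MetisX specifications.

```python
# Bytes crossing the bus: filter on the host versus filter inside memory.
# All sizes are hypothetical and chosen only to show the order of magnitude.


def host_side_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Every candidate vector travels to the CPU or GPU to be scored there."""
    return num_vectors * dim * bytes_per_value


def near_memory_bytes(top_k: int, query_dim: int,
                      bytes_per_value: int = 4,
                      bytes_per_result: int = 8) -> int:
    """Only the query goes in and the top-k result records come back."""
    return query_dim * bytes_per_value + top_k * bytes_per_result


if __name__ == "__main__":
    host = host_side_bytes(num_vectors=1_000_000, dim=768)
    near = near_memory_bytes(top_k=10, query_dim=768)
    print(f"host-side filtering: {host:,} bytes")
    print(f"near-memory filtering: {near:,} bytes")
```

In this example the host-side path moves about 3 GB per query while the near-memory path moves about 3 KB. The compute still has to happen somewhere, so the gain is in traffic, not in total work.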

It has raised 60 billion won in Series A and is conducting joint engineering with global cloud companies.

DinoTisia: Semiconductor Dedicated to Vector Databases

Key to RAG (Retrieval Augmented Generation) and long-term memory implementation in generative AI is the vector database.

This involves converting text, images, and speech into high-dimensional numerical vectors and calculating their similarity.
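The similarity calculation itself is short. Below is a minimal sketch of cosine similarity and a brute-force top-k search in plain Python. The three-dimensional vectors are toy values, and a production vector database would use approximate indexes rather than scoring every entry.

```python
# Cosine similarity and brute-force top-k search over a tiny vector store.
# The vectors and labels are toy values for illustration.
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the k labels whose vectors are most similar to the query."""
    ranked = sorted(vectors.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [label for label, _ in ranked[:k]]


if __name__ == "__main__":
    store = {
        "doc-cats": [0.9, 0.1, 0.0],
        "doc-dogs": [0.8, 0.3, 0.1],
        "doc-tax": [0.0, 0.2, 0.9],
    }
    print(top_k([1.0, 0.0, 0.0], store, k=2))
```

Real embeddings have hundreds or thousands of dimensions, which is why scoring millions of them becomes a memory-bandwidth problem rather than a compute problem.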

DinoTisia is commercializing VDPU (Vector Data Processing Unit) designed exclusively for this task, and its ‘Seahorse’ platform for cloud control.

The VDPU is fully aligned with CXL interconnect logic, and they are pursuing IP licensing sales to multinational SoC manufacturers and board design companies.

OpenEdge Technology: A Company Thriving on IP

There’s a way to benefit from the semiconductor ecosystem without selling chips directly: selling design assets (IP).

OpenEdge Technology is a listed IP specialist company that supplies on-chip interconnect IP, memory controller IP, and PHY IP to the global market.

They possess the DDR5 and LPDDR5X IP that is essential for domestic and international foundries aiming to mass-produce CXL-based controller chipsets.

This structure generates royalties with every chip manufactured. Analysts see 2025-2026 as the inflection point.

6. Those Who Control Quality

One thing is as important as making chips: verifying that the manufactured chips function correctly.

Every time a new standard emerges, a new market for testing equipment opens up.

Existing DDR5 testing equipment cannot properly test CXL modules. The protocols are different, the speeds are different, and the required test conditions are different.

Neosem is a company that has proactively filled this void.

In 2023, they commercialized the industry’s first CXL 1.1 standard testing platform.

They didn’t stop there and completed the localization of high-speed, high-temperature burn-in test equipment for the CXL 2.0 specification, exclusively supplying the first commercial equipment to Samsung Electronics.

Neosem’s testing method is differentiated by the extremity of its operating conditions.

While the memory is actively operating, extreme thermal stress is applied by raising and lowering the chamber temperature to extremes, and defects are tracked in real-time.

Chips that operate normally at room temperature may exhibit errors under temperature changes. These errors are caught before shipment.

The global semiconductor testing market was dominated by Teradyne in the US and Advantest in Japan. Neosem succeeded in delivering mass production testing equipment for the new CXL standard before these two giants.

Exicon is focusing on developing CXL 3.0 specialized memory modules and protocol analysis equipment using Samsung Electronics Foundry’s 4nm process.

Based on its PCIe interface analysis patents, it is expanding its customer base by establishing a branch in Silicon Valley.

Fadu is conducting field tests of controller chipsets compatible with CXL physical protocols and FPGA-based proof-of-concept cards.

Qualitas Semiconductor has secured PHY design assets compatible with PCIe 6.0 and CXL 3.1 that deliver speeds of 64GT/s or higher without physical noise, and is conducting early verification with global system semiconductor companies.

7. Without a Substrate, the Chip is Adrift

A semiconductor chip is not a complete product on its own.

It only becomes operational when it is placed on a foundation called a substrate.

CXL high-speed signals place special demands on the substrate. To pass signals of 64GT/s or higher without distortion, the substrate material and stacking design must differ from standard server substrates.

TLB is a company that has proactively secured design and manufacturing process technologies for high-frequency, lossless signal transmission layers for CXL modules and next-generation low-power LPCAMM modules.

To meet the rapidly increasing global demand, it has brought its second factory in Vietnam into early operation and achieved single-digit ppm defect rates with AI-controlled automated robots. It is a key collaborative partner for both Samsung Electronics and SK Hynix.

Daeduck Electronics is a global top-tier packaging board manufacturer that has completed investments in expanding its FC-BGA (Flip-Chip Ball Grid Array) large packaging board capacity.

As finer bump pitches and higher layer counts push board prices sharply upward, it is positioned to benefit more as the technological barriers rise.

8. The Real Fight Starts Here

One might think the technology is complete. There are chips, switches, testing equipment, and substrates. Standards are also defined.

So why isn’t CXL yet mainstream in the server market?

There are several reasons, but one lies in an unexpected area: the cost of software design tools.

EDA (Electronic Design Automation) software, monopolized by multinational corporations like Synopsys and Cadence, is an essential tool for semiconductor design.

License fees can range from hundreds of millions to billions of won annually.

Even if startups like Panmnesia, MetisX, and DinoTisia develop world-first technologies, they must continue to use this software to design the next generation of chips. For startups that haven’t generated significant mass production revenue, this cost is financial suffocation.

This is not a technological problem. It’s a structural problem.

This is why what the Ministry of Science and ICT and the Ministry of SMEs and Startups should be doing now is not just dividing and distributing government subsidies.

The tax credit limit for EDA software procurement costs needs to be significantly expanded, and a long-term lease support guarantee fund needs to be operated on a standing basis.

Institutional mechanisms are needed to prioritize MPW (Multi-Project Wafer) slots at Samsung Electronics and SK Hynix foundries for strong domestic fabless companies.

Without these, teams that develop world-first technologies will repeatedly be sold to foreign capital at the mass production stage, or absorbed by foreign fabless companies.

New rules are being written regarding how humanity will design AI infrastructure.

Korean companies are at the table where those rules are being written. For the first time.

9. Beyond the Memory Wall

There was a problem named 30 years ago. The memory wall.

Attempts to overcome that wall have continued. Faster DRAM was made, more channels were connected, and higher-density chips were stacked. Yet, the wall did not disappear.

The architecture itself was creating that wall.

CXL is an attempt to change the architecture. It breaks down the boundaries between memory and processors, creates physical lines where reads and writes can flow simultaneously, and transforms memory tied to fixed nodes into shared resources.

What kind of world will emerge when this change is complete?

Not faster AI. A world where more AI is possible with less electricity.

If the real barrier to AI infrastructure expansion today is not land but the power grid, and if a significant portion of that power waste comes from the time data spends waiting, then the problem CXL solves is not a technical problem. It’s an energy problem.

If data can be moved more efficiently, AI can be provided to more people at a lower cost.

A portion of the response speed and subscription cost of the AI services you are using now comes from this direction change penalty on the memory bus.

So, a question remains.

For 30 years, we’ve been running towards ‘stronger processors.’ The wall was always elsewhere. Where is the next wall? And how long will it take us to discover it?

References

CXL Consortium Technical Working Group. CXL 2.0 and 3.0 Specification Standard White Papers (Compute Express Link Specification). Compute Express Link Official Technical Document Repository. 2022–2024.
KAIST Collaborative Research Center for Next-Generation Memory Systems. Performance Analysis of Memory Hierarchy and CXL Interconnect for Next-Generation High-Bandwidth Computing. Journal of the Korean Institute of Electrical Engineers. 2023.
Samsung Electronics Memory Business Unit New Business Strategy Team. CMM-D Product Family Technology Roadmap and Value Assessment for Overcoming Data Center Memory Bottlenecks. Samsung Semiconductor Technical White Paper Series. 2022–2024.
ZeroPoint Technologies; Rambus Joint Analysis Research Team. Total Cost of Ownership (TCO) and Power Efficiency Model Analysis in CXL Memory Pooling Environments. IEEE Computer Architecture Letters. 2024.
Noh Geun-chang et al. Analysis Report on Korean System Semiconductor Companies’ Preemption of the Heterogeneous Computing Paradigm. Hyundai Motor Securities Industry Analysis Report. 2024–2025.
SK Hynix Solution Development Center Software Lab. HMSDK and Open-Source Control Stack Technology Trends for Heterogeneous Architectures. SK Hynix Technical Seminar Presentation Materials. 2026.
Wulf, Wm. A.; McKee, Sally A. Hitting the Memory Wall: Implications of the Obvious. ACM SIGARCH Computer Architecture News. 1995.
NVIDIA Corporation. Vera CPU Architecture Overview and CXL Integration Roadmap. NVIDIA GTC Technical Proceedings. 2025.
Intel Corporation. 5th Gen Intel Xeon Processor Platform: CXL 2.0 Implementation Guide. Intel Developer Zone Technical Documentation. 2024.
AMD Corporation. EPYC Turin Architecture: Memory Subsystem and CXL Controller Design. AMD Developer Resources. 2024.
Google Infrastructure. Hyperscale Memory Disaggregation with CXL: Early Deployment Data from Production Workloads. Google Research Technical Report. 2025.
Rambus Inc. PCIe 6.0 Physical Layer Controller for CXL 3.1: Performance and Power Characterization. Rambus Product Technical Documentation. 2024.
Panmnesia. PCIe 6.4-CXL 3.2 Fusion Switch: Low-Latency Composable Memory Architecture. CES 2025 Innovation Award Technical Submission. 2025.
MetisX. Computational Memory Architecture on CXL 3.0: Near-Memory Processing for AI Vector Workloads. Hot Chips Symposium Proceedings. 2025.
Neosem. CXL 2.0 Module Burn-in Test Solution: Field Validation Report for Next-Generation Memory Qualification. Semiconductor Equipment Technology Conference Presentation Materials. 2024.