WHAT IS THE RPU? →
⬡ SILICON-PROVEN IP 99.998% CPU Cycle Reduction Zero False Wake-Ups 2 clk Deterministic Wake-Up 625 MHz TSMC 65nm · 0 ps WNS 2,960 Standard Gates 15× Toggle Reduction · FPGA 0.014 mW SKY130 Leakage 1.70 mW TSMC 65nm Power 5,000,004 Validated Cycles 🏆 D&R Product of the Week
[Diagram: RPU reflex layer. Signals: clk, rst_n, data[11:0], valid, wake_en, alert]
D&R 🏆 Product of the Week Design & Reuse · 36,000 engineers

Stop burning cycles
on unchanged data.
Start sensing time. End the Stagnation Tax.

The RPU is a 2,960-gate synthesizable hardware block that monitors the temporal rate of change (ΔC/Δt) of your data and keeps your processor in sleep mode until something real happens. No software. No PMU. No redesign. RTL on GitHub. ASIC-ready. Can be evaluated on FPGA and STM32 today — no production silicon required.

99.998% CPU Cycle Reduction · Measured · RISC-V
2 clk Wake-Up Latency · Deterministic · always
2,960 Standard Gates · TSMC 65nm · fits any SoC
Real measurement: RISC-V Ibex SoC
Conventional: 5,000,000 CPU cycles
With RPU: 125 CPU cycles
Reduction: 99.998% (measured)
Stable sensor data. Zero false wake-ups. 2-cycle deterministic wake-up on real events.

Your IP block, your architecture, your process node — none of it matters. If your system processes data, the RPU works. Zero redesign. Your existing system stays exactly as it is.

2 INPUTS
in_data[11:0]: sensor / ADC output
in_valid: data valid strobe
RPU · 2,960 gates
1 OUTPUT
wake_en → irq_external_i (RISC-V) or NVIC line (ARM Cortex-M)

TSMC 65nm · 625 MHz · 0 ps slack · 1.70 mW · 2,960 gates · RISC-V validated

Free 30-day technical evaluation. If it doesn't win — no invoice.

Listed on Design & Reuse · Product of the Week · 36,000 engineers

RPU Microelectronics: our architecture was benchmarked by TÜRKPATENT against IBM (US11144718B2) and HP (US8450711B2). No prior art contradiction found. Formal search report.

Does this sound familiar?
You already know
this problem exists.

Your CPU is running right now. The data it is processing has not meaningfully changed in the last ten milliseconds. You already know this — you have seen it in every project you have shipped. You have measured it with power profilers, you have watched the waveforms, you have looked at the idle-time graphs.

DVFS helps a little. Clock gating helps a little. Interrupt comparators help a little. Smart sensors help a little. You have tried all of them, combined them, tuned them. And yet your processor still spends most of its operational life confirming stagnation. Your battery budget knows it. Your thermal envelope knows it. The CFO asking about data center power bills knows it.

The question is not whether this waste exists. The question is whether it must.

We spent years asking a different question. Not how to make the polling faster, or the sleep deeper, or the DVFS smarter. We asked: what if the hardware itself could decide — before any software runs, before any interrupt fires — whether data was worth processing at all? The answer became the RPU. 2,960 gates, one clock cycle, no software, no PMU. The rest of this page shows the silicon that proves it works.

Why RPU
The problem every processor shares.
And why it is finally solvable.

For 70 years, the computing industry has been building faster processors. Faster clock speeds. Wider pipelines. More cores. But nobody asked the most important question: should the processor be running at all? Right now, as you read this, billions of CPUs are executing polling loops — reading sensor data, comparing to previous values, deciding nothing has changed, repeating. Billions of times per second. For data that has not moved.

This waste happens in every data center that consumes 200 terawatt-hours annually. It happens in every medical implant that drains its battery confirming a stable heartbeat. It happens in every radar system, every autonomous vehicle, every smartphone. Until now, this waste has been considered unavoidable. It is not.

Software PMUs are fundamentally flawed: a CPU cannot sleep until it decides to sleep — but deciding requires being awake. This circular dependency is the Von Neumann Stagnation Tax. The RPU breaks it at the gate level.

The RPU is a small, synthesizable hardware block — just 2,960 gates — that eliminates this waste at the physical level. It monitors the temporal rate of change (ΔC/Δt) of any incoming data stream. When data is stagnant, the RPU holds your processor in sleep mode and actively suppresses clock and power activity. When data changes meaningfully, the RPU wakes the processor in exactly 2 clock cycles. Always 2. Never more.

⚠ Conventional Polling
CPU runs full polling loop every clock cycle. Reads data. Compares. Decides nothing changed. Repeats. Forever.
100% active — data unchanged
VS
✓ RPU Active
CPU sleeps in WFI. RPU monitors ΔC/Δt at gate level. Data unchanged → stays asleep. Change → wakes in 2 clock cycles.
99.998% suppressed — wakes only on change
What "Reflexive-Ready" Means

We call any IP block "Reflexive-Ready" the moment the RPU is connected to it. This includes RISC-V cores, signal processors, neural accelerators, legacy peripherals, custom ASICs — anything that processes data. The IP block itself is not modified in any way. Its original function, its interfaces, its firmware all remain exactly as they were.

What changes is the energy behavior. The IP block stops waking up for data that has not meaningfully changed. Your existing IP catalog gains autonomous energy isolation without a single line of RTL modification on your side.

The RPU connects in parallel — it does not sit in your critical data path. If removed or bypassed, your system reverts to conventional polling with zero latency difference and zero data loss. The worst case is a system that runs exactly as before.
Common Question — "We already have smart sensors and DMA controllers."

Smart sensors and DMA controllers are excellent tools, and we are not trying to replace them. But the difference is where the decision happens and what it actually suppresses.

When a smart sensor detects activity, it raises an interrupt. The DMA moves data to memory. Eventually the CPU processes it. Even when the data is stagnant, the clock tree is still running, the bus is still toggling, and buffers are still switching. Energy is consumed on infrastructure activity, not on useful work.

The RPU operates before the data reaches the bus. It works at the gate level, in hardware, with zero software involvement. When the data is unchanged, the entire downstream logic is suppressed — clock trees stop toggling, buses stop switching, buffers hold their state. You are not just avoiding CPU wake-ups; you are suppressing the physical switching activity of your entire pipeline at its source.

Our 99.998% number is not a load reduction estimate or a marketing figure. It is a direct count of suppressed CPU cycles, measured in a RISC-V Ibex SoC integration over 5,000,004 simulation cycles with Verilator.

Common Question — "We already have WFI and interrupts."

WFI combined with interrupts is a good system, and for many applications it is more than sufficient. But it has three specific limitations that the RPU addresses, and these limitations matter when you are trying to build energy-efficient or real-time systems.

First, standard interrupts fire on every signal transition. Your CPU wakes for noise, drift, and meaningless fluctuations — then goes back to sleep. Every false wake costs energy, and in battery-powered systems this adds up quickly. The RPU applies a rate-of-change threshold in hardware, so noise and drift are suppressed before they ever reach the interrupt pin.

Second, ARM Cortex-M interrupts require 15 to 20 clock cycles minimum before a single line of firmware runs. That time is spent on context saving, pipeline flushing, and ISR entry. The RPU asserts wake_en in exactly 2 clock cycles, every time, because its decision path is purely combinational and executes no instructions.

Third, interrupt timing is non-deterministic. The CPU might be executing another task, which means your wake-up latency has jitter. The RPU is hardware-native, so 2 cycles is guaranteed regardless of what the CPU is doing.

The RPU removes the software decision layer entirely, and 2 cycles is what remains. You cannot make wake-up faster without removing the decision itself. It is the minimum for a wake-up that still makes a decision.

Architecture protected at the principle level, not just the code. IP & Licensing ↗
ZERO RISK
Remove RPU → system reverts to polling. No degradation. No data loss. No latency difference. Absolute fail-safe.
ZERO REDESIGN
2 input signals. 1 output signal. One RTL instantiation. Nothing else in your system changes — no interface modifications, no firmware rewrites.
FREE EVALUATION
Run RPU on your own architecture, with your own data, on your own benchmarks. If it doesn't outperform polling, you owe nothing.
ANY ARCHITECTURE
RISC-V, ARM Cortex-M, GPU, NPU, fully custom. TSMC 65nm, SKY130, any standard CMOS node. Radar, medical, automotive, data center — same 2 wires.
The Challenge

Add the RPU to any IP block you own.
Run it on your own benchmarks.

Give us 3 minutes. Clone the RTL. Run the testbench. See 99.998% on your own machine. No email. No waiting.

If the RPU doesn't outperform your current implementation — we don't invoice. No fee, no hidden cost, no obligation.

The RTL is open: clone it from GitHub and run the testbench to reproduce the results. For the C-HAL driver, ASIC PPA reports, and commercial licensing, reach out to us directly.

✓ RTL open on GitHub
✓ C-HAL via request
✓ Zero obligation
✓ 48-hour response
⌥ Clone RTL on GitHub ↗ See the silicon proof ↓

You are paying a power tax to thin air right now. If the RPU eliminates it, we invoice a fraction of what we saved you. If it doesn't — we don't invoice at all. We will always ask for less than the value we save you.

Silicon Proof
Four independent layers
of real hardware evidence.

The video above shows a live hardware test on a Nexys A7 FPGA board. A single sensor feeds the same data stream into two parallel circuits running simultaneously. Both circuits drive a red LED, but they behave in fundamentally different ways.

The first LED represents the conventional system. It stays lit continuously throughout the recording, because the traditional threshold circuit is always active regardless of whether anything is happening. It burns energy while waiting for an event that may never come.

The second LED represents the RPU. At the start of the recording, it does not light up at all, because no meaningful data change is occurring. Only when real light reaches the sensor does this LED turn on — and it switches off the moment the light stops. Throughout the entire recording, there is only ambient conversation in the room, no actual light event. As you can see, the RPU circuit was effectively idle this entire time while the conventional circuit kept burning power. In the real world, systems spend most of their lives exactly like this — watching data that is going nowhere.

Vivado Power Analyzer — Measured Results
Metric | Conv. (AT) | RPU | Result
Signal Rate | 7.04 Mt/s | 0.47 Mt/s | ≈ 15×
Toggle Rate | 17.5% | ≤ 12.5% | Lower
Mode | Always-on | Event-driven | Idle ✓

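Dynamic power follows P = α·C·V²·f. With capacitance, supply voltage, and clock frequency held the same, a 15× reduction in signal rate on the measured nets translates into roughly 15× less dynamic power on those nets. This is physics, not interpretation.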
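As a worked version of that ratio, here is a short derivation using only the two signal rates from the Vivado table. It assumes the reported signal rate (transitions per second) stands in for the product α·f, and that C and V are identical for both circuits. Both are assumptions of this sketch, not statements from the power report.

```latex
\[
P_{\text{dyn}} = \alpha\, C\, V^{2} f
\quad\Longrightarrow\quad
\frac{P_{\text{conv}}}{P_{\text{RPU}}}
= \frac{(\alpha f)_{\text{conv}}\; C\, V^{2}}{(\alpha f)_{\text{RPU}}\; C\, V^{2}}
= \frac{7.04\ \text{Mt/s}}{0.47\ \text{Mt/s}}
\approx 15
\]
```

The ratio covers dynamic power on the measured nets only. Static (leakage) power is not part of this expression.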

RPU vs. Alternatives
Measured · Not estimated · Not modeled
Method | CPU required? | Latency | Deterministic | False wake-ups | RTL changes
Interrupt / WFI | Yes | 15–20 clk | No | Common | None
DMA | Yes | Medium | No | Common | Medium
PMU / DVFS | Yes | ms range | No | Common | High
Smart Sensor | Yes | Variable | No | Reduced | Medium
RPU ✓ | No | 2 clk | Yes | Suppressed | None

Below are four independent evidence layers. Every number comes from a reproducible source: Cadence Genus synthesis reports, Vivado power analysis on a real FPGA board, or Verilator simulation of the Ibex SoC.

TSMC 65nm GP · Layer 1

625 MHz. 0 ps slack. 0 violations.

Cadence Genus synthesis. 2,960 gates. 1.702 mW average power. Full timing closure. Technology-node proven at production-grade foundry.

1.70mW
SkyWater SKY130 · Layer 2

Same RTL. Different node. Technology portable.

100 MHz operation. 0.014 mW leakage (about 0.8% of the 1.70 mW TSMC 65nm figure). Confirms the architecture is not node-specific: it is synthesizable on any standard CMOS process.

0.014mW
FPGA · Nexys A7-100T · Layer 3

Real sensor. Real hardware. Direct comparison.

Vivado Power Analyzer. RPU versus conventional always-on threshold circuit under identical sensor input. 15× reduction in signal toggle rate — direct proportional reduction in dynamic power (P = α·C·V²·f).

15× toggle
RISC-V Ibex · 5,000,004 Cycles · Layer 4

Three real-world workload scenarios. 2 clk wake-up always.

lowRISC Ibex SoC testbench via Verilator. CPU enters WFI sleep; RPU wake_en connects directly to irq_external_i. Wake-up latency: 2 clock cycles across all scenarios.

Scenario | Polling | RPU | Result
Stable + noise | 5M | 125 | 99.998%
Sudden spike | 5M | 338 | 99.993%
Slow drift + anomaly | 5M | 1.49M | 70.3%
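The Result column can be reproduced from the two cycle counts in each row. The helper below is a small sketch of that arithmetic. The function name is ours and is not part of the benchmark code.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of polling cycles that the RPU run avoided:
 * 100 * (polling - rpu) / polling.
 * Precondition: polling_cycles > 0 and rpu_cycles <= polling_cycles. */
double cycle_reduction_percent(uint64_t polling_cycles, uint64_t rpu_cycles)
{
    assert(polling_cycles > 0);
    assert(rpu_cycles <= polling_cycles);
    return 100.0 * (double)(polling_cycles - rpu_cycles)
                 / (double)polling_cycles;
}
```

With 5,000,000 polling cycles, 125 RPU cycles gives 99.9975%, which the table rounds to 99.998%, and 338 cycles gives 99.993%. The third row lists a rounded 1.49M, which yields about 70.2% with this formula. The published 70.3% presumably comes from the unrounded count.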
Architecture Principle
δ = |avg_new − avg_old| (bit-shift · O(1) · no divider)
→ δ > θ ? (threshold check)
→ wake_en (1 clock cycle · combinational)
Decision path from metric computation to clock/power gate control is fully combinational within one clock cycle. No program counter. No instruction memory. No bus communication. The processor does not decide to sleep — the hardware decides for it.
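To make the principle concrete, here is a minimal software model of the decision in C. It is a sketch under stated assumptions, not the RTL: we assume the 32-sample window is split into an older and a newer half, that each half is averaged with a right shift, and that wake_en corresponds to δ > θ. The names rpu_model_t, rpu_model_init, and rpu_model_step are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RPU_DEPTH     32u               /* window length (assumed split in two halves) */
#define RPU_HALF      (RPU_DEPTH / 2u)
#define RPU_HALF_LOG2 4u                /* log2(RPU_HALF): divide by 16 with a shift */

static_assert((RPU_DEPTH & (RPU_DEPTH - 1u)) == 0, "DEPTH must be a power of two");

typedef struct {
    uint16_t window[RPU_DEPTH]; /* ring buffer of 12-bit samples */
    uint32_t head;              /* next write slot; after a write, the oldest sample */
    uint32_t count;             /* samples seen so far, saturating at RPU_DEPTH */
    uint16_t threshold;         /* theta */
} rpu_model_t;

void rpu_model_init(rpu_model_t *m, uint16_t threshold)
{
    memset(m, 0, sizeof *m);
    m->threshold = threshold;
}

/* Feed one valid sample. Returns true when the modelled wake_en would assert. */
bool rpu_model_step(rpu_model_t *m, uint16_t sample)
{
    m->window[m->head] = sample & 0x0FFFu;
    m->head = (m->head + 1u) & (RPU_DEPTH - 1u);
    if (m->count < RPU_DEPTH) m->count++;
    if (m->count < RPU_DEPTH) return false;      /* window not yet full */

    uint32_t sum_old = 0, sum_new = 0;
    for (uint32_t i = 0; i < RPU_HALF; ++i) {
        sum_old += m->window[(m->head + i) & (RPU_DEPTH - 1u)];
        sum_new += m->window[(m->head + RPU_HALF + i) & (RPU_DEPTH - 1u)];
    }
    uint32_t avg_old = sum_old >> RPU_HALF_LOG2;  /* bit-shift average, no divider */
    uint32_t avg_new = sum_new >> RPU_HALF_LOG2;
    uint32_t delta = (avg_new > avg_old) ? avg_new - avg_old : avg_old - avg_new;
    return delta > m->threshold;                  /* delta > theta */
}
```

In this model a stream that alternates between 1000 and 1001 never wakes the CPU, because both half-window averages shift down to the same value. A single jump to 2000 raises the newer average enough to cross a threshold of 8 on the first sample.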
"Stop processing data. Start sensing time."
The founding principle of the RPU architecture
Every Sector. Same 2 Wires.
Every technology company. Every sector.

We have presented the RPU to engineers from defense, automotive, medical, data center, edge AI, industrial, space, and consumer electronics. Every single one of them asked the same first question: does this work in my sector?

The answer is always yes, and the reason is simple. The RPU does not care what your data represents. It has no idea whether the stream is radar returns, glucose readings, LiDAR frames, vibration signals, audio samples, or video pixels. From its perspective, these are all just numbers changing at different rates. If they are stagnant, the processor sleeps. If they change, the processor wakes. Same two wires. Same two-cycle response. Same 2,960 gates. Every sector, every architecture, every process node.

Defense

Radar & SIGINT

Suppress stagnant radar returns. Wake on genuine target detection.

Details →
ADAS

Autonomous Systems

LiDAR/camera frame suppression when vehicle stationary or scene unchanged.

Details →
IoT · Medical

Battery-Powered

WFI until genuine change. Standby from days to years.

Details →
Data Center

SmartNIC / DPU

Suppress unchanged telemetry before host CPU interrupt triggers.

Details →
Edge AI

Inference Pipeline

Stagnant tensors suppressed before GPU/TPU compute cycles consumed.

Details →
Industrial

Predictive Maintenance

Vibration/acoustic nominal 99% of time. Wake in 2 cycles on anomaly.

Details →
Space

Rad-Tolerant Edge

Deterministic 2 clk. Combinational path. No program counter.

Details →
Consumer · Wearables

Always-On Devices

Your phone listens 24/7. Your watch tracks your heart. RPU keeps the processor asleep until something actually happens.

Details →
"The cheapest computation is the one that never occurs."
Integration
3 steps. 1 signal out.

We designed the RPU for fast integration. If you can add a standard SystemVerilog file to your project and route three signals, you can deploy it. There is no custom tooling, no proprietary bus, no licensed compiler. It works in Vivado, Quartus, Synopsys Design Compiler, Cadence Genus, and every other synthesis flow we have tested. Integration is a non-intrusive parallel connection — a single afternoon, not weeks.

The three steps below are the complete integration process. Nothing is hidden, nothing comes later. Your architecture does not matter — ARM Cortex-M, RISC-V, GPU, NPU, fully custom — it all works the same way.

What changes
When your processor wakes — only on genuine data change, in exactly 2 clock cycles
How much energy burns while waiting — near zero instead of full polling rate
Toggle rate on your clock tree — 15× reduction on FPGA, proportionally on ASIC
What does not change
Your ISR and firmware logic
Your sensor interface and data path
Your existing IP blocks
Your bus topology and interconnect
Your application software
Can't tape out today?

Test it in software first.

You do not need to wait for a tape-out to evaluate the RPU. We provide a lightweight C-HAL library — rpu.c and rpu.h — upon request. It runs the exact same ΔC/Δt decision logic on any existing microcontroller with a C compiler. STM32, ESP32, ARM Cortex-M, RISC-V — every platform we have tested works today.

Yes, the CPU still wakes up to run the C-HAL. But that is not the point. The point is what happens next: if the data has not changed, the C-HAL returns immediately and your heavy downstream workload never runs: no FFT, no inference model, no wireless TX, no sensor fusion pipeline. The CPU wakes, checks, and goes straight back to sleep. Your expensive compute never fires.

This gives you measurable downstream energy savings today, while you prove the math on your own data. When you move to the hardware RTL, the CPU never wakes at all — that is the final step. The API is identical between both versions: swap the backend, keep every line of your application code.

Software version: CPU wakes, heavy work skipped. Hardware version: CPU never wakes. Same ΔC/Δt logic. Same API. Two levels of savings.
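The software pattern can be sketched in a few lines of C. This is not the rpu.h API, which is available only on request. Every name below (sw_gate_t, sw_gate_process, count_heavy) is hypothetical, and the change test is a simple single-reference comparison standing in for the C-HAL's ΔC/Δt logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callback for the expensive downstream stage (FFT, inference, radio TX, ...). */
typedef void (*heavy_stage_fn)(uint16_t sample);

/* HYPOTHETICAL gate state; a stand-in for whatever the real C-HAL keeps. */
typedef struct {
    uint16_t reference;  /* last value accepted as a real change */
    uint16_t threshold;  /* minimum |sample - reference| treated as a change */
} sw_gate_t;

/* Example heavy stage that only counts how often it was invoked. */
static unsigned heavy_runs = 0;
static void count_heavy(uint16_t sample) { (void)sample; heavy_runs++; }

/* Runs `heavy` only when the sample moved past the threshold.
 * Returns true if the heavy stage ran, false if it was skipped. */
bool sw_gate_process(sw_gate_t *gate, uint16_t sample, heavy_stage_fn heavy)
{
    assert(gate != NULL && heavy != NULL);
    uint16_t diff = (sample > gate->reference)
                        ? (uint16_t)(sample - gate->reference)
                        : (uint16_t)(gate->reference - sample);
    if (diff <= gate->threshold) {
        return false;        /* stagnant data: return at once, skip heavy work */
    }
    gate->reference = sample;
    heavy(sample);
    return true;
}
```

The caller goes back to sleep whenever the function returns false, which is the "wakes, checks, and goes straight back to sleep" path described above.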

If at any point you want to remove the RPU from your system, simply disconnect the wake_en line. Worst case: remove the RPU, system reverts to conventional polling. Zero difference.

SENSOR → RPU → CPU / GPU / NPU
For every technology company. Every SoC.
Live Signal Flow: ΔC/Δt Decision
SENSOR DATA in_data[11:0] → RPU (watching, Δ = 0) → wake_en → CPU (sleeping)
Unchanged data → CPU stays asleep → zero energy wasted
3 Steps to Integration
01 · Add the RTL file to your project.
Download the RTL from GitHub ↗. The package includes rpu_core.sv, a simulation testbench, and a post-synthesis testbench with SDF annotation. Drop the RTL into your source directory. In Vivado: Add Sources → Add Files. In Quartus: Project → Add/Remove Files in Project. In Synopsys/Cadence: add to your filelist. Set DEPTH (default 32) and DATA_WIDTH (default 12-bit) as parameters; no RTL modification is required.
02 · Connect 2 input signals.
Tap in_data[11:0] and in_valid from your existing sensor or ADC output. These are read-only taps: no changes to your existing signal routing, no bus modifications, no interface redesign.
03 · Connect 1 output signal.
Route wake_en to your processor interrupt pin. RISC-V: irq_external_i. ARM Cortex-M: any NVIC line. Custom: any level-triggered interrupt input. No firmware changes. The CPU sees a standard external interrupt. Done.
Optional: C-HAL runtime tuning. Adjust threshold, depth, and adaptive mode at runtime via memory-mapped registers. C99, no malloc, zero dynamic allocation. No re-synthesis required when operating conditions change.
Optional: Guardian Sideband (Module 107). Independent monitoring on an ungated clock. Reports last_delta, active_threshold, and alert_status even when the main clock is gated. Essential for watchdog compliance in defense and safety-critical applications.
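For readers who have not worked with memory-mapped tuning registers, the sketch below shows the general shape of such a runtime update in C99. The register layout, field names, value ranges, and the function rpu_configure are all hypothetical. The real C-HAL register map is not published on this page and may differ in every detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL register block: names, order, and widths are assumptions.
 * On a target, a pointer to this struct would be set to the block's base
 * address; here it can point at ordinary memory for testing. */
typedef struct {
    volatile uint32_t threshold;   /* theta, compared against delta */
    volatile uint32_t depth_log2;  /* window depth expressed as log2(samples) */
    volatile uint32_t adaptive;    /* 1 = adaptive threshold mode, 0 = fixed */
} rpu_regs_t;

/* Writes a new configuration. No dynamic allocation is used.
 * Returns 0 on success, -1 if a value is outside the range this sketch assumes. */
int rpu_configure(rpu_regs_t *regs, uint16_t threshold,
                  uint8_t depth_log2, int adaptive)
{
    assert(regs != NULL);
    if (threshold > 0x0FFFu) return -1;                 /* assumed 12-bit data path */
    if (depth_log2 < 1u || depth_log2 > 5u) return -1;  /* assumed 2..32 samples */

    regs->threshold  = threshold;
    regs->depth_log2 = depth_log2;
    regs->adaptive   = adaptive ? 1u : 0u;
    return 0;
}
```

Validating the values before any write means a rejected call leaves the previous configuration untouched.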
Fail-Safe: Remove the RPU block entirely → your system reverts to conventional polling with zero latency difference and zero data loss.
❌ Without RPU
5,000,000
CPU cycles · stable data · 5M sim
CPU runs a polling loop every clock cycle. Always on. Always burning energy for nothing.
✗ Software polling loop
✗ Continuous CPU activity
✗ Energy wasted on nothing
VS
✓ With RPU
125
cycles used · 99.998% reduction
CPU stays in deep sleep. Wakes in exactly 2 clock cycles when data changes past threshold.
✓ Hardware decision in 1 clk
✓ CPU wakes in 2 clk
✓ 99.998% cycle reduction
RTL Instantiation
// Single instantiation: 560 lines · SystemVerilog IEEE 1800-2017
rpu_core #(
    .DATA_WIDTH     (12),    // match your sensor/ADC width
    .DEPTH          (32),    // sliding window: power-of-two, even
    .USE_DYNAMIC_TH (1'b1)   // adaptive threshold, recommended
) u_rpu (
    .clk      (sys_clk),
    .rst_n    (sys_rst_n),
    .scan_en  (1'b0),        // tie low; connect to scan chain in DFT mode

    // Read-only taps: nothing upstream changes
    .in_data  (sensor_data),
    .in_valid (data_valid),

    // Connect to your interrupt controller
    .wake_en  (irq_external_i),  // RISC-V: irq_external_i | ARM: any NVIC line

    // Leave unconnected on first integration
    .rpu_event_pulse     (),
    .rpu_state           (),
    .gclk_out            (),
    .full_status         (),
    .delta_abs_dbg       (),
    .threshold_dbg       (),
    .guardian_alert      (),
    .guardian_last_data  (),
    .guardian_last_delta (),
    .guardian_last_th    ()
);

// Decision path: combinational, 1 clock cycle, no instruction execution
// delta = |avg_new - avg_old| via bit-shift, no hardware divider
// Fail-safe: remove instantiation → system reverts to polling, zero impact
DEPTH Parameter
Sliding Window
Default 32. Larger = more smoothing, slower drift response. Smaller = faster response, noise-sensitive.
Decision Origin
Within the Cell
No external controller. No program counter. No bus communication. Combinational hardware reflex.
Fail-Safe
Transparent Pass-Through
Worst case: behaves as transparent pass-through. Zero degradation relative to system without RPU.
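The DEPTH trade-off noted above (larger window, slower response) can be illustrated with a small calculation. The sketch assumes the same two-half window model as elsewhere on this page: the older half sits at a baseline of 0, a step of a given height arrives, and the newer half is averaged with a right shift. The function name samples_to_wake and the model itself are our assumptions, not taken from the RTL.

```c
#include <assert.h>

/* Number of post-step samples needed before delta exceeds `threshold`,
 * for a window of `depth` samples (power of two, at least 2).
 * Returns 0 if the threshold is not crossed while the newer half fills. */
unsigned samples_to_wake(unsigned depth, unsigned step, unsigned threshold)
{
    assert(depth >= 2u && (depth & (depth - 1u)) == 0u);

    unsigned half = depth / 2u;
    unsigned shift = 0u;
    while ((1u << shift) < half) shift++;        /* shift = log2(half) */

    for (unsigned k = 1u; k <= half; ++k) {
        /* Newer half holds k stepped samples, the rest still at baseline 0. */
        unsigned delta = (k * step) >> shift;
        if (delta > threshold) return k;
    }
    return 0u;
}
```

Under these assumptions, with a threshold of 8 and a small step of 10, a window of 8 responds after 4 samples while a window of 32 needs 15. The same averaging that delays the response also divides a one-sample noise spike by a larger number, which is the smoothing side of the trade-off.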
RPU AI Assistant
Ask anything about RPU integration.

We trained a dedicated AI assistant on the full RPU technical corpus — the patent, the technical paper, the ASIC PPA reports, the RISC-V benchmark data, and the integration guide. Ask it anything about architecture, parameters, power analysis, RISC-V or ARM compatibility, or sector-specific deployment. It answers in plain English with direct references to the documents.

The assistant below handles common questions directly on this page. For deeper technical discussions, extended code reviews, or design trade-off analysis, open the full assistant in ChatGPT using the link at the bottom of this section.

RPU Technical Assistant GPT-Powered
Welcome! I'm the RPU Technical Assistant. Ask me about integration, parameters, power analysis, RISC-V compatibility, or sector-specific deployment. How can I help?
Documents
Download everything.
Verify it for yourself.

Every claim we make on this site is backed by a real document you can download and verify on your own. The ASIC PPA report comes directly from Cadence Genus synthesis. The RISC-V benchmark comes from lowRISC Ibex integration verified with Verilator. Our technical paper contains the full treatment of the architecture — download and verify every claim yourself. The patent search report is the official TÜRKPATENT novelty determination.

You can download each document individually below, or take the complete package as a single ZIP. We recommend starting with the executive summary if you are a decision maker, and the ASIC PPA report if you are an architect or engineer.

Complete Package
↓ Download ZIP
ASIC PPA report, RISC-V benchmark, technical paper, system schematic, FPGA demo, patent search report, executive summary.
Executive Summary
📄 Read Inline
Full overview opens in a reading pane — architecture, results, integration, licensing. No download, no waiting.
Open Reading Pane → or read the full white paper ↗
ASIC PPA Report
TSMC 65nm + SKY130
Cadence Genus · power, area, timing
↓ PDF
RISC-V Benchmark
Polling vs RPU · 3 Scenarios
lowRISC Ibex · Verilator · 5M cycles
↓ PDF
Technical Paper
Download Our Paper
Full architecture · 7 sections · 22 references
↓ PDF
System Schematic
Ibex + RPU Integration
Vivado · full SoC topology
↓ PDF
FPGA Demo Video
Live Hardware Comparison
Nexys A7 · LDR sensor · real-time
▶ Watch
Patent Search Report
TÜRKPATENT · TR 2025/012696
HP Y-code · IBM A-code · novelty
↓ PDF
RTL + Testbenches
Open on GitHub
RTL · simulation TB · post-synthesis TB (SDF)
⌥ Clone & Run ↗
What You Get
Everything you need. No surprises later.

When you begin evaluation, you receive the complete technical package. This is not a demo version or a limited preview. It is the full production-grade IP that we use for our own ASIC work. What you see below is exactly what arrives in your inbox.

Technical Package

Complete · secure
RTL + Testbenches
RTL · simulation TB · post-synthesis TB (SDF) · open on GitHub ↗
C-HAL Driver
C99 · threshold/depth/mode · via request
ASIC PPA
TSMC 65nm + SKY130
RISC-V + FPGA Data
Verilator · Vivado · Nexys A7
Integration Guide
Step-by-step · C-HAL

License Options

Flexible
What RPU saves you
Energy
99.998% cycle reduction. Lower power bill. Longer battery.
Time
No sleep-management code. No interrupt debounce. No power debug.
Speed
2 clk deterministic. No jitter. Real-time systems stay real-time.
We will always ask for less than the value we save you.
Evaluation
30 days. Free. No obligation.
Single Product
One tape-out · C-HAL included.
Portfolio
All your IP blocks · volume pricing.
Academic
Free for University Research
Integration Support
Co-design · direct engineering support.
What Happens After You Request

Within 48 hours of your request, we send the complete evaluation package via secure file transfer — typically the same day.

The evaluation runs for 30 days at no cost. During this period, our engineering team at RPU Microelectronics is available for direct technical support by email and video call. At the end of 30 days, you choose: license the RPU, extend the evaluation, or simply close the evaluation with no further obligation.

IP & Licensing ↗
vs. Alternatives
Every alternative needs something.
RPU needs nothing.

There are several established techniques for reducing processor energy consumption: DVFS, clock gating, interrupt comparators, and various research architectures. We respect all of them, and the RPU is not trying to replace them universally. But each one has a dependency the RPU does not have.

DVFS requires the operating system to participate in voltage and frequency decisions. Clock gating requires external control logic to decide when to gate. Interrupt comparators require the CPU to service the interrupt. Every existing approach puts the processor in the loop to manage its own power, and every interaction costs cycles and energy.

The RPU removes the processor from the decision entirely. The decision happens in hardware, at the gate level, before the CPU is ever aware that data has arrived. This is the only way to guarantee 2-cycle response, zero software overhead, and complete determinism.

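Feature | DVFS | Clk Gate | Wang'24 | HP Mem. | RPU
Origin | OS | External | FPGA | Passive | Cell ✓
ΔC/Δt | | | ALU | | O(1) ✓
Isolation | No | Clk | No | No | Clk+Pwr ✓
Latency | ms | Ext | µs | | 1 clk ✓
CPU req. | Yes | Part | Yes | | Zero ✓
CMOS | Yes | Yes | Mem | Mem | TSMC ✓
Adaptive | No | No | FPGA | No | HW ✓
Fail-safe | No | No | No | No | Yes ✓
Gate cost | OS+PMU | Ext.ctrl | FPGA+ALU | Passive | 2,960 ✓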
Exotic Physics vs. The RPU Primitive

When official patent and research reports compared our architecture to industry giants, the conclusion was clear: no single reference anticipated the architecture.

IBM (US11144718B2) · HP (US8450711B2)
Complex neuromorphic physics and experimental memristors — still failed to achieve autonomous decision-making at the local cell level.
Wang et al. · Nature Communications 2024 · DOI: 10.1038/s41467-024-48908-8
Memristor-based neuromorphic perception — Cambridge, Beihang, UCL. Requires exotic memristor devices, external software adaptation, and specialized fabrication. Not synthesizable on standard CMOS.
RPU · Standard CMOS · 2,960 gates
Achieved it in standard CMOS — silicon-proven on TSMC 65nm and SKY130. A fundamental architectural primitive, not a material trick.
TÜRKPATENT · PCT/IB2026/053070 · No prior art contradiction found · Formal search report
Protected. Proven. Ready.
Protected. Proven. Ready to deploy.

The RPU is not a research concept or an early prototype. It is a hardware IP block with international patent protection, silicon-proven implementations on two independent process nodes, a comprehensive technical paper, and a formal listing on Design & Reuse. You are not evaluating an idea. You are evaluating a deployable product with legal, technical, and commercial backing.

TR 2025/012696
PCT/IB2026/053070
IP & Licensing ↗
TÜRKPATENT: prior art search complete
Technical Paper
Design & Reuse
TSMC+SKY130
Free for University Research
Design & Reuse · RPU-REFLEX 01 ↗ White Paper ↗
Your next step
Where to start, based on
what you are trying to do.

The RPU evaluation package is comprehensive. Here is where to start based on what you need to figure out first.

Senior Architect

Start with the silicon

Watch the 3-minute FPGA demo to see the hardware behavior. Then read the ASIC PPA report for timing closure, area breakdown, and power decomposition at TSMC 65nm and SKY130.

Engineer / Developer

Start with integration

Read the RISC-V benchmark to see the Ibex integration flow. Then clone the RTL from GitHub — run the testbench yourself. For C-HAL driver, request directly from us.

Decision Maker

Start with the summary

Read the one-page executive summary — architecture, measured results, integration path, and licensing options. Share it internally, then schedule a 30-minute call with our founder.

If you are not sure which path fits you, start with the executive summary — it gives you the complete picture in about two minutes of reading.

For CTO · VP Engineering · SoC Architect · Lead Architect

Your system processes data.
Most of it hasn't changed.
RPU stops the waste.

We believe the RPU will outperform your current implementation on your own architecture, with your own sensor data, on your own benchmarks. If we are right, we license. If we are wrong, you owe nothing.

The RTL is on GitHub — clone it, run the testbench, see 99.998% yourself. For C-HAL driver and commercial licensing, reach out. No fee. No obligation. If we are wrong — we don't invoice.

Read White Paper ↗
ceo@rpu-micro.com · 48-hour response
"The cheapest computation
is the one that never occurs."