The RPU is a 2,960-gate synthesizable hardware block that monitors the temporal rate of change (ΔC/Δt) of your data and keeps your processor in sleep mode until something real happens. No software. No PMU. No redesign. RTL on GitHub. ASIC-ready. Can be evaluated on FPGA and STM32 today — no production silicon required.
Your IP block, your architecture, your process node — none of it matters. If your system processes data, the RPU works. Zero redesign. Your existing system stays exactly as it is.
TSMC 65nm · 625 MHz · 0 ps slack · 1.70 mW · 2,960 gates · RISC-V validated
Free 30-day technical evaluation. If it doesn't win — no invoice.
Listed on Design & Reuse · Product of the Week · 36,000 engineers
RPU Microelectronics: our architecture was benchmarked by TÜRKPATENT against IBM (US11144718B2) and HP (US8450711B2). No prior art contradiction found. Formal search report.
Your CPU is running right now. The data it is processing has not meaningfully changed in the last ten milliseconds. You already know this — you have seen it in every project you have shipped. You have measured it with power profilers, you have watched the waveforms, you have looked at the idle-time graphs.
DVFS helps a little. Clock gating helps a little. Interrupt comparators help a little. Smart sensors help a little. You have tried all of them, combined them, tuned them. And yet your processor still spends most of its operational life confirming stagnation. Your battery budget knows it. Your thermal envelope knows it. The CFO asking about data center power bills knows it.
The question is not whether this waste exists. The question is whether it must.
We spent years asking a different question. Not how to make the polling faster, or the sleep deeper, or the DVFS smarter. We asked: what if the hardware itself could decide — before any software runs, before any interrupt fires — whether data was worth processing at all? The answer became the RPU. 2,960 gates, one clock cycle, no software, no PMU. The rest of this page shows the silicon that proves it works.
For 70 years, the computing industry has been building faster processors. Faster clock speeds. Wider pipelines. More cores. But nobody asked the most important question: should the processor be running at all? Right now, as you read this, billions of CPUs are executing polling loops — reading sensor data, comparing to previous values, deciding nothing has changed, repeating. Billions of times per second. For data that has not moved.
This waste happens across data centers that together consume roughly 200 terawatt-hours annually. It happens in every medical implant that drains its battery confirming a stable heartbeat. It happens in every radar system, every autonomous vehicle, every smartphone. Until now, this waste has been considered unavoidable. It is not.
Software PMUs are fundamentally flawed: a CPU cannot sleep until it decides to sleep — but deciding requires being awake. This circular dependency is the Von Neumann Stagnation Tax. The RPU breaks it at the gate level.
The RPU is a small, synthesizable hardware block — just 2,960 gates — that eliminates this waste at the physical level. It monitors the temporal rate of change (ΔC/Δt) of any incoming data stream. When data is stagnant, the RPU holds your processor in sleep mode and actively suppresses clock and power activity. When data changes meaningfully, the RPU wakes the processor in exactly 2 clock cycles. Always 2. Never more.
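As a rough illustration of the decision described above, the following C sketch models a rate-of-change gate in software. It is not the RPU RTL: the names `delta_gate_t`, `delta_gate_init`, and `delta_gate_step` are hypothetical, and the sketch assumes one sample per time step, so ΔC/Δt reduces to the difference between consecutive samples compared against a fixed threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a rate-of-change gate (not the RPU RTL). */
typedef struct {
    int32_t prev;       /* last sample seen */
    int32_t threshold;  /* minimum |delta| per sample that counts as a change */
    bool    primed;     /* false until the first sample has been stored */
} delta_gate_t;

static void delta_gate_init(delta_gate_t *g, int32_t threshold) {
    g->prev = 0;
    g->threshold = threshold;
    g->primed = false;
}

/* Returns true when the new sample differs from the previous one by at
 * least the threshold. The first sample only primes the gate. */
static bool delta_gate_step(delta_gate_t *g, int32_t sample) {
    if (!g->primed) {
        g->prev = sample;
        g->primed = true;
        return false;
    }
    int32_t delta = sample - g->prev;
    if (delta < 0) {
        delta = -delta;
    }
    g->prev = sample;
    return delta >= g->threshold;
}
```

With a threshold of 8, the sequence 100, 103, 120 would flag only the last sample: the first sample primes the gate, and a step of 3 stays below the threshold.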
We call any IP block "Reflexive-Ready" the moment the RPU is connected to it. This includes RISC-V cores, signal processors, neural accelerators, legacy peripherals, custom ASICs — anything that processes data. The IP block itself is not modified in any way. Its original function, its interfaces, its firmware all remain exactly as they were.
What changes is the energy behavior. The IP block stops waking up for data that has not meaningfully changed. Your existing IP catalog gains autonomous energy isolation without a single line of RTL modification on your side.
Smart sensors and DMA controllers are excellent tools, and we are not trying to replace them. But the difference is where the decision happens and what it actually suppresses.
When a smart sensor detects activity, it raises an interrupt. The DMA moves data to memory. Eventually the CPU processes it. Even when the data is stagnant, the clock tree is still running, the bus is still toggling, and buffers are still switching. Energy is consumed on infrastructure activity, not on useful work.
The RPU operates before the data reaches the bus. It works at the gate level, in hardware, with zero software involvement. When the data is unchanged, the entire downstream logic is suppressed — clock trees stop toggling, buses stop switching, buffers hold their state. You are not just avoiding CPU wake-ups; you are suppressing the physical switching activity of your entire pipeline at its source.
Our 99.998% number is not a load reduction estimate or a marketing figure. It is a count of suppressed switching cycles, measured in a RISC-V Ibex SoC integration over 5,000,004 simulation cycles with Verilator.
WFI combined with interrupts is a good system, and for many applications it is more than sufficient. But it has three specific limitations that the RPU addresses, and these limitations matter when you are trying to build energy-efficient or real-time systems.
First, standard interrupts fire on every signal transition. Your CPU wakes for noise, drift, and meaningless fluctuations — then goes back to sleep. Every false wake costs energy, and in battery-powered systems this adds up quickly. The RPU applies a rate-of-change threshold in hardware, so noise and drift are suppressed before they ever reach the interrupt pin.
Second, ARM Cortex-M interrupts require 15 to 20 clock cycles minimum before a single line of firmware runs. That time is spent on context saving, pipeline flushing, and ISR entry. The RPU decides in exactly 2 clock cycles, every time, because its decision path is purely combinational.
Third, interrupt timing is non-deterministic. The CPU might be executing another task, which means your wake-up latency has jitter. The RPU is hardware-native, so 2 cycles is guaranteed regardless of what the CPU is doing.
You cannot make wake-up faster than 2 cycles without removing the decision layer entirely — which is exactly what the RPU does. It is the physical minimum.
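To make the first limitation concrete, the sketch below counts wake-ups over a short sample trace under two simplified models: one where any change in value wakes the CPU, and one where only a change of at least a given threshold does. Both functions and the trace in the note that follows are illustrative assumptions, not measurements or RPU code.

```c
#include <stddef.h>
#include <stdint.h>

/* Model A: every change between consecutive samples causes a wake-up. */
static size_t count_wakes_any_change(const int32_t *x, size_t n) {
    size_t wakes = 0;
    for (size_t i = 1; i < n; ++i) {
        if (x[i] != x[i - 1]) {
            ++wakes;
        }
    }
    return wakes;
}

/* Model B: only a change of at least `threshold` causes a wake-up. */
static size_t count_wakes_thresholded(const int32_t *x, size_t n,
                                      int32_t threshold) {
    size_t wakes = 0;
    for (size_t i = 1; i < n; ++i) {
        int32_t delta = x[i] - x[i - 1];
        if (delta < 0) {
            delta = -delta;
        }
        if (delta >= threshold) {
            ++wakes;
        }
    }
    return wakes;
}
```

For the trace 100, 101, 100, 102, 101, 180, 181, 180 and a threshold of 8, model A wakes 7 times and model B once, on the jump from 101 to 180.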
If the RPU doesn't outperform your current implementation — we don't invoice. No fee, no hidden cost, no obligation.
The RTL is open — clone it from GitHub, run the testbench, see the results yourself. No email required. No waiting. For C-HAL driver, ASIC PPA reports, and commercial licensing, reach out to us directly.
You are paying a power tax to thin air right now. If the RPU eliminates it, we invoice a fraction of what we saved you. If it doesn't — we don't invoice at all. We will always ask for less than the value we save you.
The video above shows a live hardware test on a Nexys A7 FPGA board. A single sensor feeds the same data stream into two parallel circuits running simultaneously. Both circuits drive a red LED, but they behave in fundamentally different ways.
The first LED represents the conventional system. It stays lit continuously throughout the recording, because the traditional threshold circuit is always active regardless of whether anything is happening. It burns energy while waiting for an event that may never come.
The second LED represents the RPU. At the start of the recording, it does not light up at all, because no meaningful data change is occurring. Only when real light reaches the sensor does this LED turn on — and it switches off the moment the light stops. Throughout the entire recording, there is only ambient conversation in the room, no actual light event. As you can see, the RPU circuit was effectively idle this entire time while the conventional circuit kept burning power. In the real world, systems spend most of their lives exactly like this — watching data that is going nowhere.
Since dynamic power P = α·C·V²·f, a 15× reduction in signal rate translates directly into 15× less dynamic power. This is physics, not interpretation.
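The proportionality can be written out explicitly. This short derivation assumes that the switched capacitance C, supply voltage V, and clock frequency f are identical in both circuits, so that only the activity factor α differs; the subscripts "conv" and "rpu" are labels introduced here for the two circuits.

```latex
\[
\frac{P_{\text{conv}}}{P_{\text{rpu}}}
  = \frac{\alpha_{\text{conv}}\, C V^{2} f}{\alpha_{\text{rpu}}\, C V^{2} f}
  = \frac{\alpha_{\text{conv}}}{\alpha_{\text{rpu}}}
  = 15
\]
```

The ratio applies to the dynamic power of the nets whose toggle rate was reduced. Static leakage does not depend on α and is not covered by this relation.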
Below are four independent evidence layers. Every number comes from a tool report or a simulation run: ASIC synthesis, FPGA power analysis, or a Verilator testbench.
TSMC 65nm: Cadence Genus synthesis. 2,960 gates. 1.702 mW average power. Full timing closure. Technology-node proven at production-grade foundry.
SKY130: 100 MHz operation. 0.014 mW leakage (0.35% of TSMC power). Confirms the architecture is not node-specific; it is synthesizable on any standard CMOS process.
FPGA: Vivado Power Analyzer. RPU versus conventional always-on threshold circuit under identical sensor input. 15× reduction in signal toggle rate, which is a directly proportional reduction in dynamic power (P = α·C·V²·f).
RISC-V: lowRISC Ibex SoC testbench via Verilator. CPU enters WFI sleep; RPU wake_en connects directly to irq_external_i. Wake-up latency: 2 clock cycles across all scenarios.
| Scenario | Polling (cycles) | RPU (cycles) | Reduction |
|---|---|---|---|
| Stable + noise | 5M | 125 | 99.998% |
| Sudden spike | 5M | 338 | 99.993% |
| Slow drift + anomaly | 5M | 1.49M | 70.3% |
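The percentages in the table can be reproduced from the two count columns. The helper below is a minimal sketch; it assumes that the Polling and RPU columns are cycle counts from the same run and that the last column is the share of polling cycles avoided. The function name `reduction_percent` is hypothetical.

```c
#include <stdint.h>

/* Percentage of polling cycles that the gated run avoided.
 * Returns 0.0 when there is no polling baseline to compare against. */
static double reduction_percent(uint64_t polling_cycles,
                                uint64_t gated_cycles) {
    if (polling_cycles == 0) {
        return 0.0;
    }
    return 100.0 * (1.0 - (double)gated_cycles / (double)polling_cycles);
}
```

For 5,000,000 polling cycles, 125 gated cycles give 99.9975% (shown as 99.998%) and 338 give about 99.993%. The rounded 1.49M figure gives 70.2%, so the published 70.3% presumably reflects the unrounded count.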
We have presented the RPU to engineers from defense, automotive, medical, data center, edge AI, industrial, space, and consumer electronics. Every single one of them asked the same first question: does this work in my sector?
The answer is always yes, and the reason is simple. The RPU does not care what your data represents. It has no idea whether the stream is radar returns, glucose readings, LiDAR frames, vibration signals, audio samples, or video pixels. From its perspective, these are all just numbers changing at different rates. If they are stagnant, the processor sleeps. If they change, the processor wakes. Same two wires. Same two-cycle response. Same 2,960 gates. Every sector, every architecture, every process node.
Defense: Suppress stagnant radar returns. Wake on genuine target detection.
Automotive: LiDAR/camera frame suppression when vehicle stationary or scene unchanged.
Medical: WFI until genuine change. Standby from days to years.
Data center: Suppress unchanged telemetry before host CPU interrupt triggers.
Edge AI: Stagnant tensors suppressed before GPU/TPU compute cycles consumed.
Industrial: Vibration/acoustic nominal 99% of time. Wake in 2 cycles on anomaly.
Space: Deterministic 2 clk. Combinational path. No program counter.
Consumer electronics: Your phone listens 24/7. Your watch tracks your heart. RPU keeps the processor asleep until something actually happens.
We designed the RPU for fast integration. If you can add a standard SystemVerilog file to your project and route three signals, you can deploy it. There is no custom tooling, no proprietary bus, no licensed compiler. It works in Vivado, Quartus, Synopsys Design Compiler, Cadence Genus, and every other synthesis flow we have tested. Integration is a non-intrusive parallel connection: a single afternoon, not weeks.
The three steps below are the complete integration process. Nothing is hidden, nothing comes later. Your architecture does not matter — ARM Cortex-M, RISC-V, GPU, NPU, fully custom — it all works the same way.
You do not need to wait for a tape-out to evaluate the RPU. We provide a lightweight C-HAL library — rpu.c and rpu.h — upon request. It runs the exact same ΔC/Δt decision logic on any existing microcontroller with a C compiler. STM32, ESP32, ARM Cortex-M, RISC-V — every platform we have tested works today.
The C-HAL runs the same ΔC/Δt decision logic in software. Yes — the CPU still wakes up to run it. But that is not the point. The point is what happens next: if the data has not changed, the C-HAL returns immediately and your heavy downstream workload never runs — no FFT, no inference model, no wireless TX, no sensor fusion pipeline. The CPU wakes, checks, and goes straight back to sleep. Your expensive compute never fires.
This gives you measurable downstream energy savings today, while you prove the math on your own data. When you move to the hardware RTL, the CPU never wakes at all — that is the final step. The API is identical between both versions: swap the backend, keep every line of your application code.
Software version: CPU wakes, heavy work skipped. Hardware version: CPU never wakes. Same ΔC/Δt logic. Same API. Two levels of savings.
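The gating pattern described above can be sketched in C as follows. This is not the rpu.c / rpu.h API, which is not shown on this page; every name here (`change_fn`, `sw_gate_t`, `sw_gate_changed`, `process_stream`, `heavy_work_stub`) is hypothetical. The sketch assumes a backend that reports, per sample, whether the data changed enough to justify running the expensive stage, so the application loop stays the same when the backend is swapped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical backend interface: true when a sample warrants processing. */
typedef bool (*change_fn)(void *ctx, int32_t sample);

typedef struct {
    int32_t prev;       /* last sample seen */
    int32_t threshold;  /* minimum |delta| that counts as a change */
    bool    primed;     /* false until the first sample is stored */
} sw_gate_t;

/* Software backend: compares each sample with the previous one. */
static bool sw_gate_changed(void *ctx, int32_t sample) {
    sw_gate_t *g = (sw_gate_t *)ctx;
    if (!g->primed) {
        g->prev = sample;
        g->primed = true;
        return false;
    }
    int32_t delta = sample - g->prev;
    if (delta < 0) {
        delta = -delta;
    }
    g->prev = sample;
    return delta >= g->threshold;
}

/* Placeholder for the expensive stage (FFT, inference, radio TX). */
static unsigned heavy_calls = 0;
static void heavy_work_stub(int32_t sample) {
    (void)sample;
    ++heavy_calls;
}

/* Application loop: runs heavy_work only for samples the backend flags.
 * Returns the number of times the heavy stage ran. */
static unsigned process_stream(change_fn changed, void *ctx,
                               const int32_t *samples, unsigned n,
                               void (*heavy_work)(int32_t)) {
    unsigned runs = 0;
    for (unsigned i = 0; i < n; ++i) {
        if (!changed(ctx, samples[i])) {
            continue; /* stagnant data: skip the heavy stage */
        }
        heavy_work(samples[i]);
        ++runs;
    }
    return runs;
}
```

With a threshold of 8 and the samples 100, 101, 100, 150, 151, 150, 90, the heavy stage runs twice, on the jumps to 150 and to 90. Passing a different `change_fn` is one way to picture the backend swap; in the hardware case the check would not run on the CPU at all.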
If at any point you want to remove the RPU from your system, simply disconnect the wake_en line. Worst case: remove the RPU, system reverts to conventional polling. Zero difference.
Add the RTL: rpu_core.sv, the simulation testbench, and the post-synthesis testbench with SDF annotation. Drop the RTL into your source directory. In Vivado: Add Sources → Add Files. In Quartus: Project → Add/Remove Files in Project. In Synopsys/Cadence: add to your filelist. Set DEPTH (default 32) and DATA_WIDTH (default 12-bit) as parameters; no RTL modification is required.
Tap in_data[11:0] and in_valid from your existing sensor or ADC output. These are read-only taps: no changes to your existing signal routing, no bus modifications, no interface redesign.
Connect wake_en to your processor interrupt pin. RISC-V: irq_external_i. ARM Cortex-M: any NVIC line. Custom: any level-triggered interrupt input. No firmware changes. The CPU sees a standard external interrupt. Done.
last_delta, active_threshold, alert_status even when main clock is gated. Essential for watchdog compliance in defense and safety-critical applications.
We trained a dedicated AI assistant on the full RPU technical corpus: the patent, the technical paper, the ASIC PPA reports, the RISC-V benchmark data, and the integration guide. Ask it anything about architecture, parameters, power analysis, RISC-V or ARM compatibility, or sector-specific deployment. It answers in plain English with direct references to the documents.
The assistant below handles common questions directly on this page. For deeper technical discussions, extended code reviews, or design trade-off analysis, open the full assistant in ChatGPT using the link at the bottom of this section.
Every claim we make on this site is backed by a real document you can download and verify on your own. The ASIC PPA report comes directly from Cadence Genus synthesis. The RISC-V benchmark comes from lowRISC Ibex integration verified with Verilator. Our technical paper contains the full treatment of the architecture — download and verify every claim yourself. The patent search report is the official TÜRKPATENT novelty determination.
You can download each document individually below, or take the complete package as a single ZIP. We recommend starting with the executive summary if you are a decision maker, and the ASIC PPA report if you are an architect or engineer.
When you begin evaluation, you receive the complete technical package. This is not a demo version or a limited preview. It is the full production-grade IP that we use for our own ASIC work. What you see below is exactly what arrives in your inbox.
Within 48 hours of your request, we send the complete evaluation package via secure file transfer — typically the same day.
The evaluation runs for 30 days at no cost. During this period, our engineering team at RPU Microelectronics is available for direct technical support by email and video call. At the end of 30 days, you choose: license the RPU, extend the evaluation, or simply close the evaluation with no further obligation.
There are several established techniques for reducing processor energy consumption: DVFS, clock gating, interrupt comparators, and various research architectures. We respect all of them, and the RPU is not trying to replace them universally. But each one has a dependency the RPU does not have.
DVFS requires the operating system to participate in voltage and frequency decisions. Clock gating requires external control logic to decide when to gate. Interrupt comparators require the CPU to service the interrupt. Every existing approach puts the processor in the loop to manage its own power, and every interaction costs cycles and energy.
The RPU removes the processor from the decision entirely. The decision happens in hardware, at the gate level, before the CPU is ever aware that data has arrived. This is the only way to guarantee 2-cycle response, zero software overhead, and complete determinism.
| Feature | DVFS | Clk Gate | Wang'24 | HP Mem. | RPU |
|---|---|---|---|---|---|
| Origin | OS | External | FPGA | Passive | Cell ✓ |
| ΔC/Δt | — | — | ALU | — | O(1) ✓ |
| Isolation | No | Clk | No | No | Clk+Pwr ✓ |
| Latency | ms | Ext | µs | — | 1 clk ✓ |
| CPU req. | Yes | Part | Yes | — | Zero ✓ |
| CMOS | Yes | Yes | Mem | Mem | TSMC ✓ |
| Adaptive | No | No | FPGA | No | HW ✓ |
| Fail-safe | No | No | No | No | Yes ✓ |
| Gate cost | OS+PMU | Ext.ctrl | FPGA+ALU | Passive | 2,960 ✓ |
The RPU is not a research concept or an early prototype. It is a hardware IP block with international patent protection, silicon-proven implementations on two independent process nodes, a comprehensive technical paper, and a formal listing on Design & Reuse. You are not evaluating an idea. You are evaluating a deployable product with legal, technical, and commercial backing.
The RPU evaluation package is comprehensive. Here is where to start based on what you need to figure out first.
Watch the 3-minute FPGA demo to see the hardware behavior. Then read the ASIC PPA report for timing closure, area breakdown, and power decomposition at TSMC 65nm and SKY130.
Read the RISC-V benchmark to see the Ibex integration flow. Then clone the RTL from GitHub — run the testbench yourself. For C-HAL driver, request directly from us.
Read the one-page executive summary — architecture, measured results, integration path, and licensing options. Share it internally, then schedule a 30-minute call with our founder.
If you are not sure which path fits you, start with the executive summary — it gives you the complete picture in about two minutes of reading.
We believe the RPU will outperform your current implementation on your own architecture, with your own sensor data, on your own benchmarks. If we are right, we license. If we are wrong, you owe nothing.
The RTL is on GitHub — clone it, run the testbench, see 99.998% yourself. For C-HAL driver and commercial licensing, reach out. No fee. No obligation. If we are wrong — we don't invoice.