Endpoint devices for the Internet of Things must not only operate within an extremely tight power envelope of a few milliwatts, but also be flexible in their computing capabilities, from a few kOPS to GOPS. Near-threshold (NT) operation can achieve higher energy efficiency, and performance scalability can be gained through parallelism. In this paper, we describe the design of an open-source RISC-V processor core specifically designed for NT operation in tightly coupled multicore clusters. We introduce instruction extensions and microarchitectural optimizations to increase the computational density and to minimize the pressure on the shared-memory hierarchy. For typical data-intensive sensor processing workloads, the proposed core is, on average, 3.5× faster and 3.2× more energy efficient, thanks to a smart L0 buffer that reduces cache access contentions and to support for compressed instructions. Single Instruction Multiple Data (SIMD) extensions, such as dot products, and a built-in L0 storage further reduce the shared-memory accesses by 8×, reducing contentions by 3.2×. With four NT-optimized cores, the cluster is operational from 0.6 to 1.2 V, achieving a peak efficiency of 67 MOPS/mW in a low-cost 65-nm bulk CMOS technology. In a low-power 28-nm FD-SOI process, a peak efficiency of 193 MOPS/mW (40 MHz and 1 mW) can be achieved.
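To illustrate why a packed dot-product instruction relieves pressure on shared memory, the following is a minimal software sketch, not the core's actual instruction semantics. It assumes a 4-way signed 8-bit dot product with accumulation over 32-bit packed words; the names `sdot4_acc`, `dot_scalar`, and `dot_packed` are hypothetical and introduced only for this example. The packed variant fetches four elements per operand with one word access, whereas the scalar variant issues one byte access per element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of a 4-way signed 8-bit dot product with
 * accumulation. Each 32-bit word is treated as four packed signed bytes. */
static int32_t sdot4_acc(uint32_t a, uint32_t b, int32_t acc)
{
    for (int lane = 0; lane < 4; ++lane) {
        int32_t x = (int32_t)((a >> (8 * lane)) & 0xFFu);
        int32_t y = (int32_t)((b >> (8 * lane)) & 0xFFu);
        if (x >= 128) x -= 256;   /* sign-extend the 8-bit lane */
        if (y >= 128) y -= 256;
        acc += x * y;
    }
    return acc;
}

/* Scalar reference: one byte access per element and operand. */
static int32_t dot_scalar(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

/* Packed variant: one 32-bit word access per four elements and operand.
 * Assumes n is a multiple of 4 (an assumption of this sketch). */
static int32_t dot_packed(const int8_t *a, const int8_t *b, size_t n)
{
    assert(n % 4 == 0);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i += 4) {
        uint32_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);   /* single word fetch for 4 elements */
        memcpy(&wb, b + i, sizeof wb);
        acc = sdot4_acc(wa, wb, acc);
    }
    return acc;
}
```

In this sketch the packed loop performs n/4 word accesses per operand instead of n byte accesses, and both variants return the same result for inputs whose length is a multiple of four. On hardware with such an instruction, the body of `sdot4_acc` would collapse into a single operation.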

