

## **Overview**

C920 is a RISC-V-compatible, 64-bit, high-performance processor with vector computing capability, developed by T-Head Semiconductor Co., Ltd. It delivers industry-leading performance in control flow, computing, and frequency through architectural and micro-architectural innovations. The processor is based on the RV64GCV instruction set and implements the XuanTie Instruction Extension (XIE). It adopts a state-of-the-art 12-stage, out-of-order, multiple-issue superscalar pipeline designed for high frequency, IPC, and power efficiency, together with a 128-bit vector unit that implements version 0.7.1 of the RISC-V V extension. C920 supports hardware cache coherency, and each cluster contains 1 to 4 cores. It provides an AXI4 bus interface and a device coherence port. Virtual memory uses the Sv39 scheme with the XuanTie Memory Attributes Extension (XMAE). In addition, C920 includes the standard CLINT and PLIC interrupt controllers and supports an RV-compatible debug interface and performance monitors.



## **Features**

| Feature               | Description                                                                                                  |
|-----------------------|--------------------------------------------------------------------------------------------------------------|
| Architecture          | RV64GCV                                                                                                      |
| SMP                   | Up to 4 cores in each cluster                                                                                |
| Micro-architecture    | Out of order, 3 decode, 4 rename/dispatch, 8 issue/execute, dual load/store                                  |
| Pipeline              | 12 stages (Integer)                                                                                          |
| Floating-Point Unit   | Supports RISC-V F and D instruction extensions<br>Supports the IEEE 754-2008 standard                        |
| Vector Unit           | RISC-V V extension<br>Vector register width 128-bit; element types: FP16/FP32/FP64/INT8/INT16/INT32/INT64    |
| Bus Interface         | AXI4-128 master                                                                                              |
| Device Coherence Port | AXI4-128 slave (Optional)                                                                                    |
| L1 Instruction Cache  | Up to 64KB with optional parity                                                                              |
| L1 Data Cache         | Up to 64KB with optional ECC                                                                                 |
| L2 Cache                        | Up to 8MB with optional ECC<br>Supporting parallel access with multi-bank                                            |
| XuanTie Extensions              | XuanTie Instruction Extension (XIE)<br>XuanTie Memory Attributes Extension (XMAE)                                    |
| Memory Management<br>Unit (MMU) | Sv39 virtual memory translation<br>Up to 2048-entry TLB                                                              |
| PMP                             | Up to 16 regions                                                                                                     |
| Interrupt Controller            | Flexibly configurable Platform-Level Interrupt Controller (PLIC) to support a wide range of system event scenarios   |

## **XuanTie C920 Components**

#### Processor Overview



#### Core

- Integer pipeline of 12 stages
- Out of order execution based on register renaming
- 3 decode, 4 rename/dispatch, 8 issue/execute, dual load/store
- Various branch prediction resources (Branch History Table, Branch Target Buffer, Return Address Stack etc.) for high performance

#### Multi-Core

- Supports homogeneous multi-core systems with 2 to 4 cores
- MOESI coherency protocol
- 2-way centralized snoop buffer
- Exclusive memory access instructions
- Integrates multi-core interrupt controllers, timers, and debug modules

#### Floating Point Unit (FPU)

- RISC-V F and D extensions
- Support half/single/double precision
- Does not generate floating-point exceptions
- User configurable rounding modes

#### Vector Unit

- RISC-V V Extension (Version 0.7.1)
- Supports INT8/INT16/INT32/INT64/FP16/FP32/FP64 element types
- Vector chaining technology to enhance computing throughput
- Segment load/store supported
- D-Cache data path width up to 128 bits
- Unaligned memory access acceleration
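The relationship between the 128-bit vector register width and the supported element sizes can be sketched as follows. This is a minimal illustration assuming the usual V-extension rule that a register group holds `VLEN * LMUL / SEW` elements; the function name and the restriction to integer LMUL values of 1, 2, 4, and 8 are assumptions made for this sketch, not statements from the C920 documentation.

```python
VLEN_BITS = 128  # vector register width listed in the feature table


def elements_per_group(sew_bits: int, lmul: int = 1) -> int:
    """Elements of width sew_bits that fit in a group of lmul vector registers.

    Hypothetical helper for illustration only.
    """
    if sew_bits not in (8, 16, 32, 64):
        raise ValueError(f"unsupported element width: {sew_bits}")
    if lmul not in (1, 2, 4, 8):
        raise ValueError(f"unsupported register grouping: {lmul}")
    return VLEN_BITS * lmul // sew_bits


if __name__ == "__main__":
    for sew in (8, 16, 32, 64):
        print(f"SEW={sew:2d} bits: {elements_per_group(sew)} elements per register")
```

For example, a single register holds sixteen INT8 elements or two FP64 elements, which is why narrower data types benefit most from the vector unit.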

#### Memory sub-system

The L1 caches in C920 can each be configured as 32KB or 64KB. The cores in a cluster (up to four) share an L2 cache with a configurable size of 256KB to 8MB. Hardware maintains data coherency among all the L1 and L2 caches. Coherency between the TLB, I-Cache, and D-Cache is maintained by software and hardware together. The L1 and L2 caches support optional ECC/parity protection.

- The L1 instruction memory system has the following key features:
  - VIPT, two-way set-associative instruction cache
  - Optional parity protection
  - Fixed cache line length of 64 bytes
  - 128-bit read interface from the L2 memory system
- The L1 data memory system has the following features:
  - PIPT, two-way set-associative L1 data cache
  - Fixed cache line length of 64 bytes
  - Optional ECC protection
  - 128-bit read interface from the L2 memory system
  - Up to 128-bit read data paths from the data L1 memory system to the data path
  - Up to 128-bit write data path from the data path to the L1 memory system
- The L2 Cache has the following features:
  - Configurable size of 256KB, 512KB, 1MB, 2MB, 4MB, or 8MB
  - PIPT, 16-way set-associative structure
  - Fixed line length of 64 bytes
  - Optional ECC protection
  - Support data prefetch
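The cache parameters listed above (size, associativity, 64-byte lines) determine the number of sets and how many address bits select a set. The sketch below works this out for a few of the listed configurations. It treats each cache as a single bank, which is a simplifying assumption (the L2 is described as multi-bank), and the function name is hypothetical.

```python
LINE_BYTES = 64  # fixed line length stated for the L1 and L2 caches


def cache_geometry(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> dict:
    """Return set count and address-bit split for a set-associative cache.

    Illustrative only: assumes a single bank and power-of-two sizes.
    """
    sets = size_bytes // (ways * line_bytes)
    if sets * ways * line_bytes != size_bytes or sets & (sets - 1):
        raise ValueError("size must give a power-of-two number of sets")
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # bits selecting a byte in a line
        "index_bits": sets.bit_length() - 1,         # bits selecting a set
    }


if __name__ == "__main__":
    KB, MB = 1024, 1024 * 1024
    print("L1 32KB, 2-way :", cache_geometry(32 * KB, 2))
    print("L1 64KB, 2-way :", cache_geometry(64 * KB, 2))
    print("L2 256KB,16-way:", cache_geometry(256 * KB, 16))
    print("L2 8MB, 16-way :", cache_geometry(8 * MB, 16))
```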

#### Memory Management Unit (MMU)

- Sv39 virtual memory systems supported
- 32/17-entry fully associative I-uTLB/D-uTLB
- 2048-entry 4-way set-associative shared TLB
- Hardware page table walker
- Virtual memory support for the full address space, with hardware-assisted fast address translation
- Code/data sharing
- Support for full-featured OS such as Linux
- XMAE (XuanTie Memory Attributes Extension) technology extends page table entries for additional attributes
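As a rough illustration of what Sv39 translation operates on, the sketch below splits a 64-bit virtual address into the three 9-bit virtual page numbers and the 12-bit page offset defined by the RISC-V privileged specification, and checks that the upper bits are a sign extension of bit 38. The function names are hypothetical, and the sketch does not model the XMAE page-table attribute bits, whose layout is not given here.

```python
def is_canonical_sv39(va: int) -> bool:
    """True if bits 63..39 of a 64-bit address all equal bit 38."""
    upper = (va >> 38) & ((1 << 26) - 1)  # bits 63..38
    return upper == 0 or upper == (1 << 26) - 1


def split_sv39(va: int) -> dict:
    """Split an Sv39 virtual address into VPN[2], VPN[1], VPN[0], and offset."""
    if not is_canonical_sv39(va):
        raise ValueError(f"non-canonical Sv39 address: {va:#x}")
    return {
        "vpn2": (va >> 30) & 0x1FF,  # index into the root page table
        "vpn1": (va >> 21) & 0x1FF,  # index into the second-level table
        "vpn0": (va >> 12) & 0x1FF,  # index into the leaf table
        "offset": va & 0xFFF,        # byte offset within a 4KB page
    }


if __name__ == "__main__":
    example = (1 << 30) | (2 << 21) | (3 << 12) | 0x45
    print(split_sv39(example))
```

The hardware page table walker performs up to three such indexed lookups on a TLB miss; a hit in the uTLB or the shared TLB avoids the walk entirely.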

#### Physical Memory Protection (PMP)

Up to 16 regions of basic read/write/execute memory protection at low hardware cost.
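To make the idea of a PMP region concrete, the sketch below encodes a naturally aligned power-of-two (NAPOT) region into a `pmpaddr` value and a `pmpcfg` byte, following the encoding in the RISC-V privileged specification. Which address-matching modes and minimum granularity C920 actually implements is not stated here, so treat the use of NAPOT as an assumption; the function names are hypothetical.

```python
PMP_R, PMP_W, PMP_X = 0x01, 0x02, 0x04  # permission bits in a pmpcfg byte
PMP_A_NAPOT = 0x3 << 3                  # address-matching mode field = NAPOT


def napot_pmpaddr(base: int, size: int) -> int:
    """Encode an aligned power-of-two region (size >= 8) as a pmpaddr value."""
    if size < 8 or size & (size - 1):
        raise ValueError("size must be a power of two and at least 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to size")
    # pmpaddr holds address bits [..:2]; the trailing ones encode the size.
    return (base >> 2) | ((size >> 3) - 1)


def pmpcfg_byte(read: bool, write: bool, execute: bool) -> int:
    """Build an unlocked NAPOT pmpcfg byte with the given permissions."""
    cfg = PMP_A_NAPOT
    cfg |= PMP_R if read else 0
    cfg |= PMP_W if write else 0
    cfg |= PMP_X if execute else 0
    return cfg


if __name__ == "__main__":
    # Hypothetical 64KB read/execute region at 0x8000_0000.
    print(hex(napot_pmpaddr(0x8000_0000, 0x1_0000)), hex(pmpcfg_byte(True, False, True)))
```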

#### Platform-Level Interrupt Controller (PLIC)

- Support multi-core interrupt control
- Up to 1023 PLIC interrupt sources
- Up to 32 PLIC interrupt priority levels
- Up to 8 PLIC interrupt targets
- Selectable edge trigger or level trigger
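The limits above (1023 sources, 8 targets) line up with the register layout in the RISC-V PLIC specification, where each target context has a 1024-bit enable array. The sketch below computes register offsets under that layout. It assumes C920's PLIC follows the standard memory map; the PLIC base address and the mapping of contexts to cores and privilege modes are not given here, so only offsets relative to an unspecified base are returned. All names are hypothetical.

```python
PRIORITY_BASE = 0x000000   # one 32-bit priority register per source
ENABLE_BASE = 0x002000     # per-context enable bit arrays
ENABLE_STRIDE = 0x80       # 1024 enable bits = 128 bytes per context
CONTEXT_BASE = 0x200000    # per-context threshold and claim/complete
CONTEXT_STRIDE = 0x1000

MAX_SOURCE = 1023          # "Up to 1023 PLIC interrupt sources"
MAX_CONTEXTS = 8           # "Up to 8 PLIC interrupt targets"


def _check(source=None, context=None):
    if source is not None and not 1 <= source <= MAX_SOURCE:
        raise ValueError(f"source out of range: {source}")
    if context is not None and not 0 <= context < MAX_CONTEXTS:
        raise ValueError(f"context out of range: {context}")


def priority_offset(source: int) -> int:
    _check(source=source)
    return PRIORITY_BASE + 4 * source


def enable_location(context: int, source: int) -> tuple:
    """Return (register offset, bit index) of a source's enable bit."""
    _check(source=source, context=context)
    return ENABLE_BASE + ENABLE_STRIDE * context + 4 * (source // 32), source % 32


def threshold_offset(context: int) -> int:
    _check(context=context)
    return CONTEXT_BASE + CONTEXT_STRIDE * context


def claim_complete_offset(context: int) -> int:
    return threshold_offset(context) + 4
```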

#### JTAG Debug

- Support multi-core debug
- JTAG debug interface supports several triggers
- Support software breakpoints
- Inspect and modify CPU registers
- Flexible single-step and multi-step execution
- High speed program download through JTAG

#### Low Power

- WFI instruction puts a core into low power mode
- Sub-module clocks are gated automatically when they are idle
- Per-core power down
- Cluster power down

#### Security

C920 supports the following security features:

- Secure boot
- Isolation between TEE and REE
- Isolation between TA (Trusted Application) and TEE
- Isolation between TA and TA

#### XuanTie Extensions

In addition to the standard RV64GCV ISA, C920 implements the XuanTie Instruction Extension (XIE). The XIE consists of extended instructions optimized for load/store, arithmetic, bitwise, and cache/TLB operations. When enabled, these instructions improve performance significantly. For example, the extended arithmetic instructions can improve the CoreMark result by 40%.

#### RV Compatibility

| Component            | RV version        |
|----------------------|-------------------|
| ISA                  | RV64GCV           |
| Vector               | 0.7.1             |
| Privilege            | 1.10              |
| MMU                  | Sv39              |
| Interrupt controller | CLINT/PLIC        |
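The ISA string in the table can be related to the extension bits of the standard `misa` CSR, where each letter A to Z maps to one bit and G is shorthand for the base integer ISA plus M, A, F, and D. The sketch below derives the bits implied by RV64GCV alone. The value C920 actually reports in `misa` is not stated here; a real core would typically also set bits such as S and U, and whether the V bit is reported for the 0.7.1 vector extension is an assumption.

```python
G_EXPANSION = "IMAFD"  # single-letter extensions implied by "G"


def expand_isa_letters(isa: str) -> str:
    """Turn e.g. 'RV64GCV' into its single-letter extensions: 'IMAFDCV'."""
    letters = isa.upper().removeprefix("RV64").removeprefix("RV32")
    return "".join(G_EXPANSION if ch == "G" else ch for ch in letters)


def misa_extension_bits(isa: str) -> int:
    """Bit mask for misa[25:0], one bit per extension letter (A = bit 0)."""
    bits = 0
    for ch in expand_isa_letters(isa):
        bits |= 1 << (ord(ch) - ord("A"))
    return bits


if __name__ == "__main__":
    mxl_rv64 = 2 << 62  # MXL field value 2 marks a 64-bit base ISA
    print(expand_isa_letters("RV64GCV"))
    print(hex(mxl_rv64 | misa_extension_bits("RV64GCV")))
```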

#### Interfaces

- Master AXI (M-AXI)
- DCP (S-AXI)
- Debug (JTAG)
- Interrupts
- Low power control

## **PPA**

| Metric      | Value                                             |
|-------------|---------------------------------------------------|
| Performance | 5.8 DMIPS/MHz (O2)<br>7.0 CoreMark/MHz (O3)       |
| Frequency   | 2.0 <sup>1</sup> ~ 2.5 <sup>2</sup> GHz (Typical) |
| Area        | 1.137 (MP2) / 0.398 (Core)                        |
| Power       | ~ 200 uW/MHz per core                             |

<sup>1.</sup>TSMC 12nm, std lvt, mem ultv, 6T Turbo lib, 0.8v;

Dynamic power@tt85c, Frequency@tt85c;

Configuration: MP2 32K L1\$, 256K L2\$, FP, No Vector, full ECC.
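The per-MHz figures in the table can be turned into absolute estimates by multiplying by the clock frequency. The sketch below does this arithmetic. It assumes linear scaling with frequency, which is a simplification, and the power figure applies only under the conditions of footnote 1, so the power estimate is shown for 2.0 GHz only. The function name is hypothetical.

```python
DMIPS_PER_MHZ = 5.8       # from the PPA table (O2)
COREMARK_PER_MHZ = 7.0    # from the PPA table (O3)
POWER_UW_PER_MHZ = 200.0  # approximate dynamic power per core


def estimate_per_core(freq_mhz: float) -> dict:
    """Scale the per-MHz figures to a given clock (assumes linear scaling)."""
    return {
        "dmips": DMIPS_PER_MHZ * freq_mhz,
        "coremark": COREMARK_PER_MHZ * freq_mhz,
        "dynamic_power_mw": POWER_UW_PER_MHZ * freq_mhz / 1000.0,
    }


if __name__ == "__main__":
    at_2ghz = estimate_per_core(2000)
    print(f"2.0 GHz: {at_2ghz['dmips']:.0f} DMIPS, "
          f"{at_2ghz['coremark']:.0f} CoreMark, "
          f"about {at_2ghz['dynamic_power_mw']:.0f} mW dynamic power")
    at_2_5ghz = estimate_per_core(2500)
    print(f"2.5 GHz: {at_2_5ghz['dmips']:.0f} DMIPS, {at_2_5ghz['coremark']:.0f} CoreMark")
```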

## **Configurations**

| Config          | Options                    |
|-----------------|----------------------------|
| Core Number     | 1-4                        |
| L1 D-Cache Size | 32K, 64K                   |
| L1 I-Cache Size | 32K, 64K                   |
| L2-Cache Size   | 256K, 512K, 1M, 2M, 4M, 8M |
| Vector Unit     | Present                    |
| DCP             | Present or not             |
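The options in the table can be captured as a small validity check, which may be handy when scripting over candidate cluster configurations. This is a sketch only: the class and field names are hypothetical and do not correspond to any actual T-Head configuration tool, and the defaults simply mirror the MP2, 32K L1, 256K L2 configuration mentioned in the PPA footnote.

```python
from dataclasses import dataclass

ALLOWED_L1_KB = (32, 64)
ALLOWED_L2_KB = (256, 512, 1024, 2048, 4096, 8192)


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical description of one C920 cluster configuration."""
    cores: int = 2
    l1d_kb: int = 32
    l1i_kb: int = 32
    l2_kb: int = 256
    dcp: bool = False  # device coherence port present or not

    def validate(self) -> list:
        """Return a list of problems; an empty list means the options are allowed."""
        problems = []
        if not 1 <= self.cores <= 4:
            problems.append(f"core number must be 1-4, got {self.cores}")
        if self.l1d_kb not in ALLOWED_L1_KB:
            problems.append(f"L1 D-Cache must be 32K or 64K, got {self.l1d_kb}K")
        if self.l1i_kb not in ALLOWED_L1_KB:
            problems.append(f"L1 I-Cache must be 32K or 64K, got {self.l1i_kb}K")
        if self.l2_kb not in ALLOWED_L2_KB:
            problems.append(f"L2 size {self.l2_kb}K is not a listed option")
        return problems


if __name__ == "__main__":
    print(ClusterConfig().validate())
    print(ClusterConfig(cores=5, l2_kb=300).validate())
```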

## **Software Ecosystems**

| Layer                    | Components                                                 |
|--------------------------|------------------------------------------------------------|
| Application<br>Scenarios | Linux Distributions: Yocto                                 |
| Libraries                | OpenBlas, OpenCV, OpenGL, OpenCL, OpenVG, OpenSSL          |
| OS Kernel                | Linux, RT-Thread                                           |
| Development<br>Languages | C, C++, Python                                             |
| Debug Tools              | Open On-Chip Debugger, SEGGER, LAUTERBACH, CSkyDebugServer |
| Development<br>Tools     | SYSTEMS QEMU                                               |

<sup>2.</sup>TSMC 12nm, std 30% ulvt, mem ultv, 6T Turbo lib, 1.0v;