Exploring and Using the CAPI Hardware Architecture: Summary and Understanding (3.1)

最编程 2024-01-19 10:19:38

...

In CAPI as a whole, the user side (the FPGA side) consists of the PSL and the AFU, while the CPU side is the CAPP module.

3.1.1 CAPP

CAPP stands for Coherent Accelerator Processor Proxy. In a multi-core POWER CPU, the CAPP extends coherency to the accelerator, and a directory on the CAPP provides coherency responses on behalf of the accelerator. The whole coherency protocol runs over the PCIe physical link, which sits between the PSL and the CAPP.

The original documentation reads: "The Coherent Attached Processor Proxy (CAPP) in the multi-core POWER8 processor extends coherency to the attached accelerator. A directory on the CAPP provides coherency responses on behalf of the accelerator. Coherency protocol is tunneled over standard PCI Express links between the CAPP unit on the processor and the POWER service layer (PSL) on the accelerator card."

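It has the following characteristics:

It acts as a proxy for the FPGA accelerator, is integrated into the processor, uses a programmable (table-driven) protocol, keeps a shadow cache directory for the accelerator, and holds up to about 1MB of cache tags (cache-line based).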

Proxy for FPGA Accelerator on PowerBus

Integrated into Processor

Programmable (Table Driven) Protocol for CAPI

Shadow Cache Directory for Accelerator

Up to 1MB Cache Tags (Line based)

Larger block based Cache
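
To make the idea of a proxy-side directory more concrete, here is a minimal sketch in Python. It is a toy model and not the real CAPP protocol: the class names, the two response values, and the 128-byte line size are all assumptions chosen for illustration. It only shows how a directory on the processor side can answer on behalf of the accelerator without crossing the link for every snoop.

```python
from enum import Enum

CACHE_LINE_BYTES = 128  # assumed line size, for illustration only


class SnoopResponse(Enum):
    NOT_PRESENT = "not_present"  # the proxy can answer by itself
    FORWARD_TO_PSL = "forward"   # the accelerator may hold the line


class ShadowDirectory:
    """Toy directory of cache lines the accelerator may be holding."""

    def __init__(self):
        self._lines = set()

    @staticmethod
    def line_of(address):
        # Tags are tracked per cache line, not per byte.
        return address // CACHE_LINE_BYTES

    def record_fill(self, address):
        # Called when the accelerator-side cache takes a copy of a line.
        self._lines.add(self.line_of(address))

    def record_evict(self, address):
        # Called when the accelerator-side cache gives the line up.
        self._lines.discard(self.line_of(address))

    def snoop(self, address):
        # Answer a coherency probe on behalf of the accelerator.
        if self.line_of(address) in self._lines:
            return SnoopResponse.FORWARD_TO_PSL
        return SnoopResponse.NOT_PRESENT
```

In this model, a snoop for any byte inside a recorded line is forwarded, and every other snoop is answered locally by the proxy.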

3.1.2 PSL

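PSL stands for POWER Service Layer. As the AFU's bridge to the CPU, it provides the AFU with address translation that is compatible with the POWER architecture and with a system memory cache (256KB in the first implementation). Compared with standard I/O acceleration, it has many advantages, including shared memory, no need to pin data in memory for DMA, lower latency for cached data, and an easier, more natural programming model. It is implemented in the FPGA as an encrypted IP core, originally on Altera devices and later on Xilinx devices.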

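The figure shows the rough functional layout of the PSL. It contains the blocks shown there (cache, MMU, and ISL) and also holds a memory protection table.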

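TLB: the Translation Lookaside Buffer (TLB) is a cache of translations from virtual (or effective) addresses to physical (or real) addresses. Some designs implement it with a CAM. It speeds up address translation.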

https://en.wikipedia.org/wiki/Translation_lookaside_buffer

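Ways to speed up memory access: http://blog.****.net/ctthuangcheng/article/details/8550450

a. Use an MMU (Memory Management Unit).

b. Keep the addresses that occur most often in address translation in a cache, the Translation Lookaside Buffer (TLB). These translations can then be read directly from the cache without accessing the page table.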
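
The two points above can be sketched as a small software model. This is a simplified illustration, not how the PSL's MMU is implemented: the 4KB page size, the LRU replacement policy, the dictionary used as a page table, and the names `TinyTLB` and `translate` are all assumptions.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size, for illustration only


class TinyTLB:
    """Toy TLB: a small LRU cache placed in front of a page table."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            # Hit: no page table access is needed.
            self.hits += 1
            frame = self.entries[vpn]
            self.entries.move_to_end(vpn)
        else:
            # Miss: walk the page table (a KeyError here models a page fault).
            self.misses += 1
            frame = self.page_table[vpn]
            self.entries[vpn] = frame
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
        return frame * PAGE_SIZE + offset
```

Repeated accesses to the same page hit in `entries` and skip the page table lookup, which is where the speed-up comes from.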

A CAIA-compliant processor includes a POWER service layer (PSL). The PSL is the bridge to the system for the AFU, and provides address translation and system memory cache.

This interface provides the basis for all communication between the accelerator and the POWER8 system. The PSL provides address translation that is compatible with the Power Architecture® for the accelerator and provides a cache for the data being used by the accelerator. This provides many advantages over a standard I/O model, including shared memory, no pinning of data in memory for DMA, lower latency for cached data, and an easier, more natural programming model. Effective addresses from an AFU are translated to a physical address in system memory by the PSL. The PSL also provides miscellaneous management for the

Implemented in FPGA Technology

Provides Address Translation for Accelerator

Compatible with POWER Architecture

Provides Cache for Accelerator

First Implementation – 256KB

Facilities for downloading Accelerator Functions

3.1.3 AFU

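AFU stands for Accelerator Function Unit. The user's custom acceleration functions are implemented here, and the AFU communicates with the application through the interface provided by the PSL, called the PSL Accelerator Interfaces.

The AFU contains an AFU descriptor, made up of a set of registers. These registers reflect the AFU's capabilities and carry information that software must know. The descriptor provides a mechanism for associating the problem state area with the system process that uses the AFU. (Yxr's note: when PCIe is the medium, the AFU may also hold a set of PCIe configuration registers.)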

AFU Descriptor Overview: The AFU descriptor is a set of registers within the problem state area that contains information about the capabilities of the AFU that is required by system software. The AFU descriptor also contains a standard format for reporting errors to system software. All AFUs must implement an AFU descriptor.

When an application needs to use an AFU:

When an application requests use of an AFU, a process element is added to the process-element linked list that describes the application's process state. The process element also contains a work element descriptor (WED) provided by the application. The WED can contain the full description of the job to be performed or a pointer to other main memory structures in the application's memory space. Several programming models are described providing for an AFU to be used by any application or for an AFU to be dedicated to a single application. See Section 3 Programming Models on page 25 for details.
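
As a rough illustration of the process-element linked list described above, here is a minimal Python sketch. The field names (`pid`, `wed`, `next_element`) and the methods are hypothetical and do not reflect the real in-memory layout defined by the architecture. The WED is modeled as an opaque integer, which the application could use either as the job description itself or as a pointer to a structure in its own memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessElement:
    pid: int  # hypothetical identifier of the owning process
    wed: int  # work element descriptor supplied by the application
    next_element: Optional["ProcessElement"] = None


class ProcessElementList:
    """Toy singly linked list of process elements."""

    def __init__(self):
        self.head = None

    def add(self, pid, wed):
        # A new element is linked in when an application requests the AFU.
        element = ProcessElement(pid, wed, next_element=self.head)
        self.head = element
        return element

    def remove(self, pid):
        # Unlink the element once the application no longer uses the AFU.
        prev, cur = None, self.head
        while cur is not None:
            if cur.pid == pid:
                if prev is None:
                    self.head = cur.next_element
                else:
                    prev.next_element = cur.next_element
                return True
            prev, cur = cur, cur.next_element
        return False

    def __iter__(self):
        cur = self.head
        while cur is not None:
            yield cur
            cur = cur.next_element
```

With several elements in the list, one AFU can be shared by several applications. A list that only ever holds one element corresponds loosely to the case where the AFU is dedicated to a single application.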
