Floating-Point Design Coding Style and Techniques

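Although a fixed-point implementation of an algorithm on an FPGA is usually faster and more area-efficient than a floating-point one, floating point is sometimes still necessary. Fixed point has a limited dynamic range, so a design needs careful analysis to determine how the bit widths of intermediate data evolve, and reaching an optimized QoR often means introducing many different fixed-point intermediate types. Floating point has a much larger dynamic range, so in many algorithms a single data type is enough.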

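The Xilinx Vivado HLS tool supports the C/C++ IEEE-754 single-precision and double-precision floating-point data types, which makes it relatively quick and easy to turn a C/C++ floating-point algorithm into RTL. To reach the FPGA resource usage and performance you expect, however, Vivado HLS directives need to be combined with an appropriate C/C++ coding style and techniques.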

Coding Style

1.1 Single- and Double-Precision Floating-Point Math Functions

#include <math.h>
float example(float var)
{
    return log(var); // double-precision natural logarithm
}

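In a C design, the RTL that Vivado HLS generates for this example converts the input to double precision, computes the natural logarithm in double precision, and then converts the double-precision result back to single precision.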

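#include <math.h>
float example(float var)
{
    return logf(var); // single-precision natural logarithm
}

In a C design, logf is the single-precision natural logarithm. For this example, the RTL generated by Vivado HLS computes the logarithm in single precision, with no single/double conversions on the input or the output.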

1.2 Floating-Point Operation Optimization

Consider first an example with three expressions that look similar algebraically but that Vivado HLS synthesizes into three completely different results.

void example(float *m0, float *m1, float *m2, float var)
{
    *m0 = 0.2 * var;    // double-precision multiply, with single/double conversions
    *m1 = 0.2f * var;   // single-precision multiply
    *m2 = var / 20.0f;  // single-precision divide
}

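Vivado HLS synthesizes m0, m1, and m2 into different RTL implementations.
Because 0.2 is a double-precision literal whose value cannot be represented exactly, the m0 expression is synthesized as a double-precision multiplication: var is converted to double precision, and the double-precision product is then converted back to single precision for m0.
Note in particular that if you want Vivado HLS to synthesize a single-precision constant, you must append f to the literal, as in 0.2f. m1 is then the output of a single-precision multiplication. Likewise, m2 is synthesized as the output of a single-precision division.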
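
The effect of the f suffix follows from the C usual arithmetic conversions and can be observed on a host compiler. This is a small sketch with illustrative helper names that do not appear in the original; it only shows the types and values involved, not what any particular HLS version emits.

```c
#include <stddef.h>

/* 0.2 is a double literal: var is promoted and the multiply is done in double. */
float scale_double_literal(float var)
{
    return (float)(0.2 * var);
}

/* 0.2f is a float literal: the multiply stays in single precision. */
float scale_float_literal(float var)
{
    return 0.2f * var;
}

/* Size of each product's type (sizeof does not evaluate the expression). */
size_t double_literal_product_size(void)
{
    float v = 1.0f;
    return sizeof(0.2 * v);
}

size_t float_literal_product_size(void)
{
    float v = 1.0f;
    return sizeof(0.2f * v);
}

/* Returns 1 when the two literals denote different values: 0.2 has no exact
   binary representation, so rounding it to float and to double gives two
   different numbers. */
int literals_differ(void)
{
    return (double)0.2f != 0.2;
}
```

Because the two literals really are different values, a compiler cannot silently replace 0.2 with 0.2f without risking a different result, which is consistent with the double-precision multiplier described for m0.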

Consider another example, in which the position of the constants within the expression matters.

void example(float *m0, float *m1, float var)
{
    *m0 = 0.2f * 5.0f * var; // *m0 = var; the constant multiplication is optimized away
    *m1 = 0.2f * var * 5.0f; // two single-precision multiplies
}
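
The difference comes from C evaluating multiplication left to right: 0.2f * 5.0f * var contains the constant subexpression 0.2f * 5.0f, while 0.2f * var * 5.0f is parsed as (0.2f * var) * 5.0f and has no constant subexpression to fold. The sketch below (function names are illustrative only) spells out the grouping and shows one possible way to keep constants together with explicit parentheses.

```c
/* Parsed as (0.2f * 5.0f) * var: the parenthesized part is a constant
   expression that a compiler can evaluate ahead of time. */
float constants_first(float var)
{
    return 0.2f * 5.0f * var;
}

/* Parsed as (0.2f * var) * 5.0f: both multiplies depend on var, and
   floating-point multiplication may not be reassociated freely. */
float constants_split(float var)
{
    return 0.2f * var * 5.0f;
}

/* Explicit grouping restores a constant subexpression regardless of
   where the constants are written. */
float constants_grouped(float var)
{
    return var * (0.2f * 5.0f);
}
```

All three functions return approximately var, but only the first and third give the tool a constant it can fold.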

Here is one more example.

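void example(float *m0, float *m1, float var)
{
    *m0 = 0.5 * var; // double-precision multiply, with single/double conversions
    *m1 = var / 2;   // single precision; division by a power of two, no divider needed
}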
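
The m0 expression is synthesized as a double-precision multiplication: var is converted to double precision, and the double-precision product is then converted back to single precision for m0.

The m1 expression is synthesized into a simple operation instead of a full divider, since dividing a floating-point value by 2 essentially only decrements its exponent. So if you want to divide var by 2, write it in the m1 form rather than the m0 form.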
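
Why halving a float is cheap can be seen from the bit layout. The following sketch assumes float is an IEEE-754 binary32 value (1 sign bit, 8 exponent bits, 23 mantissa bits) and that the input is a normal number; the helper names are invented for this illustration. It extracts the exponent and mantissa fields so they can be compared before and after the division.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer.
   memcpy is used to avoid aliasing problems. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Biased 8-bit exponent field (bits 30..23). */
unsigned exponent_field(float f)
{
    return (unsigned)((float_bits(f) >> 23) & 0xFFu);
}

/* 23-bit mantissa (fraction) field (bits 22..0). */
uint32_t mantissa_field(float f)
{
    return float_bits(f) & 0x7FFFFFu;
}

/* Same expression as m1 in the example above. */
float half_of(float var)
{
    return var / 2;
}
```

For a normal input such as 3.0f, half_of changes only the exponent field (it drops by one) and leaves the mantissa untouched. Zero, subnormal, infinite, and NaN inputs need separate handling, which this sketch does not cover.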

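Parallelism and Resource Reuse

Floating-point operations consume considerably more resources than integer or fixed-point operations, so Vivado HLS tries to implement them as economically as possible. Where data dependencies and constraints allow, it reuses floating-point operator cores. To illustrate, consider a simple example that sums four floating-point values: Vivado HLS reuses a single floating-point adder and performs the additions serially.
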
void example(float *r, float a, float b,
             float c, float d)
{
    *r = a + b + c + d;
}

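Some designs need higher throughput and lower latency, which calls for more parallelism. In the following example, the for loop is given the pipeline and unroll directives. In addition, a, b, and r0 are configured as FIFO interfaces and reshaped to double the I/O bandwidth. Vivado HLS then synthesizes two floating-point adders that operate in parallel, which is possible because each addition is completely independent of the others.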

void example(float r0[32], float a[32], float b[32])
{
#pragma HLS interface ap_fifo port=a,b,r0
#pragma HLS array_reshape cyclic factor=2 variable=a,b,r0
    for (int i = 0; i < 32; i++)
    {
#pragma HLS pipeline
#pragma HLS unroll factor=2
        r0[i] = a[i] + b[i];
    }
}
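
Conceptually, unrolling by a factor of 2 turns each loop iteration into two independent additions. The sketch below writes that transformation by hand in plain C, without any pragmas, as an illustration of what the directive asks for. It is not the code the tool generates, and the function names are made up for this example.

```c
#define N 32

/* Original form: one addition per iteration. */
void add_rolled(float r0[N], const float a[N], const float b[N])
{
    for (int i = 0; i < N; i++)
        r0[i] = a[i] + b[i];
}

/* Hand-unrolled by 2: the two additions in the body share no data,
   so they can be mapped to two adders working side by side. */
void add_unrolled_by_2(float r0[N], const float a[N], const float b[N])
{
    for (int i = 0; i < N; i += 2) {
        r0[i]     = a[i]     + b[i];
        r0[i + 1] = a[i + 1] + b[i + 1];
    }
}

/* Host-side check: returns 1 when both versions produce identical output. */
int unrolled_matches_rolled(void)
{
    float a[N], b[N], r_rolled[N], r_unrolled[N];
    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 0.5f * (float)i;
    }
    add_rolled(r_rolled, a, b);
    add_unrolled_by_2(r_unrolled, a, b);
    for (int i = 0; i < N; i++)
        if (r_rolled[i] != r_unrolled[i])
            return 0;
    return 1;
}
```

Each unrolled iteration reads two elements of a and b and writes two elements of r0, which is why the example above also doubles the I/O bandwidth.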

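With more complex computations, however, the floating-point operations may no longer be independent. In that case Vivado HLS cannot reorder the operations, which leads to lower reuse than expected. The following example shows how to improve the performance of floating-point operations that involve feedback.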

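The accumulation in this example creates a recurrence. Because the latency of a floating-point adder is normally more than one clock cycle, adding the pipeline directive cannot achieve a throughput of one accumulation per clock cycle.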

float example(float x[32])
{
#pragma HLS interface ap_fifo port=x
    float acc = 0;
    for (int i = 0; i < 32; i++)
    {
#pragma HLS pipeline
        acc += x[i];
    }
    return acc;
}
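
The cost of the recurrence can be estimated with a simple back-of-the-envelope model. In a pipelined loop, a new iteration starts every II cycles (the initiation interval), and one iteration takes a certain number of cycles to complete (the pipeline depth). The helper below is a rough sketch of that model with an invented name; the adder latency of 4 cycles used in the note after it is a hypothetical figure, since the real value depends on the device, the clock target, and the tool version.

```c
/* Approximate cycle count of a pipelined loop: the first iteration finishes
   after `depth` cycles, and every later iteration starts `ii` cycles after
   the previous one. */
unsigned pipelined_loop_cycles(unsigned trip_count, unsigned ii, unsigned depth)
{
    if (trip_count == 0)
        return 0;
    return (trip_count - 1) * ii + depth;
}
```

With a hypothetical 4-cycle adder, the ideal case (II = 1) would give pipelined_loop_cycles(32, 1, 4) = 35 cycles. With the recurrence on acc, each addition must wait for the previous one, so II is at least 4 and the estimate grows to pipelined_loop_cycles(32, 4, 4) = 128 cycles.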

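To parallelize the example above, the code can be changed slightly as follows: it first accumulates partial sums and then adds them together at the end. The input data also needs a simple rearrangement to provide the corresponding I/O bandwidth, so that the desired parallelism is reached.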

float top(float x[32])
{
#pragma HLS interface ap_fifo port=x
    float acc_part[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 32; i += 4) {    // manually unrolled by 4
        for (int j = 0; j < 4; j++) {    // partial accumulation
#pragma HLS pipeline
            acc_part[j] += x[i + j];
        }
    }
    for (int i = 1; i < 4; i++) {        // final accumulation
#pragma HLS unroll
        acc_part[0] += acc_part[i];
    }
    return acc_part[0];
}
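
Splitting the sum into partial accumulators changes the order of the additions, and floating-point addition is not associative, so the restructured version is worth checking against the original on a host before synthesis. The following is a minimal pragma-free sketch with illustrative function names. It uses small integer inputs, for which both orderings are exact; for general data the two results may differ in the last bits.

```c
#define N 32

/* Reference: one accumulator, strictly serial additions. */
float sum_serial(const float x[N])
{
    float acc = 0.0f;
    for (int i = 0; i < N; i++)
        acc += x[i];
    return acc;
}

/* Four interleaved partial sums, combined at the end. Each partial
   accumulator is updated only once every four additions, which gives
   the adder pipeline time to finish before the value is needed again. */
float sum_partial4(const float x[N])
{
    float acc_part[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < N; i += 4)
        for (int j = 0; j < 4; j++)
            acc_part[j] += x[i + j];
    for (int i = 1; i < 4; i++)
        acc_part[0] += acc_part[i];
    return acc_part[0];
}

/* Host-side check with x[i] = i, whose exact sum is 0 + 1 + ... + 31 = 496.
   Returns 1 when both versions produce that value. */
int partial_matches_serial(void)
{
    float x[N];
    for (int i = 0; i < N; i++)
        x[i] = (float)i;
    return sum_serial(x) == 496.0f && sum_partial4(x) == 496.0f;
}
```

For real workloads, a comparison with a small tolerance instead of exact equality is usually the safer check.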
