I am new to FPGA design, so I have little idea how long a complex piece of code can be, or whether there are any such limits.

I am trying to synthesize Verilog code for a Spartan-6 XC6SLX9-3TQG144I device. The code contains a case statement with 120,000 case values, and the total number of lines is close to 900,000. I am using ISE 13.2. Synthesis (XST) has been stuck at an HDL elaboration step for a long time (more than 8 hours so far). I am running it on an Intel Xeon 8-core system with 48 GB of RAM. Surprisingly, it is not using the full CPU/memory resources, only about 25%. It has shown no errors so far. How do I know whether the process is stuck in some loop, or how long I should wait?

In my application I am simply reading a 17-bit parallel input and driving some outputs to other devices based on the state of the inputs. Out of the maximum 2^17 combinations, I have coded 120,000 in that long case statement, like a look-up table. Before running this big code, I tested the same code with a 5-bit parallel input, i.e. 32 case values, both in ISIM and on a Spartan-3E device, and it worked fine.

Is this application possible, or am I doing something wrong? Please help.
Hi,

Do you have ISE 14.7? It's always better to use the latest version of the tools for better results. Did you test the same code on the Spartan-3E, or only the simplified version? What was the device utilization in that case?

Thanks,
Vijay
Hi Vijay,

I have only implemented the simplified version with a 5-bit input, on the Spartan-3E. Here is the device utilization report for it:

Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of Slice Flip Flops:            335 out of   9,312    3%
  Number of 4 input LUTs:                598 out of   9,312    6%
Logic Distribution:
  Number of occupied Slices:             332 out of   4,656    7%
    Number of Slices containing only related logic:     332 out of    332  100%
    Number of Slices containing unrelated logic:          0 out of    332    0%
      *See NOTES below for an explanation of the effects of unrelated logic.
  Total Number of 4 input LUTs:          629 out of   9,312    6%
    Number used as logic:                598
    Number used as a route-thru:          31

  The Slice Logic Distribution report is not meaningful if the design is
  over-mapped for a non-slice resource or if Placement fails.

  Number of bonded IOBs:                  49 out of     158   31%
  Number of BUFGMUXs:                      2 out of      24    8%
  Number of DCMs:                          1 out of       4   25%

Average Fanout of Non-Clock Nets:                3.78

Peak Memory Usage:  161 MB
Total REAL time to MAP completion:  6 secs
Total CPU time to MAP completion:   4 secs
If you really only want a static look-up table, why don't you infer a ROM, which you can initialise either with a .coe file or in an initial block? (I was going to say a package, but I don't think Verilog has an equivalent of the VHDL package.) I'm pretty sure synthesis will read the HDL and infer a usable ROM a lot faster than it churns through a gazillion lines of case statements (particularly as Verilog CASE statements can be a bit ambiguous).

----------
"That which we must learn to do, we learn by doing." - Aristotle
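As a rough illustration of the ROM approach, the table contents can be generated offline by a script instead of being typed into a case statement. The sketch below writes the body of a Xilinx .coe initialisation file (the `memory_initialization_radix` / `memory_initialization_vector` format used by CORE Generator memories). The `entry` function and the 4-entry example are hypothetical placeholders, not the poster's real data; for an inferred Verilog ROM the same values would more commonly be written one per line and loaded with `$readmemh` in an initial block.

```python
def make_coe(entries, width_bits):
    """Render non-negative integers as a .coe memory file body in hex radix."""
    digits = (width_bits + 3) // 4            # hex digits per entry
    limit = 1 << width_bits
    lines = ["memory_initialization_radix=16;", "memory_initialization_vector="]
    for i, value in enumerate(entries):
        if not 0 <= value < limit:
            raise ValueError(f"entry {i} does not fit in {width_bits} bits")
        sep = ";" if i == len(entries) - 1 else ","   # last entry ends the vector
        lines.append(f"{value:0{digits}X}{sep}")
    return "\n".join(lines) + "\n"


def entry(code):
    # Hypothetical placeholder for whatever the case item for `code` returns.
    return (code * 3) & 0xF


if __name__ == "__main__":
    print(make_coe([entry(c) for c in range(4)], width_bits=4), end="")
```

For the real table, `entry` would compute the output word for each of the 2^17 input codes, and `width_bits` would be the full output width.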
Hi,

Actually, I was so confident after testing the simplified 5-bit version that I have already fabricated a custom board with the Spartan-6 device for my application, and I have made no provision for any PROM except the 32MB SPI PROM that I use to configure the FPGA in Master SPI mode. I agree a PROM would have been the ideal place to store the static look-up table, but I have no experience with exchanging data with PROMs. I guess there is no way I can use the configuration PROM to store the look-up table?

If that is not possible, I could write the code in C using Microblaze, but I am not sure whether any hardware change would be needed on the PCB to run the processor. If I code it using Microblaze, can the device still be configured the same way in Master SPI mode?
I actually thought about using internal BRAM for the ROM, but I now think 2^17 addresses is probably too much for the device you are targeting, which is probably also why your synthesis "fails". I expect the tools to infer a memory from your case statement, but it can't fit it into the device. In that case, you can certainly use the external PROM as the look-up table (assuming it is large enough). You'll pay a time penalty to access it, but you haven't mentioned needing a particular access cycle time. Writing an SPI master is not that hard: it's a glorified shift register, and there have been several threads about doing it on these forums.
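To get a feel for the "time penalty", here is a back-of-the-envelope model of one random read from the SPI PROM. It assumes a generic serial NOR flash using the common single-bit READ command (0x03) followed by a 24-bit address, MSB-first shifting, one 47-bit table entry stored in 6 bytes, and an SCK frequency of 50 MHz; the opcode, the address layout and the clock rate are all assumptions to check against the actual PROM datasheet.

```python
READ_CMD = 0x03  # assumed: standard single-I/O READ opcode on many SPI NOR flashes


def mosi_bits(address):
    """Bits shifted out on MOSI, MSB first: 8-bit opcode then 24-bit address."""
    if not 0 <= address < 1 << 24:
        raise ValueError("address must fit in 24 bits")
    word = (READ_CMD << 24) | address
    return [(word >> i) & 1 for i in range(31, -1, -1)]


def read_time_ns(n_data_bytes, sck_mhz):
    """SCK cycles for opcode + address + data, converted to nanoseconds."""
    cycles = 32 + 8 * n_data_bytes
    return cycles * 1000.0 / sck_mhz


if __name__ == "__main__":
    # Assumed layout: entry `code` starts at byte address code * 6.
    code = 77750
    bits = mosi_bits(code * 6)
    print(len(bits), bits[:8])            # 32 bits, opcode first
    print(read_time_ns(6, 50.0), "ns")    # 80 SCK cycles at the assumed 50 MHz
```

Under these assumptions one lookup takes 80 SCK cycles (1.6 us at 50 MHz), so the achievable SCK rate, or a faster read mode if the part supports one, would decide whether external lookups meet the system's timing.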
OK, here's a little update (because I'm a bit interested in this). I did an experiment where I inferred a Block RAM (which can be turned into a ROM by removing the write enable and initialising the contents; I didn't have time to initialise the entire ROM just for this test case). For your Spartan-6 LX9, the biggest BRAM I can infer and synthesise has an address width of 17 and a data width of 4 (i.e. 131072x4). This consumes ALL of the BRAM resources in the device.

You wrote "giving some outputs": how much data do you need to put out based on your 17-bit control input? Are you controlling LEDs or something? Is a 4-bit output wide enough?
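The block count behind this result can be estimated by hand. A Spartan-6 18Kb block RAM can be configured as 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18 or 512x36 (the x9/x18/x36 shapes include the parity bits), and the XC6SLX9 has 32 of them. The sketch below picks the cheapest shape for a depth x width ROM. It ignores distributed RAM and any packing tricks the tools may use, so treat it as a rough estimate rather than a synthesis result.

```python
import math

# (depth, width) shapes of one Spartan-6 18Kb block RAM
SHAPES_18K = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]


def blocks_needed(depth, width, shapes=SHAPES_18K):
    """Fewest blocks over all shapes, tiling ceil(depth/d) x ceil(width/w)."""
    return min(math.ceil(depth / d) * math.ceil(width / w) for d, w in shapes)


if __name__ == "__main__":
    print(blocks_needed(2**17, 4))    # the 131072x4 experiment: 32 blocks
    print(blocks_needed(2**17, 47))   # a full 47-bit-wide table
```

By this estimate, 131072x4 needs exactly the 32 blocks the LX9 has, while a 47-bit-wide table would need 376 blocks, more than the 268 in the largest Spartan-6 (XC6SLX150).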
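Hi, thanks for checking this for me.

I tried to estimate whether the device resources are sufficient, but I don't know how to calculate it. Based on the 17-bit control input, I need to send four variable-length data words to four devices: 32-bit data to a DDS IC, 8-bit data to a PLL IC, 5-bit data to a digital attenuator, 1 bit to a switch, and finally a 1-bit load command. A single case item looks like this:

0: begin
   o_ddata[31:0]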
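If all five outputs come from one table, each entry is a single 47-bit word. The sketch below shows one possible bit layout: ddata in the top 32 bits, then an 8-bit PLL field, the 5-bit attenuator field, the switch bit and the load bit. The field order and the names `pdata`, `adata`, `switch` and `load` are a hypothetical layout chosen only to illustrate packing and unpacking.

```python
# Hypothetical field layout, most significant field first: (name, width in bits)
FIELDS = [("ddata", 32), ("pdata", 8), ("adata", 5), ("switch", 1), ("load", 1)]
WORD_BITS = sum(w for _, w in FIELDS)  # 47


def pack(values):
    """Concatenate the fields into one integer, like {ddata, pdata, ...} in Verilog."""
    word = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < 1 << width:
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack(word):
    """Slice the word back into fields, least significant field first."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

In hardware the unpacking is free: each output port is simply wired to its slice of the ROM data bus.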
Well, to calculate the size of the look-up table, you multiply 2 to the power of the address width by the data width. You have a 17-bit address and 32+8+5+1+1 = 47 data bits. 2^17*47 = 131072*47 = 6160384 bits (about 6 Mb) or, in bytes, about 770 kB (6160384/8). That's quite a lot for an FPGA: the largest block RAM capacity in the Spartan-6 family is only 4824 Kb. I ran a synthesis test using 2^17*32 and it fit into 95% of the BRAM resources in the Spartan-6 LX150. I can't see this being practical for your design, so if you want a pure look-up table, you'll have to use external RAM/ROM.

If you previously calculated the outputs in the PIC, why can't you calculate them in the FPGA (you needn't use the Microblaze)? You'll get parallelism in the calculation in the FPGA (if you code it right), plus you can use the DSP slices where available. I don't think you'll get a tremendous performance benefit from the Microblaze; it is really just a microprocessor.

What frequency are you targeting for this design? What is your ACTUAL timing requirement for outputting the data?
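The same size arithmetic as a few lines of Python, for trying other widths. The 4824 Kb figure is the family maximum quoted in the reply (268 block RAMs of 18 Kb in the XC6SLX150), taken here as 4824 * 1024 bits.

```python
ADDR_BITS = 17
DATA_BITS = 32 + 8 + 5 + 1 + 1       # DDS word, PLL, attenuator, switch, load
LARGEST_S6_BRAM_BITS = 4824 * 1024   # 4824 Kb quoted for the largest Spartan-6


def table_bits(addr_bits, data_bits):
    """Bits in a full look-up table: 2**addr_bits entries of data_bits each."""
    return (1 << addr_bits) * data_bits


bits = table_bits(ADDR_BITS, DATA_BITS)
print(bits, "bits, about", bits // 8 // 1000, "kB")
print("ratio to largest Spartan-6 BRAM:", round(bits / LARGEST_S6_BRAM_BITS, 2))
```

A ratio above 1 means the raw bit count alone exceeds the biggest device's block RAM, before any packing overhead.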
Hi,

I am not able to calculate the values on the fly in the FPGA simply because I don't know how to write an equation that involves a division in Verilog/VHDL. For example, the equation for the 32-bit Frequency Tuning Word (FTW) of the DDS is:

FTW = round( (2^32) * (fout / fclk) )

On a microcontroller I could easily write the rounding, exponent and division of the above equation in C using the built-in math.h functions, but how do I write the same logic in HDL? Please say this is possible :)

I am using a 50 MHz crystal and generating 200 MHz from a DCM. Sorry, I am not sure what you mean by the actual timing requirement for the output data, but I would prefer all the outputs to come out within 200 ns after I toggle a strobe signal that indicates the start of reading the 17-bit input data.

Just to update you on my ongoing synthesis run, which I have not stopped yet: it is still running, and the last update on the console is as follows:

=========================================================================
*                          HDL Synthesis                                *
=========================================================================

Synthesizing Unit
    Related source file is "e:/documents/biswas/fsyn618control/fsyn618control.v".
        initialize = 2'b00
        ready = 2'b01
        hold = 2'b11
        countmax = 99999
        countmax2 = 9999
WARNING:Xst:647 - Input is never used. This port will be preserved and left unconnected if it belongs to a top-level block or it belongs to a sub-block and the hierarchy of this sub-block is preserved.
WARNING:Xst:647 - Input is never used. This port will be preserved and left unconnected if it belongs to a top-level block or it belongs to a sub-block and the hierarchy of this sub-block is preserved.
    Found 32-bit register for signal
    Found 8-bit register for signal
    Found 5-bit register for signal
    Found 1-bit register for signal
    Found 1-bit register for signal
    Found 1-bit register for signal
    Found 32-bit register for signal
    Found 4-bit register for signal
    Found 2-bit register for signal
    Found 1-bit register for signal
    Found 1-bit register for signal
    Found 32-bit register for signal
    Found 32-bit register for signal
    Found finite state machine
    -----------------------------------------------------------------------
    | States             | 2                                              |
    | Transitions        | 3                                              |
    | Inputs             | 1                                              |
    | Outputs            | 2                                              |
    | Clock              | o_dcm200 (rising_edge)                         |
    | Reset              | init_step[31]_GND_1_o_equal_781_o (positive)   |
    | Reset type         | synchronous                                    |
    | Reset State        | 01                                             |
    | Power Up State     | 00                                             |
    | Encoding           | auto                                           |
    | Implementation     | LUT                                            |
    -----------------------------------------------------------------------
    Found finite state machine
    -----------------------------------------------------------------------
    | States             | 260                                            |
    | Transitions        | 907                                            |
    | Inputs             | 3                                              |
    | Outputs            | 270                                            |
    | Clock              | o_dcm200 (rising_edge)                         |
    | Power Up State     | 00000000000000000000000000000000               |
    | Encoding           | auto                                           |
    | Implementation     | LUT                                            |
    -----------------------------------------------------------------------
    Found finite state machine
    -----------------------------------------------------------------------
    | States             | 6                                              |
    | Transitions        | 33                                             |
    | Inputs             | 5                                              |
    | Outputs            | 6                                              |
    | Clock              | o_dcm200 (rising_edge)                         |
    | Power Up State     | 00000000000000000000000000000000               |
    | Encoding           | auto                                           |
    | Implementation     | LUT                                            |
    -----------------------------------------------------------------------
    Found 32-bit adder for signal created at line 88.
    Summary:
        inferred   1 Adder/Subtractor(s).
        inferred  86 D-type flip-flop(s).
        inferred  54 Multiplexer(s).
        inferred   3 Finite State Machine(s).
Unit synthesized.

Synthesizing Unit
    Related source file is "e:/documents/biswas/fsyn618control/ipcore_dir/dcm200.v".
    Summary:
        no macro.
Unit synthesized.

=========================================================================
HDL Synthesis Report

Macro Statistics
# Adders/Subtractors                    : 1
 32-bit adder                           : 1
# Registers                             : 10
 1-bit register                         : 5
 32-bit register                        : 2
 4-bit register                         : 1
 5-bit register                         : 1
 8-bit register                         : 1
# Multiplexers                          : 54
 1-bit 2-to-1 multiplexer               : 11
 32-bit 2-to-1 multiplexer              : 43
# FSMs                                  : 3
=========================================================================
I'm not sure what function round is, but multiplying by 2^32 is simply a left shift of a vector by 32 bits. The IEEE numeric_std library provides a shift-left function, so that's the exponent sorted out. What sort of values are you getting for fout and fclk (and what type are they: integer, std_logic_vector, unsigned?)? Integer division is possible directly in VHDL if the divisor is a power of 2 (a right shift, no less!). Remember that division is simply repeated subtraction. Unless you have some very odd values, it shouldn't be too complicated to code this kind of operation, and the internet is probably full of examples. If this seems a bit too advanced for you, have you looked at the Divider cores that Xilinx provides through CORE Generator? They may help.

What I meant was: what is your data throughput requirement? Your system provides data to the FPGA and, some time later, it expects some output or event elsewhere in the system. So what is the maximum number of clock cycles you can use to perform operations inside the FPGA? You mention 200 ns, but is that a requirement? 200 MHz is a period of 5 ns, and 200/5 = 40 clock cycles; that's a lot of possible synchronous, parallel processing!

260 states in an FSM? Small wonder the engine is struggling, in my opinion. I'd be amazed if it completes without over-allocating device resources.
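"Repeated subtraction" is usually implemented in hardware as shift-and-subtract (restoring) division, which produces one quotient bit per step, so an N-bit quotient takes N steps (N clock cycles in a simple sequential divider). The Python sketch below models that algorithm bit by bit for unsigned operands; it illustrates the general technique and is not the internal structure of any particular Xilinx Divider core.

```python
def restoring_divide(dividend, divisor, width):
    """Unsigned shift-and-subtract division; returns (quotient, remainder).

    Each loop iteration corresponds to one step of a sequential hardware
    divider: shift the next dividend bit into the partial remainder, then
    subtract the divisor if it fits and record a 1 in the quotient.
    """
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    if not 0 <= dividend < 1 << width:
        raise ValueError("dividend must fit in `width` bits")
    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder
```

For example, `restoring_divide(137750, 35000, 32)` returns the same pair as Python's `divmod(137750, 35000)`; when the divisor is a power of two, the whole loop collapses to a right shift.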
Yes, I need to round FTW to the nearest integer when it comes out as a fractional value, and I have no idea how to do this. 'fout' in that equation is of type 'long double' and varies with each input case, whereas 'fclk' is always a fixed integer value (3500).

No, I have not looked at the divider cores yet. I was under the impression that code involving division is not synthesizable; that's why I precalculated the values and put them in a look-up table. I will check them now.

I may not have explained this properly, but I don't think I am doing much parallel processing within those 40 clock cycles. The input strobe signal used to latch the 17-bit parallel data is ASYNCHRONOUS, and the MINIMUM time between any two strobe pulses in my system is 1 us; the gap may equally be 100 ms, 1 s, 1 hr, etc. After each strobe, once the FPGA has read the corresponding 17-bit data, I need to output the specific combination of ddata[31:0], pdata[7:0], adata[4:0] and switch that corresponds to that input data. Once that data is out, the FPGA can sit idle and wait for the next strobe pulse; until then it is not doing any processing. How much time should this cycle take with a 200 MHz clock?
OK, there are a few things here, so let's take them one at a time.

1. Asynchronous inputs
This is REALLY important; more so, in my opinion, than any other part of your design. You absolutely MUST ensure that ANY asynchronous input is correctly synchronised to the FPGA clock BEFORE you use it in your logic. Unsynchronised asynchronous inputs (metastability) cause lots of problems that only show up once the bitstream is running in the FPGA. The usual technique for single-bit inputs is to double flip-flop them.

2. FTW equation
Now we know that ftw = 2^32*(fout/3500). You have TWO constants in your equation, so you can let the compiler do the work for you. The equation may be rewritten as ftw = (2^32/3500)*fout. You can, quite legally and synthesisably, declare a constant in your code:

constant MULT_CONSTANT : integer := 2**32/3500;

The compiler computes this for you, leaving you with a usable integer. Now your equation is simply ftw = MULT_CONSTANT*fout. Simples, no?

Ah, not quite. 2^32 is actually too big for an integer type in VHDL. Let's try again. The maximum value guaranteed for an integer is 2^31-1. We can give ourselves some calculation margin and use 2^30 (which is 2^32/4). So now we need to divide fclk by 4, too. Therefore, our constants can be declared as

constant FCLK_REFACTOR : integer := 3500/4;
constant MULT_CONSTANT : integer := 2**30/FCLK_REFACTOR;

Strictly, 2^30/875 is a real value (about 1227133.51); VHDL integer division truncates it toward zero, so the compiler gives you 1227133 rather than a rounded result. Now we can simply have a multiplication ftw = fout*MULT_CONSTANT. "*" is a perfectly synthesisable operator in VHDL. You'll also probably benefit from adding pipeline stages to the multiplier to help the synthesiser get the best out of the DSP slices.

3. Cycle times
I ran a test of this style and simulated it, too. I can run the multiplier with 6 pipeline stages, as advised by the synthesiser, so the actual output comes 7 clock cycles AFTER the input. I used a clock frequency of 25 MHz (a period of 40 ns): 40 ns*7 = 280 ns from input to output. You mentioned that the MINIMUM cycle time between inputs is 1 us, so even at a frequency as low as 25 MHz, we can calculate and output in a fraction of that time. Why 200 MHz?

POSTSCRIPT: I know your original files were in Verilog, but I'm a VHDL man and it was a lot easier for me to explain it that way. I'm sure there's a Verilog equivalent, probably using localparam or something similar. Hopefully you get the idea of simplifying your division equation.
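A quick way to check what the two VHDL constants above evaluate to, and how truncation compares with rounding to nearest. For these non-negative values, VHDL's integer "/" gives the same result as Python's `//`; the round-to-nearest variant (add half the divisor before dividing) is a common idiom shown here as an option, not something used in the reply. The `latency_ns` helper assumes, as in the reply, that the result appears `pipeline_stages + 1` clock cycles after the input.

```python
FCLK = 3500
FCLK_REFACTOR = FCLK // 4                                   # 875, as in the VHDL constant
MULT_TRUNC = 2**30 // FCLK_REFACTOR                         # what integer "/" gives
MULT_ROUND = (2**30 + FCLK_REFACTOR // 2) // FCLK_REFACTOR  # rounded to nearest


def latency_ns(pipeline_stages, clk_mhz):
    """Input-to-output delay if the result appears stages + 1 cycles later."""
    return (pipeline_stages + 1) * 1000.0 / clk_mhz


print(FCLK_REFACTOR, MULT_TRUNC, MULT_ROUND, 2**30 / FCLK_REFACTOR)
print(latency_ns(6, 25.0), latency_ns(6, 200.0))   # 25 MHz case from the reply, then 200 MHz
```

Because 2^30/875 equals 2^32/3500 exactly, no extra scale factor is needed when the constant is used; at 25 MHz the 7-cycle latency is the 280 ns quoted above, and at 200 MHz it would be 35 ns.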
1. Asynchronous inputs: Yes, I read somewhere about double flip-flopping them, but I forgot to implement it in my code. Thanks a lot for bringing this up.

2. I know basic VHDL, and since my Verilog code is not working I am open to VHDL too. Since you are taking the effort to test some code snippets, I can try to write the whole design in VHDL with a little help from you; no problem for me. I still have some doubts about coding the equation. 'fout' can take values like 312.503125, 437.7375, etc., i.e. up to 6 decimal places. It then has to be multiplied by (2^32/3500), and only after this multiplication should it be rounded, not in between. Also, 'fout' depends on the 17-bit input value through the following relation, for example:
if the 17-bit input value = 77750, fout = (6000 + (77750/10)) / 32 = 430.468750
if the 17-bit input value = 120000, fout = (6000 + (120000/10)) / 32 = 562.5
From these kinds of fout values I have to calculate FTW, round it and output the corresponding 32 bits. Is this possible?

3. Oh, 200 MHz was just a number I chose at random. I wanted to make the code as fast as possible, and since originally the only processing required was to evaluate a single case statement/LUT in response to an asynchronous input, I thought it would be fast enough.
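The double flip-flop synchronizer from the previous reply, combined with a rising-edge detector so the 17-bit word is sampled once per strobe, can be modelled cycle by cycle. This is only a behavioural sketch: Python cannot model metastability itself, and the three-register arrangement (two synchronising stages plus one extra stage for edge detection) is one common choice, not code from the thread; the class and function names are hypothetical.

```python
class StrobeSync:
    """Cycle-level model: 2-FF synchronizer plus rising-edge detect."""

    def __init__(self):
        self.ff1 = 0   # first synchroniser stage (may go metastable in hardware)
        self.ff2 = 0   # second stage: safe to use in logic
        self.ff3 = 0   # delayed copy of ff2, used only for edge detection

    def clock(self, strobe_in):
        """Advance one clock edge; return 1 on the cycle a rising edge is seen."""
        self.ff3, self.ff2, self.ff1 = self.ff2, self.ff1, strobe_in
        return int(self.ff2 == 1 and self.ff3 == 0)


def pulse_cycles(strobe_samples):
    """Indices of clock cycles on which the edge-detect output is high."""
    sync = StrobeSync()
    return [i for i, s in enumerate(strobe_samples) if sync.clock(s)]
```

For example, `pulse_cycles([0, 0, 1, 1, 1, 1, 0, 0])` gives `[3]`: one pulse, one cycle after the strobe is first sampled high. The 17-bit data bus itself should not go through per-bit synchronizers; it would typically be captured on the cycle this pulse is high, which assumes the data is already stable by then (an assumption about the interface, not something stated in the thread).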
1. Accuracy
What accuracy does your output need? Or what percentage deviation from the real calculated value is acceptable for the output (given that you are rounding the final number anyway)?

2. FTW equation
OK, let's break this equation down again.
ftw = 2^32*(fout/fclk), where
fclk = 3500 and
fout = ((input/10)+6000)/32
so we can write
ftw = (2^32/3500)*((input/10)+6000)*1/32
Notice that 32 may be written as 2^5, so we can simplify to
ftw = (2^27/3500)*((input/10)+6000)
This may be written as
ftw = (2^27/3500)*((input/10)+(60000/10))
or
ftw = (2^27/3500)*((input+60000)/10)
ftw = (2^27/35000)*(input+60000)
There: now we have an easy constant at the start, and our input is simply added to another constant before the multiplication.

To take an example that you provided, let input = 77750. The real calculated value ftw(r) is 528242629.49. My rapid prototype outputs (within 280 ns of the enabled input, running at 25 MHz) ftw(c) = 528133500. ftw(c)/ftw(r)*100 = 99.979% accurate. Is this close enough? I've attached a ModelSim screenshot so you can see the pipelined calculation in action.

3. "Oh, 200 MHz was just a random number I chose. I wanted to make the code as fast as possible."
This is not particularly good design practice, in my opinion. There must be some genuine requirement for data throughput. Simply picking a random number and trying to design to it could end up tying you in knots when you come to implementation.
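The 0.021% error is fully accounted for by the integer part of 2^27/35000 (about 3834.79): 3834*(77750+60000) = 528133500, which matches the prototype's output exactly, suggesting the constant was truncated to an integer. Keeping some fractional bits in the constant (a fixed-point constant scaled by 2^F, removed again by a rounding right shift) recovers the accuracy at the cost of a wider multiplier. The sketch below checks this against exact rational arithmetic; `frac_bits = 16` and the function names are assumptions for illustration, and the frequency error assumes fclk = 3500 MHz and the x32 PLL.

```python
from fractions import Fraction


def ftw_exact(code):
    """Exact rational FTW = 2^27 * (code + 60000) / 35000 (derived above)."""
    return Fraction(2**27 * (code + 60000), 35000)


def ftw_fixed(code, frac_bits):
    """FTW using a constant that carries `frac_bits` fractional bits.

    K = round(2^(27+F) / 35000) is the fixed-point constant; the product is
    rounded back to an integer by adding half an LSB and shifting right by F.
    """
    k = (2**(27 + frac_bits) + 17500) // 35000
    product = k * (code + 60000)
    if frac_bits == 0:
        return product
    return (product + (1 << (frac_bits - 1))) >> frac_bits


def error_khz(ftw, code, fclk_mhz=3500, pll_mult=32):
    """Final output-frequency error in kHz after the DDS and the x32 PLL."""
    err = (Fraction(ftw) - ftw_exact(code)) * fclk_mhz * pll_mult / 2**32
    return float(err) * 1000.0


print(3834 * (77750 + 60000))                      # 528133500, the prototype value
print(error_khz(3834 * (77750 + 60000), 77750))    # about -2846 kHz
print(error_khz(ftw_fixed(77750, 16), 77750))      # well under 1 kHz
```

With 16 fractional bits the constant fits in 28 bits and (input + 60000) in 18 bits, so in VHDL this would be one constant multiply followed by the rounding shift; these widths are estimates from the arithmetic, not something tested in the thread.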
Accuracy: Hi, that was accurate, but still not enough for my application. My application generates programmable microwave frequencies, and, as mentioned earlier, I am using a separate PLL multiplier chip (the 8-bit adata[7:0] goes from the FPGA to this PLL). The PLL multiplies the 'fout' value by the adata[7:0] value, which is fixed at 32 (99% of the time), to generate the final output frequency. Back-calculating the 'fout' values from ftw(r) and ftw(c):

ftw(r) = round(528242629.49) = 528242629
fout(r) = 430.4687496
Ffinal(r) = 430.4687496 * 32 = 13774.9999873

ftw(c) = 528133500
fout(c) = 430.3798196
Ffinal(c) = 430.3798196 * 32 = 13772.1542272

So the frequency error is Ffinal(r) - Ffinal(c) = 13774.9999873 - 13772.1542272 = 2.8457601, or approximately 3 MHz in this case. I want this error to be within 100 kHz. I hope this explains the accuracy required.

P.S. For only 5 input conditions out of the 2^17 cases, I MAY have to set the adata[7:0] value to 36; at all other times it is 32.

3. Yes, I picked the 200 MHz value without much calculation; I will be careful.
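It can help to express the 100 kHz target in FTW units. One FTW LSB moves the DDS output by fclk/2^32 and the final output by 32 times that, so the sketch below (assuming fclk = 3500 MHz and the x32 PLL case, as in the post) gives the output step per LSB and the FTW error budget.

```python
FCLK_MHZ = 3500.0
PLL_MULT = 32


def ftw_to_mhz(ftw, mult=PLL_MULT):
    """Final output frequency in MHz for a given tuning word."""
    return ftw * FCLK_MHZ / 2**32 * mult


step_hz = ftw_to_mhz(1) * 1e6          # output step per FTW LSB
budget_lsb = 0.1 / ftw_to_mhz(1)       # LSBs of FTW error allowed for 100 kHz

print(round(step_hz, 2), "Hz per LSB")
print(int(budget_lsb), "LSB error budget")
print(ftw_to_mhz(528133500), ftw_to_mhz(528242629))
```

So the requirement allows an FTW error of roughly 3800 LSBs. The integer-constant method misses by about 109000 LSBs at input 77750, while any method whose error stays within a few LSBs meets the 100 kHz target comfortably.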
Just to be clear on this: I (rapidly) presented you with a method of calculation that gives an accuracy of roughly 200 parts per million (about 3 MHz at 13.8 GHz), but that's not accurate enough. You want (is this a requirement, or just what you'd like?) an accuracy better than 10 parts per million (100 kHz at 13.8 GHz is about 7 ppm). To achieve this, you selected a low-end FPGA with no fast external memory? This sounds like an ideal application for a DSP, to me. Is this a commercial, academic or hobby application?

Unless you want to investigate using the fixed-point libraries, I think you are back to doing the calculations in the Microblaze (which may very well work out OK, but I can't see the benefit of swapping a dedicated uP for an embedded one, other than that the Microblaze might run at a higher frequency).
Hi, I am a beginner in this kind of digital design on FPGAs, and I admit I chose the wrong device and made no provision for external resources. As I said earlier, I had already tested the simplified code; I never thought that just increasing the number of input bits could need this many resources. But thanks to you, I now know this other way of implementing the same logic (which will certainly be useful to me later). For now, though, my application (which is an industrial one, not academic or hobby) has an end specification for the module: the frequency accuracy must be within +/-100 kHz (it is not just what I want), and that has to be met. Maybe I have to start over again from selecting the right device and resources for the application. Could you help me select a suitable low-power device for this application, on which my original code or your modified code can run, or suggest any other solution with external memories, ROMs, etc.?
I have never worked in this industrial area, so any device-selection advice I give should be taken with a pinch of salt. As this is an industrial project, what does your project team say on the matter? You mentioned earlier that you had already committed hardware to the LX9, so it would be a pretty large waste of money to start again. I find it slightly odd that you would deliberately select an unfamiliar architecture for such a project (with seemingly no consultancy involved, either). Anyway, that's just my opinion and, frankly, none of my business.

So, onwards. As I see it, you have 2 basic options:
1. Use a dedicated DSP. There are loads out there, all operating at decently quick frequencies.
2. If changing the board design is too much and you're stuck with your Spartan-6, there is a further range of choices:
a. Learn about fixed-point arithmetic in VHDL and implement your equations that way.
b. Embed a Microblaze and get that to do your number crunching.

A third option, probably one that Xilinx employees would highlight, would be to use a Zynq device: the combination of processor and logic can be very powerful and fast. As I understand the HLS side of Vivado, you can write software code and then tell the tools to offload that part of the design to the FPGA fabric to get the performance benefit. I have never tried this.

As (first and foremost) an engineer, you should, in my opinion, be selecting the right resources for the task. Given your experience and your requirement on fixed-point accuracy, that would not be an FPGA.
Hi, I have only had two boards made for prototyping, so I am not really stuck with the Spartan-6. As you calculated earlier, my original code fits into 95% of the LX150 device with 4824 Kb of RAM, so can I safely use a Virtex-6 device like the XC6VLX760 with 25,920 Kb of RAM? I prefer an FPGA to a DSP processor because I already have some development boards and I feel more comfortable with custom board design. Anyway, I now have to test the complete code on some platform before committing to custom board development. Thank you for the suggestions. I also have a Zynq-7000 board, on which I will try the same.
© 2015 bbs.elecfans.com