Hi,
I have written a design that finds the square root of a number using the non-restoring method. The problem with my design is that it has a latency of N/2 clock cycles (N is the input width). I have noticed that the Xilinx CORDIC IP core (without pipelining) produces its output within 2 clock cycles. Is there any way to reduce the clock latency? I don't think it is possible to reduce the latency with the non-restoring algorithm. How can I get the output within two clock cycles? If anybody has a document related to this, please share it with me.
Thanks & regards
If your solution is not too long, post it here and others can comment on how well it can be parallelized. Mathematically, I do not think you can solve the problem any faster with your method.
Daniel
Hi,
Thanks for the reply. This is the body of the program:

    end else if (!reset && count == 0) begin
        // Load: place the input in the low part of the working register,
        // clear the upper part, and start the iteration counter at Q.
        temp_in_data[N+1:2]      <= in_data[N-1:0];
        temp_in_data[2*N-1:2*Q+2] <= 0;
        temp_sub_result          <= 1;
        count                    <= Q;
        a_out_data               <= 0;
    end else if (count > 0) begin
        // One iteration per clock: shift in two bits, then compare.
        temp_in_data <= temp_in_data << 2;
        if (temp_in_data[2*N-1:2*Q] >= temp_sub_result[N-1:0]) begin
            // Subtraction fits: the next result bit is 1.
            temp_in_data[2*N-1:2*Q+2] <= temp_in_data[2*N-1:2*Q] - temp_sub_result[N-1:0];
            a_out_data               <= {a_out_data, 1'b1};
            count                    <= count - 1;
            temp_sub_result[Q+2:2]   <= {a_out_data[Q-1:0], 1'b1};
        end else begin
            // Subtraction does not fit: the next result bit is 0.
            a_out_data               <= {a_out_data, 1'b0};
            count                    <= count - 1;
            temp_sub_result[Q+2:2]   <= {a_out_data[Q-1:0], 1'b0};
        end
    end

Here Q = N/2, and the output is available when count = 0. This is not a pipelined version; the output is produced only on the (N/2)th clock cycle. Is there any way to decrease the latency?
Thanks & regards
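To make the N/2 figure concrete, here is a minimal behavioral sketch in Python of a textbook two-bits-per-step square root. It is not a translation of the Verilog above: the function name `sqrt_digit_by_digit` and the `root`/`rem` variables are assumptions made for illustration, and each loop iteration stands in for one clock cycle of a sequential design.

```python
import math


def sqrt_digit_by_digit(x, n=8):
    """Integer square root of an n-bit value, one result bit per iteration.

    Returns (root, remainder). The loop runs n // 2 times, which is where
    the N/2 latency of a one-iteration-per-clock design comes from.
    """
    assert n % 2 == 0 and 0 <= x < (1 << n)
    q = n // 2
    root = 0
    rem = 0
    for i in range(q):
        # Bring down the next two input bits, most significant pair first.
        shift = 2 * (q - 1 - i)
        rem = (rem << 2) | ((x >> shift) & 0b11)
        # Trial subtrahend is 4*root + 1, i.e. {root, 2'b01}.
        trial = (root << 2) | 1
        if rem >= trial:
            rem -= trial
            root = (root << 1) | 1   # next result bit is 1
        else:
            root = root << 1         # next result bit is 0
    return root, rem


# Exhaustive check for 8-bit inputs against the standard library.
assert all(sqrt_digit_by_digit(x)[0] == math.isqrt(x) for x in range(256))
print(sqrt_digit_by_digit(25))  # (5, 0)
```

Each iteration needs the `root` and `rem` produced by the previous one, so this particular recurrence cannot finish in fewer than N/2 steps unless more than one step is evaluated per clock.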
This reduces to walking a binary tree. At each level, temp_in_data is either greater than or equal to temp_sub_result or it is not. There is probably a generic solution for any N that is a multiple of 2, but the specific solution for a given value of N is simple.

Say N is 8, which makes Q = 4, the number of times you make a GTE (greater-than-or-equal) compare. Writing out all 16 possible outcome equations, you have 15 compares in parallel whose outputs feed a case statement. The case statement picks which of the 16 calculated values is assigned to a_out_data. Since temp_sub_result is always the same for a given branch, you can do 16 subtractions in parallel and then 15 compares in parallel, returning the result in two clock cycles, using DSPs to do your subtractions.

It would be difficult or impossible to implement this method directly with N = 16. What you can do instead is divide any problem into successive N = 8 problems. N = 16 then takes four clock cycles to complete, walking through the N = 8 chain twice. N = 24 would take six clock cycles. Other groupings are possible, such as N = 10 for the base chain, depending on how many resources you have available and what final N value you are using.

Does that make sense?
Daniel
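One possible reading of the parallel-compare idea, sketched as a Python behavioral model: each pass consumes 8 input bits and produces 4 result bits by evaluating 15 independent comparisons, one per nonzero candidate digit. The function name `sqrt_parallel_digits` and the exact form of the comparison are assumptions, not necessarily the structure Daniel has in mind, and the model does not represent clock cycles or DSP usage.

```python
import math


def sqrt_parallel_digits(x, n=8):
    """Integer square root of an n-bit value, 4 result bits per pass.

    n must be a multiple of 8. Each pass corresponds to one trip through
    the N = 8 chain: n = 8 needs one pass, n = 16 two, n = 24 three.
    """
    assert n % 8 == 0 and 0 <= x < (1 << n)
    root = 0
    rem = 0
    for step in range(n // 8):
        # Bring down the next 8 input bits, most significant byte first.
        shift = n - 8 * (step + 1)
        rem = (rem << 8) | ((x >> shift) & 0xFF)
        # 15 comparisons that do not depend on each other: candidate digit d
        # fits if (16*root + d)**2 does not exceed the bits seen so far,
        # which is equivalent to rem >= d * (32*root + d).
        compares = [rem >= d * (32 * root + d) for d in range(1, 16)]
        # The results form a thermometer code (all True, then all False),
        # so the chosen digit is simply the number of True entries.
        digit = sum(compares)
        rem -= digit * (32 * root + digit)
        root = (root << 4) | digit
    return root


# Exhaustive check for 8-bit inputs against the standard library.
assert all(sqrt_parallel_digits(x) == math.isqrt(x) for x in range(256))
```

For n = 8 the first pass has `root = 0`, so the 15 comparisons reduce to `x >= d * d` against the constants 1, 4, 9, ..., 225. In hardware the selection from the thermometer code would play the role of the case statement.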
Hi,
I have attached the link I am referring to:
http://telkomnika.ee.uad.ac.id/n9/files/Vol.8No.1Apr10/8.1.4.10.01.pdf
In this paper, the input bits are divided into groups of two bits, and the operation is then performed group by group. Each stage requires the result of the previous stage, so I think it is not possible to reduce the number of clock cycles. For 8 bits there are 4 groups of two bits each, and the 2nd group requires the result of the subtraction of the first group. Is it possible to reduce the number of clock cycles? Is there any algorithm that does this in fewer clock cycles (fewer than N/2)?
Thanks and regards
Your solution implements the sqrt algorithm as a sequential (stateful) design. But a very quick look at the paper shows me that there is nothing preventing you from just laying down all the logic for every stage (n/2 copies) and doing everything in 0 clock cycles (purely combinational). In fact, the paper doesn't really talk about clocks and FFs at all, so the author's solution may have done just that. It isn't all that clear to me.

Of course, your clock rates may stink depending on bit sizes. This may or may not be OK for you. If the clock rates aren't working out (and your sequential version is too slow), then you can explore pipelining the stages. Or implementing an alternative algorithm (a LUT inside a BRAM comes to mind). Or some clever combination of the above.

There are many ways of skinning this cat, but they all depend on your requirements.
--Mark
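Two of the options listed above, pipelining the stages and a lookup table, can be sketched as a Python behavioral model. The names `stage`, `run_pipeline`, and `SQRT_LUT` are assumptions made for illustration; the stage logic is a textbook two-bits-per-stage recurrence rather than the paper's exact circuit, and each pass through the outer loop stands in for one clock edge.

```python
import math


def stage(state, i, q):
    """Combinational logic of stage i: consumes one 2-bit group of the input."""
    x, root, rem = state
    shift = 2 * (q - 1 - i)
    rem = (rem << 2) | ((x >> shift) & 0b11)
    trial = (root << 2) | 1
    if rem >= trial:
        return x, (root << 1) | 1, rem - trial
    return x, root << 1, rem


def run_pipeline(inputs, n=8):
    """Feed one input per clock into q = n // 2 registered stages.

    Latency is still q clocks, but once the pipeline is full a new
    result comes out on every clock.
    """
    q = n // 2
    regs = [None] * q                      # regs[i]: register after stage i
    outputs = []
    stream = list(inputs) + [None] * q     # trailing bubbles flush the pipe
    for item in stream:
        if regs[q - 1] is not None:
            outputs.append(regs[q - 1][1])  # root leaving the last stage
        # Compute every next-register value from the current registers.
        nxt = [None] * q
        if item is not None:
            nxt[0] = stage((item, 0, 0), 0, q)
        for i in range(1, q):
            if regs[i - 1] is not None:
                nxt[i] = stage(regs[i - 1], i, q)
        regs = nxt
    return outputs


# Lookup-table alternative: one memory read per result for 8-bit inputs.
SQRT_LUT = [math.isqrt(v) for v in range(256)]

assert run_pipeline(range(256)) == SQRT_LUT
```

Chaining all `stage` calls without the registers gives the purely combinational version: zero clock cycles of latency, at the cost of a long logic path. The table needs 2^N entries, so it is only practical for small input widths.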