Hello everyone, I need your opinion on the following issue.
Let's say we have the following code:
    int Array[10000] = {0};
    for (int i = 0; i < 10000; i++) {   // fixed-bound loop
        if (...limitation....) {
            ..............
            ..............
        }
    }
This loop has 10,000 iterations, and only one of them gives me the data I need, but I don't know in advance which one. So I end up with a lot of unnecessary latency. Is there any way to cut the useless iterations with directives (for example array_partition)? I have also tried unroll and pipeline, but without any improvement. Thanks in advance!
8 answers
So the idea is that "(...limitation...)" will be true for only one of the 10,000 iterations, and you can't determine in advance which iteration that will be?
There's not really much you can do there. If you need to read 10,000 array elements to find the "right" one, and you can only read one per cycle (from a single-port RAM, or a RAM where one port is permanently connected elsewhere), then that is going to take up to 10,000 cycles.
If the data has some sort of ordering, there are ways to improve on this. For example, if the data is sorted and you're looking for the first element larger than some constant value, a binary search is guaranteed to find it in about 14 iterations (log2 of 10,000 is roughly 13.3). Those iterations can't be pipelined, though, because each one depends on the result of the previous one.
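The binary search mentioned above could be sketched as follows. This is only an illustration under the assumption that the array is sorted in ascending order, which the original poster has not confirmed; the name `first_greater` is made up for this example.

```cpp
#include <cstddef>

// Returns the index of the first element strictly greater than `threshold`
// in an ascending-sorted array, or `n` if no such element exists.
// Assumption: `sorted` really is sorted; the result is meaningless otherwise.
std::size_t first_greater(const int *sorted, std::size_t n, int threshold) {
    std::size_t lo = 0;  // every element before lo is <= threshold
    std::size_t hi = n;  // every element from hi onward is > threshold
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] > threshold) {
            hi = mid;      // the answer is at mid or to its left
        } else {
            lo = mid + 1;  // the answer is strictly to the right of mid
        }
    }
    return lo;
}
```

Each pass halves the remaining range, so 10,000 elements need at most 14 passes. The next read address depends on the previous comparison, which is why the loop cannot be pipelined the way a linear scan can.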
I have the following limitation:
    for (int i = 0; i < 800; i++) {
        if (inTriangle(x, current, xCoor, yCoor, zCoor)) {
            ..............
            ..............
        }
    }
The condition inside the if statement is true only once in every 800 iterations.
Can you explain exactly what the code is doing?
From the function names, here's my guess: you've got an array of triangular prisms ("current"), and you're checking which one of them a point (xCoor, yCoor, zCoor) falls into. It can only fall into one because they don't overlap. The question is how to determine which one is relevant.
The obvious question is "are the triangles arranged in any meaningful way?" Take a simplified example where "current" is just an array of equally sized cubes (S units per side) arranged in a 10*10*8 prism. Finding out which cube a point is in is trivial: the cube at (xCoor/S, yCoor/S, zCoor/S) is the correct one. Then you don't need the loop at all. It's harder for triangles, but not much harder.
If the triangles are not all the same size, or not arranged so neatly, you could potentially use something similar. Store a separate map of cubes, and for each cube store every triangular prism that intersects that cube (which is easy if you know the prism's vertices). This takes a bit more memory, but (as above) it's trivial to find out which cube a point is in, and once you know that, you only have to check the prisms that intersect that cube.
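A rough sketch of the cube-map idea above, assuming the 10*10*8 grid from the simplified example, coordinates that start at 0, and a positive cube side S. All names here (`cube_index`, `CubeBucket`, `find_prism`, `MAX_PER_CUBE`) are hypothetical, and the actual containment test is passed in as a callable because the real `inTriangle` is not shown in the thread.

```cpp
// Grid dimensions from the 10*10*8 example (assumed, not the poster's data).
constexpr int NX = 10, NY = 10, NZ = 8;
constexpr int MAX_PER_CUBE = 8;  // assumed cap on prisms touching one cube

// Cell number along one axis, or -1 if v lies outside [0, S * cells).
static int axis_cell(float v, float S, int cells) {
    if (!(v >= 0.0f) || v >= S * cells) return -1;  // also rejects NaN
    int c = static_cast<int>(v / S);
    return c < cells ? c : cells - 1;  // guard against rounding at the top edge
}

// Flat index of the cube containing (x, y, z), or -1 if outside the grid.
int cube_index(float x, float y, float z, float S) {
    int ix = axis_cell(x, S, NX);
    int iy = axis_cell(y, S, NY);
    int iz = axis_cell(z, S, NZ);
    if (ix < 0 || iy < 0 || iz < 0) return -1;
    return (iz * NY + iy) * NX + ix;
}

// For each cube: indices of the prisms that intersect it (filled offline).
struct CubeBucket {
    int count;
    int prism[MAX_PER_CUBE];
};

// Tests only the candidates stored for one cube; `contains(i)` stands in for
// the real point-in-prism test. Returns the matching prism index or -1.
template <typename Pred>
int find_prism(const CubeBucket *grid, int cube, Pred contains) {
    if (cube < 0) return -1;
    for (int k = 0; k < grid[cube].count; ++k) {
        int candidate = grid[cube].prism[k];
        if (contains(candidate)) return candidate;
    }
    return -1;
}
```

With this layout the loop bound drops from the total number of prisms to MAX_PER_CUBE, at the cost of the extra bucket memory. On hardware the division by S would probably be replaced by a multiplication with a precomputed 1/S.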
The code is simple.
I have a random point 'x'. This point has three coordinates (float xcoor, float ycoor, float zcoor). 'current' is an array of class objects, each with three integer members, so three points make up a triangle, and 'current' holds a collection of triangles. I want to find the triangle in 'current' that contains the point 'x'. The triangles are stored in 'current' in no particular order, so I have to check them one by one to find the correct one. The point cannot match two or more triangles.
Well, that's pretty annoying.
The only solution I can see is to convert everything to integer or fixed-point; at least then it'll be significantly faster (you should be able to get single-cycle performance).
Your advice to convert floats to integers was 100% correct.
I reduced the latency by a lot of cycles. Could you please tell me the best way to convert floats to integers? Thanks a lot!
The usual method is to determine what resolution you need and what range you need.
For example, your coordinates might be in the range of 0 to 1000, and you might require a resolution of 0.1 units. For this it makes sense to use an unsigned data type (as the values can't be negative) with 10 integer bits and 4 fraction bits, for a maximum range of 0 to 1023.9375 in steps of 0.0625. You have the option of storing this in a 14-bit integer (i.e. with values shifted up by four bits) or in an ap_ufixed<14,10> fixed-point value (14 bits in total, 10 of them integer bits). As long as you keep the total width to at most 17 bits (or 18 bits for signed values), a multiply will only require a single DSP48 slice.
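The "14-bit integer shifted up by four bits" option could look roughly like this in plain C++. It is a sketch under the example's assumptions (range 0 to 1000, 4 fraction bits); the helper names `to_fixed`, `to_float`, and `mul_fixed` are invented for illustration, and rounding to nearest with saturation is one possible policy rather than something the thread specifies.

```cpp
#include <cstdint>

constexpr int FRAC_BITS = 4;    // 4 fraction bits: steps of 1/16 = 0.0625
constexpr int TOTAL_BITS = 14;  // 10 integer bits + 4 fraction bits
constexpr std::uint16_t FIXED_MAX = (1u << TOTAL_BITS) - 1;  // 16383 = 1023.9375

// Converts a coordinate to unsigned 10.4 fixed point stored in an integer,
// rounding to the nearest step and saturating at both ends of the range.
std::uint16_t to_fixed(float v) {
    if (!(v > 0.0f)) return 0;                         // negatives and NaN clamp to 0
    float scaled = v * (1 << FRAC_BITS) + 0.5f;        // scale by 16, round to nearest
    if (scaled >= FIXED_MAX + 1.0f) return FIXED_MAX;  // saturate above 1023.9375
    return static_cast<std::uint16_t>(scaled);
}

// Converts back to float, e.g. for checking results in a test bench.
float to_float(std::uint16_t f) {
    return static_cast<float>(f) / (1 << FRAC_BITS);
}

// Multiplies two 10.4 values. The raw product carries 8 fraction bits, so it
// is 256 times the real-valued product; shift right by FRAC_BITS to get a
// result with 4 fraction bits again.
std::uint32_t mul_fixed(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint32_t>(a) * b;
}
```

For instance, 0.1 becomes the integer 2, which reads back as 0.125, so the quantization error stays within half a step (0.03125). The conversion itself would normally be done once, when the coordinates are loaded, so that the inner loop only ever sees integers.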