Xilinx

李桂香

[Q&A]

How to reduce loop latency

Hello everyone, I need your opinion about the following issue. Let's say that we have the following code:

    int Array[10000] = {0};
    for (int i = 0; i < 10000; i++) {   // fixed bound loop
        if (...limitation....) {
            ..............
            ..............
        }
    }

This loop has 10000 iterations and I need only one iteration to take my data, but I don't know which iteration is correct. So I have a lot of latency without any reason.

Is there any way to reduce the useless iterations with directives (for example array_partition)...???

I also use unroll and pipeline but without any positive results.
Thanks in advance...!!!   

Replies (8)

姜雨孜

2018-11-1 09:09:42
Well, that's pretty annoying. The only solution I can see is to convert everything to integer or fixed-point; at least then it'll be significantly faster (should be able to get single-cycle performance).

姜雨孜

2018-11-1 09:15:00
So the idea is that "(...limitation...)" will only be true for one of the 10,000 iterations? And you can't determine which iteration that will be in advance?
 
There's not really much you can do there. If you need to read 10,000 array elements to find which one is the "right" one, and you can only read one per cycle (from a single-port RAM, or a RAM where one port is permanently connected elsewhere) then that's going to take (up to) 10,000 cycles.
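To make the port limit concrete, here is a minimal sketch of what changes if more than one element can be read per cycle. Everything in it is an assumption for illustration: the function name find_match, the partition factor of 4, and the comparison data[i] == key standing in for the elided "limitation". The pragmas use Vivado HLS directive syntax and are ignored by an ordinary C++ compiler. Whether the tool actually reaches one group per cycle depends on the real condition and on where the array lives.

```cpp
constexpr int N = 10000;
constexpr int BANKS = 4;  // assumed partition factor; N must be a multiple of it

// Hypothetical search: returns the index of the single element equal to
// `key`, or -1 if there is none. The comparison stands in for the elided
// condition in the original post.
int find_match(const int data[N], int key) {
#pragma HLS ARRAY_PARTITION variable=data cyclic factor=4 dim=1
    int found = -1;
    for (int base = 0; base < N; base += BANKS) {
#pragma HLS PIPELINE II=1
        // With cyclic partitioning, these BANKS consecutive elements sit in
        // different memories, so they can be compared in the same cycle.
        for (int b = 0; b < BANKS; ++b) {
#pragma HLS UNROLL
            if (data[base + b] == key) found = base + b;
        }
    }
    return found;
}
```

The outer loop runs N / BANKS = 2,500 times instead of 10,000. The work is the same, only spread over more memories, so the gain is bought with extra RAM ports and comparators.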
 
If there's some sort of ordering to the data then there are ways to improve it. For example, if the data is sorted and you're looking for the first element larger than some constant value, then a binary search will guarantee that you find it in about fifteen iterations (although these can't be pipelined, because each one depends on the previous one).
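A minimal sketch of that binary search, assuming the array is sorted in ascending order. The function name first_greater and the optional steps counter are made up for illustration and are not from the original code.

```cpp
#include <cstddef>

// Hypothetical helper: returns the index of the first element of the sorted
// array `data` that is strictly greater than `threshold`, or `n` if no such
// element exists. If `steps` is non-null it receives the iteration count.
std::size_t first_greater(const int *data, std::size_t n, int threshold,
                          int *steps = nullptr) {
    std::size_t lo = 0, hi = n;  // the answer always lies in [lo, hi]
    int count = 0;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (data[mid] > threshold) {
            hi = mid;        // data[mid] qualifies; look for an earlier one
        } else {
            lo = mid + 1;    // data[mid] is too small; discard the left half
        }
        ++count;             // each step depends on the previous comparison
    }
    if (steps) *steps = count;
    return lo;
}
```

For n = 10,000 the loop runs at most 14 times, which is in line with the "about fifteen" estimate. Each iteration needs the result of the previous read before it can pick the next address, which is why the loop cannot be pipelined.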
 
 
 

俞敏东

2018-11-1 09:28:46
I have the following limitation:
 
    for (int i = 0; i < 800; i++) {
        if (inTriangle(x, current, xCoor, yCoor, zCoor)) {
            ..............
            ..............
        }
    }

The function inside the if statement can return true only once in every 800 iterations.

姜雨孜

2018-11-1 09:38:01
Can you explain exactly what the code is doing? From the function names, here's my guess: you've got an array of triangular prisms ("current"), and you're checking which one of these a point (xCoor, yCoor, zCoor) falls into. It can only fall into one because they don't overlap. The question is how to determine which one is relevant.
 
The obvious question is "are the triangles arranged in any meaningful way?" Take a simplified example where "current" is just an array of equally-sized (S units per side) cubes and they're arranged in a 10*10*8 prism. Finding out which cube a point is in is trivial:  the cube at (xCoor/S, yCoor/S, zCoor/S) is the correct one. Then you don't need the loop at all. It's harder for triangles, but not much harder.
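The cube case can be sketched as a direct index computation. The assumptions here are mine: non-negative integer coordinates, the origin at one corner of the 10*10*8 block, and cubes stored x-fastest in a flat array. The name cube_index is hypothetical.

```cpp
// Assumed layout: NX x NY x NZ cubes of side S with the origin at one
// corner, stored in a flat array with x varying fastest.
constexpr int NX = 10, NY = 10, NZ = 8;

// Returns the flat index of the cube containing (x, y, z), or -1 if the
// point lies outside the block or S is not positive.
int cube_index(int x, int y, int z, int S) {
    if (S <= 0 || x < 0 || y < 0 || z < 0) return -1;
    int cx = x / S, cy = y / S, cz = z / S;  // integer division picks the cube
    if (cx >= NX || cy >= NY || cz >= NZ) return -1;
    return (cz * NY + cy) * NX + cx;
}
```

For example, with S = 4 the point (5, 0, 0) lands in cube 1 and (0, 4, 0) in cube 10. The block holds 10 * 10 * 8 = 800 cubes, so this replaces an 800-iteration search with three divisions, or three shifts if S is a power of two.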
 
If the triangles are not all the same size, or not arranged so neatly, you could potentially use something similar. Store a separate map of cubes, and for each cube store every triangular prism that intersects that cube (which is easy if you know the prism's vertices). This takes a bit more memory, but (as above) it's trivial to find out which cube a point is in, and once you know that you only have to check for prisms that intersect that cube.
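A rough software model of that cube map follows. All names in it (Prism, Grid, add_prism, in_prism, find_prism) and the bucket capacity are assumptions for illustration. In particular in_prism is only a stand-in for inTriangle, whose real signature and logic are not shown in the thread. Prisms are registered by bounding box, which may list a prism in a few cubes it does not actually touch; that costs extra checks but never misses a prism.

```cpp
#include <algorithm>

constexpr int NX = 10, NY = 10, NZ = 8;   // grid size from the cube example
constexpr int NCELLS = NX * NY * NZ;
constexpr int MAX_PER_CELL = 8;           // assumed bucket capacity

// Hypothetical prism: a triangle in the x-y plane extruded from z0 to z1.
struct Prism { float x[3], y[3], z0, z1; };

struct Grid {
    float S;                               // cube side length
    int count[NCELLS] = {};                // prisms registered per cube
    int bucket[NCELLS][MAX_PER_CELL] = {}; // prism ids per cube
};

static int clamp_cell(float v, float S, int n) {
    return std::min(std::max(static_cast<int>(v / S), 0), n - 1);
}

// Registers prism `id` in every cube overlapped by its bounding box.
// Returns false if a bucket is already full.
bool add_prism(Grid &g, const Prism &p, int id) {
    int x0 = clamp_cell(std::min({p.x[0], p.x[1], p.x[2]}), g.S, NX);
    int x1 = clamp_cell(std::max({p.x[0], p.x[1], p.x[2]}), g.S, NX);
    int y0 = clamp_cell(std::min({p.y[0], p.y[1], p.y[2]}), g.S, NY);
    int y1 = clamp_cell(std::max({p.y[0], p.y[1], p.y[2]}), g.S, NY);
    int z0 = clamp_cell(p.z0, g.S, NZ), z1 = clamp_cell(p.z1, g.S, NZ);
    for (int cz = z0; cz <= z1; ++cz)
        for (int cy = y0; cy <= y1; ++cy)
            for (int cx = x0; cx <= x1; ++cx) {
                int c = (cz * NY + cy) * NX + cx;
                if (g.count[c] == MAX_PER_CELL) return false;
                g.bucket[c][g.count[c]++] = id;
            }
    return true;
}

// Signed area test: which side of edge a->b the point p lies on.
static float edge(float ax, float ay, float bx, float by, float px, float py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// Stand-in for the poster's inTriangle(); boundary points count as inside.
bool in_prism(const Prism &p, float px, float py, float pz) {
    if (pz < p.z0 || pz > p.z1) return false;
    float d0 = edge(p.x[0], p.y[0], p.x[1], p.y[1], px, py);
    float d1 = edge(p.x[1], p.y[1], p.x[2], p.y[2], px, py);
    float d2 = edge(p.x[2], p.y[2], p.x[0], p.y[0], px, py);
    bool neg = d0 < 0 || d1 < 0 || d2 < 0;
    bool pos = d0 > 0 || d1 > 0 || d2 > 0;
    return !(neg && pos);                  // all on one side means inside
}

// Returns the id of the first registered prism containing the point, or -1.
int find_prism(const Grid &g, const Prism *prisms, float px, float py, float pz) {
    if (px < 0 || py < 0 || pz < 0) return -1;
    int cx = static_cast<int>(px / g.S);
    int cy = static_cast<int>(py / g.S);
    int cz = static_cast<int>(pz / g.S);
    if (cx >= NX || cy >= NY || cz >= NZ) return -1;
    int c = (cz * NY + cy) * NX + cx;
    for (int k = 0; k < g.count[c]; ++k) { // at most MAX_PER_CELL checks
        int id = g.bucket[c][k];
        if (in_prism(prisms[id], px, py, pz)) return id;
    }
    return -1;
}
```

The map is built once with add_prism, and each query then runs at most MAX_PER_CELL containment tests instead of 800. Fixed-size arrays are used so the structure stays close to what an HLS tool can map to block RAM, but the sketch has not been run through synthesis.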