In my research work on massively parallel processor arrays in Xilinx FPGAs (e.g. hundreds of 32-bit RISCs and routers in one 6VLX240T), my designs tile (and fill) the device with hierarchical RPMs. These in turn are built up from primitive elements and (often) hand-technology-mapped LUTs. For the very best quality of results I seek out ways to minimize datapath primitives, for example by folding muxes and ALUs into LUTs using lut_map'd LUTs and carry primitives that the synthesis tools still don't find. Besides manual technology mapping, I also manage placement of these modules using hierarchical RLOCs to get fast and deterministic PAR runs and to shave many tens of percent from my critical paths. In my designs, often more than 50% of the primitives are hand technology mapped and/or hand placed. I have used this methodology since 1995 and, despite some ups and downs over the years, it's been great.

In preparing to move to Vivado for 7 series devices and beyond, I've been reviewing the Vivado implementation docs. Note: I haven't used these new Vivado tools yet. In UG901, I can find no support for lut_map and rloc attributes in HDL. In UG903, I also don't see support for RLOC constraints or similar concepts from previous ISE Constraints guides. Are these mere shortcomings of the brand new documentation, or are lut_map and rloc gone? Is Xilinx ending support for RPMs in 7 series and later devices?

(If RPMs are history: perhaps one can overcome the loss of RLOCs using LOC constraints in the XCF file, 100,000s of them, but that won't work well for IP vendors who don't own the top-level design. And I don't see how one can work around the loss of lut_map to take over technology mapping of critical design elements.)

Thank you very much for any guidance on this concern.

RLOCs are supported in Vivado. I'm not familiar with lut_map; is that a synthesis constraint? My expertise is limited to implementation tools and this is the implementation tool forum. The LUT constraints LOCK_PINS and LUTNM are supported.

Thank you for your prompt reply! I am very pleased to hear there is continued support for RLOCs in Vivado. Yes, lut_map is a synthesis constraint that forces technology mapping of some combinatorial logic into one LUT (which can then be the target of an RLOC attribute). I will ask about ongoing support for lut_map in the synthesis forum.

I should add that RLOC constraints are not yet supported as XDC constraints, but that is coming. You can use them in the RTL in 2012.2. I suppose it's not documented yet due to the lack of XDC support.

It appears that RLOCs work for a single level of hierarchy in Vivado 2014.4; however, they don't seem to apply when a component containing RLOCs is itself instantiated with an RLOC. In other words, if there are RLOCs assigned at more than one level in a hierarchical component, apparently only the RLOC values on the leaf nodes are used. The RLOCs don't seem to propagate up to the next level. Is there something that can be done to enforce the hierarchical use of RLOCs, as has been done in previous tools dating back the last 20 years?

Hi Ray,
RLOCs are supported at different hierarchy levels and accumulate the same as they did in the ISE tools.
Bret

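As a rough illustration of what "accumulate" means here, the sketch below assumes that the RLOC on an instance is simply added, X to X and Y to Y, to the RLOCs of everything beneath it. The instance and flip-flop names are hypothetical, and this models only the arithmetic, not how Vivado or ISE form RPM sets.

```python
import re

# An RLOC of the form "XmYn"; m and n are treated as signed integers here.
_RLOC = re.compile(r"^X(-?\d+)Y(-?\d+)$")


def parse_rloc(text):
    """Split an 'XmYn' RLOC string into an (x, y) integer pair."""
    match = _RLOC.match(text)
    if match is None:
        raise ValueError(f"not an XmYn RLOC: {text!r}")
    return int(match.group(1)), int(match.group(2))


def accumulate(path):
    """Sum the RLOC offsets from the outermost instance down to the leaf."""
    x = y = 0
    for rloc in path:
        dx, dy = parse_rloc(rloc)
        x += dx
        y += dy
    return f"X{x}Y{y}"


# Hypothetical two-level macro: two register instances in adjacent columns,
# each holding flip-flops that span two vertically adjacent slices.
instances = {"reg_a": "X0Y0", "reg_b": "X1Y0"}
leaves = {"ff_lo": "X0Y0", "ff_hi": "X0Y1"}

effective = {
    f"{inst}/{leaf}": accumulate([inst_rloc, leaf_rloc])
    for inst, inst_rloc in instances.items()
    for leaf, leaf_rloc in leaves.items()
}
```

Under that assumption, `effective["reg_b/ff_hi"]` comes out as `X1Y1`, which is the kind of relative placement expected when the two registers sit side by side.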
Bret, thanks for your quick reply.
I have a V7 macro that works fine under ISE 14.7 as far as placing the components. Under Vivado 2014.4, it places the primitives correctly within a leaf RPM, but the RLOC above is not propagating to the primitives. In other words, if I create, for example, a 16-bit placed register containing RLOCs on each flip-flop, then instantiate two of those in the next level up with RLOCs on each instantiation placing them aligned in adjacent columns, the placed result has both of the 16-bit registers constructed properly, but the two registers are not placed relative to one another. Maybe there is a need for explicit HU_SETs or something now that wasn't there before?

I would expect the implicit set mechanism to work, but trying an explicit set is a good idea.
Bret

Bret,
Thanks for the help. I was able to get it to accept the hierarchy by setting the synthesis flatten hierarchy option to none. It had defaulted to "rebuilt". I can't tell you how happy I am that my ISE-based IP with tons of embedded placement will still work under Vivado.

Glad to hear it's working for you, Ray. Nobody that I know of puts this feature to better use than you do.
Bret

I am glad that randraka determined that RPMs work if you use the -flatten_hierarchy none option to synth_design. I spent many hours trying to get RPMs that work fine under XST to work under Vivado, with no luck at all until I found that suggestion. It also looks as if RPMs propagate correctly if you use -flatten_hierarchy full, though I have not tested enough to be sure that always works.

However, this seems like a defect. The RLOC attributes should propagate correctly even with the default build configuration. Are there any Vivado CRs against RLOC propagation across hierarchy?
Ian

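For anyone scripting their builds, here is a minimal Python sketch that emits the `synth_design` line with the flatten setting discussed above. The `-top`, `-part` and `-flatten_hierarchy` options are standard `synth_design` options; the helper name, the top-level module name and the part string in the usage line are placeholders chosen for illustration, not taken from anyone's project in this thread.

```python
# Values accepted by synth_design -flatten_hierarchy; "rebuilt" is the default
# that was getting in the way of hierarchical RLOCs in the posts above.
FLATTEN_MODES = ("none", "full", "rebuilt")


def synth_design_command(top, part, flatten="none"):
    """Return a Tcl synth_design command line as a string."""
    if flatten not in FLATTEN_MODES:
        raise ValueError(f"unknown flatten mode: {flatten!r}")
    return f"synth_design -top {top} -part {part} -flatten_hierarchy {flatten}"


# Placeholder top-level name and part string, for illustration only.
command = synth_design_command("rpm_top", "xc7vx690tffg1761-2")
print(command)
```

The printed line can be pasted into a non-project Tcl script; in project mode the same setting is exposed as a synthesis option in the run settings.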
Ian, to be fair, hierarchical RLOCs didn't propagate correctly in ISE either if you let synthesis flatten the hierarchy. Frankly, with a smartly done hierarchical design with registered outputs on every hierarchical component, there is rarely any advantage (and frequently trouble introduced) in allowing synthesis to flatten the design. My mistake was that I did not realize Vivado was flattening and then rebuilding the hierarchy until I started digging.

Yes, using RLOCs with XST took a lot of mucking about, and new releases of ISE often broke them. For example, some time around ISE 11, XST started inserting HU_SET names on its own, which caused me all kinds of trouble, and I actually had to partially flatten the hierarchy to get a few RPMs to work. I have made many RPMs under XST over many years, but I never have fully understood exactly when an RLOC would propagate correctly, and when not. And the exact details changed from release to release. Though, for at least the past 5 years or so, things have been pretty stable using RPMs in XST (at least with the exact structures we use, which came from testing rather than from reading documentation).

I was hoping that Vivado would be better behaved, but what I found was that nothing worked across hierarchy at all. I was not able to get them to work even after several days of mucking about. I am extremely glad you found that eliminating the flattening worked around the problem. And, with that information, I can almost certainly break down our builds so that I can let the tool optimize across hierarchy in some sections, but not others. I also just tried a few small tests with fully flattening the hierarchy and that seems to work too, at least for the 3 or 4 tests I have tried.

Personally, I do think that RPMs should work more easily than they do in XST. They are very useful, but they have always been much touchier than I think they should be. Or, at least, it would be useful if the tool would tell you when it cannot maintain an RPM for some reason, so you can spot a failing configuration faster. I have been unable to get Vivado to tell me even as much as XST did, and that was often not enough to avoid a full implementation run and a load into FPGA Editor for each test iteration. And, at least for me, getting RPMs to work the first time you use XST, or fixing the structure when XST changes, takes a lot of iterations.

Anyhow, I thank you for tracking down this workaround. I would have spent many more days, maybe weeks, before I would have found it.

Best Regards,
Ian Lewis
www.mstarlabs.com

Ray, Ian,
With XST I was a heavy user of LUT_MAP, which allowed me to easily write (often parameterized) LUT modules whose instantiations were then amenable to RLOC constraints and RPM composition. Sadly, LUT_MAP constraints were not carried forward to Vivado synthesis. What do you use instead for manual technology mapping to LUTs? Do you build explicit hexadecimal LUT masks?
Thank you.
Jan.

Jan,
I generally don't bother with mapping the LUTs any more, as LUT mapping has historically been inconsistent across tools and tool revisions. If you keep the logic to one level of logic between registers, the tools will "usually" put the logic in the right place. Many times, though, it can become a frustrating exercise in putting keep attributes on signals to force the construction the way you want it. I think Synplicity still obeys its xc_map attribute when it synthesizes to primitives; that property was the reason I started using Synplify in the first place years ago.

What you can do is explicitly generate structural logic into LUTs by instantiating them and forcing the initialization string with attributes. Perhaps a code snippet showing the logic you are trying to force would help us understand what you are trying to do? Generally you are far better off coding your RTL to infer the logic you want. I'm really curious how you think you are getting more efficient LUT mapping by hand than what the tool will give you with inference.

Greg Daughtry
Vivado Product Marketing Director, Xilinx, Inc.

Greg,
Generally, yes. However, in a highly optimized design, it is often necessary to place the LUTs relative to other design elements. With the lut_map attribute, the logic is forced into a LUT that can then have RLOC and BEL attributes attached to it in the RTL code. If you let the tools do the mapping, then you can't also specify the placement of the LUTs, and in cases where there are more than 4 inputs (for a 4-LUT), you have only limited control over how the logic is distributed between the LUTs, which can be a significant obstacle in highly optimized designs. A particularly troublesome area is control of the LUTs preceding a carry chain, which if done by hand can result in a more compact or faster design than one where the tools are allowed free rein.

Jan,
A possible workaround might be a called function that converts a logic equation to a LUT INIT string. Someone had a VHDL code snippet that did that somewhere on the internet, I'm thinking 10-15 years ago. It worked, but was cumbersome. You might try a search for that.

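The idea of turning a logic equation into a LUT INIT string can be sketched in a few lines of Python. This is not the VHDL snippet mentioned above; it is a stand-in that assumes the usual Xilinx convention that INIT bit i holds the LUT output when the inputs, read as a binary number with I0 as the least significant bit, equal i. The function names are made up for the example.

```python
def lut_init(func, n_inputs):
    """Return the INIT value of an n-input LUT computing func(I0, ..., In-1)."""
    init = 0
    for index in range(1 << n_inputs):
        # Bit k of the truth-table index is the value driven on input Ik.
        bits = [(index >> k) & 1 for k in range(n_inputs)]
        if func(*bits) & 1:
            init |= 1 << index
    return init


def verilog_init(func, n_inputs):
    """Format the INIT value as a sized Verilog hex literal, e.g. 8'hCA."""
    width = 1 << n_inputs
    digits = max(1, width // 4)
    return f"{width}'h{lut_init(func, n_inputs):0{digits}X}"


def mux2(a, b, sel):
    """2:1 mux with a on I0, b on I1 and sel on I2."""
    return (~sel & a) | (sel & b)


print(verilog_init(mux2, 3))  # prints 8'hCA
```

Because the equation stays an ordinary function, it can still be simulated and unit tested, and only the final hex literal is tied to the LUT primitive.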
Greg, thank you very much for asking.

Here's an example. Please see slides 22-33 in my talk The Past and Future of FPGA Soft Processors.
https://fpgacpu.wordpress.com/2014/12/31/the-past-and-future-of-fpga-soft-processors/
https://fpgacpu.files.wordpress.com/2014/12/reconfig-14-the-past-and-future-of-fpga-soft-processors.pdf

This is a limit study of how to implement 1000 MicroBlaze subset processors and a NOC shared memory interconnect fabric of 250 2D torus routers in one Virtex-7 690T. Slide 26 shows a datapath for a compact MicroBlaze subset CPU core. Slides 27-28 show how it is technology mapped and floorplanned as an RPM. Slides 33 and 60 show one version of the 690T floorplan (complete datapaths but partial PE control units).

One RPM, the linchpin of the design, the ALU, implements 32b +, -, &, |, ^, ... in one 6-LUT per bit, using an RPM that composes 8 5-LUTs and a CARRY4 per slice. Add and subtract propagate the carry and logical operations zero the carry. The last time I looked, Vivado synthesis would not synthesize a one-LUT-per-bit result for this, e.g. if written as a case statement of a+b, a-b, a&b, a|b, a^b. (There's more to it: special RPM MSB logic implements MicroBlaze CMP/CMPU, and I do more carry logic north of the ALU to shave the critical path for fast conditional branches. But anyway, the synthesized result is probably 2X the area, and slower.)

The PC adder is a mux(a+b, a+c, b+c, a+k) or something like that, one LUT per bit. I don't recall whether Vivado synthesizes this to one LUT per bit or not.

The operand muxes are 8b 2-1 muxes with 8b registers per slice. Synthesis should get this right. But it won't make an RPM from it for me.

All of this is composed into a larger RPM, which means that when you get 3.3 ns timing closure for a tile size of one, then replicate the tile 1000 times, it will still make 4.x ns timing closure across the whole design. No congestion for one tile, or for clusters of 4 tiles or 16 tiles, means no congestion for 1000 tiles. (Of course, in less QOR-sensitive designs it suffices to do area-based floorplanning with pblocks, etc.)

In other designs, I have multiply instantiated decoders with other logic, implemented in one LUT each, for different parameter values #(.I(0)), #(.I(1)) etc., and these parameters are conveniently absorbed into the LUT of the LUT_MAP-constrained module.

The other great thing about LUT_MAP is that the resulting Verilog remains portable and simulatable. Whereas if I instantiate an explicit LUT primitive with a hex INIT truth table attribute, I have to keep two versions (non-Xilinx, and Xilinx LUT instantiation) in sync, a maintenance pitfall.

You can't build RPMs without primitive instantiations on which to hang the RLOC constraints, and LUT_MAP (xc_map, FMAP, back to the dawn of FPGAs) was the official non-hex-truth-table way to do LUT instantiations.

As requested, here's a simple code snippet. This is the output of a Python to Verilog RPM generator. (When writing in plain Verilog I use 'generate for' when possible.)

    // One-bit 2:1 mux, forced into a single LUT by LUT_MAP (XST).
    (* LUT_MAP="yes" *)
    module mux2(sel, a, b, o);
        input sel;
        input a;
        input b;
        output o;
        assign o = (~sel & a) | (sel & b);
    endmodule

    // Eight mux2 LUTs placed together in one slice as an RPM.
    (* KEEP_HIERARCHY="true" *)
    module mux2x8p8(sel, a, b, o);
        input sel;
        input [7:0] a;
        input [7:0] b;
        output [7:0] o;
        (* RLOC="X0Y0" *) mux2 m0(.sel(sel), .a(a[0]), .b(b[0]), .o(o[0]));
        (* RLOC="X0Y0" *) mux2 m1(.sel(sel), .a(a[1]), .b(b[1]), .o(o[1]));
        (* RLOC="X0Y0" *) mux2 m2(.sel(sel), .a(a[2]), .b(b[2]), .o(o[2]));
        (* RLOC="X0Y0" *) mux2 m3(.sel(sel), .a(a[3]), .b(b[3]), .o(o[3]));
        (* RLOC="X0Y0" *) mux2 m4(.sel(sel), .a(a[4]), .b(b[4]), .o(o[4]));
        (* RLOC="X0Y0" *) mux2 m5(.sel(sel), .a(a[5]), .b(b[5]), .o(o[5]));
        (* RLOC="X0Y0" *) mux2 m6(.sel(sel), .a(a[6]), .b(b[6]), .o(o[6]));
        (* RLOC="X0Y0" *) mux2 m7(.sel(sel), .a(a[7]), .b(b[7]), .o(o[7]));
    endmodule

Ray is right that in VHDL it should be possible to synthesize the INIT truth tables, but I confess I prefer Verilog / SystemVerilog to VHDL's std_logic etc.

Also, for deeply pipelined designs it may suffice to RPM the FFs, behaviorally synthesize the LUTs, and let MAP snap the LUTs to the placed FFs. But this doesn't always work. Often the path of least resistance is just to grind out the RPM, and at least you know the result will not surprise or disappoint you.

Given the availability of workarounds, I know Xilinx is not going to bring back LUT_MAP. But it seems unfortunate for Xilinx to abandon its competitive advantage in this area. Another option might be to provide an extensible plug-in interface to Vivado synthesis's technology mapping engine so experts can teach it new technology mapping tricks.

That said, I am grateful for all these great tools that power my work. Thanks again.

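The generator itself is not shown in the thread. As a guess at its general shape only, the following Python sketch reproduces the wrapper module from the snippet above; the function name and its parameters are invented for the example, and it relies on a separate `mux2` module carrying the LUT_MAP attribute, which is an XST feature.

```python
def gen_mux2_rpm(name, width, rloc="X0Y0"):
    """Emit a Verilog wrapper that places `width` mux2 instances at one RLOC."""
    hi = width - 1
    lines = [
        f'(* KEEP_HIERARCHY="true" *) module {name}(sel, a, b, o);',
        "  input sel;",
        f"  input [{hi}:0] a;",
        f"  input [{hi}:0] b;",
        f"  output [{hi}:0] o;",
    ]
    for i in range(width):
        # One LUT-mapped mux per bit, all constrained to the same slice.
        lines.append(
            f'  (* RLOC="{rloc}" *) mux2 m{i}'
            f"(.sel(sel), .a(a[{i}]), .b(b[{i}]), .o(o[{i}]));"
        )
    lines.append("endmodule")
    return "\n".join(lines) + "\n"


verilog_text = gen_mux2_rpm("mux2x8p8", 8)
print(verilog_text)
```

Generating the instances this way keeps the repetitive RLOC bookkeeping in one loop, which matters more once the per-bit RLOCs start to differ from row to row.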
Jan,
I do not have any useful suggestions yet for replacing LUT_MAP. So far I have been unable to get Vivado synthesis to work well enough to build big designs. And I have yet to figure out how to isolate the problems so that I can work around them and report them. The most important problem, a failure of generic propagation across entities, I have yet to reproduce in a small example, though I have tried quite hard to isolate it. I am trying to get 2015.1 to see whether that fixes anything (currently I cannot get the download to work).

For now, I have fallen back to synthesis with the XST that ships with Vivado and implementation through Vivado. That appears to be working pretty well, though using XST is not very appealing since it is going away.

For almost all of our RPMs we keep the terms before any flip-flop to 4 or fewer inputs. With that, XST almost always placed the LUT in the right place. At the cost of a few hundred ps, in a few cases we have used a latch, configured at run time as pass-through, to get a LUT in the right place (again with 4 or fewer inputs before the RPMed latch). I do not have Vivado working well enough to be able to tell how well our approach is going to work under Vivado.

We use LUT_MAP little enough that most likely we will switch to explicit instantiation of the primitive with an appropriate INIT mask in the few places where we really need to force a LUT's RPM location independently of a FF. If we come up with anything useful for placing equations in a specific LUT I will post it here.
Ian

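To make "explicit instantiation of the primitive with an appropriate INIT mask" concrete, here is a small Python sketch that prints one such instance. The LUT1 to LUT6 primitives with an INIT parameter and O, I0 ... I5 ports are the standard Xilinx library cells; the helper name, the instance name and the net names are hypothetical, and the 8'hCA mask assumes a 2:1 mux with a on I0, b on I1 and sel on I2.

```python
def lut_instance(name, init, output, inputs, rloc=None):
    """Return one Verilog LUTn instantiation, optionally tagged with an RLOC."""
    n = len(inputs)
    if not 1 <= n <= 6:
        raise ValueError("a LUT primitive takes 1 to 6 inputs")
    width = 1 << n
    if not 0 <= init < (1 << width):
        raise ValueError("INIT does not fit the truth table width")
    digits = max(1, width // 4)
    init_text = f"{width}'h{init:0{digits}X}"
    attr = f'(* RLOC="{rloc}" *) ' if rloc else ""
    # Port order follows the primitive: output O, then inputs I0 upward.
    ports = [f".O({output})"] + [f".I{k}({net})" for k, net in enumerate(inputs)]
    port_text = ", ".join(ports)
    return f"{attr}LUT{n} #(.INIT({init_text})) {name} ({port_text});"


line = lut_instance("m0", 0xCA, "o[0]", ["a[0]", "b[0]", "sel"], rloc="X0Y0")
print(line)
```

The trade-off raised earlier in the thread still applies: the emitted line is Xilinx-specific, so a portable behavioral version has to be kept in step with it, or both have to be generated from the same source.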