How long should a float_32 by float_32 multiplication take on the PIC32MZ2048EFH144?

Tags: PIC32, multiplier, FPU

Hi everyone,

I'm using a PIC32MZ2048EFH144 running at 200 MHz. I need to implement a number of float multiplications in my code, and they take much longer than they should. As mentioned in the datasheet, this MCU has a hardware single-cycle multiplier and also an FPU. How long should a float_32 by float_32 multiplication take? The sample code I used is as follows:

    float a, b, c;
    LED1 = 1;
    c = a * b;
    LED1 = 0;

In my test, the multiplication above took about 140 ns (each cycle is 5 ns), so the code takes about 70 cycles to run. What am I doing wrong?
2018-12-12 15:06:16

5 answers

140 / 5 = 28, not 70. The extra time is probably spent moving the variables to and from the FPU, plus the peripheral access time to turn the LED off.
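The arithmetic in this answer can be checked with a small helper. This is only a sketch: the function names `ns_to_cycles` and `cycles_to_ns` are made up for illustration, and the only inputs taken from the thread are the 200 MHz clock and the 140 ns measurement.

```c
#include <stdint.h>

/* Hypothetical helper: convert a measured duration to CPU clock cycles.
 * Integer math: nanoseconds * MHz / 1000 = cycles (rounds down). */
uint64_t ns_to_cycles(uint64_t duration_ns, uint64_t clock_mhz)
{
    return (duration_ns * clock_mhz) / 1000u;
}

/* Hypothetical helper: the inverse conversion, cycles to nanoseconds.
 * clock_mhz must be non-zero. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t clock_mhz)
{
    return (cycles * 1000u) / clock_mhz;
}
```

With these, `ns_to_cycles(140, 200)` gives 28 cycles, while 70 cycles at 200 MHz would correspond to `cycles_to_ns(70, 200)`, that is 350 ns.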

2018-12-12 15:25:47 杨晓静

Do you have optimizations turned on? Full IEEE floating-point support usually adds some overhead. GCC has some optimization switches (e.g. -ffast-math, -funsafe-math-optimizations) that allow better optimization but can lead to unexpected results.

2018-12-12 15:36:10 唐红菊

Are you looking at this on the first run only? The cache should be filling the first time around, so subsequent runs should be much faster.
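One way to act on this is to time the same code several times and ignore the first pass. The sketch below assumes the per-run cycle counts have already been collected into an array by some other means; the name `min_after_warmup` is hypothetical and not part of any Microchip library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest sample after skipping the first `skip`
 * runs (the warm-up runs during which the caches are still filling).
 * Returns 0 if no samples remain after skipping. */
uint32_t min_after_warmup(const uint32_t *samples, size_t count, size_t skip)
{
    if (samples == NULL || skip >= count) {
        return 0;
    }
    uint32_t best = samples[skip];
    for (size_t i = skip + 1; i < count; i++) {
        if (samples[i] < best) {
            best = samples[i];
        }
    }
    return best;
}
```

Comparing `samples[0]` with `min_after_warmup(samples, count, 1)` shows how much of the first measurement was cache fill rather than the multiplication itself.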

2018-12-12 15:42:28 罗玉婧

Well, it should take one cycle. But (a big but) the code sequence between the instruction that turns on the LED and the instruction that turns off the LED is more than one machine instruction.

Step 1: Create a .lst file and count the instructions.
You can look at instructions a couple of ways in MPLABX, but I like to have a real document that I can inspect and copy/paste sections from, to send to others to explain (or to have them explain to me) what's happening.

Here's how I create a .lst file in MPLABX: in the Project->Properties->Building dialogue box, click the "Execute line after build" box. Windows users paste into the next box (all on one line):

    ${MP_CC_DIR}\xc32-objdump -S ${ImageDir}/${PROJECTNAME}.${IMAGE_TYPE}.elf > ${ImageDir}/${PROJECTNAME}.${IMAGE_TYPE}.lst

Linux users change that backslash into a forward slash.

Step 2: Now build the project, and you will find a .lst file in the same directory that contains the .hex file. I like to do it this way so that I can compare results from different configurations.

Step 3: Now that you have a .lst file (with the source code interspersed among the assembly code), look for the place where you do the multiplication. Count the instructions between the place where the LED is turned on and the place where it is turned off.

With XC32 version 1.44 and the optimization level set to zero, here's that section (fx, fy, and fz were declared volatile floats, and LED2 was defined to be a particular latch bit of a particular port):

    LED2 = 1;
    9d001684:   3c03bf86    lui     v1,0xbf86
    9d001688:   94620030    lhu     v0,48(v1)
    9d00168c:   24040001    li      a0,1
    9d001690:   7c820844    ins     v0,a0,0x1,0x1
    9d001694:   a4620030    sh      v0,48(v1)
    fz = fx * fy;
    9d001698:   8f838050    lw      v1,-32688(gp)
    9d00169c:   8f828048    lw      v0,-32696(gp)
    9d0016a0:   44830000    mtc1    v1,$f0
    9d0016a4:   44820800    mtc1    v0,$f1
    9d0016a8:   46010002    mul.s   $f0,$f0,$f1
    9d0016ac:   44020000    mfc1    v0,$f0
    9d0016b0:   af82804c    sw      v0,-32692(gp)
    LED2 = 0;
    9d0016b4:   3c03bf86    lui     v1,0xbf86
    9d0016b8:   94620030    lhu     v0,48(v1)
    9d0016bc:   7c020844    ins     v0,zero,0x1,0x1
    9d0016c0:   a4620030    sh      v0,48(v1)

That is eleven instructions from the time the LED is turned on (the first sh instruction) until the LED is turned off (the second sh instruction).
Set the optimization level to 1 and you can eliminate four of the machine instructions in the instrumented sequence. I think it's kind of cool to look at stuff like this.

Anyhow, everything up until now has been completely logical and deterministic, but here's the biggie: because of wait states, pipelining, and the instruction and data caches, it's not so easy to compute run time directly from an instruction count. (At least, I haven't found a golden rule for this.)

[Begin Edit]
Also, using a port's SET, CLR, and INV registers rather than explicitly setting, clearing, and inverting bits of a LAT register is not only more efficient in terms of number of instructions and run time, but the operations are also atomic, allowing LEDs to be set and cleared in interrupt routines without disrupting other bits on the same port. I suggest that you get used to doing things the "Big Boy" way with your PIC32. You will thank me later.
[End Edit]

Also, due to pipelining and all of the rest, the number of instruction cycles can change from one section of code to another, depending on, say, whether the code is in a tight loop or in a context completely disconnected from other uses. An optimizing compiler can change things in such a way that the timing is even less predictable from just looking at the source code.

Bottom line, which might (or, maybe, might not) help you get past this point and on to your application. Here's my ROTTIWAGOS: "Rule Of Thumb --- Take It With A Grain Of Salt". If the actual run time corresponds to a number of machine cycles between two and three times the number of instructions, I accept it as "normal" and get on with my life. That's assuming there are no interrupt routines taking a meaningful number of cycles and, in this case, assuming that the peripheral bus clock has not been slowed down significantly from its default value of Fsys/2.
140 ns (which I got with the above code) corresponds to 140e-9 * 200e6 = 28 machine cycles, or approximately 2.5 times the number of instructions (28 / 11).

Tested with my PIC32MZ2048EF PIM on my Explorer 16/32 board. MPLABX version 4.05, XC32 version 1.44.

Note that for this particular simple test, changing the number of wait states from the default (7) to two (the minimum allowed for a 200 MHz system clock) didn't change the timing, but I do this as a matter of course. Turning on the prefetch operation did not change the timing either, but I usually do this also.

[Begin Disclaimer]
Although I have completed a couple of PIC32MX projects (performance wasn't an issue; the '32MX was just loafing along), I don't have real project experience with a 'MZ device. I'm just trying to familiarize myself. My "Rule of Thumb" may not be very good for all applications, but I have seen it hold enough times to ease my worries. If performance is really, really, really critical, well, measurement always trumps abstractions, and, in particular, it outvotes other people's opinions/guesses. See Footnote.
[End Disclaimer]

Regards,
Dave

Footnote: "Do your own research." --- Richard Feynman
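As a rough illustration of the SET/CLR/INV suggestion in this answer, the sketch below contrasts a read-modify-write on a latch register with a single store to a SET, CLR, or INV register. The helper names are hypothetical. On the target you would pass the address of a register from the device header (for example `&LATASET` or `&LATACLR`; treat these names as assumptions and check them against your device header). The atomic behavior comes from the port hardware, not from this C code.

```c
/* Hypothetical helper: read-modify-write on a LAT register.
 * Three steps (load, modify, store), so an interrupt that touches the
 * same port between the load and the store can have its change undone. */
void lat_rmw_write_bit(volatile unsigned int *lat, unsigned int bit, int value)
{
    unsigned int v = *lat;          /* load the current latch value */
    if (value) {
        v |= (1u << bit);           /* set the requested bit */
    } else {
        v &= ~(1u << bit);          /* clear the requested bit */
    }
    *lat = v;                       /* store the whole register back */
}

/* Hypothetical helper: a single store of a bit mask to a SET, CLR, or INV
 * register. On the real device the port hardware applies the mask to the
 * latch; other bits of the port are left alone. */
void port_write_mask(volatile unsigned int *set_clr_inv_reg, unsigned int mask)
{
    *set_clr_inv_reg = mask;
}
```

For the LED in the listing, which is bit 1 of its port, the second form would be called as `port_write_mask(&LATASET, 1u << 1)` to turn it on and `port_write_mask(&LATACLR, 1u << 1)` to turn it off, assuming those register names match your device.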

2018-12-12 15:52:51 郑雅颖

As suggested, even in a tight loop you can only toggle a pin at something like 30 MHz. When I do timing, I use the core timer: get its value at the start and again at the end of whatever you are measuring. Then I use a debug printf to tell me how many core cycles were used. HTH
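A minimal sketch of the core-timer approach follows. It assumes the core timer (the CP0 Count register) advances once every two system clock cycles, which is worth confirming in the device documentation. The helper names are hypothetical, and the target-side call `_CP0_GET_COUNT()` is shown only in a comment so that the block still compiles on a host.

```c
#include <stdint.h>

/* Assumption: one core timer tick equals two SYSCLK cycles. */
#define CPU_CYCLES_PER_CORE_TICK 2u

/* Hypothetical helper: ticks elapsed between two core timer readings.
 * Unsigned subtraction stays correct across one 32-bit wraparound. */
uint32_t core_ticks_elapsed(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Hypothetical helper: the same interval expressed in CPU clock cycles. */
uint32_t cpu_cycles_elapsed(uint32_t start, uint32_t end)
{
    return core_ticks_elapsed(start, end) * CPU_CYCLES_PER_CORE_TICK;
}

/* Target-side usage (not compiled here; assumes <xc.h> provides
 * _CP0_GET_COUNT()):
 *
 *   uint32_t t0 = _CP0_GET_COUNT();
 *   c = a * b;
 *   uint32_t t1 = _CP0_GET_COUNT();
 *   printf("%lu cycles\n", (unsigned long)cpu_cycles_elapsed(t0, t1));
 */
```

The two timer reads add a small fixed overhead of their own, so timing an empty section first and subtracting that result gives a cleaner number for short sequences such as a single multiplication.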

2018-12-12 16:08:03 唐芳
