Hi! I have a problem that has become overwhelming. After more or less three weeks of trying to crack it, I'm exhausted. I'm using a PIC32MZ2048EFH processor.

My problem is the following: I have a number of modules such as SPI, UART, PWM, timers, ADC etc. Every module has been tested extensively via its own test framework. Some of the modules have even been tested together. This works fine. These modules have been placed in libraries. Now I'm building a PID regulator application, and I put all the modules together. I use a standalone solution and no RTOS. The timing and memory demands of the application are very moderate. I use the latest releases of MPLAB IDE and XC32. Most of the modules, like SPI and UART, use interrupts with callback functions, but most of the time is spent in the main "waiting for interrupt" loop.

Now when I run the application I get totally random exceptions. It can happen within a second or within minutes. I get all types of exceptions, including cases where the application crashes completely and the exception handler is not triggered. In that case the only way to recover is to restart the application. If I do get an exception, the address given for it is also totally random. I cannot find any pattern in the behavior. Of course I put in tracing via a console terminal (UART6), as well as tracing directly to a memory area, to get a grip on what is happening. But it has been impossible to get any sensible information about the fundamental problem. The trace log gives more or less random locations for the last known execution point.

Now if I comment out one module, for example UART1, it works. I examine the code closely for pointer errors and the usual mistakes but can't find anything. So I comment out the calls to the PWM library and uncomment the UART1 code. Now it also works! I can't find any flaws in the PWM code either. Then I reinstate the UART1 and PWM code and remove the calls to SPI. And now the code works as expected again. It's as if the modules don't like to work together, so to speak.

Then I suspected something wrong with the basic processor settings (initialization), but there is no problem there either as far as I can see. An even more likely suspect is the setup of the interrupts. I checked very carefully that the correct vectors, priorities etc. are set. I double-checked the documentation to make sure I do it correctly but can't find anything wrong. And the question arises why different combinations of modules should work with interrupts if something was wrong with the interrupt setup.

I have now even started to think this is a HW error, but that seems very unlikely. I have never had a malfunctioning PIC chip during the last 15 years. If this is a faulty chip, why can the application work without problems with subsets of the modules, in different configurations, including interrupts?

I even took a day off to get some distance from the problem. Usually this helps, and you fix the problem within an hour or so. But after 10 hours I hit the stone wall and hadn't made any progress. Not even an inch. I even wrote a memory exercise program to try to find any faulty memory location. This program perhaps does not take all possibilities into account, but it does some basic tests. No problems. And if I had a memory problem, why should it work when I remove calls to a module? I'm well within the same memory space in the different configurations, whether it works correctly or crashes.

Any thoughts or advice would be much appreciated!

Regards, Bo, SM6FIE
7 answers
Can you try different combinations of modules and see which combinations work and which don't? Some pure guesses include:
- variables (especially pointers) altered in ISRs that are interfering
- something clobbering the stack (e.g. not enough stack space, pointers going wild...)
Susan
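One way to test the stack-space guess is stack painting: fill the unused stack with a known pattern at startup, then later count how much of the pattern has survived. The following is only a minimal host-runnable sketch. It works on an ordinary array that stands in for the stack region, and the names `stack_paint`, `stack_unused_words` and `STACK_FILL` are hypothetical, not part of XC32. On the real target the region bounds would have to come from the linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu  /* arbitrary fill pattern (an assumption) */

/* Fill a region, standing in for the unused part of the stack,
 * with the known pattern. `low` is the lowest address of the region. */
void stack_paint(uint32_t *low, size_t words)
{
    assert(low != NULL);
    for (size_t i = 0; i < words; i++) {
        low[i] = STACK_FILL;
    }
}

/* Count the words, starting at the low end, that still hold the pattern.
 * The MIPS stack grows downward, so the lowest addresses are the last
 * ones to be used; the result approximates the remaining headroom. */
size_t stack_unused_words(const uint32_t *low, size_t words)
{
    assert(low != NULL);
    size_t n = 0;
    while (n < words && low[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

If the reported headroom shrinks toward zero while the application runs, the stack is at least a plausible suspect. A result that stays comfortably above zero makes a plain overflow less likely, though it does not rule out wild pointers.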
What kind of exceptions are you getting? Bad pointers can give you an exception with no target, and so can overrunning the stack. One debugging technique is to save a number to a variable that you handle in the exception handler, or via a persistent variable on boot. If you have multiple interrupt levels, they can affect each other, especially if you're not using atomic variables or techniques. This is very important for the interrupt control registers, which are shared between the modules.
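The "save a number to a variable" technique might look roughly like the sketch below. Each point of interest records a small ID, and the exception handler (or the boot code after a reset) reads back the last one. The names `breadcrumb_set`, `breadcrumb_last` and the `BC_*` IDs are hypothetical. On the target the variable would also have to live in RAM that the startup code does not clear, which is compiler-specific and not shown here. The helper `cause_exccode` extracts the ExcCode field (bits 6..2) of the MIPS32 CP0 Cause register, which answers the "what kind of exception" question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical checkpoint IDs; 0 is reserved for "never set". */
enum {
    BC_MAIN_LOOP  = 1,
    BC_UART1_ISR  = 2,
    BC_PWM_UPDATE = 3,
    BC_SPI_ISR    = 4
};

/* Last checkpoint reached. volatile so every store really happens.
 * Assumption: on the target this is placed in memory that survives
 * a reset, so it can be inspected on the next boot. */
static volatile uint32_t g_breadcrumb;

/* Record that execution reached the given checkpoint. */
void breadcrumb_set(uint32_t id)
{
    assert(id != 0u);
    g_breadcrumb = id;
}

/* Read the last recorded checkpoint, e.g. from the exception handler. */
uint32_t breadcrumb_last(void)
{
    return g_breadcrumb;
}

/* Extract ExcCode (bits 6..2) from a MIPS32 CP0 Cause register value.
 * For example 4 is an address error on load/fetch, 5 on store. */
uint32_t cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1Fu;
}
```

With nested interrupts a single variable is overwritten by whichever context ran last, so one breadcrumb per priority level may give a clearer picture.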
You are writing "[...] and the exception handler is not triggered". Are you aware that there is more than one exception handler? Or more specifically, did you implement all of the following handlers?
_general_exception_handler()
_simple_tlb_refill_exception_handler()
_cache_err_exception_handler()
_bootstrap_exception_handler()
_nmi_handler()
_DefaultInterrupt()
And do you evaluate RCON after a device reset (i.e. an application restart)? What does it say? As Susan said, not enough stack space would also be one of my first guesses. Forgetting about atomicity in the data structures or variables that need to stay consistent between the modules would be my other guess, for example the position in a buffer or the number of entries in a buffer.
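The buffer-position example can be made concrete with a ring buffer in which each index is written by exactly one side, so there is no shared entry count that both an ISR and the main loop modify. This is only a sketch under stated assumptions: a single core, exactly one producer (e.g. one ISR) and one consumer (e.g. the main loop), and aligned 32-bit stores that complete as a single access. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 16u  /* must be a power of two so index masking works */

typedef struct {
    volatile uint32_t head;          /* written only by the producer */
    volatile uint32_t tail;          /* written only by the consumer */
    volatile uint8_t  data[RB_SIZE];
} ring_buffer_t;

/* Producer side: returns false when the buffer is full. */
bool rb_put(ring_buffer_t *rb, uint8_t byte)
{
    assert(rb != NULL);
    uint32_t head = rb->head;
    if (head - rb->tail == RB_SIZE) {
        return false;                /* full */
    }
    rb->data[head & (RB_SIZE - 1u)] = byte;
    rb->head = head + 1u;            /* publish only after the data is stored */
    return true;
}

/* Consumer side: returns false when the buffer is empty. */
bool rb_get(ring_buffer_t *rb, uint8_t *out)
{
    assert(rb != NULL && out != NULL);
    uint32_t tail = rb->tail;
    if (rb->head == tail) {
        return false;                /* empty */
    }
    *out = rb->data[tail & (RB_SIZE - 1u)];
    rb->tail = tail + 1u;            /* release the slot after reading it */
    return true;
}
```

The indices run freely and are masked only on access, so the fill level is always `head - tail` and never needs a separate counter. If two ISRs at different priorities both call `rb_put` on the same buffer, the single-producer assumption is broken and the calls would need to be protected, for instance by briefly disabling interrupts.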
Definitely check for stack overrun as previously suggested. Is this a custom board? If so, check for power supply noise, layout restrictions, weirdo cap rules, etc. Can you run your code on a known-good development/eval board?
I would copy all the ISR declarations to a blank file and copy all the ISR init code to the same file. Then stare at it a while. It should all make sense if correct: priorities, shadow registers, ISR declarations, etc. I don't use a PIC with more than one shadow register set, but even getting that wrong will cause major problems soon enough. I imagine with 7 you have more opportunity to inadvertently misuse them. I can imagine that with a number of IRQs at various priorities you could run into strange problems that do not make sense if the shadow set setup is incorrect somewhere. Maybe something to try: make all IRQ priorities the same to see if the cause is even related to nested IRQs. I would also leave the IRQ level alone in the ISR declarations and let the ISR code figure out whether a shadow set is in use and get the requested run level from CP0 (it is not that big of a deal for most applications to let the ISR figure it all out, and it saves you from getting it wrong in the ISR declarations). Just a thought. Or maybe the 'force' is simply not with you, Princess Leia :)
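The "put it all in one file and stare at it" review can be partly mechanised by writing down, for each ISR, the priority named in its declaration next to the priority the init code programs, and flagging any pair that disagrees. The sketch below is purely illustrative. The table layout, the field names and the ISR names are hypothetical, and the values would have to be filled in by hand from the actual source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per ISR, transcribed by hand from the source (hypothetical layout). */
typedef struct {
    const char *name;            /* label for the report only */
    uint8_t     declared_ipl;    /* priority named in the ISR declaration */
    uint8_t     configured_ipl;  /* priority programmed by the init code */
} isr_entry_t;

/* Return the index of the first row whose two priorities differ,
 * or -1 if every row is consistent. */
int isr_first_mismatch(const isr_entry_t *table, size_t count)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].declared_ipl != table[i].configured_ipl) {
            return (int)i;
        }
    }
    return -1;
}
```

A mismatch found this way is worth a close look, since the declared level typically determines which prologue the compiler generates for the handler. The check says nothing about shadow set assignment, which still has to be reviewed by hand.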
Memory corruption and/or register set mismatch would be my guess. If your tests are run on a PC, run them under Valgrind.
Just to conclude this issue: thanks for all the advice. It gave me energy again. I followed the advice given and checked and double-checked stack space, pointers etc. I also followed Susan's advice and tried different combinations of modules. Then I came to a point where the system crashed with a new combination of modules. This time it was possible to rerun the application with the same result more or less every time.

Then I discovered a most extraordinary thing. Function A called function B. In the given case the function result should be exactly the same. But it wasn't. The CPU executed it correctly in B but not in A. After careful analysis it was clear that this was not a pointer error or the like. It was clearly the CPU making an error in the computation. This pointed to a faulty MCU (chip).

I manufactured the board myself. I use the "Soder-Wick" method to mount/solder the PIC32MZ (100-pin) chip. This method has the drawback that it is easy to overheat the chip. Therefore I made a new board (I had 4 spare PCBs). When the new board was ready I connected it and downloaded exactly the same test framework. It worked without any problems. After a couple of hours of extensive and intensive testing I am convinced that the problem is solved. The problem used to show up after just a couple of minutes or seconds. So the problem was a faulty CPU (chip), most probably caused by overheating the chip while manufacturing the board.

Thanks for all the help!

Regards, Bo, SM6FIE