Hi, I have a project with a 32x128 RGB LED matrix. I've sent data to it using both bit-banging and DMA/PMP. I thought DMA/PMP would be faster, but that doesn't look like the case. I have to send 8 "sub-frames" for every frame - one sub-frame for each bit of color, because the LED latch circuitry is either on or off, not PWM'd or anything. It's working, but it appears the PMP won't output any faster than 8 MHz, so I can't get quite the frame rate I wanted. The default PBCLK2 setting is SYSCLK/2, so it should be 100 MHz. Shouldn't it be able to go faster than 8 MHz with the right PMP wait state settings? I even tried making PBCLK2 SYSCLK/1. Is 100 MHz too fast? It says EBI and SQI max is 50 MHz, but it doesn't state a max for PMP/DMA. Is this as fast as DMA/PMP will go? Please let me know, and thanks in advance. Here's my init code:

void InitDMA()
{
    DMACONbits.ON = 1;                  // enable the DMA controller

    // DMA Channel 0...
    IEC4bits.DMA0IE = 0;                // disable DMA Channel 0 interrupts
    IFS4bits.DMA0IF = 0;                // clear any existing DMA Channel 0 interrupt flag
    DCH0CON = 0x3;                      // turn channel off, set to priority 3, no chaining
    DCH0ECON = 0;                       // no start or stop IRQ, no pattern match
    DCH0SSA = _VirtToPhys((const void*)dispfbptr);  // transfer source physical address
    DCH0DSA = _VirtToPhys((const void*)&PMDIN);     // transfer destination physical address
    DCH0SSIZ = H_COLS;                  // source size
    DCH0DSIZ = 1;                       // destination size
    DCH0CSIZ = 1;                       // number of byte(s) transferred per event
    DCH0ECONbits.CHSIRQ = _TIMER_1_VECTOR;  // set Channel Transfer Start IRQ to Timer 1
    DCH0ECONbits.SIRQEN = 1;            // start a channel cell transfer when an interrupt matching CHSIRQ occurs
    DCH0INTCLR = 0x00ff00ff;            // clear existing events, disable all interrupts
    DCH0INTSET = 0x00090000;            // enable Block Complete and error interrupts
    IPC33bits.DMA0IP = 3;               // set DMA Channel 0 priority to 3
    IPC33bits.DMA0IS = 1;               // set DMA Channel 0 sub-priority to 1
    IEC4bits.DMA0IE = 1;                // enable DMA Channel 0 interrupt
    DCH0CONbits.CHAEN = 1;              // keep the channel enabled after each block transfer (auto-enable)
    DCH0CONbits.CHEN = 1;               // turn DMA Channel 0 on
}

void InitPMP()
{
    // PMP initialization...
    PMCON = 0;                          // stop the PMP and clear its configuration
    PMCONbits.ON = 0;                   // keep the module off while configuring
    IEC4bits.PMPEIE = 0;                // disable PMP interrupt in case it is already enabled
    PMCONbits.PTWREN = 1;               // enable the Write Strobe Port (PTWREN = 1)
    PMCONbits.PTRDEN = 0;               // disable the Read Strobe Port (PTRDEN = 0)
    PMCONbits.WRSP = 1;                 // Write Strobe Polarity active high (WRSP = 1)
    PMCONbits.RDSP = 0;                 // Read Strobe Polarity active low
    PMCONbits.CSF = 0;                  // both Chip Select bits act as address lines
    PMCONbits.ADRMUX = 0;               // address and data appear on separate pins
    PMCONbits.DUALBUF = 0;              // use the shared legacy registers for reads and writes (DUALBUF = 0)
    PMMODEbits.IRQM = 1;                // interrupt generated at end of read/write cycle
    PMMODEbits.INCM = 0;                // no address increment or decrement (INCM = 00)
    PMMODEbits.MODE16 = 0;              // set the mode to 8 bits (MODE16 = 0)
    PMMODEbits.MODE = 2;                // Master Mode 2 - separate read/write lines (MODE = 10)
    PMMODEbits.WAITM = 0;               // Data Read/Write Strobe Wait State: 1 Tpb (WAITM = 0000)
    PMMODEbits.WAITB = 0;               // Data Setup To Read/Write Strobe Wait State: 1 Tpb (WAITB = 00)
    PMMODEbits.WAITE = 0;               // Data Hold After Read/Write Strobe Wait State: 1 Tpb (WAITE = 00)
    PMAEN = 0x0;                        // disable all address and Chip Select lines (they will function as port I/O)
    PMCONbits.ON = 1;                   // enable the PMP module
}

Hi,

Is this about PIC32MZ? In PIC32MZ, there are delays (latency) in the bridges between the System Bus interconnect and the peripheral bus going to the PMP peripheral. There have been some threads about what frequency may be obtained when bit-banging I/O port pins, but the number of clock cycles needed to transfer data to a Port SFR register and to a PMP SFR register will be the same, at least as long as the PBCLK frequency is the same. Now, the datasheet says that PBCLK4, which feeds the Port SFR registers, may be clocked at 200 MHz max, while the corresponding clock to the PMP is 100 MHz max.

DMA will not be faster than using the CPU to control the transfer directly. The DMA controller has to use the same system bus and bridges as the CPU to get data from RAM and to write data to the SFR register in the PMP. What DMA does is make it possible for the CPU to do other work while the display is updated, without having to spoon-feed the display.

Here are some of the threads:
http://www.microchip.com/forums/FindPost/925398
http://www.microchip.com/forums/FindPost/782144

Regards, Mysil

OK, thanks. Yes, it's a PIC32MZ. I looked for threads and didn't find a lot - I guess I used the wrong search terms. I figured the DMA/PMP, etc. should take the load off the CPU, but there doesn't seem to be any discernible speed difference between the two, which surprises me.

One additional question: I'm sticking with bit-banging since DMA/PMP seems to produce a 2-byte data shift on the LED display, but it works perfectly with bit-banging. Anyway, I want to see if I can get more speed, so I'm trying to convert the C code for sending data to assembly. The C code creates about an 8 MHz clock signal. The inline assembly creates about 14 MHz, but I can't get the pointer code to work. How do I do a pointer with inline assembly? Or do I need to create a separate assembly file? Thanks in advance to anyone who knows.

The C code is:

unsigned char *dispfbptr;

// repeat for every column
PORT_DATA = *dispfbptr++;           // send data, increment pointer
PORT_CLOCK_SET = PIN_CLOCK_MASK;    // clock on
PORT_CLOCK_CLEAR = PIN_CLOCK_MASK;  // clock off

Inline assembly (pointer doesn't work):

asm volatile("lbu %[LATE], (dispfbptr)":[LATE] "=r" (LATE):[dispfbptr] "d" (dispfbptr));
asm volatile("ori %[LATC], %[LATC], 0x8":[LATC] "=r" (LATC):);
asm volatile("andi %[LATC], %[LATC], 0xf7":[LATC] "=r" (LATC):);
asm volatile("addi %[dispfbptr], %[dispfbptr], 1":[dispfbptr] "+r" (dispfbptr));

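A note on the pointer question above: with GCC-style extended inline assembly, one common approach is to pass the pointer as a register operand and let the compiler choose the register, instead of naming the C variable inside the instruction string. The sketch below assumes a GCC-compatible compiler such as XC32. `load_byte_postinc` is a hypothetical helper, not code from this thread, and the plain C branch is there only so the same file also builds on a non-MIPS host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the byte at *pp and advance *pp by one. */
static inline uint8_t load_byte_postinc(const uint8_t **pp)
{
    assert(pp != NULL && *pp != NULL);

    const uint8_t *p = *pp;
    uint32_t value;

#if defined(__mips__) && !defined(__mips64)
    /* %0 is the loaded byte, %1 is the pointer held in a register.
       "+r" marks the pointer as both read and written by the asm;
       "=&r" keeps the result out of the pointer's register;
       "memory" tells the compiler that the asm reads memory. */
    __asm__ volatile(
        "lbu   %0, 0(%1)\n\t"
        "addiu %1, %1, 1"
        : "=&r"(value), "+r"(p)
        :
        : "memory");
#else
    /* Portable fallback with the same effect, for non-MIPS hosts. */
    value = *p++;
#endif

    *pp = p;
    return (uint8_t)value;
}
```

Called as `PORT_DATA = load_byte_postinc(&dispfbptr);`, this would replace the first and last of the four asm statements above. Whether it ends up faster than the compiler's own code for `*dispfbptr++` is something to measure rather than assume.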
Note that for DMA the placement of the DMA buffer matters. In C it should be declared coherent to ensure it is not using the same internal bus.

NKurzman, do you mean the code below? I do have the frame buffer declared with the coherent attribute. What I would really like to know is whether I can use the *dispfbptr pointer in the inline assembly (see the post above). The clock signal is working and is almost twice as fast, but it transfers a screen full of garbage to the display. I think it's using the pointer itself as a value, transferring that to the screen and incrementing it every column.

extern volatile unsigned char __attribute__((coherent, aligned(32))) frameBuffer[NUM_BUFFERS][V_ROWS_DIV_2][H_COLS * COLOR_BITS];

The internal buses are shared, but there is more than one. DMA uses the same bus as the CPU if you are in the same memory (bottom half). The coherent attribute puts the buffer in a different segment of memory (page). I assume you set the PMP timing to minimum.

Hi,

There are a few examples of XC32 assembler programs installed together with the compiler. A function written in assembler may be called from C if it observes the calling conventions. Assembly may reach global variables by putting an underscore before the variable name: _variable

When a function call is made, the calling function knows that some CPU registers are reserved for temporary variables and may be modified by the function being called. Inline assembly does not have any similar conventions, so using CPU registers in inline assembly may become confusing and cause strange errors.

There is some assembly code attached. It does something completely different from your PMP display, but it shows one way to deal with SFR registers and to make functions that may be called from a C program. a0 is the register holding the first parameter in the argument list when called from C code. Rename the header file from txt to h.

Mysil

Attachments: I2C_Master_MM.S (24.23 KB), I2C_Master_MM_h.txt (10.34 KB)

Thanks for the info. I'll look at these files tonight. They should give me some clues about how to do this. I looked at the XC32 Compiler User Guide and tried some of the stuff, but it wouldn't compile, and there's stuff that does compile that isn't documented, i.e. constraint letters that are nowhere in that document. So I have no clue where to go for accurate reference information. At least with C, there are plenty of examples in the manuals and the references are correct for the most part. I understand assembly language, as I've played with it on and off since the days of the 6502 CPU, but I have no clue about all the syntax that goes with MPLAB/XC32. If there's an accurate source of info on inline assembly and assembly file syntax, I guess I'll have to find it. Anyway, thanks for the files and info!

Sorry to be such a pain, but I'm running into problems compiling my code. I wrote some code and my main.h includes the xc.h header (that's the only include I saw in the attachments Mysil posted). I get errors on every line of assembly code in my .s file:

RGBLED.s: Assembler messages:
RGBLED.s: Warning: end of file in comment; newline inserted
RGBLED.s:10: Error: Illegal operands `lui t0,%hi(a0)'
RGBLED.s:11: Error: Illegal operands `lw t1,%lo(a0)(t0)'
RGBLED.s:12: Error: Illegal operands `lui t2,%hi(PORT_DATA)'
RGBLED.s:13: Error: Illegal operands `sw t1,%lo(PORT_DATA)(t2)'

As far as I can tell, all the operands are valid. Do I need to include something or do something else? Any help would be appreciated.

WRT assembly: XC32 is a GCC-based compiler, so you will probably find more information on the GCC site (however, it is heavily biased towards x86). Remember that MIPS is a load/store architecture and all I/O is memory mapped, so the asm you posted is pretty much nonsense. For example, the following sends an array of 16 uint8_t's to the low byte of PORTE:

// void WriteToLATE(uint8_t *data,uint32_t len)
{
    uint32_t register a0 asm("a0");     // Define friendly register names
    uint32_t register a1 asm("a1");
    uint32_t register v0 asm("v0");
    uint32_t register v1 asm("v1");
    static uint8_t data[16]={0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};    //# Test Data

    asm volatile (".set noreorder");    // stops optimiser messing with the following code (Important!, esp. at -O1 and above)
    asm volatile ("lui %0,%%hi(%1)":"=r"(a0):"i"(data));    //# _Load a0 with address of data
    asm volatile ("ori %0,%%lo(%1)":"=r"(a0):"i"(data));    //# /
    asm volatile ("ori %0,$0,16":"=r"(a1):);                //# Load a1 with 16
    asm volatile ("lui %0,%%hi(LATE)":"=r"(v0));            // _Load v0 with address of LATE
    asm volatile ("ori %0,%%lo(LATE)":"=r"(v0));            // /
    asm volatile (".LOOP1:");
    asm volatile ("lbu %0,0(%1)":"=r"(v1):"r"(a0));         // v1 = *a0  Load v1 with byte pointed to by a0
    asm volatile ("sb %0,0(%1)"::"r"(v1),"r"(v0));          // *v0 = v1  Store v1 at address pointed to by v0 (LATE)
    asm volatile ("addiu %0,%0,-1":"=r"(a1));               // a1--      Decrement a1
    asm volatile ("bnel %0,$0,.LOOP1"::"r"(a1));            // if (a1!=0) goto .LOOP1
    asm volatile (" addiu %0,%0,1":"=r"(a0));               // a0++      Branch Delay Slot - Increment a0
    // asm volatile ("J $31");          // return
    // asm volatile ("nop");            // Important - fill Branch Delay Slot (gets executed before jump)
    asm volatile (".set reorder");      // re-enable optimiser
}

Note that a0..a3 are defined in the ABI as function parameters (and v0, v1 as return values), so by removing the 4 lines marked # in the comments, and un-commenting the function header and return lines, the above becomes a valid function.

Hi,

Lowercase .s or uppercase .S in the file type of an assembly source file makes a difference. The example file I posted, with #include <xc.h> in it, has file type .S. Assembly files produced by the C compiler have file type .s, to signify that they shall not be preprocessed before assembly.

Mysil

Hi Simong,

I'm not writing inline assembly (I'm assuming that's what the code you posted is). I'm going by the example files that Mysil posted above and creating a .s or .S file. That said, I don't really care which way lets me write the data the fastest, inline or separate file. The few lines I wrote use some names that are supposed to be declared in xc.h, so even though the code may not have worked exactly, it should at least have compiled. I also tried the %0 %1 form and it still didn't like something about it. So much for that. I'll try the way you posted. Thanks for the example.

Mysil,

Hmmm... I just added a new assembly file. Sorry, I didn't know upper/lower case mattered. How do I change this? Do I just manually change it to .S after I create the file? I'll try that.

Thanks, Tony

Hi,

The filename is recorded in MPLAB X and in the Makefiles with its case, while the Windows filesystem interface ignores upper/lower case differences when opening files, so there may be pitfalls. There may be different ways that work; I would try:

1. Remove the file from the MPLAB X project.
2. Change the case of the filename extension in Windows Explorer.
3. Add the file to the MPLAB X project again.

Regards, Mysil

Three problems:
1. The registers should be $t0 and $a0.
2. You cannot do %hi() on a register value, only on immediates (like addresses).
3. lui takes an immediate value only, so it won't work with a register as a source.

Again, %lo() won't work on registers. Also, there is no indexed offset addressing mode, only immediate offset.

Again, the registers should be $t1 and $t2. If PORT_DATA is a C array, it should be _PORT_DATA.

See my previous example for how to iterate through an array.

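To make the %hi()/%lo() point above concrete: these operators split a 32-bit constant address into two 16-bit immediates, which is why they only make sense on addresses and not on registers. The sketch below models that split in plain C. The helper names are hypothetical, and 0xBF860430 is used as the example because the thread's later code places LATE at base BF86h plus offset 430h. The +0x8000 adjustment reflects that the low half is sign-extended when used as a load/store offset.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits as loaded by lui, adjusted for the sign-extended low half. */
static uint32_t hi16(uint32_t addr)
{
    return ((addr + 0x8000u) >> 16) & 0xFFFFu;
}

/* Low 16 bits, interpreted as the signed offset of a load or store. */
static int32_t lo16(uint32_t addr)
{
    int32_t lo = (int32_t)(addr & 0xFFFFu);
    if (lo >= 0x8000)
        lo -= 0x10000;
    return lo;
}

/* Rebuild the address the way "lui rX, hi" then "sw rY, lo(rX)" does. */
static uint32_t rebuild(uint32_t hi, int32_t lo)
{
    assert(hi <= 0xFFFFu);
    return (hi << 16) + (uint32_t)lo;
}
```

For 0xBF860430 this gives hi = 0xBF86 and lo = 0x430, so `lui $t2, 0xBF86` followed by `sw $t1, 0x430($t2)` reaches the register with one setup instruction and one store.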
Hey Simong,

Just in case you're still looking at this thread: I did a ton of research and got some code to work. I'm also turning a clock on and off to latch each new piece of data. The original code I was trying to get to run faster was:

#define wrt LATE = *dispfbptr++; LATCSET = PIN_CLOCK_MASK; LATCCLR = PIN_CLOCK_MASK;
wrt wrt wrt wrt wrt wrt... 128 times.

The clock pin was pulsing with a period of 120 ns, or ~8 MHz. The disassembly showed 13 instructions. The function I wrote in asm reduces that to 8 per loop, but a clock pulse still takes 110 ns, or ~9 MHz. I would have thought cutting 5 instructions out of the loop would have cut the pulse time down to ~75 ns. I'm wondering why it didn't speed up the pulse frequency more than it did. I even tried unrolling the loop, but still got the same speed - weird.

Anyway, thanks for your help. I didn't think re-learning assembly would take so much effort. I used to write assembly for 6502 CPUs. Thanks!

There may be stalls between instructions. For example, when you "lw" something, you cannot use it immediately or the CPU will stall. To avoid this, you can rotate registers: you load one register, and then you use the register loaded on the previous cycle to write. There will also be stalls on LAT writes. These are bad, but you should still get more than 8 MHz.

8 instructions is too many. All you need is lw/sw/sw/sw/bne/addi (in the delay slot). That's 6.

Being the almost canonical 'RISC' architecture, MIPS assembly is both easier (only 1 addressing mode!) and more complicated (you have to do (almost) everything yourself!) :-D

The following inline assembly pulses the output 'clock' pin (LATB2) at 16.5 MHz with a CPU clock of 198 MHz.

PRECONbits.PFMWS=2;
PRECONbits.PREFEN=1;
PRISS=0x76543210;
TRISCbits.TRISC15=0;
TRISE=0;
TRISBbits.TRISB2=0;
_mtc0(25,0,18<<5);
{
    uint32_t register a0 asm("a0");     // Define friendly register names
    uint32_t register a1 asm("a1");
    uint32_t register v0 asm("v0");
    uint32_t register v1 asm("v1");
    uint32_t register t0 asm("t0");
    uint32_t register t1 asm("t1");
    static uint8_t data[128]={0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};

    asm volatile (".set noreorder");    // stops optimiser messing with the following code (Important!, esp. at -O1 and above)
    asm volatile ("lui %0,0xbf86":"=r"(v0));                // Load v0 with hi word of address of Port SFR's
    asm volatile ("ori %0,$0,0x04":"=r"(t0));               // t0=0x04 (bit 2)
    asm volatile (".LOOP2:");                               // while(1)
    asm volatile ("lui %0,%%hi(%1)":"=r"(a0):"i"(data));    //# _Load a0 with address of data
    asm volatile ("ori %0,%%lo(%1)":"=r"(a0):"i"(data));    //# /
    asm volatile ("addiu %0,%1,0x80":"=r"(a1):"r"(a0));     //# Load a1 with a0+128
    asm volatile ("mtc0 $0,$25,1");                         // clear stall counter
    asm volatile (".LOOP1:");
    asm volatile ("lbu $v1,0(%0)"::"r"(a0));                // v1 = *a0  Load v1 with byte pointed to by a0
    asm volatile ("sb %0,%%lo(LATE)(%1)"::"r"(v1),"r"(v0)); // LATE = v1  Store v1 at LATE (v0 holds the port base address)
    asm volatile ("addiu %0,%0,1":"=r"(a0));                // a0++  Increment a0
    asm volatile ("sw %0,%%lo(LATBSET)(%1)"::"r"(t0),"r"(v0));  // LATBSET=t0
    asm volatile ("bne %0,%1,.LOOP1"::"r"(a0),"r"(a1));     // if (a1!=a0) goto .LOOP1
    asm volatile (" sw %0,%%lo(LATBCLR)(%1)"::"r"(t0),"r"(v0)); // Branch Delay Slot - LATBCLR=t0
    asm volatile ("mfc0 %0,$25,1":"=r"(t1));                // read stall counter into t1
    asm volatile ("J .LOOP2");                              // while(1)
    asm volatile ("nop");
    asm volatile (".set reorder");      // re-enable optimiser
}

The mfc0/mtc0 instructions are for setting/clearing the performance counters to count CPU stalls. The inner loop is 6 instructions (as NorthGuy said). I am measuring 750 stalls per 128 iterations of the inner loop ~ 6 stalls per iteration ~ 2 stalls per write to the port LATs. (6 instr + 6 stalls) * 16.5 MHz = 198 MHz, so everything adds up.

@NorthGuy There appears to be no load-to-use penalty for straight load-store ops on MIPS R2. Re-arranging the above to move the addiu between the lbu and sb makes no difference. There may be a penalty for load-ALU ops.

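The accounting in the post above (instructions plus stalls per iteration, divided into the CPU clock) can be written out as a small calculation. This is only a sketch of that arithmetic; the function names are hypothetical and the figures plugged in are the ones quoted in the post.

```c
#include <assert.h>

/* Cycles per loop iteration: issued instructions plus measured stalls. */
static double cycles_per_iteration(unsigned instructions,
                                   unsigned total_stalls,
                                   unsigned iterations)
{
    assert(iterations > 0);
    return (double)instructions + (double)total_stalls / (double)iterations;
}

/* Pulse rate on the clock pin, assuming one pulse per loop iteration. */
static double pulse_rate_hz(double cpu_hz, double cycles)
{
    assert(cycles > 0.0);
    return cpu_hz / cycles;
}
```

With 6 instructions and 750 stalls over 128 iterations, `cycles_per_iteration(6, 750, 128)` is about 11.86, and `pulse_rate_hz(198e6, 11.86)` is about 16.7 MHz. Rounding to 6 stalls per iteration gives 12 cycles and exactly 16.5 MHz, which matches the measured figure.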
They improve technology faster than I can follow :)

There appear to be 2 stalls on every write to LAT. I think you can improve this further by combining the write of the data with the write of the clock. This will require moving the clock line to one of the high 8 bits of LATB (or whatever port is used for data), so the hardware needs to be changed. It will also require one extra "ori" instruction to set the clock bit before writing data. But it should still be faster overall.

Also, I remember someone from Microchip said that you can run the peripheral clock for ports at 200 MHz (1:1). I don't know if this makes it faster or not.

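The idea of folding the clock into the data write can be sketched in C against a simulated port. Everything here is an assumption made for illustration: the clock is taken to sit on bit 7 of the data port, the port is replaced by a logging stub, and the ordering (data first with clock low, then the same data with clock high) is just one possible choice that the post does not spell out. It needs two port writes per pixel instead of three.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLOCK_BIT 0x80u     /* assumed: clock wired to bit 7 of the data port */
#define LOG_SIZE  64u

/* Simulated port: records every value written, in order. */
static uint8_t port_log[LOG_SIZE];
static size_t  port_log_len;

static void log_write(uint8_t value)
{
    assert(port_log_len < LOG_SIZE);
    port_log[port_log_len++] = value;
}

/* Shift out len pixels with two port writes per pixel. */
static void shift_out(const uint8_t *data, size_t len,
                      void (*write_port)(uint8_t))
{
    assert(data != NULL && write_port != NULL);

    for (size_t i = 0; i < len; i++) {
        uint8_t pixel = (uint8_t)(data[i] & (uint8_t)~CLOCK_BIT);
        write_port(pixel);                          /* new data, clock low   */
        write_port((uint8_t)(pixel | CLOCK_BIT));   /* same data, clock high */
    }
    if (len > 0)    /* leave the clock low after the last pixel */
        write_port((uint8_t)(data[len - 1] & (uint8_t)~CLOCK_BIT));
}
```

In this ordering the write that presents the next pixel also drops the clock from the previous one, so the data is already stable when the rising edge arrives. On real hardware `write_port` would be a store to the LAT register, and the timing margins would need checking against the panel's datasheet.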
OK, so I adapted Simong's code (see below). Now it's down to 6 instructions. But it didn't make ANY difference! Still ~110 ns! How does that happen?

I did what NorthGuy said and moved the clock to the 8th bit (I only need 6 of the data bits). That got it down to ~90 ns, but that's STILL slower than it should be. The display didn't like the data and clock turning on at the exact same time - the start/end of color changes flickered. So I went back to Simong's code and looked at the stall counter. It read 247 (?!?). It read 122 with the change from NorthGuy. What am I doing wrong? Sorry to be such a pain, but this should be working faster. Thanks.

// config
PRECONbits.PREFEN = 0b01;
PRECONbits.PFMWS = 2;

// function call to write data
stallctr = writeData(dispfbptr, H_COLS);

int writeData(FRAME_BUFFER_DATA_TYPE *data, int len)
{
    /******************************************************
     *
     * Write pixel data to RGB LED display
     *
     * *data will be loaded into register a0
     * len will be loaded into a1
     *
     * - write 1 byte of data to LATE
     *     bit    7  6  5  4  3  2  1  0
     *     color  x  x  B2 B1 G2 G1 R2 R1
     *     Note: 1=rows 0-15, 2=rows 16-31
     * - turn clock on (RC3)
     * - turn clock off (RC3)
     * - increment pointer for next column of data
     * - repeat number of times specified by len
     *
     *****************************************************/
    _mtc0(25, 0, 18 << 5);
    asm volatile (".set noreorder");            // stops optimizer messing with the following code (Important, esp. at -O1 and above)
    asm volatile ("ori $t0, $0, 8");            // load 8 into t0
    asm volatile ("addu $a1, $a0, $a1");        // add the pointer address to len to determine the stop point
    asm volatile ("lui $v1, 0xBF86");           // load v1 with I/O port base address BF86h
    asm volatile ("mtc0 $0, $25, 1");           // clear stall counter
    asm volatile (".LOOP1:");                   // bnel (branch not equal likely) loops here
    asm volatile ("lbu $v0, 0($a0)");           // v0 = *a0  load v0 with frame buffer byte pointed to by a0
    asm volatile ("sb $v0, 0x430($v1)");        // store frame buffer byte at LATE, BF86h base + 430h offset
    asm volatile ("addiu $a0, $a0, 1");         // a0++  increment a0
    asm volatile ("sw $t0, 0x238($v1)");        // write t0 (value 8) to LATCSET, BF86h base + 238h offset
    asm volatile ("bnel $a0, $a1, .LOOP1");     // if (a0!=a1), goto .LOOP1
    asm volatile ("sw $t0, 0x234($v1)");        // write t0 (value 8) to LATCCLR, BF86h base + 234h offset (branch delay slot)
    asm volatile ("sw $t0, 0x234($v1)");        // write t0 (value 8) to LATCCLR, BF86h base + 234h offset so clock isn't left on
    asm volatile ("mfc0 $v0, $25, 1");          // read stall counter into v0 for return value
    asm volatile (".set reorder");              // re-enable optimizer
}
