Microchip

李丽

[Q&A]

Porting LAME MP3 to a PIC32 MZ DA chip gives very poor performance

    I attempted to port LAME MP3 to a PIC32 MZ DA chip, with rather disappointing performance.

The previous hardware is an i.MX6 single core running at 1 GHz, which sits at less than 10% CPU running the live encoder.

The new hardware uses the PIC32MZ2064DAH176, which is supposed to have an FPU, etc.  However, I find that LAME takes around 200% of real time to encode a mono 44.1 kHz signal.

Any ideas on why the performance difference?  A shoot-from-the-hip guess says 1 GHz / 200 MHz = 5, so 10% x 5 is 50% CPU, which isn't adding up here.  Profiling the LAME code shows many cycles everywhere.

CYCLES 62275 : snd_pcm_read 7680
CYCLES 241 : @lame_encode_buffer_template
CYCLES 407477 : lame_copy_inbuffer
CYCLES 569180 : Stage 1: vbrpsy_attack_detection
CYCLES 96 : Stage 1: vbrpsy_compute_block_type
CYCLES 1076469 : Stage 1: LONG BLOCK CASE
CYCLES 996 : Stage 1: SHORT BLOCK CASE
CYCLES 4367 : Stage 1: short block pre-echo control
CYCLES 89 : Stage 1: vbrpsy_apply_block_type
CYCLES 560029 : Stage 1: vbrpsy_attack_detection
CYCLES 18 : Stage 1: vbrpsy_compute_block_type
CYCLES 1075629 : Stage 1: LONG BLOCK CASE
CYCLES 757 : Stage 1: SHORT BLOCK CASE
CYCLES 4225 : Stage 1: short block pre-echo control
CYCLES 54 : Stage 1: vbrpsy_apply_block_type
CYCLES 10304 : Stage 1: psychoacoustic model
CYCLES 1669433 : @Stage 2: MDCT
CYCLES 129 : @Stage 3: MS/LR decision
CYCLES 2240782 : @Stage 4: quantization loop
CYCLES 29909 : @Stage 5: bitstream formatting

LAME is set up to malloc most of its memory from cached DDR space.

As a reference, the shine encoder sits at around 10% CPU, or 700,000 cycles per 1152 samples.
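
The rough estimate above can be checked against the real-time budget for one 1152-sample frame. The sketch below is illustrative only. It assumes a 200 MHz core clock (the figure used in the estimate) and that the quoted cycle counts are true CPU cycles per frame, which the post does not state. The function names are made up for this example.

```c
#include <stdio.h>

/* CPU cycles available to encode one frame in real time.
   Assumption: clock_hz is the core clock (200 MHz in the post's estimate). */
double frame_budget_cycles(double clock_hz, double sample_rate_hz,
                           unsigned samples_per_frame)
{
    return clock_hz * (double)samples_per_frame / sample_rate_hz;
}

/* Share of the real-time budget, in percent, used by `cycles` of work per frame. */
double load_percent(double cycles, double budget_cycles)
{
    return 100.0 * cycles / budget_cycles;
}

/* Prints the budget and the load for the shine figure quoted in the post. */
void print_frame_budget(void)
{
    double budget = frame_budget_cycles(200e6, 44100.0, 1152);

    printf("budget per frame: %.0f cycles\n", budget);
    printf("shine at 700000 cycles: %.1f%% of real time\n",
           load_percent(700000.0, budget));
}
```

Under these assumptions the budget is about 5.2 million cycles per frame, so shine's 700,000 cycles is roughly 13% of real time. Adding up the CYCLES lines in the listing gives roughly 7.7 million, which would be nearly 150% of the budget if the entries do not overlap and cover exactly one frame. Neither condition is confirmed by the post.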

Replies (8)

陈晨

2019-1-22 07:11:16
    Does the decoder use floats?
Single or double?
Is the FPU disabled?

刘涛

2019-1-22 07:20:51
    It is an encoder.  It uses primarily 32-bit floats.
 
I don't know whether the FPU is disabled. Where are some places to check?
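
A few places to check, sketched in C. The compile-time part relies on the `__mips_hard_float` and `__mips_soft_float` macros that GCC-based MIPS toolchains predefine. Whether a particular XC32 release defines them the same way is an assumption worth verifying. The run-time part decodes two MIPS32 CP0 fields: Config1.FP (bit 0, FPU implemented) and Status.CU1 (bit 29, coprocessor 1 usable). The helper names are invented for this sketch, and reading the registers on the target is left to the caller.

```c
#include <stdint.h>

#define CP0_STATUS_CU1  (1u << 29)  /* Status.CU1: coprocessor 1 (FPU) usable */
#define CP0_CONFIG1_FP  (1u << 0)   /* Config1.FP: FPU implemented in silicon */

/* Returns 1 if this translation unit was compiled for hardware float,
   0 if it was compiled for software float, -1 if neither macro is defined
   (for example when built with a non-MIPS compiler). */
int compiled_for_hard_float(void)
{
#if defined(__mips_hard_float)
    return 1;
#elif defined(__mips_soft_float)
    return 0;
#else
    return -1;
#endif
}

/* config1 is the raw value of the CP0 Config1 register. */
int fpu_present(uint32_t config1)
{
    return (config1 & CP0_CONFIG1_FP) != 0;
}

/* status is the raw value of the CP0 Status register. */
int fpu_enabled(uint32_t status)
{
    return (status & CP0_STATUS_CU1) != 0;
}
```

If the compile-time check reports software float, float arithmetic is done by library routines instead of FPU instructions, regardless of what the two register bits say. The compiler and linker options of the project are therefore worth inspecting first.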
举报

刘涛

2019-1-22 07:34:19
    Here is what I am getting initially
 
    
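    SYS_Initialize(NULL);
    /* volatile keeps the compiler from folding the arithmetic away */
    volatile float test1, test2, val1;
    volatile long double test3, test4, val3, val4;
    uint32_t T1, T2, T3;   /* CP0 Count snapshots; type assumed, not shown in the post */
    test1 = 1.23456;
    test2 = 2.34567;
    test3 = 12345.67891;
    test4 = 98711223.654321;

    /* single-precision multiply */
    T1 = _CP0_GET_COUNT();
    val1 = test2*test1;
    T1 = _CP0_GET_COUNT() - T1;
    PrintUart("%u cycles\r\n", T1 - 7);   /* the 7 is presumably the measurement overhead */

    /* long double multiply */
    T2 = _CP0_GET_COUNT();
    val3 = test3*test4;
    T2 = _CP0_GET_COUNT() - T2;
    PrintUart("%u cycles\r\n", T2 - 7);

    /* long double square root; sqrtl needs <math.h> */
    T3 = _CP0_GET_COUNT();
    val4 = sqrtl(test4);
    T3 = _CP0_GET_COUNT() - T3;
    PrintUart("%u cycles\r\n", T3 - 7);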
 
Yields
74 cycles
129 cycles
682 cycles
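
One caveat when reading these numbers: on PIC32MZ parts the CP0 Count register is documented as incrementing once every two core clocks, so the printed values would be Count ticks, not CPU cycles. Treat that ratio as an assumption to confirm in the datasheet. A small sketch of the tick arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Assumption: Count advances once per two core clocks on this part. */
#define CYCLES_PER_COUNT_TICK 2u

/* Unsigned subtraction stays correct across a single 32-bit wrap of Count. */
uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Converts Count ticks to CPU cycles under the assumed ratio. */
uint64_t ticks_to_cycles(uint32_t ticks)
{
    return (uint64_t)ticks * CYCLES_PER_COUNT_TICK;
}

/* Converts CPU cycles to microseconds for a given core clock in Hz. */
double cycles_to_us(uint64_t cycles, double clock_hz)
{
    return (double)cycles * 1e6 / clock_hz;
}
```

Under that assumption the 74-tick float multiply would correspond to about 148 CPU cycles, which is far from a handful of cycles either way.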

刘涛

2019-1-22 07:49:36
    I don't know much about MIPS assembly and the FPU, but this doesn't smell like 4-cycle FPU assembly.
 
.map file shows                 0x9d079c2c                fpmul
 
Execution memory shows
 
1d07_92c2 
 
Line Address Opcode Label DisAssy
288,526 1D07_9C2C 00043DC2 __mulsf3 SRL A3, A0, 23
288,527 1D07_9C30 30E700FF ANDI A3, A3, 255
288,528 1D07_9C34 00054DC2 SRL T1, A1, 23
288,529 1D07_9C38 312900FF ANDI T1, T1, 255
288,530 1D07_9C3C 3C0A8000 LUI T2, -32768
288,531 1D07_9C40 00043200 SLL A2, A0, 8
288,532 1D07_9C44 00CA3025 OR A2, A2, T2
288,533 1D07_9C48 00054200 SLL T0, A1, 8
288,534 1D07_9C4C 010A4025 OR T0, T0, T2
288,535 1D07_9C50 00856026 XOR T4, A0, A1
288,536 1D07_9C54 014C5024 AND T2, T2, T4
288,537 1D07_9C58 24ECFFFF ADDIU T4, A3, -1
288,538 1D07_9C5C 2D8100FE SLTIU AT, T4, 254
288,539 1D07_9C60 10200021 BEQ AT, ZERO, CspecA
288,540 1D07_9C64 00000000 NOP
288,541 1D07_9C68 252CFFFF ADDIU T4, T1, -1
288,542 1D07_9C6C 2D8100FE SLTIU AT, T4, 254
288,543 1D07_9C70 10200036 BEQ AT, ZERO, CspecB
288,544 1D07_9C74 00000000 NOP
288,545 1D07_9C78 00C80019 MULTU 0, A2, T0
288,546 1D07_9C7C 00005812 MFLO T3
288,547 1D07_9C80 11600002 BEQ T3, ZERO, i17
288,548 1D07_9C84 00003010 MFHI A2
288,549 1D07_9C88 34C60001 ORI A2, A2, 1
288,550 1D07_9C8C 04C00003 BLTZ A2, i18
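
The listing reads as an integer-only multiply routine. `__mulsf3` is the name GCC's runtime uses for software single-precision multiplication, which suggests the code was built or linked as soft-float. As a rough C rendering of the instructions shown (not the library's actual source; the struct and function names are invented here), the routine extracts both exponents, restores the implicit leading 1 of each mantissa, computes the result sign, and multiplies the mantissas with a 32x32-bit integer multiply:

```c
#include <stdint.h>
#include <string.h>

struct mul_fields {
    uint32_t exp_a, exp_b;  /* SRL x, 23 then ANDI 255 */
    uint32_t sign;          /* XOR a, b then AND with 0x80000000 */
    uint32_t hi, lo;        /* MFHI / MFLO after MULTU */
};

/* Reinterprets a float as its IEEE 754 bit pattern. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Mirrors the straight-line path of the listing. The branches to CspecA and
   CspecB (exponent 0 or 255: zeros, denormals, infinities, NaNs) are omitted. */
struct mul_fields decode_and_multiply(uint32_t a, uint32_t b)
{
    struct mul_fields r;
    uint32_t mant_a = (a << 8) | 0x80000000u;   /* SLL 8, OR with LUI constant */
    uint32_t mant_b = (b << 8) | 0x80000000u;
    uint64_t prod = (uint64_t)mant_a * mant_b;  /* MULTU */

    r.exp_a = (a >> 23) & 0xFFu;
    r.exp_b = (b >> 23) & 0xFFu;
    r.sign = (a ^ b) & 0x80000000u;
    r.hi = (uint32_t)(prod >> 32);
    r.lo = (uint32_t)prod;
    return r;
}
```

Every float multiply that goes through a routine like this costs dozens of integer instructions plus call overhead, which would be consistent with the 74-count measurement earlier in the thread.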