
笔画张

[Q&A]

What are the negate, offset, shift, subtract, and scale functions?

Replies (1)

张英

2021-11-19 11:40:25

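This installment of the tutorial covers the negate, offset, shift, subtract, and scale functions in the basic math functions group.
12.1 Important Notes for Beginners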

Here is a brief introduction to the general conventions of the functions in the DSP library. They will not be repeated later.

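Almost all of these functions are reentrant.
Almost every function comes in four data types: F32, Q31, Q15, and Q7.
Most functions process data in groups of 4. This lets the F32, Q31, Q15, and Q7 versions share one program structure, which makes them easier to maintain. More importantly, it makes good use of SIMD instructions for Q15 and Q7 data. One 32-bit SIMD instruction handles 2 Q15 values or 4 Q7 values, so a group of 4 samples takes exactly two SIMD operations for Q15 and one for Q7.
Some functions allow the destination pointer and the source pointer to point to the same buffer (in-place operation).
Why fixed-point DSP operations often produce an output of 0: http://www.armbbs.cn/forum.php?mod=viewthread&tid=95194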

12.2 Basic DSP Arithmetic Instructions

This chapter uses the following basic arithmetic instructions:

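The negate functions use QSUB, QSUB16, and QSUB8.
The offset functions use QADD, QADD16, and QADD8.
The shift functions use PKHBT and SSAT.
The subtract functions use QSUB, QSUB16, and QSUB8.
The scale functions use PKHBT and SSAT.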

Pay special attention to saturation here. It is explained in detail in Section 2 of Chapter 11.
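Saturation comes up in every function in this chapter. The following minimal sketch assumes a host C compiler rather than a Cortex-M target. It models the clamping performed by SSAT (and by the saturating Q-instructions): compute in a wider type, then clamp to an n-bit two's-complement range. The name `ssat_ref` is hypothetical and is not a CMSIS function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of signed saturation to a 'bits'-wide range,
   the same clamping the SSAT instruction performs. */
static int32_t ssat_ref(int64_t x, unsigned bits)
{
    assert(bits >= 1u && bits <= 32u);                   /* SSAT accepts 1..32 */
    const int64_t max = ((int64_t)1 << (bits - 1)) - 1;  /* e.g. 0x7FFF for 16 */
    const int64_t min = -((int64_t)1 << (bits - 1));     /* e.g. -0x8000 for 16 */
    if (x > max) return (int32_t)max;                    /* positive overflow */
    if (x < min) return (int32_t)min;                    /* negative overflow */
    return (int32_t)x;                                   /* already in range */
}
```

For example, `ssat_ref(40000, 16)` clamps to 32767, the largest Q15 value.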
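12.3 Negation (Vector Negate)

These functions negate each element, as follows:
pDst[n] = -pSrc[n], 0 <= n < blockSize.
Note in particular that these functions allow the destination pointer and the source pointer to point to the same buffer.
12.3.1 Function arm_negate_f32

Function prototype: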

1. void arm_negate_f32(
2.    const float32_t * pSrc,
3.          float32_t * pDst,
4.          uint32_t blockSize)
5. {
6.          uint32_t blkCnt;                            /* Loop counter */
7.
8. #if defined(ARM_MATH_NEON_EXPERIMENTAL)
9.       float32x4_t vec1;
10.       float32x4_t res;
11.
12.       /* Compute 4 outputs at a time */
13.       blkCnt = blockSize >> 2U;
14.
15.       while (blkCnt > 0U)
16.       {
17.          /* C = -A */
18.
19.          /* Negate and then store the results in the destination buffer. */
20.          vec1 = vld1q_f32(pSrc);
21.          res = vnegq_f32(vec1);
22.          vst1q_f32(pDst, res);
23.
24.          /* Increment pointers */
25.          pSrc += 4;
26.          pDst += 4;
27.
28.          /* Decrement the loop counter */
29.          blkCnt--;
30.       }
31.
32.       /* Tail */
33.       blkCnt = blockSize & 0x3;
34.
35. #else
36. #if defined (ARM_MATH_LOOPUNROLL)
37.
38.    /* Loop unrolling: Compute 4 outputs at a time */
39.    blkCnt = blockSize >> 2U;
40.
41.    while (blkCnt > 0U)
42.    {
43.       /* C = -A */
44.
45.       /* Negate and store result in destination buffer. */
46.       *pDst++ = -*pSrc++;
47.
48.       *pDst++ = -*pSrc++;
49.
50.       *pDst++ = -*pSrc++;
51.
52.       *pDst++ = -*pSrc++;
53.
54.       /* Decrement loop counter */
55.       blkCnt--;
56.    }
57.
58.    /* Loop unrolling: Compute remaining outputs */
59.    blkCnt = blockSize % 0x4U;
60.
61. #else
62.
63.    /* Initialize blkCnt with number of samples */
64.    blkCnt = blockSize;
65.
66. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
67. #endif /* #if defined(ARM_MATH_NEON_EXPERIMENTAL) */
68.
69.    while (blkCnt > 0U)
70.    {
71.       /* C = -A */
72.
73.       /* Negate and store result in destination buffer. */
74.       *pDst++ = -*pSrc++;
75.
76.       /* Decrement loop counter */
77.       blkCnt--;
78.    }
79.
80. }

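Description:
This function negates 32-bit floating-point values.
Analysis:

Lines 8 to 35 target the NEON instruction set, which the Cortex-M cores covered here do not support.
Lines 36 to 61 process the data in groups of 4. This speeds up execution by reducing while-loop overhead.
Negating a floating-point value is simple: put a minus sign in front of the variable.
Lines 69 to 78 handle the samples left over after the groups of 4, or all samples when loop unrolling is not used.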
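The loop structure described above can be sketched in plain C: full groups of 4, then a tail of 0 to 3 samples. This sketch assumes `float` matches `float32_t`. `negate_f32_ref` is a hypothetical name, not the library function. Calling it with the same buffer for source and destination exercises the in-place support mentioned in 12.3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the ARM_MATH_LOOPUNROLL structure of arm_negate_f32. */
static void negate_f32_ref(const float *pSrc, float *pDst, uint32_t blockSize)
{
    assert(blockSize == 0u || (pSrc != NULL && pDst != NULL));

    uint32_t blkCnt = blockSize >> 2U;      /* number of full groups of 4 */
    while (blkCnt > 0U) {
        *pDst++ = -*pSrc++;
        *pDst++ = -*pSrc++;
        *pDst++ = -*pSrc++;
        *pDst++ = -*pSrc++;
        blkCnt--;
    }

    blkCnt = blockSize % 4U;                /* 0..3 leftover samples */
    while (blkCnt > 0U) {
        *pDst++ = -*pSrc++;
        blkCnt--;
    }
}
```

Each element is read before it is written, so passing the same buffer twice is safe.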

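Parameters:

Parameter 1: address of the source data.
Parameter 2: address of the destination data that receives the negated values.
Parameter 3: number of values to process, here the number of floating-point values.

12.3.2 Function arm_negate_q31

Function prototype: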

1. void arm_negate_q31(
2.    const q31_t * pSrc,
3.          q31_t * pDst,
4.          uint32_t blockSize)
5. {
6.          uint32_t blkCnt;                            /* Loop counter */
7.          q31_t in;                                     /* Temporary input variable */
8.
9. #if defined (ARM_MATH_LOOPUNROLL)
10.
11.    /* Loop unrolling: Compute 4 outputs at a time */
12.    blkCnt = blockSize >> 2U;
13.
14.    while (blkCnt > 0U)
15.    {
16.       /* C = -A */
17.
18.       /* Negate and store result in destination buffer. */
19.       in = *pSrc++;
20. #if defined (ARM_MATH_DSP)
21.       *pDst++ = __QSUB(0, in);
22. #else
23.       *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
24. #endif
25.
26.       in = *pSrc++;
27. #if defined (ARM_MATH_DSP)
28.       *pDst++ = __QSUB(0, in);
29. #else
30.       *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
31. #endif
32.
33.       in = *pSrc++;
34. #if defined (ARM_MATH_DSP)
35.       *pDst++ = __QSUB(0, in);
36. #else
37.       *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
38. #endif
39.
40.       in = *pSrc++;
41. #if defined (ARM_MATH_DSP)
42.       *pDst++ = __QSUB(0, in);
43. #else
44.       *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
45. #endif
46.
47.       /* Decrement loop counter */
48.       blkCnt--;
49.    }
50.
51.    /* Loop unrolling: Compute remaining outputs */
52.    blkCnt = blockSize % 0x4U;
53.
54. #else
55.
56.    /* Initialize blkCnt with number of samples */
57.    blkCnt = blockSize;
58.
59. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
60.
61.    while (blkCnt > 0U)
62.    {
63.       /* C = -A */
64.
65.       /* Negate and store result in destination buffer. */
66.       in = *pSrc++;
67. #if defined (ARM_MATH_DSP)
68.       *pDst++ = __QSUB(0, in);
69. #else
70.       *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;
71. #endif
72.
73.       /* Decrement loop counter */
74.       blkCnt--;
75.    }
76.
77. }

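Description:
Negates 32-bit fixed-point (Q31) values.
Analysis:

Lines 9 to 54 process the data in groups of 4. This speeds up execution by reducing while-loop overhead.
Lines 61 to 75 handle the samples left over after the groups of 4, or all samples when loop unrolling is not used.
For Q31 data, saturation turns 0x80000000 into 0x7fffffff. The most negative value, 0x80000000, corresponds to the floating-point value -1. Its negation would be +0x80000000 (floating-point +1), which exceeds the largest Q31 value, 0x7fffffff. The result is therefore saturated to the largest positive value, 0x7fffffff.
A word about __QSUB: it is an intrinsic for an instruction of the Cortex-M4/M7 DSP extension and performs saturating subtraction. Cortex-M3 lacks this instruction and takes the #else branch. For example, __QSUB(0, in) computes 0 - in and returns the result. __QSUB works on 32-bit values. __QSUB16 and __QSUB8 perform saturating subtraction on two 16-bit lanes or four 8-bit lanes at once.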
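The following portable sketch shows why 0x80000000 ends up as 0x7fffffff. It implements a 32-bit saturating subtraction and a Q31 negate built on it. `qsub_ref` and `negate_q31_ref` are hypothetical names, and `int32_t` stands in for `q31_t`. On a DSP-capable core, the intrinsic `__QSUB` does the same job in one instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of QSUB: 32-bit saturating subtract. */
static int32_t qsub_ref(int32_t a, int32_t b)
{
    int64_t d = (int64_t)a - (int64_t)b;   /* exact, cannot overflow in 64 bits */
    if (d > INT32_MAX) return INT32_MAX;   /* clamp to 0x7FFFFFFF */
    if (d < INT32_MIN) return INT32_MIN;   /* clamp to 0x80000000 */
    return (int32_t)d;
}

/* Q31 vector negate with the same results as the __QSUB(0, in) path. */
static void negate_q31_ref(const int32_t *pSrc, int32_t *pDst, uint32_t n)
{
    assert(n == 0u || (pSrc != NULL && pDst != NULL));
    for (uint32_t i = 0; i < n; i++)
        pDst[i] = qsub_ref(0, pSrc[i]);
}
```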

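Parameters:

Parameter 1: address of the source data.
Parameter 2: address of the destination data that receives the negated values.
Parameter 3: number of values to process, here the number of fixed-point values.

12.3.3 Function arm_negate_q15

Function prototype: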

1. void arm_negate_q15(
2.    const q15_t * pSrc,
3.          q15_t * pDst,
4.          uint32_t blockSize)
5. {
6.          uint32_t blkCnt;                            /* Loop counter */
7.          q15_t in;                                     /* Temporary input variable */
8.
9. #if defined (ARM_MATH_LOOPUNROLL)
10.
11. #if defined (ARM_MATH_DSP)
12.    q31_t in1;                                  /* Temporary input variables */
13. #endif
14.
15.    /* Loop unrolling: Compute 4 outputs at a time */
16.    blkCnt = blockSize >> 2U;
17.
18.    while (blkCnt > 0U)
19.    {
20.       /* C = -A */
21.
22. #if defined (ARM_MATH_DSP)
23.       /* Negate and store result in destination buffer (2 samples at a time). */
24.       in1 = read_q15x2_ia ((q15_t **) &pSrc);
25.       write_q15x2_ia (&pDst, __QSUB16(0, in1));
26.
27.       in1 = read_q15x2_ia ((q15_t **) &pSrc);
28.       write_q15x2_ia (&pDst, __QSUB16(0, in1));
29. #else
30.       in = *pSrc++;
31.       *pDst++ = (in == (q15_t) 0x8000) ? (q15_t) 0x7fff : -in;
32.
33.       in = *pSrc++;
34.       *pDst++ = (in == (q15_t) 0x8000) ? (q15_t) 0x7fff : -in;
35.
36.       in = *pSrc++;
37.       *pDst++ = (in == (q15_t) 0x8000) ? (q15_t) 0x7fff : -in;
38.
39.       in = *pSrc++;
40.       *pDst++ = (in == (q15_t) 0x8000) ? (q15_t) 0x7fff : -in;
41. #endif
42.
43.       /* Decrement loop counter */
44.       blkCnt--;
45.    }
46.
47.    /* Loop unrolling: Compute remaining outputs */
48.    blkCnt = blockSize % 0x4U;
49.
50. #else
51.
52.    /* Initialize blkCnt with number of samples */
53.    blkCnt = blockSize;
54.
55. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
56.
57.    while (blkCnt > 0U)
58.    {
59.       /* C = -A */
60.
61.       /* Negate and store result in destination buffer. */
62.       in = *pSrc++;
63.       *pDst++ = (in == (q15_t) 0x8000) ? (q15_t) 0x7fff : -in;
64.
65.       /* Decrement loop counter */
66.       blkCnt--;
67.    }
68.
69. }

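Description:
Negates 16-bit fixed-point (Q15) values.
Analysis:

Lines 9 to 50 process the data in groups of 4. This speeds up execution by reducing while-loop overhead.
Lines 57 to 67 handle the samples left over after the groups of 4, or all samples when loop unrolling is not used.
For Q15 data, saturation turns 0x8000 into 0x7fff when negated. The most negative value, 0x8000, corresponds to the floating-point value -1. Its negation would be +0x8000 (floating-point +1), which exceeds the largest Q15 value, 0x7fff. The result is therefore saturated to the largest positive value, 0x7fff.
__QSUB16 performs 16-bit saturating subtraction on two halfwords at once, so each call on lines 25 and 28 negates two samples.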
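The DSP branch negates two Q15 samples with one instruction because each 32-bit word carries two independent 16-bit lanes. The sketch below models that lane-wise behavior of `__QSUB16` in portable C. It uses `memcpy` in place of `read_q15x2_ia`/`write_q15x2_ia`. All function names are hypothetical. Because every lane is treated the same way, the result does not depend on byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sign-extend halfword 'lane' (0 = bits 0..15, 1 = bits 16..31) of w. */
static int32_t lane16(uint32_t w, int lane)
{
    int32_t v = (int32_t)((w >> (16 * lane)) & 0xFFFFu);
    return (v >= 0x8000) ? v - 0x10000 : v;
}

/* Hypothetical model of QSUB16: two independent 16-bit saturating subtractions. */
static uint32_t qsub16_ref(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 2; lane++) {
        int32_t d = lane16(a, lane) - lane16(b, lane);   /* exact in 32 bits */
        if (d > INT16_MAX) d = INT16_MAX;                /* saturate to 0x7FFF */
        if (d < INT16_MIN) d = INT16_MIN;                /* saturate to 0x8000 */
        r |= ((uint32_t)d & 0xFFFFu) << (16 * lane);
    }
    return r;
}

/* Q15 vector negate, two samples per 32-bit word, plus an odd tail sample. */
static void negate_q15_ref(const int16_t *pSrc, int16_t *pDst, uint32_t n)
{
    assert(n == 0u || (pSrc != NULL && pDst != NULL));
    uint32_t pairs = n / 2u;
    while (pairs-- > 0u) {
        uint32_t w;
        memcpy(&w, pSrc, 4);        /* like read_q15x2_ia */
        w = qsub16_ref(0u, w);      /* both lanes negated at once */
        memcpy(pDst, &w, 4);        /* like write_q15x2_ia */
        pSrc += 2;
        pDst += 2;
    }
    if (n & 1u) {                   /* odd sample left over */
        int16_t in = *pSrc;
        *pDst = (in == INT16_MIN) ? INT16_MAX : (int16_t)-in;
    }
}
```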

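Parameters:

Parameter 1: address of the source data.
Parameter 2: address of the destination data that receives the negated values.
Parameter 3: number of values to process, here the number of fixed-point values.

12.3.4 Function arm_negate_q7

Function prototype: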

1. void arm_negate_q7(
2.    const q7_t * pSrc,
3.          q7_t * pDst,
4.          uint32_t blockSize)
5. {
6.          uint32_t blkCnt;                            /* Loop counter */
7.          q7_t in;                                     /* Temporary input variable */
8.
9. #if defined (ARM_MATH_LOOPUNROLL)
10.
11. #if defined (ARM_MATH_DSP)
12.    q31_t in1;                                  /* Temporary input variable */
13. #endif
14.
15.    /* Loop unrolling: Compute 4 outputs at a time */
16.    blkCnt = blockSize >> 2U;
17.
18.    while (blkCnt > 0U)
19.    {
20.       /* C = -A */
21.
22. #if defined (ARM_MATH_DSP)
23.       /* Negate and store result in destination buffer (4 samples at a time). */
24.       in1 = read_q7x4_ia ((q7_t **) &pSrc);
25.       write_q7x4_ia (&pDst, __QSUB8(0, in1));
26. #else
27.       in = *pSrc++;
28.       *pDst++ = (in == (q7_t) 0x80) ? (q7_t) 0x7f : -in;
29.
30.       in = *pSrc++;
31.       *pDst++ = (in == (q7_t) 0x80) ? (q7_t) 0x7f : -in;
32.
33.       in = *pSrc++;
34.       *pDst++ = (in == (q7_t) 0x80) ? (q7_t) 0x7f : -in;
35.
36.       in = *pSrc++;
37.       *pDst++ = (in == (q7_t) 0x80) ? (q7_t) 0x7f : -in;
38. #endif
39.
40.       /* Decrement loop counter */
41.       blkCnt--;
42.    }
43.
44.    /* Loop unrolling: Compute remaining outputs */
45.    blkCnt = blockSize % 0x4U;
46.
47. #else
48.
49.    /* Initialize blkCnt with number of samples */
50.    blkCnt = blockSize;
51.
52. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
53.
54.    while (blkCnt > 0U)
55.    {
56.       /* C = -A */
57.
58.       /* Negate and store result in destination buffer. */
59.       in = *pSrc++;
60.
61. #if defined (ARM_MATH_DSP)
62.       *pDst++ = (q7_t) __QSUB(0, in);
63. #else
64.       *pDst++ = (in == (q7_t) 0x80) ? (q7_t) 0x7f : -in;
65. #endif
66.
67.       /* Decrement loop counter */
68.       blkCnt--;
69.    }
70.
71. }

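Description:
Negates 8-bit fixed-point (Q7) values.
Analysis:

Lines 9 to 47 process the data in groups of 4. This speeds up execution by reducing while-loop overhead.
Lines 54 to 69 handle the samples left over after the groups of 4, or all samples when loop unrolling is not used.
For Q7 data, saturation turns 0x80 into 0x7f. The most negative value, 0x80, corresponds to the floating-point value -1. Its negation would be +0x80 (floating-point +1), which exceeds the largest Q7 value, 0x7f. The result is therefore saturated to the largest positive value, 0x7f.
__QSUB8 performs 8-bit saturating subtraction on four bytes at once (line 25).

Note that when ARM_MATH_DSP is defined, the tail loop on line 62 uses the 32-bit __QSUB instead. Since 0 - (-128) = 128 fits in 32 bits, nothing saturates. With the usual two's-complement truncation, casting 128 back to q7_t gives 0x80 again. In this version of the code, only the four-at-a-time SIMD path and the plain C path map 0x80 to 0x7f.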
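Likewise, `__QSUB8` treats a 32-bit word as four independent 8-bit lanes. This portable sketch uses hypothetical names, with `int8_t` standing in for `q7_t`. It packs and unpacks the lanes explicitly, so the code does not depend on byte order. A short final group handles the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend byte 'lane' (0..3) of w. */
static int32_t lane8(uint32_t w, uint32_t lane)
{
    int32_t v = (int32_t)((w >> (8u * lane)) & 0xFFu);
    return (v >= 0x80) ? v - 0x100 : v;
}

/* Hypothetical model of QSUB8: four independent 8-bit saturating subtractions. */
static uint32_t qsub8_ref(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (uint32_t lane = 0; lane < 4u; lane++) {
        int32_t d = lane8(a, lane) - lane8(b, lane);
        if (d > INT8_MAX) d = INT8_MAX;      /* saturate to 0x7F */
        if (d < INT8_MIN) d = INT8_MIN;      /* saturate to 0x80 */
        r |= ((uint32_t)d & 0xFFu) << (8u * lane);
    }
    return r;
}

/* Q7 vector negate, up to four samples per 32-bit word. */
static void negate_q7_ref(const int8_t *pSrc, int8_t *pDst, uint32_t n)
{
    assert(n == 0u || (pSrc != NULL && pDst != NULL));
    for (uint32_t i = 0; i < n; i += 4u) {
        uint32_t k = (n - i < 4u) ? (n - i) : 4u;   /* 4, or fewer at the tail */
        uint32_t w = 0;
        for (uint32_t j = 0; j < k; j++)            /* sample i+j -> lane j */
            w |= (uint32_t)(uint8_t)pSrc[i + j] << (8u * j);
        w = qsub8_ref(0u, w);                       /* 0 - x in every lane */
        for (uint32_t j = 0; j < k; j++)
            pDst[i + j] = (int8_t)lane8(w, j);
    }
}
```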

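Parameters:

Parameter 1: address of the source data.
Parameter 2: address of the destination data that receives the negated values.
Parameter 3: number of values to process, here the number of fixed-point values.

12.3.5 Usage Example

Program design: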

/*
*********************************************************************************************************
* Function name: DSP_Negate
* Description: compute the negation
* Parameters: none
* Return value: none
*********************************************************************************************************
*/
static void DSP_Negate(void)
{
   float32_t pSrc = 0.0f;
   float32_t pDst;

   q31_t pSrc1 = 0;
   q31_t pDst1;

   q15_t pSrc2 = 0;
   q15_t pDst2;

   q7_t pSrc3 = 0;
   q7_t pDst3;

   /* Negate *********************************/
   pSrc -= 1.23f;
   arm_negate_f32(&pSrc, &pDst, 1);
   printf("arm_negate_f32 = %f\r\n", pDst);

   pSrc1 -= 1;
   arm_negate_q31(&pSrc1, &pDst1, 1);
   printf("arm_negate_q31 = %d\r\n", pDst1);

   pSrc2 -= 1;
   arm_negate_q15(&pSrc2, &pDst2, 1);
   printf("arm_negate_q15 = %d\r\n", pDst2);

   pSrc3 += 1;
   arm_negate_q7(&pSrc3, &pDst3, 1);
   printf("arm_negate_q7 = %d\r\n", pDst3);
   printf("***********************************\r\n");
}

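Results:

12.4 Offset (Vector Offset)

These functions add an offset to each element, as follows:
pDst[n] = pSrc[n] + offset, 0 <= n < blockSize.
Note that these functions allow the destination pointer and the source pointer to point to the same buffer.
12.4.1 Function arm_offset_f32

Function prototype: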

1. void arm_offset_f32(
2.    const float32_t * pSrc,
3.          float32_t offset,
4.          float32_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.
9. #if defined(ARM_MATH_NEON_EXPERIMENTAL)
10.       float32x4_t vec1;
11.       float32x4_t res;
12.
13.       /* Compute 4 outputs at a time */
14.       blkCnt = blockSize >> 2U;
15.
16.       while (blkCnt > 0U)
17.       {
18.          /* C = A + offset */
19.
20.          /* Add offset and then store the results in the destination buffer. */
21.          vec1 = vld1q_f32(pSrc);
22.          res = vaddq_f32(vec1,vdupq_n_f32(offset));
23.          vst1q_f32(pDst, res);
24.
25.          /* Increment pointers */
26.          pSrc += 4;
27.          pDst += 4;
28.
29.          /* Decrement the loop counter */
30.          blkCnt--;
31.       }
32.
33.       /* Tail */
34.       blkCnt = blockSize & 0x3;
35.
36. #else
37. #if defined (ARM_MATH_LOOPUNROLL)
38.
39.    /* Loop unrolling: Compute 4 outputs at a time */
40.    blkCnt = blockSize >> 2U;
41.
42.    while (blkCnt > 0U)
43.    {
44.       /* C = A + offset */
45.
46.       /* Add offset and store result in destination buffer. */
47.       *pDst++ = (*pSrc++) + offset;
48.
49.       *pDst++ = (*pSrc++) + offset;
50.
51.       *pDst++ = (*pSrc++) + offset;
52.
53.       *pDst++ = (*pSrc++) + offset;
54.
55.       /* Decrement loop counter */
56.       blkCnt--;
57.    }
58.
59.    /* Loop unrolling: Compute remaining outputs */
60.    blkCnt = blockSize % 0x4U;
61.
62. #else
63.
64.    /* Initialize blkCnt with number of samples */
65.    blkCnt = blockSize;
66.
67. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
68. #endif /* #if defined(ARM_MATH_NEON_EXPERIMENTAL) */
69.
70.    while (blkCnt > 0U)
71.    {
72.       /* C = A + offset */
73.
74.       /* Add offset and store result in destination buffer. */
75.       *pDst++ = (*pSrc++) + offset;
76.
77.       /* Decrement loop counter */
78.       blkCnt--;
79.    }
80.
81. }

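Description:
This function adds an offset to 32-bit floating-point values.
Analysis:

Lines 9 to 36 target the NEON instruction set, which the Cortex-M cores covered here do not support.
Lines 37 to 62 process the data in groups of 4. This speeds up execution by reducing while-loop overhead.
Lines 70 to 79 handle the samples left over after the groups of 4, or all samples when loop unrolling is not used.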

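Parameters:

Parameter 1: address of the source data.
Parameter 2: the offset.
Parameter 3: address of the destination data.
Parameter 4: number of floating-point values, that is, the number of offset operations performed.

12.4.2 Function arm_offset_q31

Function prototype: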

1. void arm_offset_q31(
2.    const q31_t * pSrc,
3.          q31_t offset,
4.          q31_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.
9. #if defined (ARM_MATH_LOOPUNROLL)
10.
11.    /* Loop unrolling: Compute 4 outputs at a time */
12.    blkCnt = blockSize >> 2U;
13.
14.    while (blkCnt > 0U)
15.    {
16.       /* C = A + offset */
17.
18.       /* Add offset and store result in destination buffer. */
19. #if defined (ARM_MATH_DSP)
20.       *pDst++ = __QADD(*pSrc++, offset);
21. #else
22.       *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
23. #endif
24.
25. #if defined (ARM_MATH_DSP)
26.       *pDst++ = __QADD(*pSrc++, offset);
27. #else
28.       *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
29. #endif
30.
31. #if defined (ARM_MATH_DSP)
32.       *pDst++ = __QADD(*pSrc++, offset);
33. #else
34.       *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
35. #endif
36.
37. #if defined (ARM_MATH_DSP)
38.       *pDst++ = __QADD(*pSrc++, offset);
39. #else
40.       *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
41. #endif
42.
43.       /* Decrement loop counter */
44.       blkCnt--;
45.    }
46.
47.    /* Loop unrolling: Compute remaining outputs */
48.    blkCnt = blockSize % 0x4U;
49.
50. #else
51.
52.    /* Initialize blkCnt with number of samples */
53.    blkCnt = blockSize;
54.
55. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
56.
57.    while (blkCnt > 0U)
58.    {
59.       /* C = A + offset */
60.
61.       /* Add offset and store result in destination buffer. */
62. #if defined (ARM_MATH_DSP)
63.       *pDst++ = __QADD(*pSrc++, offset);
64. #else
65.       *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);
66. #endif
67.
68.       /* Decrement loop counter */
69.       blkCnt--;
70.    }
71.
72. }
————————————————
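Copyright notice: this is an original article by CSDN blogger "Simon223", licensed under the CC 4.0 BY-SA license. When reposting, include a link to the original source and this notice.
Original link: https://blog.csdn.net/Simon223/article/details/105635186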

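Description:
This function adds an offset to 32-bit fixed-point (Q31) values.
Analysis:

Lines 9 to 50 process the data in groups of 4. This speeds up execution by reducing while-loop overhead.
Lines 57 to 70 handle the samples left over after the groups of 4, or all samples when loop unrolling is not used.
__QADD performs 32-bit saturating addition. The result range is [0x80000000, 0x7FFFFFFF]. Results outside it are saturated: negative ones to 0x80000000 and positive ones to 0x7FFFFFFF.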
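The non-DSP branch reaches the same result as `__QADD` by widening to 64 bits and clipping back with `clip_q63_to_q31`. Below is a sketch of that approach with hypothetical names. Its comparison-based clip gives the same result as `clip_q63_to_q31` for every 64-bit input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate a 64-bit value into the Q31 range (comparison form of the clip). */
static int32_t clip_to_q31(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Hypothetical Q31 vector offset: pDst[i] = sat(pSrc[i] + offset). */
static void offset_q31_ref(const int32_t *pSrc, int32_t offset, int32_t *pDst, uint32_t n)
{
    assert(n == 0u || (pSrc != NULL && pDst != NULL));
    for (uint32_t i = 0; i < n; i++)
        pDst[i] = clip_to_q31((int64_t)pSrc[i] + offset);   /* same result as __QADD */
}
```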

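Parameters:

Parameter 1: address of the source data.
Parameter 2: the offset.
Parameter 3: address of the destination data.
Parameter 4: number of fixed-point values, that is, the number of offset operations performed.

12.4.3 Function arm_offset_q15

Function prototype: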

1. void arm_offset_q15(
2.    const q15_t * pSrc,
3.          q15_t offset,
4.          q15_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.
9. #if defined (ARM_MATH_LOOPUNROLL)
10.
11. #if defined (ARM_MATH_DSP)
12.    q31_t offset_packed;                        /* Offset packed to 32 bit */
13.
14.    /* Offset is packed to 32 bit in order to use SIMD32 for addition */
15.    offset_packed = __PKHBT(offset, offset, 16);
16. #endif
17.
18.    /* Loop unrolling: Compute 4 outputs at a time */
19.    blkCnt = blockSize >> 2U;
20.
21.    while (blkCnt > 0U)
22.    {
23.       /* C = A + offset */
24.
25. #if defined (ARM_MATH_DSP)
26.       /* Add offset and store result in destination buffer (2 samples at a time). */
27.       write_q15x2_ia (&pDst, __QADD16(read_q15x2_ia ((q15_t **) &pSrc), offset_packed));
28.       write_q15x2_ia (&pDst, __QADD16(read_q15x2_ia ((q15_t **) &pSrc), offset_packed));
29. #else
30.       *pDst++ = (q15_t) __SSAT(((q31_t) *pSrc++ + offset), 16);
31.       *pDst++ = (q15_t) __SSAT(((q31_t) *pSrc++ + offset), 16);
32.       *pDst++ = (q15_t) __SSAT(((q31_t) *pSrc++ + offset), 16);
33.       *pDst++ = (q15_t) __SSAT(((q31_t) *pSrc++ + offset), 16);
34. #endif
35.
36.       /* Decrement loop counter */
37.       blkCnt--;
38.    }
39.
40.    /* Loop unrolling: Compute remaining outputs */
41.    blkCnt = blockSize % 0x4U;
42.
43. #else
44.
45.    /* Initialize blkCnt with number of samples */
46.    blkCnt = blockSize;
47.
48. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
49.
50.    while (blkCnt > 0U)
51.    {
52.       /* C = A + offset */
53.
54.       /* Add offset and store result in destination buffer. */
55. #if defined (ARM_MATH_DSP)
56.       *pDst++ = (q15_t) __QADD16(*pSrc++, offset);
57. #else
58.       *pDst++ = (q15_t) __SSAT(((q31_t) *pSrc++ + offset), 16);
59. #endif
60.
61.       /* Decrement loop counter */
62.       blkCnt--;
63.    }
64.
65. }
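Description:
This function adds an offset to 16-bit fixed-point (Q15) values.
Analysis:

Lines 9 to 43 process the data in groups of 4. This speeds up execution by reducing while-loop overhead.
Lines 50 to 63 handle the samples left over after the groups of 4, or all samples when loop unrolling is not used.
__PKHBT is also a DSP-extension instruction. Here it packs two 16-bit values into one 32-bit word, so line 15 places the offset in both halfwords. Its C equivalent is:

#define __PKHBT(ARG1, ARG2, ARG3) ( (((int32_t)(ARG1) << 0) & (int32_t)0x0000FFFF) | (((int32_t)(ARG2) << ARG3) & (int32_t)0xFFFF0000) )

The prototype of read_q15x2_ia is:

__STATIC_FORCEINLINE q31_t read_q15x2_ia (
  q15_t ** pQ15)
{
  q31_t val;

  memcpy (&val, *pQ15, 4);
  *pQ15 += 2;

  return (val);
}

It reads two 16-bit samples as one 32-bit word, returns that word, and advances the data pointer by two samples for the next read.

__QADD16 performs two 16-bit saturating additions in parallel. Each result is limited to [0x8000, 0x7FFF]. Results outside it are saturated: negative ones to 0x8000 and positive ones to 0x7FFF.
__SSAT is the signed-saturate instruction SSAT. It is not a SIMD instruction, and Cortex-M3 has it too. Here it saturates the result to the 16-bit range.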
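The sketch below puts the pieces together:

* It packs the offset into both halfwords, as line 15 does with `__PKHBT(offset, offset, 16)`.
* It adds two samples per 32-bit word using a portable model of `__QADD16`.
* It saturates an odd leftover sample the way `__SSAT(x, 16)` would.

Function names are hypothetical, and `memcpy` stands in for `read_q15x2_ia`/`write_q15x2_ia`. Because the offset is identical in both lanes, the result is the same on either byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sign-extend halfword 'lane' (0 = bits 0..15, 1 = bits 16..31) of w. */
static int32_t lane16(uint32_t w, int lane)
{
    int32_t v = (int32_t)((w >> (16 * lane)) & 0xFFFFu);
    return (v >= 0x8000) ? v - 0x10000 : v;
}

/* Portable form of the __PKHBT C fallback: low half from a, high half from b << shift. */
static uint32_t pkhbt_ref(int32_t a, int32_t b, unsigned shift)
{
    return ((uint32_t)a & 0x0000FFFFu) | (((uint32_t)b << shift) & 0xFFFF0000u);
}

/* Hypothetical model of QADD16: two independent 16-bit saturating additions. */
static uint32_t qadd16_ref(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 2; lane++) {
        int32_t s = lane16(a, lane) + lane16(b, lane);   /* exact in 32 bits */
        if (s > INT16_MAX) s = INT16_MAX;                /* saturate to 0x7FFF */
        if (s < INT16_MIN) s = INT16_MIN;                /* saturate to 0x8000 */
        r |= ((uint32_t)s & 0xFFFFu) << (16 * lane);
    }
    return r;
}

/* Q15 vector offset, two samples per 32-bit word. */
static void offset_q15_ref(const int16_t *pSrc, int16_t offset, int16_t *pDst, uint32_t n)
{
    assert(n == 0u || (pSrc != NULL && pDst != NULL));
    const uint32_t offset_packed = pkhbt_ref(offset, offset, 16);  /* offset in both lanes */
    uint32_t i = 0;
    for (; i + 2u <= n; i += 2u) {
        uint32_t w;
        memcpy(&w, &pSrc[i], 4);             /* like read_q15x2_ia */
        w = qadd16_ref(w, offset_packed);    /* both lanes at once */
        memcpy(&pDst[i], &w, 4);             /* like write_q15x2_ia */
    }
    if (i < n) {                             /* odd tail sample, like __SSAT(x, 16) */
        int32_t s = (int32_t)pSrc[i] + offset;
        pDst[i] = (int16_t)(s > INT16_MAX ? INT16_MAX : (s < INT16_MIN ? INT16_MIN : s));
    }
}
```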

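Parameters:

Parameter 1: address of the source data.
Parameter 2: the offset.
Parameter 3: address of the destination data.
Parameter 4: number of fixed-point values, that is, the number of offset operations performed.

12.4.4 Function arm_offset_q7

Function prototype: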

1. void arm_offset_q7(
2.    const q7_t * pSrc,
3.          q7_t offset,
4.          q7_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.
9. #if defined (ARM_MATH_LOOPUNROLL)
10.
11. #if defined (ARM_MATH_DSP)
12.    q31_t offset_packed;                        /* Offset packed to 32 bit */
13.
14.    /* Offset is packed to 32 bit in order to use SIMD32 for addition */
15.    offset_packed = __PACKq7(offset, offset, offset, offset);
16. #endif
17.
18.    /* Loop unrolling: Compute 4 outputs at a time */
19.    blkCnt = blockSize >> 2U;
20.
21.    while (blkCnt > 0U)
22.    {
23.       /* C = A + offset */
24.
25. #if defined (ARM_MATH_DSP)
26.       /* Add offset and store result in destination buffer (4 samples at a time). */
27.       write_q7x4_ia (&pDst, __QADD8(read_q7x4_ia ((q7_t **) &pSrc), offset_packed));
28. #else
29.       *pDst++ = (q7_t) __SSAT(*pSrc++ + offset, 8);
30.       *pDst++ = (q7_t) __SSAT(*pSrc++ + offset, 8);
31.       *pDst++ = (q7_t) __SSAT(*pSrc++ + offset, 8);
32.       *pDst++ = (q7_t) __SSAT(*pSrc++ + offset, 8);
33. #endif
34.
35.       /* Decrement loop counter */
36.       blkCnt--;
37.    }
38.
39.    /* Loop unrolling: Compute remaining outputs */
40.    blkCnt = blockSize % 0x4U;
41.
42. #else
43.
44.    /* Initialize blkCnt with number of samples */
45.    blkCnt = blockSize;
46.
47. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
48.
49.    while (blkCnt > 0U)
50.    {
51.       /* C = A + offset */
52.
53.       /* Add offset and store result in destination buffer. */
54.       *pDst++ = (q7_t) __SSAT((q15_t) *pSrc++ + offset, 8);
55.
56.       /* Decrement loop counter */
57.       blkCnt--;
58.    }
59.
60. }
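Description:
This function adds an offset to 8-bit fixed-point (Q7) values.
Analysis:

Lines 9 to 42 process the data in groups of 4. This speeds up execution by reducing while-loop overhead.
Lines 49 to 58 handle the samples left over after the groups of 4, or all samples when loop unrolling is not used.
The prototype of write_q7x4_ia is:

__STATIC_FORCEINLINE void write_q7x4_ia (
  q7_t ** pQ7,
  q31_t value)
{
  q31_t val = value;

  memcpy (*pQ7, &val, 4);
  *pQ7 += 4;
}

It writes four 8-bit samples with one 32-bit store and advances the data pointer by four samples for the next write.

__QADD8 performs four 8-bit saturating additions in parallel. Each result is limited to [0x80, 0x7F]. Results outside it are saturated: negative ones to 0x80 and positive ones to 0x7F.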
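Here is a matching sketch for Q7. It replicates the offset byte into all four lanes; multiplying the byte by 0x01010101 yields the same word as `__PACKq7(offset, offset, offset, offset)`. It then adds four samples per word using a portable model of `__QADD8` and saturates the tail like `__SSAT(x, 8)`. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sign-extend byte 'lane' (0..3) of w. */
static int32_t lane8(uint32_t w, int lane)
{
    int32_t v = (int32_t)((w >> (8 * lane)) & 0xFFu);
    return (v >= 0x80) ? v - 0x100 : v;
}

/* Hypothetical model of QADD8: four independent 8-bit saturating additions. */
static uint32_t qadd8_ref(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        int32_t s = lane8(a, lane) + lane8(b, lane);
        if (s > INT8_MAX) s = INT8_MAX;      /* saturate to 0x7F */
        if (s < INT8_MIN) s = INT8_MIN;      /* saturate to 0x80 */
        r |= ((uint32_t)s & 0xFFu) << (8 * lane);
    }
    return r;
}

/* Q7 vector offset, four samples per 32-bit word. */
static void offset_q7_ref(const int8_t *pSrc, int8_t offset, int8_t *pDst, uint32_t n)
{
    assert(n == 0u || (pSrc != NULL && pDst != NULL));
    /* Offset byte copied into every lane. */
    const uint32_t offset_packed = (uint32_t)(uint8_t)offset * 0x01010101u;
    uint32_t i = 0;
    for (; i + 4u <= n; i += 4u) {
        uint32_t w;
        memcpy(&w, &pSrc[i], 4);             /* like read_q7x4_ia */
        w = qadd8_ref(w, offset_packed);     /* four lanes at once */
        memcpy(&pDst[i], &w, 4);             /* like write_q7x4_ia */
    }
    for (; i < n; i++) {                     /* tail, like __SSAT(x, 8) */
        int32_t s = (int32_t)pSrc[i] + offset;
        pDst[i] = (int8_t)(s > INT8_MAX ? INT8_MAX : (s < INT8_MIN ? INT8_MIN : s));
    }
}
```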

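Parameters:

Parameter 1: address of the source data.
Parameter 2: the offset.
Parameter 3: address of the destination data.
Parameter 4: number of fixed-point values, that is, the number of offset operations performed.

12.4.5 Usage Example

Program design: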

/*
*********************************************************************************************************
* Function name: DSP_Offset
* Description: add an offset
* Parameters: none
* Return value: none
*********************************************************************************************************
*/
static void DSP_Offset(void)
{
   float32_t pSrcA = 0.0f;
   float32_t Offset = 0.0f;
   float32_t pDst;

   q31_t pSrcA1 = 0;
   q31_t Offset1 = 0;
   q31_t pDst1;

   q15_t pSrcA2 = 0;
   q15_t Offset2 = 0;
   q15_t pDst2;

   q7_t pSrcA3 = 0;
   q7_t Offset3 = 0;
   q7_t pDst3;

   /* Offset *********************************/
   Offset--;
   arm_offset_f32(&pSrcA, Offset, &pDst, 1);
   printf("arm_offset_f32 = %f\r\n", pDst);

   Offset1--;
   arm_offset_q31(&pSrcA1, Offset1, &pDst1, 1);
   printf("arm_offset_q31 = %d\r\n", pDst1);

   Offset2--;
   arm_offset_q15(&pSrcA2, Offset2, &pDst2, 1);
   printf("arm_offset_q15 = %d\r\n", pDst2);

   Offset3--;
   arm_offset_q7(&pSrcA3, Offset3, &pDst3, 1);
   printf("arm_offset_q7 = %d\r\n", pDst3);
   printf("***********************************\r\n");
}
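Results:

12.5 Shift (Vector Shift)

These functions shift each element, as follows:
pDst[n] = pSrc[n] << shift, 0 <= n < blockSize.
Note that these functions allow the destination pointer and the source pointer to point to the same buffer.
12.5.1 Function arm_shift_q31

Function prototype: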

1. void arm_shift_q31(
2.    const q31_t * pSrc,
3.          int8_t shiftBits,
4.          q31_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.          uint8_t sign = (shiftBits & 0x80);          /* Sign of shiftBits */
9.
10. #if defined (ARM_MATH_LOOPUNROLL)
11.
12.    q31_t in, out;                                 /* Temporary variables */
13.
14.    /* Loop unrolling: Compute 4 outputs at a time */
15.    blkCnt = blockSize >> 2U;
16.
17.    /* If the shift value is positive then do right shift else left shift */
18.    if (sign == 0U)
19.    {
20.       while (blkCnt > 0U)
21.       {
22.          /* C = A << shiftBits */
23.
24.          /* Shift input and store result in destination buffer. */
25.          in = *pSrc++;
26.          out = in << shiftBits;
27.          if (in != (out >> shiftBits))
28.             out = 0x7FFFFFFF ^ (in >> 31);
29.          *pDst++ = out;
30.
31.          in = *pSrc++;
32.          out = in << shiftBits;
33.          if (in != (out >> shiftBits))
34.             out = 0x7FFFFFFF ^ (in >> 31);
35.          *pDst++ = out;
36.
37.          in = *pSrc++;
38.          out = in << shiftBits;
39.          if (in != (out >> shiftBits))
40.             out = 0x7FFFFFFF ^ (in >> 31);
41.          *pDst++ = out;
42.
43.          in = *pSrc++;
44.          out = in << shiftBits;
45.          if (in != (out >> shiftBits))
46.             out = 0x7FFFFFFF ^ (in >> 31);
47.          *pDst++ = out;
48.
49.          /* Decrement loop counter */
50.          blkCnt--;
51.       }
52.    }
53.    else
54.    {
55.       while (blkCnt > 0U)
56.       {
57.          /* C = A >> shiftBits */
58.
59.          /* Shift input and store results in destination buffer. */
60.          *pDst++ = (*pSrc++ >> -shiftBits);
61.          *pDst++ = (*pSrc++ >> -shiftBits);
62.          *pDst++ = (*pSrc++ >> -shiftBits);
63.          *pDst++ = (*pSrc++ >> -shiftBits);
64.
65.          /* Decrement loop counter */
66.          blkCnt--;
67.       }
68.    }
69.
70.    /* Loop unrolling: Compute remaining outputs */
71.    blkCnt = blockSize % 0x4U;
72.
73. #else
74.
75.    /* Initialize blkCnt with number of samples */
76.    blkCnt = blockSize;
77.
78. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
79.
80.    /* If the shift value is positive then do right shift else left shift */
81.    if (sign == 0U)
82.    {
83.       while (blkCnt > 0U)
84.       {
85.          /* C = A << shiftBits */
86.
87.          /* Shift input and store result in destination buffer. */
88.          *pDst++ = clip_q63_to_q31((q63_t) *pSrc++ << shiftBits);
89.
90.          /* Decrement loop counter */
91.          blkCnt--;
92.       }
93.    }
94.    else
95.    {
96.       while (blkCnt > 0U)
97.       {
98.          /* C = A >> shiftBits */
99.
100.          /* Shift input and store result in destination buffer. */
101.          *pDst++ = (*pSrc++ >> -shiftBits);
102.
103.          /* Decrement loop counter */
104.          blkCnt--;
105.       }
106.    }
107.
108. }
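Description:
This function shifts 32-bit fixed-point (Q31) values left or right.
Analysis:

Lines 10 to 73 process the data in groups of 4. This speeds up execution by reducing while-loop overhead.
- Lines 18 to 52: if shiftBits is positive (or zero), a left shift is performed.
- Lines 53 to 68: if shiftBits is negative, a right shift is performed. The comments on lines 17 and 80 have left and right swapped; the code itself shifts left for non-negative shiftBits.
- Lines 27 and 28: a left-shifted value is kept only if shifting it back right by the same number of bits restores the original value. Otherwise the shift overflowed and the output is saturated. There are two cases:

in >= 0:  in >> 31 = 0x00000000, so out = 0x7FFFFFFF ^ 0x00000000 = 0x7FFFFFFF
in < 0:   in >> 31 = 0xFFFFFFFF, so out = 0x7FFFFFFF ^ 0xFFFFFFFF = 0x80000000

Lines 81 to 106 handle the samples left over after the groups of 4, or all samples when loop unrolling is not used.
- Line 88: the prototype of clip_q63_to_q31 is shown below. It checks whether the upper 32 bits of x are merely the sign extension of the lower 32 bits. If not, x does not fit in Q31 and is saturated according to its sign.

__STATIC_FORCEINLINE q31_t clip_q63_to_q31(
  q63_t x)
{
  return ((q31_t) (x >> 32) != ((q31_t) x >> 31)) ?
    ((0x7FFFFFFF ^ ((q31_t) (x >> 63)))) : (q31_t) x;
}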
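Working in 64 bits lets you model the saturating left shift and the arithmetic right shift without relying on C's implementation-defined handling of shifts on negative numbers. In this sketch (`shift_q31_ref` is a hypothetical name), clamping the exact product x * 2^shiftBits gives the same output as the shift-back test on lines 27 and 28. The wrapped 32-bit result shifts back to the input exactly when no overflow occurred. Right shifts are written as division rounded toward minus infinity, which is what an arithmetic `>>` produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of arm_shift_q31 semantics.
   shiftBits >= 0: left shift saturated to Q31; shiftBits < 0: arithmetic right shift. */
static void shift_q31_ref(const int32_t *pSrc, int8_t shiftBits, int32_t *pDst, uint32_t n)
{
    assert(n == 0u || (pSrc != NULL && pDst != NULL));
    assert(shiftBits > -32 && shiftBits < 32);
    for (uint32_t i = 0; i < n; i++) {
        const int64_t x = pSrc[i];
        if (shiftBits >= 0) {
            int64_t y = x * ((int64_t)1 << shiftBits);  /* exact: |y| <= 2^62 */
            if (y > INT32_MAX) y = INT32_MAX;           /* 0x7FFFFFFF ^ (in >> 31), in >= 0 */
            if (y < INT32_MIN) y = INT32_MIN;           /* 0x7FFFFFFF ^ (in >> 31), in < 0 */
            pDst[i] = (int32_t)y;
        } else {
            const int64_t d = (int64_t)1 << -shiftBits; /* 2^k */
            int64_t q = x / d;                          /* C division truncates toward 0 */
            if (x % d != 0 && x < 0)
                q -= 1;                                 /* round toward -inf, like >> */
            pDst[i] = (int32_t)q;
        }
    }
}
```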
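Parameters:

Parameter 1: address of the source data.
Parameter 2: number of bits to shift. Positive values shift left; negative values shift right.
Parameter 3: address of the shifted output data.
Parameter 4: number of fixed-point values, that is, the number of shifts performed.

12.5.2 Function arm_shift_q15

Function prototype: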

1. void arm_shift_q15(
2.    const q15_t * pSrc,
3.          int8_t shiftBits,
4.          q15_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.          uint8_t sign = (shiftBits & 0x80);          /* Sign of shiftBits */
9.
10. #if defined (ARM_MATH_LOOPUNROLL)
11.
12. #if defined (ARM_MATH_DSP)
13.    q15_t in1, in2;                                /* Temporary input variables */
14. #endif
15.
16.    /* Loop unrolling: Compute 4 outputs at a time */
17.    blkCnt = blockSize >> 2U;
18.
19.    /* If the shift value is positive then do right shift else left shift */
20.    if (sign == 0U)
21.    {
22.       while (blkCnt > 0U)
23.       {
24.          /* C = A << shiftBits */
25.
26. #if defined (ARM_MATH_DSP)
27.          /* read 2 samples from source */
28.          in1 = *pSrc++;
29.          in2 = *pSrc++;
30.
31.          /* Shift the inputs and then store the results in the destination buffer. */
32. #ifndef ARM_MATH_BIG_ENDIAN
33.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in1 << shiftBits), 16),
34.                                         __SSAT((in2 << shiftBits), 16), 16));
35. #else
36.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in2 << shiftBits), 16),
37.                                         __SSAT((in1 << shiftBits), 16), 16));
38. #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
39.
40.          /* read 2 samples from source */
41.          in1 = *pSrc++;
42.          in2 = *pSrc++;
43.
44. #ifndef ARM_MATH_BIG_ENDIAN
45.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in1 << shiftBits), 16),
46.                                         __SSAT((in2 << shiftBits), 16), 16));
47. #else
48.          write_q15x2_ia (&pDst, __PKHBT(__SSAT((in2 << shiftBits), 16),
49.                                         __SSAT((in1 << shiftBits), 16), 16));
50. #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
51.
52. #else
53.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
54.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
55.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
56.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
57. #endif
58.
59.          /* Decrement loop counter */
60.          blkCnt--;
61.       }
62.    }
63.    else
64.    {
65.       while (blkCnt > 0U)
66.       {
67.          /* C = A >> shiftBits */
68.
69. #if defined (ARM_MATH_DSP)
70.          /* read 2 samples from source */
71.          in1 = *pSrc++;
72.          in2 = *pSrc++;
73.
74.          /* Shift the inputs and then store the results in the destination buffer. */
75. #ifndef ARM_MATH_BIG_ENDIAN
76.          write_q15x2_ia (&pDst, __PKHBT((in1 >> -shiftBits),
77.                                         (in2 >> -shiftBits), 16));
78. #else
79.          write_q15x2_ia (&pDst, __PKHBT((in2 >> -shiftBits),
80.                                         (in1 >> -shiftBits), 16));
81. #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
82.
83.          /* read 2 samples from source */
84.          in1 = *pSrc++;
85.          in2 = *pSrc++;
86.
87. #ifndef ARM_MATH_BIG_ENDIAN
88.          write_q15x2_ia (&pDst, __PKHBT((in1 >> -shiftBits),
89.                                         (in2 >> -shiftBits), 16));
90. #else
91.          write_q15x2_ia (&pDst, __PKHBT((in2 >> -shiftBits),
92.                                         (in1 >> -shiftBits), 16));
93. #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
94.
95. #else
96.          *pDst++ = (*pSrc++ >> -shiftBits);
97.          *pDst++ = (*pSrc++ >> -shiftBits);
98.          *pDst++ = (*pSrc++ >> -shiftBits);
99.          *pDst++ = (*pSrc++ >> -shiftBits);
100. #endif
101.
102.          /* Decrement loop counter */
103.          blkCnt--;
104.       }
105.    }
106.
107.    /* Loop unrolling: Compute remaining outputs */
108.    blkCnt = blockSize % 0x4U;
109.
110. #else
111.
112.    /* Initialize blkCnt with number of samples */
113.    blkCnt = blockSize;
114.
115. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
116.
117.    /* If the shift value is positive then do right shift else left shift */
118.    if (sign == 0U)
119.    {
120.       while (blkCnt > 0U)
121.       {
122.          /* C = A << shiftBits */
123.
124.          /* Shift input and store result in destination buffer. */
125.          *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
126.
127.          /* Decrement loop counter */
128.          blkCnt--;
129.       }
130.    }
131.    else
132.    {
133.       while (blkCnt > 0U)
134.       {
135.          /* C = A >> shiftBits */
136.
137.          /* Shift input and store result in destination buffer. */
138.          *pDst++ = (*pSrc++ >> -shiftBits);
139.
140.          /* Decrement loop counter */
141.          blkCnt--;
142.       }
143.    }
144.
145. }
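Description:
This function shifts 16-bit fixed-point (Q15) values left or right.
Analysis:

Lines 10 to 115 process the data in groups of 4. This speeds up execution by reducing while-loop overhead.
- Lines 20 to 62: if shiftBits is positive (or zero), a left shift is performed.
- Lines 63 to 105: if shiftBits is negative, a right shift is performed.
- Line 79: the prototype of write_q15x2_ia is shown below. It stores one 32-bit word holding two Q15 samples to the destination and advances the pointer by two samples.

__STATIC_FORCEINLINE void write_q15x2_ia (
  q15_t ** pQ15,
  q31_t value)
{
  q31_t val = value;

  memcpy (*pQ15, &val, 4);
  *pQ15 += 2;
}

__PKHBT is also a DSP-extension instruction. It packs two 16-bit values into one 32-bit word. Its C equivalent is:
#define __PKHBT(ARG1, ARG2, ARG3) ( (((int32_t)(ARG1) << 0) & (int32_t)0x0000FFFF) | (((int32_t)(ARG2) << ARG3) & (int32_t)0xFFFF0000) )

Lines 118 to 143 handle the samples left over after the groups of 4, or all samples when loop unrolling is not used.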
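The `#ifndef ARM_MATH_BIG_ENDIAN` branches exist because the low halfword of the packed word is stored first only on a little-endian target. The sketch below replaces that macro with a runtime endianness probe, so `in1` always lands first in memory. `ssat16`, `pkhbt16` and `shift_q15_pair_ref` are hypothetical names. To keep the sketch simple, the shift is limited to 0..15 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Runtime stand-in for the ARM_MATH_BIG_ENDIAN compile-time switch. */
static int is_little_endian(void)
{
    const uint16_t probe = 1u;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1u;
}

static int16_t ssat16(int32_t x)                 /* models __SSAT(x, 16) */
{
    return (int16_t)(x > INT16_MAX ? INT16_MAX : (x < INT16_MIN ? INT16_MIN : x));
}

static uint32_t pkhbt16(int16_t lo, int16_t hi)  /* models __PKHBT(lo, hi, 16) */
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Saturating left shift of two Q15 samples, stored with one 32-bit write. */
static void shift_q15_pair_ref(const int16_t *pSrc, int8_t shiftBits, int16_t *pDst)
{
    assert(pSrc != NULL && pDst != NULL);
    assert(shiftBits >= 0 && shiftBits < 16);
    int16_t out1 = ssat16((int32_t)pSrc[0] * (1 << shiftBits));
    int16_t out2 = ssat16((int32_t)pSrc[1] * (1 << shiftBits));
    /* The low halfword is stored first only on little-endian targets. */
    uint32_t w = is_little_endian() ? pkhbt16(out1, out2) : pkhbt16(out2, out1);
    memcpy(pDst, &w, 4);                         /* like write_q15x2_ia */
}
```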

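Parameters:

Parameter 1: address of the source data.
Parameter 2: number of bits to shift. Positive values shift left; negative values shift right.
Parameter 3: address of the shifted output data.
Parameter 4: number of fixed-point values, that is, the number of shifts performed.

12.5.3 Function arm_shift_q7

Function prototype: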

1. void arm_shift_q7(
2.    const q7_t * pSrc,
3.          int8_t shiftBits,
4.          q7_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.          uint8_t sign = (shiftBits & 0x80);          /* Sign of shiftBits */
9.
10. #if defined (ARM_MATH_LOOPUNROLL)
11.
12. #if defined (ARM_MATH_DSP)
13.    q7_t in1,  in2,  in3,  in4;                   /* Temporary input variables */
14. #endif
15.
16.    /* Loop unrolling: Compute 4 outputs at a time */
17.    blkCnt = blockSize >> 2U;
18.
19.    /* If the shift value is positive then do right shift else left shift */
20.    if (sign == 0U)
21.    {
22.       while (blkCnt > 0U)
23.       {
24.       /* C = A << shiftBits */
25.
26. #if defined (ARM_MATH_DSP)
27.       /* Read 4 inputs */
28.       in1 = *pSrc++;
29.       in2 = *pSrc++;
30.       in3 = *pSrc++;
31.       in4 = *pSrc++;
32.
33.       /* Pack and store result in destination buffer (in single write) */
34.       write_q7x4_ia (&pDst, __PACKq7(__SSAT((in1 << shiftBits), 8),
35.                                        __SSAT((in2 << shiftBits), 8),
36.                                        __SSAT((in3 << shiftBits), 8),
37.                                        __SSAT((in4 << shiftBits), 8) ));
38. #else
39.       *pDst++ = (q7_t) __SSAT(((q15_t) *pSrc++ << shiftBits), 8);
40.       *pDst++ = (q7_t) __SSAT(((q15_t) *pSrc++ << shiftBits), 8);
41.       *pDst++ = (q7_t) __SSAT(((q15_t) *pSrc++ << shiftBits), 8);
42.       *pDst++ = (q7_t) __SSAT(((q15_t) *pSrc++ << shiftBits), 8);
43. #endif
44.
45.       /* Decrement loop counter */
46.       blkCnt--;
47.       }
48.    }
49.    else
50.    {
51.       while (blkCnt > 0U)
52.       {
53.       /* C = A >> shiftBits */
54.
55. #if defined (ARM_MATH_DSP)
56.       /* Read 4 inputs */
57.       in1 = *pSrc++;
58.       in2 = *pSrc++;
59.       in3 = *pSrc++;
60.       in4 = *pSrc++;
61.
62.       /* Pack and store result in destination buffer (in single write) */
63.       write_q7x4_ia (&pDst, __PACKq7((in1 >> -shiftBits),
64.                                        (in2 >> -shiftBits),
65.                                        (in3 >> -shiftBits),
66.                                        (in4 >> -shiftBits) ));
67. #else
68.       *pDst++ = (*pSrc++ >> -shiftBits);
69.       *pDst++ = (*pSrc++ >> -shiftBits);
70.       *pDst++ = (*pSrc++ >> -shiftBits);
71.       *pDst++ = (*pSrc++ >> -shiftBits);
72. #endif
73.
74.       /* Decrement loop counter */
75.       blkCnt--;
76.       }
77.    }
78.
79.    /* Loop unrolling: Compute remaining outputs */
80.    blkCnt = blockSize % 0x4U;
81.
82. #else
83.
84.    /* Initialize blkCnt with number of samples */
85.    blkCnt = blockSize;
86.
87. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
88.
89.    /* If the shift value is positive then do right shift else left shift */
90.    if (sign == 0U)
91.    {
92.       while (blkCnt > 0U)
93.       {
94.       /* C = A << shiftBits */
95.
96.       /* Shift input and store result in destination buffer. */
97.       *pDst++ = (q7_t) __SSAT(((q15_t) *pSrc++ << shiftBits), 8);
98.
99.       /* Decrement loop counter */
100.       blkCnt--;
101.       }
102.    }
103.    else
104.    {
105.       while (blkCnt > 0U)
106.       {
107.       /* C = A >> shiftBits */
108.
109.       /* Shift input and store result in destination buffer. */
110.       *pDst++ = (*pSrc++ >> -shiftBits);
111.
112.       /* Decrement loop counter */
113.       blkCnt--;
114.       }
115.    }
116.
117. }

————————————————
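Copyright notice: this is an original article by CSDN blogger "Simon223", licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/Simon223/article/details/105635186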

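Function description:
This function shifts 8-bit fixed-point (Q7) data left or right.
Function analysis:

Lines 10 to 87 process the data in groups of four (loop unrolling), which speeds up execution and reduces the time spent on while-loop overhead.
- Lines 20 to 48: if the parameter shiftBits is positive, a left shift is performed.
- Lines 49 to 77: if the parameter shiftBits is negative, a right shift is performed.
- Line 63: the prototype of write_q7x4_ia is shown below. It writes four 8-bit values, packed into one 32-bit word, in a single operation and advances the destination pointer by 4 so the next write can follow directly.

__STATIC_FORCEINLINE void write_q7x4_ia (
  q7_t ** pQ7,
  q31_t value)
{
  q31_t val = value;

  memcpy (*pQ7, &val, 4);
  *pQ7 += 4;
}
The macro __PACKq7 packs four 8-bit values into one 32-bit word. It is implemented as follows:
#define __PACKq7(v0,v1,v2,v3) ( (((int32_t)(v0) <<  0) & (int32_t)0x000000FF) | \
                                (((int32_t)(v1) <<  8) & (int32_t)0x0000FF00) | \
                                (((int32_t)(v2) << 16) & (int32_t)0x00FF0000) | \
                                (((int32_t)(v3) << 24) & (int32_t)0xFF000000) )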
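To make the packing and the single 32-bit write concrete, here is a minimal host-side sketch. The names ref_pack_q7x4 and ref_write_q7x4 are hypothetical stand-ins, not CMSIS functions; they assume a little-endian target (as on Cortex-M), where lane v0 ends up at the lowest address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-in for __PACKq7: lane v0 goes to the least
 * significant byte. The shifts are done on uint32_t so that negative
 * inputs never trigger undefined behaviour. */
static uint32_t ref_pack_q7x4(int8_t v0, int8_t v1, int8_t v2, int8_t v3)
{
    return  (uint32_t)(uint8_t)v0
          | ((uint32_t)(uint8_t)v1 << 8)
          | ((uint32_t)(uint8_t)v2 << 16)
          | ((uint32_t)(uint8_t)v3 << 24);
}

/* Hypothetical stand-in for write_q7x4_ia: one 4-byte memcpy (no alignment
 * assumption), then advance the pointer by four samples. */
static void ref_write_q7x4(int8_t **pp, uint32_t value)
{
    memcpy(*pp, &value, sizeof value);
    *pp += 4;
}
```

For example, packing (1, -1, 2, -2) gives 0xFE02FF01; on a little-endian machine the memcpy puts the four samples back in memory in their original order.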

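Lines 90 to 115 process the samples left over after the groups of four, or all samples when loop unrolling is not used.

Function parameters:

Parameter 1: source data address.
Parameter 2: number of bits to shift; a positive value shifts left, a negative value shifts right.
Parameter 3: address of the shifted output data.
Parameter 4: number of fixed-point values, i.e. the number of shift operations performed.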
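The behaviour described above (saturating left shift for positive shiftBits, arithmetic right shift for negative shiftBits) can be modelled in portable C. This is a minimal sketch, not the CMSIS implementation: ref_shift_q7 and ref_sat_q7 are hypothetical names, and it assumes -7 <= shiftBits <= 7 and a compiler whose >> on negative values is arithmetic (GCC, Clang and Arm compilers behave this way; ISO C leaves it implementation-defined).

```c
#include <stdint.h>

/* Saturate a 32-bit value to the signed 8-bit range, like __SSAT(x, 8). */
static int8_t ref_sat_q7(int32_t x)
{
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (int8_t)x;
}

/* Hypothetical model of arm_shift_q7. Assumes -7 <= shiftBits <= 7.
 * Each output is written after its input is read, so pSrc == pDst works. */
static void ref_shift_q7(const int8_t *pSrc, int8_t shiftBits,
                         int8_t *pDst, uint32_t blockSize)
{
    for (uint32_t n = 0; n < blockSize; n++) {
        if (shiftBits >= 0) {
            /* Multiply instead of << so negative inputs stay well defined. */
            pDst[n] = ref_sat_q7((int32_t)pSrc[n] * (1 << shiftBits));
        } else {
            /* Arithmetic right shift (implementation-defined in ISO C). */
            pDst[n] = (int8_t)(pSrc[n] >> -shiftBits);
        }
    }
}
```

With the input 0x86 (-122) used in the example below, a shift of 3 gives -976, which saturates to -128 (0x80).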

12.5.4 Usage example

Program design:

/*
*********************************************************************************************************
* Function name: DSP_Shift
* Description: shift
* Parameters: none
* Return value: none
*********************************************************************************************************
*/
static void DSP_Shift(void)
{
    q31_t pSrcA1 = 0x88886666;
    q31_t pDst1;

    q15_t pSrcA2 = 0x8866;
    q15_t pDst2;

    q7_t pSrcA3 = 0x86;
    q7_t pDst3;

    /* Shift ***********************************/
    arm_shift_q31(&pSrcA1, 3, &pDst1, 1);
    printf("arm_shift_q31 = %8x\r\n", pDst1);

    arm_shift_q15(&pSrcA2, -3, &pDst2, 1);
    printf("arm_shift_q15 = %4x\r\n", pDst2);

    arm_shift_q7(&pSrcA3, 3, &pDst3, 1);
    printf("arm_shift_q7 = %2x\r\n", pDst3);
    printf("***********************************\r\n");
}
Experiment output:

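Note the Q31 and Q7 results in particular: the negative inputs have saturated to the minimum value. Also note that for negative numbers a right shift fills the vacated high bits with 1s (sign extension), while a left shift fills the vacated low bits with 0s.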
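The bit-fill rule can be checked on the Q15 value 0x8866 from the example. This sketch looks at the raw 16-bit patterns without any saturation; the helper names are hypothetical, and the right shift assumes an arithmetic >> on negative values (true for GCC, Clang and Arm compilers, implementation-defined in ISO C).

```c
#include <stdint.h>

/* Raw bits of x >> n: the sign bit is copied into the vacated high bits. */
static uint16_t q15_bits_shr(int16_t x, int n)
{
    return (uint16_t)(x >> n);
}

/* Raw bits of x << n, keeping only the low 16 bits: the vacated low bits
 * are filled with 0. Done on unsigned so the shift is well defined. */
static uint16_t q15_bits_shl(int16_t x, int n)
{
    return (uint16_t)((uint32_t)(uint16_t)x << n);
}
```

0x8866 (-30618) shifted right by 3 gives 0xF10C, with 1s shifted in from the left. Shifted left by 3 without saturation it gives 0x4330, a positive number, which is why the left-shift path of arm_shift_q15 saturates instead.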
12.6 Subtraction (Vector Sub)

These functions perform element-wise subtraction, described by:
pDst[n] = pSrcA[n] - pSrcB[n], 0 <= n < blockSize.
12.6.1       Function arm_sub_f32

Function prototype:

1. void arm_sub_f32(
2.    const float32_t * pSrcA,
3.    const float32_t * pSrcB,
4.          float32_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.
9. #if defined(ARM_MATH_NEON)
10.       float32x4_t vec1;
11.       float32x4_t vec2;
12.       float32x4_t res;
13.
14.       /* Compute 4 outputs at a time */
15.       blkCnt = blockSize >> 2U;
16.
17.       while (blkCnt > 0U)
18.       {
19.          /* C = A - B */
20.
21.          /* Subtract and then store the results in the destination buffer. */
22.          vec1 = vld1q_f32(pSrcA);
23.          vec2 = vld1q_f32(pSrcB);
24.          res = vsubq_f32(vec1, vec2);
25.          vst1q_f32(pDst, res);
26.
27.          /* Increment pointers */
28.          pSrcA += 4;
29.          pSrcB += 4;
30.          pDst += 4;
31.
32.          /* Decrement the loop counter */
33.          blkCnt--;
34.       }
35.
36.       /* Tail */
37.       blkCnt = blockSize & 0x3;
38.
39. #else
40. #if defined (ARM_MATH_LOOPUNROLL)
41.
42.    /* Loop unrolling: Compute 4 outputs at a time */
43.    blkCnt = blockSize >> 2U;
44.
45.    while (blkCnt > 0U)
46.    {
47.       /* C = A - B */
48.
49.       /* Subtract and store result in destination buffer. */
50.       *pDst++ = (*pSrcA++) - (*pSrcB++);
51.
52.       *pDst++ = (*pSrcA++) - (*pSrcB++);
53.
54.       *pDst++ = (*pSrcA++) - (*pSrcB++);
55.
56.       *pDst++ = (*pSrcA++) - (*pSrcB++);
57.
58.       /* Decrement loop counter */
59.       blkCnt--;
60.    }
61.
62.    /* Loop unrolling: Compute remaining outputs */
63.    blkCnt = blockSize % 0x4U;
64.
65. #else
66.
67.    /* Initialize blkCnt with number of samples */
68.    blkCnt = blockSize;
69.
70. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
71. #endif /* #if defined(ARM_MATH_NEON) */
72.
73.    while (blkCnt > 0U)
74.    {
75.       /* C = A - B */
76.
77.       /* Subtract and store result in destination buffer. */
78.       *pDst++ = (*pSrcA++) - (*pSrcB++);
79.
80.       /* Decrement loop counter */
81.       blkCnt--;
82.    }
83.
84. }

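Function description:
This function subtracts two 32-bit floating-point vectors.
Function analysis:

Lines 9 to 39 are for the NEON instruction set, which current Cortex-M cores do not support.
Lines 40 to 65 process the data in groups of four (loop unrolling), which speeds up execution and reduces the time spent on while-loop overhead.
Lines 73 to 82 process the samples left over after the groups of four, or all samples when loop unrolling is not used.

Function parameters:

Parameter 1: address of the minuend (pSrcA).
Parameter 2: address of the subtrahend (pSrcB).
Parameter 3: result address.
Parameter 4: block size, i.e. the number of subtractions performed.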
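The "groups of four plus tail" pattern used here, and by most functions in this chapter, can be isolated in a small sketch. ref_sub_f32 is a hypothetical name; it mirrors only the ARM_MATH_LOOPUNROLL path of the listing above, without the NEON branch.

```c
#include <stdint.h>

/* Hypothetical sketch of the loop-unrolling pattern: four elements per
 * iteration, then a scalar tail loop for the remaining blockSize % 4. */
static void ref_sub_f32(const float *pSrcA, const float *pSrcB,
                        float *pDst, uint32_t blockSize)
{
    uint32_t blkCnt = blockSize >> 2U;       /* number of 4-sample groups */

    while (blkCnt > 0U) {
        *pDst++ = *pSrcA++ - *pSrcB++;
        *pDst++ = *pSrcA++ - *pSrcB++;
        *pDst++ = *pSrcA++ - *pSrcB++;
        *pDst++ = *pSrcA++ - *pSrcB++;
        blkCnt--;
    }

    blkCnt = blockSize % 4U;                 /* 0..3 leftover samples */
    while (blkCnt > 0U) {
        *pDst++ = *pSrcA++ - *pSrcB++;
        blkCnt--;
    }
}
```

With blockSize = 5, the first loop runs once (4 samples) and the tail loop handles the fifth sample.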

12.6.2       Function arm_sub_q31

Function prototype:

1. void arm_sub_q31(
2.    const q31_t * pSrcA,
3.    const q31_t * pSrcB,
4.          q31_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.
9. #if defined (ARM_MATH_LOOPUNROLL)
10.
11.    /* Loop unrolling: Compute 4 outputs at a time */
12.    blkCnt = blockSize >> 2U;
13.
14.    while (blkCnt > 0U)
15.    {
16.       /* C = A - B */
17.
18.       /* Subtract and store result in destination buffer. */
19.       *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
20.
21.       *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
22.
23.       *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
24.
25.       *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
26.
27.       /* Decrement loop counter */
28.       blkCnt--;
29.    }
30.
31.    /* Loop unrolling: Compute remaining outputs */
32.    blkCnt = blockSize % 0x4U;
33.
34. #else
35.
36.    /* Initialize blkCnt with number of samples */
37.    blkCnt = blockSize;
38.
39. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
40.
41.    while (blkCnt > 0U)
42.    {
43.       /* C = A - B */
44.
45.       /* Subtract and store result in destination buffer. */
46.       *pDst++ = __QSUB(*pSrcA++, *pSrcB++);
47.
48.       /* Decrement loop counter */
49.       blkCnt--;
50.    }
51.
52. }

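Function description:
This function subtracts two 32-bit fixed-point (Q31) vectors.
Function analysis:

This function uses the saturating subtraction __QSUB, so the result is in Q31 format with range [0x80000000, 0x7FFFFFFF].
Lines 9 to 34 process the data in groups of four (loop unrolling), which speeds up execution and reduces the time spent on while-loop overhead.
Lines 41 to 50 process the samples left over after the groups of four, or all samples when loop unrolling is not used.

Function parameters:

Parameter 1: address of the minuend (pSrcA).
Parameter 2: address of the subtrahend (pSrcB).
Parameter 3: result address.
Parameter 4: block size, i.e. the number of subtractions performed.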
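The saturating behaviour of __QSUB can be modelled on a host PC by subtracting in 64 bits and clipping. ref_qsub32 is a hypothetical name for this sketch; on Cortex-M4/M7 the real intrinsic maps to a single QSUB instruction.

```c
#include <stdint.h>

/* Hypothetical portable model of __QSUB: subtract in 64 bits, then clip
 * to the Q31 range [0x80000000, 0x7FFFFFFF]. */
static int32_t ref_qsub32(int32_t a, int32_t b)
{
    int64_t d = (int64_t)a - (int64_t)b;

    if (d > INT32_MAX) return INT32_MAX;
    if (d < INT32_MIN) return INT32_MIN;
    return (int32_t)d;
}
```

A plain int32_t subtraction of INT32_MIN - 1 would overflow (undefined behaviour in C, wrap-around on hardware); the saturating version returns INT32_MIN instead.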

12.6.3       Function arm_sub_q15

Function prototype:

1. void arm_sub_q15(
2.    const q15_t * pSrcA,
3.    const q15_t * pSrcB,
4.          q15_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.
9. #if defined (ARM_MATH_LOOPUNROLL)
10.
11. #if defined (ARM_MATH_DSP)
12.    q31_t inA1, inA2;
13.    q31_t inB1, inB2;
14. #endif
15.
16.    /* Loop unrolling: Compute 4 outputs at a time */
17.    blkCnt = blockSize >> 2U;
18.
19.    while (blkCnt > 0U)
20.    {
21.       /* C = A - B */
22.
23. #if defined (ARM_MATH_DSP)
24.       /* read 2 times 2 samples at a time from sourceA */
25.       inA1 = read_q15x2_ia ((q15_t **) &pSrcA);
26.       inA2 = read_q15x2_ia ((q15_t **) &pSrcA);
27.       /* read 2 times 2 samples at a time from sourceB */
28.       inB1 = read_q15x2_ia ((q15_t **) &pSrcB);
29.       inB2 = read_q15x2_ia ((q15_t **) &pSrcB);
30.
31.       /* Subtract and store 2 times 2 samples at a time */
32.       write_q15x2_ia (&pDst, __QSUB16(inA1, inB1));
33.       write_q15x2_ia (&pDst, __QSUB16(inA2, inB2));
34. #else
35.       *pDst++ = (q15_t) __SSAT(((q31_t) *pSrcA++ - *pSrcB++), 16);
36.       *pDst++ = (q15_t) __SSAT(((q31_t) *pSrcA++ - *pSrcB++), 16);
37.       *pDst++ = (q15_t) __SSAT(((q31_t) *pSrcA++ - *pSrcB++), 16);
38.       *pDst++ = (q15_t) __SSAT(((q31_t) *pSrcA++ - *pSrcB++), 16);
39. #endif
40.
41.       /* Decrement loop counter */
42.       blkCnt--;
43.    }
44.
45.    /* Loop unrolling: Compute remaining outputs */
46.    blkCnt = blockSize % 0x4U;
47.
48. #else
49.
50.    /* Initialize blkCnt with number of samples */
51.    blkCnt = blockSize;
52.
53. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
54.
55.    while (blkCnt > 0U)
56.    {
57.       /* C = A - B */
58.
59.       /* Subtract and store result in destination buffer. */
60. #if defined (ARM_MATH_DSP)
61.       *pDst++ = (q15_t) __QSUB16(*pSrcA++, *pSrcB++);
62. #else
63.       *pDst++ = (q15_t) __SSAT(((q31_t) *pSrcA++ - *pSrcB++), 16);
64. #endif
65.
66.       /* Decrement loop counter */
67.       blkCnt--;
68.    }
69.
70. }

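Function description:
This function subtracts two 16-bit fixed-point (Q15) vectors.
Function analysis:

Lines 9 to 48 process the data in groups of four (loop unrolling), which speeds up execution and reduces the time spent on while-loop overhead.
- Line 25: read_q15x2_ia reads two Q15 values at once, packed into one 32-bit (q31_t) word.
- Line 32: write_q15x2_ia writes two Q15 values, packed in one 32-bit word, in a single operation.
- Line 32: __QSUB16 performs two 16-bit saturating subtractions in one instruction.
Lines 55 to 68 process the samples left over after the groups of four, or all samples when loop unrolling is not used.

Function parameters:

Parameter 1: address of the minuend (pSrcA).
Parameter 2: address of the subtrahend (pSrcB).
Parameter 3: result address.
Parameter 4: block size, i.e. the number of subtractions performed.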
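What __QSUB16 does to one 32-bit word can be sketched as two independent saturating 16-bit subtractions. ref_qsub16 and ref_sat_q15 are hypothetical names; the lane extraction assumes that converting 0x8000..0xFFFF to int16_t wraps around, which GCC, Clang and Arm compilers do (ISO C leaves it implementation-defined).

```c
#include <stdint.h>

/* Saturate to the signed 16-bit range, like __SSAT(x, 16). */
static int16_t ref_sat_q15(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical portable model of __QSUB16: each 32-bit word holds two
 * signed 16-bit lanes, and each lane is subtracted with saturation. */
static uint32_t ref_qsub16(uint32_t a, uint32_t b)
{
    int16_t a_lo = (int16_t)(uint16_t)(a & 0xFFFFu);
    int16_t a_hi = (int16_t)(uint16_t)(a >> 16);
    int16_t b_lo = (int16_t)(uint16_t)(b & 0xFFFFu);
    int16_t b_hi = (int16_t)(uint16_t)(b >> 16);

    uint16_t lo = (uint16_t)ref_sat_q15((int32_t)a_lo - b_lo);
    uint16_t hi = (uint16_t)ref_sat_q15((int32_t)a_hi - b_hi);

    return ((uint32_t)hi << 16) | lo;
}
```

Because the lanes are independent, a borrow in the low half never leaks into the high half, which a plain 32-bit subtraction could not guarantee.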

12.6.4       Function arm_sub_q7

Function prototype:

1. void arm_sub_q7(
2.    const q7_t * pSrcA,
3.    const q7_t * pSrcB,
4.          q7_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.
9. #if defined (ARM_MATH_LOOPUNROLL)
10.
11.    /* Loop unrolling: Compute 4 outputs at a time */
12.    blkCnt = blockSize >> 2U;
13.
14.    while (blkCnt > 0U)
15.    {
16.       /* C = A - B */
17.
18. #if defined (ARM_MATH_DSP)
19.       /* Subtract and store result in destination buffer (4 samples at a time). */
20.       write_q7x4_ia (&pDst, __QSUB8(read_q7x4_ia ((q7_t **) &pSrcA), read_q7x4_ia ((q7_t **) &pSrcB)));
21. #else
22.       *pDst++ = (q7_t) __SSAT((q15_t) *pSrcA++ - *pSrcB++, 8);
23.       *pDst++ = (q7_t) __SSAT((q15_t) *pSrcA++ - *pSrcB++, 8);
24.       *pDst++ = (q7_t) __SSAT((q15_t) *pSrcA++ - *pSrcB++, 8);
25.       *pDst++ = (q7_t) __SSAT((q15_t) *pSrcA++ - *pSrcB++, 8);
26. #endif
27.
28.       /* Decrement loop counter */
29.       blkCnt--;
30.    }
31.
32.    /* Loop unrolling: Compute remaining outputs */
33.    blkCnt = blockSize % 0x4U;
34.
35. #else
36.
37.    /* Initialize blkCnt with number of samples */
38.    blkCnt = blockSize;
39.
40. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
41.
42.    while (blkCnt > 0U)
43.    {
44.       /* C = A - B */
45.
46.       /* Subtract and store result in destination buffer. */
47.       *pDst++ = (q7_t) __SSAT((q15_t) *pSrcA++ - *pSrcB++, 8);
48.
49.       /* Decrement loop counter */
50.       blkCnt--;
51.    }
52.
53. }

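Function description:
This function subtracts two 8-bit fixed-point (Q7) vectors.
Function analysis:

Lines 9 to 35 process the data in groups of four (loop unrolling), which speeds up execution and reduces the time spent on while-loop overhead.
- Line 20: write_q7x4_ia writes four Q7 values, packed in one 32-bit word, in a single operation.

__QSUB8 computes four Q7 saturating subtractions at once.

Lines 42 to 51 process the samples left over after the groups of four, or all samples when loop unrolling is not used.

Function parameters:

Parameter 1: address of the minuend (pSrcA).
Parameter 2: address of the subtrahend (pSrcB).
Parameter 3: result address.
Parameter 4: block size, i.e. the number of subtractions performed.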
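__QSUB8 extends the same idea to four 8-bit lanes per word. The sketch below is a hypothetical host-side model (ref_qsub8 is not a CMSIS name); it assumes converting 0x80..0xFF to int8_t wraps around, as GCC, Clang and Arm compilers do.

```c
#include <stdint.h>

/* Hypothetical portable model of __QSUB8: four signed 8-bit lanes per
 * 32-bit word, each subtracted with saturation to [-128, 127]. */
static uint32_t ref_qsub8(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    for (unsigned lane = 0; lane < 4; lane++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * lane));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * lane));
        int32_t d = (int32_t)x - y;

        if (d > INT8_MAX) d = INT8_MAX;
        if (d < INT8_MIN) d = INT8_MIN;

        result |= (uint32_t)(uint8_t)d << (8 * lane);
    }
    return result;
}
```

Lane 0 of the first test below uses 0x71 - 0x7F, the same first-element values as arm_sub_q7 in the example of 12.6.5, and yields -14 (0xF2); lane 3 (-128 - 1) saturates to -128.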

12.6.5 Usage example

Program design:

/*
*********************************************************************************************************
* Function name: DSP_Sub
* Description: subtraction
* Parameters: none
* Return value: none
*********************************************************************************************************
*/
static void DSP_Sub(void)
{
    float32_t pSrcA[5] = {1.0f,1.0f,1.0f,1.0f,1.0f};
    float32_t pSrcB[5] = {1.0f,1.0f,1.0f,1.0f,1.0f};
    float32_t pDst[5];

    q31_t pSrcA1[5] = {1,1,1,1,1};
    q31_t pSrcB1[5] = {1,1,1,1,1};
    q31_t pDst1[5];

    q15_t pSrcA2[5] = {1,1,1,1,1};
    q15_t pSrcB2[5] = {1,1,1,1,1};
    q15_t pDst2[5];

    q7_t pSrcA3[5] = {0x70,1,1,1,1};
    q7_t pSrcB3[5] = {0x7f,1,1,1,1};
    q7_t pDst3[5];

    /* Subtraction ***********************************/
    pSrcA[0] += 1.1f;
    arm_sub_f32(pSrcA, pSrcB, pDst, 5);
    printf("arm_sub_f32 = %f\r\n", pDst[0]);

    pSrcA1[0] += 1;
    arm_sub_q31(pSrcA1, pSrcB1, pDst1, 5);
    printf("arm_sub_q31 = %d\r\n", pDst1[0]);

    pSrcA2[0] += 1;
    arm_sub_q15(pSrcA2, pSrcB2, pDst2, 5);
    printf("arm_sub_q15 = %d\r\n", pDst2[0]);

    pSrcA3[0] += 1;
    arm_sub_q7(pSrcA3, pSrcB3, pDst3, 5);
    printf("arm_sub_q7 = %d\r\n", pDst3[0]);
    printf("***********************************\r\n");
}
Experiment output:

12.7 Scale factor (Vector Scale)

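These functions scale data up or down. For floating-point data the formula is:
pDst[n] = pSrc[n] * scale, 0 <= n < blockSize.
For Q31, Q15 and Q7 data, the formula is:
pDst[n] = (pSrc[n] * scaleFract) << shift, 0 <= n < blockSize.
In this case the scale factor is:
scale = scaleFract * 2^shift.
Note that these functions allow the destination pointer and the source pointer to point to the same buffer (in-place operation).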
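Because scaleFract is a fraction below 1.0, a scale factor larger than 1 has to be split into a fraction and a power of two. The helper below is a hypothetical sketch (not part of CMSIS-DSP) that normalises a positive scale to scaleFract in [0.5, 1) and computes the matching shift; it assumes scale > 0.

```c
#include <stdint.h>

/* Hypothetical helper: split a positive scale factor into the
 * (scaleFract, shift) pair used by arm_scale_q31 so that
 * scale = scaleFract * 2^shift, with scaleFract normalised to [0.5, 1). */
static void ref_scale_to_q31(double scale, int32_t *scaleFract, int8_t *shift)
{
    int8_t s = 0;

    /* Assumes scale > 0; zero or negative values need separate handling. */
    while (scale >= 1.0) { scale *= 0.5; s++; }
    while (scale < 0.5)  { scale *= 2.0; s--; }

    /* Q31 representation: value * 2^31, truncated toward zero. */
    *scaleFract = (int32_t)(scale * 2147483648.0);
    *shift = s;
}
```

For example, scale = 3.0 becomes 0.75 * 2^2, i.e. scaleFract = 0x60000000 and shift = 2; scale = 0.25 becomes 0.5 * 2^-1.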
12.7.1       Function arm_scale_f32

Function prototype:

1. void arm_scale_f32(
2.    const float32_t *pSrc,
3.          float32_t scale,
4.          float32_t *pDst,
5.          uint32_t blockSize)
6. {
7.    uint32_t blkCnt;                            /* Loop counter */
8. #if defined(ARM_MATH_NEON_EXPERIMENTAL)
9.       float32x4_t vec1;
10.       float32x4_t res;
11.
12.       /* Compute 4 outputs at a time */
13.       blkCnt = blockSize >> 2U;
14.
15.       while (blkCnt > 0U)
16.       {
17.          /* C = A * scale */
18.
19.          /* Scale the input and then store the results in the destination buffer. */
20.          vec1 = vld1q_f32(pSrc);
21.          res = vmulq_f32(vec1, vdupq_n_f32(scale));
22.          vst1q_f32(pDst, res);
23.
24.          /* Increment pointers */
25.          pSrc += 4;
26.          pDst += 4;
27.
28.          /* Decrement the loop counter */
29.          blkCnt--;
30.       }
31.
32.       /* Tail */
33.       blkCnt = blockSize & 0x3;
34.
35. #else
36. #if defined (ARM_MATH_LOOPUNROLL)
37.
38.    /* Loop unrolling: Compute 4 outputs at a time */
39.    blkCnt = blockSize >> 2U;
40.
41.    while (blkCnt > 0U)
42.    {
43.       /* C = A * scale */
44.
45.       /* Scale input and store result in destination buffer. */
46.       *pDst++ = (*pSrc++) * scale;
47.
48.       *pDst++ = (*pSrc++) * scale;
49.
50.       *pDst++ = (*pSrc++) * scale;
51.
52.       *pDst++ = (*pSrc++) * scale;
53.
54.       /* Decrement loop counter */
55.       blkCnt--;
56.    }
57.
58.    /* Loop unrolling: Compute remaining outputs */
59.    blkCnt = blockSize % 0x4U;
60.
61. #else
62.
63.    /* Initialize blkCnt with number of samples */
64.    blkCnt = blockSize;
65.
66. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
67. #endif /* #if defined(ARM_MATH_NEON_EXPERIMENTAL) */
68.
69.    while (blkCnt > 0U)
70.    {
71.       /* C = A * scale */
72.
73.       /* Scale input and store result in destination buffer. */
74.       *pDst++ = (*pSrc++) * scale;
75.
76.       /* Decrement loop counter */
77.       blkCnt--;
78.    }
79.
80. }
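Function description:
This function scales 32-bit floating-point data.
Function analysis:

Lines 8 to 35 are for the NEON instruction set, which current Cortex-M cores do not support.
Lines 36 to 61 process the data in groups of four (loop unrolling), which speeds up execution and reduces the time spent on while-loop overhead.
Lines 69 to 78 process the samples left over after the groups of four, or all samples when loop unrolling is not used.

Function parameters:

Parameter 1: source data address.
Parameter 2: scale factor.
Parameter 3: result address.
Parameter 4: block size, i.e. the number of scaling operations performed.

12.7.2       Function arm_scale_q31

Function prototype: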

1. void arm_scale_q31(
2.    const q31_t *pSrc,
3.          q31_t scaleFract,
4.          int8_t shift,
5.          q31_t *pDst,
6.          uint32_t blockSize)
7. {
8.          uint32_t blkCnt;                            /* Loop counter */
9.          q31_t in, out;                               /* Temporary variables */
10.          int8_t kShift = shift + 1;                   /* Shift to apply after scaling */
11.          int8_t sign = (kShift & 0x80);
12.
13. #if defined (ARM_MATH_LOOPUNROLL)
14.
15.    /* Loop unrolling: Compute 4 outputs at a time */
16.    blkCnt = blockSize >> 2U;
17.
18.    if (sign == 0U)
19.    {
20.       while (blkCnt > 0U)
21.       {
22.       /* C = A * scale */
23.
24.       /* Scale input and store result in destination buffer. */
25.       in = *pSrc++;                               /* read input from source */
26.       in = ((q63_t) in * scaleFract) >> 32;       /* multiply input with scaler value */
27.       out = in << kShift;                         /* apply shifting */
28.       if (in != (out >> kShift))                /* saturate the result */
29.          out = 0x7FFFFFFF ^ (in >> 31);
30.       *pDst++ = out;                            /* Store result destination */
31.
32.       in = *pSrc++;
33.       in = ((q63_t) in * scaleFract) >> 32;
34.       out = in << kShift;
35.       if (in != (out >> kShift))
36.          out = 0x7FFFFFFF ^ (in >> 31);
37.       *pDst++ = out;
38.
39.       in = *pSrc++;
40.       in = ((q63_t) in * scaleFract) >> 32;
41.       out = in << kShift;
42.       if (in != (out >> kShift))
43.          out = 0x7FFFFFFF ^ (in >> 31);
44.       *pDst++ = out;
45.
46.       in = *pSrc++;
47.       in = ((q63_t) in * scaleFract) >> 32;
48.       out = in << kShift;
49.       if (in != (out >> kShift))
50.          out = 0x7FFFFFFF ^ (in >> 31);
51.       *pDst++ = out;
52.
53.       /* Decrement loop counter */
54.       blkCnt--;
55.       }
56.    }
57.    else
58.    {
59.       while (blkCnt > 0U)
60.       {
61.       /* C = A * scale */
62.
63.       /* Scale input and store result in destination buffer. */
64.       in = *pSrc++;                               /* read four inputs from source */
65.       in = ((q63_t) in * scaleFract) >> 32;       /* multiply input with scaler value */
66.       out = in >> -kShift;                      /* apply shifting */
67.       *pDst++ = out;                            /* Store result destination */
68.
69.       in = *pSrc++;
70.       in = ((q63_t) in * scaleFract) >> 32;
71.       out = in >> -kShift;
72.       *pDst++ = out;
73.
74.       in = *pSrc++;
75.       in = ((q63_t) in * scaleFract) >> 32;
76.       out = in >> -kShift;
77.       *pDst++ = out;
78.
79.       in = *pSrc++;
80.       in = ((q63_t) in * scaleFract) >> 32;
81.       out = in >> -kShift;
82.       *pDst++ = out;
83.
84.       /* Decrement loop counter */
85.       blkCnt--;
86.       }
87.    }
88.
89.    /* Loop unrolling: Compute remaining outputs */
90.    blkCnt = blockSize % 0x4U;
91.
92. #else
93.
94.    /* Initialize blkCnt with number of samples */
95.    blkCnt = blockSize;
96.
97. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
98.
99.    if (sign == 0U)
100.    {
101.       while (blkCnt > 0U)
102.       {
103.       /* C = A * scale */
104.
105.       /* Scale input and store result in destination buffer. */
106.       in = *pSrc++;
107.       in = ((q63_t) in * scaleFract) >> 32;
108.       out = in << kShift;
109.       if (in != (out >> kShift))
110.             out = 0x7FFFFFFF ^ (in >> 31);
111.       *pDst++ = out;
112.
113.       /* Decrement loop counter */
114.       blkCnt--;
115.       }
116.    }
117.    else
118.    {
119.       while (blkCnt > 0U)
120.       {
121.       /* C = A * scale */
122.
123.       /* Scale input and store result in destination buffer. */
124.       in = *pSrc++;
125.       in = ((q63_t) in * scaleFract) >> 32;
126.       out = in >> -kShift;
127.       *pDst++ = out;
128.
129.       /* Decrement loop counter */
130.       blkCnt--;
131.       }
132.    }
133.
134. }


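Function description:
This function scales 32-bit fixed-point (Q31) data.
Function analysis:

Lines 13 to 92 process the data in groups of four (loop unrolling), which speeds up execution and reduces the time spent on while-loop overhead.
- Lines 18 to 56: if kShift = shift + 1 is non-negative (i.e. shift >= -1), the result is shifted left.
- Lines 57 to 87: if kShift is negative (i.e. shift <= -2), the result is shifted right by -kShift bits.
- Note in particular: multiplying two Q31 numbers gives a result in 2.62 format, but the function's output must be in Q31 format, so the code handles this specially.

Line 26: the 64-bit product (2.62 format) is shifted right by 32 bits, so the result is in 2.30 format.
Line 27: kShift = shift + 1, i.e. out = in << (shift + 1), so one extra left shift is performed.
This is equivalent to converting the 2.30 format to the 2.31 format.

- Lines 28 to 29 apply Q31 saturation, i.e. convert the 2.31 format to 1.31.

A left shift is valid only if shifting the result back right by the same number of bits restores the original value. If that condition is not met, the output must be saturated. There are two cases:
out = 0x7FFFFFFF ^ (in >> 31)   (in is positive)
    = 0x7FFFFFFF ^ 0x00000000
    = 0x7FFFFFFF
out = 0x7FFFFFFF ^ (in >> 31)   (in is negative)
    = 0x7FFFFFFF ^ 0xFFFFFFFF
    = 0x80000000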
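The per-sample steps above (2.62 product, keep the upper 32 bits, shift by kShift, check by shifting back, saturate with the XOR trick) can be traced in a portable sketch. ref_scale_q31_sample is a hypothetical name; it assumes shift + 1 stays within (-32, 32) and that >> on negative values is arithmetic (GCC, Clang and Arm compilers), and it performs the left shift on uint32_t to avoid signed-overflow undefined behaviour.

```c
#include <stdint.h>

/* Hypothetical model of one sample of arm_scale_q31.
 * Assumes -32 < shift + 1 < 32 and arithmetic >> for negative values. */
static int32_t ref_scale_q31_sample(int32_t in, int32_t scaleFract, int8_t shift)
{
    int8_t kShift = (int8_t)(shift + 1);
    /* 2.62 product, keep the upper 32 bits -> 2.30 format. */
    int32_t prod = (int32_t)(((int64_t)in * scaleFract) >> 32);

    if (kShift >= 0) {
        int32_t out = (int32_t)((uint32_t)prod << kShift);
        /* Shifting back must restore prod; otherwise saturate. */
        if (prod != (out >> kShift))
            out = INT32_MAX ^ (prod >> 31);   /* 0x7FFFFFFF or 0x80000000 */
        return out;
    }
    return prod >> -kShift;
}
```

For instance, 0.5 * 0.5 with shift 0 (inputs 0x40000000) gives 0x20000000 (0.25), while near-1.0 inputs with shift 1 overflow and are clamped to 0x7FFFFFFF or 0x80000000 depending on the sign.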

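Lines 99 to 132 process the samples left over after the groups of four, or all samples when loop unrolling is not used.

Function parameters:

Parameter 1: source data address.
Parameter 2: scale factor (scaleFract).
Parameter 3: shift parameter; a positive value means a left shift, a negative value means a right shift.
Parameter 4: result address.
Parameter 5: block size, i.e. the number of scaling operations performed.

12.7.3       Function arm_scale_q15

Function prototype: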

1. void arm_shift_q15(
2.    const q15_t * pSrc,
3.          int8_t shiftBits,
4.          q15_t * pDst,
5.          uint32_t blockSize)
6. {
7.          uint32_t blkCnt;                            /* Loop counter */
8.          uint8_t sign = (shiftBits & 0x80);          /* Sign of shiftBits */
9.
10. #if defined (ARM_MATH_LOOPUNROLL)
11.
12. #if defined (ARM_MATH_DSP)
13.    q15_t in1, in2;                               /* Temporary input variables */
14. #endif
15.
16.    /* Loop unrolling: Compute 4 outputs at a time */
17.    blkCnt = blockSize >> 2U;
18.
19.    /* If the shift value is positive then do right shift else left shift */
20.    if (sign == 0U)
21.    {
22.       while (blkCnt > 0U)
23.       {
24.       /* C = A << shiftBits */
25.
26. #if defined (ARM_MATH_DSP)
27.       /* read 2 samples from source */
28.       in1 = *pSrc++;
29.       in2 = *pSrc++;
30.
31.       /* Shift the inputs and then store the results in the destination buffer. */
32. #ifndef ARM_MATH_BIG_ENDIAN
33.       write_q15x2_ia (&pDst, __PKHBT(__SSAT((in1 << shiftBits), 16),
34.                                        __SSAT((in2 << shiftBits), 16), 16));
35. #else
36.       write_q15x2_ia (&pDst, __PKHBT(__SSAT((in2 << shiftBits), 16),
37.                                        __SSAT((in1 << shiftBits), 16), 16));
38. #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
39.
40.       /* read 2 samples from source */
41.       in1 = *pSrc++;
42.       in2 = *pSrc++;
43.
44. #ifndef ARM_MATH_BIG_ENDIAN
45.       write_q15x2_ia (&pDst, __PKHBT(__SSAT((in1 << shiftBits), 16),
46.                                        __SSAT((in2 << shiftBits), 16), 16));
47. #else
48.       write_q15x2_ia (&pDst, __PKHBT(__SSAT((in2 << shiftBits), 16),
49.                                        __SSAT((in1 << shiftBits), 16), 16));
50. #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
51.
52. #else
53.       *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
54.       *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
55.       *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
56.       *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
57. #endif
58.
59.       /* Decrement loop counter */
60.       blkCnt--;
61.       }
62.    }
63.    else
64.    {
65.       while (blkCnt > 0U)
66.       {
67.       /* C = A >> shiftBits */
68.
69. #if defined (ARM_MATH_DSP)
70.       /* read 2 samples from source */
71.       in1 = *pSrc++;
72.       in2 = *pSrc++;
73.
74.       /* Shift the inputs and then store the results in the destination buffer. */
75. #ifndef ARM_MATH_BIG_ENDIAN
76.       write_q15x2_ia (&pDst, __PKHBT((in1 >> -shiftBits),
77.                                        (in2 >> -shiftBits), 16));
78. #else
79.       write_q15x2_ia (&pDst, __PKHBT((in2 >> -shiftBits),
80.                                        (in1 >> -shiftBits), 16));
81. #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
82.
83.       /* read 2 samples from source */
84.       in1 = *pSrc++;
85.       in2 = *pSrc++;
86.
87. #ifndef ARM_MATH_BIG_ENDIAN
88.       write_q15x2_ia (&pDst, __PKHBT((in1 >> -shiftBits),
89.                                        (in2 >> -shiftBits), 16));
90. #else
91.       write_q15x2_ia (&pDst, __PKHBT((in2 >> -shiftBits),
92.                                        (in1 >> -shiftBits), 16));
93. #endif /* #ifndef ARM_MATH_BIG_ENDIAN */
94.
95. #else
96.       *pDst++ = (*pSrc++ >> -shiftBits);
97.       *pDst++ = (*pSrc++ >> -shiftBits);
98.       *pDst++ = (*pSrc++ >> -shiftBits);
99.       *pDst++ = (*pSrc++ >> -shiftBits);
100. #endif
101.
102.       /* Decrement loop counter */
103.       blkCnt--;
104.       }
105.    }
106.
107.    /* Loop unrolling: Compute remaining outputs */
108.    blkCnt = blockSize % 0x4U;
109.
110. #else
111.
112.    /* Initialize blkCnt with number of samples */
113.    blkCnt = blockSize;
114.
115. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
116.
117.    /* If the shift value is positive then do right shift else left shift */
118.    if (sign == 0U)
119.    {
120.       while (blkCnt > 0U)
121.       {
122.       /* C = A << shiftBits */
123.
124.       /* Shift input and store result in destination buffer. */
125.       *pDst++ = __SSAT(((q31_t) *pSrc++ << shiftBits), 16);
126.
127.       /* Decrement loop counter */
128.       blkCnt--;
129.       }
130.    }
131.    else
132.    {
133.       while (blkCnt > 0U)
134.       {
135.       /* C = A >> shiftBits */
136.
137.       /* Shift input and store result in destination buffer. */
138.       *pDst++ = (*pSrc++ >> -shiftBits);
139.
140.       /* Decrement loop counter */
141.       blkCnt--;
142.       }
143.    }
144.
145. }

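Function description:

This function scales 16-bit fixed-point (Q15) data. Note that the listing above is actually the source of arm_shift_q15 rather than arm_scale_q15; the line-by-line analysis below follows that listing.

Function analysis:

Lines 10 to 110 process the data in groups of four (loop unrolling), which speeds up execution and reduces the time spent on while-loop overhead.
Lines 20 to 62: if the shift parameter shiftBits is positive, a left shift is performed.
Lines 63 to 105: if shiftBits is negative, a right shift is performed.
Line 33: __PKHBT is also a SIMD instruction; it packs two 16-bit values into one 32-bit word. A C implementation looks like this:
#define __PKHBT(ARG1, ARG2, ARG3) ( (((int32_t)(ARG1) <<    0) & (int32_t)0x0000FFFF) | \
                                  (((int32_t)(ARG2) << ARG3) & (int32_t)0xFFFF0000)  )

The prototype of write_q15x2_ia is: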

__STATIC_FORCEINLINE void write_q15x2_ia (
  q15_t ** pQ15,
  q31_t value)
{
  q31_t val = value;

  memcpy (*pQ15, &val, 4);
  *pQ15 += 2;
}

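It writes two Q15 values, packed in one 32-bit word, in a single operation and advances the destination pointer by two samples so the next write can follow directly.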
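The PKHBT packing with ARG3 = 16 and the paired write can be modelled on a host PC as follows. Both names are hypothetical stand-ins; the arithmetic is done on uint32_t so negative samples never hit undefined shifts, and the memcpy layout assumes a little-endian target such as Cortex-M.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of __PKHBT(lo, hi, 16): low half-word from lo,
 * high half-word from hi. */
static uint32_t ref_pkhbt16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Hypothetical stand-in for write_q15x2_ia: one 4-byte memcpy, then
 * advance the pointer by two samples. */
static void ref_write_q15x2(int16_t **pp, uint32_t value)
{
    memcpy(*pp, &value, sizeof value);
    *pp += 2;
}
```

This is why the listing swaps in1 and in2 under ARM_MATH_BIG_ENDIAN: the half-word that PKHBT places at the bottom must be the one that lands at the lower address.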

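Lines 118 to 143 process the samples left over after the groups of four, or all samples when loop unrolling is not used.
Function parameters:

Parameter 1: source data address.
Parameter 2: scale factor (scaleFract).
Parameter 3: shift parameter; a positive value means a left shift, a negative value means a right shift.
Parameter 4: result address.
Parameter 5: block size, i.e. the number of scaling operations performed.
12.7.4       Function arm_scale_q7
Function prototype: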

1. void arm_scale_q7(
2.    const q7_t * pSrc,
3.          q7_t scaleFract,
4.          int8_t shift,
5.          q7_t * pDst,
6.          uint32_t blockSize)
7. {
8.          uint32_t blkCnt;                            /* Loop counter */
9.          int8_t kShift = 7 - shift;                   /* Shift to apply after scaling */
10.
11. #if defined (ARM_MATH_LOOPUNROLL)
12.
13. #if defined (ARM_MATH_DSP)
14.    q7_t in1,  in2,  in3,  in4;                   /* Temporary input variables */
15.    q7_t out1, out2, out3, out4;                /* Temporary output variables */
16. #endif
17.
18.    /* Loop unrolling: Compute 4 outputs at a time */
19.    blkCnt = blockSize >> 2U;
20.
21.    while (blkCnt > 0U)
22.    {
23.       /* C = A * scale */
24.
25. #if defined (ARM_MATH_DSP)
26.       /* Reading 4 inputs from memory */
27.       in1 = *pSrc++;
28.       in2 = *pSrc++;
29.       in3 = *pSrc++;
30.       in4 = *pSrc++;
31.
32.       /* Scale inputs and store result in the temporary variable. */
33.       out1 = (q7_t) (__SSAT(((in1) * scaleFract) >> kShift, 8));
34.       out2 = (q7_t) (__SSAT(((in2) * scaleFract) >> kShift, 8));
35.       out3 = (q7_t) (__SSAT(((in3) * scaleFract) >> kShift, 8));
36.       out4 = (q7_t) (__SSAT(((in4) * scaleFract) >> kShift, 8));
37.
38.       /* Pack and store result in destination buffer (in single write) */
39.       write_q7x4_ia (&pDst, __PACKq7(out1, out2, out3, out4));
40. #else
41.       *pDst++ = (q7_t) (__SSAT((((q15_t) *pSrc++ * scaleFract) >> kShift), 8));
42.       *pDst++ = (q7_t) (__SSAT((((q15_t) *pSrc++ * scaleFract) >> kShift), 8));
43.       *pDst++ = (q7_t) (__SSAT((((q15_t) *pSrc++ * scaleFract) >> kShift), 8));
44.       *pDst++ = (q7_t) (__SSAT((((q15_t) *pSrc++ * scaleFract) >> kShift), 8));
45. #endif
46.
47.       /* Decrement loop counter */
48.       blkCnt--;
49.    }
50.
51.    /* Loop unrolling: Compute remaining outputs */
52.    blkCnt = blockSize % 0x4U;
53.
54. #else
55.
56.    /* Initialize blkCnt with number of samples */
57.    blkCnt = blockSize;
58.
59. #endif /* #if defined (ARM_MATH_LOOPUNROLL) */
60.
61.    while (blkCnt > 0U)
62.    {
63.       /* C = A * scale */
64.
65.       /* Scale input and store result in destination buffer. */
66.       *pDst++ = (q7_t) (__SSAT((((q15_t) *pSrc++ * scaleFract) >> kShift), 8));
67.
68.       /* Decrement loop counter */
69.       blkCnt--;
70.    }
71.
72. }

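Function description:
This function scales 8-bit fixed-point (Q7) data.
Function analysis:

Line 9: this variable is a neat design. With kShift = 7 - shift, both the left shift (positive shift) and the right shift (negative shift) below can be implemented with a single right shift.
Lines 11 to 54 process the data in groups of four (loop unrolling), which speeds up execution and reduces the time spent on while-loop overhead.
- Lines 33 to 36 saturate the scaled results to 8 bits. For example:

(in1 * scaleFract) >> kShift
= (in1 * scaleFract) * 2^(shift - 7)
= ((in1 * scaleFract) >> 7) * 2^shift
The source in1 is in Q7 format and the scale factor scaleFract is also in Q7 format, so their product is in 2.14 format; shifting it right by 7 bits gives 2.7 format.
If shift is positive, this intermediate result is then shifted left by shift bits; if shift is negative, it is shifted right by -shift bits. The final result is saturated with __SSAT.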
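The single-right-shift trick can be traced with a small sketch of one arm_scale_q7 sample. ref_scale_q7_sample is a hypothetical name; it assumes 0 <= kShift < 32 (i.e. -24 <= shift <= 7) and an arithmetic >> for negative products, as on GCC, Clang and Arm compilers.

```c
#include <stdint.h>

/* Hypothetical model of one sample of arm_scale_q7:
 * Q7 * Q7 -> 2.14 product, one right shift by kShift = 7 - shift,
 * then saturation to 8 bits (like __SSAT(x, 8)).
 * Assumes -24 <= shift <= 7 so that 0 <= kShift < 32. */
static int8_t ref_scale_q7_sample(int8_t in, int8_t scaleFract, int8_t shift)
{
    int kShift = 7 - shift;
    int32_t v = ((int32_t)in * scaleFract) >> kShift;

    if (v > INT8_MAX) v = INT8_MAX;
    if (v < INT8_MIN) v = INT8_MIN;
    return (int8_t)v;
}
```

With in = 0x70 and scaleFract = 0x70 (the first-element values passed to arm_scale_q7 in 12.7.5) and shift = 0, the product 12544 shifted right by 7 gives 98 (0x62). With shift = 1, kShift becomes 6 and the result 196 saturates to 127.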

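Lines 61 to 70 process the samples left over after the groups of four, or all samples when loop unrolling is not used.

Function parameters:

Parameter 1: source data address.
Parameter 2: scale factor (scaleFract).
Parameter 3: shift parameter; a positive value means a left shift, a negative value means a right shift.
Parameter 4: result address.
Parameter 5: block size, i.e. the number of scaling operations performed.

12.7.5 Usage example

Program design: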

/*
*********************************************************************************************************
* Function name: DSP_Scale
* Description: scale factor
* Parameters: none
* Return value: none
*********************************************************************************************************
*/
static void DSP_Scale(void)
{
    float32_t pSrcA[5] = {1.0f,1.0f,1.0f,1.0f,1.0f};
    float32_t scale = 0.0f;
    float32_t pDst[5];

    q31_t pSrcA1[5] = {0x6fffffff,1,1,1,1};
    q31_t scale1 = 0x6fffffff;
    q31_t pDst1[5];

    q15_t pSrcA2[5] = {0x6fff,1,1,1,1};
    q15_t scale2 = 0x6fff;
    q15_t pDst2[5];

    q7_t pSrcA3[5] = {0x70,1,1,1,1};
    q7_t scale3 = 0x6f;
    q7_t pDst3[5];

    /* Scaling ***********************************/
    scale += 0.1f;
    arm_scale_f32(pSrcA, scale, pDst, 5);
    printf("arm_scale_f32 = %f\r\n", pDst[0]);

    scale1 += 1;
    arm_scale_q31(pSrcA1, scale1, 0, pDst1, 5);
    printf("arm_scale_q31 = %x\r\n", pDst1[0]);

    scale2 += 1;
    arm_scale_q15(pSrcA2, scale2, 0, pDst2, 5);
    printf("arm_scale_q15 = %x\r\n", pDst2[0]);

    scale3 += 1;
    arm_scale_q7(pSrcA3, scale3, 0, pDst3, 5);
    printf("arm_scale_q7 = %x\r\n", pDst3[0]);
    printf("***********************************\r\n");
}
Experiment output:

12.8 Example project description (MDK)

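Companion example:
V6-207_DSP基础运算（相反数，偏移，移位，减法和比例因子） (DSP basic operations: negate, offset, shift, subtraction and scale)
Purpose:

Learn the basic operations (negate, offset, shift, subtraction and scale).

Contents:

Start an auto-reload software timer that toggles LED2 every 100 ms.
Press K1: DSP negate operation.
Press K2: DSP offset operation.
Press K3: DSP shift operation.
Press the joystick OK key: DSP subtraction.
Press the joystick up key: DSP scaling.

Notes on using AC6:
Pay special attention to the issues described in Appendix C.
Information printed on the serial port after power-up:
Baud rate 115200, 8 data bits, no parity, 1 stop bit.
For details, see sections 12.3.5, 12.4.5, 12.5.4, 12.6.5 and 12.7.5 of this chapter.
Program design:
System stack size allocation:

Hardware peripheral initialization

Hardware peripherals are initialized in bsp.c: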

/*
*********************************************************************************************************
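* Function name: bsp_Init
* Description: Initializes all hardware devices. This function configures the CPU and peripheral registers and initializes some global variables. It only needs to be called once.
* Parameters: none
* Return value: none
*********************************************************************************************************
*/
void bsp_Init(void)
{
/*
   STM32F407 HAL library initialization. At this point the system still runs on the F407's internal 16 MHz HSI clock:
   - HAL_InitTick is called to initialize the 1 ms SysTick interrupt.
   - The NVIC priority grouping is set to 4.
   */
HAL_Init();

/*
   Configure the system clock to 168 MHz:
   - Switch to the HSE.
   - This function updates the global variable SystemCoreClock and reconfigures HAL_InitTick.
*/
SystemClock_Config();

/*
   Event Recorder:
   - Can be used to measure code execution time; supported only by MDK 5.25 and later, not by IAR.
   - Disabled by default. To enable it, be sure to read chapter 8 of the V5 development board user manual.
*/
#if Enable_EventRecorder == 1
/* Initialize and start the EventRecorder */
EventRecorderInitialize(EventRecordAll, 1U);
EventRecorderStart();
#endif

bsp_InitKey();       /* Key initialization; must precede the SysTick timer because keys are scanned by the SysTick timer */
bsp_InitTimer();    /* Initialize the SysTick timer */
bsp_InitUart(); /* Initialize the UART */
bsp_InitExtIO(); /* Initialize the expansion IO */
bsp_InitLed();       /* Initialize the LEDs */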
}

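Main function:

The main program does the following:

Start an auto-reload software timer that toggles LED2 every 100 ms.
Press K1: DSP negate operation.
Press K2: DSP offset operation.
Press K3: DSP shift operation.
Press the joystick OK key: DSP subtraction.
Press the joystick up key: DSP scaling.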

/*
*********************************************************************************************************
* Function name: main
* Description: C program entry point
* Parameters: none
* Return value: error code (no handling required)
*********************************************************************************************************
*/
int main(void)
{
uint8_t ucKeyCode;       /* Key code */

bsp_Init();       /* Hardware initialization */
PrintfLogo(); /* Print example information to UART1 */

PrintfHelp(); /* Print operating instructions */

bsp_StartAutoTimer(0, 100); /* Start a 100 ms auto-reload timer */

/* Enter the main program loop */
while (1)
{
      bsp_Idle();       /* Defined in bsp.c; users can modify it to put the CPU to sleep and feed the watchdog */

      /* Check whether the timer has expired */
      if (bsp_CheckTimer(0))
      {
         /* Entered once every 100 ms */
         bsp_LedToggle(2);
      }

      ucKeyCode = bsp_GetKey(); /* Read the key code; returns KEY_NONE = 0 when no key is pressed */
      if (ucKeyCode != KEY_NONE)
      {
         switch (ucKeyCode)
         {
            case KEY_DOWN_K1:          /* K1 pressed: negate */
                  DSP_Negate();
                  break;

            case KEY_DOWN_K2:          /* K2 pressed: offset */
                  DSP_Offset();
                  break;

            case KEY_DOWN_K3:          /* K3 pressed: shift */
                  DSP_Shift();
                  break;

            case JOY_DOWN_OK:          /* Joystick OK pressed: subtraction */
                  DSP_Sub();
                  break;

            case JOY_DOWN_U:          /* Joystick up pressed: scaling */
                  DSP_Scale();
                  break;

            default:
                  /* Other key codes are ignored */
                  break;
         }
      }
}
}

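12.9 Example project description (IAR)

Companion example:
V6-207_DSP基础运算（相反数，偏移，移位，减法和比例因子） (DSP basic operations: negate, offset, shift, subtraction and scale)
Purpose:

Learn the basic operations (negate, offset, shift, subtraction and scale).

Contents:

Start an auto-reload software timer that toggles LED2 every 100 ms.
Press K1: DSP negate operation.
Press K2: DSP offset operation.
Press K3: DSP shift operation.
Press the joystick OK key: DSP subtraction.
Press the joystick up key: DSP scaling.

Notes on using AC6:
Pay special attention to the issues described in Appendix C.
Information printed on the serial port after power-up:
Baud rate 115200, 8 data bits, no parity, 1 stop bit.
For details, see sections 12.3.5, 12.4.5, 12.5.4, 12.6.5 and 12.7.5 of this chapter.
Program design:
System stack size allocation:

Hardware peripheral initialization

Hardware peripherals are initialized in bsp.c: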

/*
*********************************************************************************************************
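* Function name: bsp_Init
* Description: Initializes all hardware devices. This function configures the CPU and peripheral registers and initializes some global variables. It only needs to be called once.
* Parameters: none
* Return value: none
*********************************************************************************************************
*/
void bsp_Init(void)
{
/*
   STM32F407 HAL library initialization. At this point the system still runs on the F407's internal 16 MHz HSI clock:
   - HAL_InitTick is called to initialize the 1 ms SysTick interrupt.
   - The NVIC priority grouping is set to 4.
   */
HAL_Init();

/*
   Configure the system clock to 168 MHz:
   - Switch to the HSE.
   - This function updates the global variable SystemCoreClock and reconfigures HAL_InitTick.
*/
SystemClock_Config();

/*
   Event Recorder:
   - Can be used to measure code execution time; supported only by MDK 5.25 and later, not by IAR.
   - Disabled by default. To enable it, be sure to read chapter 8 of the V5 development board user manual.
*/
#if Enable_EventRecorder == 1
/* Initialize and start the EventRecorder */
EventRecorderInitialize(EventRecordAll, 1U);
EventRecorderStart();
#endif

bsp_InitKey();       /* Key initialization; must precede the SysTick timer because keys are scanned by the SysTick timer */
bsp_InitTimer();    /* Initialize the SysTick timer */
bsp_InitUart(); /* Initialize the UART */
bsp_InitExtIO(); /* Initialize the expansion IO */
bsp_InitLed();       /* Initialize the LEDs */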
}
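The comment in bsp_Init says SystemClock_Config switches to the HSE and raises the system clock to 168 MHz. On the STM32F4 this comes from the main PLL, SYSCLK = (f_HSE / PLLM) × PLLN / PLLP. Since SystemClock_Config is not shown here, the helper below is only a hypothetical host-side check of that formula; the crystal frequency and the PLLM/PLLN/PLLP values in the usage note are assumptions, not the board's actual settings.

```c
#include <stdint.h>

/* Hypothetical helper: STM32F4 main-PLL system clock in Hz.
   SYSCLK = (f_HSE / PLLM) * PLLN / PLLP, integer arithmetic.
   Assumes f_HSE is divisible by PLLM; check the reference manual
   for the valid VCO input/output ranges. */
static uint32_t pll_sysclk_hz(uint32_t hse_hz, uint32_t pllm, uint32_t plln, uint32_t pllp)
{
    uint32_t vco_in  = hse_hz / pllm;   /* VCO input frequency */
    uint32_t vco_out = vco_in * plln;   /* VCO output frequency */
    return vco_out / pllp;              /* PLLP is one of 2, 4, 6, 8 */
}
```

For example, an assumed 25 MHz crystal with PLLM = 25, PLLN = 336, PLLP = 2 gives 1 MHz × 336 / 2 = 168 MHz; an 8 MHz crystal with PLLM = 8 and the same PLLN/PLLP gives the same result.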

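  Main function:

The main program performs the following operations:

  Press K1: DSP negate operation.
  Press K2: DSP offset operation.
  Press K3: DSP shift operation.
  Press the joystick OK key: DSP subtract operation.
  Press joystick up: DSP scale operation.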

/*
*********************************************************************************************************
* Function:    main
* Description: C program entry point
* Parameters:  none
* Returns:     error code (no handling needed)
*********************************************************************************************************
*/
int main(void)
{
    uint8_t ucKeyCode;          /* Key code */

    bsp_Init();                 /* Hardware initialization */
    PrintfLogo();               /* Print example information to UART1 */

    PrintfHelp();               /* Print usage hints */

    bsp_StartAutoTimer(0, 100); /* Start a 100 ms auto-reload timer */

    /* Enter the main loop */
    while (1)
    {
        bsp_Idle();             /* Defined in bsp.c; modify it to put the CPU to sleep and feed the watchdog */

        /* Check whether the timer has expired */
        if (bsp_CheckTimer(0))
        {
            /* Entered once every 100 ms */
            bsp_LedToggle(2);
        }

        ucKeyCode = bsp_GetKey();   /* Read the key code; returns KEY_NONE = 0 when no key is pressed */
        if (ucKeyCode != KEY_NONE)
        {
            switch (ucKeyCode)
            {
                case KEY_DOWN_K1:   /* K1 pressed: negate */
                    DSP_Negate();
                    break;

                case KEY_DOWN_K2:   /* K2 pressed: offset */
                    DSP_Offset();
                    break;

                case KEY_DOWN_K3:   /* K3 pressed: shift */
                    DSP_Shift();
                    break;

                case JOY_DOWN_OK:   /* Joystick OK pressed: subtract */
                    DSP_Sub();
                    break;

                case JOY_DOWN_U:    /* Joystick up pressed: scale */
                    DSP_Scale();
                    break;

                default:
                    /* Ignore other key codes */
                    break;
            }
        }
    }
}

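12.10 Summary

That is all for the basic DSP functions. Beginners are encouraged to practice with them often and to use them in their own projects; the effort will pay off many times over.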
