Reposted: DSP tutorial series
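This installment covers the following basic math functions: negation, offset, shift, subtraction, and scaling.

9.1 Negation (Vector Negate)
9.2 Offset (Vector Offset)
9.3 Shift (Vector Shift)
9.4 Subtraction (Vector Sub)
9.5 Scaling (Vector Scale)
9.6 Important notes on BasicMathFunctions
9.7 Summary

9.1 Negation (Vector Negate)
These functions negate every element of a vector:
pDst[n] = -pSrc[n], 0 <= n < blockSize.
Note in particular that these functions allow the destination pointer and the source pointer to refer to the same buffer.

9.1.1 arm_negate_f32
This function negates 32-bit floating-point values. Notes on the source code:
1. Negating a floating-point value is simple: put a minus sign in front of the variable.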
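The note above can be sketched in portable C. This is a minimal illustration, not the CMSIS-DSP source: the name negate_f32_sketch is hypothetical and the loop is not unrolled. It also shows the in-place use that the section mentions.

```c
#include <stdint.h>

/* Hypothetical helper (not the CMSIS source): negate blockSize floats.
   pSrc and pDst may point to the same buffer. */
static void negate_f32_sketch(const float *pSrc, float *pDst, uint32_t blockSize)
{
    while (blockSize > 0u) {
        *pDst++ = -(*pSrc++);   /* C = -A */
        blockSize--;
    }
}
```

Calling negate_f32_sketch(buf, buf, n) negates buf in place, because each element is read before the same slot is written.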
9.1.2 arm_negate_q31
This function negates 32-bit fixed-point (Q31) values. The source code is analyzed below:

/**
 * @brief  Negates the elements of a Q31 vector.
 * @param[in]  *pSrc points to the input vector
 * @param[out]  *pDst points to the output vector
 * @param[in]  blockSize number of samples in the vector
 * @return none.
 *
 * Scaling and Overflow Behavior:                                   (1)
 * \par
 * The function uses saturating arithmetic.
 * The Q31 value -1 (0x80000000) will be saturated to the maximum allowable positive value 0x7FFFFFFF.
 */
void arm_negate_q31(
  q31_t * pSrc,
  q31_t * pDst,
  uint32_t blockSize)
{
  q31_t in;                                      /* Temporary variable */
  uint32_t blkCnt;                               /* loop counter */

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  q31_t in1, in2, in3, in4;

  /*loop Unrolling */
  blkCnt = blockSize >> 2u;

  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
   ** a second loop below computes the remaining 1 to 3 samples. */
  while(blkCnt > 0u)
  {
    /* C = -A */
    /* Negate and then store the results in the destination buffer. */
    in1 = *pSrc++;
    in2 = *pSrc++;
    in3 = *pSrc++;
    in4 = *pSrc++;

    *pDst++ = __QSUB(0, in1);                    /* (2) */
    *pDst++ = __QSUB(0, in2);
    *pDst++ = __QSUB(0, in3);
    *pDst++ = __QSUB(0, in4);

    /* Decrement the loop counter */
    blkCnt--;
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;

#else

  /* Run the below code for Cortex-M0 */

  /* Initialize blkCnt with number of samples */
  blkCnt = blockSize;

#endif /* #ifndef ARM_MATH_CM0_FAMILY */

  while(blkCnt > 0u)
  {
    /* C = -A */
    /* Negate and then store the result in the destination buffer. */
    in = *pSrc++;
    *pDst++ = (in == INT32_MIN) ? INT32_MAX : -in;

    /* Decrement the loop counter */
    blkCnt--;
  }
}

1. The function uses saturating arithmetic: negating 0x80000000, the most negative Q31 value, yields 0x7FFFFFFF instead of overflowing.
2. The saturating intrinsic __QSUB was covered in detail in the previous chapter; here it subtracts the argument from 0.
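To make the saturation in notes 1 and 2 concrete, here is a portable sketch of a saturating 32-bit subtraction. The name qsub_sketch is hypothetical; on a Cortex-M3/M4 the real __QSUB maps to a single instruction, whereas this version widens to 64 bits and clamps, which is only meant to illustrate the result.

```c
#include <stdint.h>

/* Hypothetical stand-in for __QSUB: a - b with saturation to the int32_t range. */
static int32_t qsub_sketch(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a - (int64_t)b;   /* exact result, cannot overflow in 64 bits */

    if (r > INT32_MAX) return INT32_MAX;   /* clamp positive overflow */
    if (r < INT32_MIN) return INT32_MIN;   /* clamp negative overflow */
    return (int32_t)r;
}
```

With this helper, qsub_sketch(0, x) negates x, and the single problematic input INT32_MIN (0x80000000) comes out as INT32_MAX (0x7FFFFFFF).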
9.1.3 arm_negate_q15
This function negates 16-bit fixed-point (Q15) values. The source code is analyzed below:

/**
 * @brief  Negates the elements of a Q15 vector.
 * @param[in]  *pSrc points to the input vector
 * @param[out]  *pDst points to the output vector
 * @param[in]  blockSize number of samples in the vector
 * @return none.
 *
 * \par Conditions for optimum performance
 *  Input and output buffers should be aligned by 32-bit
 *
 *
 * Scaling and Overflow Behavior:                                   (1)
 * \par
 * The function uses saturating arithmetic.
 * The Q15 value -1 (0x8000) will be saturated to the maximum allowable positive value 0x7FFF.
 */
void arm_negate_q15(
  q15_t * pSrc,
  q15_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */
  q15_t in;

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  q31_t in1, in2;                                /* Temporary variables */

  /*loop Unrolling */
  blkCnt = blockSize >> 2u;

  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
   ** a second loop below computes the remaining 1 to 3 samples. */
  while(blkCnt > 0u)
  {
    /* C = -A */
    /* Read two inputs at a time  (2) */
    in1 = _SIMD32_OFFSET(pSrc);
    in2 = _SIMD32_OFFSET(pSrc + 2);

    /* negate two samples at a time  (3) */
    in1 = __QSUB16(0, in1);

    /* negate two samples at a time */
    in2 = __QSUB16(0, in2);

    /* store the result to destination 2 samples at a time  (4) */
    _SIMD32_OFFSET(pDst) = in1;
    /* store the result to destination 2 samples at a time */
    _SIMD32_OFFSET(pDst + 2) = in2;

    /* update pointers to process next samples */
    pSrc += 4u;
    pDst += 4u;

    /* Decrement the loop counter */
    blkCnt--;
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;

#else

  /* Run the below code for Cortex-M0 */

  /* Initialize blkCnt with number of samples */
  blkCnt = blockSize;

#endif /* #ifndef ARM_MATH_CM0_FAMILY */

  while(blkCnt > 0u)
  {
    /* C = -A */
    /* Negate and then store the result in the destination buffer. */
    in = *pSrc++;
    *pDst++ = (in == (q15_t) 0x8000) ? 0x7fff : -in;

    /* Decrement the loop counter */
    blkCnt--;
  }
}

1. The function uses saturating arithmetic: negating 0x8000, the most negative Q15 value, yields 0x7FFF.
2. Two Q15 values are read at once as a single 32-bit word.
3. Because __QSUB16 is a SIMD instruction, one call negates two Q15 values.
4. Two Q15 values are stored with a single 32-bit write.
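Notes 2 to 4 rely on treating a 32-bit word as two independent 16-bit lanes. The sketch below emulates that behaviour in portable C so the per-lane saturation can be checked on a PC. The names sat16_sketch and qsub16_sketch are hypothetical, and the code assumes a two's-complement target when narrowing to int16_t (true on Cortex-M and on common desktop compilers).

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the int16_t range. */
static int16_t sat16_sketch(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical emulation of __QSUB16: two saturating 16-bit subtractions
   carried out on the low and high halves of 32-bit words. */
static uint32_t qsub16_sketch(uint32_t a, uint32_t b)
{
    int32_t lo = (int32_t)(int16_t)(a & 0xFFFFu) - (int32_t)(int16_t)(b & 0xFFFFu);
    int32_t hi = (int32_t)(int16_t)(a >> 16) - (int32_t)(int16_t)(b >> 16);

    return ((uint32_t)(uint16_t)sat16_sketch(hi) << 16) | (uint32_t)(uint16_t)sat16_sketch(lo);
}
```

Passing 0 as the first argument negates both lanes, and a lane holding 0x8000 saturates to 0x7FFF without disturbing its neighbour.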
9.1.4 arm_negate_q7
This function negates 8-bit fixed-point (Q7) values. The source code is analyzed below:

/**
 * @brief  Negates the elements of a Q7 vector.
 * @param[in]  *pSrc points to the input vector
 * @param[out]  *pDst points to the output vector
 * @param[in]  blockSize number of samples in the vector
 * @return none.
 *
 * Scaling and Overflow Behavior:                                   (1)
 * \par
 * The function uses saturating arithmetic.
 * The Q7 value -1 (0x80) will be saturated to the maximum allowable positive value 0x7F.
 */
void arm_negate_q7(
  q7_t * pSrc,
  q7_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */
  q7_t in;

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  q31_t input;                                   /* Input values1-4 */
  q31_t zero = 0x00000000;                       /* (2) */

  /*loop Unrolling */
  blkCnt = blockSize >> 2u;

  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
   ** a second loop below computes the remaining 1 to 3 samples. */
  while(blkCnt > 0u)
  {
    /* C = -A */
    /* Read four inputs */
    input = *__SIMD32(pSrc)++;                   /* (3) */

    /* Store the Negated results in the destination buffer in a single cycle by packing the results */
    *__SIMD32(pDst)++ = __QSUB8(zero, input);    /* (4) */

    /* Decrement the loop counter */
    blkCnt--;
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;

#else

  /* Run the below code for Cortex-M0 */

  /* Initialize blkCnt with number of samples */
  blkCnt = blockSize;

#endif /* #ifndef ARM_MATH_CM0_FAMILY */

  while(blkCnt > 0u)
  {
    /* C = -A */
    /* Negate and then store the results in the destination buffer. */
    in = *pSrc++;
    *pDst++ = (in == (q7_t) 0x80) ? 0x7f : -in;

    /* Decrement the loop counter */
    blkCnt--;
  }
}

1. The function uses saturating arithmetic: negating 0x80, the most negative Q7 value, yields 0x7F.
2. The local variable zero is initialized explicitly. A local variable has an indeterminate value until it is assigned, and this one serves as the minuend 0 for __QSUB8, so the initialization is required.
3. Four Q7 values are read into input at once.
4. One call to __QSUB8 negates four Q7 values.
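9.1.5 Worked example
Objective:
1. Negate values of the four data types.
Procedure:
1. Press key K1; the results are printed over the serial port.
Observation:
View the printed output on the PC with the serial terminal program SecureCRT (included on the V5 CD).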
Program design:

/*
*********************************************************************************************************
*    Function name: DSP_Negate
*    Description:   Negation
*    Parameters:    none
*    Return value:  none
*********************************************************************************************************
*/
static void DSP_Negate(void)
{
  static float32_t pSrc;
  static float32_t pDst;

  static q31_t pSrc1;
  static q31_t pDst1;

  static q15_t pSrc2;
  static q15_t pDst2;

  /* Start at 127 on purpose: the increment below wraps it to 0x80,
     which lets us check that 0x80 is saturated to 0x7F */
  static q7_t pSrc3 = 127;
  static q7_t pDst3;

  pSrc -= 1.23f;
  arm_negate_f32(&pSrc, &pDst, 1);
  printf("arm_negate_f32 = %f\r\n", pDst);

  pSrc1 -= 1;
  arm_negate_q31(&pSrc1, &pDst1, 1);
  printf("arm_negate_q31 = %d\r\n", pDst1);

  pSrc2 -= 1;
  arm_negate_q15(&pSrc2, &pDst2, 1);
  printf("arm_negate_q15 = %d\r\n", pDst2);

  pSrc3 += 1;
  arm_negate_q7(&pSrc3, &pDst3, 1);
  printf("arm_negate_q7 = %d\r\n", pDst3);
  printf("***********************************\r\n");
}
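The Q7 part of the test is the interesting one: on the first key press, 127 + 1 wraps to 0x80 (-128), and negating that value must saturate. The portable sketch below reproduces just that step on a PC. Both function names are hypothetical; the final narrowing to int8_t is implementation-defined in C, but it yields -128 on two's-complement targets such as Cortex-M.

```c
#include <stdint.h>

/* Same rule as the scalar loop in arm_negate_q7: 0x80 saturates to 0x7F. */
static int8_t negate_q7_sat_sketch(int8_t in)
{
    return (in == INT8_MIN) ? INT8_MAX : (int8_t)-in;
}

/* Increment with wrap-around, done in unsigned arithmetic and then narrowed back. */
static int8_t wrap_increment_q7_sketch(int8_t v)
{
    return (int8_t)(uint8_t)((uint8_t)v + 1u);
}
```

Chaining the two helpers on the value 127 gives 127 again, which is the saturated result the comment in DSP_Negate asks the reader to look for.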
9.2 Offset (Vector Offset)
These functions add a constant offset to every element of a vector:
pDst[n] = pSrc[n] + offset, 0 <= n < blockSize.
Note that these functions allow the destination pointer and the source pointer to refer to the same buffer.

9.2.1 arm_offset_f32
This function adds an offset to 32-bit floating-point values. The source code is analyzed below:

/**
 * @brief  Adds a constant offset to a floating-point vector.
 * @param[in]  *pSrc points to the input vector
 * @param[in]  offset is the offset to be added
 * @param[out]  *pDst points to the output vector
 * @param[in]  blockSize number of samples in the vector
 * @return none.
 */
void arm_offset_f32(
  float32_t * pSrc,
  float32_t offset,
  float32_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  float32_t in1, in2, in3, in4;

  /*loop Unrolling */
  blkCnt = blockSize >> 2u;

  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
   ** a second loop below computes the remaining 1 to 3 samples. */
  while(blkCnt > 0u)
  {
    /* C = A + offset  (1) */
    /* Add offset and then store the results in the destination buffer. */
    /* read samples from source */
    in1 = *pSrc;
    in2 = *(pSrc + 1);

    /* add offset to input */
    in1 = in1 + offset;

    /* read samples from source */
    in3 = *(pSrc + 2);

    /* add offset to input */
    in2 = in2 + offset;

    /* read samples from source */
    in4 = *(pSrc + 3);

    /* add offset to input */
    in3 = in3 + offset;

    /* store result to destination */
    *pDst = in1;

    /* add offset to input */
    in4 = in4 + offset;

    /* store result to destination */
    *(pDst + 1) = in2;

    /* store result to destination */
    *(pDst + 2) = in3;

    /* store result to destination */
    *(pDst + 3) = in4;

    /* update pointers to process next samples */
    pSrc += 4u;
    pDst += 4u;

    /* Decrement the loop counter */
    blkCnt--;
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;

#else

  /* Run the below code for Cortex-M0 */

  /* Initialize blkCnt with number of samples */
  blkCnt = blockSize;

#endif /* #ifndef ARM_MATH_CM0_FAMILY */

  while(blkCnt > 0u)
  {
    /* C = A + offset */
    /* Add offset and then store the result in the destination buffer. */
    *pDst++ = (*pSrc++) + offset;

    /* Decrement the loop counter */
    blkCnt--;
  }
}

1. Offsetting a floating-point value is simple: add the offset and assign the sum to the destination.
9.2.2 arm_offset_q31
This function adds an offset to 32-bit fixed-point (Q31) values. The source code is analyzed below:

/**
 * @brief  Adds a constant offset to a Q31 vector.
 * @param[in]  *pSrc points to the input vector
 * @param[in]  offset is the offset to be added
 * @param[out]  *pDst points to the output vector
 * @param[in]  blockSize number of samples in the vector
 * @return none.
 *
 * Scaling and Overflow Behavior:                                   (1)
 * \par
 * The function uses saturating arithmetic.
 * Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] are saturated.
 */
void arm_offset_q31(
  q31_t * pSrc,
  q31_t offset,
  q31_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  q31_t in1, in2, in3, in4;

  /*loop Unrolling */
  blkCnt = blockSize >> 2u;

  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
   ** a second loop below computes the remaining 1 to 3 samples. */
  while(blkCnt > 0u)
  {
    /* C = A + offset */
    /* Add offset and then store the results in the destination buffer. */
    in1 = *pSrc++;
    in2 = *pSrc++;
    in3 = *pSrc++;
    in4 = *pSrc++;

    *pDst++ = __QADD(in1, offset);               /* (2) */
    *pDst++ = __QADD(in2, offset);
    *pDst++ = __QADD(in3, offset);
    *pDst++ = __QADD(in4, offset);

    /* Decrement the loop counter */
    blkCnt--;
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;

  while(blkCnt > 0u)
  {
    /* C = A + offset */
    /* Add offset and then store the result in the destination buffer. */
    *pDst++ = __QADD(*pSrc++, offset);

    /* Decrement the loop counter */
    blkCnt--;
  }

#else

  /* Run the below code for Cortex-M0 */

  /* Initialize blkCnt with number of samples */
  blkCnt = blockSize;

  while(blkCnt > 0u)
  {
    /* C = A + offset */
    /* Add offset and then store the result in the destination buffer. */
    *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrc++ + offset);

    /* Decrement the loop counter */
    blkCnt--;
  }

#endif /* #ifndef ARM_MATH_CM0_FAMILY */
}

1. The function uses saturating arithmetic: a sum above 0x7FFFFFFF is clamped to 0x7FFFFFFF, and a sum below 0x80000000 is clamped to 0x80000000.
2. The __QADD intrinsic was explained in the previous chapter; here it adds its two arguments with saturation.
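The Cortex-M0 branch above widens each sample to 64 bits, adds the offset, and clips the sum back to 32 bits. The sketch below follows the same idea in portable C. The names clip_q63_to_q31_sketch and offset_q31_sketch are hypothetical stand-ins, not the CMSIS definitions.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate to the Q31 (int32_t) range. */
static int32_t clip_q63_to_q31_sketch(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Add a constant offset to every sample with saturation.
   pSrc and pDst may point to the same buffer. */
static void offset_q31_sketch(const int32_t *pSrc, int32_t offset,
                              int32_t *pDst, uint32_t blockSize)
{
    while (blockSize > 0u) {
        *pDst++ = clip_q63_to_q31_sketch((int64_t)*pSrc++ + offset);
        blockSize--;
    }
}
```

Both ends of the range saturate: adding 1 to 0x7FFFFFFF leaves it unchanged, and adding -1 to 0x80000000 leaves it unchanged as well.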
9.2.3 arm_offset_q15
This function adds an offset to 16-bit fixed-point (Q15) values. The source code is analyzed below:

/**
 * @brief  Adds a constant offset to a Q15 vector.
 * @param[in]  *pSrc points to the input vector
 * @param[in]  offset is the offset to be added
 * @param[out]  *pDst points to the output vector
 * @param[in]  blockSize number of samples in the vector
 * @return none.
 *
 * Scaling and Overflow Behavior:                                   (1)
 * \par
 * The function uses saturating arithmetic.
 * Results outside of the allowable Q15 range [0x8000 0x7FFF] are saturated.
 */
void arm_offset_q15(
  q15_t * pSrc,
  q15_t offset,
  q15_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  q31_t offset_packed;                           /* Offset packed to 32 bit */

  /*loop Unrolling */
  blkCnt = blockSize >> 2u;

  /* Offset is packed to 32 bit in order to use SIMD32 for addition */
  offset_packed = __PKHBT(offset, offset, 16);   /* (2) */

  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
   ** a second loop below computes the remaining 1 to 3 samples. */
  while(blkCnt > 0u)
  {
    /* C = A + offset */
    /* Add offset and then store the results in the destination buffer, 2 samples at a time. */
    *__SIMD32(pDst)++ = __QADD16(*__SIMD32(pSrc)++, offset_packed);   /* (3) */
    *__SIMD32(pDst)++ = __QADD16(*__SIMD32(pSrc)++, offset_packed);

    /* Decrement the loop counter */
    blkCnt--;
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;

  while(blkCnt > 0u)
  {
    /* C = A + offset */
    /* Add offset and then store the results in the destination buffer. */
    *pDst++ = (q15_t) __QADD16(*pSrc++, offset);

    /* Decrement the loop counter */
    blkCnt--;
  }

#else

  /* Run the below code for Cortex-M0 */

  /* Initialize blkCnt with number of samples */
  blkCnt = blockSize;

  while(blkCnt > 0u)
  {
    /* C = A + offset */
    /* Add offset and then store the results in the destination buffer. */
    *pDst++ = (q15_t) __SSAT(((q31_t) * pSrc++ + offset), 16);

    /* Decrement the loop counter */
    blkCnt--;
  }

#endif /* #ifndef ARM_MATH_CM0_FAMILY */
}

1. The function uses saturating arithmetic: a sum above 0x7FFF is clamped to 0x7FFF, and a sum below 0x8000 is clamped to 0x8000.
2. Two copies of the Q15 offset are packed into one 32-bit word so that it can be passed to __QADD16.
3. Because __QADD16 is a SIMD instruction, one call processes two Q15 values.
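Note 2 describes duplicating the 16-bit offset into both halves of a 32-bit word. The sketch below mirrors the bit manipulation that __PKHBT performs (low halfword from the first operand, high halfword from the shifted second operand) in portable C. The names pkhbt_sketch and pack_offset_q15_sketch are hypothetical.

```c
#include <stdint.h>

/* Low 16 bits come from lo_src; high 16 bits come from (hi_src << shift). */
static uint32_t pkhbt_sketch(uint32_t lo_src, uint32_t hi_src, uint32_t shift)
{
    return (lo_src & 0x0000FFFFu) | ((hi_src << shift) & 0xFFFF0000u);
}

/* Duplicate one Q15 offset into both 16-bit lanes of a 32-bit word. */
static uint32_t pack_offset_q15_sketch(int16_t offset)
{
    uint32_t widened = (uint32_t)(int32_t)offset;   /* sign-extended, then masked by pkhbt */

    return pkhbt_sketch(widened, widened, 16u);
}
```

A negative offset such as -2 (0xFFFE) becomes 0xFFFEFFFE: the sign extension bits are masked away, so each lane holds exactly the 16-bit pattern.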
9.2.4 arm_offset_q7
This function adds an offset to 8-bit fixed-point (Q7) values. The source code is analyzed below:

/**
 * @brief  Adds a constant offset to a Q7 vector.
 * @param[in]  *pSrc points to the input vector
 * @param[in]  offset is the offset to be added
 * @param[out]  *pDst points to the output vector
 * @param[in]  blockSize number of samples in the vector
 * @return none.
 *
 * Scaling and Overflow Behavior:                                   (1)
 * \par
 * The function uses saturating arithmetic.
 * Results outside of the allowable Q7 range [0x80 0x7F] are saturated.
 */
void arm_offset_q7(
  q7_t * pSrc,
  q7_t offset,
  q7_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  q31_t offset_packed;                           /* Offset packed to 32 bit */

  /*loop Unrolling */
  blkCnt = blockSize >> 2u;

  /* Offset is packed to 32 bit in order to use SIMD32 for addition  (2) */
  offset_packed = __PACKq7(offset, offset, offset, offset);

  /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
   ** a second loop below computes the remaining 1 to 3 samples. */
  while(blkCnt > 0u)
  {
    /* C = A + offset */
    /* Add offset and then store the results in the destination buffer for 4 samples at a time. */
    *__SIMD32(pDst)++ = __QADD8(*__SIMD32(pSrc)++, offset_packed);    /* (3) */

    /* Decrement the loop counter */
    blkCnt--;
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;

  while(blkCnt > 0u)
  {
    /* C = A + offset */
    /* Add offset and then store the result in the destination buffer. */
    *pDst++ = (q7_t) __SSAT(*pSrc++ + offset, 8);

    /* Decrement the loop counter */
    blkCnt--;
  }

#else

  /* Run the below code for Cortex-M0 */

  /* Initialize blkCnt with number of samples */
  blkCnt = blockSize;

  while(blkCnt > 0u)
  {
    /* C = A + offset */
    /* Add offset and then store the result in the destination buffer. */
    *pDst++ = (q7_t) __SSAT((q15_t) * pSrc++ + offset, 8);

    /* Decrement the loop counter */
    blkCnt--;
  }

#endif /* #ifndef ARM_MATH_CM0_FAMILY */
}

1. The function uses saturating arithmetic: a sum above 0x7F is clamped to 0x7F, and a sum below 0x80 is clamped to 0x80.
2. __PACKq7 packs four copies of the Q7 offset into one 32-bit word.
3. Because __QADD8 is a SIMD instruction, one call processes four Q7 values.
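Note 3 can be checked on a PC with a lane-by-lane emulation of a four-way saturating byte addition. This is a sketch under the assumption that lane 0 is the least significant byte; the name qadd8_sketch is hypothetical, and the real __QADD8 does the same work in one instruction.

```c
#include <stdint.h>

/* Hypothetical emulation of __QADD8: four independent saturating
   signed 8-bit additions packed into 32-bit words. */
static uint32_t qadd8_sketch(uint32_t a, uint32_t b)
{
    uint32_t out = 0u;

    for (unsigned lane = 0u; lane < 4u; lane++) {
        int32_t x = (int8_t)((a >> (8u * lane)) & 0xFFu);   /* sign-extend lane of a */
        int32_t y = (int8_t)((b >> (8u * lane)) & 0xFFu);   /* sign-extend lane of b */
        int32_t s = x + y;

        if (s > INT8_MAX) s = INT8_MAX;                     /* saturate per lane */
        if (s < INT8_MIN) s = INT8_MIN;

        out |= ((uint32_t)s & 0xFFu) << (8u * lane);
    }
    return out;
}
```

With the packed offset 0x01010101, only the lane that already holds 0x7F saturates; the other three lanes simply increase by one.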
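9.2.5 Worked example
Objective:
1. Apply an offset to values of the four data types.
Procedure:
1. Press key K2; the results are printed over the serial port.
Observation:
View the printed output on the PC with the serial terminal program SecureCRT (included on the V5 CD).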
Program design:

/*
*********************************************************************************************************
*    Function name: DSP_Offset
*    Description:   Offset
*    Parameters:    none
*    Return value:  none
*********************************************************************************************************
*/
static void DSP_Offset(void)
{
  static float32_t pSrcA;
  static float32_t Offset = 0.0f;
  static float32_t pDst;

  static q31_t pSrcA1;
  static q31_t Offset1 = 0;
  static q31_t pDst1;

  static q15_t pSrcA2;
  static q15_t Offset2 = 0;
  static q15_t pDst2;

  static q7_t pSrcA3;
  static q7_t Offset3 = 0;
  static q7_t pDst3;

  Offset--;
  arm_offset_f32(&pSrcA, Offset, &pDst, 1);
  printf("arm_offset_f32 = %f\r\n", pDst);

  Offset1--;
  arm_offset_q31(&pSrcA1, Offset1, &pDst1, 1);
  printf("arm_offset_q31 = %d\r\n", pDst1);

  Offset2--;
  arm_offset_q15(&pSrcA2, Offset2, &pDst2, 1);
  printf("arm_offset_q15 = %d\r\n", pDst2);

  Offset3--;
  arm_offset_q7(&pSrcA3, Offset3, &pDst3, 1);
  printf("arm_offset_q7 = %d\r\n", pDst3);
  printf("***********************************\r\n");
}
9.3 Shift (Vector Shift)
These functions shift every element of a vector:
pDst[n] = pSrc[n] << shift, 0 <= n < blockSize.
Note that these functions allow the destination pointer and the source pointer to refer to the same buffer.

9.3.1 arm_shift_q31
This function shifts 32-bit fixed-point (Q31) values. The source code is analyzed below:

/**
 * @brief  Shifts the elements of a Q31 vector a specified number of bits.
 * @param[in]  *pSrc points to the input vector
 * @param[in]  shiftBits number of bits to shift.
 *             A positive value shifts left; a negative value shifts right.     (1)
 * @param[out]  *pDst points to the output vector
 * @param[in]  blockSize number of samples in the vector
 * @return none.
 *
 *
 * Scaling and Overflow Behavior:                                               (2)
 * \par
 * The function uses saturating arithmetic.
 * Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] will be saturated.
 */
void arm_shift_q31(
  q31_t * pSrc,
  int8_t shiftBits,
  q31_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */
  uint8_t sign = (shiftBits & 0x80);             /* Sign of shiftBits  (3) */

#ifndef ARM_MATH_CM0_FAMILY

  q31_t in1, in2, in3, in4;                      /* Temporary input variables */
  q31_t out1, out2, out3, out4;                  /* Temporary output variables */

  /*loop Unrolling */
  blkCnt = blockSize >> 2u;

  if(sign == 0u)                                 /* (4) */
  {
    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
     ** a second loop below computes the remaining 1 to 3 samples. */
    while(blkCnt > 0u)
    {
      /* C = A  << shiftBits */
      /* Shift the input and then store the results in the destination buffer. */
      in1 = *pSrc;
      in2 = *(pSrc + 1);
      out1 = in1 << shiftBits;
      in3 = *(pSrc + 2);
      out2 = in2 << shiftBits;
      in4 = *(pSrc + 3);
      if(in1 != (out1 >> shiftBits))             /* (5) */
        out1 = 0x7FFFFFFF ^ (in1 >> 31);

      if(in2 != (out2 >> shiftBits))
        out2 = 0x7FFFFFFF ^ (in2 >> 31);

      *pDst = out1;
      out3 = in3 << shiftBits;
      *(pDst + 1) = out2;
      out4 = in4 << shiftBits;

      if(in3 != (out3 >> shiftBits))
        out3 = 0x7FFFFFFF ^ (in3 >> 31);

      if(in4 != (out4 >> shiftBits))
        out4 = 0x7FFFFFFF ^ (in4 >> 31);

      *(pDst + 2) = out3;
      *(pDst + 3) = out4;

      /* Update destination pointer to process next samples */
      pSrc += 4u;
      pDst += 4u;

      /* Decrement the loop counter */
      blkCnt--;
    }
  }
  else                                           /* (6) */
  {
    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
     ** a second loop below computes the remaining 1 to 3 samples. */
    while(blkCnt > 0u)
    {
      /* C = A >>  shiftBits */
      /* Shift the input and then store the results in the destination buffer. */
      in1 = *pSrc;
      in2 = *(pSrc + 1);
      in3 = *(pSrc + 2);
      in4 = *(pSrc + 3);

      *pDst = (in1 >> -shiftBits);               /* (7) */
      *(pDst + 1) = (in2 >> -shiftBits);
      *(pDst + 2) = (in3 >> -shiftBits);
      *(pDst + 3) = (in4 >> -shiftBits);

      pSrc += 4u;
      pDst += 4u;

      blkCnt--;
    }
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;

#else

  /* Run the below code for Cortex-M0 */

  /* Initialize blkCnt with number of samples */
  blkCnt = blockSize;

#endif /* #ifndef ARM_MATH_CM0_FAMILY */

  while(blkCnt > 0u)
  {
    /* C = A (>> or <<) shiftBits */
    /* Shift the input and then store the result in the destination buffer.  (8) */
    *pDst++ = (sign == 0u) ? clip_q63_to_q31((q63_t) * pSrc++ << shiftBits) :
      (*pSrc++ >> -shiftBits);

    /* Decrement the loop counter */
    blkCnt--;
  }
}

1. A positive shiftBits shifts left; a negative shiftBits shifts right.
2. The function uses saturating arithmetic: results outside the Q31 range [0x80000000, 0x7FFFFFFF] are clamped to the nearest bound.
3. Determines whether the shift amount shiftBits is positive or negative. Bit 7 of the int8_t value is its sign bit.
4. A non-negative shift amount means a left shift.
5. A left-shifted result is accepted only if shifting it back right by the same number of bits recovers the input. Otherwise the output is saturated to one of two values, selected by the sign of the input (in >> 31 is 0xFFFFFFFF for a negative input and 0x00000000 otherwise):
   out = 0x7FFFFFFF ^ 0xFFFFFFFF = 0x80000000
   out = 0x7FFFFFFF ^ 0x00000000 = 0x7FFFFFFF
6. A negative shift amount means a right shift.
7. Negate the shift amount, then shift right by that many bits.
8. Shifts the remaining samples.
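The overflow test and the XOR trick from note 5 can be sketched in portable C as follows. The name shift_left_sat_q31_sketch is hypothetical. The sketch assumes 0 <= bits <= 31 and an arithmetic right shift for negative values, which is what Arm compilers, GCC and Clang provide; the left shift is done on an unsigned copy to avoid undefined behaviour in standard C.

```c
#include <stdint.h>

/* Saturating left shift of one Q31 sample, following note 5. */
static int32_t shift_left_sat_q31_sketch(int32_t in, uint32_t bits)
{
    int32_t out = (int32_t)((uint32_t)in << bits);

    /* If shifting back does not recover the input, bits were lost: saturate.
       in >> 31 is 0 for non-negative inputs and -1 (all ones) for negative ones,
       so the XOR yields 0x7FFFFFFF or 0x80000000 respectively. */
    if (in != (out >> bits)) {
        out = 0x7FFFFFFF ^ (in >> 31);
    }
    return out;
}
```

For example, 0x40000000 shifted left by one bit no longer fits and becomes 0x7FFFFFFF, while a negative input that overflows becomes 0x80000000.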
9.3.2 arm_shift_q15
This function shifts 16-bit fixed-point (Q15) values. The source code is analyzed below:

/**
 * @brief  Shifts the elements of a Q15 vector a specified number of bits.
 * @param[in]  *pSrc points to the input vector
 * @param[in]  shiftBits number of bits to shift.
 *             A positive value shifts left; a negative value shifts right.     (1)
 * @param[out]  *pDst points to the output vector
 * @param[in]  blockSize number of samples in the vector
 * @return none.
 *
 * Scaling and Overflow Behavior:                                               (2)
 * \par
 * The function uses saturating arithmetic.
 * Results outside of the allowable Q15 range [0x8000 0x7FFF] will be saturated.
 */
void arm_shift_q15(
  q15_t * pSrc,
  int8_t shiftBits,
  q15_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */
  uint8_t sign;                                  /* Sign of shiftBits */

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  q15_t in1, in2;                                /* Temporary variables */

  /*loop Unrolling */
  blkCnt = blockSize >> 2u;

  /* Getting the sign of shiftBits */
  sign = (shiftBits & 0x80);                     /* (3) */

  /* If the shift value is positive then do left shift else right shift */
  if(sign == 0u)
  {
    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
     ** a second loop below computes the remaining 1 to 3 samples. */
    while(blkCnt > 0u)
    {
      /* Read 2 inputs */
      in1 = *pSrc++;
      in2 = *pSrc++;
      /* C = A << shiftBits */
      /* Shift the inputs and then store the results in the destination buffer. */
#ifndef  ARM_MATH_BIG_ENDIAN
      *__SIMD32(pDst)++ = __PKHBT(__SSAT((in1 << shiftBits), 16),
                                  __SSAT((in2 << shiftBits), 16), 16);
#else
      *__SIMD32(pDst)++ = __PKHBT(__SSAT((in2 << shiftBits), 16),     /* (4) */
                                  __SSAT((in1 << shiftBits), 16), 16);
#endif /* #ifndef  ARM_MATH_BIG_ENDIAN    */

      in1 = *pSrc++;
      in2 = *pSrc++;

#ifndef  ARM_MATH_BIG_ENDIAN
      *__SIMD32(pDst)++ = __PKHBT(__SSAT((in1 << shiftBits), 16),
                                  __SSAT((in2 << shiftBits), 16), 16);
#else
      *__SIMD32(pDst)++ = __PKHBT(__SSAT((in2 << shiftBits), 16),
                                  __SSAT((in1 << shiftBits), 16), 16);
#endif /* #ifndef  ARM_MATH_BIG_ENDIAN    */

      /* Decrement the loop counter */
      blkCnt--;
    }

    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
     ** No loop unrolling is used. */
    blkCnt = blockSize % 0x4u;

    while(blkCnt > 0u)
    {
      /* C = A << shiftBits */
      /* Shift and then store the results in the destination buffer. */
      *pDst++ = __SSAT((*pSrc++ << shiftBits), 16);                   /* (5) */

      /* Decrement the loop counter */
      blkCnt--;
    }
  }
  else                                           /* (6) */
  {
    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
     ** a second loop below computes the remaining 1 to 3 samples. */
    while(blkCnt > 0u)
    {
      /* Read 2 inputs */
      in1 = *pSrc++;
      in2 = *pSrc++;

      /* C = A >> shiftBits */
      /* Shift the inputs and then store the results in the destination buffer. */
#ifndef  ARM_MATH_BIG_ENDIAN
      *__SIMD32(pDst)++ = __PKHBT((in1 >> -shiftBits),
                                  (in2 >> -shiftBits), 16);
#else
      *__SIMD32(pDst)++ = __PKHBT((in2 >> -shiftBits),                /* (7) */
                                  (in1 >> -shiftBits), 16);
#endif /* #ifndef  ARM_MATH_BIG_ENDIAN    */

      in1 = *pSrc++;
      in2 = *pSrc++;

#ifndef  ARM_MATH_BIG_ENDIAN
      *__SIMD32(pDst)++ = __PKHBT((in1 >> -shiftBits),
                                  (in2 >> -shiftBits), 16);
#else
      *__SIMD32(pDst)++ = __PKHBT((in2 >> -shiftBits),
                                  (in1 >> -shiftBits), 16);
#endif /* #ifndef  ARM_MATH_BIG_ENDIAN    */

      /* Decrement the loop counter */
      blkCnt--;
    }

    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
     ** No loop unrolling is used. */
    blkCnt = blockSize % 0x4u;

    while(blkCnt > 0u)
    {
      /* C = A >> shiftBits */
      /* Shift the inputs and then store the results in the destination buffer. */
      *pDst++ = (*pSrc++ >> -shiftBits);

      /* Decrement the loop counter */
      blkCnt--;
    }
  }

#else

  /* Run the below code for Cortex-M0 */

  /* Getting the sign of shiftBits */
  sign = (shiftBits & 0x80);

  /* If the shift value is positive then do left shift else right shift */
  if(sign == 0u)
  {
    /* Initialize blkCnt with number of samples */
    blkCnt = blockSize;

    while(blkCnt > 0u)
    {
      /* C = A << shiftBits */
      /* Shift and then store the results in the destination buffer. */
      *pDst++ = __SSAT(((q31_t) * pSrc++ << shiftBits), 16);

      /* Decrement the loop counter */
      blkCnt--;
    }
  }
  else
  {
    /* Initialize blkCnt with number of samples */
    blkCnt = blockSize;

    while(blkCnt > 0u)
    {
      /* C = A >> shiftBits */
      /* Shift the inputs and then store the results in the destination buffer. */
      *pDst++ = (*pSrc++ >> -shiftBits);

      /* Decrement the loop counter */
      blkCnt--;
    }
  }

#endif /* #ifndef ARM_MATH_CM0_FAMILY */
}

1. A positive shiftBits shifts left; a negative shiftBits shifts right.
2. The function uses saturating arithmetic: results outside the Q15 range [0x8000, 0x7FFF] are clamped to the nearest bound.
3. Determines whether the shift amount is positive or negative.
4. Each shifted value is saturated to 16 bits with __SSAT, and one call to __PKHBT packs two Q15 results into a single 32-bit word so they are stored together.
5. Handles the remaining samples one at a time.
6. A negative shift amount means a right shift.
7. After the shift amount is negated, the two right-shifted Q15 results are packed with one call to __PKHBT.
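The sign test from note 3 and the 16-bit saturation from note 4 can be tried out on a PC with the sketch below. The names ssat_sketch and shift_q15_sketch are hypothetical. It assumes -15 <= shiftBits <= 15 and an arithmetic right shift for negative values, and it multiplies by a power of two instead of left-shifting so that negative inputs stay within defined C behaviour.

```c
#include <stdint.h>

/* Hypothetical stand-in for __SSAT(x, bits): clamp x to a signed 'bits'-wide range. */
static int32_t ssat_sketch(int32_t x, uint32_t bits)
{
    const int32_t max = (int32_t)((1u << (bits - 1u)) - 1u);
    const int32_t min = -max - 1;

    if (x > max) return max;
    if (x < min) return min;
    return x;
}

/* Shift one Q15 sample: left with saturation if shiftBits >= 0, right otherwise. */
static int16_t shift_q15_sketch(int16_t in, int8_t shiftBits)
{
    uint8_t sign = (uint8_t)(shiftBits & 0x80);   /* bit 7 set means negative */

    if (sign == 0u) {
        return (int16_t)ssat_sketch((int32_t)in * (int32_t)(1 << shiftBits), 16u);
    }
    return (int16_t)(in >> -shiftBits);
}
```

A left shift that overflows is clamped (0x4000 shifted by 1 gives 0x7FFF), while a right shift never overflows and therefore needs no saturation.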
9.3.3 arm_shift_q7
This function shifts 8-bit fixed-point (Q7) values. The source code is analyzed below:

/**
 * @brief  Shifts the elements of a Q7 vector a specified number of bits.
 * @param[in]  *pSrc points to the input vector
 * @param[in]  shiftBits number of bits to shift.
 *             A positive value shifts left; a negative value shifts right.     (1)
 * @param[out]  *pDst points to the output vector
 * @param[in]  blockSize number of samples in the vector
 * @return none.
 *
 * \par Conditions for optimum performance
 *  Input and output buffers should be aligned by 32-bit
 *
 *
 * Scaling and Overflow Behavior:                                               (2)
 * \par
 * The function uses saturating arithmetic.
 * Results outside of the allowable Q7 range [0x80 0x7F] will be saturated.
 */
void arm_shift_q7(
  q7_t * pSrc,
  int8_t shiftBits,
  q7_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */
  uint8_t sign;                                  /* Sign of shiftBits */

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  q7_t in1;                                      /* Input value1 */
  q7_t in2;                                      /* Input value2 */
  q7_t in3;                                      /* Input value3 */
  q7_t in4;                                      /* Input value4 */

  /*loop Unrolling */
  blkCnt = blockSize >> 2u;

  /* Getting the sign of shiftBits */
  sign = (shiftBits & 0x80);                     /* (3) */

  /* If the shift value is positive then do left shift else right shift */
  if(sign == 0u)
  {
    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
     ** a second loop below computes the remaining 1 to 3 samples. */
    while(blkCnt > 0u)
    {
      /* C = A << shiftBits */
      /* Read 4 inputs */
      in1 = *pSrc;
      in2 = *(pSrc + 1);
      in3 = *(pSrc + 2);
      in4 = *(pSrc + 3);

      /* Store the Shifted result in the destination buffer in single cycle by packing the outputs  (4) */
      *__SIMD32(pDst)++ = __PACKq7(__SSAT((in1 << shiftBits), 8),
                                   __SSAT((in2 << shiftBits), 8),
                                   __SSAT((in3 << shiftBits), 8),
                                   __SSAT((in4 << shiftBits), 8));
      /* Update source pointer to process next samples */
      pSrc += 4u;

      /* Decrement the loop counter */
      blkCnt--;
    }

    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
     ** No loop unrolling is used. */
    blkCnt = blockSize % 0x4u;

    while(blkCnt > 0u)
    {
      /* C = A << shiftBits  (5) */
      /* Shift the input and then store the result in the destination buffer. */
      *pDst++ = (q7_t) __SSAT((*pSrc++ << shiftBits), 8);

      /* Decrement the loop counter */
      blkCnt--;
    }
  }
  else                                           /* (6) */
  {
    shiftBits = -shiftBits;
    /* First part of the processing with loop unrolling.  Compute 4 outputs at a time.
     ** a second loop below computes the remaining 1 to 3 samples. */
    while(blkCnt > 0u)
    {
      /* C = A >> shiftBits */
      /* Read 4 inputs */
      in1 = *pSrc;
      in2 = *(pSrc + 1);
      in3 = *(pSrc + 2);
      in4 = *(pSrc + 3);

      /* Store the Shifted result in the destination buffer in single cycle by packing the outputs */
      *__SIMD32(pDst)++ = __PACKq7((in1 >> shiftBits), (in2 >> shiftBits),
                                   (in3 >> shiftBits), (in4 >> shiftBits));

      pSrc += 4u;

      /* Decrement the loop counter */
      blkCnt--;
    }

    /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
     ** No loop unrolling is used. */
    blkCnt = blockSize % 0x4u;

    while(blkCnt > 0u)
    {
      /* C = A >> shiftBits */
      /* Shift the input and then store the result in the destination buffer. */
      in1 = *pSrc++;
      *pDst++ = (in1 >> shiftBits);

      /* Decrement the loop counter */
      blkCnt--;
    }
  }

#else

  /* Run the below code for Cortex-M0 */

  /* Getting the sign of shiftBits */
  sign = (shiftBits & 0x80);

  /* If the shift value is positive then do left shift else right shift */
  if(sign == 0u)
  {
    /* Initialize blkCnt with number of samples */
    blkCnt = blockSize;

    while(blkCnt > 0u)
    {
      /* C = A << shiftBits */
      /* Shift the input and then store the result in the destination buffer. */
      *pDst++ = (q7_t) __SSAT(((q15_t) * pSrc++ << shiftBits), 8);

      /* Decrement the loop counter */
      blkCnt--;
    }
  }
  else
  {
    /* Initialize blkCnt with number of samples */
    blkCnt = blockSize;

    while(blkCnt > 0u)
    {
      /* C = A >> shiftBits */
      /* Shift the input and then store the result in the destination buffer. */
      *pDst++ = (*pSrc++ >> -shiftBits);

      /* Decrement the loop counter */
      blkCnt--;
    }
  }

#endif /* #ifndef ARM_MATH_CM0_FAMILY */
}

1. A positive shiftBits shifts left; a negative shiftBits shifts right.
2. The function uses saturating arithmetic: results outside the Q7 range [0x80, 0x7F] are clamped to the nearest bound.
3. Determines whether the shift amount is positive or negative.
4. One call to __PACKq7 packs the four shifted Q7 results into a single 32-bit word so they are stored together.
5. Shifts the remaining samples when fewer than four are left.
6. A negative shift amount means a right shift.
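Note 4 depends on how four bytes are packed into one word. The sketch below shows one possible layout, assuming the little-endian ordering in which the first argument ends up in the least significant byte. The name packq7_sketch is hypothetical and is not the CMSIS macro itself.

```c
#include <stdint.h>

/* Pack four Q7 values into one 32-bit word, v0 in the lowest byte. */
static uint32_t packq7_sketch(int8_t v0, int8_t v1, int8_t v2, int8_t v3)
{
    return  (uint32_t)(uint8_t)v0
         | ((uint32_t)(uint8_t)v1 << 8)
         | ((uint32_t)(uint8_t)v2 << 16)
         | ((uint32_t)(uint8_t)v3 << 24);
}
```

Casting through uint8_t first keeps a negative value from sign-extending into its neighbours' bytes, which is the job the masks do in a macro version.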
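9.3.4 Worked example
Objective:
1. Shift values of the three data types.
Procedure:
1. Press key K3; the results are printed over the serial port.
Observation:
View the printed output on the PC with the serial terminal program SecureCRT (included on the V5 CD).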
Program design:

/*
*********************************************************************************************************
*    Function name: DSP_Shift
*    Description:   Shift
*    Parameters:    none
*    Return value:  none
*********************************************************************************************************
*/
static void DSP_Shift(void)
{
  static q31_t pSrcA1 = 0x88886666;
  static q31_t pDst1;

  static q15_t pSrcA2 = 0x8866;
  static q15_t pDst2;

  static q7_t pSrcA3 = 0x86;
  static q7_t pDst3;

  arm_shift_q31(&pSrcA1, 3, &pDst1, 1);
  printf("arm_shift_q31 = %8x\r\n", pDst1);

  arm_shift_q15(&pSrcA2, -3, &pDst2, 1);
  printf("arm_shift_q15 = %4x\r\n", pDst2);

  arm_shift_q7(&pSrcA3, 3, &pDst3, 1);
  printf("arm_shift_q7 = %2x\r\n", pDst3);
  printf("***********************************\r\n");
}
9.4 减法(Vector Sub)
These functions implement element-wise subtraction, described by:
pDst[n] = pSrcA[n] - pSrcB[n], 0 <= n < blockSize.

9.4.1 arm_sub_f32
This function subtracts 32-bit floating-point vectors. The source code is analyzed below:

/**
 * @brief Floating-point vector subtraction.
 * @param[in]  *pSrcA points to the first input vector
 * @param[in]  *pSrcB points to the second input vector
 * @param[out] *pDst points to the output vector
 * @param[in]  blockSize number of samples in each vector
 * @return none.
 */
void arm_sub_f32(
  float32_t * pSrcA,
  float32_t * pSrcB,
  float32_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  float32_t inA1, inA2, inA3, inA4;              /* temporary variables */
  float32_t inB1, inB2, inB3, inB4;              /* temporary variables */

  /* loop unrolling */
  blkCnt = blockSize >> 2u;

  /* First part of the processing with loop unrolling. Compute 4 outputs at a time.
   ** a second loop below computes the remaining 1 to 3 samples. */
  while(blkCnt > 0u)
  {
    /* C = A - B */
    /* Subtract and then store the results in the destination buffer. */
    /* Read 4 input samples from sourceA and sourceB */
    inA1 = *pSrcA;
    inB1 = *pSrcB;
    inA2 = *(pSrcA + 1);
    inB2 = *(pSrcB + 1);
    inA3 = *(pSrcA + 2);
    inB3 = *(pSrcB + 2);
    inA4 = *(pSrcA + 3);
    inB4 = *(pSrcB + 3);

    /* dst = srcA - srcB */
    /* subtract and store the result */          /* (1) */
    *pDst = inA1 - inB1;
    *(pDst + 1) = inA2 - inB2;
    *(pDst + 2) = inA3 - inB3;
    *(pDst + 3) = inA4 - inB4;

    /* Update pointers to process next samples */
    pSrcA += 4u;
    pSrcB += 4u;
    pDst += 4u;

    /* Decrement the loop counter */
    blkCnt--;
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;

#else

  /* Run the below code for Cortex-M0 */

  /* Initialize blkCnt with number of samples */
  blkCnt = blockSize;

#endif /* #ifndef ARM_MATH_CM0_FAMILY */

  while(blkCnt > 0u)
  {
    /* C = A - B */
    /* Subtract and then store the results in the destination buffer. */
    *pDst++ = (*pSrcA++) - (*pSrcB++);

    /* Decrement the loop counter */
    blkCnt--;
  }
}

1. Floating-point subtraction is straightforward: the two values are simply subtracted.
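The structure of arm_sub_f32 (four results per pass, then a tail loop for whatever is left) can be tried on a PC with the sketch below. It is a simplified stand-in, not the library code: the name `sub_f32_sketch` is hypothetical, and plain `float` and `size_t` are used so it builds without the CMSIS headers.

```c
#include <stddef.h>

/* Hypothetical stand-in for the unrolled loop structure shown above. */
static void sub_f32_sketch(const float *a, const float *b, float *dst, size_t n)
{
    size_t blocks = n >> 2;   /* number of full groups of 4 samples */
    size_t tail = n & 3u;     /* 0 to 3 samples left over */

    while (blocks > 0u) {
        /* Read all 8 inputs before writing any output. */
        float a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        float b0 = b[0], b1 = b[1], b2 = b[2], b3 = b[3];

        dst[0] = a0 - b0;
        dst[1] = a1 - b1;
        dst[2] = a2 - b2;
        dst[3] = a3 - b3;

        a += 4;
        b += 4;
        dst += 4;
        blocks--;
    }

    /* Tail loop: one sample at a time, no unrolling. */
    while (tail > 0u) {
        *dst++ = *a++ - *b++;
        tail--;
    }
}
```

Calling it with 6 samples runs the unrolled loop once and the tail loop twice. On a Cortex-M0 build the library skips the unrolled part entirely, which corresponds to setting `blocks` to 0 and `tail` to `n` here.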
9.4.2 arm_sub_q31
This function subtracts 32-bit fixed-point (Q31) vectors. The source code is analyzed below:

/**
 * @brief Q31 vector subtraction.
 * @param[in]  *pSrcA points to the first input vector
 * @param[in]  *pSrcB points to the second input vector
 * @param[out] *pDst points to the output vector
 * @param[in]  blockSize number of samples in each vector
 * @return none.
 *
 * Scaling and Overflow Behavior:                (1)
 * \par
 * The function uses saturating arithmetic.
 * Results outside of the allowable Q31 range [0x80000000 0x7FFFFFFF] will be saturated.
 */
void arm_sub_q31(
  q31_t * pSrcA,
  q31_t * pSrcB,
  q31_t * pDst,
  uint32_t blockSize)
{
  uint32_t blkCnt;                               /* loop counter */

#ifndef ARM_MATH_CM0_FAMILY

  /* Run the below code for Cortex-M4 and Cortex-M3 */
  q31_t inA1, inA2, inA3, inA4;
  q31_t inB1, inB2, inB3, inB4;

  /* loop unrolling */
  blkCnt = blockSize >> 2u;

  /* First part of the processing with loop unrolling. Compute 4 outputs at a time.
   ** a second loop below computes the remaining 1 to 3 samples. */
  while(blkCnt > 0u)
  {
    /* C = A - B */
    /* Subtract and then store the results in the destination buffer. */
    inA1 = *pSrcA++;
    inA2 = *pSrcA++;
    inB1 = *pSrcB++;
    inB2 = *pSrcB++;

    inA3 = *pSrcA++;
    inA4 = *pSrcA++;
    inB3 = *pSrcB++;
    inB4 = *pSrcB++;

    *pDst++ = __QSUB(inA1, inB1);                /* (2) */
    *pDst++ = __QSUB(inA2, inB2);
    *pDst++ = __QSUB(inA3, inB3);
    *pDst++ = __QSUB(inA4, inB4);

    /* Decrement the loop counter */
    blkCnt--;
  }

  /* If the blockSize is not a multiple of 4, compute any remaining output samples here.
   ** No loop unrolling is used. */
  blkCnt = blockSize % 0x4u;

  while(blkCnt > 0u)
  {
    /* C = A - B */
    /* Subtract and then store the result in the destination buffer. */
    *pDst++ = __QSUB(*pSrcA++, *pSrcB++);

    /* Decrement the loop counter */
    blkCnt--;
  }

#else

  /* Run the below code for Cortex-M0 */

  /* Initialize blkCnt with number of samples */
  blkCnt = blockSize;

  while(blkCnt > 0u)
  {
    /* C = A - B */
    /* Subtract and then store the result in the destination buffer. */
    *pDst++ = (q31_t) clip_q63_to_q31((q63_t) * pSrcA++ - *pSrcB++);

    /* Decrement the loop counter */
    blkCnt--;
  }

#endif /* #ifndef ARM_MATH_CM0_FAMILY */
}

1. This function uses saturating arithmetic. Any result outside the Q31 range [0x80000000, 0x7FFFFFFF] is saturated to the nearest limit of that range.
2. __QSUB performs a saturating subtraction of two 32-bit values, which is exactly what is needed for Q31 data. Unlike __QSUB16 and __QSUB8, it works on one full 32-bit value at a time and does not pack several samples into a word. On Cortex-M0 the same result is obtained by subtracting in 64 bits and clipping with clip_q63_to_q31.
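The saturation behaviour described in note 1 can be checked off-target with a small sketch that follows the same idea as the Cortex-M0 branch: do the subtraction in 64 bits, then clip back to 32 bits. The names `clip_to_q31_sketch` and `sat_sub_q31_sketch` are hypothetical; they only imitate what clip_q63_to_q31 and __QSUB are documented to do and are not the library implementations.

```c
#include <stdint.h>

/* Hypothetical helper: clip a 64-bit intermediate to the Q31 range. */
static int32_t clip_to_q31_sketch(int64_t x)
{
    if (x > INT32_MAX) {
        return INT32_MAX;   /* 0x7FFFFFFF */
    }
    if (x < INT32_MIN) {
        return INT32_MIN;   /* 0x80000000 */
    }
    return (int32_t)x;
}

/* Hypothetical helper: saturating Q31 subtraction, a - b. */
static int32_t sat_sub_q31_sketch(int32_t a, int32_t b)
{
    /* The 64-bit difference of two 32-bit values can never overflow. */
    return clip_to_q31_sketch((int64_t)a - (int64_t)b);
}
```

For example, 0x80000000 minus 1 stays at 0x80000000 and 0x7FFFFFFF minus (-1) stays at 0x7FFFFFFF instead of wrapping around, while ordinary values such as 5 minus 3 are unaffected.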