Friday, 18 May 2018

How to multiply an image by a 3x3 convolution kernel

There is a 3x3 convolution kernel and an image stored as an array of integer pixel values.
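Before vectorising, it helps to have a plain scalar version to check any SIMD code against. The sketch below assumes 8-bit unsigned pixels (matching the `_mm_cvtepu8_epi16` widening in the code further down), a row-major 3x3 kernel `k`, and that the kernel is applied as a sliding dot product without flipping (correlation, as most image-processing code does). Only the "valid" region is produced, i.e. (height-2) x (width-2) outputs, which matches the `inc < height-2` outer loop. The name `filter3x3_ref` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: applies the row-major 3x3 kernel k to an
   8-bit image as a sliding dot product (no kernel flip). Writes the valid
   region, (height-2) rows of (width-2) int16_t values, to dst.
   Assumes every result fits in 16 bits (true for small kernels such as
   H, V or H + V with 8-bit pixels). */
void filter3x3_ref(const uint8_t *src, int width, int height,
                   const int16_t k[9], int16_t *dst)
{
    assert(width >= 3 && height >= 3);
    const int out_w = width - 2;
    for (int y = 0; y < height - 2; y++) {
        for (int x = 0; x < out_w; x++) {
            int acc = 0;
            /* dot product of the kernel with the 3x3 window at (x, y) */
            for (int ky = 0; ky < 3; ky++)
                for (int kx = 0; kx < 3; kx++)
                    acc += k[ky * 3 + kx] * src[(y + ky) * width + (x + kx)];
            dst[y * out_w + x] = (int16_t)acc;
        }
    }
}
```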


The convolution kernel is composed of two kernels, H and V:

// the two parts of the compound convolution kernel
//                              | 1, 0,  1|
// convolution kernel H = src x | 0, 0,  0|
//                              |-1, 0, -1|

//                              | 1, 0, -1|
// convolution kernel V = src x | 0, 0,  0|
//                              | 1, 0, -1|
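Both kernels are separable: each is the outer product of a column vector and a row vector. Writing $r_0, r_1, r_2$ for the three source rows, $x$ for the left column of the 3x3 window, and $\star$ for applying a kernel as a sliding dot product (what the comments write as `src x`), the factorisation and the resulting per-pixel expressions are:

```latex
H = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \end{pmatrix},
\qquad
V = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & -1 \end{pmatrix}

\begin{aligned}
(\mathrm{src} \star H)(x) &= \bigl(r_0[x] - r_2[x]\bigr) + \bigl(r_0[x+2] - r_2[x+2]\bigr) \\
(\mathrm{src} \star V)(x) &= \bigl(r_0[x] + r_2[x]\bigr) - \bigl(r_0[x+2] + r_2[x+2]\bigr)
\end{aligned}
```

So a vertical pass over the three rows would be $r_0 - r_2$ for H and $r_0 + r_2$ for V, rather than $r_0 + r_1 + r_2$; the middle row $r_1$ never contributes to either kernel.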

The compound convolution kernel is the sum of the two: kernel = kernel H + kernel V.
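Because applying a kernel is linear, filtering once with the summed kernel gives the same result as filtering with H and V separately and adding, and here the sum collapses to just two non-zero taps. With $r_0, r_2$ the top and bottom source rows, $x$ the left column of the 3x3 window, and $\star$ the sliding dot product written `src x` above:

```latex
K = H + V =
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & -1 \end{pmatrix}
+
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ 1 & 0 & -1 \end{pmatrix}
=
\begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -2 \end{pmatrix}

\mathrm{src} \star K = \mathrm{src} \star H + \mathrm{src} \star V,
\qquad
(\mathrm{src} \star K)(x) = 2\,r_0[x] - 2\,r_2[x+2]
```

If the kernel is meant as a true convolution (flipped by 180 degrees before sliding), the result changes sign: $2\,r_2[x+2] - 2\,r_0[x]$.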

for(int inc=0; inc<height-2; inc++)
{
    // load 16 pixels from each of the 3 source rows
    str1_16pxs = _mm_loadu_si128((__m128i*)(src_all_str));
    str2_16pxs = _mm_loadu_si128((__m128i*)(src2_all_str));
    str3_16pxs = _mm_loadu_si128((__m128i*)(src3_all_str));

    // zero-extend the first 8 pixels of each row from 8 to 16 bits
    str1_16pxs_pack1st_8to16 = _mm_cvtepu8_epi16(str1_16pxs);
    str2_16pxs_pack1st_8to16 = _mm_cvtepu8_epi16(str2_16pxs);
    str3_16pxs_pack1st_8to16 = _mm_cvtepu8_epi16(str3_16pxs);

//---!
        // here the convolution of the first 8 pixels should be computed
        // ... but how?
//---

    // sum the widened first halves of the 3 rows vertically
    sum1_str12_vert_16pxs_pack1st_8to16  = _mm_add_epi16(str1_16pxs_pack1st_8to16,           str2_16pxs_pack1st_8to16);
    sum1_str123_vert_16pxs_pack1st_8to16 = _mm_add_epi16(sum1_str12_vert_16pxs_pack1st_8to16,str3_16pxs_pack1st_8to16);

    for(int jnc=0; jnc<(width >> 4); jnc++)
    {
        // shift each 16-pixel register right by 8 bytes so pixels 8..15 land in the low half
        str1_16pxs_plus_8pxs = _mm_srli_si128(str1_16pxs, 8);
        str2_16pxs_plus_8pxs = _mm_srli_si128(str2_16pxs, 8);
        str3_16pxs_plus_8pxs = _mm_srli_si128(str3_16pxs, 8);

        // zero-extend the second 8 pixels of each row from 8 to 16 bits
        str1_16pxs_pack2nd_8to16 = _mm_cvtepu8_epi16(str1_16pxs_plus_8pxs);
        str2_16pxs_pack2nd_8to16 = _mm_cvtepu8_epi16(str2_16pxs_plus_8pxs);
        str3_16pxs_pack2nd_8to16 = _mm_cvtepu8_epi16(str3_16pxs_plus_8pxs);

//---!
            // convolve the remaining 8 pixels, and so on to the end of the row
            // ... but how?
//---

        // sum the widened second halves of the 3 rows vertically
        sum1_str12_vert_16pxs_pack2nd_8to16  = _mm_add_epi16(str1_16pxs_pack2nd_8to16,           str2_16pxs_pack2nd_8to16);
        sum1_str123_vert_16pxs_pack2nd_8to16 = _mm_add_epi16(sum1_str12_vert_16pxs_pack2nd_8to16,str3_16pxs_pack2nd_8to16);

//---!  advance to the next 16 pixels
        src_all_str += 16;
        src2_all_str += 16;
        src3_all_str += 16;

        //...

        // store 8 16-bit results; _mm_store_si128 requires dst to be 16-byte aligned
        _mm_store_si128((__m128i*)(dst_all_str), res);
        dst_all_str += 8;

    }//for(jnc)

}//for(inc)
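One possible way to fill in the two "how?" gaps, sketched under stated assumptions: instead of shifting a single register, load each row three times at byte offsets x, x+1 and x+2 with `_mm_loadl_epi64` (8 pixels each), widen to 16 bits, multiply by the broadcast kernel coefficient with `_mm_mullo_epi16`, and accumulate with `_mm_add_epi16`. The widening uses SSE2 `_mm_unpacklo_epi8` against zero, which gives the same 16-bit lanes as the SSE4.1 `_mm_cvtepu8_epi16` used above, so the sketch builds on x86-64 without extra compiler flags. The function `filter3x3_row_sse2` and its parameters are hypothetical; the kernel is applied without flipping, and results are assumed to fit in 16 bits (true for H, V and H + V with 8-bit pixels).

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical sketch: computes one output row of a 3x3 filter, 8 pixels
   per iteration. r0, r1, r2 point to three consecutive source rows of
   `width` bytes; dst receives width-2 int16_t results (valid region). */
void filter3x3_row_sse2(const uint8_t *r0, const uint8_t *r1,
                        const uint8_t *r2, int width,
                        const int16_t k[9], int16_t *dst)
{
    assert(width >= 3);
    const uint8_t *rows[3] = { r0, r1, r2 };
    const __m128i zero = _mm_setzero_si128();
    const int out_w = width - 2;
    int x = 0;

    /* Vector part: outputs x..x+7 read source bytes x..x+9 of each row. */
    for (; x + 8 <= out_w; x += 8) {
        __m128i acc = _mm_setzero_si128();
        for (int ky = 0; ky < 3; ky++) {
            for (int kx = 0; kx < 3; kx++) {
                /* 8 pixels starting at this tap's horizontal offset */
                __m128i px8 = _mm_loadl_epi64((const __m128i *)(rows[ky] + x + kx));
                /* zero-extend to 16 bits (same lanes as _mm_cvtepu8_epi16) */
                __m128i px16 = _mm_unpacklo_epi8(px8, zero);
                __m128i coef = _mm_set1_epi16(k[ky * 3 + kx]);
                acc = _mm_add_epi16(acc, _mm_mullo_epi16(px16, coef));
            }
        }
        /* unaligned store, so dst needs no special alignment */
        _mm_storeu_si128((__m128i *)(dst + x), acc);
    }

    /* Scalar tail for the last out_w % 8 outputs. */
    for (; x < out_w; x++) {
        int s = 0;
        for (int ky = 0; ky < 3; ky++)
            for (int kx = 0; kx < 3; kx++)
                s += k[ky * 3 + kx] * rows[ky][x + kx];
        dst[x] = (int16_t)s;
    }
}
```

In the loop above this would presumably be called once per `inc`, with the three row pointers playing the role of `src_all_str`, `src2_all_str` and `src3_all_str`. For the compound kernel H + V, seven of the nine coefficients are zero, so a specialised version would only need the loads at `r0 + x` and `r2 + x + 2`, and could replace the multiplies with `_mm_sub_epi16` followed by `_mm_slli_epi16(..., 1)`.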
