Commit b0d00020 authored by B Krishnan Iyer, committed by Martin Storsjö

arm: mc: Speed up blend and w_mask by exploiting memory alignment in ldr/str instructions
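
For readers unfamiliar with the mechanism, here is a minimal sketch of the precondition that alignment-based load/store optimizations rely on. It assumes (the commit message does not say so explicitly) that the change adds alignment qualifiers to 32-bit ARM NEON accesses, as in `vld1.8 {d0}, [r0, :64]`. With such a qualifier the address must be a multiple of 64 bits (8 bytes), or 128 bits (16 bytes) for `:128`, otherwise the access raises an alignment fault. The helper `is_aligned` and the addresses below are hypothetical and only illustrate the arithmetic.

```python
def is_aligned(addr: int, align: int) -> bool:
    """Return True when `addr` is a multiple of `align` bytes.

    `align` must be a power of two, as the ARM alignment qualifiers are.
    """
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a power of two")
    # A power-of-two alignment only constrains the low bits of the address.
    return addr & (align - 1) == 0


# Hypothetical addresses, chosen only to illustrate the check.
base = 0x1000  # multiple of 16 bytes: acceptable for a :128 qualifier
print(is_aligned(base, 16))      # True
print(is_aligned(base + 8, 16))  # False: a :128 access here would fault
print(is_aligned(base + 8, 8))   # True: acceptable for a :64 qualifier
```

In practice this means a qualifier can only be added where every address the loop touches is guaranteed to meet it.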

blend/blend_h/blend_v (timings, lower is better):

Before:               Cortex A7      A8      A9     A53     A72     A73
blend_h_w2_8bpc_neon:     169.5   194.2   153.1   134.0    63.0    72.6
blend_h_w4_8bpc_neon:     164.4   171.8   142.2   137.8    60.5    60.2
blend_h_w8_8bpc_neon:     184.8   121.0   146.5   123.4    55.9    63.1
blend_h_w16_8bpc_neon:    291.0   178.6   237.3   181.0    88.6    83.9
blend_h_w32_8bpc_neon:    531.9   321.5   432.2   358.3   155.6   156.2
blend_h_w64_8bpc_neon:    957.6   600.3   827.4   631.2   279.7   268.4
blend_h_w128_8bpc_neon:  2161.5  1398.4  1931.8  1403.4   607.0   597.9
blend_v_w2_8bpc_neon:     249.3   373.4   269.2   195.6   107.9   117.6
blend_v_w4_8bpc_neon:     451.7   676.1   555.3   376.1   198.6   266.9
blend_v_w8_8bpc_neon:     561.0   475.2   607.6   357.0   213.9   204.1
blend_v_w16_8bpc_neon:    928.4   626.8   823.8   592.3   269.9   245.3
blend_v_w32_8bpc_neon:   1477.6  1024.8  1186.6   994.5   346.6   370.0
blend_w4_8bpc_neon:       103.3   113.0    86.2    91.5    38.6    35.2
blend_w8_8bpc_neon:       174.9   116.6   137.1   123.1    50.8    55.0
blend_w16_8bpc_neon:      533.0   334.3   446.6   348.6   150.7   155.4
blend_w32_8bpc_neon:     1299.2   836.8  1170.7   909.9   370.5   386.3

After:                Cortex A7      A8      A9     A53     A72     A73
blend_h_w2_8bpc_neon:     169.6   169.8   140.9   134.0    62.3    72.5
blend_h_w4_8bpc_neon:     164.5   149.1   127.6   137.7    59.1    60.1
blend_h_w8_8bpc_neon:     184.9   102.7   126.3   123.4    54.9    63.2
blend_h_w16_8bpc_neon:    291.0   163.8   232.1   180.9    88.4    83.9
blend_h_w32_8bpc_neon:    531.2   285.6   422.6   358.4   155.5   155.9
blend_h_w64_8bpc_neon:    956.0   541.9   809.9   631.6   280.0   270.6
blend_h_w128_8bpc_neon:  2159.0  1253.6  1889.0  1404.8   606.2   600.5
blend_v_w2_8bpc_neon:     249.9   362.0   269.4   195.6   107.8   117.6
blend_v_w4_8bpc_neon:     452.6   541.6   538.2   376.1   199.5   266.9
blend_v_w8_8bpc_neon:     561.0   348.9   551.3   357.7   214.3   204.4
blend_v_w16_8bpc_neon:    926.8   510.9   785.0   592.1   270.7   245.8
blend_v_w32_8bpc_neon:   1474.4   913.3  1151.4   995.7   347.5   371.2
blend_w4_8bpc_neon:       103.3    96.6    76.9    91.5    33.7    35.3
blend_w8_8bpc_neon:       174.9    88.2   114.8   123.1    51.5    55.0
blend_w16_8bpc_neon:      532.8   282.2   445.3   348.5   149.8   155.7
blend_w32_8bpc_neon:     1295.1   735.2  1122.8   908.4   372.0   386.5
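
To put the blend tables in relative terms, the sketch below computes the percentage reduction per core for one row, `blend_w8_8bpc_neon`, with the values copied from the Before and After tables. The name `reduction_pct` is illustrative only.

```python
# Cortex cores in the column order used by the tables above.
CORES = ["A7", "A8", "A9", "A53", "A72", "A73"]

# blend_w8_8bpc_neon row, copied from the Before/After tables.
BEFORE = [174.9, 116.6, 137.1, 123.1, 50.8, 55.0]
AFTER = [174.9, 88.2, 114.8, 123.1, 51.5, 55.0]


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which the timing dropped (negative means slower)."""
    return (before - after) / before * 100.0


for core, b, a in zip(CORES, BEFORE, AFTER):
    print(f"{core:>4}: {reduction_pct(b, a):6.1f}%")
```

For this row, Cortex-A8 and A9 drop by roughly 24% and 16%, while A7, A53, A72 and A73 stay within about 1.5% of their previous timings (A72 is marginally slower).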

w_mask_444/422/420 (timings, lower is better):

Before:                    Cortex A7        A8        A9       A53       A72      A73
w_mask_420_w4_8bpc_neon:       218.1     144.4     187.3     152.7      86.9     89.0
w_mask_420_w8_8bpc_neon:       544.0     393.7     437.0     372.5     211.1    230.9
w_mask_420_w16_8bpc_neon:     1537.2    1063.5    1182.3    1024.3     566.4    667.7
w_mask_420_w32_8bpc_neon:     5734.7    4207.2    4716.8    3822.8    2340.5   2521.3
w_mask_420_w64_8bpc_neon:    14317.6   10165.0   13220.2    9578.5    5578.9   5989.9
w_mask_420_w128_8bpc_neon:   37932.8   25299.1   39562.9   25203.8   14916.4  15465.1
w_mask_422_w4_8bpc_neon:       206.8     141.4     177.9     143.4      82.1     84.8
w_mask_422_w8_8bpc_neon:       511.8     380.8     416.7     342.5     198.5    221.7
w_mask_422_w16_8bpc_neon:     1632.8    1154.4    1282.9    1061.2     595.3    684.9
w_mask_422_w32_8bpc_neon:     6087.8    4560.3    5173.3    3945.8    2319.1   2608.7
w_mask_422_w64_8bpc_neon:    15183.7   11013.9   14435.6    9904.6    5449.9   6100.9
w_mask_422_w128_8bpc_neon:   39951.2   27441.0   42398.2   25995.1   14624.9  15529.2
w_mask_444_w4_8bpc_neon:       193.4     127.0     170.0     135.4      76.8     81.4
w_mask_444_w8_8bpc_neon:       477.8     340.0     427.9     319.3     187.2    214.7
w_mask_444_w16_8bpc_neon:     1529.0    1058.8    1209.4     987.0     571.7    677.3
w_mask_444_w32_8bpc_neon:     5687.9    4166.9    4882.4    3667.0    2286.8   2518.7
w_mask_444_w64_8bpc_neon:    14394.7   10055.1   14057.9    9372.0    5369.3   5898.7
w_mask_444_w128_8bpc_neon:   37952.0   25008.8   42169.9   24988.8   22973.7  15241.1

After:                     Cortex A7        A8        A9       A53       A72      A73
w_mask_420_w4_8bpc_neon:       219.7     120.7     178.0     152.7      87.2     89.0
w_mask_420_w8_8bpc_neon:       547.5     355.2     404.4     372.4     211.4    231.0
w_mask_420_w16_8bpc_neon:     1540.9     987.1    1113.0    1024.9     567.4    669.5
w_mask_420_w32_8bpc_neon:     5915.4    3905.8    4516.8    3929.3    2363.7   2523.6
w_mask_420_w64_8bpc_neon:    14860.9    9437.1   12609.7    9586.4    5627.3   6005.8
w_mask_420_w128_8bpc_neon:   38799.1   23536.1   38598.3   24787.7   14595.7  15474.9
w_mask_422_w4_8bpc_neon:       208.3     115.4     168.6     143.4      82.4     84.8
w_mask_422_w8_8bpc_neon:       515.2     335.7     383.2     342.5     198.9    221.8
w_mask_422_w16_8bpc_neon:     1643.2    1053.6    1199.3    1062.2     595.6    685.7
w_mask_422_w32_8bpc_neon:     6335.1    4161.0    4959.3    4088.5    2353.0   2606.4
w_mask_422_w64_8bpc_neon:    15689.4   10039.8   13806.1    9937.7    5535.3   6099.8
w_mask_422_w128_8bpc_neon:   40754.4   25033.3   41390.5   25683.7   14668.8  15537.1
w_mask_444_w4_8bpc_neon:       194.9     107.4     162.0     135.4      77.1     81.4
w_mask_444_w8_8bpc_neon:       481.1     300.2     422.0     319.1     187.6    214.6
w_mask_444_w16_8bpc_neon:     1542.6     956.1    1137.7     988.4     572.4    677.5
w_mask_444_w32_8bpc_neon:     5896.1    3766.1    4731.9    3801.2    2322.9   2521.8
w_mask_444_w64_8bpc_neon:    14814.0    9084.7   13515.4    9311.0    5497.3   5896.3
w_mask_444_w128_8bpc_neon:   38587.7   22615.2   41389.9   24639.4   17705.8  15244.3
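
Because the widths span two orders of magnitude, a geometric mean of the after/before ratios summarizes a whole w_mask family without letting the w128 case dominate. A minimal sketch for `w_mask_420` on Cortex-A7 and A8, using the values from the tables above (`geomean_ratio` is an illustrative name):

```python
import math

# w_mask_420_8bpc_neon timings for widths 4, 8, 16, 32, 64, 128,
# copied from the Before/After tables above.
A7_BEFORE = [218.1, 544.0, 1537.2, 5734.7, 14317.6, 37932.8]
A7_AFTER = [219.7, 547.5, 1540.9, 5915.4, 14860.9, 38799.1]
A8_BEFORE = [144.4, 393.7, 1063.5, 4207.2, 10165.0, 25299.1]
A8_AFTER = [120.7, 355.2, 987.1, 3905.8, 9437.1, 23536.1]


def geomean_ratio(before, after):
    """Geometric mean of after/before; below 1.0 means faster overall."""
    logs = [math.log(a / b) for b, a in zip(before, after)]
    return math.exp(sum(logs) / len(logs))


for name, before, after in [("A7", A7_BEFORE, A7_AFTER), ("A8", A8_BEFORE, A8_AFTER)]:
    print(f"{name}: after/before = {geomean_ratio(before, after):.3f}")
```

On Cortex-A8 the ratio comes out at roughly 0.91 (about 9% less time across widths), whereas on Cortex-A7 it is slightly above 1.0, matching the small A7 slowdowns visible in the w_mask rows.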
parent 2ef970a8
Pipeline #8785 passed in 8 minutes and 14 seconds