  1. 29 Aug, 2019 3 commits
  2. 28 Aug, 2019 3 commits
  3. 23 Aug, 2019 2 commits
  4. 21 Aug, 2019 1 commit
  5. 18 Aug, 2019 1 commit
  6. 15 Aug, 2019 1 commit
    • arm64: mc: NEON implementation of w_mask_444/422/420 function · 3d94fb9a
      B Krishnan Iyer authored
                                              A73       A53

      w_mask_420_w4_8bpc_c:             818    1082.9
      w_mask_420_w4_8bpc_neon:           79     126.6
      w_mask_420_w8_8bpc_c:            2486    3399.8
      w_mask_420_w8_8bpc_neon:        200.2     343.7
      w_mask_420_w16_8bpc_c:         8022.3   10989.6
      w_mask_420_w16_8bpc_neon:       528.1       889
      w_mask_420_w32_8bpc_c:        31851.8   42808.6
      w_mask_420_w32_8bpc_neon:      2062.5    3380.8
      w_mask_420_w64_8bpc_c:        79268.5  102683.9
      w_mask_420_w64_8bpc_neon:      5252.9    8575.4
      w_mask_420_w128_8bpc_c:      193704.1  255586.5
      w_mask_420_w128_8bpc_neon:    14602.3   22167.7

      w_mask_422_w4_8bpc_c:           777.3    1038.5
      w_mask_422_w4_8bpc_neon:         72.1     112.9
      w_mask_422_w8_8bpc_c:          2405.7      3168
      w_mask_422_w8_8bpc_neon:        191.9     314.1
      w_mask_422_w16_8bpc_c:         7783.7   10543.9
      w_mask_422_w16_8bpc_neon:       559.8     835.5
      w_mask_422_w32_8bpc_c:        30895.7   41141.2
      w_mask_422_w32_8bpc_neon:      2089.7    3187.2
      w_mask_422_w64_8bpc_c:        75500.2   98766.3
      w_mask_422_w64_8bpc_neon:        5379    8208.2
      w_mask_422_w128_8bpc_c:      186967.1  245809.1
      w_mask_422_w128_8bpc_neon:    15159.9   21474.5

      w_mask_444_w4_8bpc_c:           850.1    1136.6
      w_mask_444_w4_8bpc_neon:         66.5     104.7
      w_mask_444_w8_8bpc_c:          2373.5    3262.9
      w_mask_444_w8_8bpc_neon:        180.5     290.2
      w_mask_444_w16_8bpc_c:         7291.6   10590.7
      w_mask_444_w16_8bpc_neon:       550.9     809.7
      w_mask_444_w32_8bpc_c:         8048.3   10140.8
      w_mask_444_w32_8bpc_neon:      2136.2      3095
      w_mask_444_w64_8bpc_c:        18055.3     23060
      w_mask_444_w64_8bpc_neon:      5522.5    8124.8
      w_mask_444_w128_8bpc_c:       42754.3     56072
      w_mask_444_w128_8bpc_neon:    15569.5   21531.5
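
The gain of a NEON routine over its C counterpart can be read off the table by dividing the two figures in the same column. A minimal sketch, assuming the numbers are timings where lower is better (the commit message does not state the unit); the `timings` dictionary and the `speedup` helper are hypothetical names used only for illustration, and only the w_mask_420_w4 rows are copied in:

```python
# Illustrative only: values copied from the w_mask_420_w4 rows of the table.
# Assumption: lower numbers mean faster code.
timings = {
    "A73": {"c": 818.0, "neon": 79.0},
    "A53": {"c": 1082.9, "neon": 126.6},
}


def speedup(c_time: float, neon_time: float) -> float:
    """Return how many times smaller the NEON figure is than the C figure."""
    return c_time / neon_time


for core, t in timings.items():
    print(f"{core}: {speedup(t['c'], t['neon']):.1f}x")
```

For these two rows the ratio comes out at roughly 10.4x on the A73 and 8.6x on the A53.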
  7. 14 Aug, 2019 2 commits
    • arm64: mc: NEON implementation of blend, blend_h and blend_v function · 1dc2dc7d
      B Krishnan Iyer authored
                                   A73     A53
      blend_h_w2_8bpc_c:         184.7   301.5
      blend_h_w2_8bpc_neon:       58.8   104.1
      blend_h_w4_8bpc_c:         291.4   507.3
      blend_h_w4_8bpc_neon:       48.7   108.9
      blend_h_w8_8bpc_c:         510.1   992.7
      blend_h_w8_8bpc_neon:       66.5    99.3
      blend_h_w16_8bpc_c:          972  1835.3
      blend_h_w16_8bpc_neon:      82.7   145.2
      blend_h_w32_8bpc_c:        776.7   912.9
      blend_h_w32_8bpc_neon:     155.1   266.9
      blend_h_w64_8bpc_c:       1424.3  1635.4
      blend_h_w64_8bpc_neon:     273.4   480.9
      blend_h_w128_8bpc_c:      3318.1    3774
      blend_h_w128_8bpc_neon:    614.1  1097.9
      blend_v_w2_8bpc_c:         278.8   427.5
      blend_v_w2_8bpc_neon:      113.7   170.4
      blend_v_w4_8bpc_c:         960.2  1597.7
      blend_v_w4_8bpc_neon:      222.9   351.4
      blend_v_w8_8bpc_c:        1694.2  3333.5
      blend_v_w8_8bpc_neon:      200.9   333.6
      blend_v_w16_8bpc_c:       3115.2  5971.6
      blend_v_w16_8bpc_neon:     233.2   494.8
      blend_v_w32_8bpc_c:       3949.7  6070.6
      blend_v_w32_8bpc_neon:     460.4   841.6
      blend_w4_8bpc_c:           244.2   388.3
      blend_w4_8bpc_neon:         25.5    66.7
      blend_w8_8bpc_c:           616.3  1120.8
      blend_w8_8bpc_neon:           46   110.7
      blend_w16_8bpc_c:         2193.1  4056.4
      blend_w16_8bpc_neon:       140.7   299.3
      blend_w32_8bpc_c:         2502.8  2998.5
      blend_w32_8bpc_neon:       381.4   725.3
    • Michael Bradshaw's avatar
      d20d70e8
  8. 13 Aug, 2019 4 commits
  9. 10 Aug, 2019 2 commits
  10. 09 Aug, 2019 3 commits
  11. 08 Aug, 2019 5 commits
    • Avoid CDF overreads in gather_top_partition_prob() · d8799d94
      Henrik Gramner authored
      Explicitly take advantage of the fact that certain probabilities are zero
      instead of loading zeros from the CDF padding.
      
      The current code works just fine, but only because those values happen to
      be zero due to what is essentially an implementation detail.
    • Set thread names on MacOS · fa32f2de
      Henrik Gramner authored
    • Set thread names on Windows 10 · 6c3e85de
      Henrik Gramner authored
    • arm: mc: Speed up due to memory alignment in ldr/str instructions · b0d00020
      B Krishnan Iyer authored
      blend/blend_h/blend_v:
      
      Before:               Cortex A7      A8      A9     A53     A72     A73
      blend_h_w2_8bpc_neon:     169.5   194.2   153.1   134.0    63.0    72.6
      blend_h_w4_8bpc_neon:     164.4   171.8   142.2   137.8    60.5    60.2
      blend_h_w8_8bpc_neon:     184.8   121.0   146.5   123.4    55.9    63.1
      blend_h_w16_8bpc_neon:    291.0   178.6   237.3   181.0    88.6    83.9
      blend_h_w32_8bpc_neon:    531.9   321.5   432.2   358.3   155.6   156.2
      blend_h_w64_8bpc_neon:    957.6   600.3   827.4   631.2   279.7   268.4
      blend_h_w128_8bpc_neon:  2161.5  1398.4  1931.8  1403.4   607.0   597.9
      blend_v_w2_8bpc_neon:     249.3   373.4   269.2   195.6   107.9   117.6
      blend_v_w4_8bpc_neon:     451.7   676.1   555.3   376.1   198.6   266.9
      blend_v_w8_8bpc_neon:     561.0   475.2   607.6   357.0   213.9   204.1
      blend_v_w16_8bpc_neon:    928.4   626.8   823.8   592.3   269.9   245.3
      blend_v_w32_8bpc_neon:   1477.6  1024.8  1186.6   994.5   346.6   370.0
      blend_w4_8bpc_neon:       103.3   113.0    86.2    91.5    38.6    35.2
      blend_w8_8bpc_neon:       174.9   116.6   137.1   123.1    50.8    55.0
      blend_w16_8bpc_neon:      533.0   334.3   446.6   348.6   150.7   155.4
      blend_w32_8bpc_neon:     1299.2   836.8  1170.7   909.9   370.5   386.3
      
      After:
      blend_h_w2_8bpc_neon:     169.6   169.8   140.9   134.0    62.3    72.5
      blend_h_w4_8bpc_neon:     164.5   149.1   127.6   137.7    59.1    60.1
      blend_h_w8_8bpc_neon:     184.9   102.7   126.3   123.4    54.9    63.2
      blend_h_w16_8bpc_neon:    291.0   163.8   232.1   180.9    88.4    83.9
      blend_h_w32_8bpc_neon:    531.2   285.6   422.6   358.4   155.5   155.9
      blend_h_w64_8bpc_neon:    956.0   541.9   809.9   631.6   280.0   270.6
      blend_h_w128_8bpc_neon:  2159.0  1253.6  1889.0  1404.8   606.2   600.5
      blend_v_w2_8bpc_neon:     249.9   362.0   269.4   195.6   107.8   117.6
      blend_v_w4_8bpc_neon:     452.6   541.6   538.2   376.1   199.5   266.9
      blend_v_w8_8bpc_neon:     561.0   348.9   551.3   357.7   214.3   204.4
      blend_v_w16_8bpc_neon:    926.8   510.9   785.0   592.1   270.7   245.8
      blend_v_w32_8bpc_neon:   1474.4   913.3  1151.4   995.7   347.5   371.2
      blend_w4_8bpc_neon:       103.3    96.6    76.9    91.5    33.7    35.3
      blend_w8_8bpc_neon:       174.9    88.2   114.8   123.1    51.5    55.0
      blend_w16_8bpc_neon:      532.8   282.2   445.3   348.5   149.8   155.7
      blend_w32_8bpc_neon:     1295.1   735.2  1122.8   908.4   372.0   386.5
      
      w_mask_444/422/420:
      
      Before:                    Cortex A7        A8        A9       A53       A72      A73
      w_mask_420_w4_8bpc_neon:       218.1     144.4     187.3     152.7      86.9     89.0
      w_mask_420_w8_8bpc_neon:       544.0     393.7     437.0     372.5     211.1    230.9
      w_mask_420_w16_8bpc_neon:     1537.2    1063.5    1182.3    1024.3     566.4    667.7
      w_mask_420_w32_8bpc_neon:     5734.7    4207.2    4716.8    3822.8    2340.5   2521.3
      w_mask_420_w64_8bpc_neon:    14317.6   10165.0   13220.2    9578.5    5578.9   5989.9
      w_mask_420_w128_8bpc_neon:   37932.8   25299.1   39562.9   25203.8   14916.4  15465.1
      w_mask_422_w4_8bpc_neon:       206.8     141.4     177.9     143.4      82.1     84.8
      w_mask_422_w8_8bpc_neon:       511.8     380.8     416.7     342.5     198.5    221.7
      w_mask_422_w16_8bpc_neon:     1632.8    1154.4    1282.9    1061.2     595.3    684.9
      w_mask_422_w32_8bpc_neon:     6087.8    4560.3    5173.3    3945.8    2319.1   2608.7
      w_mask_422_w64_8bpc_neon:    15183.7   11013.9   14435.6    9904.6    5449.9   6100.9
      w_mask_422_w128_8bpc_neon:   39951.2   27441.0   42398.2   25995.1   14624.9  15529.2
      w_mask_444_w4_8bpc_neon:       193.4     127.0     170.0     135.4      76.8     81.4
      w_mask_444_w8_8bpc_neon:       477.8     340.0     427.9     319.3     187.2    214.7
      w_mask_444_w16_8bpc_neon:     1529.0    1058.8    1209.4     987.0     571.7    677.3
      w_mask_444_w32_8bpc_neon:     5687.9    4166.9    4882.4    3667.0    2286.8   2518.7
      w_mask_444_w64_8bpc_neon:    14394.7   10055.1   14057.9    9372.0    5369.3   5898.7
      w_mask_444_w128_8bpc_neon:   37952.0   25008.8   42169.9   24988.8   22973.7  15241.1
      
      After:
      w_mask_420_w4_8bpc_neon:       219.7     120.7     178.0     152.7      87.2     89.0
      w_mask_420_w8_8bpc_neon:       547.5     355.2     404.4     372.4     211.4    231.0
      w_mask_420_w16_8bpc_neon:     1540.9     987.1    1113.0    1024.9     567.4    669.5
      w_mask_420_w32_8bpc_neon:     5915.4    3905.8    4516.8    3929.3    2363.7   2523.6
      w_mask_420_w64_8bpc_neon:    14860.9    9437.1   12609.7    9586.4    5627.3   6005.8
      w_mask_420_w128_8bpc_neon:   38799.1   23536.1   38598.3   24787.7   14595.7  15474.9
      w_mask_422_w4_8bpc_neon:       208.3     115.4     168.6     143.4      82.4     84.8
      w_mask_422_w8_8bpc_neon:       515.2     335.7     383.2     342.5     198.9    221.8
      w_mask_422_w16_8bpc_neon:     1643.2    1053.6    1199.3    1062.2     595.6    685.7
      w_mask_422_w32_8bpc_neon:     6335.1    4161.0    4959.3    4088.5    2353.0   2606.4
      w_mask_422_w64_8bpc_neon:    15689.4   10039.8   13806.1    9937.7    5535.3   6099.8
      w_mask_422_w128_8bpc_neon:   40754.4   25033.3   41390.5   25683.7   14668.8  15537.1
      w_mask_444_w4_8bpc_neon:       194.9     107.4     162.0     135.4      77.1     81.4
      w_mask_444_w8_8bpc_neon:       481.1     300.2     422.0     319.1     187.6    214.6
      w_mask_444_w16_8bpc_neon:     1542.6     956.1    1137.7     988.4     572.4    677.5
      w_mask_444_w32_8bpc_neon:     5896.1    3766.1    4731.9    3801.2    2322.9   2521.8
      w_mask_444_w64_8bpc_neon:    14814.0    9084.7   13515.4    9311.0    5497.3   5896.3
      w_mask_444_w128_8bpc_neon:   38587.7   22615.2   41389.9   24639.4   17705.8  15244.3
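
To see which cores the change affects, the Before and After figures of a row can be compared as a relative change per column. A minimal sketch using only the blend_w8_8bpc_neon row from the tables above; `percent_change` and the list names are hypothetical helpers for illustration, and the figures are assumed to be timings where lower is better:

```python
# Illustrative only: the blend_w8_8bpc_neon row, Before and After.
cores = ["A7", "A8", "A9", "A53", "A72", "A73"]
before = [174.9, 116.6, 137.1, 123.1, 50.8, 55.0]
after = [174.9, 88.2, 114.8, 123.1, 51.5, 55.0]


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new in percent; negative means the figure went down."""
    return (new - old) / old * 100.0


changes = {
    core: round(percent_change(b, a), 1)
    for core, b, a in zip(cores, before, after)
}

for core, delta in changes.items():
    print(f"{core}: {delta:+.1f}%")
```

On this row only the A8 and A9 columns move noticeably (about 24% and 16% lower), while the other cores stay within about 1.5% of their earlier values.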
    • Martin Storsjö's avatar
  12. 07 Aug, 2019 1 commit
  13. 02 Aug, 2019 2 commits
  14. 28 Jul, 2019 1 commit
  15. 27 Jul, 2019 5 commits
  16. 25 Jul, 2019 1 commit
  17. 23 Jul, 2019 3 commits
    • arm: mc: neon: Merge load and other related operations in blend/blend_h/blend_v functions · 407c27db
      B Krishnan Iyer authored
                                       A73               A53
                                Current  Earlier  Current  Earlier
      blend_h_w2_8bpc_neon:        71.1     74.1    132.7    137.5
      blend_h_w4_8bpc_neon:        60.2     65.8    137.5    147.1
      blend_h_w8_8bpc_neon:        62.2     68.9    123.1    131.7
      blend_h_w16_8bpc_neon:       82.1       86    180.7    190.3
      blend_h_w32_8bpc_neon:      149.9    149.2    358.3      358
      blend_h_w64_8bpc_neon:      265.3    263.1    630.2    629.8
      blend_h_w128_8bpc_neon:     579.5      571   1404.4   1404.5
      blend_v_w2_8bpc_neon:       118.7    118.7    193.2    195.3
      blend_v_w4_8bpc_neon:       248.6    245.8    373.4    357.3
      blend_v_w8_8bpc_neon:       202.7      202    356.4    357.2
      blend_v_w16_8bpc_neon:      238.8    234.8    590.4    591.3
      blend_v_w32_8bpc_neon:      346.7    344.4    993.7    994.7
      blend_w4_8bpc_neon:          33.5     37.5     90.7     96.7
      blend_w8_8bpc_neon:          49.7       53    123.3    123.3
      blend_w16_8bpc_neon:        151.8      151    348.8    332.4
      blend_w32_8bpc_neon:        372.9    370.9    908.3    908.4
    • arm: mc: neon: Reduce usage of general purpose registers in blend/blend_v functions · d4df8619
      B Krishnan Iyer authored
                                       A73               A53
                                Current  Earlier  Current  Earlier
      blend_h_w2_8bpc_neon:        74.1     74.1    137.5    137.5
      blend_h_w4_8bpc_neon:        65.8     65.8    147.1    147.1
      blend_h_w8_8bpc_neon:        68.9     68.7    131.7    131.7
      blend_h_w16_8bpc_neon:         86     85.6    190.3    190.4
      blend_h_w32_8bpc_neon:      149.2    149.8      358    358.3
      blend_h_w64_8bpc_neon:      263.1    264.1    629.8    630.3
      blend_h_w128_8bpc_neon:       571    575.4   1404.5   1404.2
      blend_v_w2_8bpc_neon:       118.7    120.1    195.3    196.4
      blend_v_w4_8bpc_neon:       245.8    247.2    357.3    358.4
      blend_v_w8_8bpc_neon:         202    204.2    357.2    358.4
      blend_v_w16_8bpc_neon:      234.8    238.5    591.3    591.8
      blend_v_w32_8bpc_neon:      344.4    347.2    994.7    997.2
      blend_w4_8bpc_neon:          37.5     38.3     96.7     98.7
      blend_w8_8bpc_neon:            53     54.8    123.3    125.3
      blend_w16_8bpc_neon:          151    150.8    332.4    334.5
      blend_w32_8bpc_neon:        370.9    361.6    908.4    910.7
    • arm: mc: neon: Use vld with ! post-increment instead of a register in blend/blend_h/blend_v function · b704a993
      B Krishnan Iyer authored
                                       A73               A53
                                Current  Earlier  Current  Earlier
      blend_h_w2_8bpc_neon:        74.1     74.6    137.5      137
      blend_h_w4_8bpc_neon:        65.8       66    147.1    146.6
      blend_h_w8_8bpc_neon:        68.7     68.6    131.7    131.2
      blend_h_w16_8bpc_neon:       85.6     85.9    190.4      192
      blend_h_w32_8bpc_neon:      149.8    149.8    358.3    357.6
      blend_h_w64_8bpc_neon:      264.1    262.8    630.3    629.5
      blend_h_w128_8bpc_neon:     575.4      577   1404.2     1402
      blend_v_w2_8bpc_neon:       120.1    121.3    196.4    195.5
      blend_v_w4_8bpc_neon:       247.2    247.5    358.4    358.5
      blend_v_w8_8bpc_neon:       204.2    205.2    358.4    358.5
      blend_v_w16_8bpc_neon:      238.5    237.1    591.8    590.5
      blend_v_w32_8bpc_neon:      347.2    345.8    997.2    994.1
      blend_w4_8bpc_neon:          38.3     38.6     98.7     99.2
      blend_w8_8bpc_neon:          54.8     55.1    125.3    125.8
      blend_w16_8bpc_neon:        150.8    150.1    334.5      344
      blend_w32_8bpc_neon:        361.6    360.4    910.7    910.9