  1. 15 Aug, 2019 1 commit
    • arm64: mc: NEON implementation of w_mask_444/422/420 function · 3d94fb9a
      B Krishnan Iyer authored
                                         A73       A53

      w_mask_420_w4_8bpc_c:              818    1082.9
      w_mask_420_w4_8bpc_neon:            79     126.6
      w_mask_420_w8_8bpc_c:             2486    3399.8
      w_mask_420_w8_8bpc_neon:         200.2     343.7
      w_mask_420_w16_8bpc_c:          8022.3   10989.6
      w_mask_420_w16_8bpc_neon:        528.1       889
      w_mask_420_w32_8bpc_c:         31851.8   42808.6
      w_mask_420_w32_8bpc_neon:       2062.5    3380.8
      w_mask_420_w64_8bpc_c:         79268.5  102683.9
      w_mask_420_w64_8bpc_neon:       5252.9    8575.4
      w_mask_420_w128_8bpc_c:       193704.1  255586.5
      w_mask_420_w128_8bpc_neon:     14602.3   22167.7

      w_mask_422_w4_8bpc_c:            777.3    1038.5
      w_mask_422_w4_8bpc_neon:          72.1     112.9
      w_mask_422_w8_8bpc_c:           2405.7      3168
      w_mask_422_w8_8bpc_neon:         191.9     314.1
      w_mask_422_w16_8bpc_c:          7783.7   10543.9
      w_mask_422_w16_8bpc_neon:        559.8     835.5
      w_mask_422_w32_8bpc_c:         30895.7   41141.2
      w_mask_422_w32_8bpc_neon:       2089.7    3187.2
      w_mask_422_w64_8bpc_c:         75500.2   98766.3
      w_mask_422_w64_8bpc_neon:         5379    8208.2
      w_mask_422_w128_8bpc_c:       186967.1  245809.1
      w_mask_422_w128_8bpc_neon:     15159.9   21474.5

      w_mask_444_w4_8bpc_c:            850.1    1136.6
      w_mask_444_w4_8bpc_neon:          66.5     104.7
      w_mask_444_w8_8bpc_c:           2373.5    3262.9
      w_mask_444_w8_8bpc_neon:         180.5     290.2
      w_mask_444_w16_8bpc_c:          7291.6   10590.7
      w_mask_444_w16_8bpc_neon:        550.9     809.7
      w_mask_444_w32_8bpc_c:          8048.3   10140.8
      w_mask_444_w32_8bpc_neon:       2136.2      3095
      w_mask_444_w64_8bpc_c:         18055.3     23060
      w_mask_444_w64_8bpc_neon:       5522.5    8124.8
      w_mask_444_w128_8bpc_c:        42754.3     56072
      w_mask_444_w128_8bpc_neon:     15569.5   21531.5
  2. 14 Aug, 2019 2 commits
    • arm64: mc: NEON implementation of blend, blend_h and blend_v function · 1dc2dc7d
      B Krishnan Iyer authored
                         	A73	A53
      blend_h_w2_8bpc_c:	184.7	301.5
      blend_h_w2_8bpc_neon:	58.8	104.1
      blend_h_w4_8bpc_c:	291.4	507.3
      blend_h_w4_8bpc_neon:	48.7	108.9
      blend_h_w8_8bpc_c:	510.1	992.7
      blend_h_w8_8bpc_neon:	66.5	99.3
      blend_h_w16_8bpc_c:	972	1835.3
      blend_h_w16_8bpc_neon:	82.7	145.2
      blend_h_w32_8bpc_c:	776.7	912.9
      blend_h_w32_8bpc_neon:	155.1	266.9
      blend_h_w64_8bpc_c:	1424.3	1635.4
      blend_h_w64_8bpc_neon:	273.4	480.9
      blend_h_w128_8bpc_c:	3318.1	3774
      blend_h_w128_8bpc_neon:	614.1	1097.9
      blend_v_w2_8bpc_c:	278.8	427.5
      blend_v_w2_8bpc_neon:	113.7	170.4
      blend_v_w4_8bpc_c:	960.2	1597.7
      blend_v_w4_8bpc_neon:	222.9	351.4
      blend_v_w8_8bpc_c:	1694.2	3333.5
      blend_v_w8_8bpc_neon:	200.9	333.6
      blend_v_w16_8bpc_c:	3115.2	5971.6
      blend_v_w16_8bpc_neon:	233.2	494.8
      blend_v_w32_8bpc_c:	3949.7	6070.6
      blend_v_w32_8bpc_neon:	460.4	841.6
      blend_w4_8bpc_c:	244.2	388.3
      blend_w4_8bpc_neon:	25.5	66.7
      blend_w8_8bpc_c:	616.3	1120.8
      blend_w8_8bpc_neon:	46	110.7
      blend_w16_8bpc_c:	2193.1	4056.4
      blend_w16_8bpc_neon:	140.7	299.3
      blend_w32_8bpc_c:	2502.8	2998.5
      blend_w32_8bpc_neon:	381.4	725.3
    • Michael Bradshaw's avatar
      d20d70e8
  3. 13 Aug, 2019 4 commits
  4. 10 Aug, 2019 2 commits
  5. 09 Aug, 2019 3 commits
  6. 08 Aug, 2019 5 commits
    • Avoid CDF overreads in gather_top_partition_prob() · d8799d94
      Henrik Gramner authored
      Explicitly take advantage of the fact that certain probabilities are zero
      instead of loading zeros from the CDF padding.
      
      The current code works just fine, but only because those values happen to
      be zero due to what is essentially an implementation detail.
    • Set thread names on MacOS · fa32f2de
      Henrik Gramner authored
    • Set thread names on Windows 10 · 6c3e85de
      Henrik Gramner authored
    • arm: mc: Speed up due to memory alignment in ldr/str instructions · b0d00020
      B Krishnan Iyer authored
      blend/blend_h/blend_v:
      
      Before:               Cortex A7      A8      A9     A53     A72     A73
      blend_h_w2_8bpc_neon:     169.5   194.2   153.1   134.0    63.0    72.6
      blend_h_w4_8bpc_neon:     164.4   171.8   142.2   137.8    60.5    60.2
      blend_h_w8_8bpc_neon:     184.8   121.0   146.5   123.4    55.9    63.1
      blend_h_w16_8bpc_neon:    291.0   178.6   237.3   181.0    88.6    83.9
      blend_h_w32_8bpc_neon:    531.9   321.5   432.2   358.3   155.6   156.2
      blend_h_w64_8bpc_neon:    957.6   600.3   827.4   631.2   279.7   268.4
      blend_h_w128_8bpc_neon:  2161.5  1398.4  1931.8  1403.4   607.0   597.9
      blend_v_w2_8bpc_neon:     249.3   373.4   269.2   195.6   107.9   117.6
      blend_v_w4_8bpc_neon:     451.7   676.1   555.3   376.1   198.6   266.9
      blend_v_w8_8bpc_neon:     561.0   475.2   607.6   357.0   213.9   204.1
      blend_v_w16_8bpc_neon:    928.4   626.8   823.8   592.3   269.9   245.3
      blend_v_w32_8bpc_neon:   1477.6  1024.8  1186.6   994.5   346.6   370.0
      blend_w4_8bpc_neon:       103.3   113.0    86.2    91.5    38.6    35.2
      blend_w8_8bpc_neon:       174.9   116.6   137.1   123.1    50.8    55.0
      blend_w16_8bpc_neon:      533.0   334.3   446.6   348.6   150.7   155.4
      blend_w32_8bpc_neon:     1299.2   836.8  1170.7   909.9   370.5   386.3
      
      After:
      blend_h_w2_8bpc_neon:     169.6   169.8   140.9   134.0    62.3    72.5
      blend_h_w4_8bpc_neon:     164.5   149.1   127.6   137.7    59.1    60.1
      blend_h_w8_8bpc_neon:     184.9   102.7   126.3   123.4    54.9    63.2
      blend_h_w16_8bpc_neon:    291.0   163.8   232.1   180.9    88.4    83.9
      blend_h_w32_8bpc_neon:    531.2   285.6   422.6   358.4   155.5   155.9
      blend_h_w64_8bpc_neon:    956.0   541.9   809.9   631.6   280.0   270.6
      blend_h_w128_8bpc_neon:  2159.0  1253.6  1889.0  1404.8   606.2   600.5
      blend_v_w2_8bpc_neon:     249.9   362.0   269.4   195.6   107.8   117.6
      blend_v_w4_8bpc_neon:     452.6   541.6   538.2   376.1   199.5   266.9
      blend_v_w8_8bpc_neon:     561.0   348.9   551.3   357.7   214.3   204.4
      blend_v_w16_8bpc_neon:    926.8   510.9   785.0   592.1   270.7   245.8
      blend_v_w32_8bpc_neon:   1474.4   913.3  1151.4   995.7   347.5   371.2
      blend_w4_8bpc_neon:       103.3    96.6    76.9    91.5    33.7    35.3
      blend_w8_8bpc_neon:       174.9    88.2   114.8   123.1    51.5    55.0
      blend_w16_8bpc_neon:      532.8   282.2   445.3   348.5   149.8   155.7
      blend_w32_8bpc_neon:     1295.1   735.2  1122.8   908.4   372.0   386.5
      
      w_mask_444/422/420:
      
      Before:                    Cortex A7        A8        A9       A53       A72      A73
      w_mask_420_w4_8bpc_neon:       218.1     144.4     187.3     152.7      86.9     89.0
      w_mask_420_w8_8bpc_neon:       544.0     393.7     437.0     372.5     211.1    230.9
      w_mask_420_w16_8bpc_neon:     1537.2    1063.5    1182.3    1024.3     566.4    667.7
      w_mask_420_w32_8bpc_neon:     5734.7    4207.2    4716.8    3822.8    2340.5   2521.3
      w_mask_420_w64_8bpc_neon:    14317.6   10165.0   13220.2    9578.5    5578.9   5989.9
      w_mask_420_w128_8bpc_neon:   37932.8   25299.1   39562.9   25203.8   14916.4  15465.1
      w_mask_422_w4_8bpc_neon:       206.8     141.4     177.9     143.4      82.1     84.8
      w_mask_422_w8_8bpc_neon:       511.8     380.8     416.7     342.5     198.5    221.7
      w_mask_422_w16_8bpc_neon:     1632.8    1154.4    1282.9    1061.2     595.3    684.9
      w_mask_422_w32_8bpc_neon:     6087.8    4560.3    5173.3    3945.8    2319.1   2608.7
      w_mask_422_w64_8bpc_neon:    15183.7   11013.9   14435.6    9904.6    5449.9   6100.9
      w_mask_422_w128_8bpc_neon:   39951.2   27441.0   42398.2   25995.1   14624.9  15529.2
      w_mask_444_w4_8bpc_neon:       193.4     127.0     170.0     135.4      76.8     81.4
      w_mask_444_w8_8bpc_neon:       477.8     340.0     427.9     319.3     187.2    214.7
      w_mask_444_w16_8bpc_neon:     1529.0    1058.8    1209.4     987.0     571.7    677.3
      w_mask_444_w32_8bpc_neon:     5687.9    4166.9    4882.4    3667.0    2286.8   2518.7
      w_mask_444_w64_8bpc_neon:    14394.7   10055.1   14057.9    9372.0    5369.3   5898.7
      w_mask_444_w128_8bpc_neon:   37952.0   25008.8   42169.9   24988.8   22973.7  15241.1
      
      After:
      w_mask_420_w4_8bpc_neon:       219.7     120.7     178.0     152.7      87.2     89.0
      w_mask_420_w8_8bpc_neon:       547.5     355.2     404.4     372.4     211.4    231.0
      w_mask_420_w16_8bpc_neon:     1540.9     987.1    1113.0    1024.9     567.4    669.5
      w_mask_420_w32_8bpc_neon:     5915.4    3905.8    4516.8    3929.3    2363.7   2523.6
      w_mask_420_w64_8bpc_neon:    14860.9    9437.1   12609.7    9586.4    5627.3   6005.8
      w_mask_420_w128_8bpc_neon:   38799.1   23536.1   38598.3   24787.7   14595.7  15474.9
      w_mask_422_w4_8bpc_neon:       208.3     115.4     168.6     143.4      82.4     84.8
      w_mask_422_w8_8bpc_neon:       515.2     335.7     383.2     342.5     198.9    221.8
      w_mask_422_w16_8bpc_neon:     1643.2    1053.6    1199.3    1062.2     595.6    685.7
      w_mask_422_w32_8bpc_neon:     6335.1    4161.0    4959.3    4088.5    2353.0   2606.4
      w_mask_422_w64_8bpc_neon:    15689.4   10039.8   13806.1    9937.7    5535.3   6099.8
      w_mask_422_w128_8bpc_neon:   40754.4   25033.3   41390.5   25683.7   14668.8  15537.1
      w_mask_444_w4_8bpc_neon:       194.9     107.4     162.0     135.4      77.1     81.4
      w_mask_444_w8_8bpc_neon:       481.1     300.2     422.0     319.1     187.6    214.6
      w_mask_444_w16_8bpc_neon:     1542.6     956.1    1137.7     988.4     572.4    677.5
      w_mask_444_w32_8bpc_neon:     5896.1    3766.1    4731.9    3801.2    2322.9   2521.8
      w_mask_444_w64_8bpc_neon:    14814.0    9084.7   13515.4    9311.0    5497.3   5896.3
      w_mask_444_w128_8bpc_neon:   38587.7   22615.2   41389.9   24639.4   17705.8  15244.3
    • Martin Storsjö's avatar
  7. 07 Aug, 2019 1 commit
  8. 02 Aug, 2019 2 commits
  9. 28 Jul, 2019 1 commit
  10. 27 Jul, 2019 5 commits
  11. 25 Jul, 2019 1 commit
  12. 23 Jul, 2019 4 commits
    • arm: mc: neon: Merge load and other related operations in blend/blend_h/blend_v functions · 407c27db
      B Krishnan Iyer authored
      	                        A73		A53
      	                Current	Earlier	Current	Earlier
      blend_h_w2_8bpc_neon:	71.1	74.1	132.7	137.5
      blend_h_w4_8bpc_neon:	60.2	65.8	137.5	147.1
      blend_h_w8_8bpc_neon:	62.2	68.9	123.1	131.7
      blend_h_w16_8bpc_neon:	82.1	86	180.7	190.3
      blend_h_w32_8bpc_neon:	149.9	149.2	358.3	358
      blend_h_w64_8bpc_neon:	265.3	263.1	630.2	629.8
      blend_h_w128_8bpc_neon:	579.5	571	1404.4	1404.5
      blend_v_w2_8bpc_neon:	118.7	118.7	193.2	195.3
      blend_v_w4_8bpc_neon:	248.6	245.8	373.4	357.3
      blend_v_w8_8bpc_neon:	202.7	202	356.4	357.2
      blend_v_w16_8bpc_neon:	238.8	234.8	590.4	591.3
      blend_v_w32_8bpc_neon:	346.7	344.4	993.7	994.7
      blend_w4_8bpc_neon:	33.5	37.5	90.7	96.7
      blend_w8_8bpc_neon:	49.7	53	123.3	123.3
      blend_w16_8bpc_neon:	151.8	151	348.8	332.4
      blend_w32_8bpc_neon:	372.9	370.9	908.3	908.4
    • arm: mc: neon: Reduce usage of general purpose registers in blend/blend_v functions · d4df8619
      B Krishnan Iyer authored
      	                	A73		A53
                      	Current	Earlier	Current	Earlier
      blend_h_w2_8bpc_neon:	74.1	74.1	137.5	137.5
      blend_h_w4_8bpc_neon:	65.8	65.8	147.1	147.1
      blend_h_w8_8bpc_neon:	68.9	68.7	131.7	131.7
      blend_h_w16_8bpc_neon:	86	85.6	190.3	190.4
      blend_h_w32_8bpc_neon:	149.2	149.8	358	358.3
      blend_h_w64_8bpc_neon:	263.1	264.1	629.8	630.3
      blend_h_w128_8bpc_neon:	571	575.4	1404.5	1404.2
      blend_v_w2_8bpc_neon:	118.7	120.1	195.3	196.4
      blend_v_w4_8bpc_neon:	245.8	247.2	357.3	358.4
      blend_v_w8_8bpc_neon:	202	204.2	357.2	358.4
      blend_v_w16_8bpc_neon:	234.8	238.5	591.3	591.8
      blend_v_w32_8bpc_neon:	344.4	347.2	994.7	997.2
      blend_w4_8bpc_neon:	37.5	38.3	96.7	98.7
      blend_w8_8bpc_neon:	53	54.8	123.3	125.3
      blend_w16_8bpc_neon:	151	150.8	332.4	334.5
      blend_w32_8bpc_neon:	370.9	361.6	908.4	910.7
    • arm: mc: neon: Use vld with ! post-increment instead of a register in blend/blend_h/blend_v function · b704a993
      B Krishnan Iyer authored
      	                        A73		A53
      	                Current	Earlier	Current	Earlier
      blend_h_w2_8bpc_neon:	74.1	74.6	137.5	137
      blend_h_w4_8bpc_neon:	65.8	66	147.1	146.6
      blend_h_w8_8bpc_neon:	68.7	68.6	131.7	131.2
      blend_h_w16_8bpc_neon:	85.6	85.9	190.4	192
      blend_h_w32_8bpc_neon:	149.8	149.8	358.3	357.6
      blend_h_w64_8bpc_neon:	264.1	262.8	630.3	629.5
      blend_h_w128_8bpc_neon:	575.4	577	1404.2	1402
      blend_v_w2_8bpc_neon:	120.1	121.3	196.4	195.5
      blend_v_w4_8bpc_neon:	247.2	247.5	358.4	358.5
      blend_v_w8_8bpc_neon:	204.2	205.2	358.4	358.5
      blend_v_w16_8bpc_neon:	238.5	237.1	591.8	590.5
      blend_v_w32_8bpc_neon:	347.2	345.8	997.2	994.1
      blend_w4_8bpc_neon:	38.3	38.6	98.7	99.2
      blend_w8_8bpc_neon:	54.8	55.1	125.3	125.8
      blend_w16_8bpc_neon:	150.8	150.1	334.5	344
      blend_w32_8bpc_neon:	361.6	360.4	910.7	910.9
    • tools: add a simple player example · 5ab6d231
      Marvin Scholz authored
  13. 17 Jul, 2019 1 commit
  14. 15 Jul, 2019 1 commit
    • Set thread names on Linux · 15a93861
      Emmanuel Gil Peyrot authored
      This uses the Linux-only prctl(PR_SET_NAME, …) call. glibc’s
      pthread_setname_np() makes exactly the same call, so there is no
      reason to use it instead: it isn’t any more portable.
      
      I don’t have any other OS to test this on, but if you want to add one,
      just add an #elif defined(__YOUR_OS__) before the #else in thread.h.
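As a rough illustration of the prctl(PR_SET_NAME, …) approach described in this commit, here is a minimal Linux-only sketch. The helper names set_current_thread_name() and get_current_thread_name() are hypothetical and are not taken from thread.h; only prctl(), PR_SET_NAME and PR_GET_NAME are real Linux interfaces.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical helper: name the calling thread.
 * The kernel keeps at most 15 bytes plus the terminating NUL and
 * silently truncates longer names. */
static int set_current_thread_name(const char *name) {
    assert(name != NULL);
    return prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
}

/* Hypothetical helper: read the calling thread's name back.
 * buf must have room for at least 16 bytes. */
static int get_current_thread_name(char buf[16]) {
    memset(buf, 0, 16);
    return prctl(PR_GET_NAME, (unsigned long)buf, 0, 0, 0);
}
```

Both helpers return 0 on success and -1 on failure, mirroring prctl() itself. Calling set_current_thread_name("demo-worker") at the start of a worker thread is enough for the name to show up in tools that read /proc/<pid>/task/<tid>/comm.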
  15. 13 Jul, 2019 1 commit
    • arm: mc: NEON implementation of w_mask_444/422/420 function · b271590a
      B Krishnan Iyer authored
                                         A73       A53

      w_mask_420_w4_8bpc_c:            797.5    1072.7
      w_mask_420_w4_8bpc_neon:          85.6     152.7
      w_mask_420_w8_8bpc_c:           2344.3    3118.7
      w_mask_420_w8_8bpc_neon:         221.9     372.4
      w_mask_420_w16_8bpc_c:          7429.9    9702.1
      w_mask_420_w16_8bpc_neon:        620.4    1024.1
      w_mask_420_w32_8bpc_c:         27498.2   37205.7
      w_mask_420_w32_8bpc_neon:       2394.1      3838
      w_mask_420_w64_8bpc_c:         66495.8   88721.3
      w_mask_420_w64_8bpc_neon:       6081.4      9630
      w_mask_420_w128_8bpc_c:       163369.3    219494
      w_mask_420_w128_8bpc_neon:     16015.7   24969.3
      w_mask_422_w4_8bpc_c:            858.3    1100.2
      w_mask_422_w4_8bpc_neon:          81.5     143.1
      w_mask_422_w8_8bpc_c:           2447.5    3284.6
      w_mask_422_w8_8bpc_neon:         217.5     342.4
      w_mask_422_w16_8bpc_c:          7673.4   10135.9
      w_mask_422_w16_8bpc_neon:        632.5    1062.6
      w_mask_422_w32_8bpc_c:         28344.9     39090
      w_mask_422_w32_8bpc_neon:       2393.4    3963.8
      w_mask_422_w64_8bpc_c:         68159.6     93447
      w_mask_422_w64_8bpc_neon:       6015.7    9928.1
      w_mask_422_w128_8bpc_c:       169501.2  231702.7
      w_mask_422_w128_8bpc_neon:     15847.5   25803.4
      w_mask_444_w4_8bpc_c:            674.6     862.3
      w_mask_444_w4_8bpc_neon:          80.2     135.4
      w_mask_444_w8_8bpc_c:           2031.4      2693
      w_mask_444_w8_8bpc_neon:         209.3     318.7
      w_mask_444_w16_8bpc_c:            6576    8217.4
      w_mask_444_w16_8bpc_neon:        627.3     986.2
      w_mask_444_w32_8bpc_c:         26051.7   31593.9
      w_mask_444_w32_8bpc_neon:         2374    3671.6
      w_mask_444_w64_8bpc_c:           63600   75849.9
      w_mask_444_w64_8bpc_neon:         5957    9335.5
      w_mask_444_w128_8bpc_c:       156964.7  187932.4
      w_mask_444_w128_8bpc_neon:     15759.4   24549.5
  16. 08 Jul, 2019 1 commit
  17. 07 Jul, 2019 1 commit
  18. 06 Jul, 2019 1 commit
  19. 05 Jul, 2019 3 commits
    • Improve robustness of handling malloc failures · e2e56ab9
      Henrik Gramner authored
      Calling dav1d_get_picture() again after it has already returned with
      an error due to a memory allocation failure could result in crashes.
      
      Although doing so is not proper API usage, and the outcome is going
      to be unpredictable, we should at least try to avoid crashing.
    • Correctly return an error on malloc failure · c1a28d0e
      Henrik Gramner authored
      dav1d_submit_frame() could erroneously return 0 when tile data memory
      allocation failed.
      
      Fixes an assertion failure in dav1d_parse_obus().
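The class of bug fixed by this commit can be sketched in isolation. The function below is not dav1d_submit_frame(); submit_sketch(), its parameters and the 16-byte per-tile size are hypothetical, and the negative errno return value is an assumption about the error convention. The point is only that the error code has to be assigned before jumping to the shared cleanup path, otherwise the initial 0 is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a submit-style function.
 * simulate_oom is a test hook that forces the allocation to "fail". */
static int submit_sketch(size_t n_tiles, int simulate_oom, void **out) {
    assert(out != NULL);
    int res = 0;
    void *tiles = simulate_oom ? NULL : malloc(n_tiles * 16);
    if (!tiles) {
        /* Without this assignment the function would fall through to
         * the error label and return the initial value 0 (success). */
        res = -ENOMEM;
        goto error;
    }
    *out = tiles;
    return 0;

error:
    *out = NULL;
    return res;
}
```

A caller that checks the return value then sees the failure instead of continuing with a NULL tile buffer.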
    • Fix potential memory leak · 0435ec9c
      Henrik Gramner authored
      In the (very unlikely) scenario of a pthread mutex/cond init failure
      in the tile state reallocation code, some newly allocated mutexes/conds
      could leak.
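One common way to avoid this kind of leak is to unwind every mutex/cond pair that was already initialized when a later init call fails. The sketch below assumes a hypothetical TileSync struct and helper names; it is not the dav1d tile state code, and fail_at exists only so the failure path can be exercised.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-tile synchronization state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
} TileSync;

/* Allocate and initialize n pairs. fail_at is a test-only hook:
 * pretend the cond init of element fail_at fails (-1 disables it). */
static TileSync *tile_sync_alloc(int n, int fail_at) {
    assert(n > 0);
    TileSync *ts = malloc(sizeof(*ts) * (size_t)n);
    if (!ts) return NULL;

    int i;
    for (i = 0; i < n; i++) {
        if (pthread_mutex_init(&ts[i].lock, NULL)) break;
        if (i == fail_at || pthread_cond_init(&ts[i].cond, NULL)) {
            /* The mutex of this element was already created. */
            pthread_mutex_destroy(&ts[i].lock);
            break;
        }
    }
    if (i == n) return ts;

    /* Unwind the fully initialized elements [0, i) so nothing leaks. */
    while (i-- > 0) {
        pthread_cond_destroy(&ts[i].cond);
        pthread_mutex_destroy(&ts[i].lock);
    }
    free(ts);
    return NULL;
}

/* Destroy and free an array returned by tile_sync_alloc(). */
static void tile_sync_free(TileSync *ts, int n) {
    for (int i = 0; i < n; i++) {
        pthread_cond_destroy(&ts[i].cond);
        pthread_mutex_destroy(&ts[i].lock);
    }
    free(ts);
}
```

On failure the function returns NULL with no mutexes or condition variables left alive, so the caller has nothing to clean up.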