1. 13 Aug, 2019 4 commits
  2. 10 Aug, 2019 2 commits
  3. 09 Aug, 2019 3 commits
  4. 08 Aug, 2019 5 commits
    • Avoid CDF overreads in gather_top_partition_prob() · d8799d94
      Henrik Gramner authored
      Explicitly take advantage of the fact that certain probabilities are zero
      instead of loading zeros from the CDF padding.
      
      The current code works just fine, but only because those values happen to
      be zero due to what is essentially an implementation detail.
      d8799d94
    • Set thread names on macOS · fa32f2de
      Henrik Gramner authored
      fa32f2de
    • Set thread names on Windows 10 · 6c3e85de
      Henrik Gramner authored
      6c3e85de
    • arm: mc: Speed up due to memory alignment in ldr/str instructions · b0d00020
      B Krishnan Iyer authored
      blend/blend_h/blend_v:
      
      Before:               Cortex A7      A8      A9     A53     A72     A73
      blend_h_w2_8bpc_neon:     169.5   194.2   153.1   134.0    63.0    72.6
      blend_h_w4_8bpc_neon:     164.4   171.8   142.2   137.8    60.5    60.2
      blend_h_w8_8bpc_neon:     184.8   121.0   146.5   123.4    55.9    63.1
      blend_h_w16_8bpc_neon:    291.0   178.6   237.3   181.0    88.6    83.9
      blend_h_w32_8bpc_neon:    531.9   321.5   432.2   358.3   155.6   156.2
      blend_h_w64_8bpc_neon:    957.6   600.3   827.4   631.2   279.7   268.4
      blend_h_w128_8bpc_neon:  2161.5  1398.4  1931.8  1403.4   607.0   597.9
      blend_v_w2_8bpc_neon:     249.3   373.4   269.2   195.6   107.9   117.6
      blend_v_w4_8bpc_neon:     451.7   676.1   555.3   376.1   198.6   266.9
      blend_v_w8_8bpc_neon:     561.0   475.2   607.6   357.0   213.9   204.1
      blend_v_w16_8bpc_neon:    928.4   626.8   823.8   592.3   269.9   245.3
      blend_v_w32_8bpc_neon:   1477.6  1024.8  1186.6   994.5   346.6   370.0
      blend_w4_8bpc_neon:       103.3   113.0    86.2    91.5    38.6    35.2
      blend_w8_8bpc_neon:       174.9   116.6   137.1   123.1    50.8    55.0
      blend_w16_8bpc_neon:      533.0   334.3   446.6   348.6   150.7   155.4
      blend_w32_8bpc_neon:     1299.2   836.8  1170.7   909.9   370.5   386.3
      
      After:
      blend_h_w2_8bpc_neon:     169.6   169.8   140.9   134.0    62.3    72.5
      blend_h_w4_8bpc_neon:     164.5   149.1   127.6   137.7    59.1    60.1
      blend_h_w8_8bpc_neon:     184.9   102.7   126.3   123.4    54.9    63.2
      blend_h_w16_8bpc_neon:    291.0   163.8   232.1   180.9    88.4    83.9
      blend_h_w32_8bpc_neon:    531.2   285.6   422.6   358.4   155.5   155.9
      blend_h_w64_8bpc_neon:    956.0   541.9   809.9   631.6   280.0   270.6
      blend_h_w128_8bpc_neon:  2159.0  1253.6  1889.0  1404.8   606.2   600.5
      blend_v_w2_8bpc_neon:     249.9   362.0   269.4   195.6   107.8   117.6
      blend_v_w4_8bpc_neon:     452.6   541.6   538.2   376.1   199.5   266.9
      blend_v_w8_8bpc_neon:     561.0   348.9   551.3   357.7   214.3   204.4
      blend_v_w16_8bpc_neon:    926.8   510.9   785.0   592.1   270.7   245.8
      blend_v_w32_8bpc_neon:   1474.4   913.3  1151.4   995.7   347.5   371.2
      blend_w4_8bpc_neon:       103.3    96.6    76.9    91.5    33.7    35.3
      blend_w8_8bpc_neon:       174.9    88.2   114.8   123.1    51.5    55.0
      blend_w16_8bpc_neon:      532.8   282.2   445.3   348.5   149.8   155.7
      blend_w32_8bpc_neon:     1295.1   735.2  1122.8   908.4   372.0   386.5
      
      w_mask_444/422/420:
      
      Before:                    Cortex A7        A8        A9       A53       A72      A73
      w_mask_420_w4_8bpc_neon:       218.1     144.4     187.3     152.7      86.9     89.0
      w_mask_420_w8_8bpc_neon:       544.0     393.7     437.0     372.5     211.1    230.9
      w_mask_420_w16_8bpc_neon:     1537.2    1063.5    1182.3    1024.3     566.4    667.7
      w_mask_420_w32_8bpc_neon:     5734.7    4207.2    4716.8    3822.8    2340.5   2521.3
      w_mask_420_w64_8bpc_neon:    14317.6   10165.0   13220.2    9578.5    5578.9   5989.9
      w_mask_420_w128_8bpc_neon:   37932.8   25299.1   39562.9   25203.8   14916.4  15465.1
      w_mask_422_w4_8bpc_neon:       206.8     141.4     177.9     143.4      82.1     84.8
      w_mask_422_w8_8bpc_neon:       511.8     380.8     416.7     342.5     198.5    221.7
      w_mask_422_w16_8bpc_neon:     1632.8    1154.4    1282.9    1061.2     595.3    684.9
      w_mask_422_w32_8bpc_neon:     6087.8    4560.3    5173.3    3945.8    2319.1   2608.7
      w_mask_422_w64_8bpc_neon:    15183.7   11013.9   14435.6    9904.6    5449.9   6100.9
      w_mask_422_w128_8bpc_neon:   39951.2   27441.0   42398.2   25995.1   14624.9  15529.2
      w_mask_444_w4_8bpc_neon:       193.4     127.0     170.0     135.4      76.8     81.4
      w_mask_444_w8_8bpc_neon:       477.8     340.0     427.9     319.3     187.2    214.7
      w_mask_444_w16_8bpc_neon:     1529.0    1058.8    1209.4     987.0     571.7    677.3
      w_mask_444_w32_8bpc_neon:     5687.9    4166.9    4882.4    3667.0    2286.8   2518.7
      w_mask_444_w64_8bpc_neon:    14394.7   10055.1   14057.9    9372.0    5369.3   5898.7
      w_mask_444_w128_8bpc_neon:   37952.0   25008.8   42169.9   24988.8   22973.7  15241.1
      
      After:
      w_mask_420_w4_8bpc_neon:       219.7     120.7     178.0     152.7      87.2     89.0
      w_mask_420_w8_8bpc_neon:       547.5     355.2     404.4     372.4     211.4    231.0
      w_mask_420_w16_8bpc_neon:     1540.9     987.1    1113.0    1024.9     567.4    669.5
      w_mask_420_w32_8bpc_neon:     5915.4    3905.8    4516.8    3929.3    2363.7   2523.6
      w_mask_420_w64_8bpc_neon:    14860.9    9437.1   12609.7    9586.4    5627.3   6005.8
      w_mask_420_w128_8bpc_neon:   38799.1   23536.1   38598.3   24787.7   14595.7  15474.9
      w_mask_422_w4_8bpc_neon:       208.3     115.4     168.6     143.4      82.4     84.8
      w_mask_422_w8_8bpc_neon:       515.2     335.7     383.2     342.5     198.9    221.8
      w_mask_422_w16_8bpc_neon:     1643.2    1053.6    1199.3    1062.2     595.6    685.7
      w_mask_422_w32_8bpc_neon:     6335.1    4161.0    4959.3    4088.5    2353.0   2606.4
      w_mask_422_w64_8bpc_neon:    15689.4   10039.8   13806.1    9937.7    5535.3   6099.8
      w_mask_422_w128_8bpc_neon:   40754.4   25033.3   41390.5   25683.7   14668.8  15537.1
      w_mask_444_w4_8bpc_neon:       194.9     107.4     162.0     135.4      77.1     81.4
      w_mask_444_w8_8bpc_neon:       481.1     300.2     422.0     319.1     187.6    214.6
      w_mask_444_w16_8bpc_neon:     1542.6     956.1    1137.7     988.4     572.4    677.5
      w_mask_444_w32_8bpc_neon:     5896.1    3766.1    4731.9    3801.2    2322.9   2521.8
      w_mask_444_w64_8bpc_neon:    14814.0    9084.7   13515.4    9311.0    5497.3   5896.3
      w_mask_444_w128_8bpc_neon:   38587.7   22615.2   41389.9   24639.4   17705.8  15244.3
      b0d00020
  5. 07 Aug, 2019 1 commit
  6. 02 Aug, 2019 2 commits
  7. 28 Jul, 2019 1 commit
  8. 27 Jul, 2019 5 commits
  9. 25 Jul, 2019 1 commit
  10. 23 Jul, 2019 4 commits
    • arm: mc: neon: Merge load and other related operations in blend/blend_h/blend_v functions · 407c27db
      B Krishnan Iyer authored
                                     A73               A53
                                Current  Earlier  Current  Earlier
      blend_h_w2_8bpc_neon:        71.1     74.1    132.7    137.5
      blend_h_w4_8bpc_neon:        60.2     65.8    137.5    147.1
      blend_h_w8_8bpc_neon:        62.2     68.9    123.1    131.7
      blend_h_w16_8bpc_neon:       82.1     86      180.7    190.3
      blend_h_w32_8bpc_neon:      149.9    149.2    358.3    358
      blend_h_w64_8bpc_neon:      265.3    263.1    630.2    629.8
      blend_h_w128_8bpc_neon:     579.5    571     1404.4   1404.5
      blend_v_w2_8bpc_neon:       118.7    118.7    193.2    195.3
      blend_v_w4_8bpc_neon:       248.6    245.8    373.4    357.3
      blend_v_w8_8bpc_neon:       202.7    202      356.4    357.2
      blend_v_w16_8bpc_neon:      238.8    234.8    590.4    591.3
      blend_v_w32_8bpc_neon:      346.7    344.4    993.7    994.7
      blend_w4_8bpc_neon:          33.5     37.5     90.7     96.7
      blend_w8_8bpc_neon:          49.7     53      123.3    123.3
      blend_w16_8bpc_neon:        151.8    151      348.8    332.4
      blend_w32_8bpc_neon:        372.9    370.9    908.3    908.4
      407c27db
    • arm: mc: neon: Reduce usage of general purpose registers in blend/blend_v functions · d4df8619
      B Krishnan Iyer authored
                                     A73               A53
                                Current  Earlier  Current  Earlier
      blend_h_w2_8bpc_neon:        74.1     74.1    137.5    137.5
      blend_h_w4_8bpc_neon:        65.8     65.8    147.1    147.1
      blend_h_w8_8bpc_neon:        68.9     68.7    131.7    131.7
      blend_h_w16_8bpc_neon:       86       85.6    190.3    190.4
      blend_h_w32_8bpc_neon:      149.2    149.8    358      358.3
      blend_h_w64_8bpc_neon:      263.1    264.1    629.8    630.3
      blend_h_w128_8bpc_neon:     571      575.4   1404.5   1404.2
      blend_v_w2_8bpc_neon:       118.7    120.1    195.3    196.4
      blend_v_w4_8bpc_neon:       245.8    247.2    357.3    358.4
      blend_v_w8_8bpc_neon:       202      204.2    357.2    358.4
      blend_v_w16_8bpc_neon:      234.8    238.5    591.3    591.8
      blend_v_w32_8bpc_neon:      344.4    347.2    994.7    997.2
      blend_w4_8bpc_neon:          37.5     38.3     96.7     98.7
      blend_w8_8bpc_neon:          53       54.8    123.3    125.3
      blend_w16_8bpc_neon:        151      150.8    332.4    334.5
      blend_w32_8bpc_neon:        370.9    361.6    908.4    910.7
      d4df8619
    • arm: mc: neon: Use vld with ! post-increment instead of a register in blend/blend_h/blend_v function · b704a993
      B Krishnan Iyer authored
                                     A73               A53
                                Current  Earlier  Current  Earlier
      blend_h_w2_8bpc_neon:        74.1     74.6    137.5    137
      blend_h_w4_8bpc_neon:        65.8     66      147.1    146.6
      blend_h_w8_8bpc_neon:        68.7     68.6    131.7    131.2
      blend_h_w16_8bpc_neon:       85.6     85.9    190.4    192
      blend_h_w32_8bpc_neon:      149.8    149.8    358.3    357.6
      blend_h_w64_8bpc_neon:      264.1    262.8    630.3    629.5
      blend_h_w128_8bpc_neon:     575.4    577     1404.2   1402
      blend_v_w2_8bpc_neon:       120.1    121.3    196.4    195.5
      blend_v_w4_8bpc_neon:       247.2    247.5    358.4    358.5
      blend_v_w8_8bpc_neon:       204.2    205.2    358.4    358.5
      blend_v_w16_8bpc_neon:      238.5    237.1    591.8    590.5
      blend_v_w32_8bpc_neon:      347.2    345.8    997.2    994.1
      blend_w4_8bpc_neon:          38.3     38.6     98.7     99.2
      blend_w8_8bpc_neon:          54.8     55.1    125.3    125.8
      blend_w16_8bpc_neon:        150.8    150.1    334.5    344
      blend_w32_8bpc_neon:        361.6    360.4    910.7    910.9
      b704a993
    • tools: add a simple player example · 5ab6d231
      Marvin Scholz authored
      5ab6d231
  11. 17 Jul, 2019 1 commit
  12. 15 Jul, 2019 1 commit
    • Set thread names on Linux · 15a93861
      Emmanuel Gil Peyrot authored
      This uses the Linux-only prctl(PR_SET_NAME, …) call. glibc’s
      pthread_setname_np() makes exactly the same call and isn’t any more
      portable, so there is no reason to use it instead.
      
      I don’t have any other OS to test this on, but to add one, insert an
      #elif defined(__YOUR_OS__) branch before the #else in thread.h.
      15a93861
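The prctl() approach described in this commit can be sketched as follows. This is a minimal illustration, not dav1d's actual code: the helper names set_current_thread_name() and get_current_thread_name() are hypothetical, and the non-Linux branch is an assumed no-op fallback.

```c
#include <assert.h>
#include <string.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical helpers for illustration; dav1d's real code lives in thread.h. */

/* Name the calling thread. Linux stores at most 16 bytes including the
 * terminating NUL and silently truncates longer names.
 * Returns 0 on success, -1 on failure (always 0 in the fallback branch). */
static int set_current_thread_name(const char *name) {
    assert(name != NULL);
#ifdef __linux__
    return prctl(PR_SET_NAME, name);
#else
    return 0; /* assumed no-op fallback on other systems */
#endif
}

/* Read the calling thread's name back into a 16-byte buffer. */
static int get_current_thread_name(char out[16]) {
#ifdef __linux__
    return prctl(PR_GET_NAME, out);
#else
    out[0] = '\0';
    return 0;
#endif
}
```

A worker thread would call set_current_thread_name() once at the top of its entry function, so the name shows up in debuggers and in /proc/<pid>/task/<tid>/comm.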
  13. 13 Jul, 2019 1 commit
    • arm: mc: NEON implementation of w_mask_444/422/420 function · b271590a
      B Krishnan Iyer authored
                                         A73        A53
      w_mask_420_w4_8bpc_c:            797.5     1072.7
      w_mask_420_w4_8bpc_neon:          85.6      152.7
      w_mask_420_w8_8bpc_c:           2344.3     3118.7
      w_mask_420_w8_8bpc_neon:         221.9      372.4
      w_mask_420_w16_8bpc_c:          7429.9     9702.1
      w_mask_420_w16_8bpc_neon:        620.4     1024.1
      w_mask_420_w32_8bpc_c:         27498.2    37205.7
      w_mask_420_w32_8bpc_neon:       2394.1     3838
      w_mask_420_w64_8bpc_c:         66495.8    88721.3
      w_mask_420_w64_8bpc_neon:       6081.4     9630
      w_mask_420_w128_8bpc_c:       163369.3   219494
      w_mask_420_w128_8bpc_neon:     16015.7    24969.3
      w_mask_422_w4_8bpc_c:            858.3     1100.2
      w_mask_422_w4_8bpc_neon:          81.5      143.1
      w_mask_422_w8_8bpc_c:           2447.5     3284.6
      w_mask_422_w8_8bpc_neon:         217.5      342.4
      w_mask_422_w16_8bpc_c:          7673.4    10135.9
      w_mask_422_w16_8bpc_neon:        632.5     1062.6
      w_mask_422_w32_8bpc_c:         28344.9    39090
      w_mask_422_w32_8bpc_neon:       2393.4     3963.8
      w_mask_422_w64_8bpc_c:         68159.6    93447
      w_mask_422_w64_8bpc_neon:       6015.7     9928.1
      w_mask_422_w128_8bpc_c:       169501.2   231702.7
      w_mask_422_w128_8bpc_neon:     15847.5    25803.4
      w_mask_444_w4_8bpc_c:            674.6      862.3
      w_mask_444_w4_8bpc_neon:          80.2      135.4
      w_mask_444_w8_8bpc_c:           2031.4     2693
      w_mask_444_w8_8bpc_neon:         209.3      318.7
      w_mask_444_w16_8bpc_c:          6576       8217.4
      w_mask_444_w16_8bpc_neon:        627.3      986.2
      w_mask_444_w32_8bpc_c:         26051.7    31593.9
      w_mask_444_w32_8bpc_neon:       2374       3671.6
      w_mask_444_w64_8bpc_c:         63600      75849.9
      w_mask_444_w64_8bpc_neon:       5957       9335.5
      w_mask_444_w128_8bpc_c:       156964.7   187932.4
      w_mask_444_w128_8bpc_neon:     15759.4    24549.5
      b271590a
  14. 08 Jul, 2019 1 commit
  15. 07 Jul, 2019 1 commit
  16. 06 Jul, 2019 1 commit
  17. 05 Jul, 2019 3 commits
    • Improve robustness of handling malloc failures · e2e56ab9
      Henrik Gramner authored
      Calling dav1d_get_picture() again after it has already returned with
      an error due to a memory allocation failure could result in crashes.
      
      Although doing so is not proper API usage, and the outcome is going
      to be unpredictable, we should at least try to avoid crashing.
      e2e56ab9
    • Correctly return an error on malloc failure · c1a28d0e
      Henrik Gramner authored
      dav1d_submit_frame() could erroneously return 0 when tile data memory
      allocation failed.
      
      Fixes an assertion failure in dav1d_parse_obus().
      c1a28d0e
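The class of bug fixed here, an allocation failure that is detected but not propagated as an error, can be illustrated with a small sketch. Everything below is hypothetical: TileBuf, tile_buf_reserve(), submit_frame(), and the injectable allocator are stand-ins, and the negative errno-style return value is an assumption rather than dav1d's exact error convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-frame tile data storage. */
typedef struct {
    unsigned char *data;
    size_t size;
} TileBuf;

/* Allocator is injectable so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t);

/* Grow the buffer to at least `size` bytes.
 * Returns 0 on success, -ENOMEM on failure (old buffer left untouched). */
static int tile_buf_reserve(TileBuf *tb, size_t size, alloc_fn alloc) {
    assert(tb != NULL);
    if (size <= tb->size) return 0;
    unsigned char *p = alloc(size);
    if (!p) return -ENOMEM;
    free(tb->data);
    tb->data = p;
    tb->size = size;
    return 0;
}

/* Simplified caller: the important part is that the error is forwarded
 * instead of falling through to the success path and returning 0. */
static int submit_frame(TileBuf *tb, size_t needed, alloc_fn alloc) {
    int res = tile_buf_reserve(tb, needed, alloc);
    if (res < 0) return res;
    /* ... decoding work would go here ... */
    return 0;
}

/* Test allocator that always fails. */
static void *failing_alloc(size_t n) {
    (void)n;
    return NULL;
}
```

Returning the error at the point of failure keeps later stages from running on a frame whose tile data was never allocated.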
    • Fix potential memory leak · 0435ec9c
      Henrik Gramner authored
      In the (very unlikely) scenario of a pthread mutex/cond init failure
      in the tile state reallocation code, some newly allocated mutexes/conds
      could leak.
      0435ec9c
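One common way to avoid this kind of leak is to unwind every object that was already initialised when a later init call fails. The sketch below shows that pattern under stated assumptions: TileSync and both functions are hypothetical names, and this is not the actual dav1d fix.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-tile synchronisation state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
} TileSync;

/* Destroy the first n fully initialised entries. */
static void tile_sync_destroy_all(TileSync *ts, int n) {
    while (n-- > 0) {
        pthread_cond_destroy(&ts[n].cond);
        pthread_mutex_destroy(&ts[n].lock);
    }
}

/* Initialise n entries. On failure, everything initialised so far is
 * destroyed again so nothing leaks. Returns 0 on success, -1 on failure. */
static int tile_sync_init_all(TileSync *ts, int n) {
    assert(n >= 0);
    int i;
    for (i = 0; i < n; i++) {
        if (pthread_mutex_init(&ts[i].lock, NULL)) break;
        if (pthread_cond_init(&ts[i].cond, NULL)) {
            /* Entry i is half-initialised: undo its mutex first. */
            pthread_mutex_destroy(&ts[i].lock);
            break;
        }
    }
    if (i == n) return 0;
    tile_sync_destroy_all(ts, i); /* entries 0..i-1 were complete */
    return -1;
}
```

The half-initialised entry is handled separately from the completed ones, which is the case that is easiest to miss.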
  18. 02 Jul, 2019 3 commits
    • arm: mc: neon: Improvement in blend_v function · 632b4876
      B Krishnan Iyer authored
                                     A73               A53
                                Earlier      Now  Earlier      Now
      blend_v_w2_8bpc_neon:       122.1    121.3    195.5    195.5
      blend_v_w4_8bpc_neon:       248.2    247.5    375.6    358.5
      blend_v_w8_8bpc_neon:       210.3    205.2    375.6    358.5
      blend_v_w16_8bpc_neon:      252.7    237.1    579.2    590.5
      blend_v_w32_8bpc_neon:      347      345.8    997.4    994.1
      632b4876
    • Reduce the size of frame threading buffers · 65ba279b
      Henrik Gramner authored
      Avoid allocating significantly more memory than what is actually used.
      65ba279b
    • Consolidate scratch buffers · 0276455d
      Henrik Gramner authored
      Also eliminate some pointer chasing by allocating tile context buffers
      as part of the struct instead of having the struct contain pointers to
      separately allocated buffers.
      0276455d
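The difference between pointing at separately allocated buffers and embedding them in the struct can be sketched like this. The struct names, field names, and buffer sizes are all hypothetical and chosen only for illustration; they do not reflect dav1d's real tile context layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Before (illustrative): each access loads a pointer from the struct,
 * then dereferences it, and each buffer needs its own allocation. */
typedef struct {
    int16_t *coef;
    uint8_t *edge;
} TileCtxSeparate;

/* After (illustrative): the buffers are part of the struct, so one
 * allocation covers everything and a buffer's address is just the
 * struct address plus a constant offset. */
typedef struct {
    int16_t coef[1024];
    uint8_t edge[256];
} TileCtxEmbedded;

/* Allocate a zeroed context with a single call. Returns NULL on failure. */
static TileCtxEmbedded *tile_ctx_new(void) {
    return calloc(1, sizeof(TileCtxEmbedded));
}

/* Accessor showing that no extra pointer load is needed. */
static uint8_t *tile_ctx_edge(TileCtxEmbedded *t) {
    assert(t != NULL);
    return t->edge;
}
```

With the embedded layout there is also a single free() and a single failure point at allocation time, which simplifies the error paths.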