
GSoC/2018/StatusReports/ivanyossiIván: Difference between revisions

From KDE Community Wiki
Ghevan (talk | contribs)
Created page with "== Optimize Krita Soft, Gaussian and Stamp brushes mask generation to use AVX with Vc Library == Krita digital painting app relies on quick painting response to give a natural..."
 
== Optimize Krita Soft, Gaussian and Stamp brushes mask generation to use AVX with Vc Library ==
Krita, a digital painting application, relies on a quick painting response to give a natural experience. A painted line is composed of thousands of images, called dabs, placed one after the other; each dab is masked to generate a different brush tip shape. Mask creation and stamping on the canvas must be very fast, as they are done up to thousands of times per second (a small 300x300 px brush with 10% spacing produces around 600 dabs per second). If applying the images to the canvas is not fast enough, the painting process is compromised and the enjoyment of painting is reduced.


To optimize mask creation we can use the AVX instruction set to apply a transformation to a whole vector of data in one step. In this case the data are the coordinates of the image components that make up the mask. One way of programming AVX is in assembly, but this is neither manageable nor future-proof, as newer processors will come out with new, enhanced instruction sets. To stay future-proof, Krita has opted to use the Vc optimization library, which translates C++ code templates to assembly code tailored to the user’s processor features.
=== Summary ===
* '''Project Name:''' Optimize Krita Soft, Gaussian and Stamp brushes mask generation to use AVX with Vc Library
* '''Proposal:''' [https://docs.google.com/document/d/1TfAJnq_ZD3omGgsdGApBH_05B-w_OMC428UoHIJ0KWk/edit View Proposal]
* '''Abstract:''' Digital painting apps rely on a quick painting response to give a natural experience. A painted line is composed of thousands of images, called dabs, placed one after the other; each dab is masked to generate a different brush tip shape. As mask shapes become bigger and more complex, rendering them can be costly and painting becomes laggy. This project seeks to minimize the time spent generating the mask by implementing the generator with the AVX instruction set. The Vc library is used to interface with the SIMD operations. Testing suggests speed gains of up to 10 times, which improves the workflow when using big brushes or complex multibrushes.


=== Project Goals ===
Implement Mask AVX optimization (Mask Type / Status, task)
* Circular Gauss ''implemented, merged'' [https://phabricator.kde.org/T8734 T8734]
* Circular Soft ''implemented, in review'' [https://phabricator.kde.org/T8868 T8868]
* Rectangular Gaussian ''work in progress'' [https://phabricator.kde.org/T9010 T9010]
* Rectangular Soft
* Stamp Mask


=== Project related links ===
* [https://phabricator.kde.org/T8580 phabricator task]
* [https://colorathis.wordpress.com/tag/KDE/ Personal blog]
=== Code summaries ===
* '''34''' Commits
* '''[https://phabricator.kde.org/P237 Work done Differential]''' ''(code made until June 12, 2018)''
== Implementations Status ==
Status report on each goal implementation.
=== Unit Test: Similarity test ===
'''Goal:''' Test that the current mask generators produce the same mask representation.
This unit test makes sure the generated masks are equal to the dab shape agreed on by the Krita community. Mask shape equality ensures consistency between versions, and every implementation needs to adhere to the accepted shape (unless a new definition is decided upon).
'''Current Status:''' The current test verifies the equality between the old engine and the new AVX vectorized engine. The tolerance is set so that no pixel may differ by more than a brightness value of 2 (in 8-bit RGB space).
'''TODO:''' As the test stands, it only checks that both engines generate the same image; it does not check that the generated mask is consistent with the expected mask, that is, the one that was first implemented and is considered the reference look for that mask. I need to generate a set of images that reflect the current mask shapes and add a comparison between the stored expected shape and the generated one.
'''Challenges:''' Mask shape has many variants that affect size, ratio, fade and antialiasing. These options work in tandem, but in some situations an input variant will not alter the result, or needs to be tested separately. The test needs to cover as many variants in as few shapes as possible.
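The tolerance rule described above can be sketched in a few lines of C++. This is only an illustration of the idea, not Krita's test code: the function name <code>masksAreSimilar</code> and the use of a flat 8-bit buffer per mask are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not part of Krita): returns true when both 8-bit
// masks have the same size and no pixel differs by more than `tolerance`.
bool masksAreSimilar(const std::vector<std::uint8_t> &reference,
                     const std::vector<std::uint8_t> &candidate,
                     int tolerance = 2)
{
    if (reference.size() != candidate.size()) {
        return false;
    }
    for (std::size_t i = 0; i < reference.size(); ++i) {
        // Promote to int before subtracting so the difference cannot wrap.
        const int diff = std::abs(static_cast<int>(reference[i]) -
                                  static_cast<int>(candidate[i]));
        if (diff > tolerance) {
            return false;
        }
    }
    return true;
}
```

With the default tolerance, a pixel pair such as 10 and 12 is accepted while 10 and 13 is rejected, which matches the "no more than 2" rule stated in the current status.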
==== Related blog posts ====
* [https://colorathis.wordpress.com/2018/06/13/gsoc18_krta02/ Implementing dev and test environment]
==== Commits and Differentials ====
'''[https://phabricator.kde.org/T8581 Phabricator Task T8581]'''
* [https://phabricator.kde.org/R37:8fa826838aa903a0e615f4b1f0ebaa1405fa1e6d R37:8fa826838aa9: Adjust similarity Tolerance]
* [https://phabricator.kde.org/R37:5f60267ccd80e57e182c04a3c7625e522874a9f6 R37:5f60267ccd80: Modify similiarity test to try more mask variation]
* [https://phabricator.kde.org/R37:08efed86d2bb1365d87460de08d7755ea90636e2 R37:08efed86d2bb: Added CircleGauss to SimilarityTes]
* [https://phabricator.kde.org/R37:9963768b392bfc95637b2969b679172ff90a7b02 R37:9963768b392b: Correctly compare images alpha channel by setting fuzzy alpha and tolerance]
* [https://phabricator.kde.org/R37:d67ecdd905ad7022ae106a367b0f3271a7d30cc4 R37:d67ecdd905ad: Adhere code to coding style more strictly]
* [https://phabricator.kde.org/R37:db1ebe824c929a57b71fb3b2e6b38fe800f6e96e R37:db1ebe824c92: KisBrushMaskSimilarityTest:]
=== Circular Gauss ===
'''Goal:''' Implement Circular Gauss vectorized Mask generator using Vc
The Gaussian mask generator uses a Gauss function to control the fade of the mask shape. This makes it the slowest of all mask generators, since it calls the math ''erf()'' function twice for each pixel. The ''erf'' value can be approximated in a number of ways, but the standard math implementation does so to a very high level of precision, which makes it slow.
'''Current Status:''' Implemented and added to the master branch. ''Released in Krita 4.1''. Mask generation is 10 times faster to render. All tests pass, which shows that the scalar and vectorized implementations produce equivalent masks within the test tolerance. Code profiled, no bottlenecks or code issues found. Feature work is 100% complete.
'''TODO:''' Select the variants for the rendered test group to be included in the future.
'''Challenges:''' The Gaussian mask depends on correct ''erf()'' values, but no such function existed for Vc's vectorized data types. Implementing a correct and fast vectorized ''erf()'' in single-precision float was the biggest issue. The standard ''erf()'' not only works in double precision but also performs different operations depending on the input value. The implemented '''vcerf()''' takes into account that any value it receives is between zero and 255. Working case by case, we reproduced the precision needed to match the original scalar implementation.
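To give a feel for what a branch-free, single-precision ''erf()'' can look like, here is a minimal scalar sketch. It is not the '''vcerf()''' implementation from this project: it swaps in the well-known Abramowitz and Stegun approximation 7.1.26 (absolute error of about 1.5e-7 in exact arithmetic) and, like the text above, assumes non-negative inputs. The names <code>erfApprox</code> and <code>erfRow</code> are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch, not Krita's vcerf(): single-precision erf for x >= 0
// based on Abramowitz and Stegun formula 7.1.26. The body has no branches,
// so the same arithmetic could be applied to every lane of a SIMD vector.
inline float erfApprox(float x)
{
    const float p  = 0.3275911f;
    const float a1 = 0.254829592f;
    const float a2 = -0.284496736f;
    const float a3 = 1.421413741f;
    const float a4 = -1.453152027f;
    const float a5 = 1.061405429f;

    const float t = 1.0f / (1.0f + p * x);
    // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
    const float poly = ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t;
    return 1.0f - poly * std::exp(-x * x);
}

// Applies the approximation to a whole row of values, one element at a time.
inline void erfRow(const float *in, float *out, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = erfApprox(in[i]);
    }
}
```

Because every input follows the same sequence of operations, a compiler or a SIMD library can process several values per instruction, which is the property the standard double-precision ''erf()'' lacks when it branches on the input range.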
==== Related blog posts ====
* ''To be published''
==== Commits and Differentials ====
'''[https://phabricator.kde.org/T8734 Phabricator Task T8734]'''
* [https://phabricator.kde.org/R37:b55ed74ac98b6345e9885340e7385745de6d1957 R37:b55ed74ac98b: FIX: Gauss Circular Mask Antialiasing]
* [https://phabricator.kde.org/R37:45cf521214b566579cc6ad62c2d1f139727894df R37:45cf521214b5: FIX: Float precision bug masking issues for vectorized GaussMask generator]
* [https://phabricator.kde.org/R37:8dc950e705ee107e91f4e9607348931c19c2c14d R37:8dc950e705ee: FIX: Gauss Circular Mask Antialiasing]
* [https://phabricator.kde.org/R37:b395b05ef54d16bb7866ae13ed399b8e14bcdb78 R37:b395b05ef54d: FIX: Float precision bug masking issues for vectorized GaussMask generator]
* [https://phabricator.kde.org/R37:daac6985670c81165df05eb802448d03f6d6afd2 R37:daac6985670c: FIX: Missing Antialias on Vectorized Circular Gauss]
* [https://phabricator.kde.org/R37:df4cb29add283c6caaa1475ecb0fd0467e8b01dc R37:df4cb29add28: FIX: Missing Antialias on Vectorized Circular Gauss]
* [https://phabricator.kde.org/R37:884dcc104e3a408ae1839d1165013d471a6a6582 R37:884dcc104e3a: FIX: Missing Antialias on Vectorized Circular Gauss]
* [https://phabricator.kde.org/R37:08efed86d2bb1365d87460de08d7755ea90636e2 R37:08efed86d2bb: Added CircleGauss to SimilarityTest]
* [https://phabricator.kde.org/R37:37effe636a305debc2f936b9af76b6939b1f0e37 R37:37effe636a30: ADD: Vectorized CircularGaussMask, UnitTestPAssing]
* [https://phabricator.kde.org/R37:a9b6c3a4eb36960bf11b89c12794044d72d86b5e R37:a9b6c3a4eb36: ref T8734]
'''Differentials'''
* [https://phabricator.kde.org/D13052 D13052: Krita GaussMask AVX optimization full vectorized]
=== Circular Soft ===
'''Goal:''' Implement Circular Soft vectorized Mask generator using Vc
The Soft generator creates a mask based on curve values. The curve itself is generated elsewhere from the initial values of the mask generator. The curve is defined by a list of points in which 0 < x < 1 and 0 < y < 1. Fade generation uses the same object as the Circular Gauss.
'''Current Status:''' Implemented and awaiting review. Mask generation is 5 times faster; the change is less drastic than for the Gauss version because the scalar implementation was not as slow as the Gauss mask. All test variants pass. Profiling shows no significant time consumers. The feature set is implemented in full.
'''TODO:''' Apply review recommendations.
'''Challenges:''' Soft mask values are determined by a curve represented as a vector of gray values. The index of each value corresponds to the distance from the center of the mask. In a scalar approach, fetching values one by one by index is trivial. With Vc, however, the values need to sit next to each other in one array to allow the best optimization. Getting the scattered values from the vector into the Vc SIMD array was the main problem to solve. Luckily there was no need for an in-house implementation, as '''Vc''' has a method to gather values at given indexes from different regions of an array into a Vc array. Using this method and passing the data pointer of the vector gave quick access to the curve values.
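The gather operation mentioned above can be pictured with a plain C++ stand-in. The sketch below does not call Vc; it only mimics what a SIMD gather does, using a fixed-size array in place of a vector register. The lane count of 8 (eight floats in one 256-bit AVX register) and the names <code>kLanes</code> and <code>gatherLanes</code> are assumptions made for the illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumed lane count: eight 32-bit floats fill one 256-bit AVX register.
constexpr std::size_t kLanes = 8;

// Scalar stand-in for a SIMD gather (hypothetical, not the Vc API):
// reads table[indexes[i]] for every lane and packs the results side by
// side, so later arithmetic can treat them as one contiguous vector.
std::array<float, kLanes> gatherLanes(const float *table,
                                      const std::array<int, kLanes> &indexes)
{
    std::array<float, kLanes> lanes{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        lanes[i] = table[indexes[i]];
    }
    return lanes;
}
```

A caller would pass the data pointer of the curve vector, for example <code>gatherLanes(curve.data(), indexes)</code>, where each index is derived from the distance of a pixel to the mask center. A hardware gather performs the same lookups without the explicit loop.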
==== Related blog posts ====
* ''To be published''
==== Commits and Differentials ====
'''[https://phabricator.kde.org/T8868 Phabricator Task T8868]'''
* [https://phabricator.kde.org/R37:ae2f0e5cdaa10a5ca03745977819272b33726bed R37:ae2f0e5cdaa1: Adjust format and on CircSoft Mask FastRow]
* [https://phabricator.kde.org/R37:f6182887b9b550e310cf6fb895b069190753bde0 R37:f6182887b9b5: Modify maksBenchmark to create identical Soft Masks]
* [https://phabricator.kde.org/R37:e8de81d0db26b5206a481cc5148f6fe5650e482f R37:e8de81d0db26: - Soft Circular vectorized brush mask Add missing antialias modification for]
* [https://phabricator.kde.org/R37:dfae36961a09fd55dcbb2f05041c3b720a651990 R37:dfae36961a09: NEW: Implement Vectorized Soft Brush Mask Generator.]
'''Differentials'''
* [https://phabricator.kde.org/D13504 D13504: Krita SoftBrush AVX Mask generation Optim.]
=== Rectangular Gauss ===
'''Status:''' Currently studying the codebase
=== Rectangular Soft ===
'''Status:''' ''waiting''
=== Stamp Mask ===
'''Status:''' ''waiting''
== GSoC Work report chronicle ==
During the first week, in the community bonding period, I read the documentation and made a first proposal for the unit test to be used in the implementation process. This unit test has to compare the new mask shape with the legacy one and assert that they are similar within a certain error. The unit test works, but it is not as isolated as needed, and other brush preparations could interfere with the brush mask testing.
In the following weeks, before the coding phase, I spent more time on IRC and the forums, helping users where I could. I began reading more about Vc and Intel AVX and started to make a small map of the brush mask code to know exactly what was going on. A second version of the unit test was made; this time we went deeper into the code and handled the masks directly through the data pointer of '''KisMaskGenerator'''.
=== Coding phase ===
I spent the first week of the coding phase working out how to implement a fully featured Circular Gaussian. I ran into the problem of implementing an in-house '''erf''' for vectorized operations. Once this implementation passed the test, I made a quick painting test and ran the '''FreehandStrokeBenchmark''' to see whether there was more speed gain than with the first dummy implementation. The new implementation was very fast.
In the second week my mentor asked me to create a benchmark specifically for mask generation; the idea is to have even more evidence that the new vectorized version performs much better. The benchmark did not take long to implement, and testing confirmed the speed gains seen in the other test. I sent the code for review and it was suggested that I merge it.
Third and fourth weeks: the SoftBrush implementation was born, and during the tests and feature completion I realized that one feature was missing from the Gauss implementation: antialiasing. I ported the antialias code from the Soft mask to the Gauss mask (since both use the same logic in the scalar version), and while testing I discovered that the Gauss mask failed with some softness and fading values. The image confirmed the mask was not coming out properly. I spent the next two days finding the root cause and fixing the bug, which was caused by float imprecision and one bad guard condition. The fixes also applied to Soft Brush, and we finished the initial feature-complete implementation. I did not send it for review yet, as I wanted to do much more in-depth testing and optimization first.
We also used this time to help out a little with the new documentation platform. I sent two proposals to help automate the LaTeX version of the manual: [https://phabricator.kde.org/D13205 D13205], [https://phabricator.kde.org/D13204 D13204]. This should make it easier to deploy the PDF version of the documentation when it is needed.
This week ''Circular Soft'' was sent for review and work on the ''Rectangular Gauss'' began. I will continue to post more status updates in the following weeks.

Revision as of 04:31, 14 June 2018

Optimize Krita Soft, Gaussian and Stamp brushes mask generation to use AVX with Vc Library

Summary

  • Project Name: Optimize Krita Soft, Gaussian and Stamp brushes mask generation to use AVX with Vc Library
  • Proposal: View Proposal
  • Abstract: Digital painting apps rely on a quick painting response to give a natural experience. A painted line is composed of thousands of images, called dabs, placed one after the other; each dab is masked to generate a different brush tip shape. As mask shapes become bigger and more complex, rendering them can be costly and painting becomes laggy. This project seeks to minimize the time spent generating the mask by implementing the generator with the AVX instruction set. The Vc library is used to interface with the SIMD operations. Testing suggests speed gains of up to 10 times, which improves the workflow when using big brushes or complex multibrushes.

Project Goals

Implement Mask AVX optimization (Mask Type / Status, task)

  • Circular Gauss implemented, merged T8734
  • Circular Soft implemented, in review T8868
  • Rectangular Gaussian work in progress T9010
  • Rectangular Soft
  • Stamp Mask

Project related links

Code summaries

Implementations Status

Status report on each goal implementation.

Unit Test: Similarity test

Goal: Test that the current mask generators produce the same mask representation.

This unit test makes sure the generated masks are equal to the dab shape agreed on by the Krita community. Mask shape equality ensures consistency between versions, and every implementation needs to adhere to the accepted shape (unless a new definition is decided upon).

Current Status: The current test verifies the equality between the old engine and the new AVX vectorized engine. The tolerance is set so that no pixel may differ by more than a brightness value of 2 (in 8-bit RGB space).

TODO: As the test stands, it only checks that both engines generate the same image; it does not check that the generated mask is consistent with the expected mask, that is, the one that was first implemented and is considered the reference look for that mask. I need to generate a set of images that reflect the current mask shapes and add a comparison between the stored expected shape and the generated one.

Challenges: Mask shape has many variants that affect size, ratio, fade and antialiasing. These options work in tandem, but in some situations an input variant will not alter the result, or needs to be tested separately. The test needs to cover as many variants in as few shapes as possible.

Related blog posts

Commits and Differentials

Phabricator Task T8581

Circular Gauss

Goal: Implement Circular Gauss vectorized Mask generator using Vc

The Gaussian mask generator uses a Gauss function to control the fade of the mask shape. This makes it the slowest of all mask generators, since it calls the math erf() function twice for each pixel. The erf value can be approximated in a number of ways, but the standard math implementation does so to a very high level of precision, which makes it slow.

Current Status: Implemented and added to the master branch. Released in Krita 4.1. Mask generation is 10 times faster to render. All tests pass, which shows that the scalar and vectorized implementations produce equivalent masks within the test tolerance. Code profiled, no bottlenecks or code issues found. Feature work is 100% complete.

TODO: Select the variants for the rendered test group to be included in the future.

Challenges: The Gaussian mask depends on correct erf() values, but no such function existed for Vc's vectorized data types. Implementing a correct and fast vectorized erf() in single-precision float was the biggest issue. The standard erf() not only works in double precision but also performs different operations depending on the input value. The implemented vcerf() takes into account that any value it receives is between zero and 255. Working case by case, we reproduced the precision needed to match the original scalar implementation.

Related blog posts

  • To be published

Commits and Differentials

Phabricator Task T8734

Differentials

Circular Soft

Goal: Implement Circular Soft vectorized Mask generator using Vc

The Soft generator creates a mask based on curve values. The curve itself is generated elsewhere from the initial values of the mask generator. The curve is defined by a list of points in which 0 < x < 1 and 0 < y < 1. Fade generation uses the same object as the Circular Gauss.

Current Status: Implemented and awaiting review. Mask generation is 5 times faster; the change is less drastic than for the Gauss version because the scalar implementation was not as slow as the Gauss mask. All test variants pass. Profiling shows no significant time consumers. The feature set is implemented in full.

TODO: Apply review recommendations.

Challenges: Soft mask values are determined by a curve represented as a vector of gray values. The index of each value corresponds to the distance from the center of the mask. In a scalar approach, fetching values one by one by index is trivial. With Vc, however, the values need to sit next to each other in one array to allow the best optimization. Getting the scattered values from the vector into the Vc SIMD array was the main problem to solve. Luckily there was no need for an in-house implementation, as Vc has a method to gather values at given indexes from different regions of an array into a Vc array. Using this method and passing the data pointer of the vector gave quick access to the curve values.

Related blog posts

  • To be published

Commits and Differentials

Phabricator Task T8868

Differentials


Rectangular Gauss

Status: Currently studying the codebase

Rectangular Soft

Status: waiting

Stamp Mask

Status: waiting


GSoC Work report chronicle

During the first week, in the community bonding period, I read the documentation and made a first proposal for the unit test to be used in the implementation process. This unit test has to compare the new mask shape with the legacy one and assert that they are similar within a certain error. The unit test works, but it is not as isolated as needed, and other brush preparations could interfere with the brush mask testing.

In the following weeks, before the coding phase, I spent more time on IRC and the forums, helping users where I could. I began reading more about Vc and Intel AVX and started to make a small map of the brush mask code to know exactly what was going on. A second version of the unit test was made; this time we went deeper into the code and handled the masks directly through the data pointer of KisMaskGenerator.

Coding phase

I spent the first week of the coding phase working out how to implement a fully featured Circular Gaussian. I ran into the problem of implementing an in-house erf for vectorized operations. Once this implementation passed the test, I made a quick painting test and ran the FreehandStrokeBenchmark to see whether there was more speed gain than with the first dummy implementation. The new implementation was very fast.

In the second week my mentor asked me to create a benchmark specifically for mask generation; the idea is to have even more evidence that the new vectorized version performs much better. The benchmark did not take long to implement, and testing confirmed the speed gains seen in the other test. I sent the code for review and it was suggested that I merge it.

Third and fourth weeks: the SoftBrush implementation was born, and during the tests and feature completion I realized that one feature was missing from the Gauss implementation: antialiasing. I ported the antialias code from the Soft mask to the Gauss mask (since both use the same logic in the scalar version), and while testing I discovered that the Gauss mask failed with some softness and fading values. The image confirmed the mask was not coming out properly. I spent the next two days finding the root cause and fixing the bug, which was caused by float imprecision and one bad guard condition. The fixes also applied to Soft Brush, and we finished the initial feature-complete implementation. I did not send it for review yet, as I wanted to do much more in-depth testing and optimization first.

We also used this time to help out a little with the new documentation platform. I sent two proposals to help automate the LaTeX version of the manual: D13205, D13204. This should make it easier to deploy the PDF version of the documentation when it is needed.

This week Circular Soft was sent for review and work on the Rectangular Gauss began. I will continue to post more status updates in the following weeks.