Jump to content

GSoC/2018/StatusReports/ivanyossiIván

From KDE Community Wiki
Revision as of 17:53, 10 August 2018 by Ghevan (talk | contribs)

Optimize Krita Soft, Gaussian and Stamp brushes mask generation to use AVX with Vc Library

Summary

  • Project Name: Optimize Krita Soft, Gaussian and Stamp brushes mask generation to use AVX with Vc Library
  • Proposal: View Proposal
  • Abstract: Digital painting app relies on quick painting response to give a natural experience. A painted line is composed of thousands of images, called dabs, placed one after the other, each dab is masked to generate a different brush tip shape. As mask shapes are more complex and bigger, rendering them can be costly and painting becomes laggy. This project seeks to minimize the time spent generating the mask by implementing the generator using AVX instructions sets. Vc library is used to interface with the SIMD operations. Testing suggest the speed gains can be up to 10 times faster which improves the workflow using big brushes or complex multibrushes.

Project Goals

Implement Mask AVX optimization (Mask Type / Status, task)

  • Circular Gauss implemented, merged T8734
  • Circular Soft implemented, in revision T8868
  • Rectangular Gaussian working in progress T9010
  • Rectangular Soft
  • Stamp Mask

Project related links

Code summaries

Implementations Status

Status report on each goal implementation.

Unit Test: Similarity test

Goal: Test the current mask generators produce the same mask representation.

This unit test makes sure the masks generated are equal to the dab shape stated byt the Krita community. The mask shape equality ensures consistency between versions and every implementation needs to adhere to the shape accepted (unless a new definition is decided upon).

Current Status Current test verifies the equality between the old engine and new AVX vectorized engine. The similarity is adjusted such as no pixel is allowed to be different by more than a brightness value of 2 (in RGB 8-but space).

TODO As the test is right now, it only checks the mask are generating the same image, but it doesn't check the mask generated is consistent with the expected Mask, the one that was first implemented and consider the look for the mask in particular. I need to generate a set of images that reflect current mask shape and include a comparison between the stored expected shape and the generated one.

Challenges Mask shape has many variants that affect size, ratio, fade and antialias. Each of this operations work in tandem but in some situations input variants won't alter result, or need to be tested separately. The test needs to include as many variants in as few shapes as possible.

Related blog posts

Commits and Differentials

Phabricator Task T8581

Circular Gauss

Goal: Implement Circular Gauss vectorized Mask generator using Vc

Gaussian mask generator uses a Gauss function to control the fade of the mask shape. Because of that is the slowest of all mask generators, since it calls the math erf() function twice on each pixel. The erf number can be approximated in a number of ways, but the math implementation do so to a very high level of precision making it slow.

Current Status: Implemented and added to the master branch. Released in Krita 4.1. Mask generation is 10 times more faster to render. All tests pass which proves both scalar and vectorized implementation are identical. Code profiled, no bottle necks or code issues found. Feature work 100%

TODO: Select the variants for the rendered test group to be included in the future

Challenges Gaussian depends in the correct erf() values generation, but no such function existed for the vectorized data type of Vc. Implement a correct and quick vectorized erf() using single precision float was the biggest issue. The standard erf() not only works in double precision but it also makes different operations depending on the input value. The implemented vcerf() takes into account that any value it will receive is between zero and 255. Working with cases we replicated the precision needed to replicate the original Scalar implementation.

Related blog posts

  • To be published

Commits and Differentials

Phabricator Task T8734

Differentials

Circular Soft

Goal: Implement Circular Soft vectorized Mask generator using Vc

Soft Generator creates a Mask based on curve values. The curve itself is generated elsewhere using the initial values on the mask generator. The curve is defined by a list of points in which 0 < x < 1 and 0 < y < 1. Fade generation uses the same object as the Circular Gauss

Current Status: Implemented and awaiting revision. Mask generation improved by 5 times, the change is not as drastic as the Gauss version but this is because the scalar implementation was not as slow as Gauss Mask. All tests variants pass. Profiling code shows no time consumer. Feature set is implemented in full.

TODO: Apply review recommendations.

Challenges Soft Mask values are determined by a curve represented as a Vector of gray values. Each value index position corresponds to the distance to the center of the Mask. For a Scalar approach getting value one by one using an index is something trivial. On Vc however the values needs to be in an array next to the other to allow for the best optimization. Getting the space values from the vector into the Vc SIMD array was the main problem to solve. Luckily there was no need for in house implementation as Vc has a method to gather indexes from different regions of an array into the Vc Array. Using this method and passing the data pointer of the vector allowed to access the curve values quickly.

Related blog posts

  • To be published

Commits and Differentials

Phabricator Task T8868

Differentials


Rectangular Gauss

Status Currently studying codebase

Rectangular Soft

Status _waiting_

Stamp Mask

Status _waiting_


GSoC Work report chronicle

First week, during community bonding, I read the documentation and made a first proposal for the Unit test to be used in the implementation process. This Unit test has to compare the new mask shape and the legacy one and assert they are similar with a certain error. Unit test works ok, but it is not as isolated as needed and possibly other brush preparations used could interfere with the brush mask testing.

On the following weeks and previous to the coding phase I started to be more on IRC and the forums and help out the users I could. I began reading more about Vc and Intel AVX and started to make a small map of the code about brush masks to know exactly what was going on. A second version of the unit test was made, this time we went deeper into the code and managed the Masks directly from the pointer data of KisMaskGenerator.

Coding phase

I spend the first week of coding phase working on understanding how to implement a fully featured Circular Gaussian. I get into the problem of implementing an in house erf for vectorize operations. Once this implementation was passing the test I made a quick painting test and run the FreehandStrokeBenchmark to see if there was more speed gain than with the first dummy implementation. The new implementation was super fast.

Second week my mentor asked me to create a BenchMark specifically for the MaskGeneration, the idea behind this is to have even more evidence that we are getting much better performance from the new vectorized version. The benchmark did not take long to implement and testings confirm the speed gains seen on the other test. I sent the code for review and it was suggested I merged it.

Third and Fourth week: SoftBrush implementation was born and during the tests and feature competition I realized there was some features missing from the Gauss implementation. The feature in question was the antialiasing. I ported the antialias code from the Soft Mask to Gauss Mask (since both use the same logic in the scalar version), and while testing I discovered that with some softness and fading values Gauss Mask failed. The image confirmed the mask was not coming out properly. I spent the next two day finding the root cause and fixing the bug, caused by float imprecision and one bad guard condition. The fixes also applied to Soft Brush and we finished initial feature complete implementation. I did not sent for review yet as I wanted to do much more in deep testing and optimization first.

Also we used this time to help out a little with the new documentation platform. I specifically sent two proposals to help in the automatization of the LaTeX version of the manual D13205, D13204. this should make easier to deploy the PDF version of the documentation when its needed.

This week Circular Soft was sent for revision and work on the Rectangular Gauss began. I will continue to post more status updates in the following weeks.