GSoC/2019/StatusReports/DevanshuAgarwal

From KDE Community Wiki

Project Overview

Project Name: Statistical Analysis in LabPlot

Abstract: We aimed to add statistical analysis features to LabPlot. These features should report the correlation between data points and perform various hypothesis tests along with assumption checks. Our target audience includes both scientists and engineers, so we aimed to present results in a form that is detailed enough for someone without a statistics background, yet not distracting for someone who is only interested in the numbers.

Proposal

You can find my GSoC proposal here: https://docs.google.com/document/d/1aoibrQXcpJwP8tGdaNrDwoP2LiTqkj9HwJ3gAqA361U/edit

List of Added Features

I have added the following features for the first evaluation:

  • TTest
    • Two-Sample Independent
    • Two-Sample Paired
    • One-Sample
  • ZTest
    • Two-Sample Independent
  • ANOVA
    • One-Way ANOVA
    • Two-Way ANOVA
  • Levene's Test: checks the assumption of homogeneity of variance between populations
  • Correlation Coefficient
    • Pearson's R
    • Kendall's Tau
    • Spearman Rank
    • Chi-Square Test for Independence
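
To make two of the listed quantities concrete, here is a minimal Python sketch of the two-sample independent t statistic (pooled-variance form, which assumes equal population variances) and Pearson's R. This is not the LabPlot implementation, which is written in C++; the function names `pooled_t_statistic` and `pearson_r` are hypothetical and chosen only for this illustration. The sketch computes the statistics only and does not derive p-values.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample independent t statistic, assuming equal population variances.

    Hypothetical helper for illustration; returns (t, degrees_of_freedom).
    """
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled sample variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / dof
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, dof


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired observations x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    t, dof = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
    print(f"t = {t:.3f}, degrees of freedom = {dof}")
    print(f"r = {pearson_r([1, 2, 3], [2, 4, 6]):.3f}")
```

The pooled form is only appropriate when the equal-variance assumption holds, which is why an assumption check such as Levene's Test is listed alongside the tests.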

Status Reports

First Evaluation:
https://docs.google.com/document/d/1JxA569fFTcrDUTHdInvKJPz9rXmVYM7DuYT54f7C38U/edit?usp=sharing

Second Evaluation:
https://docs.google.com/document/d/1qgss0AssIb3HJIDeAYIos2ig37tk_8UWqDsn4OwDPrQ/edit?usp=sharing

Final Report:
I have included all my work, with screenshots and demos, in the final post of my blog. Here is the link to that post: https://agdeva8labplot.blogspot.com/2019/08/final-days-of-gsoc-2019.html

TODO

  • Add more tooltips to the Result View.
  • Check for assumptions using various tests (like Levene's Test).
  • Reimplement the above features for the case where the data source type is Database.
  • Integrate various tests in one workbook to show a summary to the user in a few clicks.
  • All other minor TODOs are already written as comments in the source code.
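
As an illustration of the kind of assumption check named in the list above, the following is a minimal Python sketch of the Levene W statistic, using absolute deviations from each group mean. It is an independent example, not code from the LabPlot source, and the function name `levene_w` is hypothetical. It returns only the statistic; comparing W against an F distribution to obtain a p-value is left out.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic for homogeneity of variance across groups.

    Hypothetical helper for illustration. Uses Z_ij = |Y_ij - mean_i|.
    Raises ZeroDivisionError if every deviation within each group is identical.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((zij - zm) ** 2 for zi, zm in zip(z, z_means) for zij in zi)
    return (n_total - k) / (k - 1) * between / within


if __name__ == "__main__":
    # The second group has twice the spread of the first.
    print(levene_w([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

A larger W indicates stronger evidence that the group variances differ, in which case an equal-variance test would be a poor choice.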

Future Goals

We aim to generate a single, self-contained report for the data the user is currently analysing. This report will show the statistical analysis summary and graphs in one place, with a single click, without requiring the user to select or configure anything explicitly unless they want to. The idea is to make data analysis easy for the user and give them the freedom to experiment with the data while keeping track of how the different statistical parameters change.

Commits

My Commits: https://cgit.kde.org/labplot.git/log/?h=gsoc2019_stats&qt=author&q=Devanshu+Agarwal
These commits were reviewed on Phabricator by my mentors, Stefan Gerlach and Alexander Semke.

Review Request: https://phabricator.kde.org/p/devanshuagarwal/

My Blog

https://agdeva8labplot.blogspot.com/

About Me