GSoC/2019/StatusReports/FerenczKovács

From KDE Community Wiki

Import of educational data sets available on the internet

Many websites provide data sets for educational and academic purposes, covering various fields of science and beyond (astrophysics, statistics, medicine, etc.). Some scientific tools provide "wrappers" for such online sources, allowing the user to easily investigate these data sets and work with them in all kinds of applications, while technical details such as fetching the data from the server and parsing it are handled transparently. The user does not need to know what happens in the background.

The goal of this project is to add similar functionality to LabPlot. This would make LabPlot better suited to educational purposes: students and teachers could use it to visualize and analyze data connected to the field they are currently studying. It could also bring LabPlot into the everyday life of the average student.

Mentor: Fábián Kristóf

Project Goals

  • Provide functionality to easily fetch and process online datasets
  • Ease the process of searching for datasets, downloading them, preprocessing them and then loading them into LabPlot
  • Gather a large number of datasets and categorize them thematically into sections and subsections
  • If time allows, implement a "welcome screen" (similar to the ones available in other applications, for example QtCreator, Visual Studio Code and others)

Work Report

Community Bonding Period

Investigating and analysing existing solutions for uploading and downloading with KNS3, reading KNS3's API documentation, looking at the welcome screens of other applications for inspiration, and studying some simpler caching implementations.

Communicating with my mentor and others from the LabPlot team to properly design the project, and also with the KDE community and the other students accepted to KDE for GSoC.

Week 1 (May 27-June 2) - Week 4 (June 17-23)

The very first step was to implement a new widget, called ImportDatasetWidget, which provides the functionality to:

  • list the available categories and subcategories of datasets
  • list the available datasets for a certain subcategory
  • refresh the list of datasets and delete the downloaded metadata files


We had to create metadata files in order to record additional information about the datasets, and also to divide them into categories and subcategories. We use one metadata file which contains every category and subcategory and a list of datasets for every subcategory. Additionally, there is a metadata file for every dataset containing various data about that dataset.

In order to make it possible to import datasets into LabPlot, I had to implement a helper class: DatasetHandler. This class deals with:

  • processing a dataset's metadata file
  • downloading the dataset and processing it
  • loading the dataset's content into LabPlot
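
The three steps above can be sketched roughly as follows. This is only an illustration in Python, not DatasetHandler's actual C++ code: the metadata field names (`url`, `separator`, `has_header`), the example URL and the helper functions are all assumptions, and the download step is replaced by a literal string so the sketch runs offline.

```python
import csv
import io
import json

# Hypothetical metadata for one dataset; the field names are assumptions,
# not LabPlot's actual metadata format.
METADATA_JSON = """
{
  "name": "example_dataset",
  "url": "https://example.org/example_dataset.csv",
  "separator": ",",
  "has_header": true
}
"""


def parse_metadata(text):
    """Step 1: read the options needed to fetch and parse the dataset."""
    return json.loads(text)


def parse_dataset(raw_text, meta):
    """Step 2: split the downloaded text into named columns."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=meta["separator"])
    rows = [row for row in reader if row]
    if meta["has_header"]:
        header, rows = rows[0], rows[1:]
    else:
        header = [f"Column {i + 1}" for i in range(len(rows[0]))]
    # Transpose the rows into columns, ready to be loaded into a spreadsheet.
    return {name: [row[i] for row in rows] for i, name in enumerate(header)}


# Stand-in for the downloaded file content (no network access is made here).
downloaded = "x,y\n1,2\n3,4\n"

meta = parse_metadata(METADATA_JSON)
columns = parse_dataset(downloaded, meta)  # Step 3 would hand this to the application
print(columns)
```

Keeping the parsing options in the metadata means the same code path can handle datasets with different separators or with and without a header row.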


I also implemented a new dialog (DatasetMetadataManagerDialog), which makes it possible for users to add their own datasets to LabPlot's list. This dialog provides an interface for easily setting the options necessary for the new dataset's metadata file.

I started to design and to create a prototype for our Welcome Screen. The current idea for the welcome screen is to provide the following functionalities:

  • Recently opened projects
  • Help section: Documentation, FAQ, etc.
  • Exploring datasets
  • Example projects
  • Latest release information
  • News section


For more details about what has been accomplished and videos about the implemented features in use check out my blog post


Week 5 (June 24 - 30) - Week 8 (July 15 - 21)

The first step was to improve the welcome screen and make it easily usable, dynamic, clean and intuitive for users. This step was very important, since the welcome screen is the first thing users come into contact with when they start using LabPlot.

A new feature is taking a screenshot of the main window whenever the user saves a project. This screenshot is saved with the project itself and is used as a thumbnail for Recent and Example projects in the welcome screen. The code that takes and saves the screenshot is already committed on the master branch.

Implementing the missing Example Projects section of the welcome screen was the next step. The example projects are shown in a GridView, and they look quite nice thanks to the thumbnails. Every example project has a name and one or more tags assigned to it. A search bar makes the example projects searchable by name and by one or more tags.
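
A minimal sketch of such a name-and-tag search is shown below. It is an illustration in Python rather than the welcome screen's real code. The `ExampleProject` class, the project names and the rule that a project must carry all requested tags are assumptions; the report does not say whether several tags are combined with "and" or "or".

```python
from dataclasses import dataclass, field


@dataclass
class ExampleProject:
    """Hypothetical record for one example project."""
    name: str
    tags: set = field(default_factory=set)


def search(projects, text="", tags=()):
    """Return projects whose name contains `text` and that have all `tags`."""
    text = text.lower()
    wanted = {t.lower() for t in tags}
    result = []
    for project in projects:
        name_ok = text in project.name.lower()
        # Subset test: every requested tag must be present on the project.
        tags_ok = wanted <= {t.lower() for t in project.tags}
        if name_ok and tags_ok:
            result.append(project)
    return result


# Made-up projects, used only to exercise the search.
projects = [
    ExampleProject("Fourier Filter", {"signal", "analysis"}),
    ExampleProject("Histogram Basics", {"statistics"}),
    ExampleProject("Curve Fitting", {"analysis", "statistics"}),
]

print([p.name for p in search(projects, text="hist")])
print([p.name for p in search(projects, tags=["analysis", "statistics"])])
```

An empty search text and an empty tag list match every project, which is a convenient default for a search bar that starts out empty.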

The next step was to make the sections and their content more dynamic. Previously, these sections (or widgets, whichever name you prefer) were static: they had a fixed size and their content did not adapt well to resizing of the main window. Now the user can easily resize a section just by dragging its frame.

Another new feature is also connected to the welcome screen. I made it possible for LabPlot to save the layout of the welcome screen whenever it is closed. The next time the welcome screen is displayed, its layout is restored to the saved one.

Last but not least, the welcome screen got another new feature. The user can now maximize a section in order to interact with it much more easily. When the section no longer needs to be maximized, the user can minimize it and the former layout is restored.

Some changes were made to the categorization of datasets too. We thought it would be better to organise the datasets first into collections (for example, a collection of R Datasets), and then into categories and subcategories. This made it possible to have a single file per dataset collection, rather than a metadata file for every dataset (as was previously implemented).
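
To make the collection, category, subcategory and dataset hierarchy concrete, here is a small Python sketch of what a single collection file could look like and how it can be walked. The JSON keys and all names in it are assumptions made for illustration; the actual file layout used by LabPlot is not described in this report.

```python
import json

# Hypothetical content of one collection file. The keys ("categories",
# "subcategories", "datasets", ...) are assumptions for illustration only.
COLLECTION_JSON = """
{
  "name": "Example Collection",
  "categories": [
    {
      "name": "Statistics",
      "subcategories": [
        {
          "name": "Regression",
          "datasets": [
            {"name": "dataset_a", "description": "First placeholder dataset"},
            {"name": "dataset_b", "description": "Second placeholder dataset"}
          ]
        }
      ]
    },
    {
      "name": "Medicine",
      "subcategories": [
        {
          "name": "Epidemiology",
          "datasets": [
            {"name": "dataset_c", "description": "Third placeholder dataset"}
          ]
        }
      ]
    }
  ]
}
"""


def dataset_paths(collection):
    """Return (collection, category, subcategory, dataset) name tuples."""
    paths = []
    for category in collection["categories"]:
        for subcategory in category["subcategories"]:
            for dataset in subcategory["datasets"]:
                paths.append((collection["name"], category["name"],
                              subcategory["name"], dataset["name"]))
    return paths


collection = json.loads(COLLECTION_JSON)
paths = dataset_paths(collection)
for path in paths:
    print(" / ".join(path))
```

With one file per collection, adding a dataset means editing a single document instead of creating a new metadata file and updating a separate category index.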

We would like to provide a considerable collection of datasets for the users. I have already managed to collect and categorize no fewer than 1000 datasets, and I am planning to collect some more.


For more details about what has been accomplished and videos about the implemented features in use check out my blog post


Week 9 (July 22 - 28) - Week 13 (August 19 - 25)

My main goal in this last period was to clean up, properly document, refactor and optimise the code and make it easier to read, so that it would be fit to be merged into the master branch and used by the community.

My next aim was to search for bugs and fix them, in order to make the implemented features more or less flawless. I can happily state that I succeeded in this.

I implemented a unit test for the main dataset-related features. This proved to be useful, since it helped discover some hidden problems, all of which were corrected. The main features that tests were developed for are:

  • processing the metadata files in order to list the available collections, categories, subcategories and datasets
  • downloading and processing of datasets based on the information stored in the metadata files
  • adding new datasets to the already existent ones (by creating new collections, categories and subcategories)


I managed to create some "real" example projects so that users can explore the possibilities provided by LabPlot. These include the already existing ones (three in total), and I added some dataset-related ones. I also continued collecting datasets and managed to double the previous number. We now have 2000 categorized datasets, which is already a considerable amount.

Some minor new features were also developed. The first step was to improve the downloading and naming of the datasets. I also developed a caching mechanism for the downloaded files.
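
The idea behind caching downloaded files can be sketched as follows. This Python example is an assumption-laden illustration, not the mechanism implemented in LabPlot: the `DownloadCache` class, the choice of a SHA-256 hash of the URL as the file name, and the injected `fetch` function are all hypothetical, and no expiry policy is modelled.

```python
import hashlib
import tempfile
from pathlib import Path


class DownloadCache:
    """Hypothetical file cache: each URL is downloaded at most once."""

    def __init__(self, directory, fetch):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # callable taking a URL and returning bytes

    def _path_for(self, url):
        # Hash the URL so that any URL maps to a safe, fixed-length file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / digest

    def get(self, url):
        path = self._path_for(url)
        if path.exists():
            return path.read_bytes()  # cache hit: no download needed
        data = self.fetch(url)        # cache miss: download and store
        path.write_bytes(data)
        return data


# A fake download function records its calls, so the sketch runs offline.
calls = []


def fake_fetch(url):
    calls.append(url)
    return b"a,b\n1,2\n"


cache = DownloadCache(tempfile.mkdtemp(), fake_fetch)
first = cache.get("https://example.org/data.csv")
second = cache.get("https://example.org/data.csv")
print(len(calls))  # the second request is served from disk
```

Passing the download function in as a parameter keeps the cache logic independent of the networking code, which also makes it easy to test.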

Another, more visible, feature is that users can now easily display the full name and the description of the selected dataset, so they can get additional information before choosing a dataset to work with. I also added a counter to every collection, category and subcategory, so users can now see how many datasets belong to each of these units of categorization.
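
The per-level counters can be computed with a simple recursive sum, as in the following Python sketch. The nested-dictionary representation and all names in it are assumptions chosen for brevity; they do not reflect how the widget stores its data.

```python
def count_datasets(node):
    """Count the datasets below a node: lists are leaves, dicts are levels."""
    if isinstance(node, list):
        return len(node)
    return sum(count_datasets(child) for child in node.values())


def labelled_counts(tree, prefix=()):
    """Yield (path, count) for every collection, category and subcategory."""
    for name, child in tree.items():
        path = prefix + (name,)
        yield path, count_datasets(child)
        if isinstance(child, dict):
            yield from labelled_counts(child, path)


# Hypothetical hierarchy: collection -> category -> subcategory -> datasets.
tree = {
    "Collection A": {
        "Statistics": {
            "Regression": ["d1", "d2"],
            "Time Series": ["d3"],
        },
        "Medicine": {
            "Epidemiology": ["d4"],
        },
    },
}

counts = dict(labelled_counts(tree))
for path, count in counts.items():
    print(" / ".join(path), f"({count})")
```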

I also created a backup functionality for ImportDatasetWidget. When the user presses the "Refresh" button, the metadata files shipped with LabPlot are copied to the dataset folder, and the files which were previously there are kept as a backup. This was needed because pressing "Refresh" makes any newly added dataset disappear, which can be very unpleasant if done unintentionally. Pressing the "Restore" button undoes the effect of the "Refresh" button: the current metadata files are deleted, the backup files are restored and the widget is updated accordingly.
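
The refresh and restore behaviour described above can be modelled with a few directory operations. The following Python sketch is only an approximation under stated assumptions: the folder names, the file names and the decision to keep exactly one backup are hypothetical, and the real widget additionally updates its view.

```python
import shutil
import tempfile
from pathlib import Path


def refresh(shipped_dir, dataset_dir, backup_dir):
    """Replace the user's metadata files with the shipped ones, keeping a backup."""
    if backup_dir.exists():
        shutil.rmtree(backup_dir)  # assumption: only the latest backup is kept
    if dataset_dir.exists():
        shutil.move(str(dataset_dir), str(backup_dir))
    shutil.copytree(shipped_dir, dataset_dir)


def restore(dataset_dir, backup_dir):
    """Undo the last refresh; return False if there is no backup."""
    if not backup_dir.exists():
        return False
    shutil.rmtree(dataset_dir, ignore_errors=True)
    shutil.move(str(backup_dir), str(dataset_dir))
    return True


# Build a throwaway folder layout to demonstrate the round trip.
root = Path(tempfile.mkdtemp())
shipped, datasets, backup = root / "shipped", root / "datasets", root / "backup"
shipped.mkdir()
datasets.mkdir()
(shipped / "collection.json").write_text("shipped")
(datasets / "collection.json").write_text("shipped")
(datasets / "my_dataset.json").write_text("user added")

refresh(shipped, datasets, backup)
after_refresh = sorted(p.name for p in datasets.iterdir())

restored = restore(datasets, backup)
after_restore = sorted(p.name for p in datasets.iterdir())

print(after_refresh)
print(after_restore)
```

After the refresh the user-added file is gone from the dataset folder but survives in the backup, so the restore step can bring it back.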

For more details about what has been accomplished, videos about the implemented features in use and final demo videos check out my blog post

My Work

Other Important Links

Proposal Link

Blog

Contact

Email: [email protected]